Paper Learning: Data-Intensive Supercomputing: The case for DISC

Recently, I have been studying DISC, whose inspiration comes from the server infrastructure that Google developed to support search over the worldwide web. After reading Data-Intensive Supercomputing: The case for DISC, I believe we can turn the idea of constructing a Google-like infrastructure into reality; that reality is DISC.

DISC can be developed as a prototype of Google's infrastructure. We can divide it into two types of partitions: one for application development, and the other for systems research.
For the program development partitions, we can use available software, such as the open source code from the Hadoop project, to implement the file system and support for application programming.

For the systems research partitions, we can create our own design, studying the different kinds of design points (e.g., high-end hardware vs. low-cost components).


The paper Data-Intensive Supercomputing: The case for DISC gives me an overall impression of a new form of high-performance computing facility, and many other aspects of it deeply attract me. My notes on the paper follow:



闃呰Paper錛?/span>

Data-Intensive Supercomputing: The case for DISC  

Randal E. Bryant  May 10, 2007 CMU-CS-07-128

 

Question: How can university researchers demonstrate the credibility of their work without having comparable computing facilities available?

1 Background

Describe a new form of high-performance computing facility (Data-Intensive Super Computer) that places emphasis on data, rather than raw computation, as the core focus of the system.

The author's inspiration for DISC comes from the server infrastructures that have been developed to support search over the worldwide web.

This paper outlines the case for DISC as an important direction for large-scale computing systems.

1.1 Motivation

Example computations in which data plays the common, central role:

Web search without language barriers (no matter which language the query is typed in).

Inferring biological function from genomic sequences.

Predicting and modeling the effects of earthquakes.

Discovering new astronomical phenomena from telescope imagery data.

Synthesizing realistic graphic animations.

Understanding the spatial and temporal patterns of brain behavior based on MRI data.


2 Data-Intensive Super Computing

Conventional (Current) supercomputers:

are evaluated largely on the number of arithmetic operations they can supply each second to the application programs.

Advantage: well suited to applications in which highly structured data requires large amounts of computation.

Disadvantage:

1. It creates misguided priorities in the way these machines are designed, programmed, and operated;

2. It disregards the importance of incorporating computation-proximate, fast-access data storage, while at the same time creating machines that are very difficult to program effectively;

3. The range of computational styles is restricted by the system structure.

The key principles of DISC:

1. Intrinsic, rather than extrinsic, data.

2. High-level programming models for expressing computations over the data.

3. Interactive access.

4. Scalable mechanisms to ensure high reliability and availability (error detection and handling).



3 Comparison to Other Large-Scale Computer Systems

3.1 Current Supercomputers

3.2 Transaction Processing Systems

3.3 Grid Systems



4 Google: A DISC Case Study

1. The Google system actively maintains cached copies of every document it can find on the Internet.

The system constructs complex index structures, summarizing information about the documents in forms that enable rapid identification of the documents most relevant to a particular query.

When a user submits a query, the front end servers direct the query to one of the clusters, where several hundred processors work together to determine the best matching documents based on the index structures. The system then retrieves the documents from their cached locations, creates brief summaries of the documents, orders them with the most relevant documents first, and determines which sponsored links should be placed on the page.
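To make the index-lookup step concrete, here is a minimal sketch of an inverted index with conjunctive (AND) queries. It is purely illustrative, assuming simple whitespace tokenization; it is not Google's actual index structure, and the documents below are made up.

```python
# Minimal illustrative sketch of an inverted index with conjunctive
# (AND) queries; not Google's actual index structure.
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical cached documents.
docs = {
    1: "data intensive super computing",
    2: "web search over massive data",
    3: "synthesizing realistic graphics animations",
}
index = build_index(docs)
print(search(index, "data computing"))  # -> {1}
```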

2. The Google hardware design is based on a philosophy of using components that emphasize low cost and low power over raw speed and reliability. Google keeps the hardware as simple as possible.

They make extensive use of redundancy and software-based reliability.

Failed components are removed and replaced without turning the system off.

Google has significantly lower operating costs in terms of power consumption and human labor than do other data centers.

3. MapReduce: a programming framework that supports powerful forms of computation performed in parallel over large amounts of data.

Two functions: a map function that generates values and associated keys from each document, and a reduce function that describes how all the data matching each key should be combined.

MapReduce can be used to compute statistics about documents, to create the index structures used by the search engine, and to implement their PageRank algorithm for quantifying the relative importance of different web documents.
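As a minimal sketch of the model, here is the classic word-count example run in memory on a single machine; it imitates the map / group-by-key / reduce phases but is of course not Google's distributed implementation.

```python
# Minimal single-machine sketch of the MapReduce model (word count);
# imitates the map / shuffle / reduce phases, not the real
# distributed implementation.
from collections import defaultdict

def map_fn(doc_id, text):
    # Map: emit a (key, value) pair for every word in the document.
    for word in text.lower().split():
        yield word, 1

def reduce_fn(key, values):
    # Reduce: combine all values emitted under the same key.
    return key, sum(values)

def map_reduce(docs):
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)   # "shuffle": group values by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

docs = {1: "data intensive computing", 2: "intensive data analysis"}
print(map_reduce(docs))
# {'data': 2, 'intensive': 2, 'computing': 1, 'analysis': 1}
```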

4. BigTable: a distributed data structure that provides capabilities similar to those seen in database systems.
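The BigTable paper describes the data model as a sparse, distributed, sorted map from (row key, column key, timestamp) to value. The toy class below mimics only that model on a single machine; the class and method names are my own, not BigTable's API.

```python
# Toy, single-machine mock of BigTable's data model: a sparse map from
# (row, column, timestamp) to value, keeping multiple timestamped
# versions per cell. Names here are invented, not BigTable's API.
import time

class ToyTable:
    def __init__(self):
        self.cells = {}  # (row, column) -> list of (timestamp, value)

    def put(self, row, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        versions = self.cells.setdefault((row, column), [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first

    def get(self, row, column):
        """Return the most recent value in the cell, or None."""
        versions = self.cells.get((row, column))
        return versions[0][1] if versions else None

t = ToyTable()
t.put("com.example/index.html", "contents:", "<html>v1</html>", 1)
t.put("com.example/index.html", "contents:", "<html>v2</html>", 2)
print(t.get("com.example/index.html", "contents:"))  # <html>v2</html>
```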


5 Possible Usage Model

The DISC operations could include user-specified functions in the style of Google’s MapReduce programming framework. As with databases, different users will be given different authority over what operations can be performed and what modifications can be made.

 

6 Constructing a General-Purpose DISC System

The open source project Hadoop implements capabilities similar to the Google file system and support for MapReduce.

Constructing a General-Purpose DISC System:

Hardware Design.

There is a wide range of choices;

We need to understand the tradeoffs between the different hardware configurations and how well the system performs on different applications.

Google has made a compelling case for sticking with low-end nodes for web search applications, and the Google approach requires much more complex system software to overcome the limited performance and reliability of the components. But it might not be the most cost-effective solution for a smaller operation when personnel costs are considered.

Programming Model.

1. One important software concept for scaling parallel computing beyond 100 or so processors is to incorporate error detection and recovery into the runtime system and to isolate programmers from both transient and permanent failures as much as possible.

Work on providing fault tolerance in a manner invisible to the application programmer started in the context of grid-style computing, but only with the advent of MapReduce and in recent work by Microsoft has it become recognized as an important capability for parallel systems.

2. We want programming models that dynamically adapt to the available resources and that perform well in a more asynchronous execution environment.

e.g.: Google’s implementation of MapReduce partitions a computation into a number of map and reduce tasks that are then scheduled dynamically onto a number of “worker” processors.
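A minimal sketch of both ideas together (tasks scheduled dynamically onto workers, with failed tasks re-queued so the programmer never sees the failure) might look like the following; the failures are simulated, and none of these names come from a real framework.

```python
# Minimal sketch of dynamic scheduling with transparent fault
# tolerance: workers pull tasks from a shared queue, and a task whose
# execution "fails" is simply re-queued and retried elsewhere.
# Failures are simulated; no real framework's API is used here.
import queue
import random
import threading

tasks = queue.Queue()
results = {}
results_lock = threading.Lock()

def worker():
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return                       # no work left; worker exits
        if random.random() < 0.2:        # simulate a transient failure
            tasks.put(task)              # reschedule; caller never sees it
            continue
        with results_lock:
            results[task] = task * task  # stand-in for real computation

for t in range(20):
    tasks.put(t)
threads = [threading.Thread(target=worker) for _ in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(len(results), "tasks completed")   # 20 tasks completed
```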

Resource Management.

Problem: how to manage the computing and storage resources of a DISC system.

We want it to be available in an interactive mode and yet able to handle very large-scale computing tasks.

Supporting Program Development.

Developing parallel programs is difficult, both in terms of correctness and in terms of achieving good performance.

As a consequence, we must provide software development tools that allow correct programs to be written easily, while also enabling more detailed monitoring, analysis, and optimization of program performance.

System Software.

System software is required for a variety of tasks, including fault diagnosis and isolation, system resource control, and data migration and replication.

 

Google and its competitors provide an existence proof that DISC systems can be implemented using available technology. Some additional topics include:

How should the processors be designed for use in cluster machines?

How can we effectively support different scientific communities in their data management and applications?

Can we radically reduce the energy requirements for large-scale systems?

How do we build large-scale computing systems with an appropriate balance of performance and cost?

How can very large systems be constructed given the realities of component failures and repair times?

Can we support a mix of computationally intensive jobs with ones requiring interactive response?

How do we control access to the system while enabling sharing?

Can we deal with bad or unavailable data in a systematic way?

Can high performance systems be built from heterogenous components?


7 Turning Ideas into Reality

7.1 Developing a Prototype System

Operate two types of partitions: some for application development, focusing on gaining experience with the different programming techniques, and others for systems research, studying fundamental issues in system design.

For the program development partitions:

Use available software, such as the open source code from the Hadoop project, to implement the file system and support for application programming.

For the systems research partitions:

Create our own design, studying the different layers of hardware and system software required to get high performance and reliability (e.g., high-end hardware vs. low-cost components).

7.2 Jump Starting

Begin application development by renting much of the required computing infrastructure:

1. network-accessible storage: Simple Storage Service (S3)

2. computing cycles: Elastic Compute Cloud (EC2)

(The current pricing for storage is $0.15 per gigabyte per month ($1,800 per terabyte per year), with additional costs for reading or writing the data. Computing cycles cost $0.10 per CPU hour ($877 per year) on a virtual Linux machine.)
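The annualized numbers in parentheses follow directly from the monthly and hourly rates; a quick sanity check:

```python
# Sanity check of the annualized 2007 AWS figures quoted above.
storage_usd_per_gb_month = 0.15
print(storage_usd_per_gb_month * 1000 * 12)   # 1800.0 USD per TB-year

cpu_usd_per_hour = 0.10
print(round(cpu_usd_per_hour * 24 * 365.25))  # 877 USD per CPU-year
```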

Renting problems:

1. The performance of such a configuration is much less than that of a dedicated facility.

2. There is no way to ensure that the S3 data and the EC2 processors will be in close enough proximity to provide high speed access.

3. We would lose the opportunity to design, evaluate, and refine our own system.

7.3 Scaling Up


8 Conclusion

1. We believe that DISC systems could change the face of scientific research worldwide.

2. DISC will help realize the potential of all this data: the combination of sensors and networks to collect it, inexpensive disks to store it, and the benefits derived from analyzing it.

 



DISC (Data-Intensive Super Computing)

Data Intensive System (DIS)

System Challenges:

Data distributed over many disks

Compute using many processors

Connected by gigabit Ethernet (or equivalent)

System Requirements:

Lots of disks

Lots of processors

Located in close proximity

System Comparison:

(i) Data

Conventional Supercomputers:

Data stored in a separate repository, with no support for collection or management.

Data must be brought into the system for computation, which is time consuming and limits interactivity.

DISC:

System collects and maintains the data as a shared, active data set.

Computation is colocated with storage, giving faster access.

(ii) Programming Models

Conventional Supercomputers:

Programs described at a very low level, specifying detailed control of processing & communications.

Rely on a small number of software packages written by specialists, which limits the classes of problems & solution methods.

DISC:

Application programs written in terms of high-level operations on data.

Runtime system controls scheduling, load balancing, …

(iii) Interaction

Conventional Supercomputers:

Main machine offers batch access only; the priority is to conserve machine resources. The user submits a job with specific resource requirements, and it runs in batch mode when resources become available.

Offline visualization: results are moved to a separate facility for interactive use.

DISC:

Interactive access; the priority is to conserve human resources. User actions can range from a simple query to a complex computation.

System supports many simultaneous users, which requires a flexible programming and runtime environment.

(iv) Reliability

Conventional Supercomputers:

“Brittle” systems: the main recovery mechanism is to recompute from the most recent checkpoint, and the system must be brought down for diagnosis, repair, or upgrades.

DISC:

Flexible error detection and recovery: the runtime system detects and diagnoses errors, with selective use of redundancy and dynamic recomputation.

Components can be replaced or upgraded while the system is running, which requires a flexible programming model & runtime environment.

Comparison with Grid Computing:

Grid: Distribute Computing and Data

(i) Computation: distribute the problem across many machines, though generally only problems that partition easily into independent subproblems.

(ii) Data: support shared access to large-scale data sets.

DISC: Centralize Computing and Data

(i) Enables more demanding computational tasks.

(ii) Reduces the time required to get data to machines.

(iii) Enables more flexible resource management.

A Commercial DISC

Netezza Performance Server (NPS)

Designed for “data warehouse” applications

Heavy-duty analysis of databases

Data distributed over up to 500 Snippet Processing Units

Disk storage, dedicated processor, FPGA controller

User “programs” expressed in SQL

Constructing DISC

Hardware: Rent from Amazon

Elastic Compute Cloud (EC2)

Generic Linux cycles for $0.10 / hour ($877 / yr)

Simple Storage Service (S3)

Network-accessible storage for $0.15 / GB / month ($1800/TB/yr)

Software: utilize open source

Hadoop Project

Open source project providing file system and MapReduce

Supported and used by Yahoo

Implementing System Software

Programming Support

Abstractions for computation & data representation

E.g., Google: MapReduce & BigTable

Usage models

Runtime Support

Allocating processing and storage

Scheduling multiple users

Implementing programming model

Error Handling

Detecting errors

Dynamic recovery

Identifying failed components


