
Sector/Sphere:High Performance Distributed File System and Parallel Data Processing Engine

1. Overview

Sector/Sphere was created by Dr. Yunhong Gu in 2006 and is now maintained by a group of open-source developers. It is available from http://sector.sourceforge.net/

Sector: distributed file system

Sphere: parallel data processing framework

In benchmark tests, Sector/Sphere is in some cases about twice as fast as Hadoop.

2. Sector

Sector system architecture:



The figure shows the overall architecture of the Sector system, which consists of three parts:

Security Server: maintains user accounts, user passwords, file access information, and the IP addresses of the authorized slave nodes

Master: maintains the metadata of the files stored in the system, controls the running of all slave nodes, and responds to users' requests

Slaves: the nodes that store the files managed by the system and process the data upon the request of a Sector client

The client interfaces include:

1. Sector file system client API: access Sector files from applications using the C++ API (a minimal sketch follows this list)

2. Sector system tools

3. FUSE: mount the Sector file system as a local directory

4. Sphere programming API
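
As a rough illustration of item 1, the sketch below shows what a minimal Sector client application might look like. The class and method names (Sector, SectorFile, init, login, open, read) follow the C++ client API shipped with the Sector distribution, but they are quoted from memory and may differ between releases; check the client headers (e.g. sector.h) for the exact signatures.

    // Hypothetical sketch of a Sector client application using the C++ API.
    // Names approximate the Sector 2.x client headers; verify against the
    // headers of the release you actually use.
    #include <sector.h>
    #include <iostream>

    int main()
    {
        Sector client;

        // Connect to the master and authenticate against the security server.
        if (client.init("master.example.com", 6000) < 0) return -1;   // example address/port
        if (client.login("test", "xxx") < 0) return -1;               // example credentials

        // Open a file stored in Sector and read a block of data.
        SectorFile* f = client.createSectorFile();
        if (f->open("/mydata/file1.dat", SF_MODE::READ) < 0) return -1;

        char buf[4096];
        int64_t n = f->read(buf, sizeof(buf));
        std::cout << "read " << n << " bytes" << std::endl;

        f->close();
        client.releaseSectorFile(f);

        client.logout();
        client.close();
        return 0;
    }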

A more detailed figure:



Features:

1. Compared to Hadoop, Sector does not split user files into blocks; instead, each slice of a Sector dataset is stored as a single file in the native file system.

2. Sector runs an independent security server. This design allows different security service providers to be deployed; in addition, multiple Sector masters can use the same security service.

3. Topology aware and application aware.

4. Uses UDP for message passing and UDT for data transfer.

Replication:

1. Provides software-level fault tolerance (no hardware RAID is required).

2. All files are replicated to a specified number of copies by default.

3. By default, replicas are created on the furthest node.

UDT:

A high-performance data transfer protocol designed for transferring large volumetric datasets over high-speed wide area networks. Such settings are typically disadvantageous for the more common TCP protocol.

UDT uses UDP to transfer bulk data with its own reliability control and congestion control mechanisms. The new protocol can transfer data at a much higher speed than TCP does.
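
To give a sense of how UDT is used, here is a minimal client sketch against the UDT C++ API (udt.sourceforge.net). The server address and port are placeholders. UDT deliberately mirrors the BSD socket interface, so porting a TCP application is largely a matter of replacing the socket calls.

    // Minimal UDT client sketch: connect and send a buffer over UDP with
    // UDT's own reliability and congestion control.
    #include <udt.h>
    #include <arpa/inet.h>
    #include <cstring>
    #include <iostream>

    int main()
    {
        UDT::startup();

        UDTSOCKET sock = UDT::socket(AF_INET, SOCK_STREAM, 0);

        sockaddr_in serv;
        std::memset(&serv, 0, sizeof(serv));
        serv.sin_family = AF_INET;
        serv.sin_port = htons(9000);                         // example port
        inet_pton(AF_INET, "192.168.1.10", &serv.sin_addr);  // example address

        if (UDT::connect(sock, (sockaddr*)&serv, sizeof(serv)) == UDT::ERROR) {
            std::cerr << "connect: " << UDT::getlasterror().getErrorMessage() << std::endl;
            return -1;
        }

        // Bulk transfer: send a buffer reliably over UDP.
        const char* data = "hello over UDT";
        if (UDT::send(sock, data, std::strlen(data), 0) == UDT::ERROR) {
            std::cerr << "send: " << UDT::getlasterror().getErrorMessage() << std::endl;
        }

        UDT::close(sock);
        UDT::cleanup();
        return 0;
    }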

Limitations:

1. File size is limited by the available space on individual storage nodes.

2. Users may need to split their datasets into files of appropriate size.

3. Sector is designed to provide high throughput on large datasets, rather than extremely low latency on small files.

3. Sphere

Sphere is a parallel data processing engine integrated with Sector; it can be used to process data stored in Sector in parallel.

Sphere uses a stream processing paradigm. A stream is an abstraction in Sphere that represents either a dataset or part of a dataset (a Sector dataset consists of one or more physical files).
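
The sketch below outlines how a client might submit a Sphere job over such a stream, following the style of the examples in the Sector/Sphere paper. The class and method names (SphereStream, SphereProcess, loadOperator, run) are approximate and may vary between Sector releases.

    // Rough sketch of a Sphere client job; names are approximate, check the
    // Sector/Sphere client headers for the exact API of your release.
    #include <sector.h>
    #include <string>
    #include <vector>

    int main()
    {
        Sector client;
        client.init("master.example.com", 6000);   // example master address
        client.login("test", "xxx");               // example credentials

        // The input stream covers a Sector dataset (one or more physical files).
        std::vector<std::string> files;
        files.push_back("/mydata");
        SphereStream input;
        input.init(files);

        // The output stream is produced by the UDF; initialize it empty.
        SphereStream output;
        output.init(0);

        // Load the user-defined function (compiled into a shared library)
        // and run it on every segment of the input stream.
        SphereProcess* proc = client.createSphereProcess();
        proc->loadOperator("myudf.so");
        proc->run(input, output, "myudf", 0);
        proc->waitForCompletion();

        client.releaseSphereProcess(proc);
        client.logout();
        client.close();
        return 0;
    }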

The figure below illustrates how Sphere processes the segments in a stream.

SPE: Sphere Processing Engine



This figure illustrates the basic model that Sphere supports. Sphere also supports some extensions of this model, which occur quite frequently:

1. Processing multiple input streams.

2. Shuffling input streams.
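
The unit of work an SPE executes is a user-defined function (UDF) compiled into a shared library and loaded by the slaves. The stub below only sketches the shape of such a UDF as described in the paper: a function that takes one input segment, an output buffer, and a handle to local temporary files. The exact SInput/SOutput/SFile structures are defined in the Sector headers and are not reproduced here, so treat the prototype as approximate.

    // Approximate shape of a Sphere UDF, built against the Sector headers
    // into a shared library (e.g. myudf.so) and loaded by the SPEs.
    #include <sector.h>

    extern "C" int myudf(const SInput* input, SOutput* output, SFile* file)
    {
        // A real UDF processes one data segment from the input stream and
        // fills the output buffer and index fields; for the shuffling
        // extension it also assigns bucket IDs to the output records.
        // This stub simply reports success.
        return 0;
    }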

Interested readers can refer to: "Sector and Sphere: The Design and Implementation of a High Performance Data Cloud".

4. References

Sector and Sphere: The Design and Implementation of a High Performance Data Cloud
http://sector.sourceforge.net/
http://en.wikipedia.org/wiki/Sector/Sphere
http://dongxicheng.org/mapreduce/streaming-mapreduce-sphere/
http://en.wikipedia.org/wiki/UDP-based_Data_Transfer_Protocol
http://udt.sourceforge.net/