Real-time data analysis frameworks (or stream systems)
2011-12-17 01:41
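My recent work involves designing a system that monitors the state of other systems in real time, for example the execution of Hadoop jobs or the health of servers. The system has to process the information produced by the monitored objects as it arrives and send it to users.
Such a system clearly needs the following properties:
Reliability
Large-scale data processing
Real-time operation
This will clearly be a project built on top of Hadoop. The systems currently available for reference are: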
Kafka: Kafka is a messaging system that was originally developed at LinkedIn to serve as the foundation for LinkedIn’s activity stream processing pipeline.
Nice talk
S4: S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.
Hedwig: Hedwig is a publish-subscribe system designed to carry large amounts of data across the internet in a guaranteed-delivery fashion from those who produce it (publishers) to those who are interested in it (subscribers).
Storm: Storm is a distributed, reliable, and fault-tolerant stream processing system. Its use cases are so broad that we consider it to be a fundamental new primitive for data processing.
Introduction slide
Flume: Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop’s HDFS.
Scribe: Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures.
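To make the monitoring requirement a bit more concrete, here is a minimal in-process sketch of the flow described at the top of this post: status events are published to a topic and pushed to whoever subscribed. It is only an illustration and not the API of any system listed above. `StatusBus`, the topic names, the event fields and the status values are all hypothetical. It also provides none of the reliability, persistence or distribution that a real system would need.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class StatusBus:
    """Toy in-memory publish-subscribe bus (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler that is called for every event on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> int:
        """Deliver the event to each subscriber; return the number of deliveries."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


def is_unhealthy(event: Event) -> bool:
    # Assumed status vocabulary; a real system would define its own.
    return event.get("status") in {"FAILED", "DOWN"}


bus = StatusBus()
alerts: List[str] = []


def alert_user(event: Event) -> None:
    # Stand-in for "send it to the user": collect a message per unhealthy event.
    if is_unhealthy(event):
        alerts.append(f"{event['source']}: {event['status']}")


bus.subscribe("hadoop.jobs", alert_user)
bus.subscribe("servers", alert_user)

bus.publish("hadoop.jobs", {"source": "job_0001", "status": "RUNNING"})
bus.publish("hadoop.jobs", {"source": "job_0002", "status": "FAILED"})
bus.publish("servers", {"source": "node-17", "status": "DOWN"})
print(alerts)
```

In this toy version the publisher calls subscribers directly, so a slow or failing handler blocks everything behind it. Decoupling the two sides and surviving node failures is roughly the gap that the systems listed above are meant to fill.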
I will keep updating this post as the project progresses.