
Kafka and Samza: Real-time stream processing

2016-03-23 11:00
As we know, we have already learned two approaches to big data analysis [1]:



Batch processing is represented by MapReduce, and iterative processing by Spark. What the two have in common is that they process a fixed data set: once a processing job starts, its input data cannot change. This is a disadvantage for real-time data analysis, where new data keeps arriving.
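To make the limitation concrete, here is a minimal Python sketch. It is not MapReduce or Spark code, and the function names are hypothetical. The batch function needs the complete input before it can return anything, while the incremental function produces an updated result after every record, which is the behaviour real-time analysis needs.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_word_count(records: List[str]) -> Counter:
    """Batch style: the whole input must exist before the job starts."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def streaming_word_count(records: Iterable[str]) -> Iterator[Counter]:
    """Incremental style: yield an up-to-date snapshot after each record."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
        yield Counter(counts)  # copy, so earlier snapshots are not mutated later


data = ["a b", "b c"]
print(batch_word_count(data))
for snapshot in streaming_word_count(iter(data)):
    print(snapshot)
```

The final snapshot of the incremental version equals the batch result; the difference is that intermediate results are available while input is still arriving.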

For real-time analysis, we introduce stream processing. Here is the concept of stream processing [1]:



In our Kafka + Samza setup, Samza is the processing framework. Kafka is only the source of the data: it organises streams into topics and messages. Now, let's take a look at the details.
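As a rough illustration of how a topic organises messages, the following toy in-memory model (plain Python, not the Kafka client API) treats a topic as a fixed number of append-only partitions and gives each message an offset within its partition. The `Topic` class and its methods are hypothetical, and `zlib.crc32` stands in for whatever partitioner a real Kafka client uses.

```python
import zlib
from typing import List, Tuple

Message = Tuple[str, str]  # (key, value)


class Topic:
    """Toy model of a topic: a fixed number of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Message]] = [[] for _ in range(num_partitions)]

    def send(self, key: str, value: str) -> Tuple[int, int]:
        """Append a keyed message and return (partition, offset)."""
        # Assumption: crc32 is only a deterministic stand-in for a real partitioner.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int) -> List[Message]:
        """Return every message in one partition from the given offset onward."""
        return self.partitions[partition][offset:]


topic = Topic("page-views", 3)
first = topic.send("user-1", "/home")
second = topic.send("user-1", "/cart")
print(first, second)
```

Messages with the same key land in the same partition, so their relative order is preserved there; a consumer only has to remember one offset per partition to resume reading.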

 


Here are some concepts in Kafka:



Here are some basic concepts about Samza: 



NM = Node Manager; RM = Resource Manager.

Here is a typical Samza job:



In general, one task in Samza corresponds to one consumer in Kafka, and each input stream of a task corresponds to one partition of a Kafka topic.
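The task-to-partition relationship can be sketched as a simple assignment table. This is plain Python rather than Samza code, and `assign_partitions` is a hypothetical helper. It assumes that task i reads partition i of every input topic that has such a partition, so the number of tasks equals the largest partition count.

```python
from typing import Dict, List, Tuple


def assign_partitions(partition_counts: Dict[str, int]) -> Dict[int, List[Tuple[str, int]]]:
    """Map each task id to the (topic, partition) pairs it consumes.

    Assumption: task i reads partition i of every input topic that has one.
    """
    num_tasks = max(partition_counts.values(), default=0)
    assignment: Dict[int, List[Tuple[str, int]]] = {task: [] for task in range(num_tasks)}
    for topic in sorted(partition_counts):
        for partition in range(partition_counts[topic]):
            assignment[partition].append((topic, partition))
    return assignment


plan = assign_partitions({"clicks": 3, "views": 2})
for task, inputs in plan.items():
    print(f"task {task} <- {inputs}")
```

Under this assumption, each partition is read by exactly one task, so adding partitions to a topic is what allows more tasks to work in parallel.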

Reference:

[1] 15619 Cloud Computing CMU