
Spark Study Notes --- Spark Streaming: Processing a TCP Data Source

2017-03-30 15:58
package demo1

import org.apache.spark._
import org.apache.spark.streaming._
//import org.apache.spark.streaming.StreamingContext._  (not necessary since Spark 1.3)

/*
Using this context, we can create a DStream that represents streaming data from a TCP source,
specified as a hostname (e.g. localhost) and port. This lines DStream represents the stream of
data that will be received from the data server. Each record in this DStream is a line of text.
Next, we want to split the lines by space characters into words.
flatMap is a one-to-many DStream operation that creates a new DStream by generating multiple
new records from each record in the source DStream. In this case, each line will be split into
multiple words, and the stream of words is represented as the words DStream. Next, we want to
count these words.
*/
object SparkStreaming {
  def main(args: Array[String]): Unit = {
    // Create a local StreamingContext with two working threads and a batch interval of 1 second.
    // The master requires at least 2 cores to prevent a starvation scenario.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    // Create a DStream that will connect to hostname:port, e.g. localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()
    // Start the computation
    ssc.start()
    // Wait for the computation to terminate
    ssc.awaitTermination()
  }
}
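The per-batch logic above (flatMap to split lines, map to pair each word with 1, reduceByKey to sum) can be sketched on a plain Scala collection without Spark. This is only an illustration of what one micro-batch computes; the names batch, words, and counts are ours, not part of the Spark API:

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // One simulated micro-batch: each element stands for one line received on the socket
    val batch = Seq("to be or not", "to be")

    // flatMap: split each line into words (one-to-many)
    val words = batch.flatMap(_.split(" "))

    // map + reduceByKey: pair each word with 1, then sum the counts per word
    val counts = words.map(w => (w, 1)).groupBy(_._1).map { case (w, ps) => (w, ps.map(_._2).sum) }

    println(counts) // e.g. Map(to -> 2, be -> 2, or -> 1, not -> 1), in unspecified order
  }
}
```

To feed the real streaming job, start a data server first (for example `nc -lk 9999` on Linux) and type lines into it; each 1-second batch of lines is then counted exactly as sketched here.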