Spark Documentation Study Notes 1: Spark Streaming Programming Guide
2016-10-17 15:57
1. Overview
Definition: Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
How it works: Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches. In short: receive live input, split it into batches, and let the Spark engine process them to produce the final results.
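A DStream (discretized stream) is Spark Streaming's abstraction for a continuous stream of data. Internally, a DStream is represented as a sequence of RDDs.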
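To make the batching model concrete, here is a minimal plain-Python sketch. It does not use the Spark API: `discretize` and `word_counts` are hypothetical helper names, timestamps are assumed to be non-negative seconds, and each inner list only stands in for the RDD that Spark would create for one batch interval.

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive batches.

    Batch k holds the events whose timestamp t satisfies
    k * batch_interval <= t < (k + 1) * batch_interval.
    Assumes timestamps are non-negative.
    """
    if batch_interval <= 0:
        raise ValueError("batch_interval must be positive")
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_interval)].append(value)
    if not batches:
        return []
    # Emit one batch per interval up to the last one seen, including empty ones,
    # so the position in the list always corresponds to the interval index.
    return [batches.get(k, []) for k in range(max(batches) + 1)]


def word_counts(batch):
    """Count words in one batch (the per-batch processing step)."""
    counts = {}
    for line in batch:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Three lines of text arriving at 0.2 s, 0.9 s, and 2.5 s.
events = [(0.2, "a b"), (0.9, "a"), (2.5, "c")]
dstream = discretize(events, batch_interval=1.0)  # the "sequence of RDDs"
results = [word_counts(batch) for batch in dstream]  # one result per batch
print(dstream)
print(results)
```

With a 1-second batch interval the first two events land in batch 0 and the third in batch 2. The sketch emits an empty batch for interval 1 so that the sequence stays aligned with time, and the same per-batch function is applied to every batch independently.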
3. Basic Concepts
3.1 Linking
To write a Spark Streaming program, add the following dependency to your Maven project:
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.0.1</version>
    </dependency>
For ingesting data from sources like Kafka, Flume, and Kinesis that are not present in the Spark Streaming core API, you will have to add the corresponding artifact spark-streaming-xyz_2.11 to the dependencies. For example, some of the common ones are as follows.
Source | Artifact |
---|---|
Kafka | spark-streaming-kafka-0-8_2.11 |
Flume | spark-streaming-flume_2.11 |
Kinesis | spark-streaming-kinesis-asl_2.11 [Amazon Software License] |
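The naming pattern in the table (artifact base name, an underscore, then the Scala binary version) can be sketched in Python. The helper `maven_dependency` and the `ARTIFACT_BASES` mapping are hypothetical names introduced for illustration. The defaults 2.11 and 2.0.1 are simply the versions used in the Linking snippet, and the sketch assumes the source-specific artifact is published at the same version as the core one.

```python
# Base artifact names taken from the table above (without the Scala suffix).
ARTIFACT_BASES = {
    "kafka": "spark-streaming-kafka-0-8",
    "flume": "spark-streaming-flume",
    "kinesis": "spark-streaming-kinesis-asl",
}


def maven_dependency(source, scala_version="2.11", spark_version="2.0.1"):
    """Render a Maven <dependency> element for one of the sources above."""
    base = ARTIFACT_BASES[source.lower()]  # raises KeyError for unknown sources
    artifact_id = f"{base}_{scala_version}"
    return (
        "<dependency>\n"
        "    <groupId>org.apache.spark</groupId>\n"
        f"    <artifactId>{artifact_id}</artifactId>\n"
        f"    <version>{spark_version}</version>\n"
        "</dependency>"
    )


snippet = maven_dependency("Kafka")
print(snippet)
```

Keeping the Scala suffix and the Spark version as parameters makes it harder to mix artifacts built for different Scala versions by accident.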
Points to remember
When running a Spark Streaming program locally, do not use “local” or “local[1]” as the master URL. Either of these means that only one thread will be used for running tasks locally. If you are using an input DStream based on a receiver (e.g. sockets, Kafka, Flume, etc.), then the single thread will be used to run the receiver, leaving no thread for processing the received data. Hence, when running locally, always use “local[n]” as the master URL, where n > number of receivers to run (see Spark Properties for information on how to set the master).
Extending the logic to running on a cluster, the number of cores allocated to the Spark Streaming application must be more than the number of receivers. Otherwise the system will receive data, but not be able to process it.
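The thread-count rule above can be checked with a small plain-Python sketch. `local_threads` and `can_process` are hypothetical helpers, not part of Spark. The parser only handles the `local`, `local[n]`, and `local[*]` forms, and it assumes `local[*]` means one thread per logical core on the machine.

```python
import os
import re


def local_threads(master):
    """Return the number of task threads implied by a local master URL."""
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\d+|\*)\]", master)
    if match is None:
        raise ValueError(f"unsupported master URL: {master!r}")
    spec = match.group(1)
    if spec == "*":
        # Assumption: local[*] uses as many threads as logical cores.
        return os.cpu_count() or 1
    return int(spec)


def can_process(master, num_receivers):
    """True if at least one thread is left over after every receiver has one."""
    return local_threads(master) > num_receivers


for url in ("local", "local[1]", "local[2]"):
    print(url, can_process(url, num_receivers=1))
```

With one receiver, `local` and `local[1]` leave no thread for processing, while `local[2]` leaves one. The same comparison applies on a cluster if the thread count is replaced by the number of cores allocated to the application.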
3.3 Basic Sources