Spark Streaming Study Notes 4 (2020-02-15) -- Spark Streaming Real-Time Stream Processing Project in Practice
12-8 Generating a batch of data every minute with a scheduled job
1. Scheduling with crontab (an online crontab expression tester helps verify the schedule)
crontab -e
*/1 * * * * /home/hadoop/data/project/log_generator.sh
To disable the job, comment the line out with #.
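The crontab entry assumes log_generator.sh simply wraps the Python log generator from the earlier notes. A minimal sketch of what the script might contain (the file name generate_log.py is an assumption; substitute the actual generator script):
#!/bin/bash
# Run the Python log generator once per cron invocation (i.e. every minute).
# NOTE: generate_log.py is an assumed name for the generator from the earlier notes.
python /home/hadoop/data/project/generate_log.py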
2. Feeding the logs from the Python log generator into Flume
Create a config file named streaming_project.conf.
Component selection (access.log ==> console output):
exec source
memory channel
logger sink
Full streaming_project.conf configuration:
exec-memory-logger.sources = exec-source
exec-memory-logger.sinks = logger-sink
exec-memory-logger.channels = memory-channel
exec-memory-logger.sources.exec-source.type = exec
exec-memory-logger.sources.exec-source.command = tail -F /home/hadoop/data/project/logs/access.log
exec-memory-logger.sources.exec-source.shell = /bin/sh -c
exec-memory-logger.channels.memory-channel.type = memory
exec-memory-logger.sinks.logger-sink.type = logger
exec-memory-logger.sources.exec-source.channels = memory-channel
exec-memory-logger.sinks.logger-sink.channel = memory-channel
Startup command:
flume-ng agent --name exec-memory-logger --conf $FLUME_HOME/conf --conf-file /home/hadoop/data/project/streaming_project.conf -Dflume.root.logger=INFO,console
3. Logs ==> Kafka
(1) Start ZooKeeper:
Enter the directory:
cd /home/hadoop/app/zookeeper-3.4.5-cdh5.7.0/bin
Start command:
./zkServer.sh start
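To confirm ZooKeeper is actually up before starting Kafka, a quick check:
./zkServer.sh status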
(2) Start the Kafka server:
Enter the directory: cd /home/hadoop/app/kafka_2.11-0.9.0.0/bin/
Start command: ./kafka-server-start.sh -daemon /home/hadoop/app/kafka_2.11-0.9.0.0/config/server.properties
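The sink below writes to the topic streamingtopic, which should exist before the pipeline runs. A typical creation command for this Kafka 0.9 setup (replication factor 1 and a single partition are assumptions for a one-broker test environment):
./kafka-topics.sh --create --zookeeper hadoop000:2181 --replication-factor 1 --partitions 1 --topic streamingtopic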
Modify the Flume configuration so the sink writes to Kafka instead of the console; save it as streaming_project2.conf:
exec-memory-kafka.sources = exec-source
exec-memory-kafka.sinks = kafka-sink
exec-memory-kafka.channels = memory-channel
exec-memory-kafka.sources.exec-source.type = exec
exec-memory-kafka.sources.exec-source.command = tail -F /home/hadoop/data/project/logs/access.log
exec-memory-kafka.sources.exec-source.shell = /bin/sh -c
exec-memory-kafka.channels.memory-channel.type = memory
exec-memory-kafka.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
exec-memory-kafka.sinks.kafka-sink.brokerList = hadoop000:9092
exec-memory-kafka.sinks.kafka-sink.topic = streamingtopic
exec-memory-kafka.sinks.kafka-sink.batchSize = 5
exec-memory-kafka.sinks.kafka-sink.requiredAcks = 1
exec-memory-kafka.sources.exec-source.channels = memory-channel
exec-memory-kafka.sinks.kafka-sink.channel = memory-channel
(3) Start a Kafka console consumer to watch the topic:
kafka-console-consumer.sh --zookeeper hadoop000:2181 --topic streamingtopic
(4) Start Flume:
flume-ng agent --name exec-memory-kafka --conf $FLUME_HOME/conf --conf-file /home/hadoop/data/project/streaming_project2.conf -Dflume.root.logger=INFO,console
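With the agent and the console consumer both running, an end-to-end check is to append a line to the tailed log and watch it arrive in the consumer (the test text is arbitrary, since the exec source simply tails the file):
echo "test-$(date +%s)" >> /home/hadoop/data/project/logs/access.log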