Loading Log-Directory Data into HDFS with Flume
2017-11-21 16:13
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple, scalable architecture based on streaming data flows, and it is robust and fault-tolerant, with tunable reliability mechanisms plus failover and recovery mechanisms.
Create the agent configuration file:
vi corp_base_info.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir=/home/flume/testdata/test
a1.sources.r1.includePattern=^AUEIC.C_CONS([0-9a-zA-Z]|[._-])*$
a1.sources.r1.ignorePattern=^.*COMPLETED$
a1.sources.r1.inputCharset=UTF-8
# poll the directory every 5 minutes (300000 ms); comments in a properties
# file must start at the beginning of a line, not trail the value
a1.sources.r1.pollDelay=300000
(The highlighted source properties above are only available in Flume 1.7 and later.)
# Use a channel which buffers events in memory
# (a1.channels and a1.sinks are already declared above; re-declaring them here is redundant)
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000000
a1.channels.c1.transactionCapacity=1000000
# Describe the sink
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://mynameservice/apps/hive/warehouse/flume.db/corp_base_info/ymd=%Y%m%d
# required so the %Y%m%d escapes in the path resolve without a timestamp interceptor
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
# roll by size only (~10 MB); disable count-based and time-based rolling
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollInterval=0
a1.sinks.k1.hdfs.rollSize=10240000
# close files that have been idle for 60 s
a1.sinks.k1.hdfs.idleTimeout=60
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
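The effect of includePattern and ignorePattern can be checked against sample filenames before deploying; a minimal sketch with grep, using made-up filenames for illustration (only the first should survive both filters, since *.COMPLETED files are excluded):

```shell
# Hypothetical filenames; the patterns are copied from the config above.
printf '%s\n' 'AUEIC.C_CONS_20171121.txt' 'AUEIC.C_CONS_20171121.txt.COMPLETED' 'other.log' \
  | grep -E '^AUEIC.C_CONS([0-9a-zA-Z]|[._-])*$' \
  | grep -Ev '^.*COMPLETED$'
# prints: AUEIC.C_CONS_20171121.txt
```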
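With the configuration saved, the agent can be started with the standard flume-ng launcher. This is a sketch: FLUME_HOME and the conf-file path are assumptions about the local install, and the agent name must match the a1 prefix used in the property keys.

```shell
# --name must equal the agent name (a1) used in the configuration keys
$FLUME_HOME/bin/flume-ng agent \
  --conf $FLUME_HOME/conf \
  --conf-file corp_base_info.conf \
  --name a1 \
  -Dflume.root.logger=INFO,console
```

Once the agent is running, files dropped into /home/flume/testdata/test that match includePattern are picked up on each poll, written under the Hive partition path, and renamed with the .COMPLETED suffix in the spool directory.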