
Collecting data into HDFS with Flume

2015-03-03 23:39
Environment: Flume 1.5, Hadoop 2.2

1. Configure JAVA_HOME and HADOOP_HOME

Note: HADOOP_HOME is used to locate the jars and configuration files Flume needs to write to HDFS. If it is not set, you can instead copy the required jars and configuration files into Flume's directories by hand.
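One way to set both variables is in Flume's conf/flume-env.sh, which the flume-ng script sources at startup. A minimal sketch; the install paths below are assumptions and should be adjusted to your machine:

```shell
# conf/flume-env.sh
# Hypothetical install locations -- adjust to your environment
export JAVA_HOME=/usr/java/jdk1.7.0
export HADOOP_HOME=/master/env/hadoop-2.2.0
```

With HADOOP_HOME set, flume-ng picks up the Hadoop client jars and the cluster configuration (core-site.xml, hdfs-site.xml) automatically.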

2. Unpack Flume and run the flume-ng script in its bin directory

flume-ng agent -f /master/env/fc/a4.conf -n a4 -c /master/env/flume/conf -Dflume.root.logger=INFO,console


Explanation of the command:

agent                    run a Flume agent

global options:

--conf,-c <conf>         use configs in <conf> directory (path to Flume's configuration directory)

-Dproperty=value         sets a Java system property value (here, the log level and console appender)

agent options:

--conf-file,-f <file>    specify a config file (required) (the agent's startup configuration file)

--name,-n <name>         the name of this agent (must match the agent name used in a4.conf, here a4)

The contents of a4.conf:

# Name the agent's source, channel, and sink
a4.sources = r1
a4.channels = c1
a4.sinks = k1

# Define the source: watch a spooling directory for new files
a4.sources.r1.type = spooldir
a4.sources.r1.spoolDir = /home/hadoop/logs

# Define the channel: an in-memory buffer
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000
a4.channels.c1.transactionCapacity = 100

# Define an interceptor that adds a timestamp header to each event
a4.sources.r1.interceptors = i1
a4.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# Define the sink: write events to HDFS
a4.sinks.k1.type = hdfs
a4.sinks.k1.hdfs.path = hdfs://ns1/flume/%Y%m%d
a4.sinks.k1.hdfs.filePrefix = events-
a4.sinks.k1.hdfs.fileType = DataStream
# Do not roll files based on event count
a4.sinks.k1.hdfs.rollCount = 0
# Roll the HDFS file once it reaches 128 MB
a4.sinks.k1.hdfs.rollSize = 134217728
# Roll the HDFS file after 60 seconds
a4.sinks.k1.hdfs.rollInterval = 60

# Wire the source and sink to the channel
a4.sources.r1.channels = c1
a4.sinks.k1.channel = c1
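Two numbers in this config are worth unpacking: rollSize is given in bytes (128 MB = 134217728), and the %Y%m%d in hdfs.path is expanded from the epoch-millisecond "timestamp" header that the interceptor stamps onto each event. A small sketch of both calculations; the event timestamp below is a hypothetical value chosen for illustration:

```python
import time

# hdfs.rollSize is specified in bytes: 128 MB
roll_size = 128 * 1024 * 1024
print(roll_size)  # 134217728

# The TimestampInterceptor adds a "timestamp" header holding epoch
# milliseconds; the HDFS sink formats %Y%m%d from that value.
ts_millis = 1425397140000  # hypothetical event timestamp (2015-03-03 UTC)
date_dir = time.strftime("%Y%m%d", time.gmtime(ts_millis / 1000))
print(date_dir)  # 20150303
```

Without the timestamp header (i.e. if the interceptor were removed), the sink cannot resolve %Y%m%d and fails, unless hdfs.useLocalTimeStamp is enabled instead.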


For details, see the official documentation: http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
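To try the pipeline end to end, the spooling directory must exist before the agent starts, and files should be moved in atomically rather than written in place (the source rejects files that change after it starts reading them). A usage sketch, assuming the agent and the HDFS cluster are already running:

```shell
# Create the spooling directory watched by a4.conf
mkdir -p /home/hadoop/logs

# Write the file elsewhere first, then mv it in (atomic hand-off)
echo "hello flume" > /tmp/app.log
mv /tmp/app.log /home/hadoop/logs/app.log

# After ingestion the source renames it to app.log.COMPLETED.
# Verify the output on HDFS (the date directory comes from %Y%m%d):
hadoop fs -ls /flume/$(date +%Y%m%d)
```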