
Collecting data into HDFS with Flume

2015-03-03 23:39
Environment: Flume 1.5, Hadoop 2.2

1. Configure JAVA_HOME and HADOOP_HOME

Note: HADOOP_HOME is used to locate the jars and configuration files Flume needs to write to HDFS. If it is not set, you can instead copy the required jars and configuration files into Flume's directories by hand.
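One way to set both variables is in Flume's conf/flume-env.sh, which the flume-ng script sources at startup. A minimal sketch; the install paths below are assumptions and should be adjusted to your machine:

```shell
# conf/flume-env.sh
# Hypothetical install locations -- adjust to your environment
export JAVA_HOME=/usr/java/jdk1.7.0
export HADOOP_HOME=/master/env/hadoop-2.2.0
```

With HADOOP_HOME set, flume-ng picks up the Hadoop client jars and the cluster configuration (core-site.xml, hdfs-site.xml) automatically.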

2. Unpack Flume and run the flume-ng script in its bin directory

flume-ng agent -f /master/env/fc/a4.conf -n a4 -c /master/env/flume/conf -Dflume.root.logger=INFO,console


Explanation of the command:

agent                    run a Flume agent

global options:

--conf,-c <conf>         use configs in <conf> directory (path to Flume's configuration directory)

-Dproperty=value         sets a Java system property value (here, the log level and console appender)

agent options:

--conf-file,-f <file>    specify a config file (required) (the agent's startup configuration file)

--name,-n <name>         the name of this agent (must match the agent name used in a4.conf, here a4)

The contents of a4.conf:

# Name the agent's source, channel, and sink
a4.sources = r1
a4.channels = c1
a4.sinks = k1

# Define the source: watch a spooling directory for new files
a4.sources.r1.type = spooldir
a4.sources.r1.spoolDir = /home/hadoop/logs

# Define the channel: an in-memory buffer
a4.channels.c1.type = memory
a4.channels.c1.capacity = 10000
a4.channels.c1.transactionCapacity = 100

# Define an interceptor that adds a timestamp header to each event
a4.sources.r1.interceptors = i1
a4.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# Define the sink: write events to HDFS
a4.sinks.k1.type = hdfs
a4.sinks.k1.hdfs.path = hdfs://ns1/flume/%Y%m%d
a4.sinks.k1.hdfs.filePrefix = events-
a4.sinks.k1.hdfs.fileType = DataStream
# Do not roll files based on event count
a4.sinks.k1.hdfs.rollCount = 0
# Roll the HDFS file once it reaches 128 MB
a4.sinks.k1.hdfs.rollSize = 134217728
# Roll the HDFS file after 60 seconds
a4.sinks.k1.hdfs.rollInterval = 60

# Wire the source and sink to the channel
a4.sources.r1.channels = c1
a4.sinks.k1.channel = c1
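Two numbers in this config are worth unpacking: rollSize is given in bytes (128 MB = 134217728), and the %Y%m%d in hdfs.path is expanded from the epoch-millisecond "timestamp" header that the interceptor stamps onto each event. A small sketch of both calculations; the event timestamp below is a hypothetical value chosen for illustration:

```python
import time

# hdfs.rollSize is specified in bytes: 128 MB
roll_size = 128 * 1024 * 1024
print(roll_size)  # 134217728

# The TimestampInterceptor adds a "timestamp" header holding epoch
# milliseconds; the HDFS sink formats %Y%m%d from that value.
ts_millis = 1425397140000  # hypothetical event timestamp (2015-03-03 UTC)
date_dir = time.strftime("%Y%m%d", time.gmtime(ts_millis / 1000))
print(date_dir)  # 20150303
```

Without the timestamp header (i.e. if the interceptor were removed), the sink cannot resolve %Y%m%d and fails, unless hdfs.useLocalTimeStamp is enabled instead.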


For details, see the official documentation: http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
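To try the pipeline end to end, the spooling directory must exist before the agent starts, and files should be moved in atomically rather than written in place (the source rejects files that change after it starts reading them). A usage sketch, assuming the agent and the HDFS cluster are already running:

```shell
# Create the spooling directory watched by a4.conf
mkdir -p /home/hadoop/logs

# Write the file elsewhere first, then mv it in (atomic hand-off)
echo "hello flume" > /tmp/app.log
mv /tmp/app.log /home/hadoop/logs/app.log

# After ingestion the source renames it to app.log.COMPLETED.
# Verify the output on HDFS (the date directory comes from %Y%m%d):
hadoop fs -ls /flume/$(date +%Y%m%d)
```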