Flume Operation Examples
1. Case 1: Spool
Spool (the Spooling Directory source) watches the configured directory for newly added files and reads the data out of them. Two points to note:
- A file copied into the spool directory must not be opened or edited afterwards (a safe copy pattern is sketched right after this list).
- The spool directory must not contain subdirectories.
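Because a file may not change once it sits in the spool directory, one common pattern (a minimal sketch, not part of the original steps; the staging directory is an assumption) is to write the file in a staging directory on the same filesystem and then move it in; a same-filesystem mv is atomic, so Flume never sees a half-written file:
# The staging directory below is hypothetical; adjust paths to your installation.
$ mkdir -p /opt/apache-flume-1.6.0-bin/staging
$ echo "hello world" > /opt/apache-flume-1.6.0-bin/staging/spool.log
$ mv /opt/apache-flume-1.6.0-bin/staging/spool.log /opt/apache-flume-1.6.0-bin/logs/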
Configuration file jobs/spool.conf
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/apache-flume-1.6.0-bin/logs
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
Start command
bin/flume-ng agent \
-c conf \
-f jobs/spool.conf \
-n a1 \
-Dflume.root.logger=INFO,console
Test
$ echo "hello world" > logs/spool.log $ more logs/spool.log.COMPLETED hello world
2. Case 2: Exec
The Exec source runs a given command and uses its output as the source of events.
Configuration file jobs/exec.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/apache-flume-1.6.0-bin/logs/log_exec_tail
a1.sources.r1.channels = c1
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
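The start command is not shown for this case; with the configuration saved as jobs/exec.conf as above, the agent can be started the same way as in case 1:
bin/flume-ng agent \
-c conf \
-f jobs/exec.conf \
-n a1 \
-Dflume.root.logger=INFO,console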
Test
for i in {1..1000}
do
echo "exec tail$i" >> /opt/apache-flume-1.6.0-bin/logs/log_exec_tail
done
3. Case 3: JSONHandler
Receives data posted by remote clients over HTTP; the HTTP source's default JSONHandler parses the POSTed JSON array into events.
Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
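Neither a configuration file name nor a start command is given here; assuming the configuration is saved as jobs/http.conf (a hypothetical name), the agent can be started as before:
bin/flume-ng agent -c conf -f jobs/http.conf -n a1 -Dflume.root.logger=INFO,console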
Test
curl -X POST \
-d '[{
"headers" :{"a" : "a1","b" : "b1"},
"body" : "shiyanlou.org_body"
}]' http://localhost:8888
4. Case 4: Syslogtcp
Receives data through the Syslog TCP source and writes it to HDFS.
Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 4444
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/user/hadoop/syslogtcp
a1.sinks.k1.hdfs.filePrefix = Syslog
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.channel = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
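As before, assuming the configuration is saved under a hypothetical name such as jobs/syslogtcp.conf, the agent is started with:
bin/flume-ng agent -c conf -f jobs/syslogtcp.conf -n a1 -Dflume.root.logger=INFO,console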
Test
$ echo "hello flume" | nc localhost 4444
Check the output
$ hadoop fs -lsr /user/hadoop
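To look inside one of the generated files, hadoop fs -cat can be run on a path from the listing above (the file name below is hypothetical); note that with the HDFS sink's default hdfs.fileType the files are SequenceFiles, so the event bodies are wrapped in binary SequenceFile framing rather than stored as plain text:
$ hadoop fs -cat /user/hadoop/syslogtcp/Syslog.1407644620424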
5. Case 5: File Roll Sink
The File Roll sink writes the received events to files in a local directory, rolling over to a new file at a fixed interval.
Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5556
a1.sources.r1.host = localhost
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /opt/apache-flume-1.6.0-bin/logs
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
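Assuming this configuration is saved under a hypothetical name such as jobs/file_roll.conf, the agent is started the same way:
bin/flume-ng agent -c conf -f jobs/file_roll.conf -n a1 -Dflume.root.logger=INFO,console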
Test
$ echo "Hello world!"|nc localhost 5556
Check whether files have been generated under /opt/apache-flume-1.6.0-bin/logs; by default a new file is rolled every 30 seconds.
$ ls -alh /opt/apache-flume-1.6.0-bin/logs/