Filebeat + Kafka + Logstash + ElasticSearch + Kibana log collection setup

Reference: https://www.cnblogs.com/willpan-z/p/10307967.html

Syncing MySQL slow logs to ES with Filebeat
1. Check whether Filebeat's mysql module is enabled:
filebeat modules list
Enable the mysql module:
filebeat modules enable mysql
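
After enabling the module, the slow-log location is set in the module file. A minimal sketch of /etc/filebeat/modules.d/mysql.yml, assuming typical log locations (the paths below are examples; adjust them to your server):

- module: mysql
  error:
    enabled: true
    # example error-log path
    var.paths: ["/var/log/mysql/error.log"]
  slowlog:
    enabled: true
    # example slow-log path
    var.paths: ["/var/log/mysql/mysql-slow.log"]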

2. In the config file /etc/filebeat/filebeat.yml, enable output to Elasticsearch and Kibana.

output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["127.0.0.1:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"
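
Note that Filebeat runs only one output at a time. For the Filebeat -> Kafka -> Logstash path in the title, the Elasticsearch output above would be commented out and a Kafka output used instead; a minimal sketch, with the broker address and topic name taken from the examples later in this post:

output.kafka:
  # Kafka broker(s) to publish to
  hosts: ["localhost:9092"]
  # must match the topic that Logstash consumes
  topic: "mysqlslowlogs"
  required_acks: 1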

 

Four ways to replace @timestamp in Filebeat

Reference: https://blog.csdn.net/zcy_wxy/article/details/116526053
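
One commonly used approach (a hedged sketch, not necessarily word-for-word one of the four in the reference) is Filebeat's timestamp processor, which overwrites @timestamp from a field in the event. The field name start_time and the layouts below are examples only:

processors:
  - timestamp:
      # example source field holding the event time
      field: start_time
      # Go-style reference layouts used to parse the field
      layouts:
        - '2006-01-02T15:04:05Z'
      # sample value checked against the layouts at startup
      test:
        - '2019-06-22T16:33:51Z'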


Problems:
1. Fixing "Exception in thread "main" joptsimple.UnrecognizedOptionException: zookeeper is not a recognized option"
Newer Kafka versions (2.2 and later) no longer accept the ZooKeeper connection string (--zookeeper localhost:2181); use the Kafka broker address instead: --bootstrap-server localhost:9092.

2. WARN [Consumer clientId=consumer-console-consumer-80772-1, groupId=console-consumer-80772] Bootstrap broker localhost:2181 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)

This happens when the PLAINTEXT listener in the broker config does not match the address in the request. For example, with listeners=PLAINTEXT://10.127.96.151:9092 configured, a test like ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic1 --from-beginning fails.
The correct command is ./kafka-console-consumer.sh --bootstrap-server 10.127.96.151:9092 --topic topic1 --from-beginning
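
The listener settings live in config/server.properties. A sketch using the address from the example above (advertised.listeners is commonly set to the same value so clients are handed a reachable address):

# config/server.properties
listeners=PLAINTEXT://10.127.96.151:9092
advertised.listeners=PLAINTEXT://10.127.96.151:9092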

 

3. ES queries return at most 10,000 results by default; raise the limit per index:

PUT 238-apache-accesslog-20220211/_settings
{
  "index": {
    "max_result_window": 1000000
  }
}
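
The same request can be issued outside Kibana Dev Tools with curl; a sketch assuming Elasticsearch listens on localhost:9200, as in the Filebeat config above:

curl -X PUT "http://localhost:9200/238-apache-accesslog-20220211/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index":{"max_result_window":1000000}}'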

ZooKeeper standalone installation and configuration

Step 1: Download and extract the package
>wget https://mirrors.cnnic.cn/apache/zookeeper/stable/apache-zookeeper-3.6.3-bin.tar.gz
>tar -zxvf apache-zookeeper-3.6.3-bin.tar.gz
>cd apache-zookeeper-3.6.3-bin
Step 2: Edit the configuration file
Copy the sample config file and rename it; the data directory (dataDir) inside can be set to any path you like, as in the sketch below.

>cp -rf conf/zoo_sample.cfg conf/zoo.cfg
>vim conf/zoo.cfg
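
A minimal zoo.cfg sketch for a single-node setup (the dataDir path is an example; choose your own):

tickTime=2000
# example data directory, replace with your own path
dataDir=/usr/local/soft/zookeeper-data
clientPort=2181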

Step 3: Start the service
>./bin/zkServer.sh start

>./bin/zkServer.sh status


Kafka commands
Start:
./bin/kafka-server-start.sh config/server.properties
Start in the background:
nohup bin/kafka-server-start.sh config/server.properties 1>/usr/local/soft/kafka_2.13-3.0.0/logs/kafka.log 2>&1 &

Stop Kafka:
./bin/kafka-server-stop.sh

Restart Kafka (if it is managed as a systemd service):
systemctl restart kafka

Create a topic:
./bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test

List topics:
./bin/kafka-topics.sh --list --bootstrap-server localhost:9092

Delete a topic:
./bin/kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic test


Describe a topic:
./bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test

 

Resetting Kafka consumer offsets

An offset reset is defined along three dimensions: topic scope, reset strategy, and execution plan.

Topic scope

--all-topics: adjust offsets for all partitions of all topics in the consumer group
--topic t1 --topic t2: adjust offsets for all partitions of the listed topics
--topic t1:0,1,2: adjust offsets for the listed partitions of a topic

Reset strategy

--to-earliest: move the offset to the partition's current earliest offset
--to-latest: move the offset to the partition's current latest offset
--to-current: move the offset to the partition's current committed offset
--to-offset <offset>: move the offset to the given value
--shift-by N: move the offset to the current offset + N; N may be negative, which shifts it backwards
--to-datetime <datetime>: move the offset to the earliest offset after the given time; the datetime format is yyyy-MM-ddTHH:mm:ss.xxx, e.g. 2017-08-04T00:00:00.000
--by-duration <duration>: move the offset to the one a given interval before the current time; the duration format is PnDTnHnMnS, e.g. PT0H5M0S
--from-file <file>: read the reset plan from a CSV file

Execution plan

No extra flag: only print the planned offsets, without applying them
--execute: actually apply the offset reset
--export: print the reset plan in CSV format so it can be saved to a file and applied later

Notes

The consumer group must be inactive, i.e. not currently consuming
Without an execution flag, the default is to only print the plan

Common examples

Reset to the group's earliest offsets
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test-group --reset-offsets --all-topics --to-earliest --execute

Reset to a specific offset
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test-group --reset-offsets --all-topics --to-offset 500000 --execute

Reset to the current offset (to recover from an offset exception)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test-group --reset-offsets --all-topics --to-current --execute

Shift the offset by a given amount
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test-group --reset-offsets --all-topics --shift-by -100000 --execute

Reset the offset to a given point in time (note: without --execute this only prints the plan)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test-group --reset-offsets --all-topics --to-datetime 2017-08-04T14:30:00.000
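
The --export and --from-file options listed above can be combined into a review-then-apply workflow; a sketch, where the CSV file name is an example:

# print the planned offsets and save them as CSV without applying anything
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test-group --reset-offsets --all-topics --to-datetime 2017-08-04T14:30:00.000 --export > reset-plan.csv

# after reviewing reset-plan.csv, apply it
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group test-group --reset-offsets --from-file reset-plan.csv --execute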

 

List all consumer groups

./bin/kafka-consumer-groups.sh --bootstrap-server  localhost:9092 --list

Describe a consumer group's consumption

./bin/kafka-consumer-groups.sh --describe --bootstrap-server localhost:9092 --group test

Check a topic's offsets (message counts)

 ./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic test --time -1 --broker-list localhost:9092
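
A note on the --time value: -1 prints the latest (log-end) offsets, -2 the earliest available offsets. For example:

# earliest available offset per partition of topic test
./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic test --time -2 --broker-list localhost:9092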

Start a console producer
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Start a console consumer
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning


View the messages a topic has received:
./bin/kafka-console-consumer.sh --bootstrap-server 10.50.198.21:9092 --topic a05 --from-beginning

 

 

Logstash: reading JSON-format logs from Kafka and writing them to ES

Reference: https://www.cnblogs.com/bixiaoyu/p/9638505.html

Grok pattern references: https://www.cnblogs.com/chenjw-note/articles/10929682.html, https://www.cnblogs.com/Orgliny/p/5592186.html

Sample log line:

https m.chinazz.cn - [12/Feb/2022:23:59:39 +0800] GET /hy/chinaaoxin/sell-232550.html HTTP/1.1 200 7418 - "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" 116.179.32.17 0.027

 

Grok pattern:


%{WORD:xforwardedproto} %{IPORHOST:host} (?:(-|%{IPORHOST:http_ali_cdn_real_ip})) \[%{HTTPDATE:timestamp}\] %{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion} %{NUMBER:status} %{NUMBER:response} (?:(%{URI:referer}|-)) "%{QS:agent}" %{IPV4:remote_ip} %{NUMBER:uptime}
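
Before wiring the pattern into the Kafka pipeline, it can be checked locally. A minimal sketch that reads a pasted log line from stdin and prints the parsed fields (it uses the %{DATA:useragent} variant of the pattern that the pipeline config below actually uses); save it to a temporary .conf file, run bin/logstash -f <file>, and paste the sample line above:

input { stdin { } }
filter {
  grok {
    match => [ "message", '%{WORD:xforwardedproto} %{IPORHOST:site} (?:(-|%{IPORHOST:http_ali_cdn_real_ip})) \[%{HTTPDATE:timestamp}\] %{WORD:method} %{URIPATHPARAM:urlpath} HTTP/%{NUMBER:httpversion} %{NUMBER:status} %{NUMBER:bytes} (?:(%{URI:referer}|-)) "%{DATA:useragent}" %{IPV4:clientip} %{NUMBER:duration}' ]
  }
}
output { stdout { codec => rubydebug } }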

 

input {
  kafka {
    bootstrap_servers => ["localhost:9092"]
    client_id => "test"
    group_id => "test"
    auto_offset_reset => "latest"
    consumer_threads => 5
    decorate_events => true
    topics => ["test"]
    codec => json {
      charset => "UTF-8"
    }
    type => "238-apache-accesslog"
  }
}


input {
  kafka {
    bootstrap_servers => ["localhost:9092"]
    client_id => "mysqlslowlogs"
    group_id => "mysqlslowlogs"
    auto_offset_reset => "latest"
    consumer_threads => 5
    decorate_events => true
    topics => ["mysqlslowlogs"]
    type => "238-mysqlslowlogs"
  }
}

input {
  kafka {
    bootstrap_servers => ["localhost:9092"]
    client_id => "accesslog"
    group_id => "accesslog"
    auto_offset_reset => "latest"
    consumer_threads => 5
    decorate_events => true
    topics => ["accesslog"]
    type => "accesslog"
  }
}

filter {
        if [message] == "" {
            drop {}
        }

        if ([type] == "238-apache-accesslog") {
           mutate {
              add_tag => ["238-apache-accesslog"]
              add_field => { "logtime" => "%{accesstime}" }
              remove_field => [ "agent","log","@version","ecs","host" ]
          }
          date {
              timezone => "Asia/Chongqing"
              match => ["logtime","yyyy/MM/dd HH:mm:ss SSS"]
              target => "@timestamp"
              remove_field => [ "logtime" ]
          }
        }

        if ([type] == "238-mysqlslowlogs") {
          json {
            source => "message"
          }
          grok {
               match => [ "message", "^#\s+User@Host:\s+%{USER:user}\[[^\]]+\]\s+@\s+(?:(?<clienthost>\S*) )?\[(?:%{IP:clientip})?\]\s+Id:\s+%{NUMBER:id}\n# Query_time: %{NUMBER:query_time}\s+Lock_time: %{NUMBER:lock_time}\s+Rows_sent: %{NUMBER:rows_sent}\s+Rows_examined: %{NUMBER:rows_examined}\nuse\s(?<dbname>\w+);\nSET\s+timestamp=%{NUMBER:timestamp_mysql};\n(?<query>[\s\S]*)" ]
               match => [ "message", "^#\s+User@Host:\s+%{USER:user}\[[^\]]+\]\s+@\s+(?:(?<clienthost>\S*) )?\[(?:%{IP:clientip})?\]\s+Id:\s+%{NUMBER:id}\n# Query_time: %{NUMBER:query_time}\s+Lock_time: %{NUMBER:lock_time}\s+Rows_sent: %{NUMBER:rows_sent}\s+Rows_examined: %{NUMBER:rows_examined}\nSET\s+timestamp=%{NUMBER:timestamp_mysql};\n(?<query>[\s\S]*)" ]
               match => [ "message", "^#\s+User@Host:\s+%{USER:user}\[[^\]]+\]\s+@\s+(?:(?<clienthost>\S*) )?\[(?:%{IP:clientip})?\]\n# Query_time: %{NUMBER:query_time}\s+Lock_time: %{NUMBER:lock_time}\s+Rows_sent: %{NUMBER:rows_sent}\s+Rows_examined: %{NUMBER:rows_examined}\nuse\s(?<dbname>\w+);\nSET\s+timestamp=%{NUMBER:timestamp_mysql};\n(?<query>[\s\S]*)" ]
               match => [ "message", "^#\s+User@Host:\s+%{USER:user}\[[^\]]+\]\s+@\s+(?:(?<clienthost>\S*) )?\[(?:%{IP:clientip})?\]\n# Query_time: %{NUMBER:query_time}\s+Lock_time: %{NUMBER:lock_time}\s+Rows_sent: %{NUMBER:rows_sent}\s+Rows_examined: %{NUMBER:rows_examined}\nSET\s+timestamp=%{NUMBER:timestamp_mysql};\n(?<query>[\s\S]*)" ]
          }
          date {
              match => ["timestamp_mysql","UNIX"]
              target => "@timestamp"
          }
          mutate {
                add_tag => ["238-mysqlslowlogs"]
                remove_field => [ "agent","log","@version","ecs","host","message" ]
          }
        }
        if ([type] == "accesslog") {
          json {
            source => "message"
          }
          grok {
               #match => [ "message", '%{WORD:xforwardedproto} %{IPORHOST:host} (?:(-|%{IPORHOST:http_ali_cdn_real_ip})) \[%{HTTPDATE:timestamp}\] %{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion} %{NUMBER:status} %{NUMBER:response} (?:(%{URI:referer}|-)) "%{QS:agent}" %{IPV4:remote_ip} %{NUMBER:uptime}' ]
               match => [ "message", '%{WORD:xforwardedproto} %{IPORHOST:site} (?:(-|%{IPORHOST:http_ali_cdn_real_ip})) \[%{HTTPDATE:timestamp}\] %{WORD:method} %{URIPATHPARAM:urlpath} HTTP/%{NUMBER:httpversion} %{NUMBER:status} %{NUMBER:bytes} (?:(%{URI:referer}|-)) "%{DATA:useragent}" %{IPV4:clientip} %{NUMBER:duration}' ]
          }
          date {
               match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
               target => "@timestamp"
          }
          mutate {
               add_tag => ["accesslog"]
               remove_field => [ "agent","log","@version","ecs","host","timestamp","message" ]
          }
        }
}

output {
    if "238-apache-accesslog" in [tags] {
        elasticsearch {
            hosts => ["10.168.6.89:9800"]                 # Elasticsearch address and port
            index => "238-apache-accesslog-%{+YYYYMM}11"  # Elasticsearch index name for these documents
        }
    }
    if "238-mysqlslowlogs" in [tags] {
        elasticsearch {
            hosts => ["10.168.6.89:9800"]     # Elasticsearch address and port
            index => "238-mysqlslowlogs"      # Elasticsearch index name for these documents
        }
    }
    if "accesslog" in [tags] {
        elasticsearch {
            hosts => ["10.168.6.89:9800"]     # Elasticsearch address and port
            index => "accesslog"              # Elasticsearch index name for these documents
        }
    }
}
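
With the inputs, filters, and outputs in place, the pipeline can be syntax-checked and then started; a sketch where the config file path is an example:

# validate the pipeline configuration, then run it in the foreground
bin/logstash -f /etc/logstash/conf.d/kafka-to-es.conf --config.test_and_exit
bin/logstash -f /etc/logstash/conf.d/kafka-to-es.conf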

  

Set the ES index pagination limit

PUT accesslog/_settings
{
  "index":{
    "max_result_window":1000000
  }
}

  
