
Importing custom-format nginx logs into Elasticsearch with Heka

Posted on 2016-01-20 14:14 by 蝈蝈俊

Resetting Heka's progress

Heka keeps its progress files in the directory set by the base_dir option. Deleting everything inside that directory completely resets Heka's progress.

By default, base_dir is /var/cache/hekad (or c:\var\cache\hekad on Windows).

Reference: http://hekad.readthedocs.org/en/latest/getting_started.html#global-configuration

Deleting Elasticsearch data

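After we change the import strategy, the data has to be recomputed, so the previously imported data must be cleared first. Several commonly used ES plugins can delete data, and they are easy to use.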

For example, see the screenshot below:

[Screenshot]

The tool shown in the screenshot is this one:

https://mobz.github.io/elasticsearch-head/ . By default it is deployed at http://ip:9200/_plugin/head/

I also recommend http://www.elastichq.org/ (git repository: https://github.com/royrusso/elasticsearch-HQ). By default it is deployed at http://ip:9200/_plugin/hq/

Parsing and reading the nginx logs

Because our nginx logs use a custom format, we use the most flexible decoder, PayloadRegexDecoder, and define a regular expression to extract the data.

Reference: http://hekad.readthedocs.org/en/latest/config/decoders/payload_regex.html

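Since Heka is written in Go, its regular expressions follow Go's regexp syntax (documented in the regexp/syntax package). For quick experiments with Go regular expressions you can use https://regoio.herokuapp.com/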

For more complex expressions, use RegexBuddy (http://www.regexbuddy.com/download.html).
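You can also check a pattern directly with Go's regexp package before putting it into match_regex. The sketch below uses a hypothetical pattern that we wrote for the log line format visible in the Payload values captured later in this post; it is not the author's actual expression. Named groups become field names, and the group named Timestamp is the one PayloadRegexDecoder uses for the message time. There is no trailing $ anchor because the payloads end with a newline, and in Go $ does not match before a final newline.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical pattern for lines of the form
//   $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
//   "$http_referer" "$http_user_agent" $http_x_forwarded_for $request_time
var nginxLine = regexp.MustCompile(
	`^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<Timestamp>[^\]]+)\] ` +
		`"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) ` +
		`"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" ` +
		`(?P<http_x_forwarded_for>\S+) (?P<request_time>\S+)`)

// parseLine returns the named captures, or nil if the line does not match.
func parseLine(line string) map[string]string {
	m := nginxLine.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	fields := make(map[string]string)
	for i, name := range nginxLine.SubexpNames() {
		if name != "" {
			fields[name] = m[i]
		}
	}
	return fields
}

func main() {
	line := `10.159.191.213 - - [06/Jan/2016:09:31:51 +0800] "POST /simcard/uploadSimcardStatus HTTP/1.0" 200 61 "-" "Apache-HttpClient/4.5 (Java/1.7.0_67)" 122.97.213.5 0.166` + "\n"
	for k, v := range parseLine(line) {
		fmt.Printf("%s = %q\n", k, v)
	}
}
```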

Timestamp

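By default, Timestamp is the current time. To take it from the log line instead, the corresponding capture group in the regular expression must be named Timestamp.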

Two further parameters define how the extracted timestamp is interpreted.

timestamp_layout

Defines the string format of the extracted time. Note that this uses Go's time layout format.

A formatting string instructing hekad how to turn a time string into the actual time representation used internally. Example timestamp layouts can be seen in Go’s time documentation. In addition to the Go time formatting, special timestamp_layout values of “Epoch”, “EpochMilli”, “EpochMicro”, and “EpochNano” are supported for Unix style timestamps represented in seconds, milliseconds, microseconds, and nanoseconds since the Epoch, respectively.

Some predefined layouts from Go's time package:

        ANSIC       = "Mon Jan _2 15:04:05 2006"
        UnixDate    = "Mon Jan _2 15:04:05 MST 2006"
        RubyDate    = "Mon Jan 02 15:04:05 -0700 2006"
        RFC822      = "02 Jan 06 15:04 MST"
        RFC822Z     = "02 Jan 06 15:04 -0700" // RFC822 with numeric zone
        RFC850      = "Monday, 02-Jan-06 15:04:05 MST"
        RFC1123     = "Mon, 02 Jan 2006 15:04:05 MST"
        RFC1123Z    = "Mon, 02 Jan 2006 15:04:05 -0700" // RFC1123 with numeric zone
        RFC3339     = "2006-01-02T15:04:05Z07:00"
        RFC3339Nano = "2006-01-02T15:04:05.999999999Z07:00"
        Kitchen     = "3:04PM"
        // Handy time stamps.
        Stamp      = "Jan _2 15:04:05"
        StampMilli = "Jan _2 15:04:05.000"
        StampMicro = "Jan _2 15:04:05.000000"
        StampNano  = "Jan _2 15:04:05.000000000"
Reference: https://golang.org/pkg/time/#pkg-constants
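None of the predefined layouts matches nginx's $time_local (e.g. 06/Jan/2016:09:31:51 +0800), so a custom layout is needed. The Go sketch below checks a candidate layout with time.Parse and also models, in simplified integer-only form, the special "Epoch*" values quoted above. parseTimestamp and nginxLayout are our own names; this is not Heka's code.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Candidate timestamp_layout for nginx's $time_local (our assumption).
const nginxLayout = "02/Jan/2006:15:04:05 -0700"

// parseTimestamp treats the special Epoch* names as integer Unix times and
// anything else as a Go layout string.
func parseTimestamp(layout, value string) (time.Time, error) {
	var unit time.Duration
	switch layout {
	case "Epoch":
		unit = time.Second
	case "EpochMilli":
		unit = time.Millisecond
	case "EpochMicro":
		unit = time.Microsecond
	case "EpochNano":
		unit = time.Nanosecond
	default:
		return time.Parse(layout, value)
	}
	n, err := strconv.ParseInt(value, 10, 64)
	if err != nil {
		return time.Time{}, err
	}
	return time.Unix(0, n*int64(unit)), nil
}

func main() {
	t, err := parseTimestamp(nginxLayout, "06/Jan/2016:09:31:51 +0800")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.UTC().Format("2006-01-02T15:04:05")) // 2016-01-06T01:31:51
}
```

The printed UTC value is the same as the Timestamp field of the first document in the captured bulk request shown later.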

timestamp_location

The time zone. This setting only takes effect when timestamp_layout contains no time zone information.

Time zone in which the timestamps in the text are presumed to be in. Should be a location name corresponding to a file in the IANA Time Zone database (e.g. “America/Los_Angeles”), as parsed by Go’s time.LoadLocation() function (see http://golang.org/pkg/time/#LoadLocation). Defaults to “UTC”. Not required if valid time zone info is embedded in every parsed timestamp, since those can be parsed as specified in the timestamp_layout. This setting will have no impact if one of the supported “Epoch*” values is used as the timestamp_layout setting.

An example configuration:

[SphinxRequestDecoder]
type = "PayloadRegexDecoder"
match_regex = '.+ (?P<Hostname>\S+) sphinx: (?P<Timestamp>.+) \[(?P<Uuid>.+)\] REQUEST: path=(?P<Path>\S+) remoteaddr=(?P<Remoteaddr>\S+) (?P<Headers>.+)'
timestamp_layout = "2006/01/02 15:04:05"

Reference: https://github.com/mozilla-services/heka/wiki/How-to-convert-a-PayloadRegex-MultiDecoder-to-a-SandboxDecoder-using-an-LPeg-Grammar


Importing the data into Elasticsearch

To export data to Elasticsearch we use ElasticSearchOutput. This output only defines the properties of the Elasticsearch connection; how messages are mapped when exported is defined by one of three encoders: ElasticSearch JSON Encoder, ElasticSearch Logstash V0 Encoder, or ElasticSearch Payload Encoder.

Differences between the three encoders

The three are compared below:

ElasticSearch JSON Encoder
    Plugin Name: ESJsonEncoder
    This encoder serializes a Heka message into a clean JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing.
    The JSON serialization is done by hand, without the use of Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.

ElasticSearch Logstash V0 Encoder
    Plugin Name: ESLogstashV0Encoder
    This encoder serializes a Heka message into a JSON format, preceded by a separate JSON structure containing information required for ElasticSearch BulkAPI indexing.
    The message JSON structure uses the original (i.e. “v0”) schema popularized by Logstash. Using this schema can aid integration with existing Logstash deployments. This schema also plays nicely with the default Logstash dashboard provided by Kibana.
    The JSON serialization is done by hand, without using Go’s stdlib JSON marshalling. This is so serialization can succeed even if the message contains invalid UTF-8 characters, which will be encoded as U+FFFD.
    In short: closely emulates Logstash.

ElasticSearch Payload Encoder
    Plugin Name: SandboxEncoder
    File Name: lua_encoders/es_payload.lua
    Prepends ElasticSearch BulkAPI index JSON to a message payload.
    In short: a Lua plugin.

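Taking ESJsonEncoder as an example: we want the timestamp we extracted ourselves to be used, not the time at which the message was generated, so the following option must be set to true.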

es_index_from_timestamp (bool):

When generating the index name use the timestamp from the message instead of the current time. Defaults to false.


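Note: I have not yet found anywhere else that this timestamp setting is used. I had assumed it determined the time of the data imported into ES, but it does not.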

Some ElasticSearchOutput settings

ElasticSearchOutput has the following two parameters, which determine how often requests are sent to the server.

flush_interval (int):
Interval at which accumulated messages should be bulk indexed into ElasticSearch, in milliseconds. Defaults to 1000 (i.e. one second).

flush_count (int):
Number of messages that, if processed, will trigger them to be bulk indexed into ElasticSearch. Defaults to 10.

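Both parameters are in effect at the same time: a bulk request is sent to Elasticsearch as soon as flush_count messages have accumulated in the queue, or when flush_interval milliseconds have elapsed and there are pending messages.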

Requests are sent to http://10.30.0.32:9200/_bulk. Here is one randomly captured request:


POST http://10.30.0.32:9200/_bulk HTTP/1.1
Host: 10.30.0.32:9200
User-Agent: Go 1.1 package http
Content-Length: 9374
Accept: application/json
Accept-Encoding: gzip

{"index":{"_index":"nginx-2016.01.06","_type":"nginx"}}
{"Uuid":"12b6e9b3-d593-4cf4-b473-761ae7e982b0","Timestamp":"2016-01-06T01:31:51","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.213 - - [06/Jan/2016:09:31:51 +0800] \u0022POST /simcard/uploadSimcardStatus HTTP/1.0\u0022 200 61 \u0022-\u0022 \u0022Apache-HttpClient/4.5 (Java/1.7.0_67)\u0022 122.97.213.5 0.166\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","responseCode":"<responseCode>","status":"200","http_referer":"-","request_time":"0.166","http_user_agent":"Apache-HttpClient/4.5 (Java/1.7.0_67)","upstream_response_time":"","remote_addr":"10.159.191.213","request":"POST /simcard/uploadSimcardStatus HTTP/1.0","hostname":"-","timestamp":"06/Jan/2016:09:31:51 +0800","http_x_forwarded_for":"122.97.213.5","remote_user":"-","body_bytes_sent":"61"}
{"index":{"_index":"nginx-2016.01.05","_type":"nginx"}}
{"Uuid":"6ff51dd8-ba9c-4440-b567-3de391cdac2b","Timestamp":"2016-01-05T07:36:45","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.90 - - [05/Jan/2016:15:36:45 +0800] \u0022POST /soa/mfderchant/list HTTP/1.0\u0022 200 926 \u0022-\u0022 \u0022Java/1.7.0_71\u0022 123.56.134.28 0.012\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_user_agent":"Java/1.7.0_71","timestamp":"05/Jan/2016:15:36:45 +0800","remote_addr":"10.159.191.90","request":"POST /soa/merttchant/list HTTP/1.0","upstream_response_time":"","remote_user":"-","body_bytes_sent":"926","responseCode":"<responseCode>","http_referer":"-","http_x_forwarded_for":"123.56.134.28","hostname":"-","status":"200","request_time":"0.012"}
{"index":{"_index":"nginx-2015.12.17","_type":"nginx"}}
{"Uuid":"58eb317c-2729-4037-a82e-d475e68324fd","Timestamp":"2015-12-17T14:03:26","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [17/Dec/2015:22:03:26 +0800] \u0022GET /creepers/creepers/pubddlic/images/cardCoupon/cardCoupon1.png HTTP/1.0\u0022 404 296 \u0022http://ewr.wangpos.com/creepersplatfofrm/index.xhtml\u0022 \u0022Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36\u0022 61.51.252.82 0.004\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","request":"GET /creepders/crefepers/public/images/cardCoupon/cardCoupon1.png HTTP/1.0","responseCode":"<responseCode>","http_referer":"http://rre.wangpos.com/creepersplatform/index.xhtml","upstream_response_time":"","http_x_forwarded_for":"61.51.252.82","timestamp":"17/Dec/2015:22:03:26 +0800","body_bytes_sent":"296","remote_user":"-","http_user_agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36","status":"404","request_time":"0.004","hostname":"-","remote_addr":"10.171.20.136"}
{"index":{"_index":"nginx-2015.12.14","_type":"nginx"}}
{"Uuid":"969f2737-0a21-4c27-908a-29a22f1a1475","Timestamp":"2015-12-14T10:01:02","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [14/Dec/2015:18:01:02 +0800] \u0022POST /wxcaddrddeal/cashAccess/sendCard HTTP/1.0\u0022 200 48 \u0022-\u0022 \u0022Java/1.7.0_71\u0022 123.56.134.28 0.016\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_user_agent":"Java/1.7.0_71","hostname":"-","status":"200","body_bytes_sent":"48","http_x_forwarded_for":"123.56.134.28","upstream_response_time":"","request":"POST /wxcarddeal/cashAccess/sendCard HTTP/1.0","remote_addr":"10.171.20.136","remote_user":"-","http_referer":"-","responseCode":"<responseCode>","timestamp":"14/Dec/2015:18:01:02 +0800","request_time":"0.016"}
{"index":{"_index":"nginx-2016.01.08","_type":"nginx"}}
{"Uuid":"80ff4701-85ad-4ecc-816c-833dbaded8df","Timestamp":"2016-01-08T07:27:11","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [08/Jan/2016:15:27:11 +0800] \u0022GET /uploadify/jquery.uploadify-3.1.min.js HTTP/1.0\u0022 304 0 \u0022http://www.wadngpos.com/batchCheck2Code?posMerId=1823cf1eba79411a9d32a3cb8dd3b821\u0022 \u0022Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36\u0022 61.51.252.82 0.004\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","http_x_forwarded_for":"61.51.252.82","remote_user":"-","upstream_response_time":"","timestamp":"08/Jan/2016:15:27:11 +0800","status":"304","hostname":"-","responseCode":"<responseCode>","http_referer":"http://65.wangpos.com/batchCheckCode?posMerId=1823cf1eba79411a9d32a3cb8dd3b821","request":"GET /uplfoadify/jquery.uploadify-3.1.min.js HTTP/1.0","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36","body_bytes_sent":"0","remote_addr":"10.171.20.136","request_time":"0.004"}
{"index":{"_index":"nginx-2015.12.10","_type":"nginx"}}
{"Uuid":"9c09fb0a-3fee-475c-bfad-04efd3a2f44e","Timestamp":"2015-12-10T11:32:26","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [10/Dec/2015:19:32:26 +0800] \u0022POST /usfer/getSpuerUserByQulificationId HTTP/1.0\u0022 200 182 \u0022-\u0022 \u0022Java/1.7.0_71\u0022 123.56.134.28 0.022\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_addr":"10.171.20.136","timestamp":"10/Dec/2015:19:32:26 +0800","responseCode":"<responseCode>","http_referer":"-","upstream_response_time":"","request":"POST /user/getSpuerUserByQulificationId HTTP/1.0","http_user_agent":"Java/1.7.0_71","body_bytes_sent":"182","status":"200","hostname":"-","http_x_forwarded_for":"123.56.134.28","request_time":"0.022","remote_user":"-"}
{"index":{"_index":"nginx-2015.12.17","_type":"nginx"}}
{"Uuid":"d2c08886-cdd1-4dbb-b508-7bdec4d27460","Timestamp":"2015-12-17T07:20:29","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [17/Dec/2015:15:20:29 +0800] \u0022GET /weipossoa/ HTTP/1.0\u0022 200 3460 \u0022-\u0022 \u0022curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2\u0022 10.173.16.251 0.003\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_addr":"10.171.20.136","responseCode":"<responseCode>","status":"200","remote_user":"-","timestamp":"17/Dec/2015:15:20:29 +0800","http_referer":"-","request_time":"0.003","http_x_forwarded_for":"10.173.16.251","hostname":"-","http_user_agent":"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2","upstream_response_time":"","request":"GET /weipossoa/ HTTP/1.0","body_bytes_sent":"3460"}
{"index":{"_index":"nginx-2015.12.28","_type":"nginx"}}
{"Uuid":"344bec04-268c-455d-94af-e44f72e50104","Timestamp":"2015-12-28T09:00:34","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.68 - - [28/Dec/2015:17:00:34 +0800] \u0022GET /weipossoa/accessToken/check?providerAppCode=100028&accessToken=5680c5d301070742efba15ba HTTP/1.0\u0022 200 60 \u0022-\u0022 \u0022Java/1.8.0_65\u0022 61.51.252.82 0.003\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","remote_user":"-","responseCode":"<responseCode>","hostname":"-","status":"200","http_referer":"-","timestamp":"28/Dec/2015:17:00:34 +0800","request_time":"0.003","http_user_agent":"Java/1.8.0_65","request":"GET /weipossoa/accessToken/check?providerAppCode=100028&accessToken=5680c5d301070742efba15ba HTTP/1.0","upstream_response_time":"","remote_addr":"10.159.191.68","body_bytes_sent":"60","http_x_forwarded_for":"61.51.252.82"}
{"index":{"_index":"nginx-2016.01.08","_type":"nginx"}}
{"Uuid":"0034bae1-6d16-486c-94fa-113d3cc15c42","Timestamp":"2016-01-08T22:20:25","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.171.20.136 - - [09/Jan/2016:06:20:25 +0800] \u0022GET /wxcard/jsp/common.jsp HTTP/1.0\u0022 200 1407 \u0022-\u0022 \u0022curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2\u0022 123.57.53.143 0.005\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","request":"GET /wxcard/jsp/common.jsp HTTP/1.0","upstream_response_time":"","timestamp":"09/Jan/2016:06:20:25 +0800","responseCode":"<responseCode>","status":"200","http_referer":"-","request_time":"0.005","remote_addr":"10.171.20.136","http_x_forwarded_for":"123.57.53.143","http_user_agent":"curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.16.2.3 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2","hostname":"-","remote_user":"-","body_bytes_sent":"1407"}
{"index":{"_index":"nginx-2016.01.02","_type":"nginx"}}
{"Uuid":"7775c7fd-d7bb-4a80-89fa-03fda682ca62","Timestamp":"2016-01-02T09:19:55","Type":"nginx","Logger":"nginx-access","Severity":7,"Payload":"10.159.191.97 - - [02/Jan/2016:17:19:55 +0800] \u0022POST /PosBusiness/pos/biz/service HTTP/1.0\u0022 200 117 \u0022-\u0022 \u0022Apache-HttpClient/4.1.3 (java 1.5)\u0022 10.173.53.128 0.017\u000a","EnvVersion":"","Pid":0,"Hostname":"localhost.localdomain","body_bytes_sent":"117","status":"200","request":"POST /PosBusiness/pos/biz/service HTTP/1.0","timestamp":"02/Jan/2016:17:19:55 +0800","http_referer":"-","remote_user":"-","responseCode":"<responseCode>","upstream_response_time":"","http_x_forwarded_for":"10.173.53.128","hostname":"-","http_user_agent":"Apache-HttpClient/4.1.3 (java 1.5)","remote_addr":"10.159.191.97","request_time":"0.017"}

Here 10 messages had accumulated (the flush_count default), so one request was sent.