ELK Log Collection

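Current pain points with logs

Ops staff constantly have to log in to servers to fetch logs for developers and testers.
Logs are only examined after something breaks, so problems are never anticipated from the logs.
For clustered services, logs have to be gathered from several machines.
Logs written by developers follow no standard: log directories are inconsistent and log types are not clearly separated (system logs, error logs, access logs, runtime logs, device logs, debug logs).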

ELK can address the pain points above.
For logs to be useful, they have to go through four stages:

Collection
Storage
Search and visualization
Log analysis, enabling early failure warning and business growth

Elasticsearch, Logstash and Kibana together cover the first three stages:

es: storage and search
logstash: collection
kibana: visualization

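Both es and logstash run on the JVM (es is written in Java; logstash is implemented in JRuby), so the runtime environment needs a JDK installed (OpenJDK; reportedly Android is also moving to OpenJDK and away from the Sun JDK to make the system lighter).
Installing and configuring es
The best practice for installing es is to use yum (you can also install from the tarball: download the tar package, extract it and run it, which makes version upgrades easy).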
https://www.elastic.co/guide/en/elasticsearch/reference/current/rpm.html

1.Download and install the public signing key:

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

2.Create a file called elasticsearch.repo in the /etc/yum.repos.d/ directory for RedHat based distributions
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

3.And your repository is ready for use. You can now install Elasticsearch with one of the following
sudo yum install elasticsearch

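Configuration:
es does not need much configuration: the cluster name (very important), the node name (very important), whether to lock memory, the data path, the log path, the IP address to listen on, and the port to listen on.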

grep "^[a-z]" /etc/elasticsearch/elasticsearch.yml

cluster.name: oldgirl
node.name: linux-node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 0.0.0.0
http.port: 9200

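Here bootstrap.memory_lock: true locks the process memory. With it enabled the service fails to start with an error, because /etc/security/limits.conf does not yet grant permission to lock memory. Add the entries suggested by the log messages (when es is started by systemd, LimitMEMLOCK=infinity may also need to be set in the service unit):
[2018-07-01T14:15:44,143][WARN ][o.e.b.JNANatives ] Increase RLIMIT_MEMLOCK, soft limit: 65536, hard limit: 65536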
[2018-07-01T14:15:44,144][WARN ][o.e.b.JNANatives ] These can be adjusted by modifying /etc/security/limits.conf, for example:
# allow user 'elasticsearch' mlockall
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
[2018-07-01T14:15:44,144][WARN ][o.e.b.JNANatives

A single-node es is now installed. Test it by visiting http://IP:9200
{
"name" : "linux-node-1",
"cluster_name" : "oldgirl",
"cluster_uuid" : "5hmMNxc5QxG6q-2t2VNqrg",
"version" : {
"number" : "6.3.0",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "424e937",
"build_date" : "2018-06-11T23:38:03.357887Z",
"build_snapshot" : false,
"lucene_version" : "7.3.1",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
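Seeing the result above means es has been set up successfully. The next step is to store data in es.
How do we interact with es? There are two main ways:
the Java API and the RESTful API.

We use the RESTful API and exchange JSON data with es.
For example, run this in a shell: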
curl -H Content-Type:application/json -i -X GET 'http://127.0.0.1:9200/_count?pretty' -d '
{
"query": {
"match_all": {}
}
}'
Response:
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 114

{
"count" : 0,
"_shards" : {
"total" : 0,
"successful" : 0,
"skipped" : 0,
"failed" : 0
}
}
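-X GET sets the request method.
-i shows the response headers.
-H Content-Type:application/json is required here; it tells the server to parse the request body as JSON. Without it the following error is returned: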
HTTP/1.1 406 Not Acceptable
content-type: application/json; charset=UTF-8
content-length: 109

{
"error" : "Content-Type header [application/x-www-form-urlencoded] is not supported",
"status" : 406
}
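
The same request can also be issued from a script. The following is a minimal Python sketch, not part of the original setup: it uses only the standard library, assumes es is reachable at http://127.0.0.1:9200, and the helper names build_count_request and run_count are made up for illustration.

```python
import json
import urllib.request


def build_count_request(base_url="http://127.0.0.1:9200"):
    """Build the same _count request as the curl example above."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_count?pretty",
        data=body,
        # Without this header es 6.x answers 406 Not Acceptable.
        headers={"Content-Type": "application/json"},
        method="GET",
    )


def run_count(base_url="http://127.0.0.1:9200"):
    """Send the request and return the decoded JSON answer (needs a running node)."""
    with urllib.request.urlopen(build_count_request(base_url)) as resp:
        return json.loads(resp.read())
```

Calling run_count() against a live node should return the same count and _shards structure that curl printed above.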

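Calling the es RESTful API with curl from the shell works, but it is not convenient. es offers many plugins, so we will use one that provides a web interface for talking to the RESTful API.

The officially recommended plugin is no longer supported in elasticsearch 6.x, so we use the open-source elasticsearch-head. GitHub: https://github.com/mobz/elasticsearch-head
Installation: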
Running with built in server
git clone https://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start
open http://localhost:9100/

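Then edit the elasticsearch configuration file
vim /etc/elasticsearch/elasticsearch.yml
and append the following two lines at the end (restart elasticsearch afterwards so they take effect)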
http.cors.enabled: true
http.cors.allow-origin: "*"

Then open
http://localhost:9100/
and connect to http://localhost:9200
We can now interact with the elasticsearch RESTful API through the web UI.

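Next, build an elasticsearch cluster.
Installation is the same on every node; just set the same cluster.name in each configuration file.
After startup, es nodes announce which cluster they belong to through multicast or unicast discovery. Note that multicast discovery is not available in 6.x, so use unicast. Unicast is configured like this:

discovery.zen.ping.unicast.hosts: ["host1", "host2"]   # IP addresses are preferable here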

You do not need to list every node here; one or two are enough, because the nodes pass the membership information along to each other.

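There are two ways to check whether a node has joined the cluster. One is the Overview tab in elasticsearch-head.
The other is the elasticsearch log, whose file name is the cluster name.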

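There is also the monitoring plugin bigdesk, which unfortunately has not been supported since 2.0, and the kopf plugin, which is also unsupported in 3.0. In short, es is moving towards a platform, so this is for learning only; in production, prefer the platform products, which cut operations costs considerably.
These three are the commonly used plugins, and two of them can no longer be used.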

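With the es cluster installed and configured and the basic usage and concepts covered, we move on to logstash. There is a lot more to es, but for ops the most important job is collecting logs, so the rest focuses on logstash.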

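Installing logstash
Does logstash have to be installed on every server? Not necessarily: if logs are received over the network, it does not. If it collects log files from disk, then it does.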
https://www.elastic.co/guide/en/logstash/current/installing-logstash.html

YUM
Download and install the public signing key:

rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

Add the following in your /etc/yum.repos.d/ directory in a file with a .repo suffix, for example logstash.repo
vim /etc/yum.repos.d/logstash.repo
[logstash-6.x]
name=Elastic repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

And your repository is ready for use. You can install it with:

sudo yum install logstash

logstash is developed in JRuby, so startup is somewhat slow.
/usr/share/logstash/bin/logstash -e 'input { stdin{} } output { stdout{} }'

-e runs the configuration string given on the command line

One input and one output.
stdin{} and stdout{} are two plugins.
Startup takes about a minute.
[root@node2 elasticsearch]# /usr/share/logstash/bin/logstash -e 'input { stdin{} } output { stdout{} }'
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2018-07-01 15:03:59.682 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2018-07-01 15:04:00.629 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.3.0"}
[INFO ] 2018-07-01 15:04:03.885 [Converge PipelineAction::Create<main>] pipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
The stdin plugin is now waiting for input:
[INFO ] 2018-07-01 15:04:04.098 [Converge PipelineAction::Create<main>] pipeline - Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x1b16cf42 run>"}
[INFO ] 2018-07-01 15:04:04.225 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2018-07-01 15:04:04.547 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
hello world
{
"@version" => "1",
"@timestamp" => 2018-07-01T07:04:13.785Z,
"message" => "hello world",
"host" => "node2.shared"
}
hehehe
{
"@version" => "1",
"@timestamp" => 2018-07-01T07:04:20.411Z,
"message" => "hehehe",
"host" => "node2.shared"
}

The above is a standard input/output example.
/usr/share/logstash/bin/logstash -e 'input { stdin{} } output { stdout{ codec => rubydebug } }'
...
hello
{
"message" => "hello",
"@version" => "1",
"@timestamp" => 2018-07-01T07:08:02.456Z,
"host" => "node2.shared"
}

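Each piece of data entering logstash is called an event, not a line. Several lines may make up one event; an error report, for example, is rarely a single line.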

Writing to es
Keep standard input and change the output:

/usr/share/logstash/bin/logstash -e 'input { stdin{} } output { elasticsearch { hosts => ["10.211.55.8:9200"] } }'

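Official documentation: https://www.elastic.co/guide/en/logstash/current/index.html

Writing to es is that simple.
Can we output to es and the console at the same time? Yes. This is not load balancing; every event goes to both. One input can have several outputs.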

/usr/share/logstash/bin/logstash -e 'input { stdin{} } output { elasticsearch { hosts => ["10.211.55.8:9200"] } stdout { codec => rubydebug } }'
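What is this good for? In production you can write to es and to a text file at the same time. Keeping the text is the best option, for three reasons: 1. it is the simplest; 2. it can be reprocessed later; 3. it has the highest compression ratio. So what is the best format for keeping logs? Plain text.

Next we learn to write logstash configuration files. Typing everything on the command line does not scale; a configuration file is more convenient.

The simplest configuration file: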
vim /etc/logstash/conf.d/logstash-simple.conf
input { stdin { } }
output {
elasticsearch { hosts => ["10.211.55.8:9200"] }
stdout { codec => rubydebug }
}

Then start it:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/logstash-simple.conf

We mainly study the logstash configuration syntax.

# This is a comment. You should use comments to describe
# parts of your configuration.

input {
...
}

filter {
...
}

output {
...
}

input {} and output {} are required; filter {} is optional.

input {
file {
path => "/var/log/messages"
type => "syslog"
}

file {
path => "/var/log/apache/access.log"
type => "apache"
}
}

Case 1
The most common input is a file.

vim /etc/logstash/conf.d/file.conf
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}
}

output {
stdout { codec => rubydebug }
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-%{+YYYY.MM.dd}"
}
}
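
The index => "system-%{+YYYY.MM.dd}" setting creates one index per day, with the date taken from the event's @timestamp, which logstash keeps in UTC. As a rough illustration only (this is a Python sketch, not logstash code, and daily_index is a made-up name), the naming rule behaves like this:

```python
from datetime import datetime, timedelta, timezone


def daily_index(prefix, event_time):
    """Return the index name that "<prefix>-%{+YYYY.MM.dd}" would produce."""
    # The date part is evaluated in UTC, not in the server's local time zone.
    utc = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc:%Y.%m.%d}"


# An event logged at 03:00 on 2 July in UTC+8 still falls on 1 July in UTC.
cst = timezone(timedelta(hours=8))
print(daily_index("system", datetime(2018, 7, 2, 3, 0, tzinfo=cst)))
```

This is why, on a server in UTC+8, events from the early morning end up in the previous day's index.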

Next we collect Java logs in addition to the system log.
Case 2
vim /etc/logstash/conf.d/file.conf
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}

file {
    path => "/var/log/elasticsearch/oldgirl.log"
    type => "es-error"
    start_position => "beginning"
}

}

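This way the type field can be used in if conditions.
In the 6.x documentation, type is not listed among the file plugin's own options (it is one of the common options shared by all inputs), but it works; the name is fixed and cannot be changed.
Note that we have not yet split the message into fields. If the parsed event already contains a type field, the type set in the file input does not override it, so conditions based on it will fail.
You could also run several logstash processes on one server, one per service log, but that costs extra CPU and memory.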
Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>6}
This message appears at startup. It tells us that the type set in the file input is not the _type shown in the es data browser.

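Viewing the logs in elasticsearch now reveals a problem: one error should be one event and is best displayed as one, but reading from the file line by line splits it into several. That is inconvenient. How do we gather it into a single event? This is where a codec comes in.

Case 3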
input {
stdin {
codec => multiline {
pattern => "pattern, a regexp"
negate => "true" or "false"
what => "previous" or "next"
}
}
}

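Explanation of the three parameters above:
pattern: a regular expression that decides when lines are merged
negate: when "true", lines that do NOT match the pattern are the ones merged
what: whether a merged line is attached to the "previous" or the "next" event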
input {
stdin {
codec => multiline {
pattern => "^\["
negate => "true"
what => "previous"
}
}
}
output {
stdout {
codec => rubydebug
}
}
A line starting with [ begins a new event; a line that does not start with [ is merged into the previous event.
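
To make the merge rule concrete, here is a small Python sketch that imitates what the codec does with negate set to "true" and what set to "previous". It is an illustration only, not the codec's actual implementation; merge_multiline and the sample lines are invented.

```python
import re


def merge_multiline(lines, pattern=r"^\["):
    """Group lines into events: a matching line starts an event, others join the previous one."""
    starts_event = re.compile(pattern)
    events = []
    for line in lines:
        if starts_event.search(line) or not events:
            events.append(line)          # matching line: start a new event
        else:
            events[-1] += "\n" + line    # non-matching line: append to the previous event
    return events


sample = [
    "[12:00:01] ERROR something failed",
    "Exception in worker",
    "    at step 3",
    "[12:00:02] INFO recovered",
]
for event in merge_multiline(sample):
    print(repr(event))
```

The four input lines become two events: the error line with its two continuation lines, and the final INFO line on its own.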
vim /etc/logstash/conf.d/all.conf
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}

file {
    path => "/var/log/elasticsearch/oldgirl.log"
    type => "es-error"
    start_position => "beginning"
        codec => multiline {
             pattern => "^\["
             negate => "true"
             what => "previous"
           }
}

}

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/all.conf

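Browsing the data in elasticsearch-head is not convenient, so next we bring in kibana.
kibana is the visualization platform for elasticsearch.
https://www.elastic.co/guide/en/kibana/current/index.html
kibana was first written in PHP, then rewritten in Ruby, then JRuby, and is now built on Node.js.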

wget https://artifacts.elastic.co/downloads/kibana/kibana-6.3.0-linux-x86_64.tar.gz
shasum -a 512 kibana-6.3.0-linux-x86_64.tar.gz
tar -xzf kibana-6.3.0-linux-x86_64.tar.gz
mv kibana-6.3.0-linux-x86_64/ /usr/local/
cd /usr/local/
ln -s kibana-6.3.0-linux-x86_64/ kibana

Edit the kibana configuration file
cd /usr/local/kibana/config
vim kibana.yml
Change four settings:
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.url: "http://10.211.55.8:9200"
kibana.index: ".kibana"
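kibana.index deserves attention. kibana has no database of its own, but its data has to live somewhere; since it is tied so closely to es, it simply uses es and asks it to create a .kibana index to hold kibana's data.
Once configured, start kibana directly.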

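We have collected the system log and a Java log (the es runtime log). Next we collect nginx logs.
es has the concept of fields. As an analogy, an index is like a database instance, _type is like a table in that database, and a field is like a column: the content of message is turned into key:value pairs.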

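By configuring nginx.conf, nginx can write its log in a uniform JSON format. The json codec in logstash parses each line into key:value fields before passing it to es, which makes later searches in kibana more efficient.
To configure JSON logging in nginx, see the log module documentation on nginx.org:
http://nginx.org/en/docs/http/ngx_http_log_module.html
where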
Syntax: log_format name [escape=default|json|none] string ...;
Default: log_format combined "...";
Context: http

We only need to add the following to the http block in nginx:
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
log_format json '{"@timestamp":"$time_iso8601",'
'"@version":"1",'
'"url":"$uri",'
'"status":"$status",'
'"domain":"$host",'
'"host":"$server_addr",'
'"size":$body_bytes_sent,'
'"responsetime":$request_time,'
'"referer": "$http_referer",'
'"ua": "$http_user_agent"'
'}';
access_log /var/log/nginx/access_json.log json;

access_log /var/log/nginx/access.log main;

Start nginx, make some requests to generate log entries, and confirm they are in JSON format.
Now write a json.conf file:
vim /etc/logstash/conf.d/json.conf
input {
file {
path => "/var/log/nginx/access_json.log"
codec => json
}
}

output {
stdout {
codec => rubydebug
}
}

The result:
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/json.conf
[INFO ] 2018-07-01 22:22:36.797 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2018-07-01 22:22:37.539 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
{
"domain" => "10.211.55.8",
"@version" => "1",
"host" => "10.211.55.8",
"responsetime" => 0.0,
"@timestamp" => 2018-07-01T14:23:24.000Z,
"size" => 0,
"status" => "304",
"path" => "/var/log/nginx/access_json.log",
"ua" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
"url" => "/index.html",
"referer" => "-"
}
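
The output above shows each key of the nginx log line becoming its own field. A quick way to check that a log line really is valid JSON before pointing logstash at it is sketched below in Python. The sample line is hand-written to follow the log_format json definition above, and parse_access_line is a hypothetical helper name.

```python
import json

# Hand-written sample following the log_format json definition above.
SAMPLE = ('{"@timestamp":"2018-07-01T22:23:24+08:00","@version":"1",'
          '"url":"/index.html","status":"304","domain":"10.211.55.8",'
          '"host":"10.211.55.8","size":0,"responsetime":0.000,'
          '"referer": "-","ua": "Mozilla/5.0"}')


def parse_access_line(line):
    """Return the fields of one access_json.log line, or None if it is not valid JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


fields = parse_access_line(SAMPLE)
# Quoted values stay strings, unquoted ones become numbers.
print(type(fields["status"]).__name__, type(fields["size"]).__name__)
```

Note that status is quoted in the log_format and therefore arrives as a string, while size and responsetime are numbers. A line can also fail to parse if a value such as the user agent contains characters that nginx does not escape as JSON; the escape=json parameter shown in the syntax above is meant for that case.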

Now we can add it to all.conf:
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}
file {
path => "/var/log/nginx/access_json.log"
type => "nginx-log"
start_position => "beginning"
codec => json
}

file {
    path => "/var/log/elasticsearch/oldgirl.log"
    type => "es-error"
    start_position => "beginning"
        codec => multiline {
             pattern => "^\["
             negate => "true"
             what => "previous"
           }
}

}

output {
if [type] == "system" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-%{+YYYY.MM.dd}"
}
}
if [type] == "es-error" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "es-error-%{+YYYY.MM.dd}"
}
}
if [type] == "nginx-log" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "nginx-log-%{+YYYY.MM.dd}"
}
}
}
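
The output section routes each event by its type field. Because there is no else branch, an event whose type matches none of the conditions is not written anywhere. The following Python sketch only mirrors that logic for illustration; target_index and KNOWN_TYPES are invented names and events are modelled as plain dictionaries.

```python
from datetime import datetime, timezone

# The three types tested by the if blocks in the output section.
KNOWN_TYPES = ("system", "es-error", "nginx-log")


def target_index(event):
    """Return the index an event would be written to, or None if no branch matches."""
    if event.get("type") in KNOWN_TYPES:
        day = event["@timestamp"].astimezone(timezone.utc)
        return f'{event["type"]}-{day:%Y.%m.%d}'
    return None  # no condition matched, so no output receives the event


stamp = datetime(2018, 7, 1, 14, 23, 24, tzinfo=timezone.utc)
print(target_index({"type": "nginx-log", "@timestamp": stamp}))
print(target_index({"type": "unknown", "@timestamp": stamp}))
```

If unmatched events should not be lost, one option is a final catch-all output outside the conditionals.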
The new indices now show up in elasticsearch-head.
Add the new index patterns in kibana, and you can start querying.

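Collecting the messages log
We collected the messages log earlier, but with the file plugin.
System logs are produced by syslog, and syslog can write logs to a remote host.
So logstash can listen on a port and syslog can send logs straight to it.
Ideally every service in production logs through syslog; then logstash does not have to be installed on every machine to pick up logs, and a single logstash listening port is enough.
nginx can also write to syslog: native support exists since nginx 1.7.1, and before that it required the Taobao fork (Tengine) or nginx with Lua.

The syslog plugin is in the list of input plugins: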

https://www.elastic.co/guide/en/logstash/current/plugins-inputs-syslog.html

vim /etc/logstash/conf.d/syslog.conf
input {
syslog {
type => "system-syslog"
host => "10.211.55.8"
port => "514"
}

}

output {
stdout {
codec => "rubydebug"
}
}
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/syslog.conf

After startup, confirm that port 514 is listening.

Next, edit the system's rsyslog.conf:

vim /etc/rsyslog.conf
Find the line

#*.* @@remote-host:514

Remove the # and change it to:
*.* @@10.211.55.8:514
Then restart the rsyslog service:
systemctl restart rsyslog
Right after the restart the logs start showing up:
{
"pid" => "20915",
"severity" => 5,
"logsource" => "node2",
"facility_label" => "security/authorization",
"timestamp" => "Jul 2 20:56:43",
"type" => "system-syslog",
"program" => "polkitd",
"@timestamp" => 2018-07-02T12:56:43.000Z,
"facility" => 10,
"host" => "10.211.55.8",
"@version" => "1",
"message" => "Unregistered Authentication Agent for unix-process:1927:9050003 (system bus name :1.1149, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale zh_CN.UTF-8) (disconnected from bus)\n",
"priority" => 85,
"severity_label" => "Notice"
}
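
The priority, facility and severity fields in this event are related: in the syslog protocol the priority value is facility * 8 + severity. A tiny Python sketch (split_priority is a made-up name) shows the arithmetic for the event above:

```python
def split_priority(priority):
    """Split a syslog priority value into (facility, severity)."""
    # priority = facility * 8 + severity, so integer division and remainder recover both.
    return divmod(priority, 8)


print(split_priority(85))  # (10, 5)
```

Here 85 = 10 * 8 + 5, which matches facility 10 (security/authorization) and severity 5 (Notice) in the output.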

We can then move the syslog.conf configuration into all.conf:
input {
file {
path => "/var/log/messages"
type => "system"
start_position => "beginning"
}
file {
path => "/var/log/nginx/access_json.log"
type => "nginx-log"
start_position => "beginning"
codec => json
}

file {
    path => "/var/log/elasticsearch/oldgirl.log"
    type => "es-error"
    start_position => "beginning"
        codec => multiline {
             pattern => "^\["
             negate => "true"
             what => "previous"
           }
}
syslog {
	type => "system-syslog"
	host => "10.211.55.8"
	port => "514"
}

}

output {
if [type] == "system" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-%{+YYYY.MM.dd}"
}
}
if [type] == "es-error" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "es-error-%{+YYYY.MM.dd}"
}
}
if [type] == "nginx-log" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "nginx-log-%{+YYYY.MM.dd}"
}
}
if [type] == "system-syslog" {
elasticsearch {
hosts => ["10.211.55.8:9200"]
index => "system-syslog-%{+YYYY.MM.dd}"
}
}
}

After starting logstash, run
logger "hallo 1"
logger "hallo 1"
logger "hallo 1"
logger "hallo 1"
logger "hallo 1"
logger "hallo 1"
to test.

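The configuration above can serve as a production template.
Another common logstash plugin is the tcp plugin.
The syslog input handles syslog traffic; if an application does not want to write its logs to a file, logstash can listen on a TCP port directly,
and the application writes its logs straight to that port.
It looks like this:
vim /etc/logstash/conf.d/tcp.conf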
input {
tcp {
host => "10.211.55.8"
port => "6666"
}
}

output {
stdout {
codec => "rubydebug"
}
}

Start it: /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/tcp.conf

Then test with nc:

nc 10.211.55.8 6666 < /etc/resolv.conf

{
"host" => "node2.shared",
"message" => "# Generated by NetworkManager",
"@timestamp" => 2018-07-02T13:20:27.921Z,
"port" => 44257,
"@version" => "1"
}
{
"host" => "node2.shared",
"message" => "search localdomain shared",
"@timestamp" => 2018-07-02T13:20:27.943Z,
"port" => 44257,
"@version" => "1"
}
{
"host" => "node2.shared",
"message" => "nameserver 10.211.55.1",
"@timestamp" => 2018-07-02T13:20:27.944Z,
"port" => 44257,
"@version" => "1"
}

echo "hehe" | nc 10.211.55.8 6666

{
"host" => "node2.shared",
"message" => "hehe",
"@timestamp" => 2018-07-02T13:21:39.242Z,
"port" => 44259,
"@version" => "1"
}

echo "oldgirl" > /dev/tcp/10.211.55.8/6666

{
"host" => "node2.shared",
"message" => "oldgirl",
"@timestamp" => 2018-07-02T13:23:23.936Z,
"port" => 44260,
"@version" => "1"
}
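
An application can write to the tcp input in the same way as the nc and /dev/tcp tests above. The following Python sketch uses only the standard socket module; send_line is a hypothetical helper, and the host and port in the usage note are the ones from tcp.conf.

```python
import socket


def send_line(host, port, message):
    """Send one newline-terminated line, much like `echo message | nc host port`."""
    with socket.create_connection((host, port), timeout=5) as conn:
        # Each line sent to the port becomes one event, as in the nc tests above.
        conn.sendall(message.encode("utf-8") + b"\n")
```

With logstash running the tcp.conf shown earlier, send_line("10.211.55.8", 6666, "hehe") should produce an event like the ones above.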


posted @ 2018-07-02 21:30 zhming
