Elasticsearch Notes
A quick comparison between Elasticsearch and relational databases
| RDBMS | Elasticsearch | 
|---|---|
| Table | Index(Type) | 
| Row | Document |
| Column | Field |
| Schema | Mapping | 
| SQL | DSL | 
## Index information
GET kibana_sample_data_ecommerce
## Total number of documents
GET kibana_sample_data_ecommerce/_count
## _cat indices API
## Wildcard match on the index name
GET /_cat/indices/kibana_*
## Sort by document count
GET /_cat/indices?v&s=docs.count:desc
## View basic information about the index
GET /_cat/indices/kibana_sample_data_ecommerce?v
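The default cluster name is elasticsearch.
Shards come in two kinds: Primary Shard and Replica Shard.
The number of primary shards is set when the index is created and cannot be changed afterwards, except by reindexing.
The number of replica shards can be adjusted dynamically.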
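One way to see why the primary shard count is fixed: a document is routed to a shard by hashing its routing value (by default the document id) modulo the number of primary shards. The sketch below is only an illustration of that idea. `shard_for` is a hypothetical helper, and `zlib.crc32` is a stand-in hash, since Elasticsearch uses its own hash function.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(routing) % num_primary_shards.

    zlib.crc32 is a stand-in hash for this sketch, not the hash Elasticsearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1000)]

# Documents whose target shard would change if the primary count went from 5 to 6.
moved = [d for d in doc_ids if shard_for(d, 5) != shard_for(d, 6)]
print(f"{len(moved)} of {len(doc_ids)} documents would be looked up on a different shard")
```

Because most documents would map to a different shard under the new count, existing data has to be rewritten, which is what a reindex does. Replicas are copies of a primary, so changing their number does not affect routing.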
## Cluster health
GET _cluster/health
GET _cat/nodes?v
GET _cat/shards?v
index                        shard prirep state   docs   store ip         node
.apm-agent-configuration     0     p      STARTED    0    208b 172.18.0.2 12b52a46e43f
.kibana_1                    0     p      STARTED   94 967.7kb 172.18.0.2 12b52a46e43f
kibana_sample_data_ecommerce 0     p      STARTED 4675   4.5mb 172.18.0.2 12b52a46e43f
.apm-custom-link             0     p      STARTED    0    208b 172.18.0.2 12b52a46e43f
.kibana_task_manager_1       0     p      STARTED    5  55.2kb 172.18.0.2 12b52a46e43f
Basic CRUD
## Auto-generated id
POST my_index/_doc/
{
  "user":"xiaoting",
  "comment":"you know for search"
}
## User-specified id; each repeated PUT increments _version
PUT my_index/_doc/2
{
  "user":"xiaoting",
  "comment":"you know for search"
}
## Read
GET my_index/_doc/2
## Search
GET my_index/_search
{
  "query":{
    "match_all":{}
  }
}
## Add a field to an existing document with _update; with PUT, every field must be supplied again, otherwise the omitted fields are lost
POST my_index/_update/2
{
  "doc":{
    "post_date":"2020-05-21"
  }
}
## Delete
DELETE my_index/_doc/2
## Batch read
GET _mget
{
  "docs": [
    {
      "_index": "my_index",
      "_id": 1
    },
    {
      "_index": "my_index",
      "_id": 2
    }
  ]
}
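Inverted Index
Forward index: like a book's table of contents (document -> terms)
Inverted index: like the index pages at the back of a book (term -> documents)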
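The difference between the two structures can be sketched in a few lines of Python. The three documents and the whitespace split are assumptions made for illustration; a real index also stores positions, frequencies, and other statistics.

```python
from collections import defaultdict

# Hypothetical sample documents.
docs = {
    1: "you know for search",
    2: "search for you",
    3: "know your index",
}

# Forward index: document id -> terms (like a table of contents).
forward_index = {doc_id: text.split() for doc_id, text in docs.items()}

# Inverted index: term -> sorted ids of the documents that contain it.
inverted_index = defaultdict(list)
for doc_id in sorted(forward_index):
    for term in sorted(set(forward_index[doc_id])):
        inverted_index[term].append(doc_id)

print(inverted_index["search"])  # [1, 2]
print(inverted_index["know"])    # [1, 3]
```

Looking up a term in the inverted index is a single dictionary access, whereas answering the same question from the forward index would require scanning every document.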
Analysis (Analyzers)
An analyzer consists of three parts:
Character Filters, Tokenizer, Token Filters
## Analyze text with a named analyzer
GET /_analyze
{
  "analyzer": "standard",
  "text": "liuchenglong is a student"
}
## Analyze against a field of an index, to see how that field's analyzer would tokenize the text
GET my_index/_analyze
{
  "field": "user",
  "text": "xiaoting"
}
## Analyze with a custom combination of tokenizer and filters
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase"
  ],
  "text": "liuchenglong is a student"
}
Standard Analyzer is the default analyzer
GET /_analyze
{
  "analyzer": "standard",
  "text": "Liuchenglong in the house"
}
GET /_analyze
{
  "analyzer": "simple",
  "text": "Liuchenglong in the house"
}
GET /_analyze
{
  "analyzer": "whitespace",
  "text": "Liuchenglong in the house"
}
GET /_analyze
{
  "analyzer": "stop",
  "text": "Liuchenglong in the house"
}
GET /_analyze
{
  "analyzer": "keyword",
  "text": "Liuchenglong in the house"
}
GET /_analyze
{
  "analyzer": "pattern",
  "text": "Liuchenglong in the house"
}
GET /_analyze
{
  "analyzer": "english",
  "text": "Liuchenglong in the house"
}
## ik, a Chinese analyzer plugin (must be downloaded and installed separately)
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "江苏省无锡市滨湖区溪北新村"
}
GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "江苏省无锡市滨湖区溪北新村"
}
Search API
1. URL Search: use the q parameter to specify the query string
2. Request Body Search: use GET or POST and write the query in the request body with the Elasticsearch Query DSL
/_search
/index1/_search
/index1,index2/_search
/index*/_search
URL Search
## q specifies the query string, df specifies the default field to search
GET my_index/_search?q=chenglong&df=user
GET my_index/_search?q=user:chenglong
## Add profile: true to see how the query was executed
GET my_index/_search?q=chenglong&df=user
{
  "profile": "true"
}
## PhraseQuery
GET my_index/_search?q=comment:"you know"
## BooleanQuery
GET my_index/_search?q=comment:you know
## Term query: wrap the terms in () so that they all apply to the same field
GET my_index/_search?q=comment:(you know)
## "comment:you comment:and comment:know"
GET my_index/_search?q=comment:(you and know)
## "comment:you comment:not comment:know"
GET my_index/_search?q=comment:(you not know)
## "comment:you +comment:know"   %2B is the URL-encoded + sign
GET my_index/_search?q=comment:(you %2Bknow)
## Range query
GET my_index/_search?q=year:>2020
## Wildcard query
GET my_index/_search?q=user:ch*
## Fuzzy match; this matches chenglong
GET my_index/_search?q=user:chengleng~1
## Phrase with slop; this matches "you know for search"
GET my_index/_search?q=comment:"you for"~2
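The `~1` in the fuzzy query above is a maximum edit distance. As a rough illustration, the plain Levenshtein distance below shows why `chengleng` is within one edit of `chenglong`. This is a sketch: `edit_distance` is a hypothetical helper, and it does not treat a transposition of two adjacent characters as a single edit, which fuzzy matching in Elasticsearch does by default.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("chengleng", "chenglong"))  # 1
```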
Request Body Search
## Pagination
GET my_index/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 20
}
## Sort by a given field
GET my_index/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {"_score": {"order": "desc"}}
  ]
}
## Return only the specified fields
GET my_index/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["user"]
}
## matchQuery TermQuery
GET my_index/_search
{
  "query": {
    "match": {
      "user":"Chenglong"
    }
  }
}
## Specify the operator used to combine the analyzed terms
GET my_index/_search
{
  "query": {
    "match": {
      "user":{
        "query": "Chenglong",
        "operator": "and"
      }
    }
  }
}
## match_phrase with slop tolerates a number of intervening words; the query below matches "you know for search"
GET my_index/_search
{
  "query": {
    "match_phrase": {
      "comment":{
        "query": "you for",
        "slop": 1
      }
    }
  }
}
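Script Fields
## doc values are not available on text fields, so the script reads the keyword sub-field
GET my_index/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "userName": {
      "script": {
        "lang": "painless",
        "source": "doc['user.keyword'].value + 's'"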
      }
    }
  }
}
Mapping
Roughly analogous to a schema definition in a database.
- Simple types
Text / Keyword
Date
Integer / Floating
Boolean
IPv4 & IPv6
- Complex types: objects and nested objects
Object type / Nested type
- Special types
geo_point & geo_shape / percolator
Dynamic Mapping
When a document is written and the index does not exist, the index is created automatically.
## View the mapping
GET my_index/_mapping
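Once a field exists, its type cannot be changed; the index has to be rebuilt with the Reindex API.
## The dynamic setting can be specified in the mappings when creating the index
## Accepted values are true, false, and strict; the default is true
PUT movies
{
  "mappings": {
    "dynamic": "strict"
  }
}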
Custom Mapping
## Create an index in which mobile is not indexed
PUT movies
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text"
      },
      "lastName": {
        "type": "text"
      },
      "mobile": {
        "type": "text",
        "index": false
      }
    }
  }
}
## Insert data
PUT movies/_doc/1
{
  "firstName": "Liu",
  "lastName": "Chenglong",
  "mobile": "1234567890"
}
## Searching on that field fails with:
## failed to create query: Cannot search on field [mobile] since it is not indexed.
POST /movies/_search
{
  "query": {
    "match": {
      "mobile": "123"
    }
  }
}
## null_value
PUT movies
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text"
      },
      "lastName": {
        "type": "text"
      },
      "mobile": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}
PUT movies/_doc/1
{
  "firstName": "Liu",
  "lastName": "Chenglong",
  "mobile": null
}
PUT movies/_doc/2
{
  "firstName": "Liu",
  "lastName": "Chenglong2"
}
## Finds documents whose mobile is explicitly null, but not documents that have no mobile field at all
POST /movies/_search
{
  "query": {
    "match": {
      "mobile": "NULL"
    }
  }
}
## copy_to
PUT movies
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text",
        "copy_to": "fullName"
      },
      "lastName": {
        "type": "text",
        "copy_to": "fullName"
      }
    }
  }
}
PUT movies/_doc/1
{
  "firstName": "Liu",
  "lastName": "Chenglong"
}
## fullName can be queried directly, even though the documents in movies do not contain such a field
## fullName does not appear in _source
POST movies/_search
{
  "query": {
    "match": {
      "fullName": "chenglong"
    }
  }
}
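There is no dedicated array type: any field can hold one or more values of the same type, so an array of strings can be written directly to a field that was originally mapped as text.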
PUT movies/_doc/1
{
  "firstName": "Liu",
  "lastName": "Chenglong"
}
PUT movies/_doc/3
{
  "firstName": "Liu",
  "lastName": ["Chenglong"]
}
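Multi-fields
- Exact matching on a name
Add a keyword sub-field
- Use different analyzers
Exact Value (no analysis needed)
Includes dates, numbers, and specific strings (e.g. Apple Store)
Full Text
The text type in Elasticsearch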
Character Filters
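Process the text before it reaches the Tokenizer, for example by adding, removing, or replacing characters
## Strip HTML tags from the text, which is useful for data collected by a web crawler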
GET _analyze
{
  "tokenizer": "keyword",
  "char_filter": [
    "html_strip"
  ],
  "text": "<b>hello world</b>"
}
## Replace characters
GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": [
        "- => _"
      ]
    }
  ],
  "text": "hello-world"
}
## Tokenize by path hierarchy
GET _analyze
{
  "tokenizer": "path_hierarchy",
  "text": "user/local/nginx/conf"
}
## Tokenize on whitespace and filter out stop words
## Only "You" and "house." remain (the trailing period stays attached to the token)
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop"], 
  "text": "You are in the house."
}
## Adding a lowercase filter also converts the tokens to lowercase
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": [
    "stop",
    "lowercase"
  ],
  "text": "You are in the house."
}
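The three stages (character filters, then a tokenizer, then token filters) can be mimicked with ordinary functions. This is a toy sketch for intuition only: all function names are hypothetical, the regex-based tag stripping is far cruder than `html_strip`, and `STOP_WORDS` is a small assumed subset of the English stop word list.

```python
import re

# Assumed subset of English stop words, for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def html_strip(text):
    """Character filter: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text):
    """Tokenizer: split on runs of whitespace."""
    return text.split()


def stop_filter(tokens):
    """Token filter: remove stop words (case-sensitive, like this sketch's set)."""
    return [t for t in tokens if t not in STOP_WORDS]


def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run the stages in order: character filters, tokenizer, token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("You are in the house.", [], whitespace_tokenizer, [stop_filter]))
# ['You', 'house.']
print(analyze("<b>You are in the house.</b>", [html_strip],
              whitespace_tokenizer, [stop_filter, lowercase_filter]))
# ['you', 'house.']
```

Filter order matters in this sketch: with `stop_filter` first, a capitalized stop word such as "The" would survive, while putting `lowercase_filter` first would let it be removed.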
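Aggregation
Bucket: a set of documents that satisfy a criterion
Metric: mathematical calculations over a set of documents
Pipeline: aggregates the results of other aggregations
Matrix: operates on multiple fields and produces a result matrix
Bucket is similar to GROUP BY in SQL
Metric is similar to the aggregate functions in SQL
## Count by gender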
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "customer_gender"
      }
    }
  }
}
## Result
"buckets" : [
  {
    "key" : "FEMALE",
    "doc_count" : 2433
  },
  {
    "key" : "MALE",
    "doc_count" : 2242
  }
]
## Nest a sub-aggregation inside each bucket (here, the average price per day of the week)
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "day_of_week"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "products.base_price"
          }
        }
      }
    }
  }
}
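The bucket-then-metric pattern of the request above corresponds to a group-by followed by an aggregate function. A minimal Python analogy, using a few made-up orders rather than the real sample data:

```python
from collections import defaultdict

# Hypothetical orders; the real index has many more fields.
orders = [
    {"day_of_week": "Monday", "price": 10.0},
    {"day_of_week": "Monday", "price": 30.0},
    {"day_of_week": "Tuesday", "price": 50.0},
]

# Bucket step: group documents by the value of day_of_week (like a terms aggregation).
buckets = defaultdict(list)
for order in orders:
    buckets[order["day_of_week"]].append(order["price"])

# Metric step: compute a number per bucket (like an avg sub-aggregation).
result = {
    key: {"doc_count": len(prices), "avg_price": sum(prices) / len(prices)}
    for key, prices in buckets.items()
}
print(result)
```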
Queries
A term is the smallest unit that carries meaning
## Add a few documents
POST /product/_doc/1
{
  "productId":"XHDK-12-#f",
  "desc":"iPhone"
}
POST /product/_doc/2
{
  "productId":"BHDK-22-#f",
  "desc":"iPad"
}
POST /product/_doc/3
{
  "productId":"CHDK-32-#f",
  "desc":"MBP"
}
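## A term query does not analyze its input, but text is analyzed at index time (iPhone => iphone),
## so "value": "iPhone" returns nothing; the lowercase "iphone" below matches
POST /product/_search
{
  "query": {
    "term": {
      "desc": {
        "value": "iphone"
      }
    }
  }
}
## Querying the keyword sub-field with the original casing also matches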
POST /product/_search
{
  "query": {
    "term": {
      "desc.keyword": {
        "value": "iPhone"
      }
    }
  }
}
## How the standard analyzer tokenizes the text
POST /_analyze
{
  "analyzer": "standard",
  "text": ["iPhone"]
}
{
  "tokens" : [
    {
      "token" : "iphone",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
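The behaviour above comes down to whether the search input goes through the same analysis as the indexed text. A small sketch of the difference, where `standard_like_analyze` is a hypothetical and much simplified stand-in for the standard analyzer:

```python
import re


def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


# Terms stored in the index for the text field "desc" of document 1.
indexed_terms = set(standard_like_analyze("iPhone"))


def term_query(value):
    """term: the input is looked up as-is, with no analysis."""
    return value in indexed_terms


def match_query(text):
    """match: the input is analyzed first, then each resulting term is looked up."""
    return any(term in indexed_terms for term in standard_like_analyze(text))


print(term_query("iPhone"))   # False
print(term_query("iphone"))   # True
print(match_query("iPhone"))  # True
```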
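## Converting the query into a filter skips score calculation and avoids unnecessary overhead
## Filters can make effective use of the cache, which speeds up repeated queries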
POST /product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "desc.keyword": "iPhone"
        }
      },
      "boost": 1.2
    }
  }
}
Match Query / Match Phrase Query / Query String Query
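Text is analyzed both at index time and at search time: the query string is analyzed first, which produces the list of terms to search for.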
POST movies/_search
{
  "query": {
    "match": {
      "name": "chenglong"
    }
  }
}
Structured Search
Dates, booleans, and numbers are structured data.
They can be searched with Term and Prefix queries.
## Add some data
POST /product/_bulk
{ "index":{"_id":1}}
{"price":10,"avaliable":true,"date":"2020-05-22","productId":"XXX-1","tag":"one"}
{ "index":{"_id":2}}
{"price":20,"avaliable":false,"date":"2019-05-22","productId":"XXX-2","tag":["one","two"]}
{ "index":{"_id":3}}
{"price":30,"avaliable":false,"productId":"XXX-3"}
## term query on a boolean
POST /product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "avaliable": true
        }
      }
    }
  }
}
## range query on a number
POST /product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 10,
            "lte": 20
          }
        }
      }
    }
  }
}
## range query on a date
y years
M months
w weeks
d days
H/h hours
m minutes
s seconds
POST /product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "date": {
            "gte": "now-1y"
          }
        }
      }
    }
  }
}
## Use exists to find documents in which the field is present
POST /product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "exists": {
          "field": "date"
        }
      }
    }
  }
}
## On a multi-valued field, a term query means "contains", not "equals"
## so this returns both the document tagged one and the document tagged one, two
POST /product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "tag.keyword": "one"
        }
      }
    }
  }
}
## To return only the document whose tag is exactly one,
## add a tag_count field and combine it with the term query in a bool query
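The effect of the extra `tag_count` field can be shown with plain Python filtering. The two documents below are made up, and `tag_count` is a field the application would have to populate itself at index time.

```python
# Hypothetical documents; tag_count is maintained by the application when indexing.
docs = [
    {"_id": 1, "tag": ["one"], "tag_count": 1},
    {"_id": 2, "tag": ["one", "two"], "tag_count": 2},
]

# term on tag alone behaves like "contains": both documents match.
contains_one = [d["_id"] for d in docs if "one" in d["tag"]]

# Adding a second condition on tag_count narrows it down to an exact match.
exactly_one = [d["_id"] for d in docs if "one" in d["tag"] and d["tag_count"] == 1]

print(contains_one)  # [1, 2]
print(exactly_one)   # [1]
```

In a bool query, the two conditions would be two term clauses that must both match.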
Relevance Scoring
TF-IDF
BM25
Adding "explain": true to a query shows in the results how each score was calculated
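For intuition about BM25, the sketch below scores a single term with the textbook BM25 formula. The function name and the sample numbers are assumptions, and the exact constants in Lucene's implementation may differ from this form, so compare relative values rather than absolute scores. `k1 = 1.2` and `b = 0.75` are the default parameters.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document.

    tf: occurrences of the term in the document
    doc_len / avg_doc_len: field length of this document and the average length
    n_docs: number of documents; doc_freq: documents that contain the term
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# A rare term scores higher than a common one, all else being equal.
print(bm25_term_score(tf=1, doc_len=10, avg_doc_len=10, n_docs=100, doc_freq=1))
print(bm25_term_score(tf=1, doc_len=10, avg_doc_len=10, n_docs=100, doc_freq=50))

# Term frequency saturates: going from 1 to 2 helps more than going from 10 to 11.
print(bm25_term_score(tf=2, doc_len=10, avg_doc_len=10, n_docs=100, doc_freq=1))
```

Unlike plain TF-IDF, the contribution of term frequency levels off as `tf` grows, and longer-than-average documents are penalized through `b`.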
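bool Query
must: must match, contributes to the score
should: optional match, contributes to the score
must_not: must not match
filter: must match, does not contribute to the score
bool queries can be nested.
Changing the nesting structure affects scoring.
## boost adjusts the score of a clause
## Changing the boost of the tag and price clauses changes the order of the results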
POST /product/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "tag": {
              "query": "one",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "price": {
              "query": "30",
              "boost": 1
            }
          }
        }
      ]
    }
  }
}
## A boosting query returns the documents that match positive and lowers the score of those that also match negative
POST /product/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "tag": "one"
        }
      },
      "negative": {
         "match": {
          "tag": "two"
        }
      },
      "negative_boost": 0.2
    }
  }
}
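The scoring rule of the boosting query can be written in one line: documents that match the `negative` clause keep their `positive` score multiplied by `negative_boost`. The sketch below uses made-up positive scores, and `boosting_score` is a hypothetical helper.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Demote, but do not exclude, documents that match the negative clause."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Hypothetical hits: (id, score from the positive clause, matches the negative clause?)
hits = [(1, 0.8, False), (2, 1.0, True)]

rescored = {doc_id: boosting_score(score, neg, negative_boost=0.2)
            for doc_id, score, neg in hits}
order = sorted(rescored, key=rescored.get, reverse=True)
print(order)  # [1, 2]
```

Document 2 still appears in the results, only lower down, which is the difference from putting the clause in must_not.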
Single Query String, Multiple Fields
POST /product/_bulk
{ "index":{"_id":1}}
{"title":"Quick brown rabbits","body":"Brown rabbits are commonly seen"}
{ "index":{"_id":2}}
{"title":"Keeping pets healthy","body":"My quick brown fox eats rabbits on a regular basis"}
POST /product/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "Brown fox"
          }
        },
        {
          "match": {
            "body": "Brown fox"
          }
        }
      ]
    }
  }
}
POST /product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "Quick fox"
          }
        },
        {
          "match": {
            "body": "Quick fox"
          }
        }
      ]
    }
  }
}
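## If several documents end up with the same score, a tie_breaker coefficient can be added to differentiate them
## tie_breaker is a floating-point number between 0 and 1
## 0 means only the best-matching clause counts
## 1 means all clauses are equally important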
POST /product/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "title": "Quick pets"
          }
        },
        {
          "match": {
            "body": "Quick pets"
          }
        }
      ],
      "tie_breaker": 0.7
    }
  }
}
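The difference between combining per-field scores with bool/should and with dis_max can be sketched numerically. This is a simplification with made-up per-field scores: `bool_should_score` just sums the matching clauses, and `dis_max_score` takes the best clause plus `tie_breaker` times the others.

```python
def bool_should_score(scores):
    """Simplified bool/should: add up the scores of all matching clauses."""
    return sum(scores)


def dis_max_score(scores, tie_breaker=0.0):
    """dis_max: best clause score plus tie_breaker times the other matching scores."""
    matched = [s for s in scores if s > 0]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


# Hypothetical (title score, body score) pairs for two documents.
doc_a = [0.6, 0.6]  # weak match in both fields
doc_b = [0.0, 1.0]  # strong match in one field

print(bool_should_score(doc_a), bool_should_score(doc_b))    # A ranks first
print(dis_max_score(doc_a), dis_max_score(doc_b))            # B ranks first
print(dis_max_score(doc_a, 0.7), dis_max_score(doc_b, 0.7))  # A ranks first again
```

With `tie_breaker` at 0 only the best field counts; at 1 the result equals the plain sum, and values in between let the other fields contribute partially.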
multi_match Query
// LCLTODO: I do not fully understand this one yet
POST /product/_search
{
  "query": {
    "multi_match": {
      "query": "brown",
      "fields": ["title","body"]
    }
  }
}
Chinese Analyzers
hanlp
icu
ik
pinyin
Search Template
Decouples the query definition from the code that runs the search
## Create a search template
POST _scripts/queryProduct
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "multi_match": {
          "query": "{{q}}",
          "fields": [
            "title"
          ]
        }
      }
    }
  }
}
GET _scripts/queryProduct
## Search using the template
POST product/_search/template
{
  "id":"queryProduct",
  "params": {
    "q":"pets"
  }
}
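Conceptually, running a search template means substituting the `params` into the stored source and then executing the resulting query. The toy renderer below illustrates only the substitution step. `render` is a hypothetical helper that handles simple `{{name}}` placeholders and is not a Mustache implementation.

```python
import json

# The stored template source, written as a JSON string with a {{q}} placeholder.
template_source = '{"query": {"multi_match": {"query": "{{q}}", "fields": ["title"]}}}'


def render(source: str, params: dict) -> dict:
    """Replace each {{name}} placeholder with its JSON-escaped value and parse the result."""
    rendered = source
    for name, value in params.items():
        # json.dumps(...)[1:-1] escapes quotes and backslashes but drops the outer quotes,
        # because the placeholder already sits inside a JSON string.
        escaped = json.dumps(str(value))[1:-1]
        rendered = rendered.replace("{{" + name + "}}", escaped)
    return json.loads(rendered)


query = render(template_source, {"q": "pets"})
print(query["query"]["multi_match"]["query"])  # pets
```

The caller only supplies the template id and the parameters, so the query shape can change without touching the calling code.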
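Function Score Query
After the query has run, every matching document can be re-scored through a series of functions, and the results are sorted by the new score.
Built-in score functions:
- Weight: sets a simple, non-normalized weight for each document
- Field Value Factor: uses the value of a field to modify _score
- Random Score
- Decay functions: take a field value as the reference; the closer it is to a given value, the higher the score
- Script Score: a custom script fully controls the scoring logic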
PUT shop/_doc/1
{
  "title": "Apple pie",
  "price": 8
}
PUT shop/_doc/2
{
  "title": "Orange pie",
  "price": 3
}
PUT shop/_doc/3
{
  "title": "Watermelon pie",
  "price": 6
}
POST /shop/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "pie",
          "fields": ["title"]
        }
      },
      "field_value_factor": {
        "field": "price"
      }
    }
  }
}
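With the default settings, field_value_factor multiplies the query score by the field value. The sketch below mirrors that arithmetic with made-up query scores. The function is a hypothetical illustration covering only the `none`, `log1p`, and `sqrt` modifiers and the default `multiply` boost mode.

```python
import math


def field_value_factor(query_score, field_value, factor=1.0, modifier="none"):
    """New score = query score * modifier(factor * field value)."""
    value = factor * field_value
    if modifier == "log1p":
        value = math.log10(1 + value)  # dampens the effect of large values
    elif modifier == "sqrt":
        value = math.sqrt(value)
    elif modifier != "none":
        raise ValueError(f"modifier not covered by this sketch: {modifier}")
    return query_score * value


# Hypothetical query scores of 0.5 for both documents; prices come from the examples above.
print(field_value_factor(0.5, 8))  # 4.0
print(field_value_factor(0.5, 3))  # 1.5
print(field_value_factor(0.5, 8, modifier="log1p"))
```

Without a modifier the price dominates the ranking, which is why a dampening modifier such as `log1p` is often applied.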