Elasticsearch Tutorial

Concepts

Mapping concepts across SQL and Elasticsearch

While SQL and Elasticsearch have different terms for the way the data is organized, essentially their purpose is the same.

SQL | Elasticsearch | Description
column | field | In both cases, at the lowest level, data is stored in named entries of a variety of data types, each containing one value.
row | document | Columns and fields do not exist by themselves; they are part of a row or a document.
table | index | The target against which queries, whether in SQL or Elasticsearch, are executed.
database | cluster | In SQL, catalog and database are used interchangeably and represent a set of schemas, that is, a number of tables. In Elasticsearch, the set of available indices is grouped in a cluster.

Field Data Type

Common types

Type | Description
binary | Binary value encoded as a Base64 string.
boolean | true and false values.
Keywords | The keyword family, including keyword, constant_keyword, and wildcard.
Numbers | Numeric types, such as long and double, used to express amounts.
Dates | Date types, including date and date_nanos.
Text | A field to index full-text values.
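
To see the common types side by side, here is a minimal sketch of an explicit mapping body that uses each one once. The field names (attachment, view_count, price, and so on) are hypothetical and chosen only for illustration.

```python
def build_common_types_mapping():
    """Return an explicit mapping body using each common field type once.

    All field names are hypothetical examples.
    """
    return {
        "mappings": {
            "properties": {
                "attachment": {"type": "binary"},     # Base64-encoded bytes
                "is_published": {"type": "boolean"},  # true / false
                "uuid": {"type": "keyword"},          # exact-value strings
                "view_count": {"type": "long"},       # integer amounts
                "price": {"type": "double"},          # floating-point amounts
                "publish_date": {"type": "date"},     # dates and timestamps
                "main_body": {"type": "text"},        # analyzed full text
            }
        }
    }
```

With elasticsearch-py 7.x this body could be passed as es.indices.create(index="some_index", body=build_common_types_mapping()), where the index name is again a placeholder.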

Mapping

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

Each document is a collection of fields, which each have their own data type. When mapping your data, you create a mapping definition, which contains a list of fields that are pertinent to the document.

Dynamic mapping

Dynamic mapping allows you to experiment with and explore data when you’re just getting started. Elasticsearch adds new fields automatically, just by indexing a document.
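
The following is a simplified local sketch of the default dynamic mapping rules, not Elasticsearch's own code: JSON booleans become boolean, integers become long, floating-point numbers become float, objects get nested properties, arrays take the type of their first non-null element, and strings become text with a keyword sub-field unless they look like dates. As an assumption for brevity, date detection here only recognizes strings starting with yyyy-MM-dd; the real rules come from the index's dynamic_date_formats setting.

```python
import re

# Simplification: real date detection uses dynamic_date_formats.
_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}")


def infer_field_mapping(value):
    """Approximate the default dynamic mapping for one JSON value."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if _DATE_RE.match(value):
            return {"type": "date"}
        # Default for strings: text plus a keyword sub-field for exact matches.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": infer_properties(value)}
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_field_mapping(item)
        return None
    return None  # null values do not add a field


def infer_properties(document):
    """Infer a 'properties' block for every non-null field of a document."""
    props = {}
    for name, value in document.items():
        mapping = infer_field_mapping(value)
        if mapping is not None:
            props[name] = mapping
    return props
```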

Explicit mapping

Explicit mapping lets you define the mapping yourself, choosing the type and options of each field. For example,

{
  "mappings": {
    "properties": {
      "uuid": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      },
      "main_body": {
        "type": "text",
        "index": false
      }
    }
  }
}

The field type "keyword" means the value is indexed as a single term without analysis, so it is typically searched with a term query for exact matches.

The field type "text" means the value is analyzed (split into tokens) when it is indexed, so it is typically searched with a match query.

Setting "index": false means the field is not indexed and therefore cannot be searched, although its value is still kept in the document's _source.
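
A toy local model can make the keyword/text difference concrete. This is a sketch under a simplifying assumption: the "analyzer" here just lowercases and splits on word characters, which only approximates the real standard analyzer. The function names are hypothetical.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def term_query_on_keyword(stored_value, term):
    """keyword fields are not analyzed, so a term query needs an exact match."""
    return stored_value == term


def match_query_on_text(stored_value, query):
    """text fields are analyzed; match analyzes the query too and, with the
    default OR operator, matches if any query token is in the field."""
    field_tokens = set(analyze(stored_value))
    return any(token in field_tokens for token in analyze(query))
```

For example, a keyword value "New York" is not found by the term "new york", while the same value in a text field is found by a match query for "new york".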

Query and filter context

Relevance scores

By default, Elasticsearch sorts matching search results by relevance score, which measures how well each document matches a query.

Query context

In the query context, a query clause answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a relevance score in the _score metadata field.

Filter context

In a filter context, a query clause answers the question “Does this document match this query clause?” The answer is a simple Yes or No — no scores are calculated.
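
Both contexts usually appear together in a bool query: clauses under must are scored (query context), while clauses under filter only include or exclude documents (filter context). A minimal sketch of such a request body follows; the status and publish_date fields are hypothetical.

```python
def build_search(text, status, min_date):
    """Combine a scored clause with unscored yes/no clauses.

    Field names 'title', 'status' and 'publish_date' are hypothetical.
    """
    return {
        "query": {
            "bool": {
                # Query context: contributes to _score.
                "must": [{"match": {"title": text}}],
                # Filter context: includes or excludes documents, no scoring.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"publish_date": {"gte": min_date}}},
                ],
            }
        }
    }
```

The resulting dict could be sent with es.search(index="forward", body=build_search(...)) in elasticsearch-py 7.x.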

Query DSL

Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries.

Leaf query clauses

Query type | Description
match | Returns documents that match a provided text, number, date, or boolean value. The provided text is analyzed before matching.
term | Returns documents that contain an exact term in a provided field.
range | Returns documents that contain terms within a provided range.
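
Below is a sketch of the three leaf queries as request bodies, plus a small local helper showing how the four range bounds (gt, gte, lt, lte) are interpreted. The field names and the in_range helper are hypothetical illustrations, not part of any client library.

```python
# match: analyzed full-text search; term: exact value.
match_q = {"match": {"title": "中国银行"}}
term_q = {"term": {"uuid": "1000"}}


def range_query(field, **bounds):
    """Build a range query; bounds may be gt, gte, lt and lte."""
    unknown = set(bounds) - {"gt", "gte", "lt", "lte"}
    if unknown:
        raise ValueError(f"unsupported range parameters: {sorted(unknown)}")
    return {"range": {field: bounds}}


def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Local illustration of how each bound restricts a value."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True


range_q = range_query("view_count", gte=10, lt=20)  # 10 <= view_count < 20
```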

Compound query clauses

Query type | Description
bool | A query that matches documents matching Boolean combinations of other queries. It is built using one or more Boolean clauses, each with a typed occurrence (must, filter, should, or must_not).
dis_max | Returns documents matching one or more wrapped queries, called query clauses or clauses. If a returned document matches multiple query clauses, the dis_max query assigns the document the highest relevance score from any matching clause, plus a tie-breaking increment for any additional matching subqueries.
constant_score | Wraps a filter query and returns every matching document with a relevance score equal to the boost parameter value.
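
The dis_max scoring rule is easiest to see as arithmetic: take the best clause score, then add tie_breaker (a value between 0 and 1) times each other matching clause's score. The sketch below computes that locally and shows request bodies for dis_max and constant_score; the clause scores are made-up inputs and the field values follow the earlier examples.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times every other matching score."""
    if not clause_scores:
        raise ValueError("document matched no clauses")
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker must be between 0 and 1")
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


dis_max_body = {
    "query": {
        "dis_max": {
            "queries": [
                {"match": {"title": "中国石油"}},
                {"match": {"main_body": "中国石油"}},
            ],
            "tie_breaker": 0.7,
        }
    }
}

# Every document matching the filter gets _score equal to boost.
constant_score_body = {
    "query": {
        "constant_score": {
            "filter": {"term": {"uuid": "1000"}},
            "boost": 1.2,
        }
    }
}
```

With tie_breaker left at 0, only the best clause counts; with 1, all matching clause scores are summed.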

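Allow expensive queries

Some query types are slow to execute because of the way they are implemented. They run by default, but Elasticsearch rejects them when the cluster setting search.allow_expensive_queries is set to false (it defaults to true).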

Query type | Description
script | Filters documents based on a provided script. The script query is typically used in a filter context.
fuzzy | Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
regexp | Returns documents that contain terms matching a regular expression.
prefix | Returns documents that contain a specific prefix in a provided field.
wildcard | Returns documents that contain terms matching a wildcard pattern. A wildcard operator is a placeholder that matches one or more characters.
range | Returns documents that contain terms within a provided range. Range queries count as expensive on text and keyword fields.
Joining queries | nested, has_child, and has_parent. Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive, so Elasticsearch offers these limited forms of join instead.
geo_shape | Filters documents indexed using the geo_shape or geo_point type.
script_score | Uses a script to provide a custom score for returned documents. The script_score query is useful if, for example, a scoring function is expensive and you only need to calculate the score of a filtered set of documents.
percolate | Matches queries stored in an index. The percolate query itself contains the document that is used as the query to match against the stored queries.
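
To make the fuzzy query's edit distance concrete, here is a local sketch. Besides insertions, deletions, and substitutions, Elasticsearch's fuzzy query by default also counts swapping two adjacent characters as one edit (its transpositions parameter), so the function supports that as an option. The auto_fuzziness helper mirrors the documented "AUTO" thresholds (terms of 0 to 2 characters must match exactly, 3 to 5 allow one edit, longer terms allow two). The field name "tag" in the request bodies is hypothetical.

```python
def edit_distance(a, b, transpositions=True):
    """Single-character edits needed to turn a into b (insert, delete,
    substitute, and optionally swap two adjacent characters)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def auto_fuzziness(term):
    """Edits allowed by fuzziness "AUTO" with its default bounds (3 and 6)."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


def fuzzy_matches(term, candidate):
    return edit_distance(term, candidate) <= auto_fuzziness(term)


fuzzy_body = {"query": {"fuzzy": {"tag": {"value": "quikc", "fuzziness": "AUTO"}}}}
# Cluster settings body that makes Elasticsearch reject expensive queries.
disallow_expensive = {"persistent": {"search.allow_expensive_queries": False}}
```

In elasticsearch-py 7.x the settings body could be applied with es.cluster.put_settings(body=disallow_expensive).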

Python 3 Elasticsearch in Action

Index

create

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    # Connects to http://localhost:9200 by default (elasticsearch-py 7.x).
    es = Elasticsearch()
    body = {
      "mappings": {
        "properties": {
          "uuid": {
            "type": "keyword"
          },
          "title": {
            "type": "text"
          },
          "main_body": {
            "type": "text"
          }
        }
      }
    }
    ret = es.indices.create(index="forward", body=body)
    pprint(ret)


if __name__ == '__main__':
    main()
    

delete

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    ret = es.indices.delete(index="forward")
    pprint(ret)


if __name__ == '__main__':
    main()
    

update

Update mapping API

Adds new fields to an existing data stream or index. You can also use this API to change the search settings of existing fields.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    # Existing fields must keep their types; "publish_date" is the new field.
    body = {
      "properties": {
        "uuid": {
          "type": "keyword"
        },
        "title": {
          "type": "text"
        },
        "main_body": {
          "type": "text"
        },
        "publish_date": {
          "type": "keyword"
        }
      }
    }
    ret = es.indices.put_mapping(index="forward", body=body)
    pprint(ret)


if __name__ == '__main__':
    main()
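
The update mapping API can add new fields, but it cannot change the type of a field that already exists; that requires creating a new index with the desired mapping and copying the data with the Reindex API. The helper below is a hypothetical local check (not part of elasticsearch-py) that splits a proposed properties block into new fields and type conflicts before sending it.

```python
def plan_mapping_update(existing_props, new_props):
    """Return (added, conflicts): fields new to the index, and existing
    fields whose type the request would change (these are rejected)."""
    added, conflicts = [], []
    for name, spec in new_props.items():
        if name not in existing_props:
            added.append(name)
        elif existing_props[name].get("type") != spec.get("type"):
            conflicts.append(name)
    return sorted(added), sorted(conflicts)
```

Assuming elasticsearch-py 7.x, the current properties could be read with es.indices.get_mapping(index="forward")["forward"]["mappings"]["properties"], since the response is keyed by index name.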
    
Reindex API

Copies documents from a source to a destination.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    body = {
      "source": {
        "index": "forward"
      },
      "dest": {
        "index": "document"
      }
    }
    ret = es.reindex(body=body)
    pprint(ret)


if __name__ == '__main__':
    main()

get

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    ret = es.indices.get(index="forward")
    pprint(ret)


if __name__ == '__main__':
    main()

Document

create

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    body = {
      "uuid": "1000",
      "title": "中国银行在港交所上市挂牌成功",
      "main_body": "中国银行在港交所上市挂牌成功,成为中国大陆首家在国际市场上市的银行。"
    }
    es = Elasticsearch()
    # No id is passed, so Elasticsearch generates one and returns it as "_id".
    ret = es.index(index="forward", body=body)
    pprint(ret)


if __name__ == '__main__':
    main()
    

delete

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    # The _id is the one Elasticsearch generated when the document was indexed.
    ret = es.delete(index="forward", id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()

update

To fully replace an existing document, use the index API, which creates or updates a document in an index.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    body = {
      "uuid": "1000",
      "title": "<<中国银行在港交所上市挂牌成功>>",
      "main_body": "<<成为中国大陆首家在国际市场上市的银行>>"
    }
    # Indexing with an existing _id replaces the whole document.
    ret = es.index(index="forward", body=body, id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()

The update API updates a document with a script or a partial document.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    # Only the fields listed under "doc" are changed; other fields are kept.
    body = {
      "doc": {
        "title": "<<中国银行在港交所上市挂牌成功>>",
        "main_body": "<<成为中国大陆首家在国际市场上市的银行>>"
      }
    }
    ret = es.update(index="forward", body=body, id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()
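
The difference between the two ways of changing a stored document can be modeled locally: the index API makes the stored _source exactly the new body, while the update API with "doc" merges the partial document into the existing one, merging nested objects recursively and overwriting other values (including arrays). The functions below are a hypothetical simulation of that behavior on plain dicts, not calls to Elasticsearch.

```python
import copy


def index_replace(existing, new_source):
    """index API: the stored _source becomes exactly the new document."""
    return copy.deepcopy(new_source)


def update_merge(existing, partial_doc):
    """update API with "doc": merge into the existing _source; nested
    objects are merged recursively, other values are overwritten."""
    merged = copy.deepcopy(existing)
    for key, value in partial_doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = update_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

For instance, replacing a document with {"title": "new"} drops its uuid field, while updating with {"doc": {"title": "new"}} keeps it.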

Updates a document using the specified script.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    # The script assumes the document already has a numeric "counter" field.
    body = {
      "script" : {
        "source": "ctx._source.counter += params.count",
        "lang": "painless",
        "params" : {
          "count" : 4
        }
      }
    }
    ret = es.update(index="forward", body=body, id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()

get

Returns a document.

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
    es = Elasticsearch()
    ret = es.get(index="forward", id="WRemuHkBd6vf16HuHzHq")
    pprint(ret)


if __name__ == '__main__':
    main()

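The match_phrase query supports character-based Boolean retrieval on Chinese text, giving exact matches for Chinese queries. With the standard analyzer, Chinese text is split into single characters, so a match query for 中国石油 matches documents containing any of those characters, while match_phrase requires all of them to appear consecutively and in order.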

# encoding=utf-8

from elasticsearch import Elasticsearch
from pprint import pprint


def main():
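    es = Elasticsearch()
    # A JSON object cannot repeat the "match_phrase" key, so the two phrase
    # queries go in a bool query: a document matches if either field does.
    body = {
      "query": {
        "bool": {
          "should": [
            {"match_phrase": {"title": "中国石油"}},
            {"match_phrase": {"main_body": "中国石油"}}
          ]
        }
      }
    }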
    ret = es.search(body=body, index="forward")
    pprint(ret)


if __name__ == '__main__':
    main()
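
A toy local model shows why match_phrase gives exact results on Chinese text while match does not. It assumes, as a simplification of the standard analyzer, that each CJK ideograph becomes its own token and ASCII words are kept whole; match uses the default OR operator and match_phrase uses the default slop of 0.

```python
import re

# Toy tokenizer: one token per CJK ideograph, ASCII words kept whole.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[a-z0-9]+")


def tokens(text):
    return _TOKEN_RE.findall(text.lower())


def match(field_text, query):
    """match (OR operator): any query token appears in the field."""
    field_tokens = set(tokens(field_text))
    return any(t in field_tokens for t in tokens(query))


def match_phrase(field_text, query):
    """match_phrase (slop 0): query tokens appear consecutively, in order."""
    f, q = tokens(field_text), tokens(query)
    if not q:
        return False
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))
```

Against the title 中国银行在港交所上市挂牌成功, a match for 中国石油 succeeds because 中 and 国 both occur, but match_phrase fails because 石油 does not follow 中国.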

Multi-match query: the multi_match query builds on the match query to allow multi-field queries.

{
  "query": {
    "multi_match" : {
      "query":    "中国石油",
      "fields": [ "title", "main_body" ]
    }
  }
}
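
multi_match also accepts a type (best_fields by default) and per-field boosts written with a caret, such as "title^2", which weights matches in title more heavily. The sketch below builds such a body; the helper name is hypothetical. Using type "phrase" runs a match_phrase on each field, which is one way to get the exact Chinese matching described for match_phrase across both fields.

```python
VALID_TYPES = {"best_fields", "most_fields", "cross_fields",
               "phrase", "phrase_prefix", "bool_prefix"}


def multi_match_body(text, fields, match_type="best_fields"):
    """Build a multi_match query; "field^N" boosts that field by N."""
    if match_type not in VALID_TYPES:
        raise ValueError(f"unknown multi_match type: {match_type}")
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": match_type,
            }
        }
    }


body = multi_match_body("中国石油", ["title^2", "main_body"], match_type="phrase")
```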

Highlighting marks the matched terms in search results for one or more fields.

{
    "query" : {
        "match": { "title": "中国石油" }
    },
    "highlight" : {
        "pre_tags" : ["<tag1>"],
        "post_tags" : ["</tag1>"],
        "fields" : {
            "title" : {}
        }
    }
}
Posted on 2021-05-30 12:37 by 健康平安快乐