Query DSL for elasticsearch Query

Query DSL (source: http://www.elasticsearch.cn/guide/reference/query-dsl/)
http://elasticsearch.qiniudn.com/

--Introduction--

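elasticsearch provides a complete Query DSL (domain-specific language) based on JSON. In general, there are basic queries such as term or prefix, and compound queries such as bool. Queries can also have filters associated with them, as in the filtered or constant_score queries.
Think of the Query DSL as an AST (abstract syntax tree) of queries. Certain queries can contain other queries (like bool), others can contain filters (like constant_score), and some can contain both a query and a filter (like filtered). Each of them can contain any query from the set of queries elasticsearch supports, or any filter from the set of filters, which makes it possible to build arbitrarily complex (and interesting) queries.
Both queries and filters can be used in different APIs, for example within a search query or as a facet filter. This section describes the queries and filters that can form the AST.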


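Tip: filters are very useful because they are faster than plain queries (no document scoring is performed) and, in many cases, are cached automatically.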


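Filters and Caching
Filters are a great candidate for caching. Caching the result of a filter does not require much memory, and other queries that use the same filter (with the same parameters) can reuse the cached result, so they run very fast.
Some filters produce a result that is easy to cache, and the only difference between caching and not caching them is whether the result is placed in the cache. These filters, which include term, terms, prefix, and range, are cached by default and are recommended over the equivalent queries.
Other filters, which usually work by loading field data into memory, are not cached by default. These filters are already very fast, and caching their results requires extra processing to make them usable by other queries. They include the geo filters, numeric_range, and script.
The last type of filter combines other filters: and, not, and or. Their results are not cached, because they mostly operate on the inner filters, so caching them is not needed.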

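All filters accept a _cache element to control caching explicitly. They also accept a _cache_key, which is used as the cache key for that filter. This is handy when filtering on large sets (such as a terms filter with many elements).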
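As a rough illustration of these caching controls, the Python sketch below builds a filtered query whose terms filter sets _cache and _cache_key explicitly. The field name user, the values, and the key user_allowlist_v1 are made up for the example. Placing _cache and _cache_key inside the filter body next to the field is an assumption about the request layout, so check it against the elasticsearch version in use.

```python
import json


def cached_terms_filter(field, values, cache_key):
    """Build a terms filter with explicit caching controls.

    Assumption: _cache and _cache_key sit inside the filter body,
    next to the field being filtered.
    """
    return {
        "terms": {
            field: list(values),
            "_cache": True,
            "_cache_key": cache_key,
        }
    }


# Hypothetical field, values, and cache key, used only for illustration.
query = {
    "filtered": {
        "query": {"match_all": {}},
        "filter": cached_terms_filter(
            "user", ["kimchy", "danny"], "user_allowlist_v1"
        ),
    }
}

print(json.dumps(query, indent=4))
```

A short, stable _cache_key avoids using the whole list of terms as the cache key, which is the case the paragraph above calls out.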

--Text Query--

The text family of queries handles text of all kinds. For example:
{
    "text" : {
        "message" : "this is a test"
    }
}
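Note that, despite its name, text can also be used for exact matching (similar to term) on numbers and dates.
Here message is the name of a field; substitute the name of any field you actually use (including _all).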

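Types of Text Queries
boolean
The default text query is of type boolean. This means the provided text is analyzed and a boolean query is constructed from it. The operator flag can be set to or or and to control how the boolean clauses are combined (defaults to or).
The analyzer setting controls which analyzer processes the text during analysis. It uses the analyzer defined in the mapping, or the index's default analyzer if none is defined.
fuzziness can be set to a value (depending on the relevant type; for string types it should be between 0.0 and 1.0) to construct fuzzy queries for each analyzed term. In that case, prefix_length and max_expansions can be set to control the fuzzy process.
The following example uses additional parameters (note the change in structure; message is the field name):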

{
    "text" : {
        "message" : {
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}
phrase

The text_phrase query analyzes the text and creates a phrase query out of it. For example:
{
    "text_phrase" : {
        "message" : "this is a test"
    }
}
Since text_phrase is only a type of text query, it can also be written as follows:
{
    "text" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase"
        }
    }
}
A phrase query maintains the order of the terms up to a configurable slop (which defaults to 0).
The analyzer can be set to control which analyzer performs the analysis process on the text. It defaults to the field's explicit mapping definition, or to the default search analyzer. For example:
{
    "text_phrase" : {
        "message" : {
            "query" : "this is a test",
            "analyzer" : "my_analyzer"
        }
    }
}
text_phrase_prefix

The text_phrase_prefix query is the same as text_phrase, except that it allows prefix matches on the last term in the text. For example:

{
    "text_phrase_prefix" : {
        "message" : "this is a test"
    }
}
Or:

{
    "text" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase_prefix"
        }
    }
}
It accepts the same parameters as the phrase type. In addition, it accepts a max_expansions parameter that controls how many prefixes the last term will be expanded to. Setting it to an acceptable value is highly recommended to control the execution time of the query. For example:
{
    "text_phrase_prefix" : {
        "message" : {
            "query" : "this is a test",
            "max_expansions" : 10
        }
    }
}
Comparison to query_string / field

The text family of queries does not go through a “query parsing” process. It does not support field name prefixes, wildcard characters, or other “advanced” features. For this reason, the chances of it failing are very small or nonexistent, and it behaves very well when the goal is simply to analyze a piece of text and run it as a query (which is usually what a text search box does). Also, phrase_prefix can provide a great “as you type” behavior to automatically load search results.

--Bool Query--

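A query that matches documents using boolean combinations of other queries. It maps to Lucene's BooleanQuery. It is built from one or more query clauses, each with its own occurrence type. The possible occurrence types are:

Occur Description
must The clause must appear in matching documents.
should The clause may appear in matching documents. If a bool query contains no must clauses, a matching document must satisfy one or more should clauses. The minimum number of should clauses to match can be set with the minimum_number_should_match parameter.
must_not The clause must not appear in matching documents. Note that it is not possible to search for documents with a query that consists only of a must_not clause.
The bool query also supports the disable_coord parameter (defaults to false).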

{
    "bool" : {
        "must" : {
            "term" : { "user" : "kimchy" }
        },
        "must_not" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        },
        "should" : [
            {
                "term" : { "tag" : "wow" }
            },
            {
                "term" : { "tag" : "elasticsearch" }
            }
        ],
        "minimum_number_should_match" : 1,
        "boost" : 1.0
    }
}
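To make the occurrence types concrete, here is a small Python sketch that evaluates must, should, and must_not against in-memory documents. It models matching only (no scoring, no disable_coord) and does not enforce the restriction on queries made of must_not alone. The helper names term, in_range, and bool_matches are hypothetical, and the range helper assumes both bounds are inclusive.

```python
def term(field, value):
    """Clause that matches when the field equals the value exactly."""
    return lambda doc: doc.get(field) == value


def in_range(field, low, high):
    """Clause that matches when low <= field <= high (inclusive, assumed)."""
    return lambda doc: field in doc and low <= doc[field] <= high


def bool_matches(doc, must=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Return True if doc satisfies the boolean combination of clauses."""
    if any(clause(doc) for clause in must_not):
        return False
    if not all(clause(doc) for clause in must):
        return False
    matched = sum(1 for clause in should if clause(doc))
    if minimum_should_match is None:
        # Without must clauses, at least one should clause has to match.
        required = 1 if (should and not must) else 0
    else:
        required = minimum_should_match
    return matched >= required


def example_query(doc):
    """Mirror of the JSON example above, expressed as predicates."""
    return bool_matches(
        doc,
        must=[term("user", "kimchy")],
        must_not=[in_range("age", 10, 20)],
        should=[term("tag", "wow"), term("tag", "elasticsearch")],
        minimum_should_match=1,
    )


print(example_query({"user": "kimchy", "age": 30, "tag": "wow"}))
print(example_query({"user": "kimchy", "age": 15, "tag": "wow"}))
```

The first document passes every clause; the second is rejected because its age falls inside the must_not range.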

--Boosting Query--

The boosting query can be used to effectively demote results that match a given query. Unlike the “NOT” clause in bool query, this still selects documents that contain undesirable terms, but reduces their overall score.

{
    "boosting" : {
        "positive" : {
            "term" : {
                "field1" : "value1"
            }
        },
        "negative" : {
            "term" : {
                "field2" : "value2"
            }
        },
        "negative_boost" : 0.2
    }
}
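The demotion can be pictured with a tiny Python sketch. It assumes that a document matching the negative query has its positive score multiplied by negative_boost, which is how Lucene's BoostingQuery is usually described; the function name and the sample scores are hypothetical.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Demote, but do not exclude, documents matching the negative query.

    Assumption: the demotion is a plain multiplication by negative_boost.
    """
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# A document that matches only the positive query keeps its score.
print(boosting_score(2.0, matches_negative=False, negative_boost=0.2))

# A document that also matches the negative query is kept but ranked lower.
print(boosting_score(2.0, matches_negative=True, negative_boost=0.2))
```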

--Ids Query--

Matches only documents that have the provided ids. Note that this query does not require the _id field to be indexed, since it works using the _uid field.
{
    "ids" : {
        "type" : "my_type",
        "values" : ["1", "4", "100"]
    }
}
The type is optional and can be omitted; it can also accept an array of values.

--Custom Score Query--

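The custom_score query wraps another query and customizes its scoring. A script expression can be used to compute the score from the (numeric) field values of the matching documents. Here is a simple example: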
"custom_score" : {
    "query" : {
        ....
    },
    "script" : "_score * doc['my_numeric_field'].value"
}
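Besides document fields and script expressions, the _score parameter can be used to retrieve the score computed by the wrapped query.


Script Parameters
Scripts are cached for faster execution. If a script needs values that change between requests, the recommended approach is to keep the same script and pass the values in as parameters: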


"custom_score" : {
    "query" : {
        ....
    },
    "params" : {
        "param1" : 2,
        "param2" : 3.1
    },
    "script" : "_score * doc['my_numeric_field'].value / pow(param1, param2)"
}
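A minimal Python sketch of this advice: the script text stays identical across requests and only params changes, so a cached compiled script can be reused. The helper custom_score_query is hypothetical, and match_all stands in for the inner query that the example above elides.

```python
import json

# The script text never changes; only the parameter values do.
SCRIPT = "_score * doc['my_numeric_field'].value / pow(param1, param2)"


def custom_score_query(inner_query, params):
    """Wrap inner_query in a custom_score query using the shared script."""
    return {
        "custom_score": {
            "query": inner_query,
            "params": dict(params),
            "script": SCRIPT,
        }
    }


# match_all is a placeholder inner query chosen for this sketch.
first = custom_score_query({"match_all": {}}, {"param1": 2, "param2": 3.1})
second = custom_score_query({"match_all": {}}, {"param1": 4, "param2": 1.5})

print(json.dumps(first, indent=4))
print(first["custom_score"]["script"] == second["custom_score"]["script"])
```

Interpolating the values into the script string instead would produce a different script for every request, which defeats the script cache.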

--Constant Score Query--

A query that wraps a filter or another query and simply returns a constant score equal to the query boost for every document in the filter. Maps to Lucene ConstantScoreQuery.
{
    "constant_score" : {
        "filter" : {
            "term" : { "user" : "kimchy"}
        },
        "boost" : 1.2
    }
}
The filter object can hold only filter elements, not queries. Filters can be much faster compared to queries since they don’t perform any scoring, especially when they are cached.

A query can also be wrapped in a constant_score query:
{
    "constant_score" : {
        "query" : {
            "term" : { "user" : "kimchy"}
        },
        "boost" : 1.2
    }
}

--Dis Max Query--

A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.

This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as Boolean Query would give). If the query is “albino elephant” this ensures that “albino” matching one field and “elephant” matching another gets a higher score than “albino” matching both fields. To get this result, use both Boolean Query and DisjunctionMax Query: for each term a DisjunctionMaxQuery searches for it in each field, while the set of these DisjunctionMaxQuery’s is combined into a BooleanQuery.

The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that include this term in only the best of those multiple fields, without confusing this with the better case of two different terms in the multiple fields. The default tie_breaker is 0.0.

This query maps to Lucene DisjunctionMaxQuery.
{
    "dis_max" : {
        "tie_breaker" : 0.7,
        "boost" : 1.2,
        "queries" : [
            {
                "term" : { "age" : 34 }
            },
            {
                "term" : { "age" : 35 }
            }
        ]
    }
}    
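The scoring rule described above (the maximum subquery score plus a tie-breaking increment for the other matching subqueries) can be sketched in Python as follows. The function name is hypothetical, the sample scores are invented, and the query-level boost is left out of the sketch.

```python
def dis_max_score(subquery_scores, tie_breaker=0.0):
    """Score of one document under a dis_max query.

    subquery_scores holds the scores of the subqueries that matched
    the document; non-matching subqueries are simply omitted.
    """
    if not subquery_scores:
        return 0.0
    best = max(subquery_scores)
    others = sum(subquery_scores) - best
    return best + tie_breaker * others


# With the default tie_breaker of 0.0 only the best subquery counts.
print(dis_max_score([1.0, 0.5]))

# A non-zero tie_breaker rewards additional matching subqueries.
print(dis_max_score([1.0, 0.5], tie_breaker=0.5))
```

A tie_breaker of 1.0 would make the result equal to the plain sum of the subquery scores, which is the behavior the text attributes to a boolean query.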

--Field Query--

A query that executes a query string against a specific field. It is a simplified version of query_string query (by setting the default_field to the field this query executed against). In its simplest form:
{
    "field" : { 
        "name.first" : "+something -else"
    }
}
Most of the query_string parameters are allowed with the field query as well, in such a case, the query should be formatted as follows:
{
    "field" : { 
        "name.first" : {
            "query" : "+something -else",
            "boost" : 2.0,
            "enable_position_increments": false
        }
    }
}

--Filtered Query--

Maps to Lucene's FilteredQuery. It applies a filter to the results of another query.
{
    "filtered" : {
        "query" : {
            "term" : { "tag" : "wow" }
        },
        "filter" : {
            "range" : {
                "age" : { "from" : 10, "to" : 20 }
            }
        }
    }
}
The filter object in this DSL can hold only filter elements, not queries. Filters can be much faster than queries because they do not perform any scoring, especially when their results are cached.

--Fuzzy Like Query--

The fuzzy like this query finds documents that are “like” the provided text by running it against one or more fields.
{
    "fuzzy_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like_text" : "text like this one",
        "max_query_terms" : 12
    }
}
fuzzy_like_this can be shortened to flt.

The fuzzy_like_this top level parameters include:

Parameter Description
fields A list of the fields to run the more like this query against. Defaults to the _all field.
like_text The text to find documents like it, required.
ignore_tf Should term frequency be ignored. Defaults to false.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
min_similarity The minimum similarity of the term variants. Defaults to 0.5.
prefix_length Length of required common prefix on variant terms. Defaults to 0.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.
How it Works
Fuzzifies ALL terms provided as strings and then picks the best n differentiating terms. In effect this mixes the behaviour of FuzzyQuery and MoreLikeThis but with special consideration of fuzzy scoring factors. This generally produces good results for queries where users may provide details in a number of fields and have no knowledge of boolean query syntax and also want a degree of fuzzy matching and a fast query.

For each source term the fuzzy variants are held in a BooleanQuery with no coord factor (because we are not looking for matches on multiple variants in any one doc). Additionally, a specialized TermQuery is used for variants and does not use that variant term’s IDF because this would favour rarer terms eg misspellings. Instead, all variants use the same IDF ranking (the one for the source query term) and this is factored into the variant’s boost. If the source query term does not exist in the index the average IDF of the variants is used.

--Fuzzy Like Field Query--

The fuzzy_like_this_field query is the same as the fuzzy_like_this query, except that it runs against a single field. It provides a nicer query DSL than the generic fuzzy_like_this query, and supports typed field queries (it automatically wraps typed fields with a type filter to match only on the specific type).
{
    "fuzzy_like_this_field" : {
        "name.first" : {
            "like_text" : "text like this one",
            "max_query_terms" : 12
        }
    }
}
fuzzy_like_this_field can be shortened to flt_field.

The fuzzy_like_this_field top level parameters include:

Parameter Description
like_text The text to find documents like it, required.
ignore_tf Should term frequency be ignored. Defaults to false.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
min_similarity The minimum similarity of the term variants. Defaults to 0.5.
prefix_length Length of required common prefix on variant terms. Defaults to 0.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

--Fuzzy Query--

A fuzzy based query that uses similarity based on Levenshtein (edit distance) algorithm.

Warning: this query is not very scalable with its default prefix length of 0 or when max_expansions is not set. In that case, every term will be enumerated and cause an edit score calculation.

Here is a simple example:
{
    "fuzzy" : { "user" : "ki" }
}
More complex settings can be set (the values here are the default values):
{
    "fuzzy" : {
        "user" : {
            "value" : "ki",
            "boost" : 1.0,
            "min_similarity" : 0.5,
            "prefix_length" : 0
        }
    }
}
The max_expansions parameter (unbounded by default) controls the number of terms the fuzzy query will expand to.

Numeric / Date Fuzzy

fuzzy query on a numeric field will result in a range query “around” the value using the min_similarity value. For example:
{
    "fuzzy" : {
        "price" : {
            "value" : 12,
            "min_similarity" : 2
        }
    }
}
Will result in a range query between 10 and 14. Same applies to dates, with support for time format for the min_similarity field:
{
    "fuzzy" : {
        "created" : {
            "value" : "2010-02-05T12:05:07",
            "min_similarity" : "1d"
        }
    }
}
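The arithmetic behind the two examples above can be sketched in Python: the range is the value minus and plus min_similarity. The function name is hypothetical, the "1d" string is represented directly as a timedelta rather than parsed, and whether the resulting bounds are inclusive is not stated in the text, so the sketch only computes the two endpoints.

```python
from datetime import datetime, timedelta


def fuzzy_bounds(value, min_similarity):
    """Return the (lower, upper) endpoints of the range around value."""
    return value - min_similarity, value + min_similarity


# Numeric field: a value of 12 with min_similarity 2.
print(fuzzy_bounds(12, 2))

# Date field: "1d" is modelled here as a one-day timedelta.
created = datetime(2010, 2, 5, 12, 5, 7)
print(fuzzy_bounds(created, timedelta(days=1)))
```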
In the mapping, numeric and date types allow a fuzzy_factor value to be configured (defaults to 1). The fuzzy value is multiplied by it when used in a query_string type query. For example, for dates, a fuzzy factor of “1d” results in multiplying whatever fuzzy value is provided in min_similarity by it. Note that this is explicitly supported because the query_string query only allows similarity values between 0.0 and 1.0.

--Has Child Query--

The has_child query simply wraps a has_child filter in a constant_score query. It has the same syntax as the has_child filter:
{
    "has_child" : {
        "type" : "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}    
Scope

A _scope can be defined on the filter, which allows facets to run on the same scope name against the child documents. For example:
{
    "has_child" : {
        "_scope" : "my_scope",
        "type" : "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}    
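Memory Considerations
With the current implementation, all _id values are loaded into memory (heap) to support fast lookups, so make sure there is enough memory for them.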

--Match All Query--

A query that matches all documents. Maps to Lucene MatchAllDocsQuery.
{
    "match_all" : { }
}
Which can also have boost associated with it:

{
    "match_all" : { "boost" : 1.2 }
}
Index Time Boost

When indexing, a boost value can either be associated on the document level, or per field. The match all query does not take boosting into account by default. In order to take boosting into account, the norms_field needs to be provided in order to explicitly specify which field the boosting will be done on (Note, this will result in slower execution time). For example:
{
    "match_all" : { "norms_field" : "my_field" }
}

--More Like This Query--

The more like this query finds documents that are “like” the provided text by running it against one or more fields.
{
    "more_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like_text" : "text like this one",
        "min_term_freq" : 1,
        "max_query_terms" : 12
    }
}
more_like_this can be shortened to mlt.

The more_like_this top level parameters include:

Parameter Description
fields A list of the fields to run the more like this query against. Defaults to the _all field.
like_text The text to find documents like it, required.
percent_terms_to_match The percentage of terms to match on (float value). Defaults to 0.3 (30 percent).
min_term_freq The frequency below which terms will be ignored in the source doc. The default frequency is 2.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
stop_words An array of stop words. Any word in this set is considered “uninteresting” and ignored. Even if your Analyzer allows stopwords, you might want to tell the MoreLikeThis code to ignore them, as for the purposes of document similarity it seems reasonable to assume that “a stop word is never interesting”.
min_doc_freq The frequency at which words will be ignored which do not occur in at least this many docs. Defaults to 5.
max_doc_freq The maximum frequency in which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded.
min_word_len The minimum word length below which words will be ignored. Defaults to 0.
max_word_len The maximum word length above which words will be ignored. Defaults to unbounded (0).
boost_terms Sets the boost factor to use when boosting terms. Defaults to 1.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

--More Like This Field Query--

The more_like_this_field query is the same as the more_like_this query, except that it runs against a single field. It provides a nicer query DSL than the generic more_like_this query, and supports typed field queries (it automatically wraps typed fields with a type filter to match only on the specific type).
{
    "more_like_this_field" : {
        "name.first" : {
            "like_text" : "text like this one",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}
more_like_this_field can be shortened to mlt_field.

The more_like_this_field top level parameters include:

Parameter Description
like_text The text to find documents like it, required.
percent_terms_to_match The percentage of terms to match on (float value). Defaults to 0.3 (30 percent).
min_term_freq The frequency below which terms will be ignored in the source doc. The default frequency is 2.
max_query_terms The maximum number of query terms that will be included in any generated query. Defaults to 25.
stop_words An array of stop words. Any word in this set is considered “uninteresting” and ignored. Even if your Analyzer allows stopwords, you might want to tell the MoreLikeThis code to ignore them, as for the purposes of document similarity it seems reasonable to assume that “a stop word is never interesting”.
min_doc_freq The frequency at which words will be ignored which do not occur in at least this many docs. Defaults to 5.
max_doc_freq The maximum frequency in which words may still appear. Words that appear in more than this many docs will be ignored. Defaults to unbounded.
min_word_len The minimum word length below which words will be ignored. Defaults to 0.
max_word_len The maximum word length above which words will be ignored. Defaults to unbounded (0).
boost_terms Sets the boost factor to use when boosting terms. Defaults to 1.
boost Sets the boost value of the query. Defaults to 1.0.
analyzer The analyzer that will be used to analyze the text. Defaults to the analyzer associated with the field.

--Prefix Query--

Matches documents that have fields containing terms with a specified prefix (not analyzed). The prefix query maps to Lucene PrefixQuery. The following matches documents where the user field contains a term that starts with ki:
{
    "prefix" : { "user" : "ki" }
}
A boost can also be associated with the query:
{
    "prefix" : { "user" :  { "value" : "ki", "boost" : 2.0 } }
}
Or :
{
    "prefix" : { "user" :  { "prefix" : "ki", "boost" : 2.0 } }
}
This multi term query allows control over how it gets rewritten via the rewrite parameter.

--Query String Query--

A query that uses a query parser in order to parse its content. Here is an example:
{
    "query_string" : {
        "default_field" : "content",
        "query" : "this AND that OR thus"
    }
}
The query_string top level parameters include:

Parameter Description
query The actual query to be parsed.
default_field The default field for query terms if no prefix field is specified. Defaults to the _all field.
default_operator The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR.
analyzer The analyzer name used to analyze the query string.
allow_leading_wildcard When set, * or ? are allowed as the first character. Defaults to true.
lowercase_expanded_terms Whether terms of wildcard, prefix, fuzzy, and range queries are to be automatically lower-cased or not (since they are not analyzed). Defaults to true.
enable_position_increments Set to true to enable position increments in result queries. Defaults to true.
fuzzy_prefix_length Set the prefix length for fuzzy queries. Default is 0.
fuzzy_min_sim Set the minimum similarity for fuzzy queries. Defaults to 0.5.
phrase_slop Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is 0.
boost Sets the boost value of the query. Defaults to 1.0.
analyze_wildcard By default, wildcards terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze those as well.
auto_generate_phrase_queries Defaults to false.
minimum_should_match A percent value (for example 20%) controlling how many “should” clauses in the resulting boolean query should match.
When a multi term query is being generated, one can control how it gets rewritten using the rewrite parameter.

Multi Field
The query_string query can also run against multiple fields. The idea of running the query_string query against multiple fields is by internally creating several queries for the same query string, each with default_field that match the fields provided. Since several queries are generated, combining them can be automatically done either using a dis_max query or a simple bool query. For example (the name is boosted by 5 using ^5 notation):
{
    "query_string" : {
        "fields" : ["content", "name^5"],
        "query" : "this AND that OR thus",
        "use_dis_max" : true
    }
}
When running the query_string query against multiple fields, the following additional parameters are allowed:

Parameter Description
use_dis_max Should the queries be combined using dis_max (set it to true), or a bool query (set it to false). Defaults to true.
tie_breaker When using dis_max, the disjunction max tie breaker. Defaults to 0.
The fields parameter can also include pattern based field names, which are automatically expanded to the relevant fields (dynamically introduced fields included). For example:
{
    "query_string" : {
        "fields" : ["content", "name.*^5"],
        "query" : "this AND that OR thus",
        "use_dis_max" : true
    }
}
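To show how entries such as name^5 and name.*^5 can be read, the Python sketch below splits each entry into a field pattern and a boost, then expands the pattern against a list of known field names. This is only an approximation for illustration: the list of known fields is invented, the helper names are hypothetical, and fnmatchcase accepts more pattern syntax than a plain * wildcard.

```python
from fnmatch import fnmatchcase


def parse_field_spec(spec):
    """Split "name^5" into ("name", 5.0); the boost defaults to 1.0."""
    name, caret, boost = spec.partition("^")
    return name, float(boost) if caret else 1.0


def expand_fields(specs, known_fields):
    """Map every known field matching a spec pattern to its boost."""
    expanded = {}
    for spec in specs:
        pattern, boost = parse_field_spec(spec)
        for field in known_fields:
            if fnmatchcase(field, pattern):
                expanded[field] = boost
    return expanded


# Hypothetical mapping with two sub-fields under "name".
known = ["content", "name.first", "name.last", "age"]
print(expand_fields(["content", "name.*^5"], known))
```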
Syntax Extension
There are several syntax extensions to the Lucene query language.

missing / exists

The _exists_ and _missing_ syntax selects docs based on whether a field exists in them (has a value) or is missing. The syntax is _exists_:field1 and _missing_:field, and it can be used anywhere a query string is used.

--Range Query--

Matches documents with fields that have terms within a certain range. The type of the Lucene query depends on the field type, for string fields, the TermRangeQuery, while for number/date fields, the query is a NumericRangeQuery. The following example returns all documents where age is between 10 and 20:
{
    "range" : {
        "age" : { 
            "from" : 10, 
            "to" : 20, 
            "include_lower" : true, 
            "include_upper": false, 
            "boost" : 2.0
        }
    }
}
The range query top level parameters include:

Name Description
from The lower bound. Defaults to start from the first.
to The upper bound. Defaults to unbounded.
include_lower Should the first from (if set) be inclusive or not. Defaults to true.
include_upper Should the last to (if set) be inclusive or not. Defaults to true.
gt Same as setting from to the value, and include_lower to false.
gte Same as setting from to the value, and include_lower to true.
lt Same as setting to to the value, and include_upper to false.
lte Same as setting to to the value, and include_upper to true.
boost Sets the boost value of the query. Defaults to 1.0.
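The equivalences in the table above (gt, gte, lt, and lte as shorthands for from/to plus the include flags) can be written out as a small Python sketch. The function name normalize_range is hypothetical; the defaults for include_lower and include_upper follow the table.

```python
# Shorthand -> (bound it sets, flag it sets, whether the bound is inclusive)
SHORTHANDS = {
    "gt": ("from", "include_lower", False),
    "gte": ("from", "include_lower", True),
    "lt": ("to", "include_upper", False),
    "lte": ("to", "include_upper", True),
}


def normalize_range(params):
    """Rewrite gt/gte/lt/lte into from/to with explicit include flags."""
    long_form = ("from", "to", "include_lower", "include_upper", "boost")
    result = {key: params[key] for key in long_form if key in params}
    for shorthand, (bound, flag, inclusive) in SHORTHANDS.items():
        if shorthand in params:
            result[bound] = params[shorthand]
            result[flag] = inclusive
    # Both bounds are inclusive unless stated otherwise.
    result.setdefault("include_lower", True)
    result.setdefault("include_upper", True)
    return result


# Same range as the example above: 10 <= age < 20.
print(normalize_range({"gte": 10, "lt": 20}))
```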

--Span First Query--

Matches spans near the beginning of a field. The span first query maps to Lucene SpanFirstQuery. Here is an example:
{
    "span_first" : {
        "match" : {
            "span_term" : { "user" : "kimchy" }
        },
        "end" : 3
    }
}    
The match clause can be any other span type query. The end controls the maximum end position permitted in a match.

--Span Near Query--

Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order. The span near query maps to Lucene SpanNearQuery. Here is an example:
{
    "span_near" : {
        "clauses" : [
            { "span_term" : { "field" : "value1" } },
            { "span_term" : { "field" : "value2" } },
            { "span_term" : { "field" : "value3" } }
        ],
        "slop" : 12,
        "in_order" : false,
        "collect_payloads" : false
    }
}
The clauses element is a list of one or more other span type queries and the slop controls the maximum number of intervening unmatched positions permitted.

--Span Not Query--

Removes matches which overlap with another span query. The span not query maps to Lucene SpanNotQuery. Here is an example:
{
    "span_not" : {
        "include" : {
            "span_term" : { "field1" : "value1" }
        },
        "exclude" : {
            "span_term" : { "field2" : "value2" }
        }
    }
}
The include and exclude clauses can be any span type query. The include clause is the span query whose matches are filtered, and the exclude clause is the span query whose matches must not overlap those returned.

--Span or Query--

Matches the union of its span clauses. The span or query maps to Lucene SpanOrQuery. Here is an example:
{
    "span_or" : {
        "clauses" : [
            { "span_term" : { "field" : "value1" } },
            { "span_term" : { "field" : "value2" } },
            { "span_term" : { "field" : "value3" } }
        ]
    }
}
The clauses element is a list of one or more other span type queries.

--Span term Query--

Matches spans containing a term. The span term query maps to Lucene SpanTermQuery. Here is an example:
{
    "span_term" : { "user" : "kimchy" }
}    
A boost can also be associated with the query:
{
    "span_term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
}    
Or :
{
    "span_term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } }
}

--Top Children Query--

The top_children query runs the child query with an estimated hits size, and out of the hit docs, aggregates it into parent docs. If there aren’t enough parent docs matching the requested from/size search request, then it is run again with a wider (more hits) search.

The top_children also provide scoring capabilities, with the ability to specify max, sum or avg as the score type.

One downside of using the top_children is that if there are more child docs matching the required hits when executing the child query, then the total_hits result of the search response will be incorrect.

The number of hits asked for in the first child query run is controlled by the factor parameter (defaults to 5). For example, when asking for 10 docs with from 0, the child query will execute with 50 hits expected. If not enough parents are found (in our example, 10) and there are still more child docs to query, the number of search hits is expanded by multiplying it by the incremental_factor (defaults to 2).
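The expansion can be sketched in Python as a sequence of child hit sizes. The text only gives the case of 10 docs with from 0, so computing the first size as (from + size) * factor is an assumption, and the max_rounds cut-off is a hypothetical parameter added to keep the sketch finite.

```python
def child_hit_sizes(size, from_=0, factor=5, incremental_factor=2,
                    max_rounds=3):
    """List the hit sizes requested from the child query, round by round.

    Assumption: the first round asks for (from_ + size) * factor hits.
    """
    hits = (from_ + size) * factor
    sizes = []
    for _ in range(max_rounds):
        sizes.append(hits)
        # Not enough parents found: widen the child search.
        hits *= incremental_factor
    return sizes


# 10 docs from offset 0 with the default factors.
print(child_hit_sizes(10))
```

In practice the loop would stop as soon as enough parent docs are found or the child docs are exhausted; the fixed number of rounds here only shows how the sizes grow.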

The required parameters are the query and type (the child type to execute the query on). Here is an example with all different parameters, including the default values:
{
    "top_children" : {
        "type": "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        },
        "score" : "max",
        "factor" : 5,
        "incremental_factor" : 2
    }
}
Scope
A _scope can be defined on the query, which allows facets to run on the same scope name against the child documents. For example:
{
    "top_children" : {
        "_scope" : "my_scope",
        "type": "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}
Memory Considerations
With the current implementation, all _id values are loaded into memory (heap) to support fast lookups, so make sure there is enough memory for them.

--Wildcard Query--

Matches documents that have fields matching a wildcard expression (not analyzed). Supported wildcards are *, which matches any character sequence (including the empty one), and ?, which matches any single character. Note this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow wildcard queries, a wildcard term should not start with one of the wildcards * or ?. The wildcard query maps to Lucene WildcardQuery.
{
    "wildcard" : { "user" : "ki*y" }
}
A boost can also be associated with the query:
{
    "wildcard" : { "user" : { "value" : "ki*y", "boost" : 2.0 } }
}    
Or :
{
    "wildcard" : { "user" : { "wildcard" : "ki*y", "boost" : 2.0 } }
}    
This multi term query allows control over how it gets rewritten via the rewrite parameter.
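The two wildcards can be modelled in Python by translating the expression into a regular expression: * becomes "any sequence, including the empty one" and ? becomes "exactly one character". This sketch only illustrates the matching rules against a single term; it says nothing about how elasticsearch or Lucene iterate over indexed terms, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard expression into a compiled regular expression."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")   # any character sequence, possibly empty
        elif char == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(term, pattern):
    """Return True if the whole term matches the wildcard expression."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


def has_leading_wildcard(pattern):
    """Flag the slow case: an expression starting with * or ?."""
    return pattern[:1] in ("*", "?")


print(wildcard_matches("kimchy", "ki*y"))
print(wildcard_matches("kimchi", "ki*y"))
print(has_leading_wildcard("*chy"))
```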

--Nested Query--

The nested query allows querying nested objects / docs (see nested mapping). The query is executed against the nested objects / docs as if they were indexed as separate docs (they are, internally), and results in the root parent doc (or parent nested mapping). Here is a sample mapping we will work with:
{
    "type1" : {
        "properties" : {
            "obj1" : {
                "type" : "nested"
            }
        }
    }
}
And here is a sample nested query usage:
{
    "nested" : {
        "path" : "obj1",
        "score_mode" : "avg",
        "query" : {
            "bool" : {
                "must" : [
                    {
                        "text" : {"obj1.name" : "blue"}
                    },
                    {
                        "range" : {"obj1.count" : {"gt" : 5}}
                    }
                ]
            }
        }
    }
}
The query path points to the nested object path, and the query (or filter) includes the query that will run on the nested docs matching the direct path, and joining with the root parent docs.

The score_mode controls how the matching inner children affect the scoring of the parent. It defaults to avg, but can also be total, max, or none.

Multi level nesting is automatically supported, and detected, resulting in an inner nested query to automatically match the relevant nesting level (and not root) if it exists within another nested query.

--Custom Filters Score Query--

A custom_filters_score query executes a query and, if a hit matches a provided filter (filters are checked in order), uses either the boost or the script associated with that filter to compute the score. Here is an example:
{
    "custom_filters_score" : {
        "query" : {
            "match_all" : {}
        },
        "filters" : [
            {
                "filter" : { "range" : { "age" : {"from" : 0, "to" : 10} } },
                "boost" : "3"
            },
            {
                "filter" : { "range" : { "age" : {"from" : 10, "to" : 20} } },
                "boost" : "2"
            }
        ]
    }
}
This can considerably simplify and increase performance for parameterized based scoring since filters are easily cached for faster performance, and boosting / script is considerably simpler.

Score Mode

A score_mode can be defined to control how multiple matching filters control the score. By default, it is set to first which means the first matching filter will control the score of the result. It can also be set to max/total/avg which will aggregate the result from all matching filters based on the aggregation type.
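The score modes can be sketched in Python as an aggregation over the boosts of the filters that matched a hit. The two age ranges and their boosts come from the example above, with both range bounds treated as inclusive (an assumption). How the aggregated value is then combined with the query score, and the value of 1.0 returned when no filter matches, are also assumptions of this sketch; the function names are hypothetical.

```python
# (low, high, boost) for each filter, in the order they are declared.
FILTERS = [(0, 10, 3.0), (10, 20, 2.0)]


def matching_boosts(age, filters=FILTERS):
    """Collect the boosts of all filters whose range contains age."""
    return [boost for low, high, boost in filters if low <= age <= high]


def combine_boosts(boosts, score_mode="first"):
    """Aggregate the boosts of the matching filters by score_mode."""
    if not boosts:
        return 1.0  # assumption: no matching filter leaves the score as is
    if score_mode == "first":
        return boosts[0]
    if score_mode == "max":
        return max(boosts)
    if score_mode == "total":
        return sum(boosts)
    if score_mode == "avg":
        return sum(boosts) / len(boosts)
    raise ValueError("unknown score_mode: " + score_mode)


# An age of 10 falls into both ranges, so the mode makes a difference.
for mode in ("first", "max", "total", "avg"):
    print(mode, combine_boosts(matching_boosts(10), mode))
```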

Script

A script can be used instead of boost for more complex score calculations, with optional params and lang (on the same level as query and filters).

--Official site--

from: http://www.elasticsearch.cn/guide/reference/query-dsl/

other: http://elasticsearch.qiniudn.com/

posted @ 2015-01-08 17:50  Danny Chen