search_after 深度分页 Scrolling is not intended for real time user requests no longer recommend using the scroll API for deep pagination point in time PIT preserve the current index state
https://www.elastic.co/docs/reference/elasticsearch/rest-apis/paginate-search-results
By default, searches return the top 10 matching hits. To page through a larger set of results, you can use the search API's from and size parameters. The from parameter defines the number of hits to skip, defaulting to 0. The size parameter is the maximum number of hits to return. Together, these two parameters define a page of results.
GET /_search { "from": 5, "size": 20, "query": { "match": { "user.id": "kimchy" } } }
Avoid using from and size to page too deeply or request too many results at once. Search requests usually span multiple shards. Each shard must load its requested hits and the hits for any previous pages into memory. For deep pages or large sets of results, these operations can significantly increase memory and CPU usage, resulting in degraded performance or node failures.
By default, you cannot use from and size to page through more than 10,000 hits. This limit is a safeguard set by the index.max_result_window index setting. If you need to page through more than 10,000 hits, use the search_after parameter instead.
request
GET twitter/_search { "query": { "match": { "title": "elasticsearch" } }, "sort": [ {"date": "asc"}, {"tie_breaker_id": "asc"} ] }
response
{ "took" : 17, "timed_out" : false, "_shards" : ..., "hits" : { "total" : ..., "max_score" : null, "hits" : [ ... { "_index" : "twitter", "_id" : "654322", "_score" : null, "_source" : ..., "sort" : [ 1463538855, "654322" ] }, { "_index" : "twitter", "_id" : "654323", "_score" : null, "_source" : ..., "sort" : [ 1463538857, "654323" ] } ] } }
search after
GET twitter/_search { "query": { "match": { "title": "elasticsearch" } }, "search_after": [1463538857, "654323"], "sort": [ {"date": "asc"}, {"tie_breaker_id": "asc"} ] }
Repeat this process by updating the search_after array every time you retrieve a new page of results. If a refresh occurs between these requests, the order of your results may change, causing inconsistent results across pages. To prevent this, you can create a point in time (PIT) to preserve the current index state over your searches.
request
POST /my-index-000001/_pit?keep_alive=1m
{ "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", "_shards": ... }
request
GET /_search { "size": 10000, "query": { "match" : { "user.id" : "elkbee" } }, "pit": { "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", "keep_alive": "1m" }, "sort": [ {"@timestamp": {"order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type" : "date_nanos" }} ] }
We no longer recommend using the scroll API for deep pagination. If you need to preserve the index state while paging through more than 10,000 hits, use the search_after parameter with a point in time (PIT).
While a search request returns a single page of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.
Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one data stream or index into a new data stream or index with a different configuration.

浙公网安备 33010602011771号