Track total hits up to 10,000 by default #37466
This commit changes the default for the `track_total_hits` option of the search request to `10,000`. By default, search requests will now track the total hit count accurately up to `10,000` documents. Requests that match more than this value will set `"total.relation"` to `"gte"` (i.e. greater than or equal to) and `"total.value"` to `10,000` in the search response. Scroll queries are not impacted; they will continue to count the total hits accurately. The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request. I chose `10,000` as the default because that is also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate. Closes elastic#33028
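To make the response shape concrete, here is a minimal sketch of how a capped count could be reported. It is not the code from this PR: the class `TotalHitsSketch`, the `Total` holder and the `report` method are hypothetical names used for illustration only, and the `"eq"` relation for exact counts is assumed from the search response format rather than stated in the description above.

```java
// Illustrative sketch only; not the Elasticsearch implementation.
class TotalHitsSketch {
    // Default threshold described in this PR.
    static final int DEFAULT_TRACK_TOTAL_HITS_UP_TO = 10_000;

    // Hypothetical stand-in for the "total" object of a search response.
    static final class Total {
        final long value;
        final String relation;

        Total(long value, String relation) {
            this.value = value;
            this.relation = relation;
        }

        @Override
        public String toString() {
            return "{\"value\": " + value + ", \"relation\": \"" + relation + "\"}";
        }
    }

    // Report an exact count up to the threshold, and a lower bound beyond it.
    static Total report(long matchingDocs, int trackTotalHitsUpTo) {
        if (matchingDocs > trackTotalHitsUpTo) {
            // More matches than we track: the value is only a lower bound.
            return new Total(trackTotalHitsUpTo, "gte");
        }
        // Within the threshold: the count is exact.
        return new Total(matchingDocs, "eq");
    }

    public static void main(String[] args) {
        System.out.println(report(1_234, DEFAULT_TRACK_TOTAL_HITS_UP_TO));
        System.out.println(report(250_000, DEFAULT_TRACK_TOTAL_HITS_UP_TO));
    }
}
```

A client that needs to know whether the count is exact would check the relation before displaying the value, for example showing "10,000+" when the relation is `"gte"`.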
Pinging @elastic/es-search
…re an accurate count internally
jpountz left a comment
The change looks good. We will probably need to talk with other teams before merging this change so that they set track_total_hits=true and can deal with disabling exact counts later?
server/src/main/java/org/elasticsearch/common/io/stream/StreamInput.java
server/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java
server/src/main/java/org/elasticsearch/search/query/EarlyTerminatingCollector.java
run the gradle build tests 1
@elasticmachine run gradle build tests 1
@elasticmachine run elasticsearch-ci/2
        // no matter what the value of track_total_hits is
        return SearchContext.TRACK_TOTAL_HITS_ACCURATE;
    }
    return request.source() == null ? SearchContext.DEFAULT_TRACK_TOTAL_HITS_UP_TO : request.source().trackTotalHitsUpTo() == null ?
Is the `request.source() == null` check necessary on line 704? It seems redundant given the check on line 700.
P.S. I'm a curious engineer looking to learn more about Elasticsearch's internals. I'm reading the diffs in my spare time to learn how experts in the field build real-world distributed systems. Thanks for taking the time to answer my questions!
The earlier check is on `request.scroll`, which is different from `request.source`, so the `request.source() == null` check is necessary.
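A simplified sketch of the resolution order discussed in this thread may help show why both checks matter. The `Request` and `Source` classes below are hypothetical stand-ins, not the real Elasticsearch types. The value `Integer.MAX_VALUE` for `TRACK_TOTAL_HITS_ACCURATE` and the final fallback branch are assumptions, since the quoted diff line is cut off before the end of the expression.

```java
// Illustrative sketch only; the types here are simplified stand-ins.
class TrackTotalHitsResolutionSketch {
    // Assumed sentinel meaning "count every hit".
    static final int TRACK_TOTAL_HITS_ACCURATE = Integer.MAX_VALUE;
    static final int DEFAULT_TRACK_TOTAL_HITS_UP_TO = 10_000;

    // Hypothetical stand-in for the search source (request body).
    static final class Source {
        final Integer trackTotalHitsUpTo; // null when the option is not set

        Source(Integer trackTotalHitsUpTo) {
            this.trackTotalHitsUpTo = trackTotalHitsUpTo;
        }
    }

    // Hypothetical stand-in for the search request.
    static final class Request {
        final String scroll; // null when this is not a scroll request
        final Source source; // null when the request has no body

        Request(String scroll, Source source) {
            this.scroll = scroll;
            this.source = source;
        }
    }

    static int resolveTrackTotalHitsUpTo(Request request) {
        if (request.scroll != null) {
            // Scroll requests always count accurately,
            // no matter what the value of track_total_hits is.
            return TRACK_TOTAL_HITS_ACCURATE;
        }
        // A non-scroll request can still have no source, so this check is
        // independent of the scroll check above.
        if (request.source == null) {
            return DEFAULT_TRACK_TOTAL_HITS_UP_TO;
        }
        Integer requested = request.source.trackTotalHitsUpTo;
        return requested == null ? DEFAULT_TRACK_TOTAL_HITS_UP_TO : requested;
    }
}
```

In this sketch, a request with neither a scroll nor a source reaches the second check; dereferencing the source without that check would throw a `NullPointerException`.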