Add the ability to set the number of hits to track accurately #36357
jimczi merged 14 commits into elastic:master
Pinging @elastic/es-search
jpountz left a comment
I left some comments but like it in general.
maybe add "generally" since match_all and term queries without deletions can still get their hit count computed for free
maybe be explicit and say that total.relation will always be equal to eq when track_total_hits is set to true?
this example might be a bit confusing since the hit count of match_all can be computed easily. Use eg. a range query instead?
maybe rephrase to "there are 1000 hits or more" since the lower bound is inclusive
maybe add a comment: Lucene can give us more information but it seems that we never want to return accurate hit counts if the count is greater than the upTo so that users do not get used to accurate hit counts when they didn't actually ask for accuracy?
is the message right? "not accurate" happens when the parameter is an integer value rather than true?
The message is completely off, thanks. I pushed a fix
why accurate? we don't need hit counts since we already have it via numHits?
Yes the logic is wrong here, I reverted it to disable the tracking if the scroll context already contains the total hits:
331d8fe
63c233e to c840019
In Lucene 8, searches can skip non-competitive hits if the total hit count is not requested. It is also possible to track the number of hits up to a certain threshold. This is a trade-off that speeds up searches while still providing a lower bound on the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set to an integer, this option computes a lower bound of the total hits while preserving the ability to skip non-competitive hits once enough matches have been collected. Relates elastic#33028
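To make the two forms of the option concrete, here is a minimal client-side sketch in Go. It is not code from this PR: the `searchBody` helper, the `Total` type, and the `price` field are hypothetical names chosen for illustration, and the response shape (a `value` plus a `relation` of `eq` or `gte`) is assumed from the review discussion of `total.relation`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Total is a hypothetical mirror of the `total` object in a search
// response: a hit count plus a relation ("eq" or "gte").
type Total struct {
	Value    int64  `json:"value"`
	Relation string `json:"relation"`
}

// searchBody builds a request body. trackTotalHits may be a bool
// (track everything or nothing) or an int (track up to a threshold).
func searchBody(trackTotalHits interface{}) ([]byte, error) {
	body := map[string]interface{}{
		"track_total_hits": trackTotalHits,
		"query": map[string]interface{}{
			"range": map[string]interface{}{
				// "price" is a made-up field name for this example.
				"price": map[string]interface{}{"gte": 10},
			},
		},
	}
	return json.Marshal(body)
}

// Describe turns a total into a sentence, treating "gte" as an
// inclusive lower bound.
func (t Total) Describe() string {
	switch t.Relation {
	case "eq":
		return fmt.Sprintf("exactly %d hits", t.Value)
	case "gte":
		return fmt.Sprintf("%d hits or more", t.Value)
	default:
		return fmt.Sprintf("%d hits (unknown relation %q)", t.Value, t.Relation)
	}
}

func main() {
	body, err := searchBody(1000)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	fmt.Println(Total{Value: 1000, Relation: "gte"}.Describe())
}
```

Passing `true` or `false` to `searchBody` instead of `1000` produces the boolean form of the same option.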
The merge sort of sorted segments can produce an invalid sort if the sort field is an Integer/Long that uses reverse order and contains values equal to Integer/Long#MIN_VALUE. These values are always sorted first during a merge (instead of last because of the reverse order) due to this bug. Indices affected by the bug can be detected by running the CheckIndex command on a distribution that contains the fix (7.6+).
c840019 to 6287563
run gradle build tests 1
run gradle build tests 2
of the total number of hits that match the query.
(see <<index-modules-index-sorting,_Index Sorting_>> for more details).
Defaults to true.
It also accepts an integer which in this case represents the number of
@jimczi, this is problematic in strongly-typed languages, since the track_total_hits parameter is defined as a boolean, and e.g. the Go test suite fails with cannot use 4 (type int) as type bool in field value.
I understand the motivation here, but we should at least revisit the parameter definitions, and support something like "type" : ["boolean","number"], so the code generators can make decisions here.