Automatically early terminate search query based on index sorting#24864
Automatically early terminate search query based on index sorting#24864jimczi merged 7 commits intoelastic:masterfrom
Conversation
jpountz
left a comment
There was a problem hiding this comment.
It looks good overall! Something that I'm wondering is what we should do with total_hits in the response when track_total_hits is false. Should we return null like we do for the score when track_score is false?
There was a problem hiding this comment.
could it be a regular boolean?
There was a problem hiding this comment.
It can because we don't use the parallelize execution in IndexSearcher.
I'll fix
There was a problem hiding this comment.
Do we really need the setter on SearchContext or could it only be on DefaultSearchContext?
There was a problem hiding this comment.
It is needed for the tests that use TestSearchContext
There was a problem hiding this comment.
let's assert that the collector is null before assigning it again?
There was a problem hiding this comment.
maybe unwrap BoostQuery too?
There was a problem hiding this comment.
Not related to this PR but this will test the index sort when replicas are on different nodes.
|
@jpountz I pushed some changes. |
I'd much prefer returning |
|
Fair enough, can we use |
sure |
|
I pushed another change that forces total hits to -1 in the response when |
This is a spin off for elastic#24398. This commit refactors the query phase in order to be able to automatically detect queries that can be early terminated. If the index sort matches the query sort, the top docs collection is early terminated on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector. This change also adds a new parameter to the search request called `track_total_hits`. It indicates if the total number of hits that match the query should be tracked. If false, queries sorted by the index sort will not try to compute this information and will limit the collection to the first N documents per segment. Aggregations are not impacted and will continue to see every document even when the index sort matches the query sort and `track_total_hits` is false. Relates elastic#6720
Set `terminated_early` to true if the top docs collector has actually terminated early (even if other collectors had to continue the collection). Adapt tests and docs.
6bd3df2 to
416530c
Compare
jpountz
left a comment
There was a problem hiding this comment.
Looks good, I left minor questions/comments. Thanks for working on this!
| } | ||
| context.trackScores(source.trackScores()); | ||
| if (source.trackTotalHits() == false && context.scrollContext() != null) { | ||
| throw new SearchContextException(context, "disabling `track_total_hits` is not allowed in a scroll context"); |
There was a problem hiding this comment.
I think we have been using brackets in general, ie. [track_total_hits]
| hits[i] = SearchHitTests.createTestItem(false); // creating random innerHits could create loops | ||
| } | ||
| long totalHits = randomLong(); | ||
| long totalHits = frequently() ? Math.abs(randomLong()) : -1; |
There was a problem hiding this comment.
Math.abs(randomLong()) can be negative if the random number generator returns MIN_VALUE, maybe use TestUtil.nextLong?
There was a problem hiding this comment.
or clear the sign bit with l & 0x7FFFFFFFFFFFFFFFL?
| // TESTRESPONSE[s/"_shards": \.\.\./"_shards": "$body._shards",/] | ||
| // TESTRESPONSE[s/"total": \.\.\./"total": "$body.hits.total",/] | ||
| // TESTRESPONSE[s/"took": 20,/"took": "$body.took",/] | ||
| // TESTRESPONSE[s/"terminated_early": true,//] |
There was a problem hiding this comment.
why de we need exceptions for total and terminated_early?
There was a problem hiding this comment.
Not needed for total, thanks.
The index is empty so terminated_early is false in this example.
|
Thanks @jpountz ! |
The QueryCollectorContext abstraction was introduced by #24864 based on the requirement that the top docs collector creation needed to be delayed until after all the other collectors had been created. At the same time, collectors get wrapped depending on the search features enabled by the request, but the top score / total hit count collector is the root collector where the wrapping starts, which is why its corresponding context gets added at position 0 in the list of collector contexts. Requirements have changed since #27666 , which means that we can go back to a simpler way of creating collectors and wrapping them. We no longer need a QueryCollectorContext abstraction, and we can instead create collectors straight-away, and wrap them as needed. This is much easier to follow compared to the very generic create(Collector) method that the context exposes. TopDocsCollectorContext adds some value in that it incorporates all the logic around creating the top docs collector, yet it can be further simplified as well by making the postProcess method more specific.
This is a spin off for #24398.
This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated
to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called
track_total_hits.It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and
track_total_hitsis false.Relates #6720