Optimize sort on numeric long and date fields. #49732
Merged
mayya-sharipova merged 1 commit into elastic:7.x on Nov 29, 2019
This rewrites a long sort as a `DistanceFeatureQuery`, which can efficiently skip non-competitive blocks and segments of documents. Depending on the dataset, the speedup can be 2 to 10 times.

The optimization can be disabled by setting the system property `es.search.rewrite_sort` to `false`. It is skipped when 50% or more of the data in an index has the same value.

The optimization is done through:

1. Rewriting the sort as a `DistanceFeatureQuery`, which can efficiently skip non-competitive blocks and segments of documents.
2. Sorting segments according to the primary numeric sort field (elastic#44021, "Sort leaves on search according to the primary numeric sort field"). This allows non-competitive segments to be skipped.
3. Using a collector manager. When we optimize the sort, we sort segments by their min/max value. Because a collector expects to receive segments in order, we cannot use a single collector for sorted segments. We use a collector manager instead, which creates a dedicated collector for every segment.
4. Using Lucene's shared `TopFieldCollector` manager. This collector manager is able to exchange the minimum competitive score between collectors, which allows us to efficiently skip whole segments that don't contain competitive scores.
5. When an index is force merged to a single segment, interleaving old and new segments (elastic#48533, "Add a new merge policy that interleaves old and new segments on force merge") allows for this optimization as well, as blocks with non-competitive docs can be skipped.

Backport for elastic#48804

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
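To make the segment-level part of the description concrete, here is a minimal, self-contained sketch of the idea behind steps 2 to 4: order segments by their minimum sort value, keep one shared bound for the current top hits, and stop as soon as the next segment cannot contain a competitive value. It is a toy model in plain Java, not the code from this PR. `SortedSegmentSketch`, `Segment`, `Result`, `rewriteEnabled` and `topKAscending` are hypothetical names, a single shared heap stands in for the per-segment collectors that exchange a bound, and ties are treated as non-competitive for simplicity. Only the property name `es.search.rewrite_sort` comes from the description; how Elasticsearch actually reads it is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of skipping non-competitive segments for an ascending long sort. */
final class SortedSegmentSketch {

    /** Hypothetical stand-in for an index segment: just its sort values plus min/max. */
    static final class Segment {
        final long[] values;
        final long min;
        final long max;

        Segment(long... values) {
            this.values = values.clone();
            this.min = Arrays.stream(values).min().getAsLong();
            this.max = Arrays.stream(values).max().getAsLong();
        }
    }

    /** Top-k values in ascending order, plus how many segments were actually scanned. */
    static final class Result {
        final List<Long> top;
        final int visitedSegments;

        Result(List<Long> top, int visitedSegments) {
            this.top = top;
            this.visitedSegments = visitedSegments;
        }
    }

    /** Assumed reading of the switch: enabled unless es.search.rewrite_sort is "false". */
    static boolean rewriteEnabled() {
        return Boolean.parseBoolean(System.getProperty("es.search.rewrite_sort", "true"));
    }

    static Result topKAscending(List<Segment> segments, int k) {
        // Visit segments in order of their minimum value, most promising first.
        List<Segment> ordered = new ArrayList<>(segments);
        ordered.sort(Comparator.comparingLong((Segment s) -> s.min));

        // Max-heap of the k smallest values seen so far; once full, its head is the
        // shared bound that a value must beat to be competitive.
        PriorityQueue<Long> heap = new PriorityQueue<>(Comparator.reverseOrder());
        int visited = 0;
        for (Segment segment : ordered) {
            if (heap.size() == k && segment.min >= heap.peek()) {
                // Segments are ordered by min, so every remaining one is non-competitive too.
                break;
            }
            visited++;
            for (long value : segment.values) {
                if (heap.size() < k) {
                    heap.add(value);
                } else if (value < heap.peek()) {
                    heap.poll();
                    heap.add(value);
                }
            }
        }
        List<Long> top = new ArrayList<>(heap);
        top.sort(Comparator.naturalOrder());
        return new Result(top, visited);
    }
}
```

With four segments holding (100, 101, 102), (1, 2, 3), (50, 60) and (4, 5, 6) and `k = 3`, the sketch scans only the segment starting at 1, because the next segment's minimum (4) is already above the bound (3). Visiting the segments in their original order would have required scanning all four.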
Pinging @elastic/es-search (:Search/Search)