Don't use optimized merge for fields that will be pruned by the merge #144000
Merged
elasticsearchmachine merged 5 commits into elastic:main on Mar 11, 2026
We avoid iterating over doc values fields multiple times during merge in certain cases by re-using segment statistics. If _seq_no fields are being pruned by the merge, we can't use this shortcut, as we don't necessarily know how many documents will still contain values after pruning. This commit reworks the DocValuesProducer classes within RecoverySourcePruneMergePolicy to make them visible to the MergeState, and updates DocValuesConsumerUtil to skip merge optimizations for the _seq_no field if sequence number pruning is active.
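To make the reasoning above concrete, here is a minimal sketch of why per-segment statistics stop being trustworthy once values are pruned during the merge. Everything in it is hypothetical: `Segment`, `countFromStats`, `countAfterPruning`, and the retention rule are illustrative stand-ins, not Lucene or Elasticsearch classes.

```java
import java.util.List;
import java.util.function.LongPredicate;

/** Hypothetical, simplified model; none of these types are Lucene or Elasticsearch classes. */
final class PrunedMergeStatsSketch {

    /** One source segment: its _seq_no values plus the statistic recorded at write time. */
    static final class Segment {
        final long[] seqNos;
        final int docsWithField; // cached statistic, computed before any pruning

        Segment(long[] seqNos) {
            this.seqNos = seqNos;
            this.docsWithField = seqNos.length;
        }
    }

    /** Shortcut: sum the cached per-segment statistics without reading any values. */
    static int countFromStats(List<Segment> segments) {
        int total = 0;
        for (Segment segment : segments) {
            total += segment.docsWithField;
        }
        return total;
    }

    /** Slow path: iterate the values and count only those that survive pruning. */
    static int countAfterPruning(List<Segment> segments, LongPredicate retain) {
        int total = 0;
        for (Segment segment : segments) {
            for (long seqNo : segment.seqNos) {
                if (retain.test(seqNo)) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
            new Segment(new long[] {0, 1, 2}),
            new Segment(new long[] {3, 4})
        );
        LongPredicate retain = seqNo -> seqNo >= 3; // assumed retention rule, for illustration only

        System.out.println(countFromStats(segments));            // 5
        System.out.println(countAfterPruning(segments, retain)); // 2
    }
}
```

The two counts disagree as soon as the merge drops any value, which is why the shortcut has to be skipped for a field that is being pruned.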
Collaborator
Pinging @elastic/es-storage-engine (Team:StorageEngine)
tlrx approved these changes on Mar 11, 2026
Member tlrx left a comment:
LGTM - Nice simple change, I would expect it to be more complex
        });
    }

    public void testSeqNoPrunedAfterMergeWithTsdbCodecFails() throws Exception {
Member
nit: maybe remove the Fails suffix?
server/src/main/java/org/elasticsearch/index/engine/RecoverySourcePruneMergePolicy.java
…seqno/optimised-merge
fcofdez reviewed on Mar 11, 2026
Contributor fcofdez left a comment:
LGTM. Nice contained change 👍
tlrx added a commit to tlrx/elasticsearch that referenced this pull request on Mar 11, 2026
martijnvg reviewed on Mar 12, 2026
    if (docValuesProducer instanceof PruningMergePolicy.PruningDocValuesProducer pdv) {
        if (pdv.shouldPruneNumericDocValues(mergedFieldInfo.name)) {
            return UNSUPPORTED;
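The excerpt above is cut off by the review view. A self-contained sketch of the same kind of check might look like the following. The interfaces, the `MergeStats` enum, and `compatibleWithOptimizedMerge` are stand-ins invented for illustration; only the names `PruningDocValuesProducer`, `shouldPruneNumericDocValues`, and `UNSUPPORTED` come from the excerpt, and their real declarations may differ.

```java
import java.util.List;

/** Stand-in types only; the real classes live in Lucene and Elasticsearch and differ in detail. */
final class MergeEligibilitySketch {

    /** Hypothetical result type for the eligibility check. */
    enum MergeStats { SUPPORTED, UNSUPPORTED }

    /** Stand-in for a segment's doc values producer. */
    interface DocValuesProducer {}

    /** Assumed shape: a producer that can report which numeric fields it prunes. */
    interface PruningDocValuesProducer extends DocValuesProducer {
        boolean shouldPruneNumericDocValues(String field);
    }

    /** Returns UNSUPPORTED as soon as any source producer would prune the given field. */
    static MergeStats compatibleWithOptimizedMerge(Iterable<DocValuesProducer> producers, String fieldName) {
        for (DocValuesProducer producer : producers) {
            if (producer instanceof PruningDocValuesProducer pdv && pdv.shouldPruneNumericDocValues(fieldName)) {
                return MergeStats.UNSUPPORTED;
            }
        }
        return MergeStats.SUPPORTED;
    }

    public static void main(String[] args) {
        PruningDocValuesProducer pruning = "_seq_no"::equals; // prunes only _seq_no in this sketch
        DocValuesProducer plain = new DocValuesProducer() {};

        System.out.println(compatibleWithOptimizedMerge(List.of(plain, pruning), "_seq_no"));    // UNSUPPORTED
        System.out.println(compatibleWithOptimizedMerge(List.of(plain, pruning), "@timestamp")); // SUPPORTED
    }
}
```

The check only works if the pruning producer is visible as its own type at the call site, which matches the PR description's point about making the producer classes visible to the MergeState.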
tlrx added a commit that referenced this pull request on Mar 13, 2026
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request on Mar 23, 2026
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request on Mar 23, 2026