Skip to content

Don't use optimized merge for fields that will be pruned by the merge#144000

Merged
elasticsearchmachine merged 5 commits intoelastic:mainfrom
romseygeek:seqno/optimised-merge
Mar 11, 2026
Merged

Don't use optimized merge for fields that will be pruned by the merge#144000
elasticsearchmachine merged 5 commits intoelastic:mainfrom
romseygeek:seqno/optimised-merge

Conversation

@romseygeek
Copy link
Copy Markdown
Contributor

We avoid iterating over doc values fields multiple times during merge in
certain cases by re-using segment statistics. If _seq_no fields are being
pruned by the merge, we can't use this shortcut, as we don't necessarily
know how many documents will still contain values after pruning. This
commit reworks the DocValuesProducer classes within
RecoverySourcePruneMergePolicy to make them visible to the MergeState,
and updates DocValuesConsumerUtil to skip merge optimizations for the
_seq_no field if sequence number pruning is active.

We avoid iterating over doc values fields multiple times during merge
in certain cases by re-using segment statistics.  If _seq_no fields
are being pruned by the merge, we can't use this shortcut, as we
don't necessarily know how many documents will still contain values
after pruning.  This commit reworks the DocValuesProducer classes
within RecoverySourcePruneMergePolicy to make them visible to
the MergeState, and updates DocValuesConsumerUtil to skip merge
optimizations for the _seq_no field if sequence number pruning is
active.
@romseygeek romseygeek marked this pull request as ready for review March 11, 2026 12:11
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Copy Markdown
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Nice simple change, I would expect it to be more complex

});
}

public void testSeqNoPrunedAfterMergeWithTsdbCodecFails() throws Exception {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe remove the Fails suffix?

@romseygeek romseygeek added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Mar 11, 2026
@elasticsearchmachine elasticsearchmachine merged commit 6b105b2 into elastic:main Mar 11, 2026
36 checks passed
@romseygeek romseygeek deleted the seqno/optimised-merge branch March 11, 2026 13:57
Copy link
Copy Markdown
Contributor

@fcofdez fcofdez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice contained change 👍


if (docValuesProducer instanceof PruningMergePolicy.PruningDocValuesProducer pdv) {
if (pdv.shouldPruneNumericDocValues(mergedFieldInfo.name)) {
return UNSUPPORTED;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

tlrx added a commit that referenced this pull request Mar 13, 2026
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
…elastic#144000)

We avoid iterating over doc values fields multiple times during merge in
certain cases by re-using segment statistics.  If _seq_no fields are
being  pruned by the merge, we can't use this shortcut, as we don't
necessarily  know how many documents will still contain values after
pruning.  This  commit reworks the DocValuesProducer classes within 
RecoverySourcePruneMergePolicy to make them visible to the MergeState, 
and updates DocValuesConsumerUtil to skip merge optimizations for the 
_seq_no field if sequence number pruning is active.
michalborek pushed a commit to michalborek/elasticsearch that referenced this pull request Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >non-issue :StorageEngine/Codec Team:StorageEngine v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants