Support no-op tombstones documents in TSDB indices with synthetic ids #144935
Merged
tlrx merged 14 commits into elastic:main on Mar 30, 2026
Collaborator
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Collaborator
Pinging @elastic/es-distributed (Team:Distributed)
fcofdez approved these changes on Mar 27, 2026

    flush(backingIndex);
    }

    // Ensure all operations are replicated

Contributor
Isn't this ensured by the bulk request semantics?
Member (Author)
I guess Cursor hallucinated a bit; thanks for spotting this. I removed it in 664b739.

    final int nbGaps = randomIntBetween(1, 25);
    primaryShard.withEngine(engine -> {
        for (int i = 0; i < nbGaps; i++) {
            generateNewSeqNo(engine);

Contributor
TIL: neat way of generating gaps without needing to isolate replicas or something like that.
Member (Author)
Thanks Cursor! I didn't know about this, but there are other usages in integration tests.
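Some context on why sequence-number gaps matter for this PR: when a replica is promoted after a primary failure, the new primary fills any gaps in its sequence-number history with no-op operations, and those are one source of no-op tombstone documents in Lucene. The sketch below models only that bookkeeping in plain Java. The class and its methods (`indexOperation`, `allocateGap`, `fillGapsWithNoOps`) are hypothetical and are not Elasticsearch's engine API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical model of sequence-number gap filling; not Elasticsearch's engine code. */
final class SeqNoGapFiller {

    // Bit i is set once sequence number i has been processed (index, delete or no-op).
    private final BitSet processed = new BitSet();
    private int nextSeqNo = 0;

    /** Allocates the next sequence number and processes it, like a normal index operation. */
    int indexOperation() {
        int seqNo = nextSeqNo++;
        processed.set(seqNo);
        return seqNo;
    }

    /** Allocates the next sequence number without processing it, which leaves a gap. */
    int allocateGap() {
        return nextSeqNo++;
    }

    /**
     * Processes every allocated but unprocessed sequence number as a no-op and returns
     * them in ascending order. Each returned value stands for one no-op tombstone document.
     */
    List<Integer> fillGapsWithNoOps() {
        List<Integer> noOps = new ArrayList<>();
        for (int seqNo = processed.nextClearBit(0); seqNo < nextSeqNo; seqNo = processed.nextClearBit(seqNo + 1)) {
            processed.set(seqNo);
            noOps.add(seqNo);
        }
        return noOps;
    }
}
```

In the test under review, `generateNewSeqNo(engine)` appears to play the role of `allocateGap()` here: it consumes sequence numbers without indexing anything, so gaps exist without having to isolate replicas.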

     * Warning: This method can be slow because it potentially scans many documents in the segment.
     * </p>
     */
    int findFirstDocWithTsIdOrdinalEqualOrGreaterThan(int tsIdOrd) throws IOException {

Contributor
Nit: maybe we should assert that `tsIdOrd >= 0`?
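The nit can be illustrated with a simplified, hypothetical stand-in for the method: a linear scan over per-document `_tsid` ordinals, guarded by the suggested `tsIdOrd >= 0` precondition. Assumptions for the sketch: ordinals live in a plain `int[]`, `-1` marks a document with no `_tsid` value (as a no-op tombstone would have), and `NO_MORE_DOCS` mirrors the value of Lucene's `DocIdSetIterator.NO_MORE_DOCS` (`Integer.MAX_VALUE`). None of this is the actual Elasticsearch code.

```java
/** Hypothetical, simplified model of the scan; not the Elasticsearch implementation. */
final class TsIdOrdinalScan {

    /** Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS. */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Marks a document that has no _tsid doc value, such as a no-op tombstone. */
    static final int MISSING = -1;

    private TsIdOrdinalScan() {}

    /**
     * Returns the first doc id whose _tsid ordinal is >= tsIdOrd, or NO_MORE_DOCS.
     * Linear in the number of documents, hence the "can be slow" warning.
     */
    static int findFirstDocWithTsIdOrdinalEqualOrGreaterThan(int[] tsIdOrdByDoc, int tsIdOrd) {
        // Precondition suggested in review; only checked when assertions are enabled (-ea).
        assert tsIdOrd >= 0 : "tsIdOrd must be non-negative but was " + tsIdOrd;
        for (int doc = 0; doc < tsIdOrdByDoc.length; doc++) {
            int ord = tsIdOrdByDoc[doc];
            // Skip documents without a _tsid value rather than comparing the -1 marker.
            if (ord != MISSING && ord >= tsIdOrd) {
                return doc;
            }
        }
        return NO_MORE_DOCS;
    }
}
```

Without the precondition, a negative argument would silently match the first document that has any `_tsid` value, which would hide a bug in the caller rather than surface it.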
felixbarny pushed a commit to felixbarny/elasticsearch that referenced this pull request on Mar 30, 2026
tlrx added a commit to tlrx/elasticsearch that referenced this pull request on Mar 30, 2026
mamazzol pushed a commit to mamazzol/elasticsearch that referenced this pull request on Mar 30, 2026
tlrx added a commit that referenced this pull request on Mar 31, 2026
pmpailis pushed a commit that referenced this pull request on Mar 31, 2026
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request on Apr 1, 2026
No-op tombstone documents can be indexed into Lucene during the promotion of a replica after a primary failure, after restoring a snapshot, or during peer recovery, when the primary shard has no-op tombstone documents. Such documents have the `__soft_deletes` field set, so they are automatically filtered out from search hits and GET responses.

The `_tsid`, `@timestamp` and `_ts_routing_hash` doc value fields (which are used to compute the synthetic `_id` of documents) are populated for delete tombstone documents, so the fields exist in the Lucene index (the values are derived from the document id of the DELETE request).

For no-op tombstone documents it is different: a no-op operation has no document id, so the doc value fields cannot be deduced from one. Therefore no-op tombstone documents must be detected and filtered out by the TSDB synthetic id postings format.
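A toy model of why the filtering is needed: the synthetic `_id` is derived from `_tsid`, `@timestamp` and `_ts_routing_hash`, so a document lacking those values has no id to put into the terms. The sketch builds an in-memory map from synthetic id to doc ids and skips such documents. The `Doc` record, the string encoding of the id, and the class name are all hypothetical; the real postings format encodes ids differently and works on Lucene segments, not lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model; not the Elasticsearch TSDB synthetic id postings format. */
final class SyntheticIdIndex {

    /**
     * Doc values of one document; a null field means the value is absent.
     * Regular documents and delete tombstones carry all three values, no-op tombstones carry none.
     */
    record Doc(int docId, String tsid, Long timestamp, Integer routingHash) {
        boolean hasSyntheticIdFields() {
            return tsid != null && timestamp != null && routingHash != null;
        }
    }

    private SyntheticIdIndex() {}

    /** Placeholder encoding of a synthetic _id; the real encoding is different. */
    static String syntheticId(Doc doc) {
        return doc.routingHash() + ":" + doc.tsid() + ":" + doc.timestamp();
    }

    /** Builds synthetic id -> doc ids (a tiny "postings" map), skipping documents without an id. */
    static Map<String, List<Integer>> build(List<Doc> docs) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (Doc doc : docs) {
            if (doc.hasSyntheticIdFields() == false) {
                // No-op tombstone: there is nothing to derive an _id from, so it gets no term.
                continue;
            }
            postings.computeIfAbsent(syntheticId(doc), id -> new ArrayList<>()).add(doc.docId());
        }
        return postings;
    }
}
```

Note that a delete tombstone shares its synthetic id with the document it deletes, which is why the values map to a list of doc ids rather than a single one.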
Also, the TSDB synthetic id custom codec ensures that every opened or written segment has the `_tsid`, `@timestamp` and `_ts_routing_hash` doc value fields. This does not hold for segments composed only of no-op tombstone documents, so the assertions there must be relaxed.

This commit adjusts the postings format and codec used in TSDB indices with synthetic ids, and adds an integration test that exercises the three code paths through which no-op tombstone documents can be indexed into Lucene.
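The relaxed codec assertion can be sketched as a predicate over a segment: the three doc value fields must be present unless every document in the segment is a no-op tombstone. The `Segment` record and its components are hypothetical stand-ins for Lucene's field infos and segment metadata, chosen only to make the condition concrete; the real assertions in the codec may be structured differently.

```java
import java.util.Set;

/** Hypothetical model of the relaxed codec assertion; not the actual Elasticsearch code. */
final class SyntheticIdSegmentCheck {

    static final Set<String> REQUIRED_FIELDS = Set.of("_tsid", "@timestamp", "_ts_routing_hash");

    /** Names of the doc value fields present in a segment, plus document counts. */
    record Segment(Set<String> docValuesFields, int maxDoc, int noopTombstoneCount) {}

    private SyntheticIdSegmentCheck() {}

    /** Original invariant: every segment has all three doc value fields. */
    static boolean strictCheck(Segment segment) {
        return segment.docValuesFields().containsAll(REQUIRED_FIELDS);
    }

    /** Relaxed invariant: the fields may be missing only if every document is a no-op tombstone. */
    static boolean relaxedCheck(Segment segment) {
        boolean onlyNoopTombstones = segment.noopTombstoneCount() == segment.maxDoc();
        return strictCheck(segment) || onlyNoopTombstones;
    }
}
```

Keeping the strict condition as the first disjunct means any segment containing a regular document or a delete tombstone is still held to the original invariant.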
Note: Cursor greatly helped with writing the test.