Replace hard-deletes by soft-deletes to maintain document history#29549
Replace hard-deletes by soft-deletes to maintain document history#29549dnhatn merged 10 commits intoelastic:ccrfrom
Conversation
Today we can use the soft-deletes feature from Lucene to maintain a history of a document. This change simply replaces hard-deletes by soft-deletes in Engine. Besides marking a document as deleted, we also index a tombstone associated with that delete operation. Storing delete tombstones allows us to have a history of sequence-based operations which can serve in recovery or rollback.
|
Pinging @elastic/es-distributed |
|
/cc @jasontedor and @martijnvg |
bleskes
left a comment
There was a problem hiding this comment.
Thanks Nhat. I left some comments.
| private final CircuitBreakerService circuitBreakerService; | ||
| private final LongSupplier globalCheckpointSupplier; | ||
| private final LongSupplier primaryTermSupplier; | ||
| private final MetaDocSupplier metaDocSupplier; |
There was a problem hiding this comment.
is TombstoneDoc a better name?
There was a problem hiding this comment.
Yes, it's better - more explicit.
|
|
||
| private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) { | ||
| final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON); | ||
| final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source); |
There was a problem hiding this comment.
Instead of creating everything and removing what we don't need, can we maybe fold this into the document mapper and add a createTombstoneDoc method to it that only does what it needs to create the right fields (probably only calls the preParse / postParse methods in the right fields (similar to here)?
| private ParseContext.Document newMetaDoc(String type, String id, long seqno, long primaryTerm, long version) { | ||
| final SourceToParse source = SourceToParse.source(shardId.getIndexName(), type, id, new BytesArray("{}"), XContentType.JSON); | ||
| final ParsedDocument parsedDocument = docMapper(type).getDocumentMapper().parse(source); | ||
| parsedDocument.updateSeqID(seqno, primaryTerm); |
There was a problem hiding this comment.
I prefer to be consistent with how the engine set these during indexing - i.e., during the addition to lucene.
| t.join(); | ||
| } | ||
| // Close remaining searchers | ||
| IOUtils.close(searchers); |
There was a problem hiding this comment.
Why was this move earlier? I think I miss something I don't know - can you explain?
|
please run all tests. |
|
@elasticmachine please test this |
|
run sample packaging tests. |
s1monw
left a comment
There was a problem hiding this comment.
LGTM I left some questions none of them are blockers
| } | ||
|
|
||
| @FunctionalInterface | ||
| public interface TombstoneDocSupplier { |
There was a problem hiding this comment.
maybe add some javadocs here?
| } | ||
|
|
||
| public ParsedDocument createTombstoneDoc(String index, String type, String id) throws MapperParsingException { | ||
| final SourceToParse emptySource = SourceToParse.source(index, type, id, new BytesArray("{}"), XContentType.JSON); |
There was a problem hiding this comment.
do we need some identifier that this doc is a tombstone? I am not sure we do at this point but down the road we would no?
There was a problem hiding this comment.
+1 . We don't plan to keep many of these around so the overhead is minimal and we'd have the ability to search for them and debug. The question is how to do this. I guess the easiest would be to add a boolean metadata field to the mapping, but that feels like an overkill. In any case, I think this can be a follow up.
There was a problem hiding this comment.
Yes, I will think about it and make it in a follow-up.
bleskes
left a comment
There was a problem hiding this comment.
This LGTM, but I think we want a unit test on the engine level for this? What am I missing?
We planned to add real tests in the next PR when indexing stale deletes and no-ops. This PR is just a cut-over. I am fine to add a simple test here. |
Cool. Thanks for explaining. |
Today we can use the soft-deletes feature from Lucene to maintain a history of a document. This change simply replaces hard-deletes by soft-deletes in Engine. Besides marking a document as deleted, we also index a tombstone associated with that delete operation. Storing delete tombstones allows us to have a history of sequence-based operations which can serve in recovery or rollback. Relates #29530
Today we can use the soft-deletes feature from Lucene to maintain a
history of a document. This change simply replaces hard-deletes by
soft-deletes in Engine.
Besides marking a document as deleted, we also index a tombstone
associated with that delete operation. Storing delete tombstones allows
us to have a history of sequence-based operations which can serve in
recovery or rollback.
Relates #29530