MB-66163: always protect the latest snapshot within bolt#2183
Merged
Conversation
abhinavdangeti
previously approved these changes
Apr 22, 2025
54150a1 to
3a08bed
Compare
abhinavdangeti
previously approved these changes
Apr 23, 2025
abhinavdangeti
approved these changes
Apr 23, 2025
abhinavdangeti
left a comment
Member
There was a problem hiding this comment.
Solid diagnosis & fix @Thejas-bhat !
metonymic-smokey
approved these changes
Apr 24, 2025
Thejas-bhat
added a commit
that referenced
this pull request
Apr 24, 2025
- Currently because of the time-series based checkpointing mechanism, there can be edge cases where we don't end up protecting the very latest snapshot that's there in the bolt file. - This can cause some serious issues along the persister path, where the newly created file segments that's part of the new snapshot written to bolt can potentially get cleaned up when the purge code `removeOldData` is invoked. If there were a backup operation `indexReader.CopyTo()` to be performed just after this purge operation, the latest root still has a pointer to the file segment but the underlying file itself would have been purged because its not part of `ineligibleForRemoval` structure and also the corresponding bolt entry would have been deleted. - The issue highlighted has also been unit tested and is consistently reproducable. ``` rollback_test.go:559: error copying the index: error backing up index snapshot: segment: /var/folders/5n/g80bkslj5qn6xn8srh6pxv_r0000gp/T/bleve-scorch-test-TestLatestSnapshotCheckpointed/000000000004.zap copy err: stat /var/folders/5n/g80bkslj5qn6xn8srh6pxv_r0000gp/T/bleve-scorch-test-TestLatestSnapshotCheckpointed/000000000004.zap: no such file or directory ``` - The PR addresses this issue by firstly preserving the latest snapshot within bolt and n-1 time-series based checkpoints and also marking the file as ineligible for removal as an layer of protection. - There is also a bug fix along a specific merger path where the old snapshot's ref count wasn't being decremented appropriately which can potentially cause file leak and memory bloat due to invalid snapshots still being present.
Thejas-bhat
added a commit
that referenced
this pull request
Apr 24, 2025
Backporting #2183 for couchbase 7.6.x release cycle builds.
abhinavdangeti
pushed a commit
that referenced
this pull request
May 6, 2025
Backporting #2183 for couchbase 7.6.x release cycle builds.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
removeOldDatais invoked. If there were a backup operationindexReader.CopyTo()to be performed just after this purge operation, the latest root still has a pointer to the file segment but the underlying file itself would have been purged because its not part ofineligibleForRemovalstructure and also the corresponding bolt entry would have been deleted.