MB-66163: fix checkpointing mechanism under sparse mutation scenarios #2186

Merged
Thejas-bhat merged 3 commits into master from checkpointBug on Apr 25, 2025

@Thejas-bhat

Member
  • The checkpointing mechanism considers snapshots that fall within the numSnapshotsToKeep * rollbackSamplingInterval duration window. Snapshots outside this window are purged.
  • Under a sparse mutation scenario, e.g. when the workload pattern is such that snapshots are persisted only seldom, and then outside this duration window, we could end up purging ALL of those snapshots, which makes the system fall back to preserving the latest snapshots.
  • For better partial rollback behaviour, we retain rollbackRetentionFactor worth of snapshots out of this duration window, so that the cost of rebuilding the index from scratch is avoided.
  • This PR fixes the decision logic for the scenario described above and also tracks the previous checkpoints when the index is reopened.
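
The retention rule described above can be illustrated with a minimal Go sketch. This is not the code from index/scorch/persister.go: the `Snapshot` and `Config` types and the `selectRetained` function are hypothetical names introduced for illustration. In particular, reading "rollbackRetentionFactor worth of snapshots" as `ceil(rollbackRetentionFactor * numSnapshotsToKeep)` of the newest out-of-window snapshots is an assumption, not something the PR description states.

```go
package main

import (
	"math"
	"sort"
	"time"
)

// Snapshot is a hypothetical stand-in for a persisted snapshot's metadata.
// Age is how long ago the snapshot was persisted.
type Snapshot struct {
	Epoch uint64
	Age   time.Duration
}

// Config groups the three settings named in the PR description.
type Config struct {
	NumSnapshotsToKeep       int
	RollbackSamplingInterval time.Duration
	RollbackRetentionFactor  float64
}

// selectRetained returns the epochs to keep, newest first.
// Snapshots inside the window are always kept. ASSUMPTION: in addition,
// ceil(RollbackRetentionFactor * NumSnapshotsToKeep) of the newest
// snapshots outside the window are kept, so a sparse workload never
// ends up with every older rollback point purged.
func selectRetained(snaps []Snapshot, cfg Config) []uint64 {
	// Work on a copy sorted newest first (smallest age first).
	sorted := append([]Snapshot(nil), snaps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Age < sorted[j].Age })

	// The duration window: numSnapshotsToKeep * rollbackSamplingInterval.
	window := time.Duration(cfg.NumSnapshotsToKeep) * cfg.RollbackSamplingInterval

	var inWindow, outside []uint64
	for _, s := range sorted {
		if s.Age <= window {
			inWindow = append(inWindow, s.Epoch)
		} else {
			outside = append(outside, s.Epoch)
		}
	}

	// Number of out-of-window snapshots to retain (assumed interpretation).
	extra := int(math.Ceil(cfg.RollbackRetentionFactor * float64(cfg.NumSnapshotsToKeep)))
	if extra < 0 {
		extra = 0
	}
	if extra > len(outside) {
		extra = len(outside)
	}
	return append(inWindow, outside[:extra]...)
}
```

With, say, `NumSnapshotsToKeep: 3`, `RollbackSamplingInterval: 10 * time.Minute` and `RollbackRetentionFactor: 0.5`, a workload whose snapshots are all hours old would still keep its two newest snapshots under this sketch, instead of purging all of them.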

@Thejas-bhat Thejas-bhat changed the title bug fix: fix checkpointing mechanism under sparse mutation scenarios MB-66163: fix checkpointing mechanism under sparse mutation scenarios Apr 24, 2025
@abhinavdangeti abhinavdangeti added this to the v2.5.1 milestone Apr 24, 2025
Comment thread index/scorch/persister.go Outdated
@abhinavdangeti

Member

@Thejas-bhat Let's backport this one as well to 7.6.x-couchbase after this PR is merged.

@Thejas-bhat Thejas-bhat merged commit dcc8de8 into master Apr 25, 2025
Thejas-bhat added a commit that referenced this pull request Apr 25, 2025
…#2186)

- The checkpointing mechanism considers snapshots that fall within the
`numSnapshotsToKeep * rollbackSamplingInterval` duration window. Snapshots
outside this window are purged.
- Under a sparse mutation scenario, e.g. when the workload pattern is such
that snapshots are persisted only seldom, and then outside this duration
window, we could end up purging ALL of those snapshots, which makes the
system fall back to preserving the latest snapshots.
- For better partial rollback behaviour, we retain
`rollbackRetentionFactor` worth of snapshots out of this duration window,
so that the cost of rebuilding the index from scratch is avoided.
- This PR fixes the decision logic for the scenario described above and
also tracks the previous checkpoints when the index is reopened.
Thejas-bhat added a commit that referenced this pull request Apr 25, 2025
…arios (#2187)

Backporting #2186 to couchbase 7.6.x release cycle
@abhinavdangeti abhinavdangeti deleted the checkpointBug branch May 3, 2025 20:17
abhinavdangeti pushed a commit that referenced this pull request May 6, 2025
…arios (#2187)

Backporting #2186 to couchbase 7.6.x release cycle