Skip to content

Fix corruption due to concurrent update during snapshot#7298

Merged
agourlay merged 1 commit intodevfrom
fix-concurrent-update-corruption-snapshot
Sep 24, 2025
Merged

Fix corruption due to concurrent update during snapshot#7298
agourlay merged 1 commit intodevfrom
fix-concurrent-update-corruption-snapshot

Conversation

@agourlay
Copy link
Member

@agourlay agourlay commented Sep 24, 2025

This PR fixes the non-atomic copy on write by forcing the serialization of all updates.

During snapshot, a single write segment is shared by all proxy segments.

At the end of the segments snapshot process, deletes are propagated to the wrapped segments.

This happens concurrently with user update operations which are serialized through an update channel.

As demonstrated by the test test_continuous_snapshot, it is possible that the delete propagation happens right in between of the CoW for the set_payload operation.

 fn set_payload(
        &mut self,
        op_num: SeqNumberType,
        point_id: PointIdType,
        payload: &Payload,
        key: &Option<JsonPath>,
        hw_counter: &HardwareCounterCell,
    ) -> OperationResult<bool> {
        self.move_if_exists(op_num, point_id, hw_counter)?;
        // NOT ATOMIC - write segment can be updated from another proxy segment
        self.write_segment
            .get()
            .write()
            .set_payload(op_num, point_id, payload, key, hw_counter)
    }

This PR introduces a new update lock at the level of the segment holder to ensure that the snapshot process does not interfere with on-going client updates.

@timvisee timvisee self-requested a review September 24, 2025 09:23
@agourlay agourlay force-pushed the fix-concurrent-update-corruption-snapshot branch from cc58515 to df19026 Compare September 24, 2025 09:25
@agourlay agourlay marked this pull request as ready for review September 24, 2025 12:52
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 24, 2025

Warning

Rate limit exceeded

@agourlay has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 1 minutes and 57 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 38c394d and df19026.

📒 Files selected for processing (2)
  • lib/collection/tests/integration/continuous_snapshot_test.rs (1 hunks)
  • lib/shard/src/segment_holder/mod.rs (5 hunks)
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-concurrent-update-corruption-snapshot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@agourlay agourlay merged commit 688651b into dev Sep 24, 2025
16 checks passed
@agourlay agourlay deleted the fix-concurrent-update-corruption-snapshot branch September 24, 2025 12:59
@timvisee timvisee mentioned this pull request Sep 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants