Skip to content

KVStore: Defer deleting locks until after the committed data has been written to storage#9921

Merged
ti-chi-bot[bot] merged 5 commits intopingcap:masterfrom
JinheLin:defer_remove_lock_when_raft_write
Mar 3, 2025
Merged

KVStore: Defer deleting locks until after the committed data has been written to storage#9921
ti-chi-bot[bot] merged 5 commits intopingcap:masterfrom
JinheLin:defer_remove_lock_when_raft_write

Conversation

@JinheLin
Copy link
Contributor

@JinheLin JinheLin commented Feb 27, 2025

What problem does this PR solve?

Issue Number: close #9922

Problem Summary:

If we directly remove the logic of the read thread writing to the storage engine during learner reads in the existing code, it would cause concurrency control problems that could result in missing some of the latest data. The concurrency logic is as follows:

  1. Raft thread: Receives a write log and removes the lock.
  2. Read thread: Detects no lock and begins reading (missing the data written in step 3).
  3. Raft thread: Writes the data.

The root cause of this issue is that the Raft thread removes the lock before writing the data to the storage engine.

What is changed and how it works?

  • Removes locks after committed data has been written to the storage.

  • Rename Region::remove to Region:removeDebug because this function only used to tests.


Check List

Tests

  • Unit test

  • Integration test

  • Manual test (add detailed scripts or steps below)

    • Consistency test:
      image
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 27, 2025
@JinheLin JinheLin changed the title KVStore: Defer the deletion of the lock until after the write to storage KVStore: Removes locks after committed data has been written to the storage Feb 27, 2025
@JinheLin JinheLin changed the title KVStore: Removes locks after committed data has been written to the storage KVStore: Defer deleting locks until after the committed data has been written to storage Mar 1, 2025
Copy link
Contributor

@JaySon-Huang JaySon-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we apply the change of remove write storage in doLearnerRead in the same PR?

Rest LGTM

@JinheLin
Copy link
Contributor Author

JinheLin commented Mar 3, 2025

Should we apply the change of remove write storage in doLearnerRead in the same PR?

Rest LGTM

@JaySon-Huang

This PR is the peconditions of remove write storage in doLearnerRead.

The next PR will remove write storage in doLearnerReader.

@CalvinNeo
Copy link
Member

Should we apply the change of remove write storage in doLearnerRead in the same PR?

Rest LGTM

I think it is better that we firstly merge this PR, and let it run for a week for all relevant tests to make sure this change is less likely to have problems.

If we remove the remove write storage in doLearnerRead, then there will be no concurrent decoding, so some problem may not emerge

Copy link
Member

@CalvinNeo CalvinNeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 3, 2025
Copy link
Contributor

@JaySon-Huang JaySon-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Mar 3, 2025
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 3, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-03-03 08:38:02.16669548 +0000 UTC m=+257995.295615224: ☑️ agreed by CalvinNeo.
  • 2025-03-03 08:52:32.283373208 +0000 UTC m=+258865.412292951: ☑️ agreed by JaySon-Huang.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CalvinNeo, JaySon-Huang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [CalvinNeo,JaySon-Huang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Mar 3, 2025

@JinheLin: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

trigger some heavy tests which will not run always when PR updated.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot merged commit ab39d25 into pingcap:master Mar 3, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Transaction locks can only be removed after committed data has been written to the storage.

3 participants