
txn: introduce large_txn_cache in txn_status_cache#17460

Merged
ti-chi-bot[bot] merged 7 commits into tikv:master from ekexium:feat-txn-status-cache-large-txn
Sep 24, 2024

Conversation

@ekexium (Contributor) commented Aug 29, 2024

What is changed and how it works?

Issue Number: ref #17459

What's Changed:

Introduce large_txn_cache in txn_status_cache.

Related changes

  • PR to update pingcap/docs / pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

None

@ti-chi-bot bot (Contributor) commented Aug 29, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 29, 2024
@ekexium ekexium changed the title from "Feat txn status cache large txn" to "txn: introduce large_txn_cache in txn_status_cache" Aug 29, 2024
Signed-off-by: ekexium <eke@fastmail.com>
@ekexium ekexium force-pushed the feat-txn-status-cache-large-txn branch from 3308941 to 18822df August 29, 2024 16:25
Signed-off-by: ekexium <eke@fastmail.com>
@ekexium ekexium marked this pull request as ready for review September 6, 2024 06:46
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 6, 2024
@ekexium ekexium requested a review from MyonKeminta September 9, 2024 09:49
//! write operation that the request needs to perform.

//!
//! ## Design Update-1: Dual-Cache System
Contributor:

It's actually ok to directly edit the previous design, I think

Contributor Author:

Yes. I was thinking it's good to preserve the design history, which shows how this module has evolved.

slots: Vec<CachePadded<Mutex<TxnStatusCacheSlot>>>,
// default cache for committed txns
normal_cache: Vec<CachePadded<Mutex<TxnStatusCacheSlot>>>,
// for large txns, or any txn whose min_commit_ts needs to be cached
Contributor:

Better to describe the purpose in detail here, as well as the reason to separate it from normal_cache (I wonder about the reason, too 🤔)

Contributor Author:

Because they actually serve different purposes. The design prevents large transactions from being evicted by normal transactions. This is how it "prioritizes" large transactions.
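The prioritization described above can be sketched as two independent bounded caches, where evicting from the normal cache can never touch a large-transaction entry. This is a minimal illustration, not TiKV's actual implementation: the real slots use `CachePadded<Mutex<TxnStatusCacheSlot>>`, and the names and FIFO eviction here are simplified stand-ins.

```rust
use std::collections::VecDeque;

/// Two independent bounded FIFO caches mapping start_ts to a state string.
/// Because capacities are separate, heavy normal-transaction traffic can
/// never evict a large-transaction entry.
struct DualCache {
    normal: VecDeque<(u64, &'static str)>,
    large: VecDeque<(u64, &'static str)>,
    capacity: usize,
}

impl DualCache {
    fn new(capacity: usize) -> Self {
        DualCache { normal: VecDeque::new(), large: VecDeque::new(), capacity }
    }

    fn insert_normal(&mut self, ts: u64, state: &'static str) {
        if self.normal.len() == self.capacity {
            self.normal.pop_front(); // evicts only normal entries
        }
        self.normal.push_back((ts, state));
    }

    fn insert_large(&mut self, ts: u64, state: &'static str) {
        if self.large.len() == self.capacity {
            self.large.pop_front(); // large entries only compete with each other
        }
        self.large.push_back((ts, state));
    }

    fn get(&self, ts: u64) -> Option<&'static str> {
        // The large-transaction cache is consulted first.
        self.large
            .iter()
            .chain(self.normal.iter())
            .find(|(t, _)| *t == ts)
            .map(|(_, s)| *s)
    }
}
```

Flooding the normal cache leaves large entries intact, which is the "robust against the worst case" property argued for in this thread.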

Comment on lines +496 to +504
let mut large_txn_cache = self.large_txn_cache[slot_index].lock();
if let Some(entry) = large_txn_cache.get(&start_ts) {
return Some(entry.state.clone());
}

let mut normal_cache = self.normal_cache[slot_index].lock();
if let Some(entry) = normal_cache.get(&start_ts) {
return Some(entry.state.clone());
}
Contributor:

Is it possible to use a single cache to hold normal transactions and large transactions, so that there won't be any non-atomicity between two caches?

Contributor Author:

It's definitely possible, but I think it would be less robust against worst-case scenarios.
By the way, the two caches can be viewed as two independent parts that do not depend on each other, so I suppose there should be no risk here.

Signed-off-by: ekexium <eke@fastmail.com>
//! request to be cached in the cache can also be treated as large
//! transactions, as they imply their min_commit_ts are useful.
//!
//! - Prioritized Caching: The `large_txn_cache` has a higher priority when
Collaborator:

Perhaps we should make it clear that the large transactions here indicate only pipelined DML transactions.

Contributor Author:

While the project currently focuses on pipelined transactions, the module itself doesn't need to be transaction-type specific. It can handle caching min_commit_ts for any large transaction that requires it, not just pipelined DMLs. This approach offers more flexibility and future-proofs our design.

/// for-filtering-out-unwanted-late-arrived-stale-prewrite-requests) for details
/// about why this is needed.
const CACHE_ITEMS_REQUIRED_KEEP_TIME: Duration = Duration::from_secs(30);
const CACHE_ITEMS_REQUIRED_KEEP_TIME_FOR_LARGE_TXNS: Duration = Duration::from_secs(30);
Collaborator:

The number of large transactions should be much smaller than that of normal transactions; perhaps this constant is unnecessary if its value is the same, while the capacity could be smaller for the large transactions.

Contributor Author:

I use the same capacity mainly because I wanted to make the configs as few as possible. It can be confusing if we have 2 separate capacity config items for txn_status_cache. Besides, capacity merely sets an upper limit and doesn't directly correlate with the amount of memory actually used.

The individual constant just implies that the keep-time can be different and changed when needed.
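The keep-time constants above bound eviction by age rather than by capacity alone: an entry may only be evicted once it has been in the cache for at least the required keep time. A minimal sketch of that check (the helper name `can_evict` is hypothetical, not from the PR):

```rust
use std::time::{Duration, SystemTime};

// Mirrors the constants in the snippet above.
const CACHE_ITEMS_REQUIRED_KEEP_TIME: Duration = Duration::from_secs(30);

/// An entry is eligible for eviction only after it has aged past
/// `keep_time`. If the clock appears to have gone backwards, keep the
/// entry to stay on the safe side.
fn can_evict(inserted_at: SystemTime, now: SystemTime, keep_time: Duration) -> bool {
    now.duration_since(inserted_at)
        .map(|age| age >= keep_time)
        .unwrap_or(false)
}
```

Keeping the large-txn constant separate, as the reply explains, costs nothing at runtime while leaving room to tune the two keep times independently later.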

@ekexium ekexium force-pushed the feat-txn-status-cache-large-txn branch from 9b6d3f0 to ac3b3c3 September 18, 2024 16:04
@ekexium ekexium force-pushed the feat-txn-status-cache-large-txn branch from ac3b3c3 to 1eb9a0a September 19, 2024 02:45
@ekexium (Contributor Author) commented Sep 19, 2024

/retest

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 19, 2024
@ekexium (Contributor Author) commented Sep 20, 2024

/retest

@ekexium ekexium requested a review from MyonKeminta September 24, 2024 01:56
@ti-chi-bot ti-chi-bot bot added the lgtm label Sep 24, 2024
@ti-chi-bot bot (Contributor) commented Sep 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, zyguan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Sep 24, 2024
@ti-chi-bot bot (Contributor) commented Sep 24, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-09-19 03:14:12.756871355 +0000 UTC m=+1103722.497295294: ☑️ agreed by cfzjywxk.
  • 2024-09-24 06:35:28.154187192 +0000 UTC m=+1547797.894611116: ☑️ agreed by zyguan.

@ekexium (Contributor Author) commented Sep 24, 2024

/retest

@ti-chi-bot bot (Contributor) commented Sep 24, 2024

@ekexium: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot merged commit ddd5da8 into tikv:master Sep 24, 2024
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Sep 24, 2024
/// Insert a transaction status into the cache, or update it. The current
/// system time should be passed from outside to avoid getting system time
/// repeatedly when multiple items are being inserted.
pub fn upsert(&self, start_ts: TimeStamp, state: TxnState, now: SystemTime) {
Contributor:

Is upsert a common term? Actually I think the word insert is good enough here, on the contrary, the previous behavior (not replacing or promoting when exists) is the thing that's special.
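The distinction the reviewer draws can be sketched with plain `HashMap` semantics: `upsert` always inserts or replaces, while the previous behavior only inserted when the key was absent. The helper names here are illustrative, not the PR's actual API.

```rust
use std::collections::HashMap;

/// Insert-or-replace: the new state always wins. This is what "upsert"
/// conventionally means (the term is common in database contexts).
fn upsert(cache: &mut HashMap<u64, String>, start_ts: u64, state: String) {
    cache.insert(start_ts, state);
}

/// Insert-if-absent: an existing entry is kept, the new state is dropped.
/// This is the "special" previous behavior the reviewer refers to.
fn insert_if_absent(cache: &mut HashMap<u64, String>, start_ts: u64, state: String) {
    cache.entry(start_ts).or_insert(state);
}
```

Under insert-if-absent, a late-arriving stale state cannot overwrite a fresher one; `upsert` shifts that responsibility to the caller.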
