
txn: introduce large_txn_cache in txn_status_cache#17460

Merged
ti-chi-bot[bot] merged 7 commits into tikv:master from ekexium:feat-txn-status-cache-large-txn
Sep 24, 2024

Conversation

@ekexium (Contributor) commented Aug 29, 2024

What is changed and how it works?

Issue Number: ref #17459

What's Changed:

Introduce large_txn_cache in txn_status_cache.

Related changes

  • PR to update pingcap/docs / pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

None

@ti-chi-bot bot (Contributor) commented Aug 29, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 29, 2024
@ekexium ekexium changed the title from "Feat txn status cache large txn" to "txn: introduce large_txn_cache in txn_status_cache" Aug 29, 2024
Signed-off-by: ekexium <eke@fastmail.com>
@ekexium ekexium force-pushed the feat-txn-status-cache-large-txn branch from 3308941 to 18822df August 29, 2024 16:25
Signed-off-by: ekexium <eke@fastmail.com>
@ekexium ekexium marked this pull request as ready for review September 6, 2024 06:46
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 6, 2024
@ekexium ekexium requested a review from MyonKeminta September 9, 2024 09:49
//! write operation that the request needs to perform.

//!
//! ## Design Update-1: Dual-Cache System
Contributor:

It's actually ok to directly edit the previous design, I think

Contributor Author:

Yes. I was thinking it's good to preserve the design history, which shows how this module has evolved.

slots: Vec<CachePadded<Mutex<TxnStatusCacheSlot>>>,
// default cache for committed txns
normal_cache: Vec<CachePadded<Mutex<TxnStatusCacheSlot>>>,
// for large txns, or any txn whose min_commit_ts needs to be cached
Contributor:

Better to describe the purpose in detail here, as well as the reason to separate it from normal_cache (I wonder about the reason, too 🤔)

Contributor Author:

Because they actually serve different purposes. The design prevents large transactions from being evicted by normal transactions. This is how it "prioritizes" large transactions.
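The prioritization described above can be sketched as two independent bounded caches, where evicting from the normal cache can never touch a large-transaction entry. This is a minimal illustration, not TiKV's actual implementation: the real slots use `CachePadded<Mutex<TxnStatusCacheSlot>>`, and the names and FIFO eviction here are simplified stand-ins.

```rust
use std::collections::VecDeque;

/// Two independent bounded FIFO caches mapping start_ts to a state string.
/// Because capacities are separate, heavy normal-transaction traffic can
/// never evict a large-transaction entry.
struct DualCache {
    normal: VecDeque<(u64, &'static str)>,
    large: VecDeque<(u64, &'static str)>,
    capacity: usize,
}

impl DualCache {
    fn new(capacity: usize) -> Self {
        DualCache { normal: VecDeque::new(), large: VecDeque::new(), capacity }
    }

    fn insert_normal(&mut self, ts: u64, state: &'static str) {
        if self.normal.len() == self.capacity {
            self.normal.pop_front(); // evicts only normal entries
        }
        self.normal.push_back((ts, state));
    }

    fn insert_large(&mut self, ts: u64, state: &'static str) {
        if self.large.len() == self.capacity {
            self.large.pop_front(); // large entries only compete with each other
        }
        self.large.push_back((ts, state));
    }

    fn get(&self, ts: u64) -> Option<&'static str> {
        // The large-transaction cache is consulted first.
        self.large
            .iter()
            .chain(self.normal.iter())
            .find(|(t, _)| *t == ts)
            .map(|(_, s)| *s)
    }
}
```

Flooding the normal cache leaves large entries intact, which is the "robust against the worst case" property argued for in this thread.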

Comment on lines +496 to +504
let mut large_txn_cache = self.large_txn_cache[slot_index].lock();
if let Some(entry) = large_txn_cache.get(&start_ts) {
return Some(entry.state.clone());
}

let mut normal_cache = self.normal_cache[slot_index].lock();
if let Some(entry) = normal_cache.get(&start_ts) {
return Some(entry.state.clone());
}
Contributor:

Is it possible to use a single cache to hold normal transactions and large transactions, so that there won't be any non-atomicity between two caches?

Contributor Author:

It's definitely possible, but I think it would be less robust against worst-case scenarios.
By the way, the two caches can be viewed as two independent parts that do not depend on each other, so I suppose there should be no risk here.

Signed-off-by: ekexium <eke@fastmail.com>
//! request to be cached in the cache can also be treated as large
//! transactions, as they imply their min_commit_ts are useful.
//!
//! - Prioritized Caching: The `large_txn_cache` has a higher priority when
Collaborator:

Perhaps we should make it clear that the large transactions here indicate only pipelined DML transactions.

Contributor Author:

While the project currently focuses on pipelined transactions, the module itself doesn't need to be transaction-type specific. It can handle caching min_commit_ts for any large transaction that requires it, not just pipelined DMLs. This approach offers more flexibility and future-proofs our design.

/// for-filtering-out-unwanted-late-arrived-stale-prewrite-requests) for details
/// about why this is needed.
const CACHE_ITEMS_REQUIRED_KEEP_TIME: Duration = Duration::from_secs(30);
const CACHE_ITEMS_REQUIRED_KEEP_TIME_FOR_LARGE_TXNS: Duration = Duration::from_secs(30);
Collaborator:

The number of large transactions should be much smaller than that of normal transactions; perhaps this constant is unnecessary if its value is the same, while the capacity could be smaller for the large transactions.

Contributor Author:

I use the same capacity mainly because I wanted to make the configs as few as possible. It can be confusing if we have 2 separate capacity config items for txn_status_cache. Besides, capacity merely sets an upper limit and doesn't directly correlate with the amount of memory actually used.

The individual constant just implies that the keep-time can be different and changed when needed.
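The keep-time constants above bound eviction by age rather than by capacity alone: an entry may only be evicted once it has been in the cache for at least the required keep time. A minimal sketch of that check (the helper name `can_evict` is hypothetical, not from the PR):

```rust
use std::time::{Duration, SystemTime};

// Mirrors the constants in the snippet above.
const CACHE_ITEMS_REQUIRED_KEEP_TIME: Duration = Duration::from_secs(30);

/// An entry is eligible for eviction only after it has aged past
/// `keep_time`. If the clock appears to have gone backwards, keep the
/// entry to stay on the safe side.
fn can_evict(inserted_at: SystemTime, now: SystemTime, keep_time: Duration) -> bool {
    now.duration_since(inserted_at)
        .map(|age| age >= keep_time)
        .unwrap_or(false)
}
```

Keeping the large-txn constant separate, as the reply explains, costs nothing at runtime while leaving room to tune the two keep times independently later.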

@ekexium ekexium force-pushed the feat-txn-status-cache-large-txn branch from 9b6d3f0 to ac3b3c3 September 18, 2024 16:04
@ekexium ekexium force-pushed the feat-txn-status-cache-large-txn branch from ac3b3c3 to 1eb9a0a September 19, 2024 02:45
@ekexium (Contributor Author) commented Sep 19, 2024

/retest

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 19, 2024
@ekexium (Contributor Author) commented Sep 20, 2024

/retest

@ekexium ekexium requested a review from MyonKeminta September 24, 2024 01:56
@ti-chi-bot ti-chi-bot bot added the lgtm label Sep 24, 2024
@ti-chi-bot bot (Contributor) commented Sep 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, zyguan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Sep 24, 2024
@ti-chi-bot bot (Contributor) commented Sep 24, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-09-19 03:14:12.756871355 +0000 UTC m=+1103722.497295294: ☑️ agreed by cfzjywxk.
  • 2024-09-24 06:35:28.154187192 +0000 UTC m=+1547797.894611116: ☑️ agreed by zyguan.

@ekexium (Contributor Author) commented Sep 24, 2024

/retest

@ti-chi-bot bot (Contributor) commented Sep 24, 2024

@ekexium: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot bot merged commit ddd5da8 into tikv:master Sep 24, 2024
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Sep 24, 2024
/// Insert a transaction status into the cache, or update it. The current
/// system time should be passed from outside to avoid getting system time
/// repeatedly when multiple items are being inserted.
pub fn upsert(&self, start_ts: TimeStamp, state: TxnState, now: SystemTime) {
Contributor:

Is upsert a common term? Actually I think the word insert is good enough here, on the contrary, the previous behavior (not replacing or promoting when exists) is the thing that's special.
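The distinction the reviewer draws can be sketched with plain `HashMap` semantics: `upsert` always inserts or replaces, while the previous behavior only inserted when the key was absent. The helper names here are illustrative, not the PR's actual API.

```rust
use std::collections::HashMap;

/// Insert-or-replace: the new state always wins. This is what "upsert"
/// conventionally means (the term is common in database contexts).
fn upsert(cache: &mut HashMap<u64, String>, start_ts: u64, state: String) {
    cache.insert(start_ts, state);
}

/// Insert-if-absent: an existing entry is kept, the new state is dropped.
/// This is the "special" previous behavior the reviewer refers to.
fn insert_if_absent(cache: &mut HashMap<u64, String>, start_ts: u64, state: String) {
    cache.entry(start_ts).or_insert(state);
}
```

Under insert-if-absent, a late-arriving stale state cannot overwrite a fresher one; `upsert` shifts that responsibility to the caller.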
