txn: introduce large_txn_cache in txn_status_cache#17460
txn: introduce large_txn_cache in txn_status_cache#17460ti-chi-bot[bot] merged 7 commits intotikv:masterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
Signed-off-by: ekexium <eke@fastmail.com>
3308941 to
18822df
Compare
Signed-off-by: ekexium <eke@fastmail.com>
| //! write operation that the request needs to perform. | ||
|
|
||
| //! | ||
| //! ## Design Update-1: Dual-Cache System |
There was a problem hiding this comment.
It's actually ok to directly edit the previous design, I think
There was a problem hiding this comment.
Yes. I was thinking it's good to preserve the design history, which shows how this module has evolved.
| slots: Vec<CachePadded<Mutex<TxnStatusCacheSlot>>>, | ||
| // default cache for committed txns | ||
| normal_cache: Vec<CachePadded<Mutex<TxnStatusCacheSlot>>>, | ||
| // for large txns, or any txn whose min_commit_ts needs to be cached |
There was a problem hiding this comment.
Better describe the purpose in detail here, as well as the reason to separate it from normal_cache (I wonder the reason, too 🤔)
There was a problem hiding this comment.
Because they actually serve different purposes. The design prevents large transactions from being evicted due to normal transactions. This is how it "prioritizes" large transactions.
| let mut large_txn_cache = self.large_txn_cache[slot_index].lock(); | ||
| if let Some(entry) = large_txn_cache.get(&start_ts) { | ||
| return Some(entry.state.clone()); | ||
| } | ||
|
|
||
| let mut normal_cache = self.normal_cache[slot_index].lock(); | ||
| if let Some(entry) = normal_cache.get(&start_ts) { | ||
| return Some(entry.state.clone()); | ||
| } |
There was a problem hiding this comment.
Is it possible to use a single cache to hold normal transactions and large transactions, so that there won't be any non-atomicity between two caches?
There was a problem hiding this comment.
It's definitely possible. But I think it would be less robust against worst case scenario.
Btw the two caches can be viewed as two individual parts and they do not depend on each other. So I suppose there should be no risks here?
Signed-off-by: ekexium <eke@fastmail.com>
| //! request to be cached in the cache can also be treated as large | ||
| //! transactions, as they imply their min_commit_ts are useful. | ||
| //! | ||
| //! - Prioritized Caching: The `large_txn_cache` has a higher priority when |
There was a problem hiding this comment.
Perhaps we should make it clear the large transactions here indicates only pipelined dml transactions.
There was a problem hiding this comment.
While the project currently focuses on pipelined transactions, the module itself doesn't need to be transaction-type specific. It can handle caching min_commit_ts for any large transaction that requires it, not just pipelined DMLs. This approach offers more flexibility and future-proofs our design.
| /// for-filtering-out-unwanted-late-arrived-stale-prewrite-requests) for details | ||
| /// about why this is needed. | ||
| const CACHE_ITEMS_REQUIRED_KEEP_TIME: Duration = Duration::from_secs(30); | ||
| const CACHE_ITEMS_REQUIRED_KEEP_TIME_FOR_LARGE_TXNS: Duration = Duration::from_secs(30); |
There was a problem hiding this comment.
The number of large transactions should be much less than normal transactions, perhaps this constant is unncessary if its value is the same.
While the capacity could be smaller for the large transactions.
There was a problem hiding this comment.
I use the same capacity mainly because I wanted to make the configs as few as possible. It can be confusing if we have 2 separate capacity config items for txn_status_cache. Besides, capacity merely sets an upper limit and doesn't directly correlate with the amount of memory actually used.
The individual constant just implies that the keep-time can be different and changed when needed.
9b6d3f0 to
ac3b3c3
Compare
Signed-off-by: ekexium <eke@fastmail.com>
ac3b3c3 to
1eb9a0a
Compare
|
/retest |
|
/retest |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cfzjywxk, zyguan The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: ekexium <eke@fastmail.com>
|
/retest |
|
@ekexium: Your PR was out of date, I have automatically updated it for you. If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
| /// Insert a transaction status into the cache, or update it. The current | ||
| /// system time should be passed from outside to avoid getting system time | ||
| /// repeatedly when multiple items are being inserted. | ||
| pub fn upsert(&self, start_ts: TimeStamp, state: TxnState, now: SystemTime) { |
There was a problem hiding this comment.
Is upsert a common term? Actually I think the word insert is good enough here, on the contrary, the previous behavior (not replacing or promoting when exists) is the thing that's special.
What is changed and how it works?
Issue Number: ref #17459
What's Changed:
Related changes
pingcap/docs/pingcap/docs-cn:Check List
Tests
Side effects
Release note