*: Fix incorrect mapped allocation per thread metric (#18126) #18564

Merged
ti-chi-bot[bot] merged 1 commit into tikv:release-7.5 from ti-chi-bot:cherry-pick-18126-to-release-7.5 on Jul 11, 2025
Conversation

@ti-chi-bot
Member

This is an automated cherry-pick of #18126

What is changed and how it works?

Issue Number: Close #18125

What's Changed:

Fix incorrect mapped allocation per thread metric

Not all thread builders are hooked by `thread_allocate_exclusive_arena`, so some threads use a shared arena, which makes the per-thread allocation metric incorrect.
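
To illustrate the idea of hooking every thread builder, here is a minimal sketch. It assumes a hypothetical helper `spawn_hooked` and a stub `allocate_exclusive_arena_stub` that only counts calls; neither is TiKV code, and the stub does not touch jemalloc. The point is only that routing every spawn through one wrapper leaves no thread without the hook.

```rust
use std::io;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for `thread_allocate_exclusive_arena`; it only counts
// how many threads ran the hook and does not touch jemalloc.
static HOOKED_THREADS: AtomicUsize = AtomicUsize::new(0);

fn allocate_exclusive_arena_stub() -> Result<(), String> {
    HOOKED_THREADS.fetch_add(1, Ordering::SeqCst);
    Ok(())
}

// Hypothetical helper: every thread spawned through it runs the hook before
// the caller's closure, so no spawn path can skip it.
fn spawn_hooked<F, T>(name: &str, f: F) -> io::Result<JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new().name(name.to_owned()).spawn(move || {
        allocate_exclusive_arena_stub().expect("arena hook failed");
        f()
    })
}

// Number of threads that have run the hook so far.
fn hooked_thread_count() -> usize {
    HOOKED_THREADS.load(Ordering::SeqCst)
}
```

A caller would use `spawn_hooked("worker", || do_work())` wherever it previously used a bare `thread::Builder`.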

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note

Fix incorrect mapped allocation per thread metric

@ti-chi-bot ti-chi-bot added dco-signoff: yes Indicates the PR's author has signed the dco. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. type/cherry-pick-for-release-7.5 This PR is cherry-picked to release-7.5 from a source PR. labels Jun 18, 2025
@ti-chi-bot
Member Author

@hhwyt This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@hhwyt
Contributor

hhwyt commented Jun 19, 2025

Here is the test result: #18563 (comment).

@hhwyt hhwyt requested review from Connor1996, glorv and overvenus June 19, 2025 07:34
@hhwyt hhwyt force-pushed the cherry-pick-18126-to-release-7.5 branch from b11246b to 2107add Compare June 19, 2025 07:35
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 19, 2025
@hhwyt hhwyt force-pushed the cherry-pick-18126-to-release-7.5 branch from 2107add to 1385073 Compare June 19, 2025 07:36
@hhwyt
Contributor

hhwyt commented Jun 19, 2025

Once #18563 is merged, I'll rebase the first commit from master.

@hhwyt hhwyt force-pushed the cherry-pick-18126-to-release-7.5 branch 2 times, most recently from 6d43d21 to 3341338 Compare June 19, 2025 07:42
@hhwyt hhwyt requested a review from cfzjywxk June 19, 2025 07:42
Member

@Connor1996 Connor1996 left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jun 19, 2025
@zhangjinpeng87
Member

bugbot run

@cursor cursor bot left a comment

Bug: Metric Descriptor Missing in Collector

The arena_count metric is initialized and collected, but its descriptor is not added to the AllocStatsCollector's descs field. This prevents proper registration of the metric with Prometheus.

components/tikv_util/src/metrics/allocator_metrics.rs#L51-L60

Ok(AllocStatsCollector {
    descs: [&stats, &thread_stats, &allocation]
        .iter()
        .flat_map(|m| m.desc().into_iter().cloned())
        .collect(),
    memory_stats: stats,
    thread_stats,
    allocation,
    arena_count,
})
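
The mismatch reported here can be sketched with simplified stand-in types. `Metric` and `Collector` below are hypothetical and are not the Prometheus client API; the metric names are placeholders. The sketch only shows one way to keep the descriptor list and the collected metrics from drifting apart: derive both from the same list.

```rust
// Simplified stand-ins, not the Prometheus client API: a "metric" here is just
// a name plus a value, and `desc()` returns the name it would register.
struct Metric {
    name: &'static str,
    value: i64,
}

impl Metric {
    fn desc(&self) -> &'static str {
        self.name
    }
}

struct Collector {
    descs: Vec<&'static str>,
    metrics: Vec<Metric>,
}

impl Collector {
    // `descs` is built from the same list that `collect` reports, so adding
    // a metric cannot leave its descriptor behind.
    fn new(metrics: Vec<Metric>) -> Self {
        let descs = metrics.iter().map(|m| m.desc()).collect();
        Collector { descs, metrics }
    }

    // Reports every metric as a (name, value) pair.
    fn collect(&self) -> Vec<(&'static str, i64)> {
        self.metrics.iter().map(|m| (m.name, m.value)).collect()
    }
}
```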


Bug: Thread Panic on Jemalloc Failure

The thread_allocate_exclusive_arena().unwrap() call can cause a thread panic if jemalloc operations (e.g., arena creation) fail. This unhandled panic, particularly in thread pool contexts, can prevent threads from starting or disrupt the entire pool. Graceful error handling (logging and continuing) is recommended, as thread startup failure is more disruptive than missing allocation metrics.

components/server/src/server2.rs#L307-L308

unsafe { add_thread_memory_accessor() };
thread_allocate_exclusive_arena().unwrap();

components/server/src/server.rs#L315-L316

unsafe { add_thread_memory_accessor() };
thread_allocate_exclusive_arena().unwrap();

components/tikv_util/src/yatp_pool/mod.rs#L189-L190

tikv_alloc::add_thread_memory_accessor();
tikv_alloc::thread_allocate_exclusive_arena().unwrap();

components/tikv_util/src/sys/thread.rs#L434-L435

unsafe { add_thread_memory_accessor() };
thread_allocate_exclusive_arena().unwrap();

components/tikv_util/src/sys/thread.rs#L458-L459

unsafe { add_thread_memory_accessor() };
thread_allocate_exclusive_arena().unwrap();

components/tikv_util/src/sys/thread.rs#L483-L484

unsafe { add_thread_memory_accessor() };
thread_allocate_exclusive_arena().unwrap();
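
The "log and continue" handling recommended above could look roughly like the following sketch. `allocate_exclusive_arena_stub` is a hypothetical stand-in for `thread_allocate_exclusive_arena`; the real function's error type is not shown in this thread, so `String` is an assumption, and `eprintln!` stands in for whatever logging macro the project uses.

```rust
use std::thread;

// Hypothetical stand-in for `thread_allocate_exclusive_arena`; the `fail`
// flag simulates a jemalloc failure and the `String` error type is assumed.
fn allocate_exclusive_arena_stub(fail: bool) -> Result<(), String> {
    if fail {
        Err("arena creation failed".to_owned())
    } else {
        Ok(())
    }
}

// Runs the hook, logs on failure, and lets the thread continue. Returns
// whether the thread got its own arena.
fn init_thread_arena(fail: bool) -> bool {
    match allocate_exclusive_arena_stub(fail) {
        Ok(()) => true,
        Err(e) => {
            eprintln!("exclusive arena allocation failed, metric may be inaccurate: {e}");
            false
        }
    }
}

// Spawns a worker that initializes its arena and then does its real work
// regardless of whether the hook succeeded.
fn run_worker(fail: bool) -> (bool, u32) {
    thread::spawn(move || {
        let exclusive = init_thread_arena(fail);
        (exclusive, 42)
    })
    .join()
    .expect("worker thread panicked")
}
```

The trade-off is that a failed hook leaves that thread's allocation metric unreliable, but the thread still starts.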

@hhwyt
Contributor

hhwyt commented Jun 24, 2025

@zhangjinpeng87

Bug: Metric Descriptor Missing in Collector

This is true, but it does not affect the Grafana dashboard display, so it can be ignored.

Bug: Thread Panic on Jemalloc Failure

This is true. Arena allocation is non-core logic, so a failure there should not make TiKV panic, and the error handling could be more graceful. However, such failures are rare in practice, so I think it's OK to keep the current implementation.

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 10, 2025
@ti-chi-bot
Contributor

ti-chi-bot bot commented Jul 10, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-06-19 08:21:08.152141632 +0000 UTC m=+346320.875320614: ☑️ agreed by Connor1996.
  • 2025-07-10 02:41:05.028607477 +0000 UTC m=+2140317.751786453: ☑️ agreed by LykxSassinator.

@hhwyt
Contributor

hhwyt commented Jul 10, 2025

/hold

@ti-chi-bot ti-chi-bot bot added cherry-pick-approved Cherry pick PR approved by release team. and removed do-not-merge/cherry-pick-not-approved labels Jul 10, 2025
@ti-chi-bot ti-chi-bot bot added the approved label Jul 11, 2025
@glorv
Contributor

glorv commented Jul 11, 2025

/retest

@hhwyt
Contributor

hhwyt commented Jul 11, 2025

/hold

close tikv#18125

Fix incorrect mapped allocation per thread metric

Not all thread builders are hooked by `thread_allocate_exclusive_arena`, so some threads are using shared arena, causing incorrect per thread allocation.

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
(cherry picked from commit 18f4419)
Signed-off-by: hhwyt <hhwyt1@gmail.com>
@hhwyt hhwyt force-pushed the cherry-pick-18126-to-release-7.5 branch from 3341338 to 34ac9d6 Compare July 11, 2025 07:44
@ti-chi-bot
Contributor

ti-chi-bot bot commented Jul 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, Connor1996, LykxSassinator

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [Connor1996,LykxSassinator,cfzjywxk]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hhwyt
Contributor

hhwyt commented Jul 11, 2025

/unhold

@ti-chi-bot ti-chi-bot bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 11, 2025
@ti-chi-bot ti-chi-bot bot merged commit 664fb07 into tikv:release-7.5 Jul 11, 2025
4 checks passed
Labels

approved cherry-pick-approved Cherry pick PR approved by release team. dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. type/cherry-pick-for-release-7.5 This PR is cherry-picked to release-7.5 from a source PR.

7 participants