fix tiflash resource control low token signal miss#10140
fix tiflash resource control low token signal miss#10140ti-chi-bot[bot] merged 23 commits intopingcap:masterfrom
Conversation
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
There was a problem hiding this comment.
Pull Request Overview
This PR fixes the missed low token signal in TiFlash resource control by updating the TokenBucket low token threshold calculation, refining time-based functions with explicit timestamps, and reducing the minimum wait duration.
- Update low token threshold to use initial tokens instead of capacity.
- Revise TokenBucket functions (consume, peek, reConfig, compact, getDynamicTokens) to accept an explicit timestamp parameter.
- Reduce min_wait_dura_ms from 100ms to 10ms and update wait duration usage in the scheduling queue.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| dbms/src/Flash/ResourceControl/TokenBucket.h | Adjusted initialization of low token threshold and updated function signatures to accept timestamps. |
| dbms/src/Flash/ResourceControl/TokenBucket.cpp | Refactored functions to use provided timestamps; removed unnecessary iostream include. |
| dbms/src/Flash/ResourceControl/MockLocalAdmissionController.cpp | Updated reConfig call to include current time. |
| dbms/src/Flash/Pipeline/Schedule/TaskQueues/tests/gtest_resource_control_queue.cpp | Updated reConfig calls to include timestamps. |
| dbms/src/Flash/Pipeline/Schedule/TaskQueues/ResourceControlQueue.cpp | Updated wait duration variable and comments to reflect new timeout logic. |
| dbms/src/Common/TiFlashMetrics.h | Added new metrics for low token threshold, GAC requests, and degradation mode. |
There was a problem hiding this comment.
Pull Request Overview
This PR fixes the low token signal issue in the TiFlash resource control by adjusting how the low token threshold is computed, refactoring time-related parameters in token bucket methods, and reducing the minimal wait duration. Key changes include:
- Changing the low token threshold calculation in TokenBucket to use init_tokens instead of capacity.
- Updating related APIs (consume, reConfig, peek, compact, getDynamicTokens) to require an explicit timestamp.
- Adjusting the minimal wait duration from 100ms to 10ms and updating related configurations and metrics.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| dbms/src/Flash/ResourceControl/TokenBucket.h | Updated threshold calculation and API signatures to include time. |
| dbms/src/Flash/ResourceControl/TokenBucket.cpp | Modified function implementations to use the new time parameter. |
| dbms/src/Flash/ResourceControl/MockLocalAdmissionController.cpp | Updated reConfig call to include a timestamp. |
| dbms/src/Flash/Pipeline/Schedule/TaskQueues/tests/gtest_resource_control_queue.cpp | Adjusted reConfig invocation with proper timestamp. |
| dbms/src/Flash/Pipeline/Schedule/TaskQueues/ResourceControlQueue.cpp | Changed wait duration assignment for scheduling tasks. |
| dbms/src/Common/TiFlashMetrics.h | Added new metrics for low token threshold and GAC-related counts. |
Comments suppressed due to low confidence (2)
dbms/src/Flash/ResourceControl/TokenBucket.h:47
- [nitpick] Please confirm that computing low_token_threshold using init_tokens_ instead of capacity_ is the intended behavior, as this change may affect the signaling logic.
, low_token_threshold(LOW_TOKEN_THRESHOLD_RATE * init_tokens_)
dbms/src/Flash/Pipeline/Schedule/TaskQueues/ResourceControlQueue.cpp:117
- [nitpick] Verify that using DEFAULT_MAX_EST_WAIT_DURATION for wait_dura achieves the desired scheduling responsiveness compared to the previous DEFAULT_FETCH_GAC_INTERVAL_MS.
UInt64 wait_dura = LocalAdmissionController::DEFAULT_MAX_EST_WAIT_DURATION.count();
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Refine some error message for troubleshooting
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JaySon-Huang, solotzg The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
Signed-off-by: JaySon-Huang <tshent@qq.com>
|
/run-cherry-picker |
|
In response to a cherrypick label: new pull request created to branch |
|
In response to a cherrypick label: new pull request created to branch |
|
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
What problem does this PR solve?
Issue Number: close #10137
Problem Summary:
before
wait_forignore low token signalafter:
mainLoopthread will compute ru speed every 1srequestGACLoopthread is dedicated to sending gac threadmin_wait_dura_msfrom 100ms to 10ms.What is changed and how it works?
Check List
Tests
high.change.sh
periodically.change.sh
priority.change.sh

burstable.change.sh

Side effects
Documentation
Release note