Tablet throttler: get remote tablets metrics from Realtime Stats, with auto-detection #13034
Closed
shlomi-noach wants to merge 9 commits into vitessio:main
…imeStats Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…id actively probing for relevant tablet Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
This PR is being marked as stale because it has been open for 30 days with no activity. If no action is taken within 7 days, this PR will be closed.
We're not going to pursue this path. Instead, we will replace the throttler's HTTP calls with RPC calls.
Description
An enhancement of #13018; per #13018 (comment), this is a modification where the newly introduced `--feature-throttler-read-realtime-stats` command line flag is not required, and is removed in this PR.

In this PR we track the availability of throttler metrics in `RealtimeStats`. If a throttler metric was seen in `RealtimeStats` in the past minute, we do not run probes on the relevant tablet. If no metric has been seen for a tablet in the past minute, then the throttler runs the usual probes (currently HTTP based) for that tablet.

I'm not sure this approach is better than #13018, and the reason has to do with probe frequency. The `PRIMARY` tablet runs the standard probes at subsecond intervals. However, it has no control over the probing frequency in other tablets. Thus, if `--health_check_interval` is high on replica tablets, say `10s`, the `PRIMARY` has low resolution for throttler metrics (in particular, replication lag).

This approach does make sense when the throttler's threshold accommodates `--health_check_interval`. For example, a `health_check_interval` of `5s` makes sense if the throttler is configured for the default replication lag metric and the threshold is set to, say, `30s`. But if the threshold is `5s`, then I'd expect a `health_check_interval` of `1s`-`2s`.