On demand heartbeats via --heartbeat_on_demand_duration, used by the tablet throttler#10198
Merged
shlomi-noach merged 17 commits intovitessio:mainfrom May 9, 2022
Merged
Conversation
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Contributor
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
rohit-nayak-ps
previously approved these changes
May 3, 2022
doc/releasenotes/14_0_0_summary.md
Outdated
|
|
||
| The default value for `--heartbeat_on_demand_duration` is zero, which means the flag is not set and there is no change in behavior. | ||
|
|
||
| When `--heartbeat_on_demand_duration` has a positive value, then heartbeats are only injected on demand, per internal requests. For example, when `--heartbeat_on_demand_duration=5s`, the tablet starts without injecting heartbeats. An internal module, like the lag throttle, may request the heartbeat writer for heartbeats. Starting at that point in time, and for the duration of `5s` in our example, the tablet will write heartbeats. If no other requests come in during that duraiton, then the tables then ceases to write heartbeats. If more requests for heartbeats come while heartbeats are being written, then the tablet extends the heartbeat duration for the next `5s` following up each request. Thus, it stops writing heartbeats `5s` after the last request is received. |
Member
There was a problem hiding this comment.
duraiton=>duration
"the tables then ceases" => tablet
doc/releasenotes/14_0_0_summary.md
Outdated
|
|
||
| The throttler now checks in with the heartbeat writer to request heartbeats, any time it (the throttler) is asked for a check. | ||
|
|
||
| When `--heartbeat_on_demand_duration` is not set, there is now change in behavior. |
| } | ||
| // In this function we're going to create a timer to activate heartbeats by-demand. Creating a timer has a cost. | ||
| // Now, this function can be spammed by clients (the lag throttler). We therefore only allow this function to | ||
| // actually once per X seconds (1/4 of onDemandDuration as a reasonable oversampling value): |
@shlomi: do you need to add the heartbeat module to git?
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Contributor
Author
|
I think there's a legitimate CI failure in |
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…emand Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Contributor
Author
|
OK, solved the failing test - which actually validated the changes here. Added more testing to validate the changes. |
2 tasks
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Contributor
Author
|
some more legit failures. Will only look at these next week. |
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…he change to create _vt.schema_migrations. This is a simple fix to ensure table exists before issuing any query from QueryExecutor Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Contributor
Author
|
Good for a renewed review. Fixed tests. I believe I reduced some flakiness, or at least did not introduce new ones! |
This was referenced May 9, 2022
Merged
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fixes #10196, and see the general description of the problem and suggested fix in #10196.
This PR introduces a new command line flag,
--heartbeat_on_demand_duration. By default it's zero, which means it's disabled and has no effect (no change in behavior).When positive, e.g.
--heartbeat_on_demand_duration=5s, and assuming--heartbeat_enableis set, then the tablet does not in fact start generating heartbeats on startup. Instead, it only generates heartbeats on a lease per demand.In this PR the tablet throttler is currently the only entity which asks the heartbeats writer to generate on demand heartbeats. Right now, the tablet throttler makes such requests any time anyone runs a throttler
check. Normally, no one does, and so the throttler does not make any request for heartbeats, and so no heartbeats are generated.However, whenever a
vreplicationworkflow begins, or whenever an Online DDL operation begins, the throttler is called (assuming--enable_lag_throttleris enabled, of course), and in turn the throttler asks the heartbeat writer to generate heartbeats.Once the workflow or migration complete, and assuming no other workflows/migrations running, the throttler stops asking for heartbeats, the leas expires, the heartbeat writer ceases to write heartbeats.
The effect is that we keep the binary logs small, as we don't write heartbeats when those are not strictly needed.
As tests go, I've added
--heartbeat_on_demand_duration=5sto allendtoend/onlineddltests where--enable_lag_throttleris set; a coupleendtoendwhere--heartbeat_on_demand_duration=5sis set are left without--heartbeat_on_demand_duration.Related Issue(s)
#10196
Checklist
Deployment Notes