admission: improve granter logs by tbg · Pull Request #80859 · cockroachdb/cockroach

tbg · 2022-05-02T11:04:02Z

To the uninitiated, they were previously difficult to understand.
Now it's clearer what each number means just from looking at the
output.

While rebasing this PR on top of #80480, I realized how hard it
was (at least for me) to keep track of the many quantities involved. The
way the updates to the state were computed (were the old struct is
updated incrementally in-place) additionally made me worried
that bugs could easily be introduced. Some quantities are cumulative,
some are for the current interval but contain other quantities, etc - it
all melted my brain with ease.

I decided to at least try to make it easier for the next newcomer to
this codebase to get their bearings by constructing all of the updates
first, and then overwriting the old state. I additionally renamed all
of the variables according to a fixed pattern so that it is now clear
which ones are for the current interval or cumulative, and added
commentary to (hopefully) clarify which quantities contain each other.
Additionally, all quantities are now stored for posterity, so in
principle it's now straightforward to write a test case on any of the
intermediate quantities for any desired inputs.

More could be done here but I feel that this leaves things in a good
place. There's a chance that I have written equally intricate code but
happen to like it more because I wrote it. I trust my reviewers to keep
me honest.

Release note: None

cockroach-teamcity · 2022-05-02T11:04:09Z

This change is

sumeerbhola

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)

pkg/util/admission/granter.go line 1727 at r1 (raw file):

		}
		if doLog {
			avgAdded := bytesAdded / int64(admitted)

thanks for doing this. Is this something you wanted to backport to 22.1 and 21.2?
I ask since the log statement is changing with the byte tokens PR

cockroach/pkg/util/admission/granter.go

Line 1832 in 81aa87d

log.Infof(ctx,

It still lacks the humanized formatting, so something that will need to be fixed there too.

tbg

Big update, PTAL. I realize this may make it harder to backport later changes (if we're not backporting this one as well). If you're concerned about that let's discuss dropping this PR.

Please review carefully, I am still worried I messed something up here though the test diffs seem to indicate that I haven't.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)

pkg/util/admission/granter.go line 1727 at r1 (raw file):

Previously, sumeerbhola wrote…

thanks for doing this. Is this something you wanted to backport to 22.1 and 21.2?
I ask since the log statement is changing with the byte tokens PR

cockroach/pkg/util/admission/granter.go

Line 1832 in 81aa87d

log.Infof(ctx,

It still lacks the humanized formatting, so something that will need to be fixed there too.

I'll defer to you in terms of backporting. I'm not sure how good the test coverage here is. I don't seem to have introduced any changes (please carefully review) but there is now a large diff. Then again, not backporting may make it impossible to backport later changes. If you think that it is too risky to make a refactor of the size I'm doing here, I'm happy to hold off, though there isn't typically a better time in the cycle to do so and so it might mean the improvements here won't happen.

I think if this bakes for some time we could probably backport it to 22.1, at least if we're also backporting the bytes change. (Not sure what the plan there was).

tbg · 2022-05-11T10:08:35Z

Rebased, PTAL after Infest.

sumeerbhola

Reviewed 3 of 7 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola and @tbg)

pkg/util/admission/granter.go line 1626 at r3 (raw file):

	// represents what has been given out. The units is bytes.
	totalTokens     int64
	tokensAllocated int64

all of these are moved into ioLoadListenerState, yes? I like the comments and better names.

pkg/util/admission/granter.go line 1750 at r3 (raw file):

	intL0CompactedBytes         int64
	smoothedIntL0CompactedBytes float64

is this smoothed value the same as in ioLoadListenerState? Same question for the two other Smoothed* members in this struct.

pkg/util/admission/granter.go line 1812 at r3 (raw file):

	// intIngestedBytes are the bytes ingested since the last token adjustment. Bytes
	// tracked here are also tracked in intAdmittedBytes.
	intIngestedBytes := cumAdmissionStats.ingestedBytes - prev.cumAdmissionStats.ingestedBytes

s/intIngestedBytes/intIngestedAccountedBytes/

pkg/util/admission/granter.go line 1860 at r3 (raw file):

		// Keep the old smoothed value around, i.e. don't update it with a zero and
		// don't zero it out.
		// TODO(during review): why?

we've not had work flow through admission control, so we don't have any reasonable interval data to update our previous estimates.

pkg/util/admission/granter.go line 1864 at r3 (raw file):

	}

	// TODO(during review): aren't we double-penalizing here? Let's say 1GB was

I think what you are suggesting is potentially a higher fidelity model, but I maybe misunderstanding what you are saying.
We are currently taking all the unaccounted bytes that landed into L0 (ingests or flush) and spreading them evenly over what was admitted. This will then be fed into storeRequestEstimates.workByteAddition which is used by WorkQueue for both regular writes and ingests.

I don't understand the "penalize ingestions" question. The l0BytesIngestedFraction is solely based on what was reported by the WorkQueue. That is, intIngestedBytes is not from the Pebble metrics. This fraction calculation is only to minimize the error in the interval between Admit and AdmittedWorkDone: if there a 1MB ingest, we want to make a reasonable guess on how many tokens to deduct based on past observations, and then we will fix any error in AdmittedWorkDone. So this is not saying anything about what silently bypassed admission control.
Renaming intIngestedBytes to intIngestedAccountedBytes would make this clearer.

pkg/util/admission/granter.go line 1907 at r3 (raw file):

		}
	} else {
		// Under the threshold. Maintain a smoothedNumTokens based on what was

Maintain a smoothedTotalNumByteTokens ...

pkg/util/admission/work_queue.go line 1650 at r3 (raw file):

			q.mu.stats.admittedBytes += uint64(h.writeBytes)
			if h.ingestRequest {
				q.mu.stats.ingestedBytes += uint64(h.writeBytes)

ingestedAccountedBytes?
Though by definition, all the WorkQueue stats are accounted.

To the uninitiated, they were previously difficult to understand. Now it's clearer what each number means just from looking at the output. While rebasing this PR on top of cockroachdb#80480, I realized how hard it was (at least for me) to keep track of the many quantities involved. The way the updates to the state were computed (were the old struct is updated incrementally in-place) additionally made me worried that bugs could easily be introduced. Some quantities are cumulative, some are for the current interval but contain other quantities, etc - it all melted my brain with ease. I decided to at least try to make it easier for the next newcomer to this codebase to get their bearings by constructing all of the updates first, and then overwriting the old state. I additionally renamed all of the variables according to a fixed pattern so that it is now clear which ones are for the current interval or cumulative, and added commentary to (hopefully) clarify which quantities contain each other. Additionally, all quantities are now stored for posterity, so in principle it's now straightforward to write a test case on any of the intermediate quantities for any desired inputs. More could be done here but I feel that this leaves things in a good place. There's a chance that I have written equally intricate code but happen to like it more because I wrote it. I trust my reviewers to keep me honest. Release note: None

tbg

Dismissed @sumeerbhola from 4 discussions.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)

pkg/util/admission/granter.go line 1626 at r3 (raw file):

Previously, sumeerbhola wrote…

all of these are moved into ioLoadListenerState, yes? I like the comments and better names.

Yep

pkg/util/admission/granter.go line 1750 at r3 (raw file):

Previously, sumeerbhola wrote…

is this smoothed value the same as in ioLoadListenerState? Same question for the two other Smoothed* members in this struct.

Good catch. I accidentally duplicated some of these fields. Cleaned this

pkg/util/admission/granter.go line 1812 at r3 (raw file):

Previously, sumeerbhola wrote…

s/intIngestedBytes/intIngestedAccountedBytes/

Done.

pkg/util/admission/granter.go line 1860 at r3 (raw file):

Previously, sumeerbhola wrote…

we've not had work flow through admission control, so we don't have any reasonable interval data to update our previous estimates.

Updated with this comment.
Still, is this the right thing to do? You'd expect us to "always measure/update" but just not throttle when admission control is off. Is this just reflecting the fact that when admission control was introduced, we wanted the "off" switch to completely disable most code related to it? As is, it seems that once admission control kicks in again after a long period of being below threshold, we'll initially start out with a mostly bogus smoothed value here. I think this is fine, just trying to deeply understand the rationale here.

pkg/util/admission/granter.go line 1864 at r3 (raw file):

Previously, sumeerbhola wrote…

I think what you are suggesting is potentially a higher fidelity model, but I maybe misunderstanding what you are saying.
We are currently taking all the unaccounted bytes that landed into L0 (ingests or flush) and spreading them evenly over what was admitted. This will then be fed into storeRequestEstimates.workByteAddition which is used by WorkQueue for both regular writes and ingests.

I don't understand the "penalize ingestions" question. The l0BytesIngestedFraction is solely based on what was reported by the WorkQueue. That is, intIngestedBytes is not from the Pebble metrics. This fraction calculation is only to minimize the error in the interval between Admit and AdmittedWorkDone: if there a 1MB ingest, we want to make a reasonable guess on how many tokens to deduct based on past observations, and then we will fix any error in AdmittedWorkDone. So this is not saying anything about what silently bypassed admission control.
Renaming intIngestedBytes to intIngestedAccountedBytes would make this clearer.

I hadn't understood this properly, with your explanation it makes sense.

pkg/util/admission/granter.go line 1907 at r3 (raw file):

Previously, sumeerbhola wrote…

Maintain a smoothedTotalNumByteTokens ...

Done.

pkg/util/admission/work_queue.go line 1650 at r3 (raw file):

Previously, sumeerbhola wrote…

ingestedAccountedBytes?
Though by definition, all the WorkQueue stats are accounted.

If we're going to do the accounted infix (and it looks like we are), then it's better to have it everywhere. Made sure it's present in all of the storeAdmissionStats byte counters.

tbg · 2022-05-17T13:32:03Z

On bazel, TestDataDriven/multiregion failed. On regular CI, TestLogic/5node/distsql_stats

But neither are this PR's fault.

bors r=sumeerbhola

TFTR!

Will send a follow-up if you have further comments.

craig · 2022-05-17T14:28:30Z

Build succeeded:

GitHub CI (Cockroach)

I broke this in cockroachdb#80859. Release note: None

81242: storage: use `SetOptions()` when reusing Pebble iterators r=jbowens a=erikgrinaker Iterator reuse now relies on `Pebble.SetOptions()` to configure the reused Pebble iterator. This allows a wider range of iterators to be reused, since previously only the bounds could be changed on existing iterators. Release note: None 81661: admission: avoid spurious logging when not in IO overload r=sumeerbhola a=tbg I broke this in #80859. Release note: None Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>

I broke this in cockroachdb#80859. Release note: None

tbg requested a review from sumeerbhola May 2, 2022 11:04

tbg force-pushed the granter-logs branch from 500a1f7 to 4f14bbf Compare May 2, 2022 11:17

tbg marked this pull request as ready for review May 2, 2022 11:17

tbg force-pushed the granter-logs branch from 4f14bbf to f0bd62f Compare May 2, 2022 11:23

sumeerbhola reviewed May 3, 2022

View reviewed changes

tbg force-pushed the granter-logs branch from f0bd62f to 560fd41 Compare May 3, 2022 19:07

tbg requested a review from sumeerbhola May 3, 2022 19:13

tbg commented May 3, 2022

View reviewed changes

tbg force-pushed the granter-logs branch from 560fd41 to 88616e4 Compare May 11, 2022 10:08

sumeerbhola approved these changes May 16, 2022

View reviewed changes

tbg requested a review from sumeerbhola May 17, 2022 07:36

tbg commented May 17, 2022

View reviewed changes

tbg force-pushed the granter-logs branch from 88616e4 to 48dacdb Compare May 17, 2022 07:40

craig bot merged commit 4216e6e into cockroachdb:master May 17, 2022

tbg added a commit to tbg/cockroach that referenced this pull request May 23, 2022

admission: avoid spurious logging when not in IO overload

2354f8d

I broke this in cockroachdb#80859. Release note: None

tbg mentioned this pull request May 23, 2022

admission: avoid spurious logging when not in IO overload #81661

Merged

tbg added a commit to tbg/cockroach that referenced this pull request May 23, 2022

admission: avoid spurious logging when not in IO overload

50649b6

I broke this in cockroachdb#80859. Release note: None

andrewbaptist pushed a commit to andrewbaptist/cockroach that referenced this pull request May 25, 2022

admission: avoid spurious logging when not in IO overload

b63d8f2

I broke this in cockroachdb#80859. Release note: None

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

admission: improve granter logs#80859

admission: improve granter logs#80859
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:granter-logs

tbg commented May 2, 2022 •

edited

Loading

Uh oh!

cockroach-teamcity commented May 2, 2022

Uh oh!

sumeerbhola left a comment

Uh oh!

tbg left a comment

Uh oh!

tbg commented May 11, 2022

Uh oh!

sumeerbhola left a comment

Uh oh!

tbg left a comment

Uh oh!

tbg commented May 17, 2022

Uh oh!

craig bot commented May 17, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

tbg commented May 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented May 2, 2022

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

tbg commented May 11, 2022

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

tbg commented May 17, 2022

Uh oh!

craig bot commented May 17, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tbg commented May 2, 2022 •

edited

Loading