Skip to content

changefeedccl: avoid telemetry double count on txn retry#91812

Closed
stevendanna wants to merge 1 commit intocockroachdb:masterfrom
stevendanna:telemetry-double-count
Closed

changefeedccl: avoid telemetry double count on txn retry#91812
stevendanna wants to merge 1 commit intocockroachdb:masterfrom
stevendanna:telemetry-double-count

Conversation

@stevendanna
Copy link
Copy Markdown
Collaborator

@stevendanna stevendanna commented Nov 13, 2022

Previously, if the ALTER CHANGEFEED txn retried, we would increment these counters again.

In #91563 we are making a change that made it more likely that this transaction may retry during one of the test, revealing the issue.

Epic: None

Release note: None

@stevendanna stevendanna requested a review from a team as a code owner November 13, 2022 19:39
@stevendanna stevendanna requested review from miretskiy and removed request for a team November 13, 2022 19:39
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

}

func (a *telemetryAggregator) CountBucketed(path string, value int64) {
a.txn.AddCommitTrigger(func(context.Context) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't that add multiple triggers if txn retries?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We clear out the transaction when we prepare for retry. But, your comment prompted me to go look and there is a code comment that points to #18170 which indicates maybe we might have a problem still along some code paths.

Perhaps we can install a single callback and use something like sync.Once to make sure we only call it once even if it doesn't get reset.

Copy link
Copy Markdown
Collaborator Author

@stevendanna stevendanna Nov 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a test that checks that we don't double count on a retry-like event (I rollback to a SAVEPOINT at the start of the txn).

Still not sure I love using AddCommitTrigger for this though. I suppose one question is what we want the semantics of these counters to be. For instance, it seems reasonable that an automatic txn retry isn't double counted. But the case in the test I added is less clear. If the user interactively typed the statement twice and just happened to rollback. Using AddCommitTrigger also have the side-effect of only counting it in the case of a successful commit. Which, might be a bit odd if we are just trying to get a sense of what features are being used.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree re txn.AddCommitTrigger -- not a fan of it.
I was about to suggest sync.Once;
What do you think about something like https://gist.github.com/miretskiy/2800e81ea4ce779eb02a86d21e143879 instead? Just use withTelemetryUpdate fn in alter statement

This sounds like a bigger problem though, right? Every hook based statement that updates telemetry (which is every hook based statement) will be impacted? Do we have an issue for that?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a bigger problem though, right?

Yes, my guess is it affects all of the hook based statements.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a small issue here for it: #91827

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about something like gist.github.com/miretskiy/2800e81ea4ce779eb02a86d21e143879 instead? Just use withTelemetryUpdate fn in alter statement

I'm not sure that works because I believe the entire plan hook gets re-run on a retry.

Previously, if the ALTER CHANGEFEED txn retried, we would increment
these counters again.

In cockroachdb#91563 we are making a change that made it more likely that this
transaction may retry during one of the test, revealing the issue.

Release note: None
@stevendanna
Copy link
Copy Markdown
Collaborator Author

If we do something here, it won't be this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants