Skip to content

scheduledlogging: shorten TestCaptureIndexUsageStats run time#89319

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
THardy98:shorten_capture_index_usage_stats_test
Oct 6, 2022
Merged

scheduledlogging: shorten TestCaptureIndexUsageStats run time#89319
craig[bot] merged 1 commit intocockroachdb:masterfrom
THardy98:shorten_capture_index_usage_stats_test

Conversation

@THardy98
Copy link
Copy Markdown

@THardy98 THardy98 commented Oct 4, 2022

Resolves: #87772

Previously, TestCaptureIndexUsageStats ran through 4 iterations of 20 seconds for a run time of over a minute. This change reduces the run time of the test to under 10 seconds.

Release note: None

@THardy98 THardy98 requested review from a team and knz October 4, 2022 18:54
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from cd40aa4 to 2ebdca3 Compare October 4, 2022 19:11
// Verify that a second schedule has run after the enabled interval has passed.
// Wait for channel value from end of 1st schedule.
<-scheduleCompleteChan
time.Sleep(timeBuffer)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? Can't you have the main code signal to the test when it's finished writing to logs?

Copy link
Copy Markdown
Author

@THardy98 THardy98 Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main code signals to the channel here:

https://github.com/cockroachdb/cockroach/pull/89319/files#diff-567a23559822ff60b8d9d63b7fda0e82a78d11fa6dced531e3f6d24966282362R165

after the logging has completed here:
https://github.com/cockroachdb/cockroach/pull/89319/files#diff-567a23559822ff60b8d9d63b7fda0e82a78d11fa6dced531e3f6d24966282362R153.

For some reason - that I have not been able to figure out why - the logs don't appear if you check immediately, despite logging being completed. The 1 second buffer seems to consistently allow for the logs to appear in the log file, including under stress.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you're missing log.Flush?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, was missing log.Flush, will remove the time.Sleep calls.

// Verify that a third schedule has run after the overlap duration has passed.
// Wait for channel value from end of 2nd schedule.
<-scheduleCompleteChan
time.Sleep(timeBuffer)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed time.Sleep

}, sd.getOverlapDuration()+timeBuffer)
// Wait for channel value from end of 3rd schedule.
<-scheduleCompleteChan
time.Sleep(timeBuffer)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed time.Sleep

@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from 2ebdca3 to 2f0dc3f Compare October 5, 2022 15:11
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objection from me, but please get a review from your own team and teach them what's going on here.

Reviewed 3 of 3 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @THardy98)

@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from 2f0dc3f to 4a3faf8 Compare October 5, 2022 15:49
Copy link
Copy Markdown
Contributor

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reasoning behind using a channel directly in the knobs over a callback function (that then does the same thing, but keeps the channel local to the test). Just wondering if a callback might offer some more flexibility in the future.

@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from 4a3faf8 to 127b34f Compare October 5, 2022 19:19
@THardy98
Copy link
Copy Markdown
Author

THardy98 commented Oct 5, 2022

What was the reasoning behind using a channel directly in the knobs over a callback function (that then does the same thing, but keeps the channel local to the test). Just wondering if a callback might offer some more flexibility in the future.

None, I hadn't considered a callback function though I think it's a better idea for the reasons you mentioned. Changed to use a callback instead.

@xinhaoz
Copy link
Copy Markdown
Contributor

xinhaoz commented Oct 5, 2022

pkg/sql/scheduledlogging/captured_index_usage_stats.go line 116 at r3 (raw file):

	}
	// Otherwise, schedule the next interval normally.
	return telemetryCaptureIndexUsageStatsInterval.Get(&s.st.SV) - s.getLoggingDuration()

Can we save the result of s.getLoggingDuration() prior to this block? I guess there is an off chance that the second call here will actually result in a negative value otherwise.

Code quote:

	if s.getLoggingDuration() >= telemetryCaptureIndexUsageStatsInterval.Get(&s.st.SV) {
		return s.durationOnOverlap()
	}
	// Otherwise, schedule the next interval normally.
	return telemetryCaptureIndexUsageStatsInterval.Get(&s.st.SV) - s.getLoggingDuration()

Previously, TestCaptureIndexUsageStats ran through 4 iterations of 20
seconds for a run time of over a minute. This change reduces the run
time of the test to under 10 seconds.

Release note: None
@THardy98 THardy98 force-pushed the shorten_capture_index_usage_stats_test branch from 127b34f to bb18504 Compare October 6, 2022 14:02
Copy link
Copy Markdown
Author

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @xinhaoz)


pkg/sql/scheduledlogging/captured_index_usage_stats.go line 116 at r3 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

Can we save the result of s.getLoggingDuration() prior to this block? I guess there is an off chance that the second call here will actually result in a negative value otherwise.

Done

Copy link
Copy Markdown
Contributor

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just wondering if anyone knows the answer to the question below.

return
case <-timer.C:
if !telemetryCaptureIndexUsageStatsEnabled.Get(&s.st.SV) {
timer.Reset(telemetryCaptureIndexUsageStatsStatusCheckEnabledInterval.Get(&s.st.SV))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wondering out loud here, but I noticed this timer was created directly from the std lib over the one wrapped in timeutil pkg. I recall reading that the timeutil one had a fix relating to Reset or something. Does it matter which one we're using when the return value of Reset isn't being used? 🤔

@THardy98
Copy link
Copy Markdown
Author

THardy98 commented Oct 6, 2022

TYFR :)

@THardy98
Copy link
Copy Markdown
Author

THardy98 commented Oct 6, 2022

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 6, 2022

Build succeeded:

@craig craig bot merged commit ea43f89 into cockroachdb:master Oct 6, 2022
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Oct 6, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from bb18504 to blathers/backport-release-22.1-89319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


error creating merge commit from bb18504 to blathers/backport-release-22.2-89319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


error creating merge commit from bb18504 to blathers/backport-release-22.2.0-89319: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.0 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@maryliag
Copy link
Copy Markdown
Contributor

looks like your backports failed to be created, can you create them? (you don't need one for 22.2.0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

scheduledlogging: TestCaptureIndexUsageStats is unreasonably long to run

5 participants