jobs: fix test for batch jobs creation, marked as flaky#69014
craig[bot] merged 1 commit into cockroachdb:master from
Conversation
ajwerner
left a comment
The test runs out of memory sometimes as it creates a very
large batch of jobs.
While I could be fine with removing this test, I think we need to unpack this analysis more deeply. 5k jobs is not a lot of memory. I see the log from the failure says could not mark as succeeded: log-job: root: memory budget exceeded: 10240 bytes requested, 134215680 currently allocated, 134217728 bytes in budget, but I think we need to ask what this means, as it's far-fetched that 5k empty jobs would use 132 MiB.
Please reference the issue that this PR corresponds to.
Reviewable status:
complete! 0 of 0 LGTMs obtained
ajwerner
left a comment
It seems to me like the problem here is that all these jobs are getting adopted, and we really don't want them to be. In the course of running adoption, we then spend a good bit of time actually adopting these jobs. What I think you should do is create a testing knob to disable job adoption and use it in this test. See
cockroach/pkg/jobs/registry.go
Line 1315 in 852934c
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @sajjadrizvi)
pkg/jobs/registry_test.go, line 351 at r1 (raw file):
Quoted 4 lines of code…
tdb.CheckQueryResultsRetry(t, "SELECT count(*) FROM [SHOW JOBS]",
	[][]string{{fmt.Sprintf("%d", test.batchSize)}})
for _, id := range jobIDs {
	tdb.CheckQueryResultsRetry(t,
		fmt.Sprintf("SELECT status FROM system.jobs WHERE id = '%d'", id),
In retrospect this should not have been CheckQueryResultsRetry as it should not need to retry.
Yes, the main culprit here is retries.
Really? What makes you say that?
I am wrong. I saw the failure message
It's hard to say whether you're wrong, but at least I don't think what you said is very clear. Any memory allocated to the query which returned that error would be freed by the time that error returns, or at least it should be. It does not accumulate forever. The memory monitoring framework is mostly just hooked up to the query execution layer, and once that error comes back to the jobs layer, the corresponding memory account will have been closed.
Commit cockroachdb#67991 introduced a test that turned out to be flaky. The test runs out of memory sometimes as it creates a very large batch of jobs. This fix disables job adoptions to avoid large memory use.

Release note: None

Fixes: cockroachdb#68962
efb7c3b to
913a1f4
Compare
ajwerner
left a comment
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sajjadrizvi)
TFTR! bors r=ajwerner
Build failed (retrying...):
Build succeeded: |