Skip to content

backupccl: fix progress updating test#58015

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
pbardea:fix-progress-test
Dec 21, 2020
Merged

backupccl: fix progress updating test#58015
craig[bot] merged 1 commit intocockroachdb:masterfrom
pbardea:fix-progress-test

Conversation

@pbardea
Copy link
Copy Markdown
Contributor

@pbardea pbardea commented Dec 17, 2020

This test was previously flaky since it was assuming that backup would
issue the same number of requests as spans issued. This assumption was
incorrect, and fixing the progress accounting for backup revealed that
this test was faulty.

While planning the work distribution for backup worker nodes,
PartitionSpans automatically merges spans that are co-located on the
same node, therefore reducing the number of ExportRequests issued.
Before this commit, we used to block on the ExportRequest responses.
The blocking was triggered on a per-range-request level. However,
progress is only updated when the processor-level batch request is
returned. This meant that the test might block a batch request and
therefore the progress of the job would be less than what the test
expected.

This is now fixed by adjusting the blocking mechanism and range counting
to all be at the level of the merged ranges with which the backup
processor operates.

Fixes #57831.

This test would fail under stress pretty quickly (~40) -- I was able to get
over 600 runs without failure with this patch.

Release note: none

@pbardea pbardea requested review from a team and adityamaru December 17, 2020 03:44
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@pbardea pbardea removed the request for review from a team December 17, 2020 03:45
// produce. It merges contiguous ranges on the same node.
// It sorts ranges by start_key and counts the number of times the
// lease_holder changes by comparing against the previous row's lease_holder.
mergedRangeQuery := `
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice query, didn't even know you could do this kind of thing in SQL.

This test was previously flaky since it was assuming that backup would
issue the same number of requests as spans issued. This assumption was
incorrect, and fixing the progress accounting for backup revealed that
this test was faulty.

While planning the work distribution for backup worker nodes,
PartitionSpans automatically merges spans that are co-located on the
same node, therefore reducing the number of ExportRequests issued.
Before this commit, we used to block on the ExportRequest responses.
The blocking was triggered on a per-range-request level. However,
progress is only updated when the processor-level batch request is
returned. This meant that the test might block a batch request and
therefore the progress of the job would be less than what the test
expected.

This is now fixed by adjusting the blocking mechanism and range counting
to all be at the level of the merged ranges with which the backup
processor operates.

Release note: none
@pbardea
Copy link
Copy Markdown
Contributor Author

pbardea commented Dec 21, 2020

TFTR!
bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 21, 2020

Build failed (retrying...):

@craig craig bot merged commit e552218 into cockroachdb:master Dec 21, 2020
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 21, 2020

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ccl/backupccl: TestBackupRestoreSystemJobsProgress failed

3 participants