Skip to content

catalog/lease: fix error handling for purgeOldVersions#157131

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:leasingDefensive
Nov 17, 2025
Merged

catalog/lease: fix error handling for purgeOldVersions#157131
craig[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:leasingDefensive

Conversation

@fqazi
Copy link
Copy Markdown
Collaborator

@fqazi fqazi commented Nov 10, 2025

Previously, when purging old versions we would acquire a lease on the latest version, which was introduced when we added logic for acquiring leases on the previous version. The logic to acquire the previous version would clear the error, preventing errors from correctly surfacing and causing issues with dropped / offline descriptors. To address this, ensure the acquisition logic for the previous version is only executed if a clean up will occur.

Informs: #156176
Release note: None

@fqazi fqazi requested a review from a team as a code owner November 10, 2025 14:31
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Nov 10, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@fqazi fqazi force-pushed the leasingDefensive branch 5 times, most recently from 64cff3c to e86bbde Compare November 10, 2025 18:09
Copy link
Copy Markdown
Contributor

@spilchen spilchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spilchen reviewed 1 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/catalog/lease/lease.go line 2331 at r1 (raw file):

	_ = s.RunAsyncTask(ctx, "refresh-leases", func(ctx context.Context) {
		defer func() {
			if err := recover(); err != nil {

Does this code for recover cause us to suppress panics now? Is the behaviour we want? It seems like an orthogonal issue.

Previously, when purging old versions we would acquire a lease on the
latest version, which was introduced when we added logic for acquiring
leases on the previous version. The logic to acquire the previous
version would clear the error, preventing errors from correctly
surfacing and causing issues with dropped / offline descriptors. To
address this, ensure the acquisition logic for the previous version is
only executed if a clean up will occur.

This patch also enhances logging for error scenarios where we could
fail to acquire new leases.

Informs: cockroachdb#156176
Release note: None
Copy link
Copy Markdown
Collaborator Author

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @spilchen)


pkg/sql/catalog/lease/lease.go line 2331 at r1 (raw file):

Previously, spilchen wrote…

Does this code for recover cause us to suppress panics now? Is the behaviour we want? It seems like an orthogonal issue.

Yes, it makes it easier to detect cases where the async task just crashes on us. I updated these changes to include a rethrow of the panic.

Copy link
Copy Markdown
Contributor

@spilchen spilchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@spilchen reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @fqazi)

@fqazi
Copy link
Copy Markdown
Collaborator Author

fqazi commented Nov 17, 2025

@spilchen TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 17, 2025

@craig craig bot merged commit 0b875de into cockroachdb:master Nov 17, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants