Skip to content

catalog/lease: handle retryable replica errors properly#118646

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:TestStartupInjectedFailureSingleNodeFailure
Feb 2, 2024
Merged

catalog/lease: handle retryable replica errors properly#118646
craig[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:TestStartupInjectedFailureSingleNodeFailure

Conversation

@fqazi
Copy link
Copy Markdown
Collaborator

@fqazi fqazi commented Feb 2, 2024

Previously, the expiration based leasing mechanism was used, which was more lenient for duplicate rows in the system.lease table. Certain retryable errors like replica errors could lead to rows being written, but not being cleaned up with the assumption nothing was written. In the session based model we are strict and require a single row per descriptor, version and session, so any errors not handled properly will lead to future failures. To address this, this patch with treat both ambiguous result errors and replica errors as retryable. When these are hit the lease will be deleted and re-inserted in case an existing one was persisted.

Fixes: #118520

Release note: None

Previously, the expiration based leasing mechanism was used, which was
more lenient for duplicate rows in the system.lease table. Certain
retryable errors like replica errors could lead to rows being written,
but not being cleaned up with the assumption nothing was written. In the
session based model we are strict and require a single row per
descriptor, version and session, so any errors not handled properly will
lead to future failures. To address this, this patch with treat both
ambiguous result errors and replica errors as retryable.  When these
are hit the lease will be deleted and re-inserted in case an existing one
was persisted.

Fixes: cockroachdb#118520

Release note: None
@fqazi fqazi requested a review from a team as a code owner February 2, 2024 15:40
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Feb 2, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great find! glad we already had IsRetryableReplicaError

@fqazi
Copy link
Copy Markdown
Collaborator Author

fqazi commented Feb 2, 2024

@rafiss TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 2, 2024

Build succeeded:

@craig craig bot merged commit 06d9698 into cockroachdb:master Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

server: TestStartupInjectedFailureSingleNode failed [session based leasing issues]

3 participants