descs: fix txn commit waiting on wrong lease version#87706
descs: fix txn commit waiting on wrong lease version#87706craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
ajwerner
left a comment
There was a problem hiding this comment.
LGTM though I don't have a totally clear understanding as to how it happens.
pkg/sql/catalog/descs/collection.go
Outdated
| "expected original version of uncommitted %s %q (%d) to be non-zero", | ||
| uncommitted.DescriptorType(), uncommitted.GetName(), uncommitted.GetID()) | ||
| } | ||
| // Ignore uncommitted descriptors which for whatever reason have not |
There was a problem hiding this comment.
Do you understand the reason?
There was a problem hiding this comment.
For a reason unknown to me we allow uncommitted descriptors to be added with the same version as the cluster version, in addition to clusterVersion+1. This is not new and I haven't questioned it.
Prior to #87067 these were excluded from consideration by an early return on !mut.IsUncommittedDescriptor(), which happens when the version counter hasn't actually been bumped.
There was a problem hiding this comment.
Perhaps we should file an issue to tighten this. I can't see any instances where AddUncommittedDescriptor isn't preceded by a MaybeIncrementVersion call, unless it's obviously a new descriptor.
There was a problem hiding this comment.
In the old world, before your rewrite, we'd add to uncommittedDescriptors or whatever it was then called (storedDescriptors) on the read path, and the caller which resolved that descriptor might decide to throw that away.
I'd still like to understand how we were modifying this set which, as you say, shouldn't have anything added to it unless it is being written. One note is that the thing which triggers the bug is the underlying call to GetAllDescriptors.
There was a problem hiding this comment.
I looked into this and the bug is deeper, even though the changes brought by this PR are good.
The problem is that we upsert into the storage cache when really we should be inserting. Consider this test case which you supplied. What happens in that transaction is that CREATE SCHEMA sc2 adds an uncommitted db descriptor to the collection with the new schema entry and bumps its version counter. It then Runs that batch. The subsequent crdb_internal.tables query calls a GetAllDescriptors which overwrites the storage cache with the uncommitted descriptor!
This is why the original descriptors in the uncommitted layer can't be relied on.
There was a problem hiding this comment.
This is why the original descriptors in the uncommitted layer can't be relied on.
Yeah... this seems like the right answer. I think we should avoid touching original in AddUncommittedDescriptor when we already have an entry for the descriptor as opposed to pulling from the stored cache.
There was a problem hiding this comment.
Indeed. I'll revert to draft mode and push a fix in that direction and we'll see what happens.
| ) error { | ||
| return ud.uncommitted.IterateByID(func(entry catalog.NameEntry) error { | ||
| if o := ud.original.Get(entry.GetID()); o != nil { | ||
| return fn(lease.NewIDVersionPrev(o.GetName(), o.GetID(), o.(catalog.Descriptor).GetVersion())) |
There was a problem hiding this comment.
The bug proper was that fn was being called a bit too liberally here. The regression was introduced in this PR, I highlighted the line with the relevant part of the old logic: https://github.com/cockroachdb/cockroach/pull/87067/files#diff-74473a29d93e28d8f048009aed817c687c38ab748f111c95fba1ec148f7c0768L664
I fixed this and pushed a more verbose version of this code down to the call site in the hopes of clarifying what it's supposed to do.
|
Well wouldn't you know it, we're updating view back-references in tables based on leased descriptors. |
ajwerner
left a comment
There was a problem hiding this comment.
Now this feels like the right fix!
LGTM
|
Yes I feel much better too! Glad to get rid of this weirdness with the versions. |
Recent work on the descs.Collection (PR cockroachdb#87067) introduced a regression in which it would return bad lease.IDVersion versions for the uncommitted descriptors. In addition to fixing this, this commit corrects the lifecycle of the `original` descriptors in the uncommitted layer, which now remain unaffected by changes in the storage layer. This commit tightens the constraints on what can be passed to AddUncommittedDescriptor. Fixes cockroachdb#87672. Release justification: important bug fix Release note: None
|
Thanks for the review. bors r+ |
|
Build succeeded: |
Recent work on the descs.Collection (PR #87067) introduced a regression
in which it would return bad lease.IDVersion versions for the uncommitted
descriptors.
Fixes #87672.
Release justification: important bug fix
Release note: None