sql: bugfixes in role membership caching#42998
Conversation
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mberhault and @rohany)
pkg/sql/authorization.go, line 200 at r1 (raw file):
// If the cache is empty, allocate space for the cache. if roleMembersCache.userCache == nil { roleMembersCache.userCache = make(map[string]userRoleMembership)
roleMembersCache is shared by all connections on a server, right? This isn't thread safe. Probably not necessary either as the cache is created on invalidation and it is fine to do a lookup on a nil-map.
pkg/sql/authorization.go, line 208 at r1 (raw file):
Required: true, // We need to avoid the table descriptor cache. AvoidCached: true,
Is this going to destroy performance? MemberOfWithAdminOption is called by CheckPrivilege which is called on every statement.
Cc @knz who has more knowledge than I do have how descriptor caching works nowadays.
petermattis
left a comment
There was a problem hiding this comment.
The cache invalidation mechanism relied on using table descriptor
versions to know when to invalidate the cache, but would not bypass
the table descriptor cache when looking up the table descriptor.
As I mentioned in a comment below, I think we need to use the table descriptor cache for performance.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mberhault and @rohany)
rohany
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mberhault, @petermattis, and @rohany)
pkg/sql/authorization.go, line 200 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
roleMembersCacheis shared by all connections on a server, right? This isn't thread safe. Probably not necessary either as the cache is created on invalidation and it is fine to do a lookup on a nil-map.
Oh, I didn't know you could do that. Then we don't need this -- I'll remove it.
|
If we use the tableDescriptor cache, then won't we run into the problems you were worried about in your comment on the original issue? We don't seem to wait for the new table descriptor version to get propagated around. On that note, we didn't seem to see problems around this when it was using the cache... |
That was a theory I was putting forward. I didn't have any evidence that it was actually a problem. I'd revert this part of the PR, or look for guidance from @lucy-zhang and @knz. |
aa275c0 to
7ac31cb
Compare
Fixes cockroachdb#42098. This PR fixes some bugs in role membership caching. * The role membership cache was a single global, which meant multiple servers in the same cockroach process could pollute the cache for each other, causing flaky test failures. * The cache updating mechanism seemed to use the incorrect variable to update the cache version when a concurrent request caused a cache invalidation as well. Although unlikely, this could have lead to incorrect results being served. Release note (bug fix): This change fixes some existing caching issues surrounding role memberships, where users could sometimes see out of date role membership information.
7ac31cb to
362a2d9
Compare
|
That makes sense -- I'll revert that change for now, and talk to lucy about some of the specifics later. PTAL |
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mberhault and @rohany)
|
Nice. Probably benefits from a backport.
--
Verstuurd vanaf mijn Android apparaat met K-9 Mail. Excuseer mijn beknoptheid.
|
This only affects unit tests which run multiple test clusters concurrently and use GRANTs. This first showed up in #42066 which hasn't been backported. So not currently a problem on 19.2, though I don't mind seeing it backported. |
|
bors r=petermattis |
Build failed (retrying...) |
42985: issues: simplify post() r=jordanlewis a=tbg This prepares the code for allowing the caller more control over the message body. This makes sense since post() is called both by roachtest and nightly stress, and it doesn't make sense to use the same message body for both. Release note: None 42986: roachprod: don't try to make long dns names r=dt a=dt long cluster names break DNS. Rather than break DNS for everyone, just skip the clusters that have these long names. DNS is a nice-to-have in any case, so it is OK to skip it -- roachprod will fallback to IPs. Release note: none. 42991: settings: explanatory comments around cluster settings r=irfansharif a=irfansharif Update stale references to changed method signatures and add a bit more documentation around the usage of IsActive. Release note: None 42998: sql: bugfixes in role membership caching r=petermattis a=rohany Fixes #42098. This PR fixes some bugs in role membership caching. * The role membership cache was a single global, which meant multiple servers in the same cockroach process could pollute the cache for each other, causing flaky test failures. * The cache updating mechanism seemed to use the incorrect variable to update the cache version when a concurrent request caused a cache invalidation as well. Although unlikely, this could have lead to incorrect results being served. Release note (bug fix): This change fixes some existing caching issues surrounding role memberships, where users could sometimes see out of date role membership information. Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com> Co-authored-by: David Taylor <tinystatemachine@gmail.com> Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com> Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Build succeeded |
It is a problem in 19.2 which uses multiple in-memory nodes side-by-side to implement I'm glad it got backported for that reason. |
Oh, excellent point. I had forgotten about that case. |
|
I don't think it has an effect on cockroach demo -- I think the reason why it fails in the logic tests is because its acting as a cache for two different tables, while in cockroach demo there is only one version of the role table. |
Fixes #42098.
This PR fixes some bugs in role membership caching.
multiple servers in the same cockroach process could pollute
the cache for each other, causing flaky test failures.
to update the cache version when a concurrent request caused a cache
invalidation as well. Although unlikely, this could have lead to incorrect
results being served.
Release note (bug fix): This change fixes some existing caching issues
surrounding role memberships, where users could sometimes see out of date
role membership information.