sql: block session for database name cache updates #21900
sql: block session for database name cache updates #21900andreimatei merged 1 commit intocockroachdb:masterfrom
Conversation
|
At least one logic test becomes flaky with this change because I think logic tests were abusing this mechanism for blocking a bit such that gossip propagates updated descriptors and some caches are invalidated. E.g. renaming a database and immediately verifying that the database doesn't exist is now flaky. Will figure something to do. |
|
Review status: 0 of 13 files reviewed at latest revision, all discussions resolved. Comments from Reviewable |
9c8156c to
a992c13
Compare
|
@nvanbenschoten I've kept a subset of the uses of testingVerifyMetadata for database name cache purposes. PTAL @vivekmenezes this is the best I could come up with. All the other facilities we have in the area, like the uncommitted databases you were mentioning, are per txn, so they don't really apply to blocking different transaction. |
|
Not confident in how this all works. Can you restructure it to block after all the schema changers have executed? Can you copy all the uncommitted databases from the session to construct the list of names it needs to wait for? |
a992c13 to
36f9d80
Compare
36f9d80 to
d1c49f7
Compare
|
I've rewritten the thing in a different way - now the blocking is encapsulated with the Review status: 0 of 18 files reviewed at latest revision, all discussions resolved. Comments from Reviewable |
|
Reviewed 18 of 18 files at r1. pkg/sql/backfill.go, line 284 at r1 (raw file):
I don't think this is what you want. pkg/sql/table.go, line 294 at r1 (raw file):
Put this next to pkg/sql/table.go, line 506 at r1 (raw file):
pkg/sql/table.go, line 514 at r1 (raw file):
Let's get rid of this string matching. It's pretty fragile. Comments from Reviewable |
d1c49f7 to
8cdb81c
Compare
|
switched to a sync.Map inside the cache. Also added a comment about how the cache sharing and wiping is funky. Review status: 12 of 18 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending. pkg/sql/backfill.go, line 284 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
lolo pkg/sql/table.go, line 294 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. pkg/sql/table.go, line 506 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
error was at the next word pkg/sql/table.go, line 514 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. Comments from Reviewable |
8cdb81c to
9b5b1d5
Compare
|
Reviewed 6 of 6 files at r2. Comments from Reviewable |
vivekmenezes
left a comment
There was a problem hiding this comment.
Lovely!!!! You have excelled yourself
| ) | ||
|
|
||
| func (tc *TableCollection) addUncommittedDatabase(name string, id sqlbase.ID, action dbAction) { | ||
| db := uncommittedDatabase{name: name, id: id, dropped: action == dbDropped} |
There was a problem hiding this comment.
I suppose you could store the action in uncommittedDatabase as opposed to the boolean.
Before this patch, if one did something like: ALTER DATABASE foo RENAME TO bar SELECT * from foo the SELECT might or might not fail. Similarly: ALTER DATABASE foo RENAME TO bar CREATE DATABASE foo the CREATE might or might not succeed. This is because database names are cached and the cache is updated asynchonously, through gossip. The situation is different for table names, as that cache is integrated with the table leasing mechanism. This patch makes it so that the SQL session on which the a database is dropped/renamed is blocked until the database cache is updated to reflect the availability of the name. The blocking happens when the transaction (implicit or explicit) of the DDL statement commits. Name resolution inside the txn (if any), is not changed in this path. Presumably that works fine because the session maintains a per-txn list of uncommitted databases. Of course, this all refers strictly to the session doing the DROP/RENAME DDL. Any other session is not affected, and their behavior remains as erratic as before. This works by having the TableCollection optionally block for updates to the database cache to arrive once a txn is committed, as such ensuring that the session that ran the DDL is only unblocked to run subsequent statements once the cache is copacetic. The changes described above were already essentially in effect in our logic tests, through a testing-only mechanism - "testingVerifyMetadata". This patch removes that mechanism: We've had this weird testing facility that we've had since time immemorial - DDL statements would provide a callback that verifies the state of the table descriptor, as seen by the Executor after running the statement. The idea, I think, was to verify two things: a) that the descriptor is not mutated too much "synchronously" (as part of running the DDL) - instead it should be mutated by asynchronous jobs. Second, to verify that these async jobs eventually run. The mechanism was super weird, leading to intractable code that I don't want to port forward to the new connExecutor. I doubt that the test facility was ever useful, beyond the database cache updates discussed above. Release note: the results for database creation, dropping or renaming statement are now block only delivered to the client once it's guaranteed that the client in question can use the new database (or the new name of the database).
9b5b1d5 to
21f8373
Compare
|
TFTRs Review status: 14 of 18 files reviewed at latest revision, 1 unresolved discussion, some commit checks pending. pkg/sql/table.go, line 546 at r2 (raw file): Previously, vivekmenezes wrote…
my current policy (which has been known to change) is to prefer enums to bools for function arguments because they're self-documenting at call sites, but prefer bool fields to enums because they're sometimes easier to use (in an Comments from Reviewable |
|
stressed a bunch Review status: 14 of 18 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful. Comments from Reviewable |
Before this patch, if one did something like:
ALTER DATABASE foo RENAME TO bar
SELECT * from foo
the SELECT might or might not fail.
Similarly:
ALTER DATABASE foo RENAME TO bar
CREATE DATABASE foo
the CREATE might or might not succeed.
This is because database names are cached and the cache is updated
asynchonously, through gossip. The situation is different for table
names, as that cache is integrated with the table leasing mechanism.
This patch makes it so that the SQL session on which the a database is
dropped/renamed is blocked until the database cache is updated to
reflect the availability of the name. The blocking happens when the
transaction (implicit or explicit) of the DDL statement commits.
Name resolution inside the txn (if any), is not changed in this path.
Presumably that works fine because the session maintains a per-txn list
of uncommitted databases.
Of course, this all refers strictly to the session doing the DROP/RENAME
DDL. Any other session is not affected, and their behavior remains as
erratic as before.
The changes described above were already essentially in effect in our
logic tests, through a testing-only mechanism - "testingVerifyMetadata".
This patch removes most of that mechanism, and reworks and elevates a
small set of its uses beyond testing:
We've had this weird testing facility that we've had since time
immemorial - DDL statements would provide a callback that verifies the
state of the table descriptor, as seen by the Executor after running the
statement. The idea, I think, was to verify two things: a) that the
descriptor is not mutated too much "synchronously" (as part of running
the DDL) - instead it should be mutated by asynchronous jobs. Second, to
verify that these async jobs eventually run.
The mechanism was super weird, leading to intractable code that I don't
want to port forward to the new connExecutor. I doubt that the test
facility was ever useful, beyond the database cache updates discussed
above.
Release note: None