-
Notifications
You must be signed in to change notification settings - Fork 4.1k
sql: deadlock due to planning query for SHOW TABLE #46447
Description
When planning a query we use a different transaction. We do this, at least in part, to ensure that reads of the schema performed during planning are cacheable. See here:
cockroach/pkg/sql/conn_executor_prepare.go
Lines 163 to 172 in e5bb268
| // Preparing needs a transaction because it needs to retrieve db/table | |
| // descriptors for type checking. | |
| // TODO(andrei): Needing a transaction for preparing seems non-sensical, as | |
| // the prepared statement outlives the txn. I hope that it's not used for | |
| // anything other than getting a timestamp. | |
| txn := kv.NewTxn(ctx, ex.server.cfg.DB, ex.server.cfg.NodeID.Get()) | |
| ex.statsCollector.reset(&ex.server.sqlStats, ex.appStats, &ex.phaseTimes) | |
| p := &ex.planner | |
| ex.resetPlanner(ctx, p, txn, ex.server.cfg.Clock.PhysicalTime() /* stmtTS */) |
Unfortunately, we do use this transaction for more than just a timestamp. This transaction makes its way to name resolution. An example of a problematic case is here:
cockroach/pkg/sql/delegate/show_table.go
Line 172 in e5bb268
| dataSource, resName, err := d.catalog.ResolveDataSource(d.ctx, flags, &tn) |
Imagine that we have Txn1 and Txn2:
Txn1: BEGIN;
Txn1: ALTER TABLE foo ...
Txn2: BEGIN;
Txn2: ALTER TABLE bar ...
Txn2: ALTER TABLE foo ... // blocks on Txn1
Txn1: SHOW COLUMNS FROM bar // blocks on Txn2 in planner txn
Had the planner been using the user's transaction then deadlock detection would have kicked in and we would have been fine. The problem here is that deadlock detection has no way to associate the planner txn with Txn1.
The fix for the short term is likely to be to follow the pattern of #46384 and #46170 and make the transaction used here high priority. Unlike in those cases however, the result of this query is not necesarily going to be cached as a shared resource. In fact, if we do this, then there's a chance that we can introduce a live-lock scenario where concurrent SHOW COLUMN ... can hold off a schema change. I'm not certain that that's the worst thing. I think for now it's the right thing to do.
A better solution might be to explicitly relate the planner txn to Tx1 for deadlock detection. Perhaps something like a read-only child transaction which carries its parent in push requests. @nvanbenschoten I think you've been having thoughts in this area.
This is fallout from #46402.