sql: DDL has become excessively sensitive to txn retries #35549

@knz

For context: I took a 2-3 week hiatus from coding, so I am more sensitive to "larger scale" differences in the health of our test suite between, say, a month ago and today.

I am seeing a number of new test flakes that all suggest SQL DDL is now affected by transaction retries in ways it was not previously.

Readers of this issue should understand that many of our SQL tests assume that the allocation of object IDs (tables, sequences, views, databases) is deterministic and not subject to retries when a single client is connected to an entire (multi-node) cluster.

This assumption is currently massively violated.

Some symptoms, which many of you may recognize:

  • the allocated IDs are occasionally larger than expected, which indicates the DDL was unexpectedly retried
  • the test fails outright with a txn retry error
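
To illustrate the first symptom, here is a toy model of why a retried DDL statement ends up with a larger ID. It is only a sketch, not the actual allocation code: `idAllocator`, `createTable`, and the starting ID 53 are hypothetical, and it assumes that an ID handed out during a failed attempt is not returned to the counter when the transaction retries.

```go
package main

import "fmt"

// idAllocator is a hypothetical stand-in for a cluster-wide object ID counter.
// Assumption: increments are NOT rolled back when the calling txn is retried.
type idAllocator struct{ next int }

// allocate hands out the next ID and advances the counter.
func (a *idAllocator) allocate() int {
	id := a.next
	a.next++
	return id
}

// createTable models a single DDL statement. It allocates an ID on every
// attempt and only succeeds after `retries` simulated txn retries.
func createTable(a *idAllocator, retries int) int {
	for attempt := 0; ; attempt++ {
		id := a.allocate()
		if attempt < retries {
			continue // simulated txn retry: the ID from this attempt is burned
		}
		return id
	}
}

func main() {
	// No retries: IDs are consecutive, as the tests expect.
	quiet := &idAllocator{next: 53}
	q1 := createTable(quiet, 0)
	q2 := createTable(quiet, 0)
	fmt.Println(q1, q2) // 53 54

	// One retry on the second statement: its ID is larger than expected.
	flaky := &idAllocator{next: 53}
	f1 := createTable(flaky, 0)
	f2 := createTable(flaky, 1)
	fmt.Println(f1, f2) // 53 55
}
```

A test that hard-codes the second table's ID as 54 passes in the first case and fails in the second, even though only one client is issuing statements.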

I will refrain from offering an opinion on whether the retries are desirable or acceptable. However, I'd like to point out that if we keep the current behavior, we will need to audit and rewrite a very large number of tests throughout SQL, and this task is thoroughly unwelcome so late in the release cycle.

@bdarnell @petermattis please advise.

Metadata

Labels

C-test-failure: Broken test (automatically or manually discovered).
