sql: disable test tenant for tests using TestingDisableTableLeases#166108
sql: disable test tenant for tests using TestingDisableTableLeases#166108trunk-io[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
|
😎 Merged successfully - details. |
ZhouXing19
left a comment
There was a problem hiding this comment.
LGTM, thanks! Just one nit comment about the issue to link.
I'm also curious why this test failure happens recently -- seems that this is an old test, and the introduction of default test tenant / VC injection is not new either (since 2023, IIUC). Is it purely because the injection of a default tenant is metamorphic, or something changes recently? I asked glean but got nothing convincing.
@ZhouXing19 reviewed 4 files and all commit messages, and made 2 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on mw5h and yuzefovich).
pkg/sql/crdb_internal_test.go line 217 at r1 (raw file):
// This is incompatible with external process virtual clusters where // the lease disable doesn't take effect. DefaultTestTenant: base.TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet(165980),
nit: I'm not sure using the issue number of the original test failure is conventional here (maybe @yuzefovich knows better), but I think we should have it linked to a "root cause" issue, which is about "tests that mutate descriptors via KV + TestingDisableTableLeases flaking when an external-process virtual cluster is injected". Also, we might want to add this root issue to #76378?
mw5h
left a comment
There was a problem hiding this comment.
Great question on the "why now?" — I dug into it and the answer is straightforward.
Before October 2025, the entire pkg/sql test package had a package-wide override in main_test.go that disabled test tenant injection for all tests:
defer serverutils.TestingSetDefaultTenantSelectionOverride(
base.TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet(76378),
)()In 3f64b0eb656 (Oct 22, 2025), @yuzefovich removed that package-wide override and enabled test tenants for pkg/sql, individually fixing tests he identified as incompatible. These tests weren't caught in that pass. Since tenant injection is metamorphic, they only fail when the random coin flip injects an external-process tenant, which is why it took months to surface as flaky failures.
Re: the issue number — good point, I'll file a root cause issue for "tests using TestingDisableTableLeases + direct KV descriptor manipulation are incompatible with external-process virtual clusters" and link it to #76378, then update the references in this PR.
e86cbeb to
7d8815d
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
@yuzefovich reviewed 4 files and all commit messages, and made 3 comments.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on mw5h and ZhouXing19).
pkg/sql/crdb_internal_test.go line 217 at r1 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
nit: I'm not sure using the issue number of the original test failure is conventional here (maybe @yuzefovich knows better), but I think we should have it linked to a "root cause" issue, which is about "tests that mutate descriptors via KV +
TestingDisableTableLeasesflaking when an external-process virtual cluster is injected". Also, we might want to add this root issue to #76378?
Yeah, this is the way - filing a separate issue to this particular problem and then link this new issue in the meta issue #76378.
pkg/sql/crdb_internal_test.go line 439 at r2 (raw file):
// This is incompatible with external process virtual clusters where // the lease disable doesn't take effect. DefaultTestTenant: base.TestIsForStuffThatShouldWorkWithSecondaryTenantsButDoesntYet(166311),
nit: I think in the shared process these tests should work, so it might be worth creating TestDoesNotWorkWithExternalProcessModeButWeDontKnowWhyYet helper and using it this PR.
Several tests directly manipulate table descriptors via KV writes and rely on `lease.TestingDisableTableLeases()` to make those changes immediately visible to the SQL layer. This process-global atomic bool only takes effect within the test process; when an external process virtual cluster is metamorphically injected, the external SQL server's lease manager still caches descriptors normally and never sees the direct KV writes. This causes intermittent failures where the SQL layer uses stale descriptors. The reported failure in TestOperationsWithIndexMutation manifests as a missing row after UPSERT: the previous subtest's direct KV descriptor writes (8+ version bumps) leave stale lease state in the external process, and after the table is dropped and recreated, the first UPSERT can silently target stale state. Fix this by setting `DefaultTestTenant` to disable external process virtual cluster injection in all tests that combine `TestingDisableTableLeases` with direct KV descriptor manipulation. Since shared-process mode works fine (same process, same atomic bool), we use `TestDoesNotWorkWithExternalProcessMode` rather than disabling all secondary tenant modes. These tests were exposed when the package-wide tenant injection override was removed in 3f64b0e (Oct 2025). Since injection is metamorphic, failures are intermittent. Fixes: cockroachdb#165980 Informs: cockroachdb#166311 Release note: None Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7d8815d to
e268efd
Compare
mw5h
left a comment
There was a problem hiding this comment.
@mw5h made 1 comment and resolved 1 discussion.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on yuzefovich).
pkg/sql/crdb_internal_test.go line 439 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I think in the shared process these tests should work, so it might be worth creating
TestDoesNotWorkWithExternalProcessModeButWeDontKnowWhyYethelper and using it this PR.
Turns out that this already exists!
|
Based on the specified backports for this PR, I applied new labels to the following linked issue(s). Please adjust the labels as needed to match the branches actually affected by the issue(s), including adding any known older branches. Issue #165980: branch-release-26.1, branch-release-26.2. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Summary
Several tests directly manipulate table descriptors via KV writes and rely on
lease.TestingDisableTableLeases()to make those changes immediately visibleto the SQL layer. This process-global atomic bool only takes effect within the
test process; when an external process virtual cluster is metamorphically
injected, the external SQL server's lease manager still caches descriptors
normally and never sees the direct KV writes, causing intermittent failures.
This PR disables the default test tenant for all tests that combine
TestingDisableTableLeaseswith direct KV descriptor manipulation:descriptor_mutation_test.go: 5 testsindex_mutation_test.go: 1 testdelete_preserving_index_test.go: 3 test sitescrdb_internal_test.go: 2 testsFixes: #165980
Release note: None
🤖 Generated with Claude Code