Skip to content

kvserver: deflake TestPromoteNonVoterInAddVoter#155861

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
arulajmani:kv-deflake-TestPromoteNonVoterInAddVoter
Oct 22, 2025
Merged

kvserver: deflake TestPromoteNonVoterInAddVoter#155861
craig[bot] merged 1 commit intocockroachdb:masterfrom
arulajmani:kv-deflake-TestPromoteNonVoterInAddVoter

Conversation

@arulajmani
Copy link
Copy Markdown
Collaborator

This test could previously fail because a lease transfer could race with a split. Without the split, the effects of the span config (that the test asserts on) will not apply. We sidestep this problem by removing the need of a split entirely -- by tuning some cluster settings to ensure that every table is on its own range to begin with.

H/t to @pav-kv for the analysis on this one.

Closes #155828
Closes #155747
Closes #155316
Closes #154773
Closes #134383

Release note: None

@arulajmani arulajmani requested review from pav-kv and wenyihu6 October 22, 2025 18:08
@arulajmani arulajmani requested review from a team as code owners October 22, 2025 18:08
@arulajmani arulajmani added the backport-all Flags PRs that need to be backported to all supported release branches label Oct 22, 2025
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

This test could previously fail because a lease transfer could race
with a split. Without the split, the effects of the span config
(that the test asserts on) will not apply. We sidestep this problem by
removing the need of a split entirely -- by tuning some cluster settings
to ensure that every table is on its own range to begin with.

H/t to @pav-kv for the analysis on this one.

Closes cockroachdb#155828
Closes cockroachdb#155747
Closes cockroachdb#155316
Closes cockroachdb#154773
Closes cockroachdb#134383

Release note: None
@arulajmani arulajmani force-pushed the kv-deflake-TestPromoteNonVoterInAddVoter branch from 29fe1e5 to a884f8f Compare October 22, 2025 18:26
@arulajmani
Copy link
Copy Markdown
Collaborator Author

TFTR!

bors r+

@tbg
Copy link
Copy Markdown
Member

tbg commented Oct 22, 2025 via email

craig bot pushed a commit that referenced this pull request Oct 22, 2025
155616: rangefeed: add assertion that OnError is called on disconnected registration r=stevendanna a=stevendanna

Epic: none
Release note: None

155797: *: replace some usages of global rand r=yuzefovich a=yuzefovich

**sem/builtins: use RNG from eval context for 'st_generatepoints'**

***: replace some usages of global rand**

When we use `rand.*` methods like `math/rand.Intn`, under the hood we hit
the global rand source, which is protected by a mutex, so we could be subject
to contention on that lock. This commit updates some of such usages in
favor of object-specific rand source to avoid that. Note that allocating
a separate rand source is not free (the allocation itself is about 5KiB
in size), so we modify spots where the lifecycle matches that of the
server or when the affected code is heavy already / not on the hot path
(also when it's clear that the access is from within a single
goroutine).

Epic: None
Release note: None

155861: kvserver: deflake TestPromoteNonVoterInAddVoter r=arulajmani a=arulajmani

This test could previously fail because a lease transfer could race with a split. Without the split, the effects of the span config (that the test asserts on) will not apply. We sidestep this problem by removing the need of a split entirely -- by tuning some cluster settings to ensure that every table is on its own range to begin with.

H/t to `@pav-kv` for the analysis on this one.

Closes #155828 
Closes #155747 
Closes #155316 
Closes #154773 
Closes #134383

Release note: None

Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 22, 2025

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 22, 2025

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Oct 22, 2025

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating PR, but backport branch blathers/backport-release-24.3-155861 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls: 502 Server Error []

Backport to branch release-24.3 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@pav-kv
Copy link
Copy Markdown
Collaborator

pav-kv commented Oct 23, 2025

LGTM. Should we still be concerned about the underlying non-test-only behaviour - the fact that zone config application can be stuck?

tbg added a commit to tbg/cockroach that referenced this pull request Oct 30, 2025
This reverts commit a884f8f (cockroachdb#155861).

As far as I understand, the cluster settings in this commit had the
effect of making sure that the span config for the table in the test
was exactly `[/Table/N,/Table/(N+1))`, i.e. that it wouldn't merge
with adjacent configs. But in a recent flake we see that the real
problem is that the range containing our table sometimes extends
past the span config (in a recent example, it reached the end of
the keyspace and wasn't split down in time for the test to pass).

Since commits in this PR ensure that the splitting happens proactively, it
should not matter whether span configs are coalesced or not, and we back out
that attempt at a fix as it's unclear what positive effect it may have.

@arulajmani: I'm probably missing something here. Happy to update
the description if so.
tbg added a commit to tbg/cockroach that referenced this pull request Oct 30, 2025
This reverts commit a884f8f (cockroachdb#155861).

As far as I understand, the cluster settings in this commit had the
effect of making sure that the span config for the table in the test
was exactly `[/Table/N,/Table/(N+1))`, i.e. that it wouldn't merge
with adjacent configs. But in a recent flake we see that the real
problem is that the range containing our table sometimes extends
past the span config (in a recent example, it reached the end of
the keyspace and wasn't split down in time for the test to pass).

Since commits in this PR ensure that the splitting happens proactively, it
should not matter whether span configs are coalesced or not, and we back out
that attempt at a fix as it's unclear what positive effect it may have.

@arulajmani: I'm probably missing something here. Happy to update
the description if so.
tbg added a commit to tbg/cockroach that referenced this pull request Oct 31, 2025
This reverts commit a884f8f (cockroachdb#155861).

As far as I understand, the cluster settings in this commit had the
effect of making sure that the span config for the table in the test
was exactly `[/Table/N,/Table/(N+1))`, i.e. that it wouldn't merge
with adjacent configs. But in a recent flake we see that the real
problem is that the range containing our table sometimes extends
past the span config (in a recent example, it reached the end of
the keyspace and wasn't split down in time for the test to pass).

Since commits in this PR ensure that the splitting happens proactively, it
should not matter whether span configs are coalesced or not, and we back out
that attempt at a fix as it's unclear what positive effect it may have.

@arulajmani: I'm probably missing something here. Happy to update
the description if so.
tbg added a commit to tbg/cockroach that referenced this pull request Oct 31, 2025
This reverts commit a884f8f (cockroachdb#155861).

As far as I understand, the cluster settings in this commit had the
effect of making sure that the span config for the table in the test
was exactly `[/Table/N,/Table/(N+1))`, i.e. that it wouldn't merge
with adjacent configs. But in a recent flake we see that the real
problem is that the range containing our table sometimes extends
past the span config (in a recent example, it reached the end of
the keyspace and wasn't split down in time for the test to pass).

Since commits in this PR ensure that the splitting happens proactively, it
should not matter whether span configs are coalesced or not, and we back out
that attempt at a fix as it's unclear what positive effect it may have.

@arulajmani: I'm probably missing something here. Happy to update
the description if so.
tbg added a commit to tbg/cockroach that referenced this pull request Oct 31, 2025
This reverts commit a884f8f (cockroachdb#155861).

As far as I understand, the cluster settings in this commit had the
effect of making sure that the span config for the table in the test
was exactly `[/Table/N,/Table/(N+1))`, i.e. that it wouldn't merge
with adjacent configs. But in a recent flake we see that the real
problem is that the range containing our table sometimes extends
past the span config (in a recent example, it reached the end of
the keyspace and wasn't split down in time for the test to pass).

Since commits in this PR ensure that the splitting happens proactively, it
should not matter whether span configs are coalesced or not, and we back out
that attempt at a fix as it's unclear what positive effect it may have.

@arulajmani: I'm probably missing something here. Happy to update
the description if so.
tbg added a commit to tbg/cockroach that referenced this pull request Oct 31, 2025
This reverts commit a884f8f (cockroachdb#155861).

As far as I understand, the cluster settings in this commit had the
effect of making sure that the span config for the table in the test
was exactly `[/Table/N,/Table/(N+1))`, i.e. that it wouldn't merge
with adjacent configs. But in a recent flake we see that the real
problem is that the range containing our table sometimes extends
past the span config (in a recent example, it reached the end of
the keyspace and wasn't split down in time for the test to pass).

Since commits in this PR ensure that the splitting happens proactively, it
should not matter whether span configs are coalesced or not, and we back out
that attempt at a fix as it's unclear what positive effect it may have.

@arulajmani: I'm probably missing something here. Happy to update
the description if so.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-all Flags PRs that need to be backported to all supported release branches backport-failed v26.1.0-prerelease

Projects

None yet

5 participants