Skip to content

libroach: make DumpThreadStacks thread-safe#64081

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20210422-stacks-mutex
Apr 22, 2021
Merged

libroach: make DumpThreadStacks thread-safe#64081
craig[bot] merged 1 commit intocockroachdb:masterfrom
knz:20210422-stacks-mutex

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Apr 22, 2021

Fixes #64079.

Note: I am not including unit tests in this PR, as the change here will be suitably validated by a green CI in #59863.

Release note (bug fix): Fixed a bug where multiple concurrent
invocations of debug zip could yield cluster instability. This bug
had been present since CockroachDB v20.1.

Release note (bug fix): Fixed a bug where multiple concurrent
invocations of `debug zip` could yield cluster instability. This bug
had been present since CockroachDB v20.1.
@knz knz requested a review from petermattis April 22, 2021 16:49
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Apr 22, 2021

Not merging this directly; will merge when #59863 merges.

craig bot pushed a commit that referenced this pull request Apr 22, 2021
59863: cli/zip: issue per-node requests concurrently r=tbg a=knz

Requested by @ajwerner and @andreimatei 
First commit from  #64081.

Release note (cli change): `cockroach debug zip` now attempts to pull
data from multiple nodes concurrently, up to 15 nodes at a time. This
change is meant to accelerate the data collection when a cluster
contains multiple nodes. This behavior can be negated with the new
command-line flag `--sequential`.

63800: kv: properly handle intent pushes to synthetic timestamps r=nvanbenschoten a=nvanbenschoten

This commit reworks the timestamp that pushers use when pushing intents
to synthetic timestamps. Usually, as of #59086, a pusher limits the
timestamp that it pushes conflicting intents to using the local HLC
clock. This is based on the assumption that it will be able to ignore
any intent above the local HLC clock's reading using observed timestamps
and a local uncertainty limit.

However, this argument only holds if we expect to be able to use a local
uncertainty limit when we return to read the pushed intent. Notably,
local uncertainty limits can not be used to ignore intents with
synthetic timestamps that would otherwise be in a reader's uncertainty
interval. This is because observed timestamps do not apply to
intents/values with synthetic timestamps. So if we know that we will be
pushing an intent to a synthetic timestamp, we don't limit the value to
a clock reading from the local clock.

This came out of some exploration of kvnemesis. I don't think this was
actually causing an issue, but it seemed like a potential source of an
infinite push + conflict loop.

Release note (bug fix): Read-write contention on GLOBAL tables no longer
has a potential to thrash without making progress.

64017: sql: add test TestAlterRegionalByRowEnclosingRegionAddDrop r=arulajmani a=arulajmani

See individual commits for details. 

There's more test scenarios I want to add where we sandwich an ADD/DROP region inside of an ALTER to REGIONAL BY ROW/or an operation on one of these tables, but here's a start. The error messages aren't wrapped yet, so they may appear a bit ugly in the jobs table. What this test ensures, which I think is quite valuable, is that the rollback is fine and no manual cleanup is required.

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: arulajmani <arulajmani@gmail.com>
@craig craig bot merged commit 80ef6b3 into cockroachdb:master Apr 22, 2021
@knz knz deleted the 20210422-stacks-mutex branch April 23, 2021 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

server,storage: apparent deadlock in storage._Cfunc_DBDumpThreadStacks

3 participants