roachtest: add admission-control/database-drop#104051
craig[bot] merged 4 commits into cockroachdb:master
Conversation
This author forgot that struct copies weren't deep enough. Release note: None
Adding a bit more commentary, throwing in another concurrent index backfill for free, and fighting some open CRDB bugs unrelated to the test. Release note: None
Like the API intended. Only noticed in the subsequent commit when introducing a second use of disk snapshots that happened to use "tpcc-100k" in its name, somewhere. Release note: None
force-pushed 2eaec31 to c6af62f
force-pushed c6af62f to 2aa2046
Data dump for run: prometheus-snapshot.tar.gz. Visualize with:

```sh
#!/bin/sh
set -eu
rm -rf data/snapshots
tar xf prometheus-snapshot.tar.gz
snapdir=$(find data/snapshots -mindepth 1 -maxdepth 1 -type d)
promyml=$(mktemp)
chmod -R o+rw "${snapdir}" "${promyml}"
cat <<EOF > "${promyml}"
global:
  scrape_interval: 10s
  scrape_timeout: 5s
EOF
set -x
# Worked as of v2.33.1, so you can hard-code that tag if necessary.
docker run --privileged -p 9090:9090 \
  -v "${promyml}:/etc/prometheus/prometheus.yml" -v "${PWD}/${snapdir}:/prometheus" \
  prom/prometheus:latest \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --web.enable-admin-api
```
sumeerbhola
left a comment
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 6 of 6 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @irfansharif, and @smg260)
pkg/cmd/roachtest/tests/admission_control_index_backfill.go line 130 at r2 (raw file):
```go
promCfg := &prometheus.Config{}
```
has the promCfg block moved down here because we don't care about prometheus during the snapshot generation phase?
pkg/cmd/roachtest/tests/admission_control_index_backfill.go line 170 at r2 (raw file):
```go
defer db.Close()

// Gently wind up AddSST concurrency, to increase the
```
nit: "Gently wind up" suggested to me that there would be a loop that would be slowly increasing the value of this cluster setting as time elapses.
Is this comment trying to say that we could have set it to something very high, like 100, but are only increasing it by a small amount, from 1 to 3, since that is sufficient for causing follower overload? Is there any harm in increasing it further -- I would imagine we would want a test that would fail unless we have replication admission control, and wouldn't start passing simply because disks or machines became slightly faster.
Part of cockroachdb#89208. We've seen incidents where a large index being dropped caused a moderate latency/throughput impact on foreground load. Index/table/database drops all share the same underlying code: we make metadata-only changes on the descriptor, wait out the GC TTL, and issue range deletion tombstones over the now-deleted keyspace. Storage space is reclaimed on subsequent Pebble compactions. See pkg/sql/schema_changer.go for where all this work originates. This particular test makes two copies of the 4TiB replicated TPC-E 100k dataset, runs foreground workload against one, and drops the other. It makes use of disk snapshots, of course, because we're impatient. Release note: None
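The drop-then-GC flow described above can be exercised by hand. A hedged sketch (the database name and TTL value here are illustrative, not what the test uses):

```sql
-- Shorten the GC TTL so the dropped data becomes eligible for range
-- deletion tombstones sooner (the default TTL is much longer).
ALTER DATABASE victim CONFIGURE ZONE USING gc.ttlseconds = 300;

-- The drop itself is a metadata-only change on the descriptor; actual
-- space is reclaimed later, by Pebble compactions after the tombstones land.
DROP DATABASE victim CASCADE;
```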
2aa2046 to
9d7c9fb
Compare
irfansharif
left a comment
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan, @smg260, and @sumeerbhola)
pkg/cmd/roachtest/tests/admission_control_index_backfill.go line 130 at r2 (raw file):
Previously, sumeerbhola wrote…
has the promCfg block moved down here because we don't care about prometheus during the snapshot generation phase?
Yes.
pkg/cmd/roachtest/tests/admission_control_index_backfill.go line 170 at r2 (raw file):
Reworded.
Is this comment trying to say that we could have set it to something very high, like 100, but are only increasing it by a small amount, from 1 to 3, since that is sufficient for causing follower overload?
Yes. Note that this is a concurrency limit, not a rate limit, and 3 is quite sufficient for us. This test could be rewritten to use fewer nodes, concentrating follower writes further; I'll do that in a later pass.
Is there any harm in increasing it further -- I would imagine we would want a test that would fail unless we have replication admission control, and wouldn't start passing simply because disks or machines became slightly faster.
I observed some uninteresting bulk adder timeouts when cranking up the concurrency, and didn't investigate. Just noting that this and all other admission-control tests aren't pass/fail ones, somewhat intentionally. I still haven't plugged meaningful numbers into roachperf, where regressions would be observed, but data for all these runs is present in the test-eng Grafana for the last 90 days, which is quite good for finding regressions. As for a test specific to replication admission control, I think something more targeted would be better; I have #81516 on my TODO list to polish and merge.
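For illustration only: the exchange above never names the cluster setting being ramped from 1 to 3, and `kv.bulk_io_write.concurrent_addsstable_requests` is my assumption about which AddSST concurrency knob is meant, so treat the name as hypothetical. The ramp itself is a single statement:

```sql
-- Hypothetical setting name: bump AddSST concurrency to 3, enough to
-- overload followers without tripping unrelated bulk adder timeouts.
SET CLUSTER SETTING kv.bulk_io_write.concurrent_addsstable_requests = 3;
```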
Build succeeded.
roachtest: add admission-control/database-drop
Part of #89208. We've seen incidents where a large index being dropped
caused a moderate latency/throughput impact on foreground load.
Index/table/database drops all share the same underlying code: we make
metadata-only changes on the descriptor, wait out the GC TTL, and issue
range deletion tombstones over the now-deleted keyspace. On subsequent
pebble compactions, storage space is reclaimed. See
pkg/sql/schema_changer.go for where all this work originates.
This particular test makes two copies of the 4TiB replicated TPC-E 100k
dataset, runs foreground workload against one and drops the other. It
makes use of disk snapshots of course, because we're impatient.
roachprod/gce: use prefix search in snapshot listing API
Like the API intended. Only noticed in the subsequent commit when
introducing a second use of disk snapshots that happened to use
"tpcc-100k" in its name, somewhere.
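The failure mode this commit fixes can be sketched without the cloud API: exact or substring matching over all snapshot names can collide with a similarly named dataset, while a prefix filter (the server-side `name=prefix*` style of filter the listing API supports) keeps them apart. The helper below is a local stand-in, not the roachprod code:

```go
package main

import (
	"fmt"
	"strings"
)

// listSnapshots stands in for the cloud listing API: given a prefix, it
// returns only matching names, the way a server-side prefix filter would.
func listSnapshots(all []string, prefix string) []string {
	var out []string
	for _, name := range all {
		if strings.HasPrefix(name, prefix) {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// "tpcc-100k" is a different workload's snapshot that happens to
	// share most of its name with "tpce-100k".
	all := []string{"tpce-100k-0001", "tpce-100k-0002", "tpcc-100k-0001"}
	fmt.Println(listSnapshots(all, "tpce-100k"))
	// prints "[tpce-100k-0001 tpce-100k-0002]"
}
```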
roachtest: update index-backfill roachtest
Adding a bit more commentary, throwing in another concurrent index
backfill for free, and fighting some open CRDB bugs unrelated to the
test.
roachtest: fix race with concurrent map writes
This author forgot that struct copies weren't deep enough.
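The bug class is worth spelling out: in Go, copying a struct copies map fields only by header, so both copies write to the same underlying map, and concurrent writes through the two copies race. A minimal sketch (the `Config`/`Labels` names are mine, not the roachtest code's):

```go
package main

import "fmt"

// Config holds a map; assigning a Config copies only the map header,
// so both structs share the same underlying map.
type Config struct {
	Labels map[string]string
}

func main() {
	orig := Config{Labels: map[string]string{"a": "1"}}
	shallow := orig // struct copy: Labels still points at the same map

	shallow.Labels["b"] = "2" // visible through orig too

	// A deep copy must clone the map explicitly.
	deep := Config{Labels: make(map[string]string, len(orig.Labels))}
	for k, v := range orig.Labels {
		deep.Labels[k] = v
	}
	deep.Labels["c"] = "3" // not visible through orig

	fmt.Println(len(orig.Labels), len(deep.Labels))
	// prints "2 3"
}
```

If two goroutines write through `orig` and `shallow` at once, the runtime's map race detection reports "concurrent map writes"; the deep copy avoids the sharing entirely.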