[WIP] workload/schemachange: concurrent randomized schema change workload #46402
petermattis wants to merge 1 commit into cockroachdb:master
Conversation
This is the concurrent schema change workload I've been fiddling with. I got it to the point where it should do something useful. Perhaps it has, as it very quickly hits a state where a query is hanging which shouldn't be hanging. To reproduce, create a local single-node roachprod cluster, then run this new workload against the cluster. The number of queries it takes before wedging varies. Note that in this run it happened after two queries, and we never even performed a schema change! We can see the hung query, and we can also experience a hang elsewhere. Goroutine dumps indicate something blocked way down in kv land. Just realized I haven't pulled in a day or two; perhaps this is already fixed.
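For context, one way to inspect a wedged statement on a CockroachDB cluster is through the built-in introspection statements; a minimal sketch (these are standard CockroachDB SQL statements, but output columns and availability vary by version):

```sql
-- List statements currently executing anywhere in the cluster; a hung
-- query shows up as an entry with an unusually old start time.
SHOW CLUSTER QUERIES;

-- If needed, a stuck statement can be cancelled by the id reported above.
-- CANCEL QUERY '<query_id>';
```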
Just rebased on top of 94bef65 and the hang is still present.
Good idea! We've been in need of this form of randomized testing around schema changes for a while. Do you picture extending this to include non-empty tables in order to stress the backfill machinery as well? I haven't dug into this at all, but I did try to get the load gen running and saw the same stall that you are reporting. I can also easily reproduce on v19.2.4. It seems like this has already found something interesting to explore.
Extending this to non-empty tables seems doable, though I’d like to flesh
out all of the schema change surface area first. In particular, there is
work to do on foreign keys, interleaved tables, default values and
sequences. I also need to wrap this up in a roachtest.
Interesting that this problem happens on 19.2. Part of the reason I made
this a workload is so we can test on previous releases.
I'd totally expect this to happen in 19.2 and earlier. We should backport the above fixes.
Alright, I went and read the code and I'm not convinced this is a bug. The hangs are always during an explicit transaction when looking up the next operation (L280). There are two options.
I'd note that
Force-pushed 1ecb0a0 to 9b8ecfe
Ah, thanks for the eagle eyes, Andrew. I was suspicious I was doing something wrong given how easy this was to reproduce. I've gone ahead and plumbed
OK, the wedging happens very quickly (within a few seconds). It's possible my code is still doing something wrong. I'll poke around this some more this afternoon.
I suspect now you're hitting the issue fixed by #46384. Let me pull your branch now and see.
I haven't tried on top of #46384 yet, but what I'm seeing is multiple goroutines that look like:
Nice catch. If you wait 5 minutes you'll find that it unblocks. Seems like somewhere we're not releasing a schema lease in the right place. Digging deeper.
Heh, well, that was the point of writing this test: finding bugs. Fingers crossed that this is a real one and not too difficult to fix. FYI, I'm not going to have much time to work on this in the early part of this week.
Force-pushed 9b8ecfe to 59924e9
At the schema change meeting yesterday we discussed that, if you don't mind, Peter, I should try to merge the workload part of this PR to unblock further work on roachtests for schema changes.
Superseded by #46632
…nced

This commit is to help @spaskob avoid a hang observed while working on turning cockroachdb#46632 (workload/schemachange) into a roachtest. The idea is that as a stopgap the roachtest can issue:

```
SET CLUSTER SETTING sql.lease_manager.remove_lease_once_deferenced = true;
```

The hang was discovered while looking at cockroachdb#46402.

Release note: None
Randomly generate (concurrent) schema changes. The tables intentionally
contain no actual data as the focus here is on stressing the machinery
around schema changes, not the machinery around backfills.
Release note: None
Release justification: non-production code changes. This is only test
code.