roachtest: take storage checkpoints on failed consistency checks#117235
roachtest: take storage checkpoints on failed consistency checks#117235craig[bot] merged 5 commits intocockroachdb:masterfrom
Conversation
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
@renatolabs @pav-kv Submitting this for an initial look. Will need a few tests and tweaks. |
|
@renatolabs Do you have any thoughts on this before I wrap it up? |
renatolabs
left a comment
There was a problem hiding this comment.
Nice!
Left minor comments for consideration. The direction here make sense to me.
|
On second thought, I wonder if we should just do something simpler here, which won't require any API changes: add an envvar (e.g. Wdyt @pav-kv? |
b9ba8f1 to
0d31775
Compare
|
@erikgrinaker Upd: ah no, we can just not set the |
|
I figured terminating the node seemed fine, since we were going to fail the test anyway. I guess that prevents us from collecting debug.zip data from the node, so maybe we do want to keep it running -- but if the inconsistency happened to be caught by the normal consistency queue, then we'd be in this situation anyway. It just seems like a much more light-weight change -- I'm not sure I like modifying the public APIs just for this. |
|
If terminating is fine, then perhaps an env var is fine. Who would set the var? |
|
We'd have to set it for all roachtests. |
|
Can we use some of the static constants instead of env var? Like the Another benefit of a static test-only check is that it can actually be done across nodes. So we can terminate or not terminate nodes as we like. |
|
Roachtests use arbitrary binaries, in principle we don't control whether they're built with
I don't think we want to do this. For example, it will break all tests of |
We may choose to never fatal nodes in tests. Similarly to how we mock out the fatal behaviour in unit tests. Also, it doesn't have to be a build flag. Can be any semi-static indicator that we're in a [specific] roachtest, similar to testing knobs for the unit tests. |
How can we do this? Roachtests use an arbitrary CRDB binary, they can only control runtime flags like envvars, not build-time flags. |
|
Right, then env var is the answer. I was thinking of some "code level" env vars basically, but it appears we don't have them. |
0d31775 to
2d3dd23
Compare
|
Ok, I've added |
On multi-store nodes, this allow using e.g. `{store-dir:2}` to reference
the second store on the node (NB: not the store ID, but its local
index).
Epic: none
Release note: None
This allows using e.g. `{store-dir}` in the source path.
Epic: none
Release note: None
This environment variable makes `crdb_internal.check_consistency()` take storage checkpoints and terminate nodes on any inconsistencies when run with `stats_only: false`. This is the same as the background consistency checker, and is done by changing the mode CHECK_FULL to CHECK_VIA_QUEUE. Epic: none Release note: None
This allows us to debug range inconsistencies. Epic: none Release note: None
Epic: none Release note: None
2d3dd23 to
583c105
Compare
|
This should be ready for final review now. Became a fair bit simpler.
Doesn't look like we have a test harness in place for roachtest artifacts collection, and I don't want to start building one now, so I dropped this. Verified manually with the |
|
@renatolabs I don't think there should be anything controversial here, so I'll merge this, but let me know if you have further comments and I'll address them separately. TFTRs! bors r+ |
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from f9ebbde to blathers/backport-release-23.2-117235: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
roachprod: add store index parameter for
{store-dir}expansionOn multi-store nodes, this allow using e.g.
{store-dir:2}to reference the second store on the node (NB: not the store ID, but its local index).roachprod: expand parameters in
getsourceThis allows using e.g.
{store-dir}in the source path.sql: add
COCKROACH_INTERNAL_CHECK_CONSISTENCY_FATALThis environment variable makes
crdb_internal.check_consistency()take storage checkpoints and terminate nodes on any inconsistencies when run withstats_only: false. This is the same as the background consistency checker, and is done by changing the mode CHECK_FULL to CHECK_VIA_QUEUE.roachtest: take storage checkpoints on failed consistency checks
This allows us to debug range inconsistencies.
roachtest: collect Pebble checkpoint artifacts
Resolves #115770.
Epic: none
Release note: None