storage: take an engine checkpoint during failing consistency checks #36867
craig[bot] merged 2 commits into cockroachdb:master from
Conversation
A failed consistency check will effectively leak the checkpoint, which I think is fine for now -- customers running into one are always supposed to (and likely to) contact support. If we're paranoid about someone running out of disk space after not realizing that they have RocksDB checkpoints created after inconsistencies, we need to figure out a better UX, but for now I'd say we add an env var that's set by default when using
Force-pushed from 1d38985 to d44e47f
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @tbg)
c-deps/libroach/db.cc, line 269 at r2 (raw file):
return ToDBStatus(status);
}
status = cp_ptr->CreateCheckpoint(cp_dir, 0 /* log_size_for_flush */);
consider saying something about what that 0 does, if it's interesting. I think a lot of people (like myself right now) will stop reading here and not dig into CreateCheckpoint() to see.
c-deps/libroach/db.cc, line 274 at r2 (raw file):
return ToDBStatus(status);
}
return kSuccess;
can't we always return ToDBStatus(status); ?
c-deps/libroach/include/libroach.h, line 104 at r2 (raw file):
// Creates a RocksDB checkpoint in the specified directory (which must not exist).
DBStatus DBCreateCheckpoint(DBEngine* db, DBSlice dir);
pls put some words about what a checkpoint is - something about hardlinks to current SSTs
pkg/roachpb/api.proto, line 1144 at r2 (raw file):
// If set, a checkpoint (i.e. cheap backup) of the engine will be taken. This
// is expected to be set only if we already know that there is an
// inconsistency and we want to preserve as much state as possible.
pls say where the checkpoint will live (how to find it)
pkg/storage/consistency_queue_test.go, line 196 at r2 (raw file):
defer cache.Release()
for i := 0; i < numStores; i++ {
say something about not using in-memory engines because we want to take snapshots. But actually, why can't we use in-mem engines (and still take snapshots)?
pkg/storage/replica_consistency.go, line 92 at r2 (raw file):
// queue is asking us to report a diff, so it has run a previous check
// which failed.
Checkpoint: args.WithDiff && args.Mode == roachpb.ChecksumMode_CHECK_VIA_QUEUE,
this already confusing dance is becoming even more complex. How about just plumbing a checkpoint bool through args?
pkg/storage/replica_proposal.go, line 173 at r2 (raw file):
if cc.Checkpoint {
	checkpointBase := filepath.Join(r.store.engine.GetAuxiliaryDir(), "checkpoints")
	_ = os.MkdirAll(checkpointBase, 0700)
nit: 0700 /* owner rwx */
pkg/storage/replica_proposal.go, line 178 at r2 (raw file):
	log.Warningf(ctx, "unable to create checkpoint %s: %s", checkpointDir, err)
} else {
	log.Infof(ctx, "created checkpoing %s", checkpointDir)
poing poing
pkg/storage/engine/engine_test.go, line 804 at r2 (raw file):
defer cleanup()
// TODO(tbg): use the method below, but only for on-disk stores.
sorry, what's going on here?
pkg/storage/engine/engine_test.go, line 821 at r2 (raw file):
assert.NoError(t, err)
assert.NoError(t, db.CreateCheckpoint(dir))
can you open and read the checkpoint you've just created? (as a RocksDB engine)
andreimatei
left a comment
pkg/storage/consistency_queue_test.go, line 196 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
say something about not using in-memory engines because we want to take snapshots. But actually, why can't we use in-mem engines (and still take snapshots)?
what does engine.GetAuxiliaryDir() return for in-mem engines?
pkg/storage/replica_proposal.go, line 172 at r2 (raw file):
snap := r.store.engine.NewSnapshot()
if cc.Checkpoint {
	checkpointBase := filepath.Join(r.store.engine.GetAuxiliaryDir(), "checkpoints")
should we protect against in-mem engines here somehow? What's going to happen for those?
pkg/storage/replica_proposal.go, line 173 at r2 (raw file):
if cc.Checkpoint {
	checkpointBase := filepath.Join(r.store.engine.GetAuxiliaryDir(), "checkpoints")
	_ = os.MkdirAll(checkpointBase, 0700)
why MkdirAll and not Mkdir?
We've repeatedly wanted this to preserve state when finding replica inconsistencies. See, for example: cockroachdb#36861

Release note: None
Force-pushed from d44e47f to 4e8d768
tbg
left a comment
@petermattis could you give the C++ changes a glance to make sure I'm not doing something silly with memory mgmt?
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
c-deps/libroach/db.cc, line 269 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
consider saying something about what that 0 does, if it's interesting. I think a lot of people (like myself right now) will stop reading here and not dig into CreateCheckpoint() to see.
Done.
c-deps/libroach/db.cc, line 274 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
can't we always return ToDBStatus(status);?
Done.
c-deps/libroach/include/libroach.h, line 104 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls put some words about what a checkpoint is - something about hardlinks to current SSTs
Done.
pkg/roachpb/api.proto, line 1144 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls say where the checkpoint will live (how to find it)
Done.
pkg/storage/consistency_queue_test.go, line 196 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
what does engine.GetAuxiliaryDir() return for in-mem engines?
RocksDB abstracts away the file system through an Env. An in-memory engine just has an in-memory env. So in-memory engines work if and only if you write everything through the RocksDB env, which basically means you have to slap a full POSIX API into the Engine interface. We did some of that (had to) for encryption at rest, but it's not enough. This isn't a yak worth shaving.
In-mem engines actually return a temp dir as the auxiliary dir, mostly to make things "mostly work", but that's not behavior that makes a lot of sense or that I want to rely on in this test.
pkg/storage/replica_consistency.go, line 92 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
this already confusing dance is becoming even more complex. How about just plumbing a checkpoint bool through args?
Done. Agreed that this is better.
pkg/storage/engine/engine_test.go, line 804 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
sorry, what's going on here?
Removed. The comment was saying "I want to use runWithAllEngines so that pebble automatically gets this test as well, but we need a version of runWithAllEngines that uses non-mem engines". Not worth it right now. The method is in the interface, so pebble will have to learn it anyway.
pkg/storage/engine/engine_test.go, line 821 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
can you open and read the checkpoint you've just created? (as a RocksDB engine)
Done.
tbg
left a comment
PS not sure why I LGTM'ed myself. The button looked too inviting, I guess.
This takes a checkpoint on the nodes with replicas of a failing range, before the failure leads to nodes shutting down. The checkpoint will, for the replicas of the affected range, be taken at the same Raft log position.

Release note: None
Force-pushed from 4e8d768 to 8b1fc16
LGTM
TFTR! bors r=andreimatei
👎 Rejected by code reviews
bors r=andreimatei
andreimatei
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @tbg)
36867: storage: take an engine checkpoint during failing consistency checks r=andreimatei a=tbg

This takes a checkpoint on the nodes with replicas of a failing range, before the failure leads to nodes shutting down. The checkpoint will, for the replicas of the affected range, be taken at the same Raft log position.

Release note: None

36915: changefeedccl: remove deprecated changefeed.min_high_water metric r=nvanbenschoten a=danhhz

It was replaced by max_behind_nanos in 19.1.

Release note (backwards-incompatible change): Removed the deprecated `changefeed.min_high_water` metric; use `changefeed.max_behind_nanos` instead.

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Build succeeded