kvserver/loqrecovery: write replica recovery events to rangelog #74333
craig[bot] merged 1 commit into cockroachdb:master
Conversation
Force-pushed from de465e3 to 74cfbb8.
tbg
left a comment
Third commit looks good overall, I'll give it another pass when the prereqs have merged.
Reviewed 6 of 13 files at r1, 15 of 21 files at r2, 6 of 6 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)
pkg/kv/kvserver/loqrecovery/record.go, line 117 at r3 (raw file):
// recovery event.
func UpdateRangeLogWithRecovery(
	ctx context.Context, exec sqlutil.InternalExecutor, event loqrecoverypb.ReplicaRecoveryRecord,
To keep things tidy+testable and to avoid a large dependency, take a narrow interface here instead of InternalExecutor:
type DB interface {
	Exec(ctx context.Context, stmt string, args ...interface{}) (int, error)
}

and let the caller deal with making the InternalExecutor look like a DB. In fact, it might be easiest to not use an interface but simply a func exec(....) (int, error) so that we don't need to make a dummy type to implement the interface.
Force-pushed from cd77859 to 33c6aa8.
aliher1911
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)
pkg/kv/kvserver/loqrecovery/record.go, line 117 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
To keep things tidy+testable and to avoid a large dependency, take a narrow interface here instead of InternalExecutor:

type DB interface {
	Exec(ctx context.Context, stmt string, args ...interface{}) (int, error)
}

and let the caller deal with making the InternalExecutor look like a DB. In fact, it might be easiest to not use an interface but simply a func exec(....) (int, error) so that we don't need to make a dummy type to implement the interface.
Passing a function seems to be a no-brainer! Makes testing way simpler as well.
tbg
left a comment
Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)
pkg/server/server.go, line 1983 at r4 (raw file):
}
if err := loqrecovery.UpdateRangeLogWithRecovery(ctx, sqlExec, record); err != nil {
	log.Errorf(ctx, "Loss of quorum recovery failed to write RangeLog entry: %s", err)
loss
pkg/kv/kvserver/loqrecovery/record_test.go, line 26 at r4 (raw file):
// TestPublishRangeLogEvents verifies that inserting recovery events into RangeLog handles
// sql execution errors correctly and doesn't request deletion of failed records. It also
What does this mean? You're just checking that a SQL error results in an error returned from the function, right?
Code quote:
doesn't request deletion of failed records

pkg/kv/kvserver/loqrecovery/record_test.go, line 41 at r4 (raw file):
	success bool
}{
	{"success", 7, 1021, 1, nil, true},
nit: const rangeID = 7 at the top would make this easier to read. Similarly, const ts1021 = 1021.
Also, I thought we had a linter that forced you to put the field names, but maybe not. I like putting them because it makes it easier to read but I see why with two test cases it may not be a big deal, your call as for any nit.
pkg/kv/kvserver/loqrecovery/record_test.go, line 65 at r4 (raw file):
	require.Error(t, err)
}
require.Equal(t, 6, len(actualArgs), "No actual args were provided")
nit: I think there isn't really any place where we capitalize strings like these, neither here and in logging (maybe with the exception of a few messages that are part of the early boot sequence):
$ git grep -E 'log.Errorf\(ctx, "[A-Z]' | grep -v _test.go | wc -l
15
$ git grep -E 'log.Errorf\(ctx, "[a-z]' | grep -v _test.go | wc -l
171
I don't bring this up because I feel strongly about the particulars in this PR, but we try to be consistent in general.
Force-pushed from 0b79887 to 93af558.
erikgrinaker
left a comment
Reviewed 5 of 13 files at r1, 23 of 23 files at r5, 21 of 21 files at r6, 7 of 7 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)
-- commits, line 29 at r7:
s/or/of/ ?
pkg/kv/kvserver/loqrecovery/record.go, line 128 at r7 (raw file):
UpdatedDesc:  &event.RangeDescriptor,
AddedReplica: &event.NewReplica,
Reason:       kvserverpb.ReasonAdminRequest,
I'd introduce a new reason here, to disambiguate it from actual admin request events. As far as I can tell, the DB Console will handle that just fine in older versions.
pkg/kv/kvserver/loqrecovery/record.go, line 139 at r7 (raw file):
event.RangeID, event.NewReplica.StoreID, kvserverpb.RangeLogEventType_remove_voter.String(),
If we're going to use the remove_voter event type here, maybe we should insert one event for each replica that was removed, and another for the new replica that was added? I mean, that is what these events mean, so maybe the range log should reflect that. Not a strongly held opinion though.
pkg/kv/kvserver/loqrecovery/record.go, line 149 at r7 (raw file):
}
if rows != 1 {
	return errors.Errorf("%d row(s) affected by RangeLog insert while expected 1",
nit: I don't think we strictly need to check this, we can assume that an INSERT query works as expected (we have plenty of tests for that elsewhere). Suppose it doesn't hurt though.
pkg/server/server.go, line 1766 at r7 (raw file):
// We are returning false here to prevent entries from being deleted from
// store. We need those entries at the end of startup to populate rangelog
// and possibly other durable cluster-replicated destinations.
Isn't this going to cause duplicate metrics to be recorded? Let's move metrics to the other registration where the events get removed.
pkg/kv/kvserver/loqrecovery/record_test.go, line 41 at r4 (raw file):
I thought we had a linter that forced you to put the field names, but maybe not
That's only for exported structs, with the justification that you don't want to rely on struct field ordering in some third-party package. It's totally kosher to use unkeyed literals for internal structs (and arguably within the same module), and I often prefer them as compact input representations in tests. No strong opinion here though.
pkg/kv/kvserver/loqrecovery/record_test.go, line 40 at r7 (raw file):
	returnedRowCount int
	queryExecError   error
	success          bool
Consider prefixing the expectations with expect or something, to make it clearer which fields are inputs and which are assertions.
Force-pushed from 93af558 to c248b66.
aliher1911
left a comment
I added a version gate to it, because of the new enum types for RangeLog.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/kv/kvserver/loqrecovery/record.go, line 128 at r7 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
I'd introduce a new reason here, to disambiguate it from actual admin request events. As far as I can tell, the DB Console will handle that just fine in older versions.
I don't think that's true, as it queries the values and then converts them to strings in the endpoint. I'll add a version gate for that.
pkg/kv/kvserver/loqrecovery/record.go, line 139 at r7 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
If we're going to use the remove_voter event type here, maybe we should insert one event for each replica that was removed, and another for the new replica that was added? I mean, that is what these events mean, so maybe the range log should reflect that. Not a strongly held opinion though.
I think that makes sense. I created the draft early to see if the general idea seemed reasonable. That might require passing more data in the recovery event, but that's ok. I don't think we should go overboard here, because you can see what's going on from the descriptor-to-descriptor change, but multiple entries are fine.
pkg/kv/kvserver/loqrecovery/record.go, line 149 at r7 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
nit: I don't think we strictly need to check this, we can assume that an INSERT query works as expected (we have plenty of tests for that elsewhere). Suppose it doesn't hurt though.
Honestly, that is almost a copy&paste from existing code that records to rangelog from another place, and it does check the row count. Not sure what the reason would be to insert >1 or 0 entries, as we don't have a non-auto-generated primary key I think.
pkg/kv/kvserver/loqrecovery/record_test.go, line 26 at r4 (raw file):
Previously, tbg (Tobias Grieger) wrote…
What does this mean? You're just checking that a SQL error results in an error returned from the function, right?
Yup, but it is now just error propagation, so it's clearer.
Force-pushed from c248b66 to 3987ff4.
tbg
left a comment
Looks good! I'm still going to give this a final look when the preceding commits are gone (never quite 100% sure with Reviewable, but also didn't feel like re-reviewing the whole commit).
Reviewed 28 of 28 files at r8, 23 of 23 files at r9, 13 of 13 files at r10, 1 of 1 files at r11, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)
pkg/kv/kvserver/kvserverpb/log.proto, line 35 at r11 (raw file):
remove_non_voter = 5;

// UnsafeQuorumRecovery is the event type recorded when all replicas are
// replaced by a new one that acts as the source of truth possibly losing
It's not really accurate to say that "all replicas are replaced by a new one". Rather, one of them is promoted to be the sole voter, and all others will be deleted.
unrelated to the PR but related to the project: remind me, do we synchronously delete the other (non-chosen) replicas? I think we should or requests can hang in the cluster, or at least I remember seeing that in the past.
pkg/kv/kvserver/loqrecovery/record.go, line 64 at r11 (raw file):
// RegisterOfflineRecoveryEvents checks if recovery data was captured in the
// store and notifies callback about all registered events. Its up to the
nit: It's
pkg/kv/kvserver/loqrecovery/record.go, line 78 at r11 (raw file):
ctx context.Context,
readWriter storage.ReadWriter,
registerEvent func(context.Context, loqrecoverypb.ReplicaRecoveryRecord) error,
Rather than this somewhat complex interplay between registerEvent which can return an error and removeEvents, we could say registerEvent func(context.Context, logrecoverypb.ReplicaRecoveryRecord) (removeEvent bool, _ error) and then it's somewhat natural (in the Go world) that the bool has no effect if there is an error (or rather, that the combination (bool, non-nil) is unidiomatic).
pkg/kv/kvserver/loqrecovery/record.go, line 92 at r11 (raw file):
iter.SeekGE(storage.MVCCKey{Key: keys.LocalStoreUnsafeReplicaRecoveryKeyMin})
valid, err := iter.Valid()
for ; valid && err == nil; valid, err = iter.Valid() {
Isn't this simpler? The way it's written in this diff, my brain has to work fairly hard to convince itself that everything checks out.
for {
valid, err := iter.Valid()
if err != nil {
return eventCount, errors.Wrapf(err, "failed to iterate replica recovery record keys")
}
if !valid {
break
}
// Code
}
return eventCount

pkg/kv/kvserver/loqrecovery/record.go, line 119 at r11 (raw file):

// UpdateRangeLogWithRecovery inserts a range log update to system.rangelog
// using information from recovery event.
// useOldRangeLogEvents is enabling new reason that we can only use in upgraded
It's the other way around, the bool disables the new reason.
erikgrinaker
left a comment
Looks good overall. 👍
While we're at it, would you mind submitting a PR for the admin server to not choke on new event types so we don't have to do the whole version gate dance again the next time we add a range log event type?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)
pkg/kv/kvserver/loqrecovery/record.go, line 128 at r7 (raw file):
Previously, aliher1911 (Oleg) wrote…
I don't think that's true as it queries the values and then coverts them to string in the endpoint. I'll add a version gate for that.
Not for the reason, I think -- seems like it's just used as-is:
Line 1491 in 73fa2ec
Unless there's something else going on in the web frontend maybe?
But you're right about the event type.
pkg/kv/kvserver/loqrecovery/record.go, line 139 at r7 (raw file):
Previously, aliher1911 (Oleg) wrote…
I think that makes sense. I created draft early to see if the general idea seem reasonable. That might require passing more data in recovery event, but that's ok. I don't think we should go over the board here because you can see what's going on from the descriptor to descriptor change, but multiple entries are fine.
Yeah, I agree. My thinking was that if we were to use the remove_voter type then we should retain its existing semantics -- with public APIs like these we can never really know what people are using them for.
Anyway, I see that we moved to a separate event type instead, and I agree that's a better approach anyway.
pkg/kv/kvserver/loqrecovery/record.go, line 78 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Rather than this somewhat complex interplay between
registerEventwhich can return an error andremoveEvents, we could sayregisterEvent func(context.Context, logrecoverypb.ReplicaRecoveryRecord) (removeEvent bool, _ error)and then it's somewhat natural (in the Go world) that the bool has no effect if there is an error (or rather, that the combination(bool, non-nil)is unidiomatic).
I tend to agree with this. I suppose this goes for the previous PR though.
pkg/kv/kvserver/loqrecovery/record.go, line 125 at r11 (raw file):
sqlExec func(ctx context.Context, stmt string, args ...interface{}) (int, error),
event loqrecoverypb.ReplicaRecoveryRecord,
useOldRangeLogEvents bool,
I'm not sure if we necessarily need to do this. I think it's perfectly fine to simply not emit a rangelog entry if we're not upgraded yet, as I expect this to be rare. Recovery will still work, which is the important thing. Your call though, I'm fine with leaving it in too.
Force-pushed from 4cf5341 to cadc258.
Ping me when this is rebased on top of #73785 and I'll quickly give it a last glance.
Force-pushed from 75c66e2 to 64589b0.
tbg
left a comment
Looks good! The only sort of substantial comment is a suggestion to clean up the server start path a bit by pulling out some code.
Reviewed 1 of 20 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)
pkg/kv/kvserver/kvserverpb/log.go, line 26 at r14 (raw file):
ReasonAdminRequest     RangeLogEventReason = "admin request"
ReasonAbandonedLearner RangeLogEventReason = "abandoned learner replica"
ReasonUnsafeRecovery   RangeLogEventReason = "unsafe loss of quorum recovery"
The UI works with this, right? I assume you've checked this but if you haven't please do, I know too little about the DB console to know if it needs special casing.
pkg/kv/kvserver/loqrecovery/record.go, line 128 at r14 (raw file):
// UpdateRangeLogWithRecovery inserts a range log update to system.rangelog
// using information from recovery event.
// useOldRangeLogEvents is enabling new reason that we can only use in upgraded
Stale comment.
pkg/server/server.go, line 1980 at r14 (raw file):
// startup fails, and write to range log once the server is running as we need
// to run sql statements to update rangelog.
if err := s.node.stores.VisitStores(func(s *kvserver.Store) error {
Could you pull this out into a standalone method that we simply call here? There's already quite a bit of code here and I have another suggestion below.
pkg/server/server.go, line 2008 at r14 (raw file):
}); err != nil {
	// We don't want to abort server if we can't record recovery events
	// as it is the last thing we need if cluster is already unhealthy.
Unfortunately, if the cluster is unhealthy, it is also possible for this call to time out. I think this is no worse than what we already have in the start-up path, but I wonder if we can afford to make this asynchronous. This will be a little annoying for the unit tests (needs SucceedsSoon) but more robust in making sure that clusters work better when quorum has been lost.
Force-pushed from 74e91c5 to eeab75b.
aliher1911
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/kv/kvserver/kvserverpb/log.go, line 26 at r14 (raw file):
Previously, tbg (Tobias Grieger) wrote…
The UI works with this, right? I assume you've checked this but if you haven't please do, I know too little about the DB console to know if it needs special casing.
There's special handling for the new enum in the admin server, and also a JS change to mirror that. I tried it manually and it shows fine in the console on the new version, and we also don't post events if we are in the middle of an upgrade.
pkg/kv/kvserver/kvserverpb/log.proto, line 35 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
It's not really accurate to say that "all replicas are replaced by a new one". Rather, one of them is promoted to be the sole voter, and all others will be deleted.
unrelated to the PR but related to the project: remind me, do we synchronously delete the other (non-chosen) replicas? I think we should or requests can hang in the cluster, or at least I remember seeing that in the past.
They still hang in the cluster. Maybe it's time to start killing them proactively.
pkg/kv/kvserver/loqrecovery/record.go, line 128 at r7 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Not for the reason, I think -- seems like it's just used as-is:
Line 1491 in 73fa2ec
Unless there's something else going on in the web frontend maybe?
But you're right about the event type.
The event type seems to be a pickle for handling new values universally. It needs to convert to an enum, but there's no default there. I guess we can introduce an "unknown" reason for future events.
pkg/server/server.go, line 2008 at r14 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Unfortunately, if the cluster is unhealthy, it is also possible for this call to time out. I think this is no worse than what we already have in the start-up path, but I wonder if we can afford to make this asynchronous. This will be a little annoying for the unit tests (needs SucceedsSoon) but more robust in making sure that clusters work better when quorum has lost.
I pushed that to an async task on the stopper.
Force-pushed from b63b29b to fc85dea.
tbg
left a comment
Reviewed 5 of 20 files at r13, 4 of 22 files at r14, 3 of 4 files at r15, 2 of 2 files at r16, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/kv/kvserver/kvserverpb/log.proto, line 35 at r11 (raw file):
Previously, aliher1911 (Oleg) wrote…
They still hang in the cluster. Maybe it's time to start killing them proactively.
Mind filing an issue and linking it so that we don't lose track?
pkg/server/server.go, line 2035 at r16 (raw file):
	}
}); err != nil {
	// Should only happen if server is being stopped before fully started.
nit if you eat another CI: no reason to even log this, you could just _ = stopper.RunAsyncTask(...)
pkg/ui/workspaces/db-console/src/views/reports/containers/range/logTable.tsx, line 44 at r16 (raw file):
case protos.cockroach.kv.kvserver.storagepb.RangeLogEventType.merge:
	return "Merge";
case protos.cockroach.kv.kvserver.storagepb.RangeLogEventType
Ah, I had missed that while reviewing earlier.
aliher1911
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)
pkg/kv/kvserver/kvserverpb/log.proto, line 35 at r11 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Mind filing an issue and linking it so that we don't lose track?
pkg/server/server.go, line 2035 at r16 (raw file):
Previously, tbg (Tobias Grieger) wrote…
nit if you eat another CI: no reason to even log this, you could just
_ = stopper.RunAsyncTask(...)
After trying to brute-force the JS formatting, even 5 runs doesn't seem like a lot.
Force-pushed from fc85dea to 275f940.
Previously, the fact of a loss-of-quorum replica recovery was only written as a structured log entry. This information is local to the node and does not survive if the node is decommissioned. It would be beneficial to preserve this information longer. The range log, while limited to 30 days, still provides good reference data for investigations if recovery wasn't performed too long ago.

This patch adds entries to the rangelog for every updated range the first time a node that holds a survivor replica is started after recovery.

Release note: None
Force-pushed from 275f940 to a7b2ee8.
bors r=tbg,erikgrinaker

Build failed:

bors r=tbg,erikgrinaker

Build failed (retrying...):

Build succeeded:
Addresses #73679