asim: fix simulator non-deterministic behavior #108768
asim: fix simulator non-deterministic behavior #108768craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
4cd8572 to
f4c6976
Compare
3042428 to
f1bb738
Compare
|
Do we want to include all nodes with an active membership? Should we use |
f1bb738 to
6e33b0c
Compare
kvoli
left a comment
There was a problem hiding this comment.
Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)
-- commits line 5 at r1:
nit: set
-- commits line 5 at r1:
nit: consider replacing "current time" with "system time"
-- commits line 15 at r1:
nit: You can drop the markdown link, and instead just reference the SHA e.g. 4a52da5c0d90e40c6e1d6e2b26f4a72dd04527ef
pkg/kv/kvserver/asim/state/impl.go line 1080 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Do we want to include all nodes with an active membership? Should we use
livenesspb.MembershipStatus_ACTIVEinstead? It also excludesNodeLivenessStatus_DECOMMISSIONING.
I don't think there is an equivalent mapping in the simulator yet, which separates liveness and membership.
You could add this in another commit, or file an issue to add the concept of membership, and separate it out from liveness status.
pkg/kv/kvserver/asim/state/liveness.go line 51 at r1 (raw file):
// Following the timeline of the states that a liveness record goes through: // // |---LIVE---|---UNAVAILABLE---|---DEAD---> time
Is this diagram also somewhere else in the code? I think referencing the existing comments/code for this will age better.
6e33b0c to
8519fc7
Compare
|
Added as a todo here. |
|
Previously, kvoli (Austen) wrote…
I'm using a permanent link here. Are there better ways to link to the code snippet? |
kvoli
left a comment
There was a problem hiding this comment.
with removing the link, and referencing a fn name.
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)
pkg/kv/kvserver/asim/state/impl.go line 1077 at r2 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Added as a todo here.
Thanks
pkg/kv/kvserver/asim/state/liveness.go line 51 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
I'm using a permanent link here. Are there better ways to link to the code snippet?
You can reference the function naming, if it is unlikely to change. For this one, something like "see liveness.LivenessStatus" would be fine.
Using a perma link is a bit clunky, and doesn't map so well within an editor.
pkg/kv/kvserver/asim/state/liveness.go line 100 at r1 (raw file):
true, timeUntilNodeDead, 0,
nit: consider naming the function arguments if they're not obvious e.g. 0, /* timeAfterNodeSuspect */.
Previously, the allocator simulator exhibits non-deterministic behavior because it set the liveness expiration record using the real system time plus one minute. In addition, the liveness record was not updated between ticks. While this was typically acceptable, it could lead to dead nodes in deadlock test scenarios where tests took longer. To address this, the following changes were made in this patch: 1. Remove the usage of TestNodeVitality as the simulator uses a simulated clock rather than a real-time clock. 2. Replace the existing TestNodeVitality mapping with an enum map for Liveness Status. The majority of the code changes just revert to the state before this [commit](4a52da5). 3. To hook the liveness component with the liveness status mapping, this patch also introduces a conversion method which constructs node vitality in a manner that respects all node liveness status properties. Fixes: cockroachdb#105904, cockroachdb#108092 Release Note: None
8519fc7 to
fe2524c
Compare
|
Previously, kvoli (Austen) wrote…
Done. |
|
TFTR! bors r=kvoli |
|
Build failed: |
|
Failure Free disk space requirements could not be met looks irrelevant. bors r+ |
|
Build succeeded: |
Previously, the allocator simulator exhibits non-deterministic behavior
because it set the liveness expiration record using the real system time
plus one minute. In addition, the liveness record was not updated between
ticks. While this was typically acceptable, it could lead to dead nodes in
deadlock test scenarios where tests took longer.
To address this, the following changes were made in this patch:
rather than a real-time clock.
Status. The majority of the code changes just revert to the state before this
commit.
also introduces a conversion method which constructs node vitality in a manner
that respects all node liveness status properties.
Fixes: #105904, #108092
Release Note: None