Conversation
sergepetrenko
left a comment
There was a problem hiding this comment.
Hi! Good job on covering all the corner cases!
The tests are fine, my comments are mostly about the test setup.
|
Thanks for your comments and proposed fixes. I've fixed everything you mentioned. Strangely enough test_fail_limbo_acked_promote didn't flack for me at all before your patch. |
sergepetrenko
left a comment
There was a problem hiding this comment.
Thanks for the fixes!
It's much easier to see what's happening when the before_ helpers are near the test they prepare.
Please, see my remaining comments and squash everything into a single commit once you're done.
sergepetrenko
left a comment
There was a problem hiding this comment.
Thanks for the fixes!
LGTM.
Please, ask @Gerold103 to review this once you're ready.
|
@sergepetrenko I've commited some minor changes, take a look pls. |
419de49 to
2ddab2f
Compare
97b53fb to
5923bcc
Compare
4c200ef to
69869cc
Compare
|
@sergepetrenko, @Gerold103 thanks for your replies, I've completely overhauled all the tests, please take a look. |
sergepetrenko
left a comment
There was a problem hiding this comment.
Hi! Thanks for working on this!
There are some set-up related comments left. Please, address them.
fca1b5a to
477599a
Compare
477599a to
50ead38
Compare
8c4f42c to
9447818
Compare
|
@Gerold103, @sergepetrenko - thanks for indepth review! I rewrote those test almost from scratch (taking into account all previous comments). Take a look, when you have time. |
Gerold103
left a comment
There was a problem hiding this comment.
Thanks for the patch!
9447818 to
7af31d1
Compare
|
@Gerold103 Thanks for review! I've fixed/commented all found issues. Take a look, when you have time to. |
7af31d1 to
6836b87
Compare
Gerold103
left a comment
There was a problem hiding this comment.
Good work! Some important places are covered. But I found a couple of new ones:
diff --git a/src/box/box.cc b/src/box/box.cc
index 881932727..e7b5b853e 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1975,9 +1975,12 @@ box_promote_qsync(void)
if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
return 0;
int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
- if (wait_lsn < 0)
+ if (wait_lsn < 0) {
+ assert(false);
return -1;
+ }
if (raft->state != RAFT_STATE_LEADER) {
+ assert(false);
diag_set(ClientError, ER_NOT_LEADER, raft->leader);
return -1;
}with these the tests pass. Can they be covered in this patch too?
sergepetrenko
left a comment
There was a problem hiding this comment.
Thanks for working on this!
We are close to the finish line, I hope.
a730937 to
6b6f1df
Compare
|
I've fixed test that was flaky on @sergepetrenko's machine and fixed several test, that broke after split-brain detection was introduced. @Gerold103 As for box_promote_qsync coverage - I didn't find an easy way to test those lines. I can create new issues for testing box_promote_qsync and box_wait_limbo_acked. |
sergepetrenko
left a comment
There was a problem hiding this comment.
Thanks for the fixes!
Good job on going through all the reviews.
LGTM.
Gerold103
left a comment
There was a problem hiding this comment.
Thanks for the patch!
As for box_promote_qsync coverage - I didn't find an easy way to test those lines. I can create new issues for testing box_promote_qsync and box_wait_limbo_acked.
yes, please.
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
to be emptied
10. Interfering promote while waiting for limbo to be acked
(similar to replication/gh-5430-qsync-promote-crash.test.lua)
Closes #6033
NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
6b6f1df to
a2ff951
Compare
|
Backported to 2.10. |
Covered most of box_promote and box_demote with tests.
Didn’t find a way to test failing box_wait_limbo_acked in box_demote without adding dedicated ERRINJ.
IMO box_wait_limbo_acked should be covered by its own tests.
Also found interesting behaviour when simultaneously promoting two servers with election_mode = 'off' for the first time - sometimes they can both become limbo owners for some time.
wip_test.lua
Can lead to following result:
Adding 3rd server, who calls box.ctl.promote before test starts fixes flack in the test, but there might be a problem with possibility of two simultaneous limbo owners.
Closes #6033