Skip to content

test: box_promote and box_demote#6752

Merged
kyukhin merged 1 commit intomasterfrom
grafin/gh-6033-box-promote-demote-tests
Jun 30, 2022
Merged

test: box_promote and box_demote#6752
kyukhin merged 1 commit intomasterfrom
grafin/gh-6033-box-promote-demote-tests

Conversation

@grafin
Copy link
Member

@grafin grafin commented Dec 29, 2021

Covered most of box_promote and box_demote with tests.
Didn’t find a way to test failing box_wait_limbo_acked in box_demote without adding dedicated ERRINJ.
IMO box_wait_limbo_acked should be covered by its own tests.

Also found interesting behaviour when simultaneously promoting two servers with election_mode = 'off' for the first time - sometimes they can both become limbo owners for some time.

wip_test.lua
local luatest = require('luatest')
local helpers = require('test.luatest_helpers')
local cluster = require('test.luatest_helpers.cluster')
local g = luatest.group('gh-6033-box-promote-demote')

g.before_each(function(g)
    g.cluster = cluster:new({})

    g.box_cfg = {
        election_mode = 'off';
        replication = {
            helpers.instance_uri('server_', 1);
            helpers.instance_uri('server_', 2);
        };
    }

    g.server_1 = g.cluster:build_and_add_server(
        {alias = 'server_1', box_cfg = g.box_cfg})
    g.server_2 = g.cluster:build_and_add_server(
        {alias = 'server_2', box_cfg = g.box_cfg})
    g.cluster:start()
end)

g.after_each(function(g)
    g.cluster:stop()
    g.cluster.servers = nil
end)

g.test_interfering_promote_in_wal = function(g)
    local wal_write_count = g.server_1:exec(function()
        box.error.injection.set('ERRINJ_WAL_DELAY', true)
        return box.error.injection.get('ERRINJ_WAL_WRITE_COUNT')
    end)

    g.server_2:exec(function()
        box.ctl.promote()
        box.error.injection.set('ERRINJ_WAL_DELAY', true)
    end)
    g.server_1:wait_wal_write_count(wal_write_count + 1)

    local ok, err = g.server_1:exec(function()
        local f = require('fiber').new(box.ctl.promote); f:set_joinable(true)
        box.error.injection.set('ERRINJ_WAL_DELAY', false)
        return f:join()
    end)

    print(string.format("Server %d thinks that limbo is owned by %d",
        g.server_1:exec(function()
            return box.info.id, box.info.synchro.queue.owner
    end)))

    print(string.format("Server %d thinks that limbo is owned by %d",
        g.server_2:exec(function()
            return box.info.id, box.info.synchro.queue.owner
    end)))

    g.server_2:exec(function()
        box.error.injection.set('ERRINJ_WAL_DELAY', false)
    end)

    luatest.assert(
        not ok and err.code == box.error.INTERFERING_PROMOTE,
        'interfering promote not handled')
end

Can lead to following result:

TAP version 13
1..1
# Started on Wed Dec 29 08:18:31 2021
# Starting group: gh-6033-box-promote-demote
Server 1 thinks that limbo is owned by 1
Server 2 thinks that limbo is owned by 2
not ok 1        gh-6033-box-promote-demote.test_interfering_promote_in_wal
#   ...v/github/tarantool/test/replication-luatest/wip_test.lua:63: interfering promote not handled
#   expected: a value evaluating to true, actual: false
#   stack traceback:
#       ...v/github/tarantool/test/replication-luatest/wip_test.lua:63: in function 'gh-6033-box-promote-demote.test_interfering_promote_in_wal'
#       ...
#       [C]: in function 'xpcall'
# Ran 1 tests in 2.703 seconds, 0 successes, 1 fail

Adding 3rd server, who calls box.ctl.promote before test starts fixes flack in the test, but there might be a problem with possibility of two simultaneous limbo owners.

Closes #6033

@grafin grafin requested a review from sergepetrenko December 29, 2021 05:22
@github-actions
Copy link

Check coverity results on coverity.com site
Coverity Status

Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Good job on covering all the corner cases!

The tests are fine, my comments are mostly about the test setup.

@grafin
Copy link
Member Author

grafin commented Jan 2, 2022

Thanks for your comments and proposed fixes. I've fixed everything you mentioned. Strangely enough test_fail_limbo_acked_promote didn't flack for me at all before your patch.

@grafin grafin requested a review from sergepetrenko January 2, 2022 17:35
Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes!

It's much easier to see what's happening when the before_ helpers are near the test they prepare.

Please, see my remaining comments and squash everything into a single commit once you're done.

@grafin grafin requested a review from sergepetrenko January 12, 2022 09:15
Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes!
LGTM.
Please, ask @Gerold103 to review this once you're ready.

@Gerold103 Gerold103 self-requested a review January 12, 2022 21:24
@grafin grafin self-assigned this Jan 17, 2022
@grafin
Copy link
Member Author

grafin commented Jan 17, 2022

@sergepetrenko I've commited some minor changes, take a look pls.

@sergepetrenko sergepetrenko self-requested a review January 18, 2022 11:57
@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch from 419de49 to 2ddab2f Compare January 19, 2022 06:08
@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch 3 times, most recently from 97b53fb to 5923bcc Compare January 24, 2022 10:54
@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch 2 times, most recently from 4c200ef to 69869cc Compare February 9, 2022 20:45
@grafin
Copy link
Member Author

grafin commented Feb 9, 2022

@sergepetrenko, @Gerold103 thanks for your replies, I've completely overhauled all the tests, please take a look.

Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for working on this!

There are some set-up related comments left. Please, address them.

@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch 2 times, most recently from fca1b5a to 477599a Compare February 16, 2022 10:25
@grafin grafin requested a review from sergepetrenko February 16, 2022 11:06
@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch from 477599a to 50ead38 Compare February 17, 2022 09:04
@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch from 8c4f42c to 9447818 Compare May 31, 2022 19:13
@grafin
Copy link
Member Author

grafin commented May 31, 2022

@Gerold103, @sergepetrenko - thanks for indepth review! I rewrote those test almost from scratch (taking into account all previous comments). Take a look, when you have time.

@grafin grafin requested review from Gerold103 and sergepetrenko May 31, 2022 19:37
@grafin grafin assigned sergepetrenko and unassigned grafin and sergepetrenko May 31, 2022
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!

@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch from 9447818 to 7af31d1 Compare June 15, 2022 13:20
@grafin
Copy link
Member Author

grafin commented Jun 15, 2022

@Gerold103 Thanks for review! I've fixed/commented all found issues. Take a look, when you have time to.

@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch from 7af31d1 to 6836b87 Compare June 15, 2022 14:04
@Gerold103 Gerold103 self-requested a review June 15, 2022 20:50
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! Some important places are covered. But I found a couple of new ones:

diff --git a/src/box/box.cc b/src/box/box.cc
index 881932727..e7b5b853e 100644
--- a/src/box/box.cc
+++ b/src/box/box.cc
@@ -1975,9 +1975,12 @@ box_promote_qsync(void)
 	if (txn_limbo_replica_term(&txn_limbo, instance_id) == raft->term)
 		return 0;
 	int64_t wait_lsn = box_wait_limbo_acked(TIMEOUT_INFINITY);
-	if (wait_lsn < 0)
+	if (wait_lsn < 0) {
+		assert(false);
 		return -1;
+	}
 	if (raft->state != RAFT_STATE_LEADER) {
+		assert(false);
 		diag_set(ClientError, ER_NOT_LEADER, raft->leader);
 		return -1;
 	}

with these the tests pass. Can they be covered in this patch too?

Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!
We are close to the finish line, I hope.

@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch 3 times, most recently from a730937 to 6b6f1df Compare June 20, 2022 14:16
@grafin
Copy link
Member Author

grafin commented Jun 22, 2022

I've fixed test that was flaky on @sergepetrenko's machine and fixed several test, that broke after split-brain detection was introduced. @Gerold103 As for box_promote_qsync coverage - I didn't find an easy way to test those lines. I can create new issues for testing box_promote_qsync and box_wait_limbo_acked.

Copy link
Collaborator

@sergepetrenko sergepetrenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes!
Good job on going through all the reviews.
LGTM.

@sergepetrenko sergepetrenko requested a review from Gerold103 June 27, 2022 09:18
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!

As for box_promote_qsync coverage - I didn't find an easy way to test those lines. I can create new issues for testing box_promote_qsync and box_wait_limbo_acked.

yes, please.

@grafin
Copy link
Member Author

grafin commented Jun 28, 2022

Follow-up issues:
box_promote_qsync coverage: #7317
box_wait_limbo_acked coverage: #7318

@grafin grafin added the full-ci Enables all tests for a pull request label Jun 30, 2022
Covered most of box_promote and box_demote with tests:
1. Promote/demote unconfigured box
2. Promoting current leader with elections on and off
3. Demoting follower with elections on and off
4. Promoting current leader, but not limbo owner with elections on
5. Demoting current leader with elections on and off
6. Simultaneous promote/demote
7. Promoting voter
8. Interfering promote/demote while writing new term to wal
9. Interfering promote/demote while waiting for synchro queue
   to be emptied
10. Interfering promote while waiting for limbo to be acked
    (similar to replication/gh-5430-qsync-promote-crash.test.lua)

Closes #6033

NO_DOC=testing stuff
NO_CHANGELOG=testing stuff
@grafin grafin force-pushed the grafin/gh-6033-box-promote-demote-tests branch from 6b6f1df to a2ff951 Compare June 30, 2022 11:45
@kyukhin kyukhin merged commit 5a8dca7 into master Jun 30, 2022
@kyukhin kyukhin deleted the grafin/gh-6033-box-promote-demote-tests branch June 30, 2022 11:49
@kyukhin
Copy link
Contributor

kyukhin commented Jun 30, 2022

Backported to 2.10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full-ci Enables all tests for a pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

box.ctl.promote() code in some simple places is not tested at all

4 participants