Skip to content

test: monitor thrasher wait until quorum#51570

Merged
yuriw merged 1 commit intoceph:mainfrom
NitzanMordhai:wip-nitzan-test-mon-thrasher-quorum-delay-inc
May 25, 2023
Merged

test: monitor thrasher wait until quorum#51570
yuriw merged 1 commit intoceph:mainfrom
NitzanMordhai:wip-nitzan-test-mon-thrasher-quorum-delay-inc

Conversation

@NitzanMordhai
Copy link
Contributor

@NitzanMordhai NitzanMordhai commented May 18, 2023

With 1 sec. delay we may sometimes fail to get correct length of quorum since the monitor didn't updated on time.
With the following fix, we will wait for quorum and check every few seconds (3) until timeout (30).

Fixes: https://tracker.ceph.com/issues/52316

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@NitzanMordhai NitzanMordhai requested a review from a team as a code owner May 18, 2023 13:41
@github-actions github-actions bot added the core label May 18, 2023
@kamoltat
Copy link
Member

@NitzanMordhai instead of increasing the thrash_delay, is there a way we can do this without relying on just waiting 5 seconds and do 1 check? I wonder if we can have something like wait_until_quorum(3, 30) where we check the quorum every like 5 seconds for quorum size of 3, but the time limit is 30s total so you get 6 tries, but if you hit 3 in the first try then we move on.

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-test-mon-thrasher-quorum-delay-inc branch from 2acfa40 to d3649c2 Compare May 21, 2023 06:13
@github-actions github-actions bot added the tests label May 21, 2023
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-test-mon-thrasher-quorum-delay-inc branch 4 times, most recently from a3b6299 to a1a4ecb Compare May 21, 2023 11:07
@NitzanMordhai
Copy link
Contributor Author

@NitzanMordhai instead of increasing the thrash_delay, is there a way we can do this without relying on just waiting 5 seconds and do 1 check? I wonder if we can have something like wait_until_quorum(3, 30) where we check the quorum every like 5 seconds for quorum size of 3, but the time limit is 30s total so you get 6 tries, but if you hit 3 in the first try then we move on.

@kamoltat good idea, my code updated, please take a look

@NitzanMordhai
Copy link
Contributor Author

I ran the failing test 50 times with the new code, no new occurrence happened
http://pulpito.front.sepia.ceph.com/nmordech-2023-05-21_12:48:41-rados:monthrash-main-distro-default-smithi/

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-test-mon-thrasher-quorum-delay-inc branch from a1a4ecb to 678abb2 Compare May 22, 2023 04:41
@NitzanMordhai
Copy link
Contributor Author

jenkins test make check

Copy link
Member

@kamoltat kamoltat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling

other than that LGTM

break
self.log("quorum is size %d" % len(s['quorum']))

self.log("finel quorum is size %d" % len(s['quorum']))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kamoltat
Copy link
Member

Also, minor thing, please change the PR title and commit message to reflect the change you made, currently it is still the old commit message where we are checking just once with 5 seconds. Thanks!

@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-test-mon-thrasher-quorum-delay-inc branch from 678abb2 to 8bac454 Compare May 22, 2023 13:47
@NitzanMordhai NitzanMordhai requested a review from kamoltat May 22, 2023 13:48
@NitzanMordhai NitzanMordhai changed the title test: monitor thrasher increase thrash_delay test: monitor thrasher wait until quorum May 22, 2023
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-test-mon-thrasher-quorum-delay-inc branch from 8bac454 to c2ca265 Compare May 22, 2023 13:53
With 1 sec. delay we may sometimes fail to get correct length of
quorum since the monitor didn't updated on time.
With the following fix, we will wait for quorum and check every few
seconds (3) until timeout (30).

Fixes: https://tracker.ceph.com/issues/52316
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-test-mon-thrasher-quorum-delay-inc branch from c2ca265 to fbd10ba Compare May 22, 2023 13:54
Copy link
Member

@kamoltat kamoltat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ljflores
Copy link
Member

@yuriw yuriw merged commit e513690 into ceph:main May 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants