ceph-monstore-tool: use a large enough paxos/{first,last}_committed by tchaikov · Pull Request #27465 · ceph/ceph

tchaikov · 2019-04-09T14:35:43Z

set monitor/cluster_fingerprint for the newly created monstore.
otherwise, the leader will create a new paxos proposal, and there is
a chance that the quorum will accept it.

when the quorum is recovering, the leader collects the paxos
transactions from the peons. if the quorum accepts the proposal for
setting the fingerprint, the peon will update the monitor with a paxos
transaction that has a newer "last_committed" than the one created by
update_paxos() in ceph_monstore_tool.cc. the latter "last_committed" is
always 0.

so, to avoid this extra paxos proposal obsoleting the "rebuilding" paxos
transaction, we set "monitor/cluster_fingerprint" when rebuilding the
monstore.

Fixes: http://tracker.ceph.com/issues/38219
Signed-off-by: Kefu Chai <kchai@redhat.com>

References tracker ticket
Updates documentation if necessary
Includes tests for new functionality or reproducer for bug

neha-ojha · 2019-04-09T16:22:31Z

retest this please

tchaikov · 2019-04-09T16:43:50Z

http://pulpito.ceph.com/kchai-2019-04-09_16:42:22-rados-wip-38219-distro-basic-smithi/

tchaikov · 2019-04-11T12:50:26Z

http://pulpito.ceph.com/kchai-2019-04-11_12:48:09-rados-wip-38219-distro-basic-mira/

tchaikov · 2019-04-11T14:13:04Z

the tests still fail like hell..

stale · 2019-06-11T12:51:33Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

stale · 2019-08-20T15:15:15Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.

jdurgin · 2019-08-20T20:24:03Z

unstale

stale · 2019-10-19T20:46:14Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.

ideepika · 2021-06-08T18:42:47Z

@tchaikov would it be possible to get this PR into nautilus, seeing as this issue pops up every few months in teuthology? https://tracker.ceph.com/issues/38219

ideepika · 2021-06-08T18:45:07Z

jenkins retest this please

tchaikov · 2021-06-10T01:11:47Z

@ideepika will revisit this pr by the end of this week.

so the rebuild paxos transaction won't be overwritten by the ones created before recovery completes.

when the quorum is recovering, the leader collects the paxos transactions from the peons. if the quorum accepts the proposal for setting the fingerprint, the peon will update the monitor with a paxos transaction that has a newer "last_committed" than the one created by update_paxos() in ceph_monstore_tool.cc. the latter "last_committed" is always 0.

so, to avoid this extra paxos proposal obsoleting the "rebuilding" paxos transaction, we use a large enough number for {first,last}_committed.

Fixes: http://tracker.ceph.com/issues/38219
Signed-off-by: Kefu Chai <kchai@redhat.com>
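The reasoning in this commit message can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it is not the monitor's recovery code, `PaxosState` and `newest()` are hypothetical names, and the value 1000 is an arbitrary stand-in for "large enough". The model assumes that, during recovery, the state with the highest "last_committed" wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaxosState:
    """Toy stand-in for the paxos/{first,last}_committed keys of a store."""
    first_committed: int
    last_committed: int
    payload: str


def newest(states):
    """Return the state with the highest last_committed.

    Simplified assumption: during recovery the most recent version wins.
    """
    return max(states, key=lambda s: s.last_committed)


# Rebuilt store as described in the thread: last_committed is 0.
rebuilt_old = PaxosState(0, 0, "rebuilt")
# An extra proposal (for example, setting the fingerprint) accepted
# before recovery completes.
extra = PaxosState(1, 1, "fingerprint proposal")
# Hypothetical "large enough" version, chosen only for illustration.
rebuilt_new = PaxosState(1000, 1000, "rebuilt")

print(newest([rebuilt_old, extra]).payload)
print(newest([rebuilt_new, extra]).payload)
```

In this model the first call selects the extra proposal, which obsoletes the rebuilt transaction, while the second call keeps the rebuilt transaction because its version is larger.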

tchaikov · 2021-06-10T02:46:05Z

changelog

rebased against master

tchaikov · 2021-06-10T04:51:44Z

being tested at https://pulpito.ceph.com/kchai-2021-06-10_04:50:42-rados:singleton-wip-38219-kefu-distro-basic-smithi/

mon_tick_interval is 5 seconds by default, and monitors update their rotating keys every mon_tick_interval. before the monitors form a quorum, auth requests from clients are put on the wait list. these requests are re-enqueued once the monitors form a quorum. but there is a small window of mon_tick_interval in which the monitors are not yet able to serve auth requests, even though they claim to be able to serve requests.

if these re-enqueued requests happen to be served in this window, and if cephx is enabled, they will be greeted with errors like

    handle_auth_bad_method server allowed_methods [2] but i only support [2]

in the case of the ceph cli, the error would look like:

    [errno 13] RADOS permission denied (error connecting to the cluster)

so, to address this issue, the EACCES error is ignored when waiting for a quorum.

Signed-off-by: Kefu Chai <kchai@redhat.com>
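A minimal sketch of the idea described above, assuming a polling loop around a quorum query. This is not the code from tasks/ceph_manager: `wait_for_quorum`, `get_quorum_size`, and `make_fake_monitor` are hypothetical names used only for illustration. The loop ignores EACCES while waiting and lets every other error propagate.

```python
import errno
import time


def wait_for_quorum(get_quorum_size, expected, tries=10, sleep=1.0):
    """Poll until the quorum has `expected` members.

    `get_quorum_size` is a hypothetical callable that returns the current
    quorum size, or raises OSError with errno EACCES when the auth
    request is rejected.
    """
    for _ in range(tries):
        try:
            if get_quorum_size() == expected:
                return True
        except OSError as e:
            # Tolerate only EACCES; any other error is still fatal.
            if e.errno != errno.EACCES:
                raise
        time.sleep(sleep)
    return False


def make_fake_monitor():
    """Fake monitor: one EACCES, then a partial quorum, then a full one."""
    responses = iter([PermissionError(errno.EACCES, "permission denied"), 2, 3])

    def get_quorum_size():
        r = next(responses)
        if isinstance(r, Exception):
            raise r
        return r

    return get_quorum_size


print(wait_for_quorum(make_fake_monitor(), expected=3, sleep=0))
```

With the fake monitor, the first poll is rejected with EACCES and skipped, the second sees a partial quorum, and the third succeeds.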

ideepika · 2021-06-10T13:10:26Z

> @ideepika will revisit this pr by the end of this week.

@tchaikov sure, thanks. checked with Josh; it might be okay to skip the backport to nautilus.

tchaikov · 2021-06-10T15:06:42Z

in the failed tests, the monitors were able to form a quorum. but there were two categories of failures:

1. osd failed to authorize itself, like https://pulpito.ceph.com/kchai-2021-06-10_15:05:32-rados:singleton-wip-38219-kefu-distro-basic-smithi/6164944/

2021-06-10T15:27:09.266+0000 7f65183d9700  1 --2- 172.21.15.95:0/15418 >> [v2:172.21.15.95:6814/15628,v1:172.21.15.95:6815/15628] conn(0x55bddefa5800 0x55bdde53c300 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0).handle_auth_bad_method method=2 result (13) Permission denied, allowed methods=[2], allowed modes=[1,2]

2. osd failed to boot -- cannot even find preamble-like messages from rocksdb. see https://pulpito.ceph.com/kchai-2021-06-11_06:23:05-rados:singleton-wip-38219-kefu-2-distro-basic-smithi/6166231/

but i still think this changeset is an improvement, as it addresses two issues:

- use a large-enough paxos/{first,last}_committed
- tolerate the EACCES in the 5-second window.

tchaikov · 2021-06-11T09:02:20Z

changelog

- use bash lexer to render bash code in doc/rados/troubleshooting
- add a couple of cleanups using the make_scope_guard() helper
- tolerate the EACCES in the 5-second window after the quorum is formed

@neha-ojha could you take another look?

neha-ojha · 2021-06-15T00:19:07Z

> changelog
>
> - use bash lexer to render bash code in doc/rados/troubleshooting
> - add a couple of cleanups using the make_scope_guard() helper
> - tolerate the EACCES in the 5-second window after the quorum is formed
>
> @neha-ojha could you take another look?

@tchaikov sure, will take a look at it tomorrow

neha-ojha

@tchaikov a couple of questions:

1. do you understand why https://tracker.ceph.com/issues/38219 is not seen in releases after nautilus?
2. how far should we backport this patch?

overall, this change makes sense to me

tchaikov · 2021-06-16T01:38:26Z

> @tchaikov a couple of questions
>
> do you understand why https://tracker.ceph.com/issues/38219 is not seen in releases after nautilus?

@neha-ojha, thank you for reviewing the changes. yes, the frequency of the failures is much lower in recent releases, but it still surfaces occasionally. i checked mon/Monitor.cc, and it seems there are no changes after nautilus in the "fingerprint" related logic. so i am not able to explain why it's less reproducible after nautilus =(

> how far should we backport this patch? overall, this change makes sense to me

as far as possible, i'd say. even back to nautilus.

tchaikov added bug-fix tools labels Apr 9, 2019

tchaikov requested a review from neha-ojha April 9, 2019 14:35

tchaikov added the needs-qa label Apr 9, 2019

neha-ojha approved these changes Apr 9, 2019

tchaikov force-pushed the wip-38219 branch from dbfeb22 to 5ff08e5 Compare April 10, 2019 11:59

liewegas added the wip-sage-testing label Apr 10, 2019

tchaikov force-pushed the wip-38219 branch from 5ff08e5 to cd00661 Compare April 11, 2019 06:54

tchaikov changed the title ~~ceph-monstore-tool: set monitor/cluster_fingerprint when rebuilding~~ ceph-monstore-tool: use a large enough paxos/{first,last}_committed Apr 11, 2019

tchaikov added wip-kefu2-testing DNM labels Apr 11, 2019

liewegas removed wip-sage-testing needs-qa labels Apr 12, 2019

stale bot added the stale label Jun 11, 2019

tchaikov self-assigned this Jun 21, 2019

stale bot removed the stale label Jun 21, 2019

tchaikov removed the wip-kefu2-testing label Jun 29, 2019

stale bot added the stale label Aug 20, 2019

stale bot removed the stale label Aug 20, 2019

stale bot added the stale label Oct 19, 2019

tchaikov force-pushed the wip-38219 branch from cd00661 to 5475ef7 Compare June 10, 2021 02:44

tchaikov added 5 commits June 10, 2021 20:29

doc/rados/troubleshooting: highlight bash script with bash lexer

5d431ce

for better reading experience. Signed-off-by: Kefu Chai <kchai@redhat.com>

tools/ceph_monstore_tool: use make_scope_guard() for cleanup

99af14f

for better readability Signed-off-by: Kefu Chai <kchai@redhat.com>

tools/ceph_monstore_tool: s/BOOST_SCOPE_EXIT/make_scope_guard/

1a2976d

more consistent this way. Signed-off-by: Kefu Chai <kchai@redhat.com>

tasks/ceph_manager: use safe_while() to refactor the wait for quorum

3908c1f

for better readability Signed-off-by: Kefu Chai <kchai@redhat.com>

github-actions bot added core documentation tests labels Jun 10, 2021

neha-ojha removed the DNM label Jun 10, 2021

tchaikov requested a review from neha-ojha June 11, 2021 09:02

tchaikov removed their assignment Jun 11, 2021

tchaikov requested a review from jdurgin June 11, 2021 09:28

neha-ojha reviewed Jun 15, 2021

tchaikov merged commit 6f58a26 into ceph:master Jun 16, 2021

tchaikov deleted the wip-38219 branch June 16, 2021 01:38

tchaikov mentioned this pull request Jun 16, 2021

nautilus: ceph-monstore-tool: use a large enough paxos/{first,last}_committed #41874

Merged

tchaikov mentioned this pull request Jul 20, 2021

pacific: ceph-monstore-tool: use a large enough paxos/{first,last}_committed #42411

Merged

cfsnyder mentioned this pull request Sep 22, 2021

octopus: ceph-monstore-tool: use a large enough paxos/{first,last}_committed #43263

Merged
