Skip to content

ceph-monstore-tool: use a large enough paxos/{first,last}_committed#27465

Merged
tchaikov merged 6 commits intoceph:masterfrom
tchaikov:wip-38219
Jun 16, 2021
Merged

ceph-monstore-tool: use a large enough paxos/{first,last}_committed#27465
tchaikov merged 6 commits intoceph:masterfrom
tchaikov:wip-38219

Conversation

@tchaikov
Copy link
Contributor

@tchaikov tchaikov commented Apr 9, 2019

set monitor/cluster_fingerprint for the newly created monstore,
otherwise, the leader will create a new paxos proposal and there is
chance that the quorum will accept it.

when the quorum is recovering, the leader will collect the paxos
transactions from peons. if the quorum accept the proposal for setting
the fingerprint, the peon will update the monitor with the paxos
transaction with a newer "last_committed" than the one created using
update_paxos() in ceph_monstore_tool.cc. the latter "last_committed" is
always 0.

so, to avoid this extra paxos proposal obsoleting the "rebuilding" paxos
transaction, we set "monitor/cluster_fingerprint" when rebuilding the
monstore.

Fixes: http://tracker.ceph.com/issues/38219
Signed-off-by: Kefu Chai kchai@redhat.com

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

@neha-ojha
Copy link
Member

retest this please

@tchaikov
Copy link
Contributor Author

tchaikov commented Apr 9, 2019

@tchaikov tchaikov changed the title ceph-monstore-tool: set monitor/cluster_fingerprint when rebuilding ceph-monstore-tool: use a large enough paxos/{first,last}_committed Apr 11, 2019
@tchaikov
Copy link
Contributor Author

@tchaikov
Copy link
Contributor Author

the tests still fail like a hell..

@stale
Copy link

stale bot commented Jun 11, 2019

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Jun 11, 2019
@tchaikov tchaikov self-assigned this Jun 21, 2019
@stale stale bot removed the stale label Jun 21, 2019
@stale
Copy link

stale bot commented Aug 20, 2019

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Aug 20, 2019
@jdurgin
Copy link
Member

jdurgin commented Aug 20, 2019

unstale

@stale stale bot removed the stale label Aug 20, 2019
@stale
Copy link

stale bot commented Oct 19, 2019

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Oct 19, 2019
@ideepika
Copy link
Member

ideepika commented Jun 8, 2021

@tchaikov would it be possible to get this PR in nautilus seeing this issue pops up every few months in teuthology: https://tracker.ceph.com/issues/38219

@ideepika
Copy link
Member

ideepika commented Jun 8, 2021

jenkins retest this please

@tchaikov
Copy link
Contributor Author

@ideepika will revisit this pr by the end of this week.

so the rebuild paxos transaction won't be overwritten by the ones
created before recovery completes.

when the quorum is recovering, the leader will collect the paxos
transactions from peons. if the quorum accept the proposal for setting
the fingerprint, the peon will update the monitor with the paxos
transaction with a newer "last_committed" than the one created using
update_paxos() in ceph_monstore_tool.cc. the latter "last_committed" is
always 0.

so, to avoid this extra paxos proposal obsoleting the "rebuilding" paxos
transaction, we use a large enough number for {first,last}_committed.

Fixes: http://tracker.ceph.com/issues/38219
Signed-off-by: Kefu Chai <kchai@redhat.com>
@tchaikov
Copy link
Contributor Author

changelog

  • rebased against master

@tchaikov
Copy link
Contributor Author

tchaikov added 5 commits June 10, 2021 20:29
for better reading experience.

Signed-off-by: Kefu Chai <kchai@redhat.com>
for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
more consistent this way.

Signed-off-by: Kefu Chai <kchai@redhat.com>
for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
mon_tick_interval is 5 seconds by default. monitors update their
rotating keys every mon_tick_interval. before monitors forms a
quorum, the auth requests from clients are put into the wait list.
these requests are re-enqueued once the monitors form a quorum. but
there is a small window of mon_tick_interval, before they are able
to serve the auth requests even after their claim to be able to
server requests. if these re-enqueued requests happen to be served
in this window, and if authx is enabled, they will be greeted with
errors like

handle_auth_bad_method server allowed_methods [2] but i only support [2]

in the case of ceph cli, the error would look like:

[errno 13] RADOS permission denied (error connecting to the cluster)

so, to address this issue, the EACCES error is ignored when waiting
for a quorum.

Signed-off-by: Kefu Chai <kchai@redhat.com>
@ideepika
Copy link
Member

@ideepika will revisit this pr by the end of this week.

@tchaikov sure thanks, checked with Josh, it might be okay to skip backport to nautilus.

@tchaikov
Copy link
Contributor Author

tchaikov commented Jun 10, 2021

in the failed tests, monitors were able to form quorum. but there were two categories of failures:

  1. osd failed to authorize itself, like https://pulpito.ceph.com/kchai-2021-06-10_15:05:32-rados:singleton-wip-38219-kefu-distro-basic-smithi/6164944/
2021-06-10T15:27:09.266+0000 7f65183d9700  1 --2- 172.21.15.95:0/15418 >> [v2:172.21.15.95:6814/15628,v1:172.21.15.95:6815/15628] conn(0x55bddefa5800 0x55bdde53c300 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=1 rev1=1 rx=0 tx=0).handle_auth_bad_method method=2 result (13)
Permission denied, allowed methods=[2], allowed modes=[1,2]
  1. osd failed to boot -- cannot even find the preambles like messages from rocksdb. see https://pulpito.ceph.com/kchai-2021-06-11_06:23:05-rados:singleton-wip-38219-kefu-2-distro-basic-smithi/6166231/

but i still think this changeset is an improvement. as it addresses two issues:

  1. use a large-enough paxos/{first,last}_committed
  2. tolerate the EACCES in the 5-second window.

@neha-ojha neha-ojha removed the DNM label Jun 10, 2021
@tchaikov
Copy link
Contributor Author

changelog

  • use bash lexer to render bash code in doc/rados/troubleshooting
  • add a couple cleanups using the make_scope_guard() helper
  • tolerate the EACCES in the 5-second window after the quorum is formed

@neha-ojha could you take another look?

@tchaikov tchaikov requested a review from neha-ojha June 11, 2021 09:02
@tchaikov tchaikov removed their assignment Jun 11, 2021
@tchaikov tchaikov requested a review from jdurgin June 11, 2021 09:28
@neha-ojha
Copy link
Member

changelog

  • use bash lexer to render bash code in doc/rados/troubleshooting
  • add a couple cleanups using the make_scope_guard() helper
  • tolerate the EACCES in the 5-second window after the quorum is formed

@neha-ojha could you take another look?

@tchaikov sure, will take a look at it tomorrow

Copy link
Member

@neha-ojha neha-ojha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tchaikov a couple of questions 1. do you understand why https://tracker.ceph.com/issues/38219 is not seen in releases after nautilus? 2. how far should we backport this patch? overall, this change makes sense to me

@tchaikov
Copy link
Contributor Author

@tchaikov a couple of questions

  1. do you understand why https://tracker.ceph.com/issues/38219 is not seen in releases after nautilus?

@neha-ojha , thank you for reviewing the changes. yes, the frequency of the failures is much lower in recent releases. but it still surfaces occasionally. i checked mon/Monitor.cc, seems there are no changes after nautilus in the "fingerprint" related logic. so i am not able to explain why it's less reproducible after nautilus =(

  1. how far should we backport this patch? overall, this change makes sense to me

as far as possible, i'd say. even back to nautilus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants