mon/MonmapMonitor: do not propose on error in prepare_update by batrick · Pull Request #50503 · ceph/ceph

batrick · 2023-03-13T17:14:52Z

Also: correct a serious protocol error where the monitor would reply to a command before the proposal is committed.

Fixes: https://tracker.ceph.com/issues/58974

ajarr

I think the commit could be split into three separate commits that solve separate issues.

* MonMapMonitor: do not propose changes on error in prepare_update/prepare_command
* MonMapMonitor: commit proposed map changes initiated by a `ceph mon` command before replying
  (This seems like a more important fix than the one above. It changes the monitor's handling of `ceph mon` commands that has existed since #6854)
* MonMapMonitor: do not propose changes when `mon enable msgr-2` command is called on monitors that already have msgr2 enabled

src/mon/MonmapMonitor.cc

batrick · 2023-03-28T14:17:20Z

> I think the commit could be split into three separate commits that solve separate issues.
>
> * MonMapMonitor: do not propose changes on error in prepare_update/prepare_command
>
> * MonMapMonitor: commit proposed map changes initiated by a `ceph mon` command before replying
>   (This seems like a more important fix than the one above. It changes the monitor's handling of `ceph mon` commands that has existed since #6854)

I've broken this out into a separate commit, but its changes end up modified again as part of "mon/MonmapMonitor: do not propose on error in prepare_update".

> * MonMapMonitor: do not propose changes when `mon enable msgr-2` command is called on monitors that already have msgr2 enabled

I restructured the code, but I do not think it proposed when `mon enable msgr-2` was a no-op. So I don't think this separate commit makes sense?

ajarr

No longer need "Also: correct a serious protocol error where the monitor would reply to a command before the proposal is committed" in the commit message of 5440634?

ajarr · 2023-03-29T15:10:03Z

> * MonMapMonitor: do not propose changes when `mon enable msgr-2` command is called on monitors that already have msgr2 enabled
>
> I restructured the code but I do not think it proposed when mon enable msgr-2 was a no-op. So I don't think this separate commit makes sense?

Yes, makes sense.

batrick · 2023-03-29T15:12:24Z

> No longer need "Also: correct a serious protocol error where the monitor would reply to a command before the proposal is committed" in the commit message of 5440634?

Good point. Fixed.

batrick · 2023-06-22T18:31:46Z

@yuriw please include in the next RADOS run.

batrick · 2023-06-22T18:31:53Z

jenkins test make check arm64

ljflores · 2023-07-12T18:23:29Z

Hey @batrick, I'm seeing some failures that look related:

Slow ops from a mon command that was changed in this PR
/a/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324842

2023-07-03T20:55:47.552 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:55:47.550+0000 7f838f917700 -1 mon.b@1(electing) e5685 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:55:52.552 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:55:52.550+0000 7f838f917700 -1 mon.b@1(electing) e5686 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:55:57.552 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:55:57.551+0000 7f838f917700 -1 mon.b@1(electing) e5687 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:56:02.553 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:56:02.550+0000 7f838f917700 -1 mon.b@1(electing) e5689 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:56:07.553 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:56:07.551+0000 7f838f917700 -1 mon.b@1(electing) e5690 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:56:12.266 INFO:tasks.workunit.client.0.smithi123.stderr://home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1: test_mon_mon:  rm -fr /tmp/cephtool.svX

Here was the test link:
http://pulpito.front.sepia.ceph.com/?branch=wip-yuri-testing-2023-06-23-0831

And here are a few more jobs that were affected:
http://pulpito.front.sepia.ceph.com/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324842/
http://pulpito.front.sepia.ceph.com/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324846/
http://pulpito.front.sepia.ceph.com/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324850/
http://pulpito.front.sepia.ceph.com/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324858/

ljflores · 2023-07-12T18:32:33Z

Go ahead and re-add "needs-qa" when it's ready for testing again!

batrick · 2023-07-13T19:46:14Z

@ljflores this is ready for QA again but please don't QA with Ilya's PR again. 😆

batrick · 2023-07-13T19:47:52Z

@rzarzynski are you on the hook for reviewing monitor PRs? This one has unfortunately gotten more complex. See the commits "mon: add context list for commit wait" and "mon: use wait_for_commit to reply". The reason for those two commits is to support "mon/MonmapMonitor: wait for commit before reply".

batrick · 2023-07-14T20:54:35Z

So another added benefit I didn't plan for is that the commits mentioned in #50503 (comment) make the replies from the monitors significantly faster. For example, on a humble vstart cluster:

2023-07-13T15:37:17.196-0400 7fe7e26ea700  0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "osd pool create", "pool": "cephfs.b.meta"} v 0) v1
...
2023-07-13T15:37:17.278-0400 7fe7dfee5700 10 mon.a@0(leader) e1 refresh_from_paxos
2023-07-13T15:37:17.278-0400 7fe7dfee5700  0 log_channel(audit) log [INF] : from='mgr.4127 ' entity='mgr.x' cmd='[{"prefix": "osd pool create", "pool": "cephfs.b.meta"}]': finished
2023-07-13T15:37:17.278-0400 7fe7dfee5700  1 -- [v2:192.168.230.102:40693/0,v1:192.168.230.102:40694/0] --> [v2:192.168.230.102:40693/0,v1:192.168.230.102:40694/0] -- log(1 entries from seq 268 at 2023-07-13T15:37:17.279871-0400) v1 -- 0x5597f6e288c0 con 0x5597f1530c00
2023-07-13T15:37:17.278-0400 7fe7dfee5700  2 mon.a@0(leader) e1 send_reply 0x5597f6e1f320 0x5597f69baea0 mon_command_ack([{"prefix": "osd pool create", "pool": "cephfs.b.meta"}]=0 pool 'cephfs.b.meta' created v83) v1
...
2023-07-13T15:37:17.286-0400 7fe7dfee5700 10 mon.a@0(leader).osd e83 create_pending e 84
2023-07-13T15:37:17.286-0400 7fe7dfee5700 10 mon.a@0(leader).osd e83 update_logger

RTT is 82ms instead of what would normally be ~90ms. That's an ~9% improvement. Note replies are currently only sent after pending is created.
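The arithmetic behind those figures can be rechecked from the log timestamps quoted above. The following is a minimal sketch, not part of the PR: the helper name `elapsed_ms` is hypothetical, and using the `create_pending` timestamp as a stand-in for the old reply time is an assumption based on the remark that replies were previously sent only after pending is created.

```python
from datetime import datetime

# Timestamp layout used in the quoted monitor log lines.
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"


def elapsed_ms(start: str, end: str) -> int:
    """Return the elapsed time between two log timestamps, in whole milliseconds."""
    t0 = datetime.strptime(start, FMT)
    t1 = datetime.strptime(end, FMT)
    return round((t1 - t0).total_seconds() * 1000)


# Timestamps copied from the log excerpt above.
handle_command = "2023-07-13T15:37:17.196-0400"
send_reply = "2023-07-13T15:37:17.278-0400"
# Assumption: create_pending approximates when the reply used to be sent.
create_pending = "2023-07-13T15:37:17.286-0400"

rtt_new = elapsed_ms(handle_command, send_reply)
rtt_old = elapsed_ms(handle_command, create_pending)
improvement = 100 * (rtt_old - rtt_new) / rtt_old

print(f"new RTT {rtt_new} ms, old RTT {rtt_old} ms, improvement {improvement:.1f}%")
```

This is a single sample from a vstart cluster, so it only confirms the arithmetic, not the general size of the effect.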

It can be hard to benchmark this because mon command routing can add latency. I have a test which creates a single RADOS session but I haven't yet figured out how to force it to connect to the leader for the benchmark.

Once PaxosService::refresh is called, the commit is final and will not be rolled back. We should immediately send out any replies depending on that event. This avoids a problem where PaxosService::update_from_paxos indicates the mons need to re-bootstrap (because of a monmap change) and these pending reply contexts are dropped in PaxosService::restart.

The cherry on top for this change is that mon commands enjoy a shorter RTT latency as replies are sent out faster. Using the new bench_commit.py script:

main branch:

    min/max/mean/stddev: 0.015813/0.613919/0.031615/0.024087
    min/max/mean/stddev: 0.015737/0.255008/0.031492/0.017930
    min/max/mean/stddev: 0.014242/0.205763/0.031969/0.018022
    min/max/mean/stddev: 0.014172/0.270256/0.032070/0.021079
    min/max/mean/stddev: 0.017767/0.471187/0.032751/0.025317

this commit:

    min/max/mean/stddev: 0.010476/0.158475/0.026324/0.013662
    min/max/mean/stddev: 0.016866/0.099403/0.027938/0.010508
    min/max/mean/stddev: 0.014013/0.127512/0.026847/0.010340
    min/max/mean/stddev: 0.013098/0.172725/0.028979/0.012998
    min/max/mean/stddev: 0.016934/0.292218/0.029252/0.014904

About a 10-20% reduction in latency.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
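To sanity-check the quoted reduction, the per-run mean values from the commit message can be averaged and compared. This is only a sketch: the variable names are hypothetical, and treating the five runs per branch as equally weighted is an assumption, since the message does not say how many commands each run issued.

```python
from statistics import mean

# Per-run "mean" fields copied from the benchmark output in the commit message.
main_means = [0.031615, 0.031492, 0.031969, 0.032070, 0.032751]
commit_means = [0.026324, 0.027938, 0.026847, 0.028979, 0.029252]

# Assumption: every run carries equal weight.
main_avg = mean(main_means)
commit_avg = mean(commit_means)
reduction = 100 * (main_avg - commit_avg) / main_avg

print(f"main: {main_avg:.6f}  this commit: {commit_avg:.6f}  reduction: {reduction:.1f}%")
```

Under that assumption the mean latency drops by roughly 13%, which falls inside the 10-20% range stated in the commit message.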

batrick · 2023-07-21T14:14:03Z

rebased + commit message changes

batrick · 2023-07-31T16:45:42Z

jenkins test api

ljflores · 2023-07-31T18:26:56Z

jenkins test api

ljflores

Looks good, I like the performance analysis you added!

rzarzynski

LGTM.

rzarzynski · 2023-08-01T09:34:23Z

src/mon/MonClient.cc

  _start_hunting();

+  if (rank == -1) {
+    rank = cct->_conf.get_val<int64_t>("mon_client_target_rank");


rzarzynski · 2023-08-01T09:47:15Z

src/mon/MonmapMonitor.cc

-  // we are returning to the user; do not propose.
+  if (propose) {
+    wait_for_commit(op, new Monitor::C_Command(mon, op, err, rs, get_last_committed() + 1));
+  } else {
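
The branch in this hunk can be illustrated with a toy model. It is not Ceph code: `ToyPaxosService` and its methods are hypothetical names, and the model only mirrors the idea that a command which proposes a change is answered once the next version commits, while a no-op command is answered right away.

```python
class ToyPaxosService:
    """Toy stand-in for a monitor service; all names here are hypothetical."""

    def __init__(self):
        self.last_committed = 0
        self.commit_waiters = []  # callbacks to run once the next commit lands
        self.replies = []         # (command, version) pairs sent to clients

    def wait_for_commit(self, callback):
        # Queue the reply instead of sending it right away.
        self.commit_waiters.append(callback)

    def handle_command(self, name, proposes_change):
        if proposes_change:
            # The reply carries the version the change is expected to commit as.
            version = self.last_committed + 1
            self.wait_for_commit(lambda: self.replies.append((name, version)))
        else:
            # Nothing to commit, so answer immediately with the current version.
            self.replies.append((name, self.last_committed))

    def commit(self):
        # The commit is final: bump the version, then flush queued replies.
        self.last_committed += 1
        waiters, self.commit_waiters = self.commit_waiters, []
        for callback in waiters:
            callback()


svc = ToyPaxosService()
svc.handle_command("set election_strategy (unchanged)", proposes_change=False)
svc.handle_command("mon add", proposes_change=True)
print(svc.replies)  # only the no-op command has been answered so far
svc.commit()
print(svc.replies)  # the "mon add" reply appears after the commit
```

The model leaves out elections, restarts, and failed proposals, all of which the real monitor has to handle.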


rzarzynski · 2023-08-01T09:49:11Z

src/mon/MonmapMonitor.cc

    }
+    if (strategy == pending_map.strategy) {
+      err = 0;
+      goto reply_no_propose;


rzarzynski · 2023-08-01T10:10:31Z

~~Adding needs-qa~~. When it comes to backporting, I would prefer to bake these changes in main for a while.

rishabh-d-dave · 2023-09-11T14:18:42Z

@batrick If you haven't started testing this already, I can put it through testing in the run I am starting today/tomorrow.

batrick · 2023-09-11T15:20:38Z

> @batrick If you haven't started testing this already, I can put it through testing in the run I am starting today/tomorrow.

I'll handle it, thanks @rishabh-d-dave

* refs/pull/50503/head:
  mon: do not change pending if strategy is unchanged
  mon/MonmapMonitor: do not propose on error in prepare_update
  mon/MonmapMonitor: wait for commit before reply
  mon: use wait_for_commit to reply
  mon: add context list for commit wait
  mon: remove unused method
  test/mon: add commit benchmark script
  mon/MonClient: provide config to target specific rank

batrick · 2023-09-15T16:10:44Z

Nothing out of the ordinary from this PR: https://tracker.ceph.com/projects/cephfs/wiki/Main#2023-Sep-12

I think we can merge this unless @ljflores @rzarzynski you would like more rados suite testing.

ljflores · 2023-09-21T15:39:06Z

Apologies @batrick, this was tested and rados-approved: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrellocomcpMEWaauy1825-wip-yuri3-testing-2023-08-15-0955

I missed adding a comment here.

batrick · 2023-09-21T15:51:35Z

> Apologies @batrick, this was tested and rados-approved: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrellocomcpMEWaauy1825-wip-yuri3-testing-2023-08-15-0955
>
> I missed adding a comment here.

Great, thanks Laura!

batrick added core mon labels Mar 13, 2023

batrick requested a review from a team as a code owner March 13, 2023 17:14

ajarr reviewed Mar 23, 2023

batrick force-pushed the i58974 branch from 3f5889b to 2445913 Compare March 28, 2023 14:11

batrick force-pushed the i58974 branch from 2445913 to 5440634 Compare March 28, 2023 15:04

ajarr approved these changes Mar 29, 2023

batrick force-pushed the i58974 branch 2 times, most recently from c9b7c07 to d91c42d Compare March 29, 2023 15:12

batrick force-pushed the i58974 branch from d91c42d to 2d5d542 Compare April 3, 2023 12:38

batrick added the needs-review label Apr 6, 2023

batrick added needs-qa wip-pdonnell-testing and removed needs-review labels Jun 22, 2023

ljflores added the wip-yuri-testing label Jun 23, 2023

ljflores removed needs-qa wip-yuri-testing labels Jul 12, 2023

idryomov mentioned this pull request Jul 12, 2023

mon/MonClient: resurrect original client_mount_timeout handling #52124

Merged

batrick force-pushed the i58974 branch from 2d5d542 to 1337926 Compare July 13, 2023 16:39

github-actions bot added the cephfs Ceph File System label Jul 13, 2023

batrick added 8 commits July 21, 2023 10:09

mon/MonClient: provide config to target specific rank

3480f96

This is useful for benchmarks particularly that require consistent rank choice (i.e. leader). Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

test/mon: add commit benchmark script

6580611

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

mon: remove unused method

fa7d15b

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

mon: add context list for commit wait

dc8c321

This will replace many uses of "wait_for_finished_proposal" where a reply is simply waiting for pending to commit. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

mon/MonmapMonitor: wait for commit before reply

99e9e59

If the monmap is changed, do not reply to command until committed! Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

mon/MonmapMonitor: do not propose on error in prepare_update

c0f3695

Fixes: https://tracker.ceph.com/issues/58974 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

mon: do not change pending if strategy is unchanged

27f1021

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>

batrick force-pushed the i58974 branch from 33b2c23 to 27f1021 Compare July 21, 2023 14:13

ljflores approved these changes Jul 31, 2023

ljflores requested a review from rzarzynski July 31, 2023 18:53

ljflores added the performance label Jul 31, 2023

rzarzynski reviewed Aug 1, 2023

rzarzynski removed the needs-review label Aug 1, 2023

ljflores added the wip-yuri3-testing label Aug 15, 2023

batrick added the wip-pdonnell-testing label Sep 9, 2023

batrick merged commit ab3e5ba into ceph:main Sep 21, 2023

batrick deleted the i58974 branch September 21, 2023 15:52

This was referenced Mar 22, 2024

reef: mon/MonmapMonitor: do not propose on error in prepare_update #56400

Merged

quincy: mon/MonmapMonitor: do not propose on error in prepare_update #56401

Closed
