Skip to content

mon/MonmapMonitor: do not propose on error in prepare_update#50503

Merged
batrick merged 8 commits intoceph:mainfrom
batrick:i58974
Sep 21, 2023
Merged

mon/MonmapMonitor: do not propose on error in prepare_update#50503
batrick merged 8 commits intoceph:mainfrom
batrick:i58974

Conversation

@batrick
Copy link
Member

@batrick batrick commented Mar 13, 2023

Also: correct a serious protocol error where the monitor would reply to a command before the proposal is committed.

Fixes: https://tracker.ceph.com/issues/58974

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@batrick batrick requested a review from a team as a code owner March 13, 2023 17:14
Copy link
Contributor

@ajarr ajarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the commit could be split into three separate commits that solve separate issues.

  • MonMapMonitor: do not propose changes on error in prepare_update/prepare_command

  • MonMapMonitor: commit proposed map changes initiated by a ceph mon command before replying
    (This seems like a more important fix than the one above. It changes the monitor's handling of ceph mon commands that has existed since #6854)

  • MonMapMonitor: do not propose changes when mon enable msgr-2 command is called on monitors that already have msgr2 enabled

@batrick
Copy link
Member Author

batrick commented Mar 28, 2023

I think the commit could be split into three separate commits that solve separate issues.

* MonMapMonitor: do not propose changes on error in prepare_update/prepare_command

* MonMapMonitor: commit proposed map changes initiated by a `ceph mon` command before replying
  (This seems like a more important fix than the one above. It changes the monitor's handling of `ceph mon` commands that has existed since #6854)

I've broken this out into a separate commit but its changes end up modified again as part of mon/MonmapMonitor: do not propose on error in prepare_update.

* MonMapMonitor: do not propose changes when `mon enable msgr-2` command is called on monitors that already have msgr2 enabled

I restructured the code but I do not think it proposed when mon enable msgr-2 was a no-op. So I don't think this separate commit makes sense?

Copy link
Contributor

@ajarr ajarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer need "Also: correct a serious protocol error where the monitor would reply to
a command before the proposal is committed" in the commit message of 5440634" ?

@ajarr
Copy link
Contributor

ajarr commented Mar 29, 2023

* MonMapMonitor: do not propose changes when `mon enable msgr-2` command is called on monitors that already have msgr2 enabled

I restructured the code but I do not think it proposed when mon enable msgr-2 was a no-op. So I don't think this separate commit makes sense?

Yes, makes sense.

@batrick batrick force-pushed the i58974 branch 2 times, most recently from c9b7c07 to d91c42d Compare March 29, 2023 15:12
@batrick
Copy link
Member Author

batrick commented Mar 29, 2023

No longer need "Also: correct a serious protocol error where the monitor would reply to a command before the proposal is committed" in the commit message of 5440634" ?

good point. Fixed

@batrick
Copy link
Member Author

batrick commented Jun 22, 2023

@yuriw please include in the next RADOS run.

@batrick
Copy link
Member Author

batrick commented Jun 22, 2023

jenkins test make check arm64

@ljflores
Copy link
Member

Hey @batrick, I'm seeing some failures that look related:

Slow ops from a mon command that was changed in this PR
/a/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324842

2023-07-03T20:55:47.552 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:55:47.550+0000 7f838f917700 -1 mon.b@1(electing) e5685 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:55:52.552 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:55:52.550+0000 7f838f917700 -1 mon.b@1(electing) e5686 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:55:57.552 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:55:57.551+0000 7f838f917700 -1 mon.b@1(electing) e5687 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:56:02.553 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:56:02.550+0000 7f838f917700 -1 mon.b@1(electing) e5689 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:56:07.553 INFO:tasks.ceph.mon.b.smithi123.stderr:2023-07-03T20:56:07.551+0000 7f838f917700 -1 mon.b@1(electing) e5690 get_health_metrics reporting 1 slow ops, oldest is mon_command({"prefix": "mon set election_strategy", "strategy": "disallow"} v 0)
2023-07-03T20:56:12.266 INFO:tasks.workunit.client.0.smithi123.stderr://home/ubuntu/cephtest/clone.client.0/qa/workunits/cephtool/test.sh:1: test_mon_mon:  rm -fr /tmp/cephtool.svX

Here was the test link:
http://pulpito.front.sepia.ceph.com/?branch=wip-yuri-testing-2023-06-23-0831

And here are a few more jobs that were affected:
http://pulpito.front.sepia.ceph.com/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324842/
http://pulpito.front.sepia.ceph.com/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324846/
http://pulpito.front.sepia.ceph.com/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324850/
http://pulpito.front.sepia.ceph.com/yuriw-2023-07-03_15:28:45-rados-wip-yuri-testing-2023-06-23-0831-distro-default-smithi/7324858/

@ljflores
Copy link
Member

Go ahead and re-add "needs-qa" when it's ready for testing again!

@batrick
Copy link
Member Author

batrick commented Jul 13, 2023

@ljflores this is ready for QA again but please don't QA with Ilya's PR again. 😆

@batrick
Copy link
Member Author

batrick commented Jul 13, 2023

@rzarzynski are you on the hook for reviewing monitor PRs? This one has unfortunately gotten more complex. See mon: add context list for commit wait , mon: use wait_for_commit to reply. The reason for those two commits is to support mon/MonmapMonitor: wait for commit before reply .

@batrick
Copy link
Member Author

batrick commented Jul 14, 2023

So another added benefit I didn't plan for is that the commits mentioned in #50503 (comment) make the replies from the monitors significantly faster. For example, on a humble vstart cluster:

2023-07-13T15:37:17.196-0400 7fe7e26ea700  0 mon.a@0(leader) e1 handle_command mon_command({"prefix": "osd pool create", "pool": "cephfs.b.meta"} v 0) v1
...
2023-07-13T15:37:17.278-0400 7fe7dfee5700 10 mon.a@0(leader) e1 refresh_from_paxos
2023-07-13T15:37:17.278-0400 7fe7dfee5700  0 log_channel(audit) log [INF] : from='mgr.4127 ' entity='mgr.x' cmd='[{"prefix": "osd pool create", "pool": "cephfs.b.meta"}]': finished
2023-07-13T15:37:17.278-0400 7fe7dfee5700  1 -- [v2:192.168.230.102:40693/0,v1:192.168.230.102:40694/0] --> [v2:192.168.230.102:40693/0,v1:192.168.230.102:40694/0] -- log(1 entries from seq 268 at 2023-07-13T15:37:17.279871-0400) v1 -- 0x5597f6e288c0 con 0x5597f1530c00
2023-07-13T15:37:17.278-0400 7fe7dfee5700  2 mon.a@0(leader) e1 send_reply 0x5597f6e1f320 0x5597f69baea0 mon_command_ack([{"prefix": "osd pool create", "pool": "cephfs.b.meta"}]=0 pool 'cephfs.b.meta' created v83) v1
...
2023-07-13T15:37:17.286-0400 7fe7dfee5700 10 mon.a@0(leader).osd e83 create_pending e 84
2023-07-13T15:37:17.286-0400 7fe7dfee5700 10 mon.a@0(leader).osd e83 update_logger

RTT is 82ms instead of what would normally be ~90ms. That's an ~9% improvement. Note replies are currently only sent after pending is created.

It can be hard to benchmark this because mon command routing can add latency. I have a test which creates a single RADOS session but I haven't yet figured out how to force it to connect to the leader for the benchmark.

batrick added 8 commits July 21, 2023 10:09
This is useful for benchmarks particularly that require consistent rank
choice (i.e. leader).

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This will replace many uses of "wait_for_finished_proposal" where a
reply is simply waiting for pending to commit.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Once PaxosService::refresh is called, the commit is final and will not
be rolled back. We should immediately send out any replies depending on
that event. This avoids a problem where PaxosService::update_from_paxos
indicates the mons need to re-bootstrap (because of a monmap change) and
these pending reply contexts are dropped in PaxosService::restart.

The cherry on top for this change is that mon commands enjoy a shorter
RTT latency as replies are sent out faster. Using the new
bench_commit.py script:

main branch:
        min/max/mean/stddev: 0.015813/0.613919/0.031615/0.024087
        min/max/mean/stddev: 0.015737/0.255008/0.031492/0.017930
        min/max/mean/stddev: 0.014242/0.205763/0.031969/0.018022
        min/max/mean/stddev: 0.014172/0.270256/0.032070/0.021079
        min/max/mean/stddev: 0.017767/0.471187/0.032751/0.025317

this commit:
        min/max/mean/stddev: 0.010476/0.158475/0.026324/0.013662
        min/max/mean/stddev: 0.016866/0.099403/0.027938/0.010508
        min/max/mean/stddev: 0.014013/0.127512/0.026847/0.010340
        min/max/mean/stddev: 0.013098/0.172725/0.028979/0.012998
        min/max/mean/stddev: 0.016934/0.292218/0.029252/0.014904

About a 10-20% reduction in latency.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
If the monmap is changed, do not reply to command until committed!

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
@batrick
Copy link
Member Author

batrick commented Jul 21, 2023

rebased + commit message changes

@batrick
Copy link
Member Author

batrick commented Jul 31, 2023

jenkins test api

1 similar comment
@ljflores
Copy link
Member

jenkins test api

Copy link
Member

@ljflores ljflores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I like the performance analysis you added!

Copy link
Contributor

@rzarzynski rzarzynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

_start_hunting();

if (rank == -1) {
rank = cct->_conf.get_val<int64_t>("mon_client_target_rank");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

// we are returning to the user; do not propose.
if (propose) {
wait_for_commit(op, new Monitor::C_Command(mon, op, err, rs, get_last_committed() + 1));
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

}
if (strategy == pending_map.strategy) {
err = 0;
goto reply_no_propose;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK.

@rzarzynski
Copy link
Contributor

rzarzynski commented Aug 1, 2023

Adding needs-qa. When it comes to backporting, I would prefer to bake these changes in main for a while.

@rishabh-d-dave
Copy link
Contributor

@batrick If you haven't started testing this already, I can put it through testing in the run I am starting today/tomorrow.

@batrick
Copy link
Member Author

batrick commented Sep 11, 2023

@batrick If you haven't started testing this already, I can put it through testing in the run I am starting today/tomorrow.

I'll handle it, thanks @rishabh-d-dave

batrick added a commit to batrick/ceph that referenced this pull request Sep 12, 2023
* refs/pull/50503/head:
	mon: do not change pending if strategy is unchanged
	mon/MonmapMonitor: do not propose on error in prepare_update
	mon/MonmapMonitor: wait for commit before reply
	mon: use wait_for_commit to reply
	mon: add context list for commit wait
	mon: remove unused method
	test/mon: add commit benchmark script
	mon/MonClient: provide config to target specific rank
@batrick
Copy link
Member Author

batrick commented Sep 15, 2023

Nothing out of the ordinary from this PR: https://tracker.ceph.com/projects/cephfs/wiki/Main#2023-Sep-12

I think we can merge this unless @ljflores @rzarzynski you would like more rados suite testing.

@ljflores
Copy link
Member

Apologies @batrick, this was tested and rados-approved: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrellocomcpMEWaauy1825-wip-yuri3-testing-2023-08-15-0955

I missed adding a comment here.

@batrick
Copy link
Member Author

batrick commented Sep 21, 2023

Apologies @batrick, this was tested and rados-approved: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrellocomcpMEWaauy1825-wip-yuri3-testing-2023-08-15-0955

I missed adding a comment here.

Great, thanks Laura!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants