
LogMonitor: set no_reply for forward MLog commands #61933

Merged

SrinivasaBharath merged 1 commit into ceph:main from NitzanMordhai:wip-nitzan-logmonitor-forward-msg-noreply on Mar 7, 2025



@NitzanMordhai NitzanMordhai commented Feb 20, 2025

On stretch-mode clusters we can see slow ops when removing and adding OSDs with --zap --force while the OSDs are connected to a peon monitor, which forwards their MLog messages to the leader. Currently no_reply is set only when the OSDs are connected to the leader; this fix covers the forwarded case as well, so no_reply is set either way.

Fixes: https://tracker.ceph.com/issues/54489


Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>

@rzarzynski rzarzynski left a comment


LGTM with just a nit on indentation.

```cpp
done:
mon.no_reply(op);
return true;
done:
```

nit: let's keep the indentation as it was previously.

```cpp
return true;
done:
mon.no_reply(op);
return (!num_new);
```

ACK, for the nothing new case (num_new == 0), the function returns true.
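A minimal sketch of the return-value semantics discussed here, written as a hypothetical stand-alone model rather than Ceph code. It assumes, as the hunks in this thread suggest, that the bare return false; was dropped so every path falls through to done:, where mon.no_reply(op) runs before return (!num_new). It also assumes the session and capability checks jump to done: before any entries are counted. PreprocessOutcome and model_preprocess_log are invented names.

```cpp
#include <cassert>

// Hypothetical model of the control flow of LogMonitor::preprocess_log()
// after this PR. Assumption: every path now falls through to
//   done: mon.no_reply(op); return (!num_new);
// These names are illustrative only; they are not Ceph APIs.
struct PreprocessOutcome {
  bool no_reply_called;  // would mon.no_reply(op) run on this path?
  bool handled;          // value returned to PaxosService (true = done here)
};

PreprocessOutcome model_preprocess_log(bool has_session, bool capable,
                                       int entries_not_in_summary) {
  // Assumption: the session/capability checks jump to done: before any
  // entries are counted, so num_new is still 0 on those paths.
  int num_new = (has_session && capable) ? entries_not_in_summary : 0;

  // done:
  //   mon.no_reply(op);    -> reached on every path in this model
  //   return (!num_new);   -> true only when there is nothing new
  return PreprocessOutcome{true, num_new == 0};
}
```

For example, model_preprocess_log(true, true, 3) reports handled == false, so PaxosService keeps going, yet no_reply is still called; that unconditional call is the behavioral change this PR is about.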

```cpp
goto done;
}

return false;
```

@rzarzynski rzarzynski Feb 20, 2025


(Just writing down my understanding)

For this case, preprocess_log() can be executed twice: once by a peon and once by the leader. This is driven by PaxosService:

```cpp
  // preprocess
  if (preprocess_query(op))
    return true;  // easy!

  // leader?
  if (!mon.is_leader()) {
    mon.forward_request_leader(op);
    return true;
  }
```
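To make the "executed twice" point concrete, here is a toy, synchronous model of the dispatch order quoted above. ToyMon and its members are hypothetical; real forwarding goes over the network via forward_request_leader(), which this sketch replaces with a direct call, and preprocess_query() is reduced to "handled if and only if nothing is new".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of the PaxosService dispatch order: preprocess first, then
// forward to the leader if this monitor is a peon, otherwise fall through
// to prepare. It only records which step ran on which monitor.
// All names are illustrative, not Ceph's.
struct ToyMon {
  std::string name;
  bool is_leader;
  ToyMon* leader;                    // where a peon forwards requests
  std::vector<std::string>* trace;   // shared log of steps taken

  // Stand-in for preprocess_query(): true means "fully handled here".
  bool preprocess_query(int num_new) {
    trace->push_back(name + ":preprocess");
    return num_new == 0;
  }

  void dispatch(int num_new) {
    if (preprocess_query(num_new))
      return;                        // easy!
    if (!is_leader) {
      trace->push_back(name + ":forward");
      leader->dispatch(num_new);     // stands in for forward_request_leader()
      return;
    }
    trace->push_back(name + ":prepare");
  }
};
```

For an MLog with new entries arriving at a peon, the recorded trace is peon:preprocess, peon:forward, leader:preprocess, leader:prepare, so preprocessing runs once on each monitor. With nothing new, the peon handles it alone.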

Monitor::no_reply() doesn't do much on peons, but on the leader it generates the MRoute whose absence is the crux of the bug.

```cpp
void Monitor::no_reply(MonOpRequestRef op)
{
  MonSession *session = op->get_session();
  Message *req = op->get_req();

  if (session->proxy_con) {
    dout(10) << "no_reply to " << req->get_source_inst()
             << " via " << session->proxy_con->get_peer_addr()
             << " for request " << *req << dendl;
    session->proxy_con->send_message(new MRoute(session->proxy_tid, NULL));
    op->mark_event("no_reply: send routed request");
  } else {
    dout(10) << "no_reply to " << req->get_source_inst()
             << " " << *req << dendl;
    op->mark_event("no_reply");
  }
}
```
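A toy model of why the proxy_con branch matters. Assumption for illustration only: the peon tracks each request it forwarded by a tid until the leader routes something back for that tid, and an entry that is never answered is the kind of stuck op the PR describes. ToyPeon, ToySession, and toy_no_reply are hypothetical, and sending the MRoute over proxy_con is replaced by a direct call.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical peon-side bookkeeping: forwarded requests stay in `routed`
// until something is routed back for their tid. Not Ceph's types.
struct ToyPeon {
  std::uint64_t next_tid = 1;
  std::map<std::uint64_t, std::string> routed;  // tid -> forwarded op

  std::uint64_t forward(const std::string& what) {
    std::uint64_t tid = next_tid++;
    routed[tid] = what;
    return tid;
  }
  // Stand-in for receiving an MRoute with a null payload for this tid.
  void handle_route(std::uint64_t tid) { routed.erase(tid); }
};

// Leader-side session: proxy_con is set only for requests a peon forwarded.
struct ToySession {
  ToyPeon* proxy_con = nullptr;
  std::uint64_t proxy_tid = 0;
};

// Mirrors the two branches of no_reply(): if the op came via a peon, tell
// that peon we are done with it; otherwise only note the event.
std::string toy_no_reply(const ToySession& s) {
  if (s.proxy_con) {
    s.proxy_con->handle_route(s.proxy_tid);
    return "no_reply: send routed request";
  }
  return "no_reply";
}
```

A session with proxy_con set clears the peon's entry; a session without it (a client connected directly to this monitor) only records the event. If the leader never reaches no_reply for a forwarded op, the peon's entry in this model simply stays put.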

OK!

@Naveenaidu

RADOS Approved: https://tracker.ceph.com/issues/70274#note-5

@SrinivasaBharath

jenkins test api

@SrinivasaBharath SrinivasaBharath merged commit b1e4a2b into ceph:main Mar 7, 2025
11 of 15 checks passed
NitzanMordhai pushed a commit to NitzanMordhai/ceph that referenced this pull request Mar 10, 2025
…r-forward-msg-noreply

LogMonitor: set no_reply for forward MLog commands


4 participants