monclient: try to resend the mon commands to the same monitor if avai…#57718
@NitzanMordhai looks like this is in draft form, but we saw this on bug scrub so please assign a reviewer whenever it's ready!
19f3edc to ee8e66a
cmd->sent_name = monmap.get_name(active_con->get_con()->get_peer_addr());
} else if (active_con && cmd->sent_name.length() &&
    cmd->sent_name != monmap.get_name(active_con->get_con()->get_peer_addr()) &&
    monmap.contains(cmd->sent_name)) {
Could simplify to
(active_con && cmd->sent_name != monmap.get_name(active_con->get_con()->get_peer_addr()))
But we can have a situation where this mon disconnected and is no longer part of the monmap; in that case we would try to reopen a session to it.
You have if (!monmap.contains(cmd->sent_name)) above, so technically you don't need the opposite predicate monmap.contains(cmd->sent_name) in the else statement.
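The point of this review thread can be checked with a small standalone sketch. It is not the MonClient code: `MonNames`, `Action`, `decide_verbose`, `decide_simplified`, and `forms_agree` are hypothetical names, the monmap is reduced to a set of monitor names, and the branch layout is assumed from the diff fragment and the comments above. It also assumes no monitor is named with the empty string, which is what makes the `length()` check redundant.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the monmap: only the set of monitor names.
using MonNames = std::set<std::string>;

enum class Action { RecordActive, ReopenToSent, KeepActive };

// Form as posted: the else-if repeats checks the first branch already covers.
Action decide_verbose(const MonNames& monmap, bool has_active,
                      const std::string& active_name,
                      const std::string& sent_name) {
  if (monmap.count(sent_name) == 0) {
    // Never sent, or the original target left the monmap.
    return Action::RecordActive;
  } else if (has_active && !sent_name.empty() &&
             sent_name != active_name && monmap.count(sent_name) > 0) {
    return Action::ReopenToSent;
  }
  return Action::KeepActive;
}

// Form suggested in review: reaching the else-if already implies that
// sent_name is in the monmap (and therefore non-empty).
Action decide_simplified(const MonNames& monmap, bool has_active,
                         const std::string& active_name,
                         const std::string& sent_name) {
  if (monmap.count(sent_name) == 0) {
    return Action::RecordActive;
  } else if (has_active && sent_name != active_name) {
    return Action::ReopenToSent;
  }
  return Action::KeepActive;
}

// Exhaustively compares both forms over the given candidate names.
bool forms_agree(const MonNames& monmap,
                 const std::vector<std::string>& candidates) {
  for (bool has_active : {false, true}) {
    for (const auto& active : candidates) {
      for (const auto& sent : candidates) {
        if (decide_verbose(monmap, has_active, active, sent) !=
            decide_simplified(monmap, has_active, active, sent)) {
          return false;
        }
      }
    }
  }
  return true;
}
```

With a monmap of `a` and `b`, both forms pick the same action for every combination of an empty, known, or removed `sent_name`, so dropping the extra predicates does not change behavior under these assumptions.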
…lable
When we have a socket failure or connection issue, we may send a mon command
and never check if it completed. If we resend the command to another monitor,
the resent command may complete before the first sent command. This can cause
users to send the command twice, which can lead to issues in automated
environments. For example:
We have 2 monitors: mon.a and mon.b
1. Send command to delete pool - monclient targets mon.a
2. A socket failure occurs, and mon.a has a delay in response
3. Monclient hunts for another monitor to resend the delete pool command
and finds mon.b
4. Mon.b removes the pool and sends an acknowledgment
5. The user script now sends a create pool command, but mon.a now sends the
acknowledgment for the pool delete from step #1
We end up without a pool, as mon.a deleted it.
The mon_client_hunt_on_resent configuration was added to control the behavior of
retrying commands on monitor connection failures.
By default, this option is enabled to prevent situations where a command is retried
on the same monitor, potentially missing better monitor candidates.
Clients experiencing specific conditions that require retrying on the same monitor
can disable this feature by setting the configuration to false.
Fixes: https://tracker.ceph.com/issues/63789
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
mon_client_hunt_on_resend defaults to true; we want to disable it and let the test resend the commands to the same monitor that failed.
Fixes: https://tracker.ceph.com/issues/63789
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
ee8e66a to 08112e6
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
jenkins retest this please
Hi @bill-scales, would you mind re-reviewing?
bill-scales left a comment
Happy to approve, just the compile error needs fixing
Co-authored-by: Bill Scales <156200352+bill-scales@users.noreply.github.com> Signed-off-by: NitzanMordhai <97529641+NitzanMordhai@users.noreply.github.com>
jenkins retest this please
Hi all, apologies for the delay in QA approval. There was a problem with another PR in the batch, which took some time to work out. But we are retesting and will have an update soon.
…lable
When we have a socket failure or connection issue, we may send a mon command
and never check if it completed. If we resend the command to another monitor,
the resent command may complete before the first sent command. This can cause
users to send the command twice, which can lead to issues in automated
environments. For example:
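We have 2 monitors: mon.a and mon.b
1. Send command to delete pool - monclient targets mon.a
2. A socket failure occurs, and mon.a has a delay in response
3. Monclient hunts for another monitor to resend the delete pool command
and finds mon.b
4. Mon.b removes the pool and sends an acknowledgment
5. The user script now sends a create pool command, but mon.a now sends the
acknowledgment for the pool delete from step 1
We end up without a pool, as mon.a deleted it.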
The mon_client_hunt_on_resent configuration was added to control the behavior of
retrying commands on monitor connection failures.
By default, this option is enabled to prevent situations where a command is retried
on the same monitor, potentially missing better monitor candidates.
Clients experiencing specific conditions that require retrying on the same monitor
can disable this feature by setting the configuration to false.
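The mon.a / mon.b scenario can be replayed with a toy model, sketched here under loud assumptions: the cluster is a single set of pool names, commands take effect in a fixed order, and `hunt_on_resend` is a plain bool standing in for the option described above. `Command`, `apply_all`, and `effective_order` are hypothetical names, not Ceph APIs. The ordering used when hunting is disabled (both copies of the delete land on mon.a before the script issues the create) is an assumption about the intended effect, not something this description confirms.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

enum class Op { CreatePool, DeletePool };

// One command as it takes effect; via_mon is informational only.
struct Command {
  Op op;
  std::string pool;
  std::string via_mon;
};

// Applies commands in order to a copy of the pool set and returns the result.
std::set<std::string> apply_all(std::set<std::string> pools,
                                const std::vector<Command>& order) {
  for (const auto& c : order) {
    if (c.op == Op::CreatePool) {
      pools.insert(c.pool);
    } else {
      pools.erase(c.pool);
    }
  }
  return pools;
}

// Assumed order in which commands take effect for pool "p".
std::vector<Command> effective_order(bool hunt_on_resend) {
  if (hunt_on_resend) {
    // Steps 3-5: mon.b handles the resent delete, the script creates the
    // pool again, then mon.a's delayed delete from step 1 lands last.
    return {Command{Op::DeletePool, "p", "mon.b"},
            Command{Op::CreatePool, "p", "mon.b"},
            Command{Op::DeletePool, "p", "mon.a"}};
  }
  // Assumption: resending to mon.a keeps both deletes ahead of the create.
  return {Command{Op::DeletePool, "p", "mon.a"},
          Command{Op::DeletePool, "p", "mon.a"},
          Command{Op::CreatePool, "p", "mon.a"}};
}
```

Starting from a cluster that contains pool `p`, the hunting order leaves no pool behind, while the same-monitor order ends with the pool present, which is the outcome the user script expects.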
Fixes: https://tracker.ceph.com/issues/63789
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>