AsyncMessenger: Don't decrease l_msgr_active_connections if it is negative #57951

Merged
yuriw merged 1 commit into ceph:main from mohit84:issue_66231
Jul 23, 2024

mohit84 commented Jun 10, 2024

The msgr_active_connections counter can become anomalous (negative) when a server daemon is blocked in accept_conn and the client sends a disconnect request. When the server receives the unregister_conn request, it decrements the counter without checking the connection status. This change decrements the counter only if its previous value is positive.

Fixes: https://tracker.ceph.com/issues/66231
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
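
For illustration, the "decrement only if the previous value is positive" guard from the description above could look roughly like the sketch below. `ActiveConnCounter` and `dec_if_positive` are hypothetical names used only here; this is not Ceph's PerfCounters API and not the code in this PR. It only shows one way to make the check and the decrement a single atomic step.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a perf counter; NOT Ceph's PerfCounters class.
class ActiveConnCounter {
public:
  void inc() { value_.fetch_add(1, std::memory_order_relaxed); }

  // Decrement only when the current value is positive.
  // Returns true if the decrement actually happened.
  bool dec_if_positive() {
    std::int64_t cur = value_.load(std::memory_order_relaxed);
    while (cur > 0) {
      // On failure, compare_exchange_weak reloads `cur`, so the loop
      // re-checks the guard against the fresh value before retrying.
      if (value_.compare_exchange_weak(cur, cur - 1,
                                       std::memory_order_relaxed)) {
        return true;
      }
    }
    return false;  // counter was 0 (or less): leave it untouched
  }

  std::int64_t get() const {
    std::int64_t v = value_.load(std::memory_order_relaxed);
    assert(v >= 0);  // invariant: the guard keeps the value non-negative
    return v;
  }

private:
  std::atomic<std::int64_t> value_{0};
};
```

With this shape, a disconnect that arrives before any increment leaves the counter at 0 instead of pushing it to -1. A plain "load, compare, then decrement" sequence would not be safe if several threads can decrement concurrently, which is why the sketch uses a compare-exchange loop.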

mohit84 requested a review from a team as a code owner June 10, 2024 05:59

The counter (msgr_active_connections) can become anomalous if it is
decremented before it is incremented while its initial value is 0. This
can happen when the server daemon is blocked in accept_conn and the
client sends a disconnect request. To avoid this, increment the counter
as the first step in add_accept, while accepting a request, so that the
counter is not 0 when the decrement happens.

Fixes: https://tracker.ceph.com/issues/66231
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

YiteGu commented Jun 12, 2024

I have verified this patch is successful.

mohit84 commented Jun 12, 2024

> I have verified this patch is successful.

Thanks, Yite, for validating it.

listen_addr.is_msgr2(), false);
conn->accept(std::move(cli_socket), listen_addr, peer_addr);
accepting_conns.insert(conn);
w->get_perf_counter()->inc(l_msgr_active_connections);

Before the change, we decremented the counter very late (only once the connection was in deleted_conns) and also incremented it very late (after a few frames had been exchanged).

After the change, l_msgr_active_connections is incremented early, just after accept. This widens the window during which a connection is considered active.
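
The ordering difference described above can be modelled with a small toy. This is a sketch under stated assumptions, not Ceph code: `ToyMessenger`, its integer connection ids, and `disconnect_during_accept` are invented for illustration, and the real AsyncMessenger tracks connections, workers, and counters quite differently. The `inc_on_accept` flag switches between the late increment (before the change) and the early increment in add_accept (after the change).

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy model only; names echo the discussion but this is NOT Ceph code.
struct ToyMessenger {
  bool inc_on_accept;             // true models the patched ordering
  std::int64_t active = 0;        // stands in for l_msgr_active_connections
  std::set<int> accepting_conns;  // accepted, handshake not finished yet
  std::set<int> conns;            // fully established connections

  explicit ToyMessenger(bool early) : inc_on_accept(early) {}

  // A socket was accepted; the handshake has not completed yet.
  void add_accept(int id) {
    accepting_conns.insert(id);
    if (inc_on_accept) ++active;  // early increment (after the change)
  }

  // The handshake finished; the connection becomes established.
  void accept_conn(int id) {
    assert(accepting_conns.count(id) == 1);
    accepting_conns.erase(id);
    conns.insert(id);
    if (!inc_on_accept) ++active;  // late increment (before the change)
  }

  // The connection is torn down, whatever stage it is in.
  void unregister_conn(int id) {
    accepting_conns.erase(id);
    conns.erase(id);
    --active;  // unconditional decrement, as described in the PR
  }
};

// A client disconnects while its connection is still being accepted.
inline std::int64_t disconnect_during_accept(bool early) {
  ToyMessenger m(early);
  m.add_accept(1);
  m.unregister_conn(1);
  return m.active;
}
```

In this model, `disconnect_during_accept(false)` returns -1 because the decrement runs before any increment, while `disconnect_during_accept(true)` returns 0. The normal path (add_accept, accept_conn, unregister_conn) ends at 0 with either ordering.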

yuriw merged commit c9f8088 into ceph:main Jul 23, 2024
NitzanMordhai pushed a commit to NitzanMordhai/ceph that referenced this pull request Aug 1, 2024
AsyncMessenger: Don't decrease l_msgr_active_connections if it is negative

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>