
osd: don't send stale hb msgr's addresses in MOSDBoot#50422

Merged
rzarzynski merged 1 commit into ceph:main from rzarzynski:wip-bug-58915 on Feb 12, 2024

rzarzynski (Contributor) commented Mar 7, 2023

See the comments in the tracker ticket for the root-cause analysis (RCA).

NOTE: we can't just hold a reference to what `get_myaddrs()` returns, because `safe_item_history` is involved:

```cpp
template<class T>
class safe_item_history {
  //...
  T *current = nullptr;

  // ...
  const T& operator=(const T& other) {
    std::lock_guard l(lock);
    history.push_back(other);
    current = &history.back();
    return *current;
  }
  // ...
};
```

Fixes: https://tracker.ceph.com/issues/58915

Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
rzarzynski requested a review from a team as a code owner on March 7, 2023 12:52
github-actions bot added the core label Mar 7, 2023
ljflores (Member) commented Mar 7, 2023

jenkins test make check

ljflores (Member) commented Apr 5, 2023

@rzarzynski I had to open this new tracker, which reproduced twice: https://tracker.ceph.com/issues/59333

Can you check that this is not related to your PR? If it's not, feel free to merge.

Rados suite review:
https://pulpito.ceph.com/?branch=wip-yuri11-testing-2023-03-28-0950
https://pulpito.ceph.com/?branch=wip-yuri11-testing-2023-03-31-1108

Failures, unrelated:
1. https://tracker.ceph.com/issues/58585
2. https://tracker.ceph.com/issues/58946
3. https://tracker.ceph.com/issues/58265
4. https://tracker.ceph.com/issues/59271
5. https://tracker.ceph.com/issues/59057
6. https://tracker.ceph.com/issues/59333
7. https://tracker.ceph.com/issues/59334
8. https://tracker.ceph.com/issues/59335

Details:
1. rook: failed to pull kubelet image - Ceph - Orchestrator
2. cephadm: KeyError: 'osdspec_affinity' - Ceph - Orchestrator
3. TestClsRbd.group_snap_list_max_read failure during upgrade/parallel tests - Ceph - RBD
4. mon: FAILED ceph_assert(osdmon()->is_writeable()) - Ceph - RADOS
5. rados/test_envlibrados_for_rocksdb.sh: No rule to make target 'rocksdb_env_librados_test' on centos 8 - Ceph - RADOS
6. PgScrubber: timeout on reserving replicas - Ceph - RADOS
7. test_pool_create_with_quotas: Timed out after 60s and 0 retries - Ceph - Mgr - Dashboard
8. Found coredumps on smithi related to sqlite3 - Ceph - Cephsqlite

github-actions bot commented Sep 7, 2023

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

github-actions bot added the stale label Sep 7, 2023

github-actions bot commented Oct 7, 2023

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

github-actions bot closed this Oct 7, 2023
ljflores reopened this Feb 7, 2024

ljflores (Member) commented Feb 7, 2024

@rzarzynski ping

github-actions bot removed the stale label Feb 7, 2024

rzarzynski (Contributor, Author) commented:

Thanks for resurrecting this, @ljflores! Let's get it into main, though I don't consider the bug a deal breaker.
