Bug #61948

Failed assert "pg_upmap_primaries.empty()" in the read balancer

Added by Laura Flores over 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Immediate
Assignee: Laura Flores
Category: -
Target version: -
% Done: 0%
Source:
Backport: reef,squid
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions: v18.2.2
ceph-qa-suite:
Component(RADOS):
Pull request ID: 57776
Tags (freeform):
Fixed In: v19.3.0-3736-gbaaf7c85a2
Released In: v20.2.0~2389
Upkeep Timestamp: 2025-11-01T01:21:08+00:00

Description

Reported by Stefan:

This bug was hit on the RC (v18.1.2).

ceph crash info 2023-07-10T19:02:42.116150Z_8061077a-2257-49c7-8088-c94a5a580f6a
{
     "assert_condition": "pg_upmap_primaries.empty()",
     "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.1.2/rpm/el8/BUILD/ceph-18.1.2/src/osd/OSDMap.cc",
     "assert_func": "void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const",
     "assert_line": 3239,
     "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.1.2/rpm/el8/BUILD/ceph-18.1.2/src/osd/OSDMap.cc: In function 'void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7f95f67cb700 time 2023-07-10T19:02:42.106910+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.1.2/rpm/el8/BUILD/ceph-18.1.2/src/osd/OSDMap.cc: 3239: FAILED ceph_assert(pg_upmap_primaries.empty())\n",
     "assert_thread_name": "ms_dispatch",
     "backtrace": [
         "/lib64/libpthread.so.0(+0x12cf0) [0x7f9601c12cf0]",
         "gsignal()",
         "abort()",
         "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f96043f0e15]",
         "/usr/lib64/ceph/libceph-common.so.2(+0x2a9f81) [0x7f96043f0f81]",
         "(OSDMap::encode(ceph::buffer::v15_2_0::list&, unsigned long) const+0x1229) [0x7f96048a4a19]",
         "(OSDMonitor::reencode_full_map(ceph::buffer::v15_2_0::list&, unsigned long)+0xe0) [0x55d71adf3d30]",
         "(OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x326) [0x55d71ae27616]",
         "(OSDMonitor::build_latest_full(unsigned long)+0x306) [0x55d71ae27986]",
         "(OSDMonitor::check_osdmap_sub(Subscription*)+0x7a) [0x55d71ae2b81a]",
         "(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xeed) [0x55d71ac4b25d]",
         "(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x7d6) [0x55d71ac5fea6]",
         "(Monitor::_ms_dispatch(Message*)+0x406) [0x55d71ac610a6]",
         "(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5d) [0x55d71ac91bcd]",
         "(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7f9604669c08]",
         "(DispatchQueue::entry()+0x50f) [0x7f9604666daf]",
         "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f960472d391]",
         "/lib64/libpthread.so.0(+0x81ca) [0x7f9601c081ca]",
         "clone()" 
     ],
     "ceph_version": "18.1.2",
     "crash_id": "2023-07-10T19:02:42.116150Z_8061077a-2257-49c7-8088-c94a5a580f6a",
     "entity_name": "mon.reef03",
     "os_id": "centos",
     "os_name": "CentOS Stream",
     "os_version": "8",
     "os_version_id": "8",
     "process_name": "ceph-mon",
     "stack_sig": "36e64bb49053b7de85c9cf46c6739fe42dd443b8320310aef2318e06c48ee87e",
     "timestamp": "2023-07-10T19:02:42.116150Z",
     "utsname_hostname": "reef03",
     "utsname_machine": "x86_64",
     "utsname_release": "5.4.0-147-generic",
     "utsname_sysname": "Linux",
     "utsname_version": "#164-Ubuntu SMP Tue Mar 21 14:23:17 UTC 2023" 
}

Error EINVAL: osd.7 is not in acting set for pg 2.23

But osd.7 was in the up and acting sets:

2.23  36  0  0  0  0  150994944  0  0  53  0  53  active+clean  2023-07-10T14:33:28.050243+0000  48'53  76:178  [4,1,7]  4  [4,1,7]  4  0'0  2023-07-10T13:47:36.917296+0000  0'0  2023-07-10T13:47:36.917296+0000  0  0  periodic scrub scheduled @ 2023-07-11T15:01:20.025433+0000  0  0

Details that led to the bug:
- Installed v18.1.2 cephadm
- Deployed a Ceph cluster with cephadm, three nodes (VMs) with 3 OSDs each, which installed Ceph v18.0.0-4795-g2f6e4f7d by default
- Set the min_compat_client to reef
- Hit the crash
- Redeployed the cluster to v18.1.2
- Deleted the pool, recreated it, and repeated the above steps
- Mon daemon crashed again

Affected logs attached.


Files

osd.6.log (90.9 KB) osd.6.log Laura Flores, 07/10/2023 07:39 PM
reef.mon3.log (90.9 KB) reef.mon3.log Laura Flores, 07/10/2023 07:39 PM
om (8.17 KB) om Stefan Kooman, 07/10/2023 07:55 PM
om_old (6.39 KB) om_old Stefan Kooman, 07/10/2023 07:59 PM

Related issues 6 (1 open, 5 closed)

Related to RADOS - Bug #66260: mon, osd, *: require-min-compat-client is not really honored (New) - Radoslaw Zarzynski
Related to RADOS - Bug #66329: mon: {rm-,}pg-upmap-primary should require the feature support also from mons and osds (Resolved) - Radoslaw Zarzynski
Related to RADOS - Bug #66285: osd, mon: decoders of OSDMap use old version for comparison of with struct_compat of DECODE_START (Resolved) - Radoslaw Zarzynski
Has duplicate RADOS - Bug #66274: Online balancing with upmap-read mode performed without setting "min_compat_client" to reef (Duplicate) - Laura Flores
Copied to RADOS - Backport #66298: squid: Failed assert "pg_upmap_primaries.empty()" in the read balancer (Resolved) - Radoslaw Zarzynski
Copied to RADOS - Backport #66299: reef: Failed assert "pg_upmap_primaries.empty()" in the read balancer (Resolved) - Radoslaw Zarzynski
Actions #1

Updated by Laura Flores over 2 years ago

  • Assignee set to Laura Flores

Actions #2

Updated by Stefan Kooman over 2 years ago

Attached are the first osd map I used (om_old) and the osd map used the second time, with version 18.1.2 (om).

Actions #3

Updated by Laura Flores over 2 years ago

Thanks Stefan!

Actions #4

Updated by Laura Flores over 2 years ago

The crash occurred during a step where the osdmap is encoded.
Since the read balancer is a new feature in reef, the encoder is required to be at the most recent version (10). However, this cluster detected an older encoder version (<10), which assumes there are no pg upmap primary entries. Since the osdmap did contain pg upmap primary entries, the assert fired and caused a crash.
https://github.com/ceph/ceph/blob/v18.1.2/src/osd/OSDMap.cc#L3236-L3240

    if (v >= 10) {
      encode(pg_upmap_primaries, bl);
    } else {
      ceph_assert(pg_upmap_primaries.empty());
    }

What remains to figure out is why this cluster detected older features and settled on an older encoder version.

This may be relevant in the mon log:

  -341> 2023-07-10T19:02:16.674+0000 7f9604fb2c80  0 mon.reef03@-1(???).osd e184 crush map has features 3314933000854323200, adjusting msgr requires
  -340> 2023-07-10T19:02:16.674+0000 7f9604fb2c80  0 mon.reef03@-1(???).osd e184 crush map has features 2738472248550883328, adjusting msgr requires
  -339> 2023-07-10T19:02:16.674+0000 7f9604fb2c80  0 mon.reef03@-1(???).osd e184 crush map has features 2738472248550883328, adjusting msgr requires
  -338> 2023-07-10T19:02:16.674+0000 7f9604fb2c80  0 mon.reef03@-1(???).osd e184 crush map has features 2738472248550883328, adjusting msgr requires

Actions #5

Updated by Stefan Kooman over 2 years ago

I have created a new cluster and upgraded it to 18.1.2 (from 18.0.0) before I created any OSDs (and pool/image). I redid the tests. The OSDs asserted again (the same as before). The monitor(s) did hit an assert this time. I had debug logging for the monitors set to 20/20 ... but not for the OSDs. I will try again with debug set to 20/20 for the OSDs. Strangely, ceph-crash did not report that the OSDs had crashed (it did in the other cluster). But that might be another issue / bug...

I'm pretty sure (I checked the timestamps) that the "crush map has features 2738472248550883328, adjusting msgr requires" logging is a result of "ceph osd set-require-min-compat-client reef".

Actions #6

Updated by Stefan Kooman over 2 years ago

I deleted all OSDs and recreated them. I repeated the tests, but could not reproduce. So there must be some condition(s) that trigger it, but I'm not sure what. I hope my observations prove to be helpful.

Actions #7

Updated by Laura Flores over 2 years ago

Hey Stefan,

Thanks for your analysis. This failure indicates that there is a client-server version mismatch (could be a deeper issue, but this is what the failure points to). If you have mon and osd logs with a 20/20 logging level, that would definitely be helpful.

In the meantime, I am working to see if I can reproduce your issue on my end.

Actions #8

Updated by Stefan Kooman over 2 years ago

I recreated yet another cluster and enabled debug logging for osd / mon, but I could not reproduce.

Actions #9

Updated by Laura Flores over 2 years ago

Thanks Stefan. We're continuing to look into it. If you're able to reproduce the issue again, it may help to dump your mon_status, as it contains a feature map. There is some kind of feature incompatibility going on here, so that could be telling.

`ceph tell mon.* mon_status`

Or use this if the mons are not accessible:
`ceph --admin-daemon /var/run/ceph/<cluster id>/ceph-mon.<mon id>.asok mon_status`

Actions #10

Updated by Laura Flores over 2 years ago

Hi Stefan,

Can you tell me whether just one mon crashed when you applied the upmaps, or did they all crash?

Thanks,
Laura

Actions #11

Updated by Stefan Kooman over 2 years ago

> Can you tell me whether just one mon crashed when you applied the upmaps, or did they all crash?

The second cluster only had one monitor crash (I don't have logs anymore of the first cluster).

Actions #12

Updated by Radoslaw Zarzynski over 2 years ago

  • Status changed from New to Need More Info

Need More Info for now, but in the longer term we might close this if reproduction is impossible.

Actions #17

Updated by Laura Flores almost 2 years ago

  • Affected Versions v18.2.2 added
Actions #18

Updated by Laura Flores almost 2 years ago · Edited

Upgrading from 18.2.0 to 18.2.1 does not reproduce the bug.

I modified the upgrade path on this branch:
https://github.com/ceph/ceph/compare/reef...ljflores:ceph:wip-reef-tracker-61948-upgrade-18.2.0-18.2.1?expand=1

Tested here:
https://pulpito.ceph.com/lflores-2024-05-23_06:02:47-upgrade:reef-p2p:reef-p2p-parallel-reef-distro-default-smithi/

This checks out with the telemetry links above, which only track crashes on 18.2.2 clusters.

Likely the first reported instance of this (on the RC) resulted from a bad installation, but the crashes we're seeing now would affect anyone upgrading to 18.2.2.

Actions #19

Updated by Christoffer Anselm almost 2 years ago

These steps should reproduce the issue, though I have not verified that as I don't have a free test cluster available at the moment:

1. Set the min-compat-client to reef on a v18.2.1 cluster
2. Apply some pg-upmap-primary mappings
3. Upgrade some, but not all, osds/mons to v18.2.2
4. Observe (only) the upgraded services crashing

Based on the conversation in https://github.com/ceph/ceph/pull/55712 I expect that the crashes would stop once all ceph services are upgraded to 18.2.2.
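
For concreteness, steps 1 and 2 above could look like this (a sketch; pg 2.23 and osd.4 are placeholders, pick any valid pg/osd pair from your own cluster):

    # On a v18.2.1 cluster:
    ceph osd set-require-min-compat-client reef
    # Map the primary of pg 2.23 to osd.4:
    ceph osd pg-upmap-primary 2.23 4
    # Then upgrade only some of the osds/mons to v18.2.2 and watch the
    # upgraded daemons for the pg_upmap_primaries.empty() assert.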

---
For readers only looking for a way to upgrade their cluster:

If you are not running reef yet, just upgrade directly to 18.2.2 as that should skip the releases causing the issue.

Otherwise the following steps should allow working around the issue:

1. Ensure your ceph cluster is healthy and running ceph reef (e.g. 18.2.1)
2. `ceph osd dump`
3. `ceph osd rm-pg-upmap-primary <each upmap primary id from dump>`
4. wait until cluster finishes backfills and is healthy
5. upgrade to 18.2.2

Just be sure that no new upmap primaries are created until all services have been upgraded, as that will cause the 18.2.2 services to crash when receiving updates from services running the older reef release.
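
To script steps 2 and 3, here is a minimal sketch, assuming `jq` is installed and that the `pg_upmap_primaries`/`pgid` field names below match your release's `ceph osd dump --format json` output:

    # Remove every pg-upmap-primary mapping listed in the osdmap.
    for pgid in $(ceph osd dump --format json | jq -r '.pg_upmap_primaries[].pgid'); do
        ceph osd rm-pg-upmap-primary "$pgid"
    done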

Actions #20

Updated by Laura Flores almost 2 years ago

  • Status changed from Need More Info to In Progress
Actions #21

Updated by Radoslaw Zarzynski almost 2 years ago · Edited

Laura Flores wrote in #note-13:

> Another occurrence here: https://github.com/ceph/ceph/pull/55712#issuecomment-2112585048

I have mixed feelings towards reusing this tracker. Although the underlying mechanism is the same, there is a huge difference in reproduction rate between the original report and the recent one.

Before 18.2.2 it was possible to hit this assertion only if a really ancient peer (typically a client) bumped v down below 10:

void OSDMap::encode(ceph::buffer::list& bl, uint64_t features) const
{
  // ...
  // meta-encoding: how we include client-used and osd-specific data
  ENCODE_START(8, 7, bl);

  {
    // NOTE: any new encoding dependencies must be reflected by
    // SIGNIFICANT_FEATURES
    uint8_t v = 10;
    if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
      v = 3;
    } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
      v = 6;
    } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
      v = 7;
    } /* else if (!HAVE_FEATURE(features, SERVER_REEF)) {
      v = 9;
    } */

    // ...

    if (v >= 10) {
      encode(pg_upmap_primaries, bl);
    } else {
      ceph_assert(pg_upmap_primaries.empty());
    }
    ENCODE_FINISH(bl); // client-usable data
  }

Since 18.2.2 (which has the fix for https://tracker.ceph.com/issues/63389), the same assert-enriched code path in the encoder can also be taken for pre-Reef decoders.

However, even on 18.2.0 and 18.2.1 it's possible to run into the situation the assert was intended for – having clients that don't understand the feature in a cluster with pg-upmap-primary mappings. That is because of https://tracker.ceph.com/issues/66260.

Still, all these problems require that Reef's pg-upmap-primary feature has been used in the cluster. The read balancer doesn't use it currently (this will change in Squid).

Actions #22

Updated by Radoslaw Zarzynski almost 2 years ago

  • Priority changed from High to Immediate
  • Severity changed from 3 - minor to 2 - major
Actions #23

Updated by Radoslaw Zarzynski almost 2 years ago · Edited

The RCA candidate

The presence of a decoder (in a service or in a client) for which reef's OSDMap encoder would generate a pre-reef (on 18.2.2) or pre-nautilus (on 18.2.{0, 1}) bytestream whenever there is at least one pg-upmap-primary mapping.

See:

UPDATE: one extra note on the "a decoder for which the encoder would generate a pre-<version> bytestream" wording. As SERVER_REEF is missing from OSDMap::SIGNIFICANT_FEATURES on reef.{0, 1}, encoders see those decoders as pre-reef.

Actions #24

Updated by Ilya Dryomov almost 2 years ago

  • Related to Bug #66260: mon, osd, *: require-min-compat-client is not really honored added
Actions #25

Updated by Laura Flores almost 2 years ago

  • Backport set to reef,squid
Actions #26

Updated by Radoslaw Zarzynski almost 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 57776
Actions #27

Updated by Radoslaw Zarzynski almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #28

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66298: squid: Failed assert "pg_upmap_primaries.empty()" in the read balancer added
Actions #29

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66299: reef: Failed assert "pg_upmap_primaries.empty()" in the read balancer added
Actions #31

Updated by Laura Flores almost 2 years ago

  • Related to Bug #66274: Online balancing with upmap-read mode performed without setting "min_compat_client" to reef added
Actions #32

Updated by Laura Flores almost 2 years ago · Edited

Adding a summary of the bug from the mailing list here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GUQCIRZRMGQ3JOXS2PYZL7EPO3ZMYV6R/

For users looking to remove existing pg-upmap-primary mappings, you can do so by:
$ `ceph osd dump`
For each pg_upmap_primary entry in the above output:
$ `ceph osd rm-pg-upmap-primary <pgid>`

You may alternatively run this script to remove pg-upmap-primary mappings:
https://raw.githubusercontent.com/ljflores/ceph_read_balancer_2023/main/remove_pg_upmap_primaries.sh
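
Before upgrading, it may also be worth verifying that no mappings remain. A minimal check, assuming `jq` and the same `pg_upmap_primaries` field name as in the osdmap JSON:

    # Should print 0 once all pg-upmap-primary mappings have been removed.
    ceph osd dump --format json | jq '.pg_upmap_primaries | length'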

Actions #33

Updated by Laura Flores almost 2 years ago

  • Related to Bug #66329: mon: {rm-,}pg-upmap-primary should require the feature support also from mons and osds added
Actions #34

Updated by Laura Flores almost 2 years ago

  • Related to Bug #66285: osd, mon: decoders of OSDMap use old version for comparison of with struct_compat of DECODE_START added
Actions #35

Updated by Laura Flores over 1 year ago

  • Related to deleted (Bug #66274: Online balancing with upmap-read mode performed without setting "min_compat_client" to reef)
Actions #36

Updated by Laura Flores over 1 year ago

  • Has duplicate Bug #66274: Online balancing with upmap-read mode performed without setting "min_compat_client" to reef added
Actions #37

Updated by Laura Flores over 1 year ago

  • Status changed from Pending Backport to Resolved
Actions #38

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to baaf7c85a2c3cb725a6c34cef8760fa73e00385d
  • Fixed In set to v19.3.0-3736-gbaaf7c85a2c
  • Upkeep Timestamp set to 2025-07-12T04:44:37+00:00
Actions #39

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-3736-gbaaf7c85a2c to v19.3.0-3736-gbaaf7c85a2
  • Upkeep Timestamp changed from 2025-07-12T04:44:37+00:00 to 2025-07-14T23:40:07+00:00
Actions #40

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2389
  • Upkeep Timestamp changed from 2025-07-14T23:40:07+00:00 to 2025-11-01T01:21:08+00:00