Bug #61948

Failed assert "pg_upmap_primaries.empty()" in the read balancer

Added by Laura Flores over 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Immediate
Assignee: Laura Flores
Category: -
Target version: -
% Done: 0%
Source:
Backport: reef,squid
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions: v18.2.2
ceph-qa-suite:
Component(RADOS):
Pull request ID: 57776
Tags (freeform):
Fixed In: v19.3.0-3736-gbaaf7c85a2
Released In: v20.2.0~2389
Upkeep Timestamp: 2025-11-01T01:21:08+00:00

Description

Reported by Stefan:

This bug was hit on the RC (v18.1.2).

ceph crash info 2023-07-10T19:02:42.116150Z_8061077a-2257-49c7-8088-c94a5a580f6a
{
     "assert_condition": "pg_upmap_primaries.empty()",
     "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.1.2/rpm/el8/BUILD/ceph-18.1.2/src/osd/OSDMap.cc",
     "assert_func": "void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const",
     "assert_line": 3239,
     "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.1.2/rpm/el8/BUILD/ceph-18.1.2/src/osd/OSDMap.cc: In function 'void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7f95f67cb700 time 2023-07-10T19:02:42.106910+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/18.1.2/rpm/el8/BUILD/ceph-18.1.2/src/osd/OSDMap.cc: 3239: FAILED ceph_assert(pg_upmap_primaries.empty())\n",
     "assert_thread_name": "ms_dispatch",
     "backtrace": [
         "/lib64/libpthread.so.0(+0x12cf0) [0x7f9601c12cf0]",
         "gsignal()",
         "abort()",
         "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f96043f0e15]",
         "/usr/lib64/ceph/libceph-common.so.2(+0x2a9f81) [0x7f96043f0f81]",
         "(OSDMap::encode(ceph::buffer::v15_2_0::list&, unsigned long) const+0x1229) [0x7f96048a4a19]",
         "(OSDMonitor::reencode_full_map(ceph::buffer::v15_2_0::list&, unsigned long)+0xe0) [0x55d71adf3d30]",
         "(OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x326) [0x55d71ae27616]",
         "(OSDMonitor::build_latest_full(unsigned long)+0x306) [0x55d71ae27986]",
         "(OSDMonitor::check_osdmap_sub(Subscription*)+0x7a) [0x55d71ae2b81a]",
         "(Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0xeed) [0x55d71ac4b25d]",
         "(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x7d6) [0x55d71ac5fea6]",
         "(Monitor::_ms_dispatch(Message*)+0x406) [0x55d71ac610a6]",
         "(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x5d) [0x55d71ac91bcd]",
         "(Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x478) [0x7f9604669c08]",
         "(DispatchQueue::entry()+0x50f) [0x7f9604666daf]",
         "(DispatchQueue::DispatchThread::entry()+0x11) [0x7f960472d391]",
         "/lib64/libpthread.so.0(+0x81ca) [0x7f9601c081ca]",
         "clone()" 
     ],
     "ceph_version": "18.1.2",
     "crash_id": "2023-07-10T19:02:42.116150Z_8061077a-2257-49c7-8088-c94a5a580f6a",
     "entity_name": "mon.reef03",
     "os_id": "centos",
     "os_name": "CentOS Stream",
     "os_version": "8",
     "os_version_id": "8",
     "process_name": "ceph-mon",
     "stack_sig": "36e64bb49053b7de85c9cf46c6739fe42dd443b8320310aef2318e06c48ee87e",
     "timestamp": "2023-07-10T19:02:42.116150Z",
     "utsname_hostname": "reef03",
     "utsname_machine": "x86_64",
     "utsname_release": "5.4.0-147-generic",
     "utsname_sysname": "Linux",
     "utsname_version": "#164-Ubuntu SMP Tue Mar 21 14:23:17 UTC 2023" 
}

Error EINVAL: osd.7 is not in acting set for pg 2.23

But osd.7 was in the up and acting sets:

2.23  36  0  0  0  0  150994944  0  0  53  0  53  active+clean  2023-07-10T14:33:28.050243+0000  48'53  76:178  [4,1,7]  4  [4,1,7]  4  0'0  2023-07-10T13:47:36.917296+0000  0'0  2023-07-10T13:47:36.917296+0000  0  0  periodic scrub scheduled @ 2023-07-11T15:01:20.025433+0000  0  0

Details that led to the bug:
- Installed v18.1.2 cephadm
- Deployed a Ceph cluster with cephadm, three nodes (VMs) with 3 OSDs each, which installed Ceph v18.0.0-4795-g2f6e4f7d by default
- Set the min_compat_client to reef
- Hit the crash
- Redeployed the cluster to v18.1.2
- Deleted the pool, recreated it, and repeated the above steps
- Mon daemon crashed again

Affected logs attached.


Files

osd.6.log (90.9 KB) osd.6.log Laura Flores, 07/10/2023 07:39 PM
reef.mon3.log (90.9 KB) reef.mon3.log Laura Flores, 07/10/2023 07:39 PM
om (8.17 KB) om Stefan Kooman, 07/10/2023 07:55 PM
om_old (6.39 KB) om_old Stefan Kooman, 07/10/2023 07:59 PM

Related issues 6 (1 open, 5 closed)

Related to RADOS - Bug #66260: mon, osd, *: require-min-compat-client is not really honored (New) - Radoslaw Zarzynski
Related to RADOS - Bug #66329: mon: {rm-,}pg-upmap-primary should require the feature support also from mons and osds (Resolved) - Radoslaw Zarzynski
Related to RADOS - Bug #66285: osd, mon: decoders of OSDMap use old version for comparison of with struct_compat of DECODE_START (Resolved) - Radoslaw Zarzynski
Has duplicate RADOS - Bug #66274: Online balancing with upmap-read mode performed without setting "min_compat_client" to reef (Duplicate) - Laura Flores
Copied to RADOS - Backport #66298: squid: Failed assert "pg_upmap_primaries.empty()" in the read balancer (Resolved) - Radoslaw Zarzynski
Copied to RADOS - Backport #66299: reef: Failed assert "pg_upmap_primaries.empty()" in the read balancer (Resolved) - Radoslaw Zarzynski
Actions #1

Updated by Laura Flores over 2 years ago

  • Assignee set to Laura Flores

Actions #2

Updated by Stefan Kooman over 2 years ago

Attached are the first osd map I used (om_old) and the osd map used the second time, with version 18.1.2 (om).

Actions #3

Updated by Laura Flores over 2 years ago

Thanks Stefan!

Actions #4

Updated by Laura Flores over 2 years ago

The crash occurred during a step where the osdmap is encoded.
Since the read balancer is a new feature in reef, the encoder is required to be at the most recent version (10). However, this cluster detected an older encoder version (<10), which assumes there are no pg upmap primary entries. Since the osdmap did contain pg upmap primary entries, the assert fired and caused a crash.
https://github.com/ceph/ceph/blob/v18.1.2/src/osd/OSDMap.cc#L3236-L3240

    if (v >= 10) {
      encode(pg_upmap_primaries, bl);
    } else {
      ceph_assert(pg_upmap_primaries.empty());
    }

What remains to figure out is why this cluster detected older features and settled on an older encoder version.

This may be relevant in the mon log:

  -341> 2023-07-10T19:02:16.674+0000 7f9604fb2c80  0 mon.reef03@-1(???).osd e184 crush map has features 3314933000854323200, adjusting msgr requires
  -340> 2023-07-10T19:02:16.674+0000 7f9604fb2c80  0 mon.reef03@-1(???).osd e184 crush map has features 2738472248550883328, adjusting msgr requires
  -339> 2023-07-10T19:02:16.674+0000 7f9604fb2c80  0 mon.reef03@-1(???).osd e184 crush map has features 2738472248550883328, adjusting msgr requires
  -338> 2023-07-10T19:02:16.674+0000 7f9604fb2c80  0 mon.reef03@-1(???).osd e184 crush map has features 2738472248550883328, adjusting msgr requires

Actions #5

Updated by Stefan Kooman over 2 years ago

I have created a new cluster and upgraded it to 18.1.2 (from 18.0.0) before I created any OSDs (and pool/image). I redid the tests. The OSDs asserted again (the same as before). The monitor(s) did hit an assert this time. I had debug logging for the monitors set to 20/20 ... but not for the OSDs. I will try again with debug set to 20/20 for the OSDs. Strangely, ceph-crash did not report that the OSDs had crashed (it did in the other cluster). But that might be another issue / bug...

I'm pretty sure (I checked the timestamps) that the "crush map has features 2738472248550883328, adjusting msgr requires" logging is a result of "ceph osd set-require-min-compat-client reef".

Actions #6

Updated by Stefan Kooman over 2 years ago

I deleted all OSDs and recreated them. I repeated the tests, but could not reproduce. So there must be some condition(s) that trigger it, but I'm not sure what. I hope my observations prove to be helpful.

Actions #7

Updated by Laura Flores over 2 years ago

Hey Stefan,

Thanks for your analysis. This failure indicates that there is a client-server version mismatch (could be a deeper issue, but this is what the failure points to). If you have mon and osd logs with a 20/20 logging level, that would definitely be helpful.

In the meantime, I am working to see if I can reproduce your issue on my end.

Actions #8

Updated by Stefan Kooman over 2 years ago

I recreated yet another cluster and enabled debug logging for osd / mon, but I could not reproduce.

Actions #9

Updated by Laura Flores over 2 years ago

Thanks Stefan. We're continuing to look into it. If you're able to reproduce the issue again, it may help to dump your mon_status, as it contains a feature map. There is some kind of feature incompatibility going on here, so that could be telling.

`ceph tell mon.* mon_status`

Or use this if the mons are not accessible:
`ceph --admin-daemon /var/run/ceph/<cluster id>/ceph-mon.<mon id>.asok mon_status`

Actions #10

Updated by Laura Flores over 2 years ago

Hi Stefan,

Can you tell me whether just one mon crashed when you applied the upmaps, or did they all crash?

Thanks,
Laura

Actions #11

Updated by Stefan Kooman over 2 years ago

> Can you tell me whether just one mon crashed when you applied the upmaps, or did they all crash?

The second cluster only had one monitor crash (I don't have logs anymore of the first cluster).

Actions #12

Updated by Radoslaw Zarzynski over 2 years ago

  • Status changed from New to Need More Info

Need More Info for now, but in the longer term we might close this if reproduction is impossible.

Actions #17

Updated by Laura Flores almost 2 years ago

  • Affected Versions v18.2.2 added
Actions #18

Updated by Laura Flores almost 2 years ago · Edited

Upgrading from 18.2.0 to 18.2.1 does not reproduce the bug.

I modified the upgrade path on this branch:
https://github.com/ceph/ceph/compare/reef...ljflores:ceph:wip-reef-tracker-61948-upgrade-18.2.0-18.2.1?expand=1

Tested here:
https://pulpito.ceph.com/lflores-2024-05-23_06:02:47-upgrade:reef-p2p:reef-p2p-parallel-reef-distro-default-smithi/

This checks out with the telemetry links above, which only track crashes on 18.2.2 clusters.

Likely the first reported instance of this (on the RC) resulted from a bad installation, but the crashes we're seeing now would affect anyone upgrading to 18.2.2.

Actions #19

Updated by Christoffer Anselm almost 2 years ago

These steps should reproduce the issue, though I have not verified that as I don't have a free test cluster available at the moment:

1. Set the min-compat-client to reef on a v18.2.1 cluster
2. Apply some pg-upmap-primary mappings
3. Upgrade some, but not all, osds/mons to v18.2.2
4. Observe (only) the upgraded services crashing

Based on the conversation in https://github.com/ceph/ceph/pull/55712 I expect that the crashes would stop once all ceph services are upgraded to 18.2.2.
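
For concreteness, steps 1 and 2 above could look like this (a sketch; pg 2.23 and osd.4 are placeholders, pick any valid pg/osd pair from your own cluster):

    # On a v18.2.1 cluster:
    ceph osd set-require-min-compat-client reef
    # Map the primary of pg 2.23 to osd.4:
    ceph osd pg-upmap-primary 2.23 4
    # Then upgrade only some of the osds/mons to v18.2.2 and watch the
    # upgraded daemons for the pg_upmap_primaries.empty() assert.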

---
For readers only looking for a way to upgrade their cluster:

If you are not running reef yet, just upgrade directly to 18.2.2 as that should skip the releases causing the issue.

Otherwise the following steps should allow working around the issue:

1. Ensure your ceph cluster is healthy and running ceph reef (e.g. 18.2.1)
2. `ceph osd dump`
3. `ceph osd rm-pg-upmap-primary <each upmap primary id from dump>`
4. wait until cluster finishes backfills and is healthy
5. upgrade to 18.2.2

Just be sure that no new upmap primaries are created until all services have been upgraded, as that will cause the 18.2.2 services to crash when receiving updates from services running the older reef release.
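
To script steps 2 and 3, here is a minimal sketch, assuming `jq` is installed and that the `pg_upmap_primaries`/`pgid` field names below match your release's `ceph osd dump --format json` output:

    # Remove every pg-upmap-primary mapping listed in the osdmap.
    for pgid in $(ceph osd dump --format json | jq -r '.pg_upmap_primaries[].pgid'); do
        ceph osd rm-pg-upmap-primary "$pgid"
    done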

Actions #20

Updated by Laura Flores almost 2 years ago

  • Status changed from Need More Info to In Progress
Actions #21

Updated by Radoslaw Zarzynski almost 2 years ago · Edited

Laura Flores wrote in #note-13:

> Another occurrence here: https://github.com/ceph/ceph/pull/55712#issuecomment-2112585048

I have mixed feelings towards reusing this tracker. Although the underlying mechanism is the same, there is a huge difference in reproduction rate between the original report and the recent one.

Before 18.2.2 it was possible to hit this assertion only if a really ancient peer (typically a client) bumped v down below 10:

void OSDMap::encode(ceph::buffer::list& bl, uint64_t features) const
{
  // ...
  // meta-encoding: how we include client-used and osd-specific data
  ENCODE_START(8, 7, bl);

  {
    // NOTE: any new encoding dependencies must be reflected by
    // SIGNIFICANT_FEATURES
    uint8_t v = 10;
    if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
      v = 3;
    } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
      v = 6;
    } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
      v = 7;
    } /* else if (!HAVE_FEATURE(features, SERVER_REEF)) {
      v = 9;
    } */

    // ...

    if (v >= 10) {
      encode(pg_upmap_primaries, bl);
    } else {
      ceph_assert(pg_upmap_primaries.empty());
    }
    ENCODE_FINISH(bl); // client-usable data
  }

Since 18.2.2 (which has the fix for https://tracker.ceph.com/issues/63389), the same assert-enriched code path in the encoder can also be taken for pre-Reef decoders.

However, even on 18.2.0 and 18.2.1 it's possible to run into the situation the assert was intended for – having clients that don't understand the feature in a cluster with pg-upmap-primary mappings. That is because of https://tracker.ceph.com/issues/66260.

Still, all these problems require that Reef's pg-upmap-primary feature has been used in the cluster. The read balancer doesn't use it currently (this will change in Squid).

Actions #22

Updated by Radoslaw Zarzynski almost 2 years ago

  • Priority changed from High to Immediate
  • Severity changed from 3 - minor to 2 - major
Actions #23

Updated by Radoslaw Zarzynski almost 2 years ago · Edited

The RCA candidate

The presence of a decoder (in a service or in a client) for which reef's OSDMap encoder would generate a pre-reef (on 18.2.2) or pre-nautilus (on 18.2.{0, 1}) bytestream whenever there is at least one pg-upmap-primary mapping.

See:

UPDATE: one extra note on the "a decoder for which the encoder would generate a pre-<version> bytestream" wording. As SERVER_REEF is missing from OSDMap::SIGNIFICANT_FEATURES on reef.{0, 1}, encoders see those decoders as pre-reef.

Actions #24

Updated by Ilya Dryomov almost 2 years ago

  • Related to Bug #66260: mon, osd, *: require-min-compat-client is not really honored added
Actions #25

Updated by Laura Flores almost 2 years ago

  • Backport set to reef,squid
Actions #26

Updated by Radoslaw Zarzynski almost 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 57776
Actions #27

Updated by Radoslaw Zarzynski almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
Actions #28

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66298: squid: Failed assert "pg_upmap_primaries.empty()" in the read balancer added
Actions #29

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66299: reef: Failed assert "pg_upmap_primaries.empty()" in the read balancer added
Actions #31

Updated by Laura Flores almost 2 years ago

  • Related to Bug #66274: Online balancing with upmap-read mode performed without setting "min_compat_client" to reef added
Actions #32

Updated by Laura Flores almost 2 years ago · Edited

Adding a summary of the bug from the mailing list here:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GUQCIRZRMGQ3JOXS2PYZL7EPO3ZMYV6R/

For users looking to remove existing pg-upmap-primary mappings, you can do so by:
$ `ceph osd dump`
For each pg_upmap_primary entry in the above output:
$ `ceph osd rm-pg-upmap-primary <pgid>`

You may alternatively run this script to remove pg-upmap-primary mappings:
https://raw.githubusercontent.com/ljflores/ceph_read_balancer_2023/main/remove_pg_upmap_primaries.sh
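
Before upgrading, it may also be worth verifying that no mappings remain. A minimal check, assuming `jq` and the same `pg_upmap_primaries` field name as in the osdmap JSON:

    # Should print 0 once all pg-upmap-primary mappings have been removed.
    ceph osd dump --format json | jq '.pg_upmap_primaries | length'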

Actions #33

Updated by Laura Flores almost 2 years ago

  • Related to Bug #66329: mon: {rm-,}pg-upmap-primary should require the feature support also from mons and osds added
Actions #34

Updated by Laura Flores almost 2 years ago

  • Related to Bug #66285: osd, mon: decoders of OSDMap use old version for comparison of with struct_compat of DECODE_START added
Actions #35

Updated by Laura Flores over 1 year ago

  • Related to deleted (Bug #66274: Online balancing with upmap-read mode performed without setting "min_compat_client" to reef)
Actions #36

Updated by Laura Flores over 1 year ago

  • Has duplicate Bug #66274: Online balancing with upmap-read mode performed without setting "min_compat_client" to reef added
Actions #37

Updated by Laura Flores over 1 year ago

  • Status changed from Pending Backport to Resolved
Actions #38

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to baaf7c85a2c3cb725a6c34cef8760fa73e00385d
  • Fixed In set to v19.3.0-3736-gbaaf7c85a2c
  • Upkeep Timestamp set to 2025-07-12T04:44:37+00:00
Actions #39

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-3736-gbaaf7c85a2c to v19.3.0-3736-gbaaf7c85a2
  • Upkeep Timestamp changed from 2025-07-12T04:44:37+00:00 to 2025-07-14T23:40:07+00:00
Actions #40

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2389
  • Upkeep Timestamp changed from 2025-07-14T23:40:07+00:00 to 2025-11-01T01:21:08+00:00