reef: src/osd/OSDMap.cc: Fix encoder to produce same bytestream#55712
Conversation
Fixes: https://tracker.ceph.com/issues/63389 Signed-off-by: Kamoltat <ksirivad@redhat.com> (cherry picked from commit a3bdffb)
|
make check failed due to https://tracker.ceph.com/issues/62934 |
|
jenkins test make check |
|
We are seeing mon and osd crashes near the code changed in this PR when updating our Ceph cluster from v18.2.1 to v18.2.2. Any chance this change accidentally introduced a regression there? |
|
The following steps appear to allow working around the issue:
|
|
@kamoltat: could you please take a look as well? |
|
Hmm, I read the code around the failed assertion (https://github.com/rzarzynski/ceph/blob/e9880fefda543d9d785dba89fe90c5e5074bb62a/src/osd/OSDMap.cc#L3242):

```cpp
void OSDMap::encode(ceph::buffer::list& bl, uint64_t features) const
{
  // ...
  {
    // NOTE: any new encoding dependencies must be reflected by
    // SIGNIFICANT_FEATURES
    uint8_t v = 10;
    if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
      v = 3;
    } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
      v = 6;
    } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
      v = 7;
    } else if (!HAVE_FEATURE(features, SERVER_REEF)) {
      v = 9;
    }
    // ...
    if (v >= 4) {
      encode(pg_upmap, bl);
      encode(pg_upmap_items, bl);
    } else {
      ceph_assert(pg_upmap.empty());
      ceph_assert(pg_upmap_items.empty());
    }
```

My understanding is the encoder was encoding for some very, very old decoder. At the moment I don't see any obvious relationship with the commit. |
|
Sorry, I don't think I have any logs of the troubled situation. I have since applied the workaround and would need to recreate the broken state first to fetch logs. I expect the issue to be reproducible by spinning up a ceph cluster with 18.2.1, turning the balancer on, running operations that cause upmap entries to be generated and then attempting to upgrade the cluster to 18.2.2. You will then witness the 18.2.2 osds and mons to enter a crash loop while the upgrade is being rolled out. Not sure if the crash loop would stop once all nodes have been upgraded as the rollout got stuck half-way on our instance due to the cluster becoming unhealthy. |
|
I think you might be looking at the wrong assertion. |
The change in this PR is what I mean. I guess that while decoding, the feature bitmask is received from the sender, and when the sender is running 18.2.1 it does not have that bit set. Thus 18.2.2 interprets 18.2.1 data as non-Reef and enters the assertion during decoding. |
|
So this basically means that incremental rollouts of upgrades from pre-18.2.2 Reef versions to 18.2.2 cause newly spawned services to enter a crash loop until all cluster OSDs and mons have been upgraded. I recommend posting a notice that such upgrades are not seamless, together with the above workaround. |
|
Yes, you're right! 18.2.0 and 18.2.1 lack that feature bit. However, I don't see the link with:

```cpp
if (v >= 4) {
  encode(pg_upmap, bl);
  encode(pg_upmap_items, bl);
} else {
  ceph_assert(pg_upmap.empty());
  ceph_assert(pg_upmap_items.empty());
}
```

Have you observed any issue with just the upmapped primaries removed? I guess neither erasing the upmap items nor disabling the balancer is needed:
Impossible to disagree 👍. |
|
@DaDummy: just to summarize my perception of the conditions needed to run into the problem:
|
|
Hi @DaDummy, there is a tracker for this issue: https://tracker.ceph.com/issues/61948 Can you please update the tracker when you can with exact steps to reproduce the issue? I understand you:
Please let me know if I am missing any steps in this reproducer. |
|
(sorry I posted this with the wrong account before)
Oh, I see. Yeah, the workaround I posted might involve more steps than actually necessary. I encountered the error, found some suggestions online that seemed to make sense, and after verifying that they looked sane just went with them. To clearly answer the question: no, I have not tried removing only the primaries while leaving the items in place. Though I agree that the implementation looks like that should work.
These steps should reproduce the issue, though I have not verified that, as I don't have a free test cluster available atm:
Based on the conversation here, I expect that the crashes would stop once all Ceph services are upgraded to 18.2.2. |
|
Thanks for looking into this, @DaDummy!
My understanding is there shouldn't be a decoder (in a service or in a client) for which the 18.2.2 encoder would generate a pre-Reef bytestream if somebody wants to use it. This stance isn't particularly useful on its own, so let's try to clarify. With a bunch of extra assumptions to be checked, my thinking goes along these lines:
I will be refining this further. |
|
Looks like the problem may reemerge later even with only v18.2.2 services present. I just observed a mon starting to crash again while I was draining a different node that was also running a mon. Downgrading the mons back to 18.2.1 stopped the crashes. Curiously, there are no upmap primaries listed in the output. Given that 18.2.1 was and is running without issue, I am tempted to stick to that release for now. |
|
I bet that among […]
while […] It should never connect to the cluster per the […]. Sticking to v18.2.1 while not using […] |
backport tracker: https://tracker.ceph.com/issues/64406
backport of #55401
parent tracker: https://tracker.ceph.com/issues/63389
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh