Bug #70715

balance_reads - FAILED ceph_assert(!get_acting_recovery_backfill().empty())

Added by Laura Flores 12 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
High
Category:
-
Target version:
-
% Done:

0%

Source:
Backport:
tentacle
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v20.3.0-949-gedf8b22bfb
Released In:
Upkeep Timestamp:
2025-12-12T01:00:26+00:00

Description

/a/yuriw-2025-03-27_15:03:25-rados-wip-yuri7-testing-2025-03-26-1605-distro-default-smithi/8213334

2025-03-27T17:59:24.561 INFO:tasks.ceph.osd.4.smithi007.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.0.0-826-g9ac08274/rpm/el9/BUILD/ceph-20.0.0-826-g9ac08274/src/osd/PrimaryLogPG.cc: In function 'bool PrimaryLogPG::is_degraded_or_backfilling_object(const hobject_t&)' thread 2e200640 time 2025-03-27T17:59:24.557104+0000
2025-03-27T17:59:24.561 INFO:tasks.ceph.osd.4.smithi007.stderr:/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.0.0-826-g9ac08274/rpm/el9/BUILD/ceph-20.0.0-826-g9ac08274/src/osd/PrimaryLogPG.cc: 664: FAILED ceph_assert(!get_acting_recovery_backfill().empty())
2025-03-27T17:59:24.623 INFO:tasks.ceph.osd.4.smithi007.stderr: ceph version 20.0.0-826-g9ac08274 (9ac0827416113553ae7c81a83803a0162488915a) tentacle (dev)
2025-03-27T17:59:24.623 INFO:tasks.ceph.osd.4.smithi007.stderr: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x11e) [0x52d152]
2025-03-27T17:59:24.623 INFO:tasks.ceph.osd.4.smithi007.stderr: 2: ceph-osd(+0x3bb78b) [0x4c378b]
2025-03-27T17:59:24.623 INFO:tasks.ceph.osd.4.smithi007.stderr: 3: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3ea7) [0x7b0927]
2025-03-27T17:59:24.623 INFO:tasks.ceph.osd.4.smithi007.stderr: 4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x197) [0x6ebc47]
2025-03-27T17:59:24.623 INFO:tasks.ceph.osd.4.smithi007.stderr: 5: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x69) [0x94b019]
2025-03-27T17:59:24.623 INFO:tasks.ceph.osd.4.smithi007.stderr: 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xccf) [0x7037bf]
2025-03-27T17:59:24.624 INFO:tasks.ceph.osd.4.smithi007.stderr: 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x2a9) [0xc37989]
2025-03-27T17:59:24.624 INFO:tasks.ceph.osd.4.smithi007.stderr: 8: ceph-osd(+0xb2ff44) [0xc37f44]
2025-03-27T17:59:24.624 INFO:tasks.ceph.osd.4.smithi007.stderr: 9: /lib64/libc.so.6(+0x8a3b2) [0x55bf3b2]
2025-03-27T17:59:24.624 INFO:tasks.ceph.osd.4.smithi007.stderr: 10: clone()


Related issues (2 total: 0 open, 2 closed)

Copied to crimson - Bug #71007: Copy of: balance_reads - FAILED ceph_assert(!get_acting_recovery_backfill().empty()) (Resolved, Matan Breizman)

Copied to RADOS - Backport #72119: tentacle: balance_reads - FAILED ceph_assert(!get_acting_recovery_backfill().empty()) (Resolved, Nitzan Mordechai)
Actions #1

Updated by Laura Flores 12 months ago

Also seen in /a/yuriw-2025-03-21_20:26:29-rados-wip-yuri7-testing-2025-03-21-0821-distro-default-smithi/8201893

Actions #2

Updated by Radoslaw Zarzynski 12 months ago

bool PrimaryLogPG::is_degraded_or_backfilling_object(const hobject_t& soid)
{
  /* The conditions below may clear (on_local_recover, before we queue
   * the transaction) before we actually requeue the degraded waiters
   * in on_global_recover after the transaction completes.
   */
  if (waiting_for_degraded_object.count(soid))
    return true;
  if (recovery_state.get_pg_log().get_missing().get_items().count(soid))
    return true;
  ceph_assert(!get_acting_recovery_backfill().empty());
  // ...
}

void PrimaryLogPG::do_op(OpRequestRef& op)
{
// ...
  if (write_ordered) {
    // degraded object?
    if (is_degraded_or_backfilling_object(head)) {
      if (can_backoff && g_conf()->osd_backoff_on_degraded) {
        add_backoff(session, head, head);
        maybe_kick_recovery(head);
      } else {
        wait_for_degraded_object(head, op);
      }
      return;
    }
    // ...
  }
  // ...
  if (r == -EAGAIN) {
    // If we're not the primary of this OSD, we just return -EAGAIN. Otherwise,
    // we have to wait for the object.
    if (is_primary()) {
      // missing the specific snap we need; requeue and wait.
      ceph_assert(!op->may_write()); // only happens on a read/cache
      wait_for_unreadable_object(missing_oid, op);
      return;
    }
  } else if (r == 0) {
    if (is_unreadable_object(obc->obs.oi.soid)) {
      dout(10) << __func__ << ": clone " << obc->obs.oi.soid
               << " is unreadable, waiting" << dendl;
      wait_for_unreadable_object(obc->obs.oi.soid, op);
      return;
    }

    // degraded object?  (the check above was for head; this could be a clone)
    if (write_ordered &&
        obc->obs.oi.soid.snap != CEPH_NOSNAP &&
        is_degraded_or_backfilling_object(obc->obs.oi.soid)) {
      dout(10) << __func__ << ": clone " << obc->obs.oi.soid
               << " is degraded, waiting" << dendl;
      wait_for_degraded_object(obc->obs.oi.soid, op);
      return;
    }
  }
  // ...
}
Actions #3

Updated by Shraddha Agrawal 12 months ago

/a/skanta-2025-03-23_11:08:44-rados-wip-bharath15-testing-2025-03-22-1514-distro-default-smithi/8204187

Actions #4

Updated by Nitzan Mordechai 12 months ago

  • Assignee set to Nitzan Mordechai
Actions #5

Updated by Laura Flores 12 months ago

I suspect this might have come from when we linked the read-affinity folder to rados/verify: https://github.com/ceph/ceph/pull/62439/files#diff-af2b2860aca81540587c23e9dd2f3661be3d2d98c85b8cb44c0c37733b32f3fa

So there is no obvious link to a source code change causing this crash, but I think this is why we hadn't seen it in teuthology before.

I noticed that all the failures have a similar test description (note `read-affinity/balance`):

rados/verify/{centos_latest ceph clusters/fixed-4 d-thrash/none mon_election/connectivity msgr-failures/few msgr/async-v1only objectstore/{bluestore/{alloc$/{stupid} base compr$/{no$/{no}} mem$/{normal-1} write$/{write_v1}}} rados read-affinity/balance tasks/rados_api_tests validater/valgrind}

Actions #6

Updated by Laura Flores 12 months ago

Attaching https://github.com/ceph/ceph/pull/56180/ which might be relevant. This is where "rados replica read policy" was introduced.

Actions #7

Updated by Jaya Prakash 12 months ago

/a/yuriw-2025-03-21_20:27:12-rados-wip-yuri2-testing-2025-03-21-0820-distro-default-smithi/8201982

Actions #8

Updated by Nitzan Mordechai 12 months ago

  • Status changed from New to In Progress
Actions #9

Updated by Nitzan Mordechai 12 months ago

I was able to reproduce it locally by setting "rados replica read policy = balance" in a vstart cluster, then running:

ceph_test_rados --max-ops 3 --objects 2 --max-in-flight 1 --size 4000000 --min-stride-size 400000 --max-stride-size 800000 --max-seconds 0 --op write 50 --pool unique_pool_0

I had 2 OSDs; osd.1 failed with:

    -1> 2025-04-10T17:35:24.194+0000 7f929063b640 -1 /home/nmordech/ceph/src/osd/PrimaryLogPG.cc: In function 'bool PrimaryLogPG::is_degraded_or_backfilling_object(const hobject_t &)' thread 7f929063b640 time 2025-04-10T17:35:24.189995+0000
/home/nmordech/ceph/src/osd/PrimaryLogPG.cc: 667: FAILED ceph_assert(!get_acting_recovery_backfill().empty())

 ceph version 20.0.0-1296-g7c544455292 (7c544455292833b341010bd4b0f4bfcfb3ce8d65) tentacle (dev)
 1: (PrimaryLogPG::is_degraded_or_backfilling_object(hobject_t const&)+0x67f) [0xc3c1bf]
 2: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x19da) [0xc4db6a]
 3: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xb6e) [0xc4bfce]
 4: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0xb5d687]
 5: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x58) [0xdd7018]
 6: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x2fbf) [0xb69e5f]
 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x49a) [0x11fd1aa]
 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x11feb60]
 9: (Thread::entry_wrapper()+0x95) [0x11efc85]
 10: /lib64/libc.so.6(+0x8a002) [0x7f92b768a002]
 11: /lib64/libc.so.6(+0x10f070) [0x7f92b770f070]

Just before the assert, we are trying to read object folio033153681-1:

   -15> 2025-04-10T17:35:24.189+0000 7f929063b640 10 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] _handle_message: osd_op(client.4139.0:4 2.f 2.5e5ef>
   -14> 2025-04-10T17:35:24.189+0000 7f929063b640 20 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] do_op: op osd_op(client.4139.0:4 2.f 2:f7df7a7a:::f>
   -13> 2025-04-10T17:35:24.189+0000 7f929063b640 20 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] op_has_sufficient_caps session=0x5f25dc0 pool=2 (un>
   -12> 2025-04-10T17:35:24.189+0000 7f929063b640 10 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] do_op osd_op(client.4139.0:4 2.f 2:f7df7a7a:::folio>
   -11> 2025-04-10T17:35:24.189+0000 7f929063b640 10 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] is_degraded_or_backfilling_object 2:f7df7a7a:::foli>
    -1> 2025-04-10T17:35:24.194+0000 7f929063b640 -1 /home/nmordech/ceph/src/osd/PrimaryLogPG.cc: In function 'bool PrimaryLogPG::is_degraded_or_backfilling_object(const hobject_t &)' thread 7f929063b640 time 2025-04-10T17:35:24.189995+0000
     0> 2025-04-10T17:35:24.200+0000 7f929063b640 -1 *** Caught signal (Aborted) **

I grepped for "folio033153681-1" across the osd.1 log:

2025-04-10T17:35:24.183+0000 7f9294643640 10 osd.1 pg_epoch: 14 pg[2.f( empty local-lis/les=11/12 n=0 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 crt=0'0 active mbc={}] do_repop 2:f7df7a7a:::folio033153681-1:head v 13'1 (transaction) 208
2025-04-10T17:35:24.183+0000 7f9294643640 10 osd.1 pg_epoch: 14 pg[2.f( empty local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 crt=0'0 active mbc={}] PeeringState::append_log append_log log((0'0,0'0], crt=0'0) [13'1 (0'0) modify   2:f7df7a7a:::fol>
2025-04-10T17:35:24.183+0000 7f9294643640 10 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=0'0 lcod 0'0 active mbc={}] PeeringState::add_log_entry add_log_entry 13'1 (0'0) modify >
2025-04-10T17:35:24.183+0000 7f9294643640 20 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] rollforward: entry=13'1 (0'0) modify   2:f7df7a7a:::folio03>
2025-04-10T17:35:24.183+0000 7f9294643640 20 bluestore(/home/nmordech/ceph/build/dev/osd1).collection(2.f_head 0x5ecefc0) get_onode oid #2:f7df7a7a:::folio033153681-1:head# key 0x7F8000000000000002F7DF'zz!folio033153681-1!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F
2025-04-10T17:35:24.183+0000 7f9294643640 20 bluestore.OnodeSpace(0x5ecf100 in 0x495e000) add_onode #2:f7df7a7a:::folio033153681-1:head# 0x6072300
2025-04-10T17:35:24.183+0000 7f9294643640 20 _add 0x495e000 #2:f7df7a7a:::folio033153681-1:head# added, num=2
2025-04-10T17:35:24.183+0000 7f9294643640 15 bluestore(/home/nmordech/ceph/build/dev/osd1) _touch 2.f_head #2:f7df7a7a:::folio033153681-1:head#
2025-04-10T17:35:24.183+0000 7f9294643640 10 bluestore(/home/nmordech/ceph/build/dev/osd1) _touch 2.f_head #2:f7df7a7a:::folio033153681-1:head# = 0
2025-04-10T17:35:24.183+0000 7f9294643640 15 bluestore(/home/nmordech/ceph/build/dev/osd1) _setattrs 2.f_head #2:f7df7a7a:::folio033153681-1:head# 2 keys
2025-04-10T17:35:24.183+0000 7f9294643640 10 bluestore(/home/nmordech/ceph/build/dev/osd1) _setattrs 2.f_head #2:f7df7a7a:::folio033153681-1:head# 2 keys = 0
2025-04-10T17:35:24.183+0000 7f9294643640 15 bluestore(/home/nmordech/ceph/build/dev/osd1) _write 2.f_head #2:f7df7a7a:::folio033153681-1:head# 0x65c27~74071
2025-04-10T17:35:24.183+0000 7f9294643640 20 bluestore(/home/nmordech/ceph/build/dev/osd1) _do_write #2:f7df7a7a:::folio033153681-1:head# 0x65c27~74071 - have 0x0 (0) bytes fadvise_flags 0x20 alloc_hint 0x0 expected_object_size 0 expected_write_size 0
2025-04-10T17:35:24.183+0000 7f9294643640 10 bluestore(/home/nmordech/ceph/build/dev/osd1) _write 2.f_head #2:f7df7a7a:::folio033153681-1:head# 0x65c27~74071 = 0
2025-04-10T17:35:24.183+0000 7f9294643640 20 bluestore.extentmap(0x6072488) update #2:f7df7a7a:::folio033153681-1:head#
2025-04-10T17:35:24.183+0000 7f9294643640 20 bluestore(/home/nmordech/ceph/build/dev/osd1) _record_onode onode #2:f7df7a7a:::folio033153681-1:head# is 1071 (357 bytes onode + 2 bytes spanning blobs + 712 bytes inline extents)
2025-04-10T17:35:24.183+0000 7f9294643640 10 osd.1 14 dequeue_op osd_repop(client.4139.0:1 2.f e13/11 2:f7df7a7a:::folio033153681-1:head v 13'1, pct=0'0) v3 finish
2025-04-10T17:35:24.184+0000 7f9294643640 10 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] repop_commit on op osd_repop(client.4139.0:1 2.f e13/11 2:f>
2025-04-10T17:35:24.184+0000 7f92a265f640 20 maybe_unpin 0x495e000 #2:f7df7a7a:::folio033153681-1:head# unpinned
2025-04-10T17:35:24.189+0000 7f929063b640 20 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] do_op: op osd_op(client.4139.0:4 2.f 2:f7df7a7a:::folio0331>
2025-04-10T17:35:24.189+0000 7f929063b640 10 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] do_op osd_op(client.4139.0:4 2.f 2:f7df7a7a:::folio03315368>
2025-04-10T17:35:24.189+0000 7f929063b640 10 osd.1 pg_epoch: 14 pg[2.f( v 13'1 (0'0,13'1] local-lis/les=11/12 n=1 ec=11/11 lis/c=11/11 les/c/f=12/12/0 sis=11) [0,1] r=1 lpr=11 pct=0'0 lua=0'0 crt=13'1 lcod 0'0 active mbc={}] is_degraded_or_backfilling_object 2:f7df7a7a:::folio0331536>

It doesn't look like the replica write completed on osd.1; osd.0's logs show the object.

The client output:

out/client.admin.3153681.log:2025-04-10T17:35:24.180+0000 7f7834811a80 20 client.4139.objecter _calc_target epoch 13 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_write
out/client.admin.3153681.log:2025-04-10T17:35:24.181+0000 7f7834811a80 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:24.181+0000 7f7834811a80 10 client.4139.objecter _op_submit oid folio033153681-1 '@2' '@2' [write 416807~475249 in=475249b,stat] tid 1 osd.0
out/client.admin.3153681.log:2025-04-10T17:35:24.184+0000 7f7832d82640 10 client.4139.objecter ms_dispatch2 0xf91990 osd_op_reply(1 folio033153681-1 [write 416807~475249,stat] v13'1 uv1 ondisk = 0)
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7834811a80 20 client.4139.objecter _calc_target epoch 13 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_write
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7834811a80 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7834811a80 10 client.4139.objecter _op_submit oid folio033153681-1 '@2' '@2' [write 1542129~474678 in=474678b] tid 2 osd.0
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7834811a80 20 client.4139.objecter _calc_target epoch 13 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_write
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7834811a80 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7834811a80 10 client.4139.objecter _op_submit oid folio033153681-1 '@2' '@2' [setxattr _header (55) in=62b,truncate 2016807] tid 3 osd.0
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7834811a80 20 client.4139.objecter _calc_target epoch 13 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_read
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7834811a80 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7834811a80 10 client.4139.objecter _op_submit oid folio033153681-1 '@2' '@2' [read 0~1] tid 4 osd.1
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 14 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_write
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 14 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_write
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 14 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_read
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7822ffd640 20 client.4139.objecter 2        2.5e5efbef      osd.0   folio033153681-1        [write 1542129~474678 in=474678b]
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7822ffd640 20 client.4139.objecter 3        2.5e5efbef      osd.0   folio033153681-1        [setxattr _header (55) in=62b,truncate 2016807]
out/client.admin.3153681.log:2025-04-10T17:35:24.188+0000 7f7822ffd640 20 client.4139.objecter 4        2.5e5efbef      osd.1   folio033153681-1        [read 0~1]
out/client.admin.3153681.log:2025-04-10T17:35:24.189+0000 7f7822ffd640 20 client.4139.objecter 2        2.5e5efbef      osd.0   folio033153681-1        [write 1542129~474678 in=474678b]
out/client.admin.3153681.log:2025-04-10T17:35:24.189+0000 7f7822ffd640 20 client.4139.objecter 3        2.5e5efbef      osd.0   folio033153681-1        [setxattr _header (55) in=62b,truncate 2016807]
out/client.admin.3153681.log:2025-04-10T17:35:24.189+0000 7f7822ffd640 20 client.4139.objecter 4        2.5e5efbef      osd.1   folio033153681-1        [read 0~1]
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 15 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_write
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 15 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_write
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 15 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_read
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 15 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_read
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter 2        2.5e5efbef      osd.0   folio033153681-1        [write 1542129~474678 in=474678b]
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter 3        2.5e5efbef      osd.0   folio033153681-1        [setxattr _header (55) in=62b,truncate 2016807]
out/client.admin.3153681.log:2025-04-10T17:35:27.184+0000 7f7822ffd640 20 client.4139.objecter 4        2.5e5efbef      osd.0   folio033153681-1        [read 0~1]
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 16 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_write
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 16 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_write
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7822ffd640 20 client.4139.objecter _calc_target epoch 16 base folio033153681-1 @2 precalc_pgid 0 pgid 0.0 is_read
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7822ffd640 20 client.4139.objecter _calc_target target folio033153681-1 @2 -> pgid 2.5e5efbef
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7822ffd640 20 client.4139.objecter 2        2.5e5efbef      osd.0   folio033153681-1        [write 1542129~474678 in=474678b]
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7822ffd640 20 client.4139.objecter 3        2.5e5efbef      osd.0   folio033153681-1        [setxattr _header (55) in=62b,truncate 2016807]
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7822ffd640 20 client.4139.objecter 4        2.5e5efbef      osd.0   folio033153681-1        [read 0~1]
out/client.admin.3153681.log:2025-04-10T17:35:28.190+0000 7f7832d82640 10 client.4139.objecter ms_dispatch2 0xf91990 osd_op_reply(2 folio033153681-1 [write 1542129~474678] v14'2 uv2 ondisk = 0)
out/client.admin.3153681.log:2025-04-10T17:35:28.191+0000 7f7832d82640 10 client.4139.objecter ms_dispatch2 0xf91990 osd_op_reply(3 folio033153681-1 [setxattr (55),truncate 2016807] v14'3 uv3 ondisk = 0)
out/client.admin.3153681.log:2025-04-10T17:35:28.191+0000 7f7832d82640 10 client.4139.objecter ms_dispatch2 0xf91990 osd_op_reply(4 folio033153681-1 [read 0~1 out=1b] v0'0 uv3 ondisk = 0)

Actions #10

Updated by Neha Ojha 12 months ago

Looking at the logs from /a/yuriw-2025-03-27_15:03:25-rados-wip-yuri7-testing-2025-03-26-1605-distro-default-smithi/8213334, it looks like the write did happen:

2025-03-27T17:59:24.499+0000 2e200640 10 osd.4 pg_epoch: 44 pg[17.13( empty local-lis/les=40/42 n=0 ec=40/40 lis/c=40/40 les/c/f=42/43/0 sis=40) [0,4] r=1 lpr=40 crt=0'0 active mbc={}] do_repop 17:c90e42af:test-rados-api-smithi162-43481-9::testobj:head v 44'1 (transaction) 215
2025-03-27T17:59:24.499+0000 2e200640 10 osd.4 pg_epoch: 44 pg[17.13( empty local-lis/les=40/42 n=1 ec=40/40 lis/c=40/40 les/c/f=42/43/0 sis=40) [0,4] r=1 lpr=40 crt=0'0 active mbc={}] PeeringState::append_log append_log log((0'0,0'0], crt=0'0) [44'1 (0'0) modify   17:c90e42af:test-rados-api-smithi162-43481-9::testobj:head by client.4812.0:89 2025-03-27T17:59:24.485935+0000 0 ObjectCleanRegions clean_offsets: [8~18446744073709551607], clean_omap: true, new_object: false]
2025-03-27T17:59:24.499+0000 2e200640 10 osd.4 pg_epoch: 44 pg[17.13( v 44'1 (0'0,44'1] local-lis/les=40/42 n=1 ec=40/40 lis/c=40/40 les/c/f=42/43/0 sis=40) [0,4] r=1 lpr=40 pct=0'0 lua=0'0 crt=0'0 lcod 0'0 active mbc={}] PeeringState::add_log_entry add_log_entry 44'1 (0'0) modify   17:c90e42af:test-rados-api-smithi162-43481-9::testobj:head by client.4812.0:89 2025-03-27T17:59:24.485935+0000 0 ObjectCleanRegions clean_offsets: [8~18446744073709551607], clean_omap: true, new_object: false
2025-03-27T17:59:24.499+0000 2e200640 20 osd.4 pg_epoch: 44 pg[17.13( v 44'1 (0'0,44'1] local-lis/les=40/42 n=1 ec=40/40 lis/c=40/40 les/c/f=42/43/0 sis=40) [0,4] r=1 lpr=40 pct=0'0 lua=0'0 crt=44'1 lcod 0'0 active mbc={}] rollforward: entry=44'1 (0'0) modify   17:c90e42af:test-rados-api-smithi162-43481-9::testobj:head by client.4812.0:89 2025-03-27T17:59:24.485935+0000 0 ObjectCleanRegions clean_offsets: [8~18446744073709551607], clean_omap: true, new_object: false
2025-03-27T17:59:24.502+0000 2e200640 10 osd.4 44 dequeue_op osd_repop(client.4812.0:89 17.13 e44/40 17:c90e42af:test-rados-api-smithi162-43481-9::testobj:head v 44'1, pct=0'0) v3 finish
2025-03-27T17:59:24.506+0000 2a1f8640 10 osd.4 pg_epoch: 44 pg[17.13( v 44'1 (0'0,44'1] local-lis/les=40/42 n=1 ec=40/40 lis/c=40/40 les/c/f=42/43/0 sis=40) [0,4] r=1 lpr=40 pct=0'0 lua=0'0 crt=44'1 lcod 0'0 active mbc={}] repop_commit on op osd_repop(client.4812.0:89 17.13 e44/40 17:c90e42af:test-rados-api-smithi162-43481-9::testobj:head v 44'1, pct=0'0), sending commit to osd.0
...
2025-03-27T17:59:24.555+0000 2e200640 10 osd.4 pg_epoch: 44 pg[17.13( v 44'1 (0'0,44'1] local-lis/les=40/42 n=1 ec=40/40 lis/c=40/40 les/c/f=42/43/0 sis=40) [0,4] r=1 lpr=40 pct=0'0 lua=0'0 crt=44'1 lcod 0'0 active mbc={}] do_op osd_op(client.4812.0:90 17.13 17:c90e42af:test-rados-api-smithi162-43481-9::testobj:head [read 0~8 [fadvise_dontneed]] snapc 0=[] ondisk+read+balance_reads+rwordered+known_if_redirected+supports_pool_eio e44) may_read -> write-ordered flags ondisk+read+balance_reads+rwordered+known_if_redirected+supports_pool_eio
Actions #11

Updated by Neha Ojha 12 months ago

  • Subject changed from FAILED ceph_assert(!get_acting_recovery_backfill().empty()) to balance_reads - FAILED ceph_assert(!get_acting_recovery_backfill().empty())
Actions #12

Updated by Samuel Just 12 months ago · Edited

void PrimaryLogPG::do_op(OpRequestRef& op)
...
  if ((m->get_flags() & (CEPH_OSD_FLAG_BALANCE_READS |
             CEPH_OSD_FLAG_LOCALIZE_READS)) &&
      op->may_read() &&
      !(op->may_write() || op->may_cache())) {
    // balanced reads; any replica will do
    if (!(is_primary() || is_nonprimary())) {
      osd->handle_misdirected_op(this, op);
      return;
    }
  } else {
    // normal case; must be primary
    if (!is_primary()) {
      osd->handle_misdirected_op(this, op);
      return;
    }
  }
...
  // order this op as a write?
  bool write_ordered = op->rwordered();
...
  if (write_ordered) {
    // degraded object?
    if (is_degraded_or_backfilling_object(head)) {
...

I looked at the osd logs from the first case in the description -- the write in question happens to be both rwordered and balanced:

2025-03-27T17:59:24.555+0000 2e200640 10 osd.4 pg_epoch: 44 pg[17.13( v 44'1 (0'0,44'1] local-lis/les=40/42 n=1 ec=40/40 lis/c=40/40 les/c/f=42/43/0 sis=40) [0,4] r=1 lpr=40 pct=0'0 lua=0'0 crt=44'1 lcod 0'0 active mbc={}] do_op osd_op(client.4812.0:90 17.13 17:c90e42af:test-rados-api-smithi162-43481-9::testobj:head [read 0~8 [fadvise_dontneed]] snapc 0=[] ondisk+read+balance_reads+rwordered+known_if_redirected+supports_pool_eio e44) may_read -> write-ordered flags ondisk+read+balance_reads+rwordered+known_if_redirected+supports_pool_eio

Existing tests for balanced reads do not combine those two flags. The do_op code above implicitly assumes that any rwordered read must be handled on the primary. That makes sense: a replica can't actually enforce that ordering because it doesn't handle writes. However, the code never actually checks for it. Thus, I see at least two necessary fixes:
1. The first check highlighted above in do_op needs to test for rwordered in addition to may_write() and may_cache(), so it can bounce rwordered ops to the primary.
2. The objecter must never send an rwordered read to a replica -- even bouncing it back to the primary would break the ordering.

Actions #13

Updated by Samuel Just 12 months ago · Edited

Or, given that the ordering won't work if the replica bounces the op, it might make more sense to:
1. have the objecter return EINVAL for any op that explicitly combines rwordered and balanced;
2. have a replica return EINVAL if it receives such an op;
3. have the objecter not add the balanced flag from policy when the op is rwordered.

Actions #14

Updated by Kamoltat (Junior) Sirivadhna 11 months ago

/a/skanta-2025-04-09_05:31:19-rados-wip-bharath17-testing-2025-04-08-0602-distro-default-smithi/8233195

Actions #15

Updated by Nitzan Mordechai 11 months ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 62806
Actions #16

Updated by Radoslaw Zarzynski 11 months ago

Need this for Tentacle.

Actions #17

Updated by Laura Flores 11 months ago

/a/lflores-2025-04-11_15:58:32-rados-wip-lflores-testing-2025-04-10-2245-distro-default-smithi/8235791

Actions #18

Updated by Nitzan Mordechai 11 months ago

  • Backport set to Tentacle
Actions #19

Updated by Laura Flores 11 months ago

/a/lflores-2025-04-11_19:10:45-rados-wip-lflores-testing-3-2025-04-11-1140-distro-default-smithi/8236074

Actions #20

Updated by Kamoltat (Junior) Sirivadhna 11 months ago

/a/lflores-2025-04-14_22:30:34-rados-wip-lflores-testing-2025-04-10-2245-distro-default-gibba/8241119

Actions #21

Updated by Laura Flores 11 months ago

Note from bug scrub: Marked as "needs-qa".

Actions #22

Updated by Matan Breizman 11 months ago

  • Copied to Bug #71007: Copy of: balance_reads - FAILED ceph_assert(!get_acting_recovery_backfill().empty()) added
Actions #23

Updated by Sridhar Seshasayee 11 months ago

/a/skanta-2025-04-22_23:21:15-rados-wip-bharath1-testing-2025-04-21-0529-distro-default-smithi/8254491

Actions #24

Updated by Radoslaw Zarzynski 11 months ago

Scrub note: new changes; will re-review.

Actions #25

Updated by Shraddha Agrawal 8 months ago

/a/yuriw-2025-07-10_01:00:46-rados-wip-yuri-testing-2025-07-09-1458-tentacle-distro-default-smithi/8379366

Actions #26

Updated by Laura Flores 8 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from Tentacle to tentacle
Actions #27

Updated by Upkeep Bot 8 months ago

  • Copied to Backport #72119: tentacle: balance_reads - FAILED ceph_assert(!get_acting_recovery_backfill().empty()) added
Actions #28

Updated by Upkeep Bot 8 months ago

  • Tags (freeform) set to backport_processed
Actions #29

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to edf8b22bfb676ea5e5e10280f9c40393505ea1c9
  • Fixed In set to v20.3.0-949-gedf8b22bfb
  • Upkeep Timestamp set to 2025-07-14T20:43:36+00:00
Actions #30

Updated by Laura Flores 8 months ago

/a/yuriw-2025-07-10_23:00:33-rados-wip-yuri5-testing-2025-07-10-0913-distro-default-smithi/8380960

Actions #31

Updated by Upkeep Bot 3 months ago

  • Status changed from Pending Backport to Resolved
  • Upkeep Timestamp changed from 2025-07-14T20:43:36+00:00 to 2025-12-12T01:00:26+00:00
