
Bug #70939

closed

crimson: ceph_assert(interrupt_cond<InterruptCond>.interrupt_cond) in ReplicatedRecoveryBackend::recover_object

Added by Samuel Just 11 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Fixed In: v20.0.0-1559-g11ce348d9f
Released In: v20.2.0~533
Upkeep Timestamp: 2025-11-01T01:13:30+00:00

Description

ERROR 2025-04-15 21:38:20,513 [shard 1:main] none - /home/sam/git-checkouts/ceph-workspace/main/src/crimson/common/interruptible_future.h:485 : In function 'auto crimson::interruptible::interruptible_future_detail<InterruptCond, seastar::future<T> >::then_interruptible(Func&&) [with Func = ReplicatedRecoveryBackend::recover_object(const hobject_t&, eversion_t)::<lambda()>; InterruptCond = crimson::osd::IOInterruptCondition; T = void]', ceph_assert(%s)
interrupt_cond<InterruptCond>.interrupt_cond

The assertion seems to fire at the first call to then_interruptible in ReplicatedRecoveryBackend::recover_object:

  LOG_PREFIX(ReplicatedRecoveryBackend::recover_object);
  DEBUGDPP("{}, {}", pg, soid, need);
  // always add_recovering(soid) before recover_object(soid)
  assert(is_recovering(soid));
  // start tracking the recovery of soid
  return maybe_pull_missing_obj(
    soid, need
  ).then_interruptible([FNAME, this, soid, need] {
...

recover_object is called from PGRecovery::recover_object_with_throttle:

PGRecovery::interruptible_future<>
PGRecovery::recover_object_with_throttle(
  const hobject_t &soid,
  eversion_t need)
{
  crimson::osd::scheduler::params_t params =
    {1, 0, crimson::osd::scheduler::scheduler_class_t::background_best_effort};
  auto &ss = pg->get_shard_services();
  logger().debug("{} {}", soid, need);
  return ss.with_throttle(
    std::move(params),
    [this, soid, need] {
    logger().debug("got throttle: {} {}", soid, need);
    auto backend = pg->get_recovery_backend();
    assert(backend);
    return backend->recover_object(soid, need);
  });
}

The problem was introduced in commit 791772f1c032b4ca754d6a67322df6967edfc40e, which routes the call through:

  template <typename F>
  auto with_throttle(
    crimson::osd::scheduler::params_t params,
    F &&f) {
    if (!max_in_progress) return f();
    return acquire_throttle(params)
      .then(std::forward<F>(f))
      .finally([this] {
        release_throttle();
      });
  }

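On the acquire_throttle() path, f is chained with a plain seastar then() rather than then_interruptible(). If the throttle is not immediately available, f() therefore runs later from the reactor, after the caller's interruptible context has unwound, and is called without interrupt_cond<InterruptCond>.interrupt_cond populated. Only the !max_in_progress shortcut calls f() synchronously in the caller's context.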
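To make the failure mode concrete without Seastar, here is a toy single-threaded model. Every name in it (current_cond, run_queue, with_throttle, with_interruption, body_sees_cond) is a hypothetical stand-in, not crimson's API: a thread-local pointer plays the role of interrupt_cond<InterruptCond>.interrupt_cond, with_interruption sets it only for the synchronous extent of a call, and a run queue plays the reactor running a deferred then() continuation.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy model, NOT crimson code. current_cond stands in for the per-thread
// slot interrupt_cond<InterruptCond>.interrupt_cond.
struct InterruptCond {};
thread_local InterruptCond *current_cond = nullptr;

// Stand-in for the reactor: tasks queued here run later, after the
// caller's stack frame has unwound, like a .then() continuation whose
// future was not yet ready.
std::deque<std::function<void()>> run_queue;

void run_pending_tasks() {
  while (!run_queue.empty()) {
    auto task = std::move(run_queue.front());
    run_queue.pop_front();
    task();
  }
}

// Stand-in for with_interruption(): the condition is installed only for
// the synchronous extent of the call, then cleared.
template <typename F>
void with_interruption(InterruptCond &cond, F &&f) {
  current_cond = &cond;
  std::forward<F>(f)();
  current_cond = nullptr;
}

// Stand-in for with_throttle(): with no throttle f() runs inline,
// otherwise it is deferred, like acquire_throttle().then(f).
template <typename F>
void with_throttle(bool throttled, F &&f) {
  if (!throttled) {
    std::forward<F>(f)();
    return;
  }
  run_queue.emplace_back(std::forward<F>(f));
}

// Returns whether the body observed a live condition, i.e. whether a
// then_interruptible() inside it would pass its ceph_assert.
bool body_sees_cond(bool throttled) {
  InterruptCond cond;
  bool saw = false;
  with_interruption(cond, [&] {
    with_throttle(throttled, [&] { saw = (current_cond != nullptr); });
  });
  run_pending_tasks();  // the "reactor" runs any deferred continuation
  return saw;
}
```

body_sees_cond(false) models the !max_in_progress shortcut and returns true; body_sees_cond(true) models the deferred acquire_throttle().then(f) path and returns false, which corresponds to where the real then_interruptible() assert fires.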

Reproduces with the following vstart-based script:

function start_cluster {
  # kill leftover OSDs and stop any previous vstart cluster
  pkill -9 crimson-osd
  ../src/stop.sh
  MDS=0 MGR=1 OSD=3 MON=1 ../src/vstart.sh --without-dashboard -X --redirect-output --debug -n --no-restart "$@"
  ./bin/ceph osd pool create rbd 8 8 replicated replicated_rule 2 2 2
  ./bin/ceph osd pool create single 1 1 replicated replicated_rule 2 2 2
}
function start_cluster_test_backfill {
  start_cluster "$@"
  # nonzero concurrency limit so recovery goes through the throttle
  ./bin/ceph config set osd crimson_osd_scheduler_concurrency 5
  # keep PG logs very short to exercise backfill
  ./bin/ceph config set osd osd_min_pg_log_entries 1
  ./bin/ceph config set osd osd_max_pg_log_entries 2
  ./bin/ceph config set osd osd_pg_log_trim_min 0
  # generate write/delete/snapshot load, then mark osd.0 out to trigger recovery
  ./bin/ceph_test_rados --max-ops 10000000000 --objects 1000 --max-in-flight 32 --size 40000 --min-stride-size 4000 --max-stride-size 8000 --max-seconds 120 --op read 0 --op write 50 --op delete 50 --op snap_create 50 --pool rbd
  sleep 5
  ./bin/ceph osd out 0
}
Actions #1

Updated by Samuel Just 11 months ago

  • Description updated (diff)
Actions #2

Updated by Samuel Just 11 months ago

  • Description updated (diff)
Actions #3

Updated by Samuel Just 11 months ago

  • Description updated (diff)
Actions #4

Updated by Samuel Just 11 months ago

  • Description updated (diff)
Actions #5

Updated by Samuel Just 11 months ago

I actually don't think this is a bug in interruptible_future, or at least not something we've tried to disallow statically in the past. with_throttle simply can't be agnostic as to whether f() assumes that the calling context has a live interrupt_cond.
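One way to make a throttle helper safe for interruptible callers, sketched in the same toy model, is to snapshot the caller's condition when f is queued and reinstall it around f when it runs, roughly what then_interruptible() does for its own continuations. with_throttle_interruptible and the other names here are hypothetical; this is not necessarily how the merged fix (PR 62837) resolves the issue.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy model, NOT crimson code: current_cond stands in for
// interrupt_cond<InterruptCond>.interrupt_cond, run_queue for the reactor.
struct InterruptCond {};
thread_local InterruptCond *current_cond = nullptr;
std::deque<std::function<void()>> run_queue;

void run_pending_tasks() {
  while (!run_queue.empty()) {
    auto task = std::move(run_queue.front());
    run_queue.pop_front();
    task();
  }
}

// Hypothetical interruption-aware throttle: snapshot current_cond at call
// time, install it for the duration of the deferred call, then restore
// whatever was there before (normally nullptr on the reactor).
template <typename F>
void with_throttle_interruptible(bool throttled, F &&f) {
  if (!throttled) {
    std::forward<F>(f)();  // caller's condition is still installed
    return;
  }
  InterruptCond *saved = current_cond;
  run_queue.emplace_back([saved, fn = std::forward<F>(f)]() mutable {
    InterruptCond *prev = current_cond;
    current_cond = saved;
    fn();
    current_cond = prev;
  });
}

// Returns whether the body saw the caller's own condition.
bool body_sees_cond(bool throttled) {
  InterruptCond cond;
  bool saw = false;
  current_cond = &cond;  // what with_interruption() would install
  with_throttle_interruptible(throttled,
                              [&] { saw = (current_cond == &cond); });
  current_cond = nullptr;  // caller's synchronous frame has returned
  run_pending_tasks();
  return saw;
}
```

With the snapshot in place, body_sees_cond(true) also returns true, and current_cond is back to nullptr once the deferred task finishes. A real implementation would additionally have to keep the condition object alive until the deferred call runs; this toy sidesteps that by draining the queue before cond goes out of scope.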

Actions #6

Updated by Samuel Just 11 months ago

A simpler example:

template <typename F>
auto f(F &&func) {
  return seastar::sleep(
    std::chrono::milliseconds(10)
  ).then([] {
    return seastar::sleep(std::chrono::milliseconds(10));
  }).then(
    // plain seastar then(): func runs without an interrupt_cond installed
    std::forward<F>(func)
  ).finally([] {});
}

using interruptor =
  interruptible::interruptor<TestInterruptCondition>;
interruptor::future<> g() {
  return f([] {
    return interruptor::make_interruptible(
      seastar::sleep(std::chrono::milliseconds(10))
    ).then_interruptible([] {
      return interruptor::make_interruptible(
        seastar::sleep(std::chrono::milliseconds(10)));
    });
  });
}

TEST_F(seastar_test_suite_t, implicit_interruptible_conversion)
{
  run_async([] {
    interruptor::with_interruption(
      [] {
        return interruptor::make_interruptible(
          seastar::sleep(std::chrono::milliseconds(10))
        ).then_interruptible([] {
          return g().then_interruptible([] {
            return seastar::now();
          });
        });
      },
      [](auto) {}, false
    ).get();
  });
}
Actions #7

Updated by Samuel Just 11 months ago

  • Pull request ID set to 62837
Actions #8

Updated by Matan Breizman 11 months ago

  • Status changed from In Progress to Resolved
Actions #9

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 11ce348d9f062f78c72b82112d3ad758bf835ed7
  • Fixed In set to v20.0.0-1559-g11ce348d9f0
  • Upkeep Timestamp set to 2025-07-09T17:42:03+00:00
Actions #10

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.0.0-1559-g11ce348d9f0 to v20.0.0-1559-g11ce348d9f
  • Upkeep Timestamp changed from 2025-07-09T17:42:03+00:00 to 2025-07-14T17:42:40+00:00
Actions #11

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~533
  • Upkeep Timestamp changed from 2025-07-14T17:42:40+00:00 to 2025-11-01T01:13:30+00:00