librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove by chenyuanrun · Pull Request #55348 · ceph/ceph

chenyuanrun · 2024-01-29T10:10:06Z

librados::OPERATION_FULL_FORCE should be translated to CEPH_OSD_FLAG_FULL_FORCE before calling IoCtxImpl::remove().

Contribution Guidelines

To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Component impact
- Affects Dashboard, opened tracker ticket
- Affects Orchestrator, opened tracker ticket
- No impact that needs to be tracked
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests

Show available Jenkins commands

jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e

chenyuanrun · 2024-01-30T01:43:09Z

@ajarr @gregsfortytwo I think this is a BUG for librados, could you please review this fix? Thanks!

chenyuanrun · 2024-01-30T01:52:35Z

@tchaikov fix commit 435954b

interestingyong · 2024-02-17T04:45:22Z

It seems like only at the delete blocking ABIs, against aio ABIs is ok.
Could help to review this pr?
@xiexingguo @idryomov

idryomov

The change seems correct, but doesn't this issue also affect

int librados::IoCtx::remove(const std::string& oid, int flags)
{
  object_t obj(oid);
  return io_ctx_impl->remove(obj, flags); 
}

Here flags are passed from a public API to a private implementation

int librados::IoCtxImpl::remove(const object_t& oid, int flags)
{
  ::ObjectOperation op;
  prepare_assert_ops(&op);
  op.remove();
  return operate(oid, &op, NULL, flags);
}

without any translation.

It would be nice to:

file a ticket on https://tracker.ceph.com ticket so the fix can be backported
add an integration test to ensure that this doesn't break in the future

chenyuanrun · 2024-02-21T06:37:51Z

@idryomov Thanks! I will file a ticket on https://tracker.ceph.com/ later.

chenyuanrun · 2024-02-22T02:37:16Z

@idryomov librados::IoCtxImpl::remove can not be fixed because it is used by src/tools/rados/rados.cc with the private flags:

ceph/src/tools/rados/rados.cc

Lines 3077 to 3083 in ed736fa

    
           for (const auto& oid : oids) { 
        
             if (forcefull) { 
        
                 ret = detail::remove(io_ctx, oid, (CEPH_OSD_FLAG_FULL_FORCE | 
        
                                      CEPH_OSD_FLAG_FULL_TRY), use_striper); 
        
             } else { 
        
                 ret = detail::remove(io_ctx, oid, use_striper); 
        
             }

If we change it , we break the public API, and the break the apps which rely on the improper behavior of this API.

idryomov · 2024-02-26T11:21:43Z

If we change it , we break the public API, and the break the apps which rely on the improper behavior of this API.

We need to at least document this -- I don't see anything (not even a one-liner comment!) in src/include/rados/librados.hpp or src/librados/librados_cxx.cc. It's very easy to get confused as it is.

Then this is up to @ceph/core, but I'd also suggest deprecating librados::IoCtx::remove(const std::string& oid, int flags). We could add librados::IoCtx::remove2() taking the right flags as a replacement or perhaps just change librados::IoCtx::remove(const std::string& oid) to pass both FULL_TRY and FULL_FORCE as done in src/tools/rados/rados.cc in the snippet that you highlighted in the previous comment. That would effectively make --force-full behavior the default which seems like what @jdurgin had in mind in https://tracker.ceph.com/issues/22413#note-2.

idryomov · 2024-02-26T11:27:36Z

or perhaps just change librados::IoCtx::remove(const std::string& oid) to pass both FULL_TRY and FULL_FORCE as done in src/tools/rados/rados.cc in the snippet that you highlighted in the previous comment. That would effectively make --force-full behavior the default which seems like what @jdurgin had in mind in https://tracker.ceph.com/issues/22413#note-2.

Note from git history spelunking: at the time when #20534 landed, only FULL_FORCE was passed in src/tools/rados/rados.cc. FULL_TRY was added later in #24264.

rzarzynski · 2024-02-26T22:42:13Z

src/librados/IoCtxImpl.cc

  prepare_assert_ops(&op);
  op.remove();
-  return operate(oid, &op, nullptr, librados::OPERATION_FULL_FORCE);
+  return operate(oid, &op, nullptr, CEPH_OSD_FLAG_FULL_FORCE);


Let's take a look on the flow:

int librados::IoCtxImpl::operate(const object_t& oid, ::ObjectOperation *o, ceph::real_time *pmtime, int flags) { // ... Objecter::Op *objecter_op = objecter->prepare_mutate_op( oid, oloc, *o, snapc, ut, flags | extra_op_flags, oncommit, &ver);

// mid-level helpers Op *prepare_mutate_op( const object_t& oid, const object_locator_t& oloc, ObjectOperation& op, const SnapContext& snapc, ceph::real_time mtime, int flags, Context *oncommit, version_t *objver = NULL, osd_reqid_t reqid = osd_reqid_t(), ZTracer::Trace *parent_trace = nullptr) { Op *o = new Op(oid, oloc, std::move(op.ops), flags | global_op_flags | CEPH_OSD_FLAG_WRITE, oncommit, objver, nullptr, parent_trace);

The union with CEPH_OSD_FLAG_WRITE suggests the conceptual type.

All constructors of Op do:

struct Op : public RefCountedObject { // ... op_target_t target; // ... Op(const object_t& o, const object_locator_t& ol, osdc_opvec&& _ops, int f, /* ... */) : target(o, ol, f),

struct op_target_t { int flags = 0; // ... op_target_t(const object_t& oid, const object_locator_t& oloc, int flags) : flags(flags), // ... bool respects_full() const { return (flags & (CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_RWORDERED)) && !(flags & (CEPH_OSD_FLAG_FULL_TRY | CEPH_OSD_FLAG_FULL_FORCE)); }

respects_full() clearly assumes CEPH_OSD_FLAG_FULL_FORCE.

The respects_full() governs the objecter's behavior on traffic pausing:

bool Objecter::target_should_be_paused(op_target_t *t) { const pg_pool_t *pi = osdmap->get_pg_pool(t->base_oloc.pool); bool pauserd = osdmap->test_flag(CEPH_OSDMAP_PAUSERD); bool pausewr = osdmap->test_flag(CEPH_OSDMAP_PAUSEWR) || (t->respects_full() && (_osdmap_full_flag() || _osdmap_pool_full(*pi))); return (t->flags & CEPH_OSD_FLAG_READ && pauserd) || (t->flags & CEPH_OSD_FLAG_WRITE && pausewr) || (osdmap->get_epoch() < epoch_barrier); }

However, the main responsibility of op_target_t::flags is being passed to an OSD via MOSDOp:

Objecter::MOSDOp *Objecter::_prepare_osd_op(Op *op) { // rwlock is locked int flags = op->target.flags; flags |= CEPH_OSD_FLAG_KNOWN_REDIR; flags |= CEPH_OSD_FLAG_SUPPORTSPOOLEIO; // Nothing checks this any longer, but needed for compatibility with // pre-luminous osds flags |= CEPH_OSD_FLAG_ONDISK; if (!honor_pool_full) flags |= CEPH_OSD_FLAG_FULL_FORCE; // ... auto m = new MOSDOp(client_inc, op->tid, hobj, op->target.actual_pgid, osdmap->get_epoch(), flags, op->features);

chenyuanrun · 2024-02-27T01:49:16Z

Note from git history spelunking: at the time when #20534 landed, only FULL_FORCE was passed in src/tools/rados/rados.cc. FULL_TRY was added later in #24264.

@idryomov Should I change CEPH_OSD_FLAG_FULL_FORCE to CEPH_OSD_FLAG_FULL_FORCE|CEPH_OSD_FLAG_FULL_TRY so that this fix could be merged?

idryomov · 2024-02-27T09:53:14Z

That would be my preference, as @jdurgin clearly wanted remove() to behave like that CLI command:

You can get around this by using rados_write_op_operate with the 'LIBRADOS_OPERATION_FULL_FORCE' flag (128), like the rados cli, but I agree this should be the default for plain old remove().

But the current fix makes sense on its own, and one could argue that FULL_FORCE -> FULL_FORCE | FULL_TRY should be a separate change. @rzarzynski would have the final say here.

rzarzynski · 2024-02-27T14:57:17Z

These flags have rich history. FULL_TRY seems to be even stronger than FULL_FORCE:

  // discard due to cluster full transition?  (we discard any op that
  // originates before the cluster or pool is marked full; the client
  // will resend after the full flag is removed or if they expect the
  // op to succeed despite being full).  The except is FULL_FORCE and
  // FULL_TRY ops, which there is no reason to discard because they
  // bypass all full checks anyway.  If this op isn't write or
  // read-ordered, we skip.
  // FIXME: we exclude mds writes for now.
  if (write_ordered && !(m->get_source().is_mds() ||
                         m->has_flag(CEPH_OSD_FLAG_FULL_TRY) ||
                         m->has_flag(CEPH_OSD_FLAG_FULL_FORCE)) &&
      info.history.last_epoch_marked_full > m->get_map_epoch()) {
    dout(10) << __func__ << " discarding op sent before full " << m << " "
             << *m << dendl;
    return;
  }
  // mds should have stopped writing before this point.
  // We can't allow OSD to become non-startable even if mds
  // could be writing as part of file removals.
  if (write_ordered && osd->check_failsafe_full(get_dpp()) &&
      !m->has_flag(CEPH_OSD_FLAG_FULL_TRY)) {
    dout(10) << __func__ << " fail-safe full check failed, dropping request." << dendl;
    return;
  }

The failsafe check has been relaxed in #17177. The idea was to accept operations that are net reducers of usage. However, I'm not sure it can be said with certainty about a deletion of an RADOS object. To exemplify, let's imagine removing 1 byte long obj. Is it a net reducer? Well, on one hand it going to squeeze at least 1 bytes from an ObjectStore (how much exactly is an implementation's detail); on the other it will result in generation of a few dozens of bytes in PGLog and, maybe, trimming other entries.

Pretty complex thing. I would prefer to be cautious when it comes to the failsafe check.

rzarzynski · 2024-02-27T15:24:20Z

If we change it , we break the public API, and the break the apps which rely on the improper behavior of this API.

I agree with @idryomov on the doc and deprecating the private-flags-taking variant of remove(). We should get remove2() doing the proper translation (see translate_flags() in librados/librados_util.cc).

idryomov · 2024-02-27T15:47:35Z

@rzarzynski So I take it that you prefer to see a new remove2() and have plain old remove() fixed to pass the proper flag only for FULL_FORCE and not FULL_TRY? I'm not sure I see much use in that...

It might be better to have plain old remove() not pass any flags then -- this would avoid confusion as passing FULL_FORCE is only a half measure and also wouldn't be a change in behavior compared to today since the public FULL_FORCE that is passed today is clearly bogus.

idryomov · 2024-02-27T15:50:06Z

To exemplify, let's imagine removing 1 byte long obj. Is it a net reducer? Well, on one hand it going to squeeze at least 1 bytes from an ObjectStore (how much exactly is an implementation's detail); on the other it will result in generation of a few dozens of bytes in PGLog and, maybe, trimming other entries.

I thought the way FULL_TRY was supposed to work was that it's always safe to pass and is effectively free, because if the op isn't a net reducer (let's define that as "makes the situation worse for the OSD"), it's failed with ENOSPC error. This is in contrast to the default behavior where the OSD would drop the op and it would hang in the Objecter and therefore also from a user perspective. Am I mistaken?

rzarzynski · 2024-02-27T16:41:56Z

@rzarzynski So I take it that you prefer to see a new remove2() and have plain old remove() fixed to pass the proper flag only for FULL_FORCE and not FULL_TRY? I'm not sure I see much use in that...

Yes, just FULL_FORCE, without FULL_TRY (I know the naming became confusing :-/).

The rationale would be to let removes() squeeze some data when it's still quite safe – under the full condition but stay really cautious (paranoid?) when things had gone so badly that OSD got into failsafe full.

I thought the way FULL_TRY was supposed to work was that it's always safe to pass and is effectively free, because if the op isn't a net reducer (let's define that as "makes the situation worse for the OSD"), it's failed with ENOSPC error.

I think you means the second check which happens later, in prepare_transaction(), and bases on the delta_stats:

  // check for full
  if ((ctx->delta_stats.num_bytes > 0 ||
       ctx->delta_stats.num_objects > 0) &&  // FIXME: keys?
      pool.info.has_flag(pg_pool_t::FLAG_FULL)) {
    auto m = ctx->op->get_req<MOSDOp>();
    if (ctx->reqid.name.is_mds() ||   // FIXME: ignore MDS for now
        m->has_flag(CEPH_OSD_FLAG_FULL_FORCE)) {
      dout(20) << __func__ << " full, but proceeding due to FULL_FORCE or MDS"
               << dendl;
    } else if (m->has_flag(CEPH_OSD_FLAG_FULL_TRY)) {
      // they tried, they failed.
      dout(20) << __func__ << " full, replying to FULL_TRY op" << dendl;
      return pool.info.has_flag(pg_pool_t::FLAG_FULL_QUOTA) ? -EDQUOT : -ENOSPC;
    } else {
      // drop request
      dout(20) << __func__ << " full, dropping request (bad client)" << dendl;
      return -EAGAIN;
    }
  }

I doubt these counters cover size of PGLog entries, internals of an ObjectStore' and so on (UPDATE: actually, I doubt they even cover OMAP). I perceive the failsafe full as a measure to (imperfectly) cover such circumstances and prevent an OSD from becoming unable to boot. I imagine it can be very hard for ObjectStore to be always exact about ENOSPC – I'm thinking here about things like a RocksDB's compaction-on-startup in BlueStore.

chenyuanrun · 2024-03-11T01:36:13Z

@idryomov @rzarzynski So what should I do if I want this fix could be merged?

idryomov

No objections from me to this going in as is.

idryomov · 2024-03-11T09:46:17Z

@idryomov @rzarzynski So what should I do if I want this fix could be merged?

Since this is a bug fix, I think we need a tracker ticket filed for backporting purposes.

chenyuanrun · 2024-03-13T10:37:53Z

@idryomov I have created a issue: https://tracker.ceph.com/issues/64558

idryomov · 2024-03-13T11:24:58Z

@idryomov I have created a issue: https://tracker.ceph.com/issues/64558

Now need to add Fixes: https://tracker.ceph.com/issues/64558 on the line before Signed-off-by in the commit message. See https://github.com/ceph/ceph/blob/main/SubmittingPatches.rst#fixes-lines for details.

librados::OPERATION_FULL_FORCE should be translated to CEPH_OSD_FLAG_FULL_FORCE before calling IoCtxImpl::remove(). Fixes: https://tracker.ceph.com/issues/64558 Signed-off-by: Chen Yuanrun <chen-yuanrun@foxmail.com>

chenyuanrun · 2024-03-14T02:12:45Z

@idryomov I have modified my commit message, could you please review it again? Thanks!

idryomov · 2024-03-14T10:01:40Z

@idryomov I have modified my commit message, could you please review it again? Thanks!

LGTM!

amathuria · 2024-03-21T10:04:12Z

Rados approved: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrackercephcomissues64974

chenyuanrun requested a review from a team as a code owner January 29, 2024 10:10

github-actions bot added the core label Jan 29, 2024

idryomov reviewed Feb 17, 2024

View reviewed changes

rzarzynski self-requested a review February 26, 2024 18:49

rzarzynski reviewed Feb 26, 2024

View reviewed changes

idryomov approved these changes Mar 11, 2024

View reviewed changes

idryomov added the bug-fix label Mar 11, 2024

rzarzynski approved these changes Mar 13, 2024

View reviewed changes

librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove

a4e2a59

librados::OPERATION_FULL_FORCE should be translated to CEPH_OSD_FLAG_FULL_FORCE before calling IoCtxImpl::remove(). Fixes: https://tracker.ceph.com/issues/64558 Signed-off-by: Chen Yuanrun <chen-yuanrun@foxmail.com>

chenyuanrun force-pushed the fix-io-ctx-impl-flags branch from 1ee6764 to a4e2a59 Compare March 14, 2024 02:10

rzarzynski added the needs-qa label Mar 18, 2024

yuriw added the wip-yuri5-testing label Mar 18, 2024

yuriw merged commit e6d2e6c into ceph:main Mar 21, 2024

This was referenced Aug 17, 2024

reef: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove #59282

Merged

quincy: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove #59283

Merged

squid: librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove #59284

Merged

Conversation

chenyuanrun commented Jan 29, 2024

Contribution Guidelines

Checklist

Uh oh!

chenyuanrun commented Jan 30, 2024

Uh oh!

chenyuanrun commented Jan 30, 2024

Uh oh!

interestingyong commented Feb 17, 2024

Uh oh!

idryomov left a comment

Choose a reason for hiding this comment

Uh oh!

chenyuanrun commented Feb 21, 2024

Uh oh!

chenyuanrun commented Feb 22, 2024

Uh oh!

idryomov commented Feb 26, 2024

Uh oh!

idryomov commented Feb 26, 2024

Uh oh!

rzarzynski Feb 26, 2024

Choose a reason for hiding this comment

Uh oh!

rzarzynski Feb 27, 2024

Choose a reason for hiding this comment

Uh oh!

chenyuanrun commented Feb 27, 2024

Uh oh!

idryomov commented Feb 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rzarzynski commented Feb 27, 2024

Uh oh!

rzarzynski commented Feb 27, 2024

Uh oh!

idryomov commented Feb 27, 2024

Uh oh!

idryomov commented Feb 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rzarzynski commented Feb 27, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chenyuanrun commented Mar 11, 2024

Uh oh!

idryomov left a comment

Choose a reason for hiding this comment

Uh oh!

idryomov commented Mar 11, 2024

Uh oh!

chenyuanrun commented Mar 13, 2024

Uh oh!

idryomov commented Mar 13, 2024

Uh oh!

chenyuanrun commented Mar 14, 2024

Uh oh!

idryomov commented Mar 14, 2024

Uh oh!

amathuria commented Mar 21, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

idryomov commented Feb 27, 2024 •

edited

Loading

idryomov commented Feb 27, 2024 •

edited

Loading

rzarzynski commented Feb 27, 2024 •

edited

Loading