librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove#55348
librados: use CEPH_OSD_FLAG_FULL_FORCE for IoCtxImpl::remove#55348
Conversation
|
@ajarr @gregsfortytwo I think this is a BUG for librados, could you please review this fix? Thanks! |
|
It seems like only at the delete blocking ABIs, against aio ABIs is ok. |
idryomov
left a comment
There was a problem hiding this comment.
The change seems correct, but doesn't this issue also affect
int librados::IoCtx::remove(const std::string& oid, int flags)
{
object_t obj(oid);
return io_ctx_impl->remove(obj, flags);
}
Here flags are passed from a public API to a private implementation
int librados::IoCtxImpl::remove(const object_t& oid, int flags)
{
::ObjectOperation op;
prepare_assert_ops(&op);
op.remove();
return operate(oid, &op, NULL, flags);
}
without any translation.
It would be nice to:
- file a ticket on https://tracker.ceph.com ticket so the fix can be backported
- add an integration test to ensure that this doesn't break in the future
|
@idryomov Thanks! I will file a ticket on https://tracker.ceph.com/ later. |
We need to at least document this -- I don't see anything (not even a one-liner comment!) in Then this is up to @ceph/core, but I'd also suggest deprecating |
Note from git history spelunking: at the time when #20534 landed, only |
| prepare_assert_ops(&op); | ||
| op.remove(); | ||
| return operate(oid, &op, nullptr, librados::OPERATION_FULL_FORCE); | ||
| return operate(oid, &op, nullptr, CEPH_OSD_FLAG_FULL_FORCE); |
There was a problem hiding this comment.
Let's take a look on the flow:
int librados::IoCtxImpl::operate(const object_t& oid, ::ObjectOperation *o,
ceph::real_time *pmtime, int flags)
{
// ...
Objecter::Op *objecter_op = objecter->prepare_mutate_op(
oid, oloc,
*o, snapc, ut,
flags | extra_op_flags,
oncommit, &ver); // mid-level helpers
Op *prepare_mutate_op(
const object_t& oid, const object_locator_t& oloc,
ObjectOperation& op, const SnapContext& snapc,
ceph::real_time mtime, int flags,
Context *oncommit, version_t *objver = NULL,
osd_reqid_t reqid = osd_reqid_t(),
ZTracer::Trace *parent_trace = nullptr) {
Op *o = new Op(oid, oloc, std::move(op.ops), flags | global_op_flags |
CEPH_OSD_FLAG_WRITE, oncommit, objver,
nullptr, parent_trace);The union with CEPH_OSD_FLAG_WRITE suggests the conceptual type.
All constructors of Op do:
struct Op : public RefCountedObject {
// ...
op_target_t target;
// ...
Op(const object_t& o, const object_locator_t& ol, osdc_opvec&& _ops,
int f, /* ... */) :
target(o, ol, f), struct op_target_t {
int flags = 0;
// ...
op_target_t(const object_t& oid, const object_locator_t& oloc, int flags)
: flags(flags),
// ...
bool respects_full() const {
return
(flags & (CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_RWORDERED)) &&
!(flags & (CEPH_OSD_FLAG_FULL_TRY | CEPH_OSD_FLAG_FULL_FORCE));
}respects_full() clearly assumes CEPH_OSD_FLAG_FULL_FORCE.
There was a problem hiding this comment.
The respects_full() governs the objecter's behavior on traffic pausing:
bool Objecter::target_should_be_paused(op_target_t *t)
{
const pg_pool_t *pi = osdmap->get_pg_pool(t->base_oloc.pool);
bool pauserd = osdmap->test_flag(CEPH_OSDMAP_PAUSERD);
bool pausewr = osdmap->test_flag(CEPH_OSDMAP_PAUSEWR) ||
(t->respects_full() && (_osdmap_full_flag() || _osdmap_pool_full(*pi)));
return (t->flags & CEPH_OSD_FLAG_READ && pauserd) ||
(t->flags & CEPH_OSD_FLAG_WRITE && pausewr) ||
(osdmap->get_epoch() < epoch_barrier);
}However, the main responsibility of op_target_t::flags is being passed to an OSD via MOSDOp:
Objecter::MOSDOp *Objecter::_prepare_osd_op(Op *op)
{
// rwlock is locked
int flags = op->target.flags;
flags |= CEPH_OSD_FLAG_KNOWN_REDIR;
flags |= CEPH_OSD_FLAG_SUPPORTSPOOLEIO;
// Nothing checks this any longer, but needed for compatibility with
// pre-luminous osds
flags |= CEPH_OSD_FLAG_ONDISK;
if (!honor_pool_full)
flags |= CEPH_OSD_FLAG_FULL_FORCE;
// ...
auto m = new MOSDOp(client_inc, op->tid,
hobj, op->target.actual_pgid,
osdmap->get_epoch(),
flags, op->features);|
That would be my preference, as @jdurgin clearly wanted
But the current fix makes sense on its own, and one could argue that |
|
These flags have rich history. // discard due to cluster full transition? (we discard any op that
// originates before the cluster or pool is marked full; the client
// will resend after the full flag is removed or if they expect the
// op to succeed despite being full). The except is FULL_FORCE and
// FULL_TRY ops, which there is no reason to discard because they
// bypass all full checks anyway. If this op isn't write or
// read-ordered, we skip.
// FIXME: we exclude mds writes for now.
if (write_ordered && !(m->get_source().is_mds() ||
m->has_flag(CEPH_OSD_FLAG_FULL_TRY) ||
m->has_flag(CEPH_OSD_FLAG_FULL_FORCE)) &&
info.history.last_epoch_marked_full > m->get_map_epoch()) {
dout(10) << __func__ << " discarding op sent before full " << m << " "
<< *m << dendl;
return;
}
// mds should have stopped writing before this point.
// We can't allow OSD to become non-startable even if mds
// could be writing as part of file removals.
if (write_ordered && osd->check_failsafe_full(get_dpp()) &&
!m->has_flag(CEPH_OSD_FLAG_FULL_TRY)) {
dout(10) << __func__ << " fail-safe full check failed, dropping request." << dendl;
return;
}The failsafe check has been relaxed in #17177. The idea was to accept operations that are net reducers of usage. However, I'm not sure it can be said with certainty about a deletion of an RADOS object. To exemplify, let's imagine removing 1 byte long obj. Is it a net reducer? Well, on one hand it going to squeeze at least 1 bytes from an Pretty complex thing. I would prefer to be cautious when it comes to the failsafe check. |
I agree with @idryomov on the doc and deprecating the private-flags-taking variant of |
|
@rzarzynski So I take it that you prefer to see a new It might be better to have plain old |
I thought the way |
Yes, just The rationale would be to let
I think you means the second check which happens later, in // check for full
if ((ctx->delta_stats.num_bytes > 0 ||
ctx->delta_stats.num_objects > 0) && // FIXME: keys?
pool.info.has_flag(pg_pool_t::FLAG_FULL)) {
auto m = ctx->op->get_req<MOSDOp>();
if (ctx->reqid.name.is_mds() || // FIXME: ignore MDS for now
m->has_flag(CEPH_OSD_FLAG_FULL_FORCE)) {
dout(20) << __func__ << " full, but proceeding due to FULL_FORCE or MDS"
<< dendl;
} else if (m->has_flag(CEPH_OSD_FLAG_FULL_TRY)) {
// they tried, they failed.
dout(20) << __func__ << " full, replying to FULL_TRY op" << dendl;
return pool.info.has_flag(pg_pool_t::FLAG_FULL_QUOTA) ? -EDQUOT : -ENOSPC;
} else {
// drop request
dout(20) << __func__ << " full, dropping request (bad client)" << dendl;
return -EAGAIN;
}
}I doubt these counters cover size of |
|
@idryomov @rzarzynski So what should I do if I want this fix could be merged? |
idryomov
left a comment
There was a problem hiding this comment.
No objections from me to this going in as is.
Since this is a bug fix, I think we need a tracker ticket filed for backporting purposes. |
|
@idryomov I have created a issue: https://tracker.ceph.com/issues/64558 |
Now need to add |
librados::OPERATION_FULL_FORCE should be translated to CEPH_OSD_FLAG_FULL_FORCE before calling IoCtxImpl::remove(). Fixes: https://tracker.ceph.com/issues/64558 Signed-off-by: Chen Yuanrun <chen-yuanrun@foxmail.com>
1ee6764 to
a4e2a59
Compare
|
@idryomov I have modified my commit message, could you please review it again? Thanks! |
LGTM! |
librados::OPERATION_FULL_FORCE should be translated to CEPH_OSD_FLAG_FULL_FORCE before calling IoCtxImpl::remove().
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e