Project

General

Profile

Actions

Bug #64611

closed

Inconsistent usage of the return codes in the MDS code base

Added by Leonid Usov about 2 years ago. Updated 2 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Development
Backport:
squid,reef
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Client, Common/Protocol, MDS
Labels (FS):
task(easy), task(intern)
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v19.3.0-7699-g5b16367c7b
Released In:
v20.2.0~1081
Upkeep Timestamp:
2025-11-01T00:58:05+00:00

Description

Ceph cluster may comprise of daemons running on different platforms with incompatible numeric values of the errno definitions. To maintain coherency, the approach taken by Ceph is that daemons must exchange messages where the error codes on the wire are always encoded using a common error namespace. Historically, this namespace was chosen to be the Linux errno list, and it's often seen in the source code as a set of CEPHFS_xxx error code defines.

When writing a daemon, it's reasonable to always rely on the native platform errno definitions, as interacting with the underlying OS will inevitably respond with native codes. The way Ceph helps developers is by defining a set of conversion routines that map native error codes to the linux error codes, or in other words, the CEPHFS_* namespace, and back. These functions are

ceph_to_hostos_errno
hostos_to_ceph_errno

Furthermore, Ceph provides a custom type errorcode32_t which calls the corresponding conversion functions during encode and decode:

  void encode(ceph::buffer::list &bl) const {
    using ceph::encode;
    __s32 newcode = hostos_to_ceph_errno(code);
    encode(newcode, bl);
  }
  void decode(ceph::buffer::list::const_iterator &bl) {
    using ceph::decode;
    decode(code, bl);
    code = ceph_to_hostos_errno(code);
  }

This convenient type allows for consistent interop between daemons running on different host platforms without an explicit errno conversion required from the daemon's code.

The MCommandReply class already makes use of the errorcode32_t for the result code, meaning that when constructing the response the daemon must provide its native error code and not the one from the CEPHFS_* namespace.

However, MClientReply doesn't do this automatically. The good news is that the message follows the requirement of having the linux error code on the wire, but it's only doing the conversion to the host system in the getter:


  int get_result() const {
    #ifdef _WIN32
    // libclient and libcephfs return CEPHFS_E* errors, which are basically
    // Linux errno codes. If we convert mds errors to host errno values, we
    // end up mixing error codes.
    //
    // For Windows, we'll preserve the original error value, which is expected
    // to be a linux (CEPHFS_E*) error. It may be worth doing the same for
    // other platforms.
    return head.result;
    #else
    return ceph_to_hostos_errno((__s32)(__u32)head.result);
    #endif
  }

  void set_result(int r) { head.result = r; }

The code above is suboptimal and someone unaware of the internals will be surprised to get a different code than what's set. This unfortunate imbalance has spread over the MDS code base and currently, all places that construct an MClientReply have to provide a CEPHFS_* error code.

While it is handled correctly in most cases, this inconsistency makes it impossible to write a generic internal MDS handler that produces a result code usable both in MClientReply and MCommandReply.

We should change MClientReply to use the errorcode32_t just like MCommandReply does, and update all places in the MDS to use native error codes.


Related issues 2 (1 open1 closed)

Copied to CephFS - Backport #70071: squid: Inconsistent usage of the return codes in the MDS code baseResolvedIgor GolikovActions
Copied to CephFS - Backport #70072: reef: Inconsistent usage of the return codes in the MDS code baseQA TestingIgor GolikovActions
Actions #1

Updated by Patrick Donnelly almost 2 years ago

  • Category set to Correctness/Safety
  • Priority changed from Normal to Urgent
  • Target version set to v20.0.0
  • Source set to Development
  • Backport set to squid,reef
  • Component(FS) Client, Common/Protocol added
Actions #2

Updated by Venky Shankar over 1 year ago

  • Assignee set to Igor Golikov

@Igor Golikov - please take this one.

Actions #3

Updated by Igor Golikov over 1 year ago ยท Edited

It looks like MDSDaemon.cc uses CEPHFS_* error codes to initialize the MCommandReply:

...
if (!session->auth_caps.allow_all()) {
    dout(1) << __func__
      << ": received command from client without `tell` capability: " 
      << *m->get_connection()->peer_addrs << dendl;

    ss << "permission denied";
    r = -CEPHFS_EACCES;
  } else if (m->cmd.empty()) {
    r = -CEPHFS_EINVAL;
    ss << "no command given";
  } else if (!TOPNSPC::common::cmdmap_from_json(m->cmd, &cmdmap, ss)) {
    r = -CEPHFS_EINVAL;
  } else {
    cct->get_admin_socket()->queue_tell_command(m);
    return;
  }

  auto reply = make_message<MCommandReply>(r, ss.str());
...

Therefore, the usage of errorcode32_t in the MCommandReply doesn't change anything, since it's encode_payload method converts host error code to ceph error code, as shown above. But if the value is ALREADY set to be CEPHFS_, the effect is not clear.

Either the example of MCommandReply as a template for changing MClientReply is incorrect, or, the idea was to change ALL error codes in MDS to native (including for MCommandReply), and then it make sense to use errorcode32_t.

The receiver will get CEPHFS_* code from the wire, and will convert it to CEPHFS_* code (the get_result function described above), and it doesn't matter HOW the value on the wire was set, by explicitly using CEPH_FS_* in the handler code or by converting from the native value, while encoding.

Moreover, the whole MDS codebase uses CEPH_FS code, not only for the return values in the messages but for the internal function calls statuses:

void MDLog::_recovery_thread(MDSContext *completion)
{
  ceph_assert(journaler == NULL);
  if (g_conf()->mds_journal_format > JOURNAL_FORMAT_MAX) {
      dout(0) << "Configuration value for mds_journal_format is out of bounds, max is " 
              << JOURNAL_FORMAT_MAX << dendl;

      // Oh dear, something unreadable in the store for this rank: require
      // operator intervention.
      mds->damaged_unlocked();
      ceph_abort();  // damaged should not return
  }

  // First, read the pointer object.
  // If the pointer object is not present, then create it with
  // front = default ino and back = null
  JournalPointer jp(mds->get_nodeid(), mds->get_metadata_pool());
  const int read_result = jp.load(mds->objecter);

  // here
  *if (read_result == -CEPHFS_ENOENT) {*
  //

    inodeno_t const default_log_ino = MDS_INO_LOG_OFFSET + mds->get_nodeid();
    jp.front = default_log_ino;
    int write_result = jp.save(mds->objecter);
    if (write_result < 0) {
      std::lock_guard l(mds->mds_lock);
      if (mds->is_daemon_stopping()) {
        return;
      }
      mds->damaged();
      ceph_abort();  // damaged should never return
    }

If the idea was to COMPLETELY replace CEPHFS_* to host errno, it make sese. Otherwisze, I don't see any advantage of using one type of errors for the "internal" cases and another type for messages.

Actions #4

Updated by Igor Golikov over 1 year ago

  • Status changed from New to In Progress
Actions #5

Updated by Patrick Donnelly over 1 year ago

Igor Golikov wrote in #note-3:

It looks like MDSDaemon.cc uses CEPHFS_* error codes to initialize the MCommandReply:
[...]

Therefore, the usage of errorcode32_t in the MCommandReply doesn't change anything, since it's encode_payload method converts host error code to ceph error code, as shown above. But if the value is ALREADY set to be CEPHFS_, the effect is not clear.

Either the example of MCommandReply as a template for changing MClientReply is incorrect, or, the idea was to change ALL error codes in MDS to native (including for MCommandReply), and then it make sense to use errorcode32_t.

Yes, that's what this ticket is about. The CEPHFS_ errnos is an incorrect fix (see blame history). Unfortunately, the magic of errorcode32_t wasn't understood until Leonid ran into issues with wrong error codes on Mac.

The receiver will get CEPHFS_* code from the wire, and will convert it to CEPHFS_* code (the get_result function described above), and it doesn't matter HOW the value on the wire was set, by explicitly using CEPH_FS_* in the handler code or by converting from the native value, while encoding.

Moreover, the whole MDS codebase uses CEPH_FS code, not only for the return values in the messages but for the internal function calls statuses:

[...]

If the idea was to COMPLETELY replace CEPHFS_* to host errno, it make sese. Otherwisze, I don't see any advantage of using one type of errors for the "internal" cases and another type for messages.

Yes, let's use host errnos uniformly on client and MDS. Document heavily in MClientReply and MCommandReply that there is a conversion.

Actions #6

Updated by Igor Golikov over 1 year ago

Thanks for the clarification, @Patrick Donnelly

Actions #7

Updated by Igor Golikov over 1 year ago

  • Status changed from In Progress to Fix Under Review
Actions #8

Updated by Igor Golikov over 1 year ago

  • Pull request ID set to 60286
Actions #9

Updated by Igor Golikov about 1 year ago

  • Status changed from Fix Under Review to Resolved
Actions #10

Updated by Venky Shankar about 1 year ago

  • Status changed from Resolved to Pending Backport

@Igor Golikov - this changes required backporting.

Note: The backport bot will auto create the backport trackers (and assign them to you :))

Actions #11

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #70071: squid: Inconsistent usage of the return codes in the MDS code base added
Actions #12

Updated by Upkeep Bot about 1 year ago

  • Copied to Backport #70072: reef: Inconsistent usage of the return codes in the MDS code base added
Actions #13

Updated by Upkeep Bot about 1 year ago

  • Tags (freeform) set to backport_processed
Actions #14

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to 5b16367c7bd1782398bd08a1c239ec03bcfd0794
  • Fixed In set to v19.3.0-7699-g5b16367c7bd
  • Upkeep Timestamp set to 2025-07-09T14:05:11+00:00
Actions #15

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-7699-g5b16367c7bd to v19.3.0-7699-g5b16367c7b
  • Upkeep Timestamp changed from 2025-07-09T14:05:11+00:00 to 2025-07-14T17:41:16+00:00
Actions #16

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~1081
  • Upkeep Timestamp changed from 2025-07-14T17:41:16+00:00 to 2025-11-01T00:58:05+00:00
Actions #17

Updated by Igor Golikov 2 months ago

  • Status changed from Pending Backport to Resolved
Actions

Also available in: Atom PDF