Bug #66260
openmon, osd, *: require-min-compat-client is not really honored
0%
Description
Problem description¶
/**
* gets the required minimum *client* version that can connect to the cluster.
*/
ceph_release_t get_require_min_compat_client() const;
maincluster,ceph -wfrommain+ceph -wfromreef[rzarzynski@o06 build]$ bin/ceph daemon mon.a sessions | jq -jr '.[] | .name, "\t", .con_features, "\t", .con_features_hex, "\n"' | grep client *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** client.? 4540701547738038271 3f03cffffffdffff client.? 4540701547738038271 3f03cffffffdffff client.? 4540138322906710015 3f01cfbffffdffff
- enforcing
>= squidclients[rzarzynski@o06 build]$ bin/ceph osd set-require-min-compat-client squid *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 2024-05-24T13:05:03.122+0000 7f540283f700 -1 WARNING: all dangerous and experimental features are enabled. 2024-05-24T13:05:03.131+0000 7f540283f700 -1 WARNING: all dangerous and experimental features are enabled. set require_min_compat_client to squid [rzarzynski@o06 build]$ echo $? 0 [rzarzynski@o06 build]$ bin/ceph osd get-require-min-compat-client *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 2024-05-24T13:05:26.780+0000 7f06401e8700 -1 WARNING: all dangerous and experimental features are enabled. 2024-05-24T13:05:26.787+0000 7f06401e8700 -1 WARNING: all dangerous and experimental features are enabled. squid
- after the enforcement
The reef'sceph -wis still perfectly able to connect to the squid cluster :-/[rzarzynski@o06 build]$ bin/ceph daemon mon.a sessions | jq -jr '.[] | .name, "\t", .con_features, "\t", .con_features_hex, "\n"' | grep client *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** client.? 4540701547738038271 3f03cffffffdffff client.? 4540701547738038271 3f03cffffffdffff client.? 4540138322906710015 3f01cfbffffdffff
RCA¶
OSDMap::require_min_compat_client from the very beginning (2017; see https://github.com/ceph/ceph/pull/14959) lacks the junction point with features_required of messenger (please see e.g. OSD::check_osdmap_features()).
Impact¶
We don't really have the machinery to let operators control which client version can connect to a cluster.
This shortcoming becomes more painful as the pg-upmap-primary feature is built upon the assumption require_min_compat_client truly enforces the version – turning the management commands on for primary-upmap-primary solely depends on it.
Other features might be affected as well. However, this is just an addition to the primal absence of configurable client version control.
Updated by Radoslaw Zarzynski almost 2 years ago
- Severity changed from 3 - minor to 2 - major
Updated by Ilya Dryomov almost 2 years ago · Edited
Radoslaw Zarzynski wrote:
This shortcoming becomes more painful as the
pg-upmap-primaryfeature is built upon the assumptionrequire_min_compat_clienttruly enforces the version – turning the management commands on forprimary-upmap-primarysolely depends on it.
This assumption is wrong, but it shouldn't be hard to fix the read balancer.
"ceph osd set-require-min-compat-client <release>" is working as intended -- it records the fact that the operator is OK with a set features corresponding to <release> potentially getting enabled in the future. At the time it runs, it checks whether any clients that don't support any of those features are connected but this check can be overridden with --yes-i-really-mean-it.
The enforcement (that can't be overridden) is supposed to come in when a particular feature from that set is actually enabled.
Other features might be affected as well. However, this is just an addition to the primal absence of configurable client version control.
This is on purpose because the kernel client doesn't really have a concept of a Ceph version/release and needs the ability to "cherry pick" feature bits. Our recent tendency to cram everything under a single feature bit corresponding to a Ceph release somewhat muddied the waters, but require-min-compat-client predates that.
I'm pasting a slightly redacted email that I wrote on this topic last month in response to another report of someone hitting ceph_assert(pg_upmap_primaries.empty()) in src/osd/OSDMap.cc:
[...] but the authors of the read balancer seem to have gotten confused
by what "ceph osd set-require-min-compat-client <release>" actually does.
Looking at https://docs.ceph.com/en/latest/rados/operations/read-balancer/
that you linked, it says:
To allow use of the new feature on an existing cluster, you must
restrict the cluster to supporting only Reef (and newer) clients. To
do so, run the following command:$ ceph osd set-require-min-compat-client reef
This command will fail if any pre-Reef clients or daemons are
connected to the monitors.
First, I'm pretty sure that the last sentence isn't true. It appears
that the intent was to piggyback on SERVER_REEF feature bit, but
ceph_release_features() wasn't updated accordingly. Since
ceph_release_features() never adds SERVER_REEF feature bit to the
returned bitmask, "ceph osd set-require-min-compat-client reef" command
would succeed even if there are older clients connected at the time the
command is run.
Second, "ceph osd set-require-min-compat-client <release>" doesn't
enable any features and doesn't mark any cluster-wide feature bits as
required for connecting, so no restriction on which clients can connect
is put in place at that time. The only thing that "ceph osd
set-require-min-compat-client <release>" does is record the fact that
the operator is OK with older clients potentially not being able to
connect in the future. The restriction is supposed to be put in place
only when (one of) the feature(s) covered by <release> is enabled and
starts getting used. In the case of the read balancer, I would expect
this to happen after one of "ceph balancer mode read" or "ceph balancer
mode upmap-read" commands is run since only these modes need it:
Both read and upmap-read mode make use of pg-upmap-primary. In order
to use pg-upmap-primary, the cluster cannot have any pre-Reef
clients.
So the underlying issue is that SERVER_REEF is never added to the
bitmask of required feature bits in that case. I suspect a big part of
it is that the new pg_upmap_primaries field is lumped together with the
older pg_upmap and pg_upmap_items fields in OSDMap::get_features():
if (!pg_upmap.empty() || !pg_upmap_items.empty() || !pg_upmap_primaries.empty())
features |= CEPH_FEATUREMASK_OSDMAP_PG_UPMAP;
mask |= CEPH_FEATUREMASK_OSDMAP_PG_UPMAP;
Here, non-empty pg_upmap_primaries map should probably lead to
SERVER_REEF being added instead. And this TODO:
//TODO: Should I add feature and test for upmap-primary?
err = check_cluster_features(min_feature, ss);
in "ceph osd pg-upmap-primary" command handler has a lot to do with it too.
So far I have spotted three different places which need fixing, but there might be more.
Updated by Ilya Dryomov almost 2 years ago
- Related to Bug #61948: Failed assert "pg_upmap_primaries.empty()" in the read balancer added
Updated by Radoslaw Zarzynski almost 2 years ago · Edited
I disagree. I think there are two levers for controlling which clients can connect to a cluster:
OSDMap::get_features()with per-feature granularity,set-require-min-compat-clientwith per-release granularity.
"ceph osd set-require-min-compat-client <release>" is working as intended -- it records the fact that the operator is OK with a set features corresponding to <release> potentially getting enabled in the future.
The documentation of the method:
/** * gets the required minimum *client* version that can connect to the cluster. */ ceph_release_t get_require_min_compat_client() const;
clearly says "(...) that can connect to the cluster". It has been introduced in 2018 by https://github.com/ceph/ceph/pull/20080. Let me quote the PR's description:
librbd would like to automatically enable the new clone v2 feature (by default) if all clients are known to be mimic or later. This new API method avoids the need to send a mon command to dump the OSD map and parse the JSON for the client compat settings.
Please note the absence of the potentially and in the future wording.
The other commit of the same PR has exposed the things over the librados with the documentation:
/**
* Gets the minimum compatible client version
*
* @param cluster cluster handle
* @param[out] min_compat_client minimum compatible client version
* based upon the current features
* @param[out] require_min_compat_client required minimum client version
* based upon explicit setting
* @returns 0 on sucess, negative error code on failure
*/
CEPH_RADOS_API int rados_get_min_compatible_client(rados_t cluster,
int8_t* min_compat_client,
int8_t* require_min_compat_client);
I don't think this leaves a lot of space for discussion what the intention is.
With this being said, I agree that reglueing the pg-upmap-primary to the per-feature enforcement machinery would be pretty simple, quicker and better. I'm already testing https://github.com/ceph/ceph/commit/220a42862c079a4622e994deab77c9e5bfa8c648 and also a bigger patch for required_min_compat_client I have some worries about.
Updated by Ilya Dryomov almost 2 years ago · Edited
Radoslaw Zarzynski wrote in #note-4:
The documentation of the method:
[...]
clearly says "(...) that can connect to the cluster". It has been introduced in 2018 by https://github.com/ceph/ceph/pull/20080. Let me quote the PR's description:
librbd would like to automatically enable the new clone v2 feature (by default) if all clients are known to be mimic or later. This new API method avoids the need to send a mon command to dump the OSD map and parse the JSON for the client compat settings.
Please note the absence of the potentially and in the future wording.
Hi Radek,
https://github.com/ceph/ceph/pull/20080 simply added a getter for a pre-existing osdmap field. As such, that description, while it may have contributed to the confusion, isn't relevant for determining how "ceph osd set-require-min-compat-client <release>" was actually intended to work. However, since that PR pertains to RBD, I can fill in the blanks ;)
The only thing librbd does based on the value returned by get_require_min_compat_client() is select the default on-disk format for clone images created with an undecorated "rbd clone" command. If require_min_compat_client is set to mimic to later, we default to a v2 clone, otherwise a v1 clone is created. The clone format can also be specified explicitly for a each clone image -- "rbd clone --rbd-default-clone-format 2 parentimg@snap cloneimg" will happily create a v2 clone irrespective of what require_min_compat_client is set to. Why? Because require_min_compat_client is NOT where enforcement is supposed to happen.
The enforcement occurs on a per clone image basis based on "features" and "op_features" fields stored in the image header. A pre-mimic client would not be allowed to perform certain maintenance operations on a v2 clone, but would be allowed to open it and do I/O on it as usual -- this is by design because we want to be compatible as far back as possible. If require_min_compat_client worked the way you envision, pre-mimic clients wouldn't be allowed to even connect to the cluster, let alone open any RBD images. I hope it's obvious now that librbd would have never tied v2 clones (or anything else, really) to require_min_compat_client if that was the case.
We brought require_min_compat_client into the picture in RBD just to help with the adoption of v2 clones. Since "ceph osd set-require-min-compat-client mimic" is basically a nod from an operator that they are OK with pre-mimic clients potentially getting disrupted in the future, librbd goes ahead and bumps the default from v1 to v2. No clients, old or new, are affected at that point.
I don't think this leaves a lot of space for discussion what the intention is.
require_min_compat_client field was added in https://github.com/ceph/ceph/pull/14959 and Sage's commit message is a lot clearer:
This is a persistent record of the compatibility for old clients that
we will enforce. Once set, the mon will eventually prevent newer features
of the OSDMap or CRUSH maps from being used if they violate this
requirement.
This was done in support of pg-upmap and CRUSH chooseargs and also replaced mon_osd_allow_primary_temp and mon_osd_allow_primary_affinity options (note "allow" in the names). The intent is very clear -- prevent newer features from being enabled/used, not make older clients unable to connect to the cluster right on the spot. Taking pg-upmap as an example:
- "ceph osd set-require-min-compat-client luminous" does nothing except check whether older clients that don't support OSDMAP_PG_UPMAP and other luminous feature bits are connected (but that check can be overridden and many users indeed had to override it in the past because for a period of time the kernel client supported OSDMAP_PG_UPMAP but not one of the other luminous feature bits)
- before any "ceph osd pg-upmap ..." command is executed, require_min_compat_client is checked and if it's not luminous or newer, the command is denied (and that can NOT be overridden)
- as a result of executing "ceph osd pg-upmap ..." command, OSDMAP_PG_UPMAP feature bit may get set and only then older clients are disrupted
So no, require_min_compat_client was never intended to be a lever for controlling which clients can connect to a cluster.
Updated by Ilya Dryomov almost 2 years ago
Ilya Dryomov wrote in #note-5:
So no, require_min_compat_client was never intended to be a lever for controlling which clients can connect to a cluster.
Rather, it's intended to be a lever for controlling which features can be enabled on a cluster.
Updated by Laura Flores almost 2 years ago
- Related to Bug #66274: Online balancing with upmap-read mode performed without setting "min_compat_client" to reef added
Updated by Laura Flores almost 2 years ago · Edited
Notes from CDM per Ilya:
- require-min-compat-client is implemented how it should be
- nevertheless, documentation is confusing in some places and should be updated
- we should also raise a ticket for the "lack of client eviction" issue
- a way to address the balancer module is to add a developer mon command that can enable a particular feature bit upon setting the relevant balancer mode, as opposed to waiting for the balancer to kick in.
Updated by Laura Flores almost 2 years ago
Note from bug scrub: Laura and Radek will sync on this to decide what to do for squid.
Updated by Radoslaw Zarzynski over 1 year ago
- Priority changed from Immediate to Urgent
Lowering the priority as https://github.com/ceph/ceph/pull/57776 addressed the most immediate needs. Still, we're lacking a few moving parts the enforcing of require-min-compat-client would nicely address.
Updated by Radoslaw Zarzynski over 1 year ago
Updated by Laura Flores over 1 year ago
- Status changed from New to Pending Backport
- Backport set to squid,reef
Updated by Laura Flores over 1 year ago
- Status changed from Pending Backport to New
- Backport changed from squid,reef to squid,reef,quincy
Updated by Laura Flores over 1 year ago
@Radoslaw Zarzynski I changed the backport field to quincy- can you verify that this is what we want?
Updated by Radoslaw Zarzynski over 1 year ago
We talked about the problems coming from this ticket (see https://github.com/ceph/ceph/pull/57843#issuecomment-2250969594) during the Core CDS session. The conclusion was that using just release names is very, very imperfect -- in the past it would have caused delay in adoption of features by the kernel clients; the machinery should offer per-bit granularity.
https://pad.ceph.com/p/rados-and-crimson-idea-cephalocon2024
Updated by Konstantin Shalygin about 1 year ago
- Backport changed from squid,reef,quincy to squid,reef