mon: validate also mons and osds on {rm-,}pg-upmap-primary#57843
mon: validate also mons and osds on {rm-,}pg-upmap-primary#57843
Conversation
Fixes: https://tracker.ceph.com/issues/66329 Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
|
Note for @ljflores: Need to review this and go over manual testing with @rzarzynski |
Report from manual testingOn Jul 10th me and @ljflores confirmed the patch takes effect. In a vanilla, quincy-based After upgrading leader monitorAfter upgrading all monitors |
idryomov
left a comment
There was a problem hiding this comment.
Makes sense to me! I think this is the last of the three "shouldn't be hard to fix" items that I highlighted in https://tracker.ceph.com/issues/66260#note-2.
Are we tracking the "make the balancer turn the feature bit on as soon as it's enabled even if no balancing is required at that very moment (i.e. separately from |
|
Hi Ilya (@idryomov),
My stance is it's already tracked by: https://tracker.ceph.com/issues/66260. This problem goes far deeper with both new & old balancers being merely instances of it. This isn't specific to upmaps. Strictly speaking, operators should be cautious even when recreating / removing pools or crush rules. For instance, when last MSR rule is deleted, older clients become allowed to connect even if it's expected another MSR rule will be added shortly. To keep requiring the feature bit, operators should always leave at least one rule, probably even a dummy. Yet, this isn't specific to MSR. The behavior dates back to the prehistory: uint64_t OSDMap::get_features(int entity_type, uint64_t *pmask) const
{
// ...
if (crush->has_v4_buckets())
features |= CEPH_FEATURE_CRUSH_V4;
// ...
if (!pg_upmap.empty() || !pg_upmap_items.empty() || !pg_upmap_primaries.empty())
features |= CEPH_FEATUREMASK_OSDMAP_PG_UPMAP;bool CrushWrapper::has_v4_buckets() const
{
for (int i=0; i<crush->max_buckets; ++i) {
crush_bucket *b = crush->buckets[i];
if (!b)
continue;
if (b->alg == CRUSH_BUCKET_STRAW2)
return true;
}
return false;
}If the initial intention behind Having dedicated pieces of information in the cluster state to track the intention towards every feature seems overkill. Daemons are also not a concern – entities operators lack full control over are clients. Therefore compression to client release looks like useful simplification which fits
void OSD::check_osdmap_features()
{
// adjust required feature bits?
// we have to be a bit careful here, because we are accessing the
// Policy structures without taking any lock. in particular, only
// modify integer values that can safely be read by a racing CPU.
// since we are only accessing existing Policy structures a their
// current memory location, and setting or clearing bits in integer
// fields, and we are the only writer, this is not a problem.
const auto osdmap = get_osdmap();
{
Messenger::Policy p = client_messenger->get_default_policy();
uint64_t mask;
uint64_t features = osdmap->get_features(entity_name_t::TYPE_CLIENT, &mask);
if ((p.features_required & mask) != features) {
dout(0) << "crush map has features " << features
<< ", adjusting msgr requires for clients" << dendl;
p.features_required = (p.features_required & ~mask) | features;
client_messenger->set_default_policy(p);
}
}
// ...
}Then multiply this by all the client-talking daemons... Finally, enforcing the policy fits Regards, |
|
Hey @rzarzynski, this failure I found on teuthology might be related: I found a coredump on one of the teuthology tests: /a/yuriw-2024-07-23_19:38:12-rados-wip-yuri5-testing-2024-07-23-0804-distro-default-smithi/7814604/remote/smithi151/coredump And the coredump file is available in teuthology. |
|
The above is not related ^ |
Fixes: https://tracker.ceph.com/issues/66329
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an
xbetween the brackets:[x]. Spaces and capitalization matter when checking off items this way.Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windowsjenkins test rook e2e