Skip to content

mon: validate also mons and osds on {rm-,}pg-upmap-primary#57843

Merged
ljflores merged 1 commit intoceph:mainfrom
rzarzynski:wip-bug-66329
Jul 31, 2024
Merged

mon: validate also mons and osds on {rm-,}pg-upmap-primary#57843
ljflores merged 1 commit intoceph:mainfrom
rzarzynski:wip-bug-66329

Conversation

@rzarzynski
Copy link
Contributor

@rzarzynski rzarzynski commented Jun 3, 2024

Fixes: https://tracker.ceph.com/issues/66329

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Fixes: https://tracker.ceph.com/issues/66329
Signed-off-by: Radosław Zarzyński <rzarzyns@redhat.com>
@ljflores ljflores self-requested a review June 17, 2024 17:40
@ljflores
Copy link
Member

Note for @ljflores: Need to review this and go over manual testing with @rzarzynski

@rzarzynski
Copy link
Contributor Author

Report from manual testing

On Jul 10th me and @ljflores confirmed the patch takes effect. In a vanilla, quincy-based vstart.sh cluster we initially upgraded one monitor and followed with other two while ensuring bin/ceph osd pg-upmap-primary 1.0 0 fails in a presence of quincy monitor or OSD.

After upgrading leader monitor

[rzarzynski@o06 build]$ bin/ceph osd pg-upmap-primary 1.0 0                                                                                                                                                       
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-07-10T16:03:18.266+0000 7f14d0389700 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
2024-07-10T16:03:18.274+0000 7f14d0389700 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
Error ENOTSUP: features 144115190491774976 unsupported by: the monitor cluster, osd.0, osd.1, osd.2

After upgrading all monitors

[rzarzynski@o06 build]$ ps waxu | grep ceph-mon                                                                                                                                                                   
rzarzyn+ 3304291  0.3  0.0 354736 68084 ?        Ssl  14:38   0:18 /home/rzarzynski/ceph3/build/bin/ceph-mon -i b -c /home/rzarzynski/ceph3/build/ceph.conf                                                       
rzarzyn+ 3304332  0.3  0.0 352680 65352 ?        Ssl  14:38   0:18 /home/rzarzynski/ceph3/build/bin/ceph-mon -i c -c /home/rzarzynski/ceph3/build/ceph.conf                                                       
rzarzyn+ 3309147  0.5  0.0 387792 59848 ?        SLsl 15:58   0:01 /home/rzarzynski/ceph1/build/bin/ceph-mon -i a -c /tmp/ceph-main.conf                                                                          
rzarzyn+ 3309653  0.0  0.0   9220  1104 pts/6    S+   16:05   0:00 grep --color=auto ceph-mon
[rzarzynski@o06 build]$ kill 3304332
[rzarzynski@o06 build]$ LD_LIBRARY_PATH="/home/rzarzynski/ceph1/build/lib" /home/rzarzynski/ceph1/build/bin/ceph-mon -i c -c /tmp/ceph-main.conf                                                                  
2024-07-10T16:05:40.863+0000 7f22de032cc0 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
2024-07-10T16:05:40.863+0000 7f22de032cc0 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
2024-07-10T16:05:40.864+0000 7f22de032cc0 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
[rzarzynski@o06 build]$ kill 3304291
[rzarzynski@o06 build]$ LD_LIBRARY_PATH="/home/rzarzynski/ceph1/build/lib" /home/rzarzynski/ceph1/build/bin/ceph-mon -i b -c /tmp/ceph-main.conf                                                                  
2024-07-10T16:05:55.328+0000 7f0b03b6ccc0 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
2024-07-10T16:05:55.328+0000 7f0b03b6ccc0 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
2024-07-10T16:05:55.329+0000 7f0b03b6ccc0 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
[rzarzynski@o06 build]$ ps waxu | grep ceph-mon                                                                                                                                                                   
rzarzyn+ 3309147  0.5  0.0 387792 61160 ?        SLsl 15:58   0:02 /home/rzarzynski/ceph1/build/bin/ceph-mon -i a -c /tmp/ceph-main.conf                                                                          
rzarzyn+ 3309656  0.6  0.0 370260 43632 ?        SLsl 16:05   0:00 /home/rzarzynski/ceph1/build/bin/ceph-mon -i c -c /tmp/ceph-main.conf                                                                          
rzarzyn+ 3309700  2.0  0.0 370240 41868 ?        Ssl  16:05   0:00 /home/rzarzynski/ceph1/build/bin/ceph-mon -i b -c /tmp/ceph-main.conf
[rzarzynski@o06 build]$ bin/ceph osd pg-upmap-primary 1.0 0                                                                                                                                                       
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-07-10T16:06:15.922+0000 7f70f2501700 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
2024-07-10T16:06:15.929+0000 7f70f2501700 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
Error ENOTSUP: features 144115190491774976 unsupported by: osd.0, osd.1, osd.2
[rzarzynski@o06 build]$ bin/ceph osd get-require-min-compat-client                                                                                                                                                
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
2024-07-10T16:06:28.447+0000 7f964ccb8700 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
2024-07-10T16:06:28.452+0000 7f964ccb8700 -1 WARNING: all dangerous and experimental features are enabled.                                                                                                        
reef

@rzarzynski rzarzynski marked this pull request as ready for review July 13, 2024 16:11
@rzarzynski rzarzynski requested a review from a team as a code owner July 13, 2024 16:11
Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! I think this is the last of the three "shouldn't be hard to fix" items that I highlighted in https://tracker.ceph.com/issues/66260#note-2.

@idryomov
Copy link
Contributor

I think this is the last of the three "shouldn't be hard to fix" items that I highlighted in https://tracker.ceph.com/issues/66260#note-2.

Are we tracking the "make the balancer turn the feature bit on as soon as it's enabled even if no balancing is required at that very moment (i.e. separately from ceph osd pg-upmap-primary command)" item that came up in the CDM?

@rzarzynski
Copy link
Contributor Author

rzarzynski commented Jul 25, 2024

Hi Ilya (@idryomov),

Are we tracking the "make the balancer turn the feature bit on as soon as it's enabled even if no balancing is required at that very moment (...)?

My stance is it's already tracked by: https://tracker.ceph.com/issues/66260.

This problem goes far deeper with both new & old balancers being merely instances of it.
Imagine an operator who uses the osdmaptool instead. The logic is the same: remove old and add new upmaps (only if necessary) potentially leading to OSDMap::get_features() ceasing the features bits from being required while the operator's intention hasn't changed at all (the no-need-to-rebalance case).

This isn't specific to upmaps. Strictly speaking, operators should be cautious even when recreating / removing pools or crush rules. For instance, when last MSR rule is deleted, older clients become allowed to connect even if it's expected another MSR rule will be added shortly. To keep requiring the feature bit, operators should always leave at least one rule, probably even a dummy.

Yet, this isn't specific to MSR. The behavior dates back to the prehistory:

uint64_t OSDMap::get_features(int entity_type, uint64_t *pmask) const
{   
  // ...
  if (crush->has_v4_buckets())
    features |= CEPH_FEATURE_CRUSH_V4;

  // ...
  if (!pg_upmap.empty() || !pg_upmap_items.empty() || !pg_upmap_primaries.empty())
    features |= CEPH_FEATUREMASK_OSDMAP_PG_UPMAP;
bool CrushWrapper::has_v4_buckets() const
{
  for (int i=0; i<crush->max_buckets; ++i) {
    crush_bucket *b = crush->buckets[i];
    if (!b)
      continue;
    if (b->alg == CRUSH_BUCKET_STRAW2)
      return true;
  }
  return false;
}

If the initial intention behind ceph osd set-require-min-compat-client had been just to audit the clients, the design of feature management would have lacked a measure for operators to express the intention to use which can depart from using at a moment to a great extent.

Having dedicated pieces of information in the cluster state to track the intention towards every feature seems overkill. Daemons are also not a concern – entities operators lack full control over are clients. Therefore compression to client release looks like useful simplification which fits set-require-min-compat-client and would rectify another aspect that makes the mechanism, in its current shape, dysfunctional: lack of eviction of connected-but-incompliant clients.

set-require-min-compat-client ensures there is no mon session established with an incompliant client. The problem is it does the check only at the exact moment of calling the command which, again, can be far away from actually using the feature. If it had the enforcement in place, there wouldn't been the need for eviction. Unfortunately, there is potentially long gap giving incompliant clients plenty of time to connect. To address this through eviction, the junction point with messenger would need to be far more complex than it currently is:

void OSD::check_osdmap_features()
{
  // adjust required feature bits?

  // we have to be a bit careful here, because we are accessing the
  // Policy structures without taking any lock.  in particular, only
  // modify integer values that can safely be read by a racing CPU.
  // since we are only accessing existing Policy structures a their
  // current memory location, and setting or clearing bits in integer
  // fields, and we are the only writer, this is not a problem.

  const auto osdmap = get_osdmap();
  {
    Messenger::Policy p = client_messenger->get_default_policy();
    uint64_t mask;
    uint64_t features = osdmap->get_features(entity_name_t::TYPE_CLIENT, &mask);
    if ((p.features_required & mask) != features) {
      dout(0) << "crush map has features " << features
              << ", adjusting msgr requires for clients" << dendl;
      p.features_required = (p.features_required & ~mask) | features;
      client_messenger->set_default_policy(p);
    }
  }
  // ...
}

Then multiply this by all the client-talking daemons...

Finally, enforcing the policy fits set-require-min-compat-client so appropriately and intuitively that I'm entirely not surprised this is exactly how its underlying machinery has been documented ("minimum client version that can connect to the cluster"). The simplicity of implementation (gluing SERVER_<RELEASE> bits to OSDMap::get_features()) and saving a lot of complexity only contributes to the impression those deficiencies should have been addressed there.

Regards,
Radek

@ljflores
Copy link
Member

Hey @rzarzynski, this failure I found on teuthology might be related:

I found a coredump on one of the teuthology tests:

/a/yuriw-2024-07-23_19:38:12-rados-wip-yuri5-testing-2024-07-23-0804-distro-default-smithi/7814604/remote/smithi151/coredump
The last action that was performed in the test was getting the osdmap:

2024-07-24T01:14:02.849 INFO:tasks.workunit.client.0.smithi151.stderr:+ tmp_map=/tmp/tmp.9hvNwfqxL7
2024-07-24T01:14:02.856 INFO:tasks.workunit.client.0.smithi151.stderr:+ ceph osd getmap 216 -o /tmp/tmp.9hvNwfqxL7
...
c009d810 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=1).operator() setting up a delay queue
2024-07-24T01:14:03.992 INFO:tasks.workunit.client.0.smithi151.stderr:+ return 1

And the coredump file is available in teuthology.

@ljflores
Copy link
Member

The above is not related ^

@ljflores
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants