Skip to content

mon, osd, *: expose upmap-primary in OSDMap::get_features()#57776

Merged
ljflores merged 1 commit intoceph:mainfrom
rzarzynski:wip-bug-61948
Jul 24, 2024
Merged

mon, osd, *: expose upmap-primary in OSDMap::get_features()#57776
ljflores merged 1 commit intoceph:mainfrom
rzarzynski:wip-bug-61948

Conversation

@rzarzynski
Copy link
Contributor

@rzarzynski rzarzynski commented May 29, 2024

This is a minimal fix to ensure only peers understanding pg-upmap-primary are able to connect, and thus to exclude the possibility of running into the pg_upmap_primaries.empty() assertion in encoders.

Fixes for other problems will follow up.

The intention is to ship this patch in the very next minor release of reef.

Fixes: https://tracker.ceph.com/issues/61948

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@rzarzynski rzarzynski requested a review from JoshSalomon May 29, 2024 19:35
@rzarzynski rzarzynski requested a review from a team as a code owner May 29, 2024 19:35
@github-actions github-actions bot added the core label May 29, 2024
@rzarzynski rzarzynski requested review from idryomov and ljflores May 29, 2024 19:35
Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As automated test coverage is clearly lacking, it would be nice if the list of test cases that were executed manually on this PR was posted for the record.

@rzarzynski
Copy link
Contributor Author

  • Added a new commit that introduces CEPH_FEATUREMASK_SERVER_REEF to ceph_release_features().
  • Only changed the commit description of the original commit to convey the manual testing steps. No other changes.

@JoshSalomon
Copy link
Contributor

lgtm

Copy link
Contributor

@idryomov idryomov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only changed the commit description of the original commit to convey the manual testing steps.

I expected to see it in a comment, but since you added it to the commit messages to be preserved in git, some nits (feel free to ignore!):

  • *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** and 2024-05-30T08:56:41.464+0000 7feb3d44a700 -1 WARNING: all dangerous and experimental features are enabled. lines are unnecessary clutter
  • Error EINVAL: osd.1 is already primary for pg 1.0 output doesn't have a corresponding command (probably a leftover from removing a command with a typo?)
  • in the second commit message, it would be nice to include ceph daemon mon.a sessions output with the quincy client or mention that it's a continuation of the test from the previous commit with ceph -w from quincy still running

@ljflores
Copy link
Member

ljflores commented Jun 4, 2024

@rzarzynski can you stage a Reef backport too? We cherry-picked it on the reef-release, but we should also merge this to Reef (unless I'm mistaken)

@idryomov
Copy link
Contributor

idryomov commented Jun 5, 2024

@rzarzynski can you stage a Reef backport too? We cherry-picked it on the reef-release, but we should also merge this to Reef (unless I'm mistaken)

@ljflores Did you see #57794?

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

This is a minimal fix to ensure only peers understanding
`pg-upmap-primary` are able to connect, and thus to exclude
the possibility of running into the `pg_upmap_primaries.empty()`
assertion in encoders.

Fixes for other problems will follow up.

The intention is to ship this patch in the very next minor
release of reef.

Manual testing
--------------

\### start using upmap-primar is presence of `quincy` client
NOTE: incompatible clients aren't disconnected but this is
known and expected as we lack the machinery.

```
[rzarzynski@o06 build]$ bin/ceph osd get-require-min-compat-client
reef
[rzarzynski@o06 build]$ bin/ceph daemon mon.a sessions | jq  -jr '.[] | .name, "\t", .con_features, "\t", .con_features_hex, "\n"' | grep client
client.?        4540701547738038271     3f03cffffffdffff
client.?        4540138320759226367     3f01cfbf7ffdffff
[rzarzynski@o06 build]$ bin/ceph osd pool create test_pool 1 1
pool 'test_pool' created
[rzarzynski@o06 build]$ bin/ceph osd pg-upmap-primary 1.0 2
change primary for pg 1.0 to osd.2
[rzarzynski@o06 build]$ bin/ceph daemon mon.a sessions | jq  -jr '.[] | .name, "\t", .con_features, "\t", .con_features_hex, "\n"' | grep client
client.?        4540701547738038271     3f03cffffffdffff
client.?        4540138320759226367     3f01cfbf7ffdffff
```

\### `main` client is still able to connect
```
[rzarzynski@o06 build]$ bin/ceph -w
  cluster:
    id:     d570a7c-84ca-4fd0-aafb-80138762c6af
    health: HEALTH_WARN
            11 mgr modules have failed dependencies
            1 pool(s) do not have an application enabled

  services:
    mon: 1 daemons, quorum a (age 64m)
    mgr: x(active, since 64m)
    osd: 3 osds: 3 up (since 64m), 3 in (since 64m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean
```

\### `quincy` client is refused
```
[rzarzynski@o06 build-quincy]$ bin/ceph -s -c /home/rzarzynski/ceph2/build/ceph.conf
2024-05-30T08:59:42.411+0000 7f0911a9b700 -1 --2- 127.0.0.1:0/2812481872 >> [v2:127.0.0.1:40536/0,v1:127.0.0.1:40537/0] conn(0x7f090c111500 0x7f090c1118f0 secure :-1 s=SESSION_CONNECTING pgs=0 cs=0 l=0 rev1=1 crypto rx=0x7f08fc0048c0 tx=0x7f08fc009e30 comp rx=0 tx=0).handle_ident_missing_features client does not support all server features: 80000000
2024-05-30T08:59:42.612+0000 7f0911a9b700  0 --2- 127.0.0.1:0/2812481872 >> [v2:127.0.0.1:40536/0,v1:127.0.0.1:40537/0] conn(0x7f090c111500 0x7f090c1118f0 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).send_auth_request get_initial_auth_request returned -2
```

\### stop using upmap-primary
```
[rzarzynski@o06 build]$ bin/ceph osd rm-pg-upmap-primary 1.0
clear 1.0 pg_upmap_primary mapping
```

\### `quincy` client may connect again
```
[rzarzynski@o06 build-quincy]$ bin/ceph -s -c /home/rzarzynski/ceph2/build/ceph.conf
  cluster:
    id:     d570a7c-84ca-4fd0-aafb-80138762c6af
    health: HEALTH_WARN
            11 mgr modules have failed dependencies
            1 pool(s) do not have an application enabled

  services:
    mon: 1 daemons, quorum a (age 77m)
    mgr: x(active, since 77m)
    osd: 3 osds: 3 up (since 76m), 3 in (since 76m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean

```

Fixes: https://tracker.ceph.com/issues/61948
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
@ljflores
Copy link
Member

jenkins test make check

@ljflores
Copy link
Member

jenkins test api

@ljflores
Copy link
Member

jenkins test make check

@ljflores
Copy link
Member

@ljflores ljflores added the needs-squid-backport PR needs a squid backport label Jul 23, 2024
@yuriw
Copy link
Contributor

yuriw commented Jul 23, 2024

@rzarzynski @ljflores pls merge when all checks passed
ref: https://tracker.ceph.com/issues/66706

@ljflores
Copy link
Member

jenkins test make check

@ljflores ljflores merged commit baaf7c8 into ceph:main Jul 24, 2024
NitzanMordhai pushed a commit to NitzanMordhai/ceph that referenced this pull request Aug 1, 2024
mon, osd, *: expose upmap-primary in OSDMap::get_features()
sp98 added a commit to sp98/ocs-operator that referenced this pull request Aug 13, 2024
upmap-read mode of the balancer isn't supported by the kernel client.
There was a recent change on the RADOS side which actually enforced this,
see ceph/ceph#57776. Prior to 18.2.4 and 19.1.1 all sorts of incompatibility
checks were missing in this area. So we need to revert upmap-read mode for now.

Signed-off-by: sp98 <sapillai@redhat.com>
sp98 added a commit to sp98/ocs-operator that referenced this pull request Aug 13, 2024
upmap-read mode of the balancer isn't supported by the kernel client.
There was a recent change on the RADOS side which actually enforced this
in ceph/ceph#57776.Prior to 18.2.4 and 19.1.1 all sorts of incompatibility
checks were missing in this area. So we need to revert upmap-read mode for now.

Signed-off-by: sp98 <sapillai@redhat.com>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ocs-operator that referenced this pull request Aug 13, 2024
upmap-read mode of the balancer isn't supported by the kernel client.
There was a recent change on the RADOS side which actually enforced this
in ceph/ceph#57776.Prior to 18.2.4 and 19.1.1 all sorts of incompatibility
checks were missing in this area. So we need to revert upmap-read mode for now.

Signed-off-by: sp98 <sapillai@redhat.com>
ShravaniVangur pushed a commit to ShravaniVangur/ocs-operator that referenced this pull request Aug 16, 2024
upmap-read mode of the balancer isn't supported by the kernel client.
There was a recent change on the RADOS side which actually enforced this
in ceph/ceph#57776.Prior to 18.2.4 and 19.1.1 all sorts of incompatibility
checks were missing in this area. So we need to revert upmap-read mode for now.

Signed-off-by: sp98 <sapillai@redhat.com>
rewantsoni pushed a commit to rewantsoni/ocs-operator that referenced this pull request Sep 4, 2024
upmap-read mode of the balancer isn't supported by the kernel client.
There was a recent change on the RADOS side which actually enforced this
in ceph/ceph#57776.Prior to 18.2.4 and 19.1.1 all sorts of incompatibility
checks were missing in this area. So we need to revert upmap-read mode for now.

Signed-off-by: sp98 <sapillai@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants