
reef: mon, osd, *: expose upmap-primary in OSDMap::get_features() #57794

Merged
ljflores merged 2 commits into ceph:reef from rzarzynski:wip-bug-61948-reef-backport on Jun 5, 2024


@rzarzynski (Contributor)

This is the reef backport of #57776.
Backport tracker: https://tracker.ceph.com/issues/66299.
Parent tracker: https://tracker.ceph.com/issues/61948.


This is a minimal fix to ensure that only peers understanding
`pg-upmap-primary` are able to connect, thus ruling out
the possibility of hitting the `pg_upmap_primaries.empty()`
assertion in the encoders.

Fixes for other problems will follow.

The intention is to ship this patch in the very next minor
release of reef.

Manual testing
--------------

\### start using upmap-primary in the presence of a `quincy` client
NOTE: already-connected incompatible clients aren't disconnected; this is
known and expected, as we lack the machinery for it.

```
[rzarzynski@o06 build]$ bin/ceph osd get-require-min-compat-client
reef
[rzarzynski@o06 build]$ bin/ceph daemon mon.a sessions | jq  -jr '.[] | .name, "\t", .con_features, "\t", .con_features_hex, "\n"' | grep client
client.?        4540701547738038271     3f03cffffffdffff
client.?        4540138320759226367     3f01cfbf7ffdffff
[rzarzynski@o06 build]$ bin/ceph osd pool create test_pool 1 1
pool 'test_pool' created
[rzarzynski@o06 build]$ bin/ceph osd pg-upmap-primary 1.0 2
change primary for pg 1.0 to osd.2
[rzarzynski@o06 build]$ bin/ceph daemon mon.a sessions | jq  -jr '.[] | .name, "\t", .con_features, "\t", .con_features_hex, "\n"' | grep client
client.?        4540701547738038271     3f03cffffffdffff
client.?        4540138320759226367     3f01cfbf7ffdffff
```

\### `main` client is still able to connect
```
[rzarzynski@o06 build]$ bin/ceph -w
  cluster:
    id:     d570a7c-84ca-4fd0-aafb-80138762c6af
    health: HEALTH_WARN
            11 mgr modules have failed dependencies
            1 pool(s) do not have an application enabled

  services:
    mon: 1 daemons, quorum a (age 64m)
    mgr: x(active, since 64m)
    osd: 3 osds: 3 up (since 64m), 3 in (since 64m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean
```

\### `quincy` client is refused
```
[rzarzynski@o06 build-quincy]$ bin/ceph -s -c /home/rzarzynski/ceph2/build/ceph.conf
2024-05-30T08:59:42.411+0000 7f0911a9b700 -1 --2- 127.0.0.1:0/2812481872 >> [v2:127.0.0.1:40536/0,v1:127.0.0.1:40537/0] conn(0x7f090c111500 0x7f090c1118f0 secure :-1 s=SESSION_CONNECTING pgs=0 cs=0 l=0 rev1=1 crypto rx=0x7f08fc0048c0 tx=0x7f08fc009e30 comp rx=0 tx=0).handle_ident_missing_features client does not support all server features: 80000000
2024-05-30T08:59:42.612+0000 7f0911a9b700  0 --2- 127.0.0.1:0/2812481872 >> [v2:127.0.0.1:40536/0,v1:127.0.0.1:40537/0] conn(0x7f090c111500 0x7f090c1118f0 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).send_auth_request get_initial_auth_request returned -2
```

\### stop using upmap-primary
```
[rzarzynski@o06 build]$ bin/ceph osd rm-pg-upmap-primary 1.0
clear 1.0 pg_upmap_primary mapping
```

\### `quincy` client may connect again
```
[rzarzynski@o06 build-quincy]$ bin/ceph -s -c /home/rzarzynski/ceph2/build/ceph.conf
  cluster:
    id:     d570a7c-84ca-4fd0-aafb-80138762c6af
    health: HEALTH_WARN
            11 mgr modules have failed dependencies
            1 pool(s) do not have an application enabled

  services:
    mon: 1 daemons, quorum a (age 77m)
    mgr: x(active, since 77m)
    osd: 3 osds: 3 up (since 76m), 3 in (since 76m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 300 GiB / 303 GiB avail
    pgs:     1 active+clean

```

Fixes: https://tracker.ceph.com/issues/61948
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 5dbb7c4)

Conflicts:
	src/osd/OSDMap.cc -- reef lacks MSR

Unit testing
-------------
```
[rzarzynski@o06 build]$ bin/unittest_features
...
[ RUN      ] features.release_features
1 argonaut features 0x40000 looks like argonaut
2 bobtail features 0x40000 looks like argonaut
3 cuttlefish features 0x40000 looks like argonaut
4 dumpling features 0x42040000 looks like dumpling
5 emperor features 0x42040000 looks like dumpling
6 firefly features 0x20842040000 looks like firefly
7 giant features 0x20842040000 looks like firefly
8 hammer features 0x1020842040000 looks like hammer
9 infernalis features 0x1020842040000 looks like hammer
10 jewel features 0x401020842040000 looks like jewel
11 kraken features 0xc01020842040000 looks like kraken
12 luminous features 0xe01020842240000 looks like luminous
13 mimic features 0xe01020842240000 looks like luminous
14 nautilus features 0xe01020842240000 looks like luminous
15 octopus features 0xe01020842240000 looks like luminous
16 pacific features 0xe01020842240000 looks like luminous
17 quincy features 0xe01020842240000 looks like luminous
18 reef features 0xe010208d2240000 looks like reef
19 squid features 0xe010208d2240000 looks like reef
[       OK ] features.release_features (0 ms)
```

Manual testing
--------------
\### `quincy` client connected to `main` cluster
A `ceph -w` from `quincy` was running in the background.

```
[rzarzynski@o06 build]$ bin/ceph osd set-require-min-compat-client reef
Error EPERM: cannot set require_min_compat_client to reef: 1 connected client(s) look like luminous (missing 0x80000000); add --yes-i-really-mean-it to do it anyway
```

\### Only `main` clients connected to `main` cluster
```
[rzarzynski@o06 build]$ bin/ceph osd get-require-min-compat-client
luminous
[rzarzynski@o06 build]$ bin/ceph daemon mon.a sessions | jq  -jr '.[] | .name, "\t", .con_features, "\t", .con_features_hex, "\n"' | grep client
client.?        4540701547738038271     3f03cffffffdffff
client.?        4540701547738038271     3f03cffffffdffff
[rzarzynski@o06 build]$ bin/ceph osd set-require-min-compat-client reef
set require_min_compat_client to reef
```

Fixes: https://tracker.ceph.com/issues/61948
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 4d74ff6)
@rzarzynski rzarzynski requested a review from a team as a code owner May 30, 2024 14:36
@rzarzynski rzarzynski requested review from JoshSalomon and removed request for a team May 30, 2024 14:36
@github-actions github-actions bot added this to the reef milestone May 30, 2024
@idryomov (Contributor) left a comment:


LGTM, although I was a bit surprised by the parent tracker getting moved to Pending Backport prior to merge.

I didn't mention this on the main PR since it doesn't strictly apply there, but we need a big release note for reef. Do you want to handle it in this PR or defer to #56932?

@idryomov idryomov modified the milestones: reef, v18.2.3 May 30, 2024
@rzarzynski (Contributor, Author) commented May 30, 2024

#56932 already needs to be altered. Let's piggyback on it.

Sorry for the surprise; I wanted to get the backport tickets. As this is urgent (with CentOS 8 going EOL), I'm doing QA in parallel.

@rzarzynski (Contributor, Author)

Looks unrelated:

1 tests failed out of 282

Total Test time (real) = 2446.96 sec

The following tests FAILED:
	228 - unittest_rgw_lua (Failed)

@rzarzynski (Contributor, Author)

jenkins test make check

@ljflores (Member) commented Jun 5, 2024

@ljflores ljflores merged commit 39da09c into ceph:reef Jun 5, 2024
@idryomov idryomov mentioned this pull request Jun 5, 2024