Epoch filtering by leonidc · Pull Request #60871 · ceph/ceph

leonidc · 2024-11-28T05:10:59Z



src/mon/NVMeofGwMap.h

src/nvmeof/NVMeofGwMonitorClient.cc

src/mon/NVMeofGwSerialize.h

src/mon/NVMeofGwTypes.h

src/mon/NVMeofGwMon.cc

github-actions · 2024-12-19T15:43:27Z

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

src/mon/NVMeofGwMap.cc

src/mon/NVMeofGwMon.cc

src/mon/NVMeofGwMap.h

athanatos

I think the upgrade logic looks right. I'll do another review once the comments and commit structure are cleaned up.

src/mon/NVMeofGwMap.h

src/mon/NVMeofGwMon.cc

src/nvmeof/NVMeofGwMonitorClient.cc

src/nvmeof/NVMeofGwMonitorClient.h

src/test/test_nvmeof_mon_encoding.cc

src/nvmeof/NVMeofGwMonitorClient.cc

see comments in gw_epoch for details Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>

Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>

… feature Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>

since they are crashed, plan to fix this in separate commit Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>

athanatos · 2025-02-16T19:57:22Z

Drop the first commit as it is now a noop
Push the branch to ceph-ci and post the branch name here
Assuming the build goes through, I'll schedule a rados run about 8 hours from now
I'll approve it provided there aren't any problems with the rados run tomorrow or monday, depending on queue

athanatos · 2025-02-16T19:59:31Z

https://github.com/ceph/ceph-ci/pull/new/wip-leonidc-epoch-filter--centos9-only

athanatos · 2025-02-17T07:14:06Z

Can't run the rados suite with only centos9; rebuilding as wip-leonidc-epoch-filter, will schedule the test once built.

athanatos · 2025-02-17T19:56:32Z

Repushed wip-leonidc-epoch-filter with this pr and a revert for #61089 which should hopefully fix the ubuntu build failure with xmlsec.

athanatos · 2025-02-18T15:28:46Z

https://shaman.ceph.com/builds/ceph/wip-leonidc-epoch-filter-rebuild/a37eaf45c15a8d4050f6b050c2d73bec4c389335/

athanatos · 2025-02-18T15:38:47Z

http://pulpito.ceph.com/sjust-2025-02-18_15:29:41-rados-wip-leonidc-epoch-filter-rebuild-distro-default-smithi/

caroav · 2025-02-19T16:27:04Z

@athanatos I see tests are still pending. Is it expected to wait so much time?

athanatos · 2025-02-19T17:19:54Z

@caroav Infrastructure has had problems.

ronen-fr · 2025-02-24T15:35:39Z

> @athanatos I see tests are still pending. Is it expected to wait so much time?

@caroav - based on my own experience, all Teuthology jobs that were stuck due to the problems last week are doomed.
I'll try to determine the specific details used by Sam to schedule those runs, and resend.

ronen-fr · 2025-02-24T16:07:32Z

Reran some tests (all in the RADOS suite). See results at https://pulpito.ceph.com/?branch=wip-leonidc-epoch-filter-rebuild

ronen-fr · 2025-02-24T16:18:15Z

src/mon/NVMeofGwMon.cc

 }

+epoch_t NVMeofGwMon::get_ack_map_epoch(bool gw_created,
+    const NvmeGroupKey& group_key) {


(not a request to change this PR, as it's urgent - and as Sam did OK it; but:)
early return is almost always easier to read than a chain of 'if else' with logic.
I.e. - this could have been:

epoch_t ..... {
  if (!gw_created) {
    return 0;
  }
  const auto group_epoch = map.gw_epoch.find(group_key);
  if (group_epoch == map.gw_epoch.end()) {
    // feature...
    return map.epoch;
  }
  return group_epoch->second;
}

or similar
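For reference, here is a self-contained sketch of the early-return shape suggested above. It is not the code in this PR: `GwMapSketch`, the `NvmeGroupKey` alias and the free-function signature are hypothetical stand-ins, assuming only that `gw_epoch` is a map from group key to epoch and that `epoch` is the global map epoch.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for the Ceph types, for illustration only.
using epoch_t = std::uint32_t;
using NvmeGroupKey = std::pair<std::string, std::string>;  // (pool, group)

struct GwMapSketch {
  epoch_t epoch = 0;                         // global map epoch
  std::map<NvmeGroupKey, epoch_t> gw_epoch;  // per-group epochs
};

// Early-return version: each special case exits as soon as it is detected,
// so the common case is the last, unindented statement.
epoch_t get_ack_map_epoch(const GwMapSketch& map, bool gw_created,
                          const NvmeGroupKey& group_key) {
  if (!gw_created) {
    return 0;
  }
  const auto group_epoch = map.gw_epoch.find(group_key);
  if (group_epoch == map.gw_epoch.end()) {
    // No per-group epoch recorded: fall back to the global map epoch,
    // as in the snippet above.
    return map.epoch;
  }
  return group_epoch->second;
}
```

Each branch can be read on its own, without tracking which `else` it belongs to.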

ronen-fr · 2025-02-24T16:26:16Z

src/test/test_nvmeof_mon_encoding.cc

-  dout(0)   << "\n == Test GW Delete ==" << dendl;
-  pending_map.cfg_delete_gw("GW1" ,group_key);
-  dout(0) << "deleted GW1 " << pending_map << dendl;
+  //dout(0)   << "\n == Test GW Delete ==" << dendl;


assuming these would be removed before merging?

No, @leonidc wants them commented out as described in the commit message.

ronen-fr · 2025-02-25T05:06:19Z

@athanatos - only 4 tests of the 506 are still running. All 24 current failures are unrelated.
This can be merged. You are still marked down as "requiring changes". Can this be reset and the PR merged?

athanatos

Lgtm if @ronen-fr is happy with the test run!

Epoch filtering Reviewed-by: Samuel Just <sjust@redhat.com> Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com> Reviewed-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 3cdf529)

- gateway submodule

Fixes: https://tracker.ceph.com/issues/64777

This PR adds high availability support for the nvmeof Ceph service. High availability means that even if a certain GW is down, there is another available path through which the initiator can continue I/O via another GW. High availability is achieved by running an nvmeof service consisting of at least 2 nvmeof GWs in the Ceph cluster. Every GW is seen by the host (initiator) as a separate path to the nvme namespaces (volumes).

The implementation consists of the following main modules:
- NVMeofGWMon - a PaxosService. It is a monitor that tracks the status of the running nvmeof services and takes action when services fail and when they are restored.
- NVMeofGwMonitorClient - an agent that runs as part of each nvmeof GW. It sends beacons to the monitor to signal that the GW is alive. As part of the beacon, the client also sends information about the service, which the monitor uses to make decisions and perform operations.
- MNVMeofGwBeacon - a structure used by the client and the monitor to send and receive beacons.
- MNVMeofGwMap - the map tracks the status of the nvmeof GWs. It also defines the new role of every GW, so when GWs go down or are restored, the map reflects the resulting role of each GW. The map is distributed to the NVMeofGwMonitorClient on each GW, which updates the GW with the required changes.

It also adds 3 new mon commands:
- nvme-gw create
- nvme-gw delete
- nvme-gw show

The commands are used by cephadm to inform the monitor that a new GW is deployed. The monitor updates the map accordingly and starts tracking this GW until it is deleted.
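The beacon flow described above can be pictured with a small sketch. It is not the NVMeofGwMon implementation: the class name, the timeout parameter and the method names are all hypothetical, and only the idea (gateways are created by a command, report liveness through beacons, and are marked unavailable when beacons stop) comes from the description.

```cpp
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;

// Simplified availability states; names are illustrative.
enum class GwAvailability { Created, Available, Unavailable };

struct GwStateSketch {
  GwAvailability availability = GwAvailability::Created;
  Clock::time_point last_beacon{};
};

// Hypothetical monitor-side tracker, not the real PaxosService.
class BeaconTrackerSketch {
 public:
  explicit BeaconTrackerSketch(std::chrono::seconds timeout)
      : timeout_(timeout) {}

  // Analogous to "nvme-gw create": start tracking a gateway.
  void create_gw(const std::string& gw_id, Clock::time_point now) {
    GwStateSketch state;
    state.last_beacon = now;
    gws_[gw_id] = state;
  }

  // Analogous to "nvme-gw delete": stop tracking a gateway.
  void delete_gw(const std::string& gw_id) { gws_.erase(gw_id); }

  // A beacon from a known gateway marks it alive; unknown ids are ignored.
  bool handle_beacon(const std::string& gw_id, Clock::time_point now) {
    auto it = gws_.find(gw_id);
    if (it == gws_.end()) {
      return false;
    }
    it->second.last_beacon = now;
    it->second.availability = GwAvailability::Available;
    return true;
  }

  // Periodic check: gateways whose beacons stopped become unavailable.
  // Returns how many gateways changed state in this call.
  int tick(Clock::time_point now) {
    int newly_unavailable = 0;
    for (auto& entry : gws_) {
      GwStateSketch& gw = entry.second;
      if (gw.availability == GwAvailability::Available &&
          now - gw.last_beacon > timeout_) {
        gw.availability = GwAvailability::Unavailable;
        ++newly_unavailable;
      }
    }
    return newly_unavailable;
  }

  GwAvailability availability(const std::string& gw_id) const {
    return gws_.at(gw_id).availability;
  }

 private:
  std::chrono::seconds timeout_;
  std::map<std::string, GwStateSketch> gws_;
};
```

In the real service a state change would also produce a new map that is distributed to the clients; the sketch leaves that step out.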
Signed-off-by: Leonid Chernin <lechernin@gmail.com> Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit 5843c6b) mon: add NVMe-oF gateway monitor and HA doc Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit bb75dde) mgr/cephadm: ceph nvmeof monitor support Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit 2946b19) mon/NVMeofGwMap.cc: tabbing, line length, formatting - Retabs file to match emacs/vim modelines at top - Fixes bracing - Adjusts line length to 80 char Signed-off-by: Samuel Just <sjust@redhat.com> (cherry picked from commit 8bf309e) mon/NVMeofGwMap.h: tabbing, line length, formatting - Adjust method signatures to better match mon/ - Adjust line length to 80 characthers Signed-off-by: Samuel Just <sjust@redhat.com> (cherry picked from commit 58d16c7) mon/NVMeofGwMon.h: tabbing, line length, formatting Signed-off-by: Samuel Just <sjust@redhat.com> (cherry picked from commit 1f470f0) mon/NVMeofGwMon.cc: tabbing, line length, formatting - Retabs file to match emacs/vim modelines at top - Fixes bracing - Adjusts line length to 80 char Signed-off-by: Samuel Just <sjust@redhat.com> (cherry picked from commit bff9dd4) mon/NVMeofGwTypes.h: tabbing, bracing, line length fixes Signed-off-by: Samuel Just <sjust@redhat.com> (cherry picked from commit e0f0469) mon/NVMeofGwSerialize.h: tabbing, bracing, line length fixes Signed-off-by: Samuel Just <sjust@redhat.com> (cherry picked from commit d5e013f) mgr/orchestrator: require "group" field for nvmeof specs Signed-off-by: Adam King <adking@redhat.com> (cherry picked from commit f6d552d) mgr/cephadm: migrate nvmeof specs without group field As we have added the group field as a requirement for new nvmeof specs and check for it in spec validation, we need a migration to populate this field for specs we find that don't have it. 
Signed-off-by: Adam King <adking@redhat.com> (cherry picked from commit d7b00ea) mgr/cephadm: make nvme-gw adds be able to handle multiple services/groups Before this was grabbing the service spec for the first daemon description in the list. This meant every daemon would be added with the pool/group of whatever that spec happened to specify. This patch grabs the spec, and therefore also the pool/group individually for each nvmeof daemon Signed-off-by: Adam King <adking@redhat.com> (cherry picked from commit 2a6b105) qa/cephadm: add group param when applying nvmeof Since it will now be required Signed-off-by: Adam King <adking@redhat.com> (cherry picked from commit 41c5dbe) include/ceph_features: remove stray available marker Should have been removed in caa9e7a. Signed-off-by: Samuel Just <sjust@redhat.com> include/ceph_features: add NVMEOFHA feature bit Normally, we'd just use the SERVER_SQUID or SERVER_T flags instead of using an extra feature bit. However, the nvmeof ha monitor paxos service has had a more complex development journey. There are users interested in using the nvmeof ha feature in squid, but it didn't make the cutoff for backporting it. There's an upstream nvmeof-squid branch in the ceph.git repository with the patches backported for anyone interested in building it. However, that means that users of our normal stable releases will see the feature added to the monitor one release after anyone who chooses to use the nvmeof-squid branch. We could disallow upgrades from nvmeof-squid to T, but by adding a feature bit here we make such a restriction unnecessary. Signed-off-by: Samuel Just <sjust@redhat.com> mon/NVMeofGw*: support upgrades from prior out-of-tree nvmeofha implementation (nvmeof-reef) This commit adds upgrade support for users running an experimental nvmeofha implementation which can be found in the nvmeof-reef branch in ceph.git. 
Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> mon/NVMeofGw*: fixing bugs - handle gw fast-reboot, proper handle of gw delete scenarios Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> nvmeof/NVMeofGwMonitorClient: use a separate mutex for beacons Add beacon_lock to mitigate potential beacon delays caused by slow message handling, particularly in handle_nvmeof_gw_map. Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit 0dc4185) cephadm: mount nvmeof certs into container ceph@2946b19 incorrectly removed this line and since then these certs are not being properly mounted into the container. This commit adds the line back Signed-off-by: Adam King <adking@redhat.com> (cherry picked from commit 8cc3a35) qa/suites/rbd/nvmeof: add multi-subsystem setup and thrash test 1. qa/tasks/nvmeof.py: 1.1. create multiple rbd images for all subsystems 1.2. add NvmeofThrasher and ThrashTest 2. qa/tasks/mon_thrash.py: add 'switch_thrashers' option 3. nvmeof_setup_subsystem.sh: create multiple subsystems and enable HA 4. Restructure qa/suites/rbd/nvmeof: Create two sub-suites - "basic" (nvmeof_initiator job) - "thrash" (new: nvmeof_mon_thrash and nvmeof_thrash jobs) Resolves: rhbz#2302243 Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit d0c4182) Revert "mgr/orchestrator: require "group" field for nvmeof specs" This reverts commit f6d552d. It was decided by the nvmeof team to stick with defaulting to an empty string rather than forcing the users onto other non-empty names when they upgrade Signed-off-by: Adam King <adking@redhat.com> (cherry picked from commit 3e5e85a) Revert "mgr/cephadm: migrate nvmeof specs without group field" This reverts commit d7b00ea. 
It was decided by the nvmeof team to stick with defaulting to an empty string rather than forcing the users onto other non-empty names when they upgrade Signed-off-by: Adam King <adking@redhat.com> (cherry picked from commit e63d4b0) mgr/orchestrator: allow passing group to apply/add nvmeof commands We no longer require the group when applying an nvmeof spec but we still want to allow the commands to take a group parameter (and this will at least make a group name required when creating a new service on the command line) Signed-off-by: Adam King <adking@redhat.com> (cherry picked from commit b377085) mon/NVMeofGw*: Fix issue when ana group of deleted GW was not serviced. Introduced GW Deleting state Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> Resolves: rhbz#2310380 (cherry picked from commit d4f961a) mon/NVMeofGw*: 1. fix blocklist bug - blockist was not called 2. originally monitor only bloklisted specific ana groups but since we allow the changing of ns ana grp on the fly for the sake of ns load balance, it is not good enough and we need to blocklist all the cluster contexts of the failing gateway Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 936d3af) mon/NVMeofGw*: fix issue that GW was down when last subsystem was deleted Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> Resolves: rhbz#2301460 (cherry picked from commit 698e4c5) Merge pull request ceph#59999 from leonidc/tracking-gw-deleting mon/nvmeofgw*: fix tracking gateways in DELETING state Resolves: rhbz#2314625 (cherry picked from commit 381a408) Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> mgr/cephadm: change ceph-nvmeof gw image version to 1.3 Resolves: rhbz#2309667 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 783f868) mgr/cephadm: Make the discovery and gateway IPs configurable in NVMEof configuration Resolves: rhbz#2311459 (cherry picked from commit 9f6d1ec) Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> 
pybind/mgr/cephadm/services/nvmeof.py: allow setting '0.0.0.0' as address in the spec file - Partial revert of ceph@9eb3b99 - Part of ceph#59738 (cherry picked from commit 62a4247) python-common/ceph/deployment/service_spec.py: Allow the cephadm deployment to determine the default addresses Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit 0997e4c) Resolves: rhbz#2311996 (cherry picked from commit 2db7559) qa/tasks/nvmeof.py: add nvmeof gw-group to deployment Groups was made a required parameter to be `ceph orch apply nvmeof <pool> <group>` in ceph#58860. That broke the `nvmeof` suite so this PR fixes that. Right now, all gateway are deployed in a single group. Later, this would be changed to have multi groups for a better test. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit c9a6fed) qa: Expand nvmeof thrasher and add nvmeof_namespaces.yaml job 1. qa/tasks/nvmeof.py: add other methods to stop nvmeof daemons 2. add qa/workunits/rbd/nvmeof_namespace_test.sh which adds and deletes new namespaces. It is run in nvmeof_namespaces.yaml job where fio happens to other namespaces in background. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 58d8be9) qa/suites/nvmeof/basic: add nvmeof_scalability test Add test to upscale/downscale nvmeof gateways. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit e5a9cda) qa: move nvmeof shell scripts to qa/workunits/nvmeof Move all scripts qa/workunits/rbd/nvmeof_*.sh to qa/workunits/nvmeof/*.sh Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 2ed818e) Conflicts: qa/workunits/nvmeof/setup_subsystem.sh qa/suites/nvmeof: increase hosts in cluster setup In "nvmeof" task, change "client" config to "installer" which allows to take inputs like "host.a". 
nvmeof/basic: change 2-gateway-2-initiator to 4-gateway-2-inititator cluster nvmeof/thrash: change 3-gateway-1-initiator to 4-gateway-1-inititaor cluster Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 4d97b1a) qa/suites/nvmeof: add mtls test Add qa/workunits/nvmeof/mtls_test.sh which enables mtls config and redeploy, then verify and disables mtls config. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit fdc93ad) Conflicts: qa/tasks/nvmeof.py qa/suite/nvmeof/thrash: increase number of thrashing - Run fio for 15 mins (instead of 10min). - nvmeof.py: change daemon_max_thrash_times default from 3 to 5 - nvmeof.py: run nvme list in do_checks() Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 51743e6) qa/suites/nvmeof: add nvmeof warnings to log-ignorelist Add NVMEOF_SINGLE_GATEWAY and NVMEOF_GATEWAY_DOWN warnings to nvmeof:thrash job's log-ignorelist Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 73d5c01) qa/suites/nvmeof/thrash: Add "is unavailable" to log-ignorelist This commit also: - Remove --rbd_iostat from thrasher fio - Log iteration details before printing stats in nvmeof_tharsher Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit c0ca0eb) qa/tasks/nvmeof.py: Improve thrasher and rbd image creation Create rbd images in one command using ";" to queue them, instead of running "cephadm shell -- rbd create" again and again for each image. Improve the method to select to-be-thrashed daemons. Use randint() and sample(), instead of weights/skip. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 82118e1) qa/tasks/ceph: provide configuration for setting configs via mon These configs may be set using: ceph: cluster-config: entity: foo: bar same as the current: ceph: config: entity: foo: bar The configs will be set in parallel using the `ceph config set` command. 
The main benefit here is to avoid using the ceph.conf to set configs which cannot be overriden using subsequent `ceph config` command. The only way to override is to change the ceph.conf in the test (yuck) or the admin socket (which gets reset when the daemon restarts). Finally, we can now exploit the `ceph config reset` command will let us trivially rollback config changes after a test completes. That is exposed as the `ctx.config_epoch` variable. Signed-off-by: Patrick Donnelly <pdonnell@redhat.com> (cherry picked from commit 9d485ae) python-common/ceph/deployment: add SPDK log level to nvmeof configuration Fixes https://tracker.ceph.com/issues/67258 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit d3cc237) mgr/cephadm: add SPDK log level to nvmeof configuration Fixes https://tracker.ceph.com/issues/67258 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 19399de) python-common/ceph/deployment: change SPDK RPC fields in nvmeof configuration Fixes https://tracker.ceph.com/issues/67629 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit d18e6fb) mgr/cephadm: change SPDK RPC fields in nvmeof configuration Fixes https://tracker.ceph.com/issues/67629 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit d208242) python-common/ceph/deployment: revert SPDK RPC fields in nvmeof configuration Fixes https://tracker.ceph.com/issues/67844 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit cb28d39) mgr/cephadm: revert SPDK RPC fields in nvmeof configuration Fixes https://tracker.ceph.com/issues/67844 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 11de53f) python-common/ceph/deployment: Add namespace netmask parameters to nvmeof configuration Fixes https://tracker.ceph.com/issues/68542 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit dd4b357) mgr/cephadm: Add namespace netmask parameters to nvmeof 
configuration Fixes https://tracker.ceph.com/issues/68542 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 0dcc207) python-common/ceph/deployment: Add resource limits to nvmeof configuration Fixes https://tracker.ceph.com/issues/68967 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 4269d7c) mgr/cephadm: Add resource limits to nvmeof configuration Fixes https://tracker.ceph.com/issues/68967 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 1807a55) Signed-off-by: Gil Bregman <gbregman@il.ibm.com> mgr/cephadm/nvmeof: Add auto rebalance fields to NVMeOF configuration Fixes https://tracker.ceph.com/issues/69176 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit bfc8fb6) mgr/cephadm/nvmeof: Rewrite NVMEoF fields validation. Fixes https://tracker.ceph.com/issues/69176 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 31283c0) mgr/cephadm/nvmeof: Add key verification field to NVMeOF configuration Fixes https://tracker.ceph.com/issues/69413 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 26a0f9a) Signed-off-by: Gil Bregman <gbregman@il.ibm.com> pybind/mgr/orchestrator/module.py: NvmeofServiceSpec service_id - make service_id better alligned with default/empty group (ceph@f6d552d) - fix service_id in nvmeof daemon add Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit e1612d0) cephadm/nvmeof: support no huge pages for nvmeof spdk depends on: ceph/ceph-nvmeof#898 Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit 38513cb) cephadm/nvmeof: support per-node gateway addresses Added gateway and discovery address maps to the service specification. These maps store per-node service addresses. The address is first searched in the map, then in the spec address configuration. If neither is defined, the host IP is used as a fallback. 
Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit 2f47f9d) cephadm/nvmeof: fix ports when default values are overridden Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit e717a92) src/nvmeof/NVMeofGwMonitorClient: remove MDS client, not needed Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit f806872) mon: add nvmeof healthchecks Add NVMeofGwMap::get_health_checks which raises NVMEOF_SINGLE_GATEWAY if any of the groups have 1 gateway. In NVMeofGwMon, call `encode_health` and `load_health` to register healthchecks. This will add nvmeof healthchecks to "ceph health" output. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 1cad040) mon: add warning NVMEOF_GATEWAY_DOWN In src/mon/NVMeofGwMap.cc, add warning NVMEOF_GATEWAY_DOWN when any gateway is in GW_UNAVAILABLE state. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 0006599) monitoring: Add prometheus alert NVMeoFMultipleNamespacesOfRBDImage NVMeoFMultipleNamespacesOfRBDImage alerts the user if a RBD image is used for multiple namespaces. This is important alerts for cases where namespaces are created on same image for different gateway group. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 61b3289) monitoring: add 2 nvmeof alerts to prometheus_alerts.yaml - `NVMeoFMissingListener`: trigger if all listeners are not created for each gateway in a subsystem - `NVMeoFZeroListenerSubsystem`: trigger if a subsystem has no listeners Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit f02e312) monitoring: add 2 new nvmeof alerts Add NVMeoFMissingListener and NVMeoFZeroListenerSubsystem alerts to prometheus_alerts.libsonnet. 
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 7994fea) monitoring: add tests for 2 new nvmeof alerts Add test for alerts NVMeoFMissingListener and NVMeoFZeroListenerSubsystem to test_alerts.yml. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit a878460) monitoring: Add alert NVMeoFTooManyNamespaces NVMeoFTooManyNamespaces alerts the user if the total number of namespaces across subsystems exceeds 1024. Change NVMeoFTooManySubsystems limit to 128 from 16. Fixes: ceph/ceph-nvmeof#948 Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 614e146) mon/NVMeofGwMap: add healthcheck warning NVMEOF_GATEWAY_DELETING Add a warning when NVMeoF gateways are in DELETING state. This happens when there are namespaces under the deleted gateway's ANA group ID. The gateways are removed completely after users manually move these namespaces to another load balancing group, or if a new gateway is deployed on that host. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 571dd53) src/common/options/mon.yaml.in: add mon_nvmeofgw_delete_grace This config sets the delay before triggering the NVMEOF_GATEWAY_DELETING healthcheck warning, which is raised when NVMeoF gateways are in DELETING state for too long (indicating a problem in namespace load-balancing). The default value for this config is 15 mins. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 7b33f77) mon/NVMeofGwMap: add delay to NVMEOF_GATEWAY_DELETING warning Instead of immediately triggering, have this healthcheck trigger after some time has elapsed. This delay can be configured by mon_nvmeofgw_delete_grace. Track the time when gateways go into DELETING state in a new member var (of NVMeofGwMon) 'gws_deleting_time'. 
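The delayed warning described in the commit above can be sketched as follows. This is an illustration under assumptions, not the monitor's code: the function name, the gateway-id key type and the use of `std::chrono` are hypothetical; only the `gws_deleting_time` name, the grace option and its 15 minute default come from the commit messages.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Returns the ids of gateways that have been in DELETING state for longer
// than the grace period. gws_deleting_time maps a gateway id to the time
// it entered DELETING; grace plays the role of mon_nvmeofgw_delete_grace.
std::vector<std::string> gws_deleting_too_long(
    const std::map<std::string, Clock::time_point>& gws_deleting_time,
    Clock::time_point now, std::chrono::seconds grace) {
  std::vector<std::string> overdue;
  for (const auto& entry : gws_deleting_time) {
    if (now - entry.second > grace) {
      overdue.push_back(entry.first);
    }
  }
  return overdue;
}
```

A health check built on this would raise NVMEOF_GATEWAY_DELETING only when the returned list is non-empty, so a gateway that leaves DELETING within the grace period never triggers the warning.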
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 56cf512) qa/workunits/nvmeof/basic_tests.sh: fix connect-all assert There seems to be change in 'nvme list' json output which caused failures in asserts after 'nvme connect-all' command. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 22f91cd) mon/nvmeofgw*:fix monitor database corruption upon add gw Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 417c544) mon/nvmeofgw*: fix HA usecase when gateway has no listeners: behaves like no-subsystems Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 47e7a24) mon/nvmeofgw*: monitors publish in nvme-gw show ana group responsible for namespace rebalance Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit c358483) nvmeofgw* : fix publishing rebalance index Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit ceb62c0) mgr/cephadm: change ceph-nvmeof gw image version to 1.4 Fixes https://tracker.ceph.com/issues/69099 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> mon/nvme: fix unused lambda capture warnings Signed-off-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit edb0321) Add multi-cluster support (showMultiCluster=True) to alerts Following PR ceph#55495 fixing the dashboard in regards to multiple clusters storing their metrics in a single Prometheus instance, this PR addresses the issues for alerts. Fixes: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de> (cherry picked from commit 810c706) monitoring: Update nvmeof alert limits in config Update these in config.libsonnet: - NVMeoFMaxGatewaysPerGroup (4->8) - NVMeoFMaxGatewaysPerCluster (4->32) - NVMeoFMaxNamespaces (1024->2048) - NVMeoFHighClientCount (32->128) Also update prometheus_alerts.yml and test_alerts.yml accordingly. 
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit f3c1881) mon: do not show nvmeof in 'ceph versions' output NVMeoF gateway version is independent of ceph version so 'ceph version' shows wrong nvmeof version in output (i.e. instead of gateway version, it shows Ceph version). Hence, remove nvmeof in 'ceph versions' output. To check for gateway version, use 'gw info' command. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 73c935d) mgr/cephadm/nvmeof: Add verify_listener_ip field to NVMeOF configuration and remove obsolete enable_key_encryption Fixes https://tracker.ceph.com/issues/69731 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 744b04a) mgr/cephadm/nvmeof: Add max_hosts field to NVMeOF configuration and update default values Fixes https://tracker.ceph.com/issues/69759 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 0d8bd4d) mgr/cephadm/nvmeof: Add SPDK iobuf options field to NVMeOF configuration Fixes https://tracker.ceph.com/issues/69554 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 42bac97) monitoring: add NVMeoFMaxGatewayGroups Add config NVMeoFMaxGatewayGroups to config.libsonnet and set it to 4 (groups). Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit c5c4b10) monitoring: add alert NVMeoFMaxGatewayGroups Add alert NVMeoFMaxGatewayGroups to prometheus_alerts.yml and prometheus_alerts.libsonnet. This alerts is to indicate if max number of NVMeoF gateway groups have been reached in a cluster. 
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit ab4a1dd) monitoring: add tests for NVMeoFMaxGatewayGroups Add unit tests for alert NVMeoFMaxGatewayGroups in monitoring/ceph-mixin/tests_alerts/test_alerts.yml Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e5cb5db) qa/tasks/nvmeof: Add --refresh flag in do_checks() cmds This is to ensure latest state of the services are displayed. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 023c209) qa: Add qa/suites/nvmeof/thrash/gateway-initiator-setup/2-subsys-8-namespace.yaml This allows to run nvmeof thrasher test on smaller confgurations which finshes faster than 120subsys-8ns config. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit d7551f7) qa/tasks/nvmeof.py: Add stop_and_join method to thrasher Also add nvme-gw show command output in do_checks() and revive daemons with 'ceph orch daemon start' in revive_daemon() method. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 0b0f450) qa/workunits/nvmeof/fio_test.sh: fix fio filenames Filenames were provided to fio as nvme1n1:nvme1n2, it should be pull path (/dev/nvme1n1:/dev/nvme1n2). Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 06811a4) qa/tasks/nvmeof.py: Do not use 'systemctl start' in thrasher Instead use 'daemon start' in revive_daemon() to bring up gateways thrashed with 'systemctl stop'. This is because 'systemctl start' method seems to temporary issues. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit b5e6a0c) qa/tasks/nvmeof.py: make seperate calls in do_checks() When running 'nvme list-subsys <device>' command in do_checks(), instead of combining command for all devices with '&&', make seperate calls. 
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 5a58114) qa/tasks/nvmeof.py: Fix do_checks() method All checks currently run on the initiator node; now run all "ceph" commands on one of the gateway hosts instead of on initiator nodes, and run the "nvme list" and "nvme list-subsys" checks on the initiator node. Add retry (5 times) to do_checks if any command fails. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 7dfd3d3) qa/tasks/nvmeof.py: Ignore systemctl_stop thrashing method Do not use the systemctl_stop method to thrash daemons; just use the 'ceph orch daemon stop' and 'ceph orch daemon rm' methods to thrash nvmeof gateways. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit d4aec58) qa/tasks/nvmeof.py: Add teardown() method Add a teardown method to remove the nvmeof service before the rest of the cluster tears down. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e8201d3) qa/suites/nvmeof: Remove watchdog from thrasher This commit does the following: 1. remove watchdog from thrasher 2. remove wait from fio_test 3. change thrasher switcher wait-time to 10 mins Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 76b4028) qa/suites/nvmeof: use SCALING_DELAYS: '120' Increase delays for qa/workunits/nvmeof/scalability_test.sh as namespace rebalancing takes more time. After upscaling, a gateway could initially be 'CREATED', which is a valid state during gateway initialization, but the state should then progress to 'AVAILABLE' within a couple of seconds. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 3b9b290) nvmeofgw*: change log level of critical nvmeof monitor events to 1 Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 57c4e16) nvmeofgw*: 2 fixes - for duplicated optimized pathes and fix for GW startup 1. 
fix duplicated optimized host's pathes - trigger process_gw_down upon fast-gw reboot, removed old fast-reboot handlers 2. fix GW startup - trigger process_gw_down when expired WAIT_BLOCKLIST timer Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 4397c02) qa/workunits/nvmeof/fio_test: Log cluster status if fio fails Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e450406) qa/suites/nvmeof: add more asserts to scalability_test Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 877c726) qa/suites/nvmeof: Run fio with scalability test Run fio in parallel with scalability test. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e2f3bed) qa/workunits/nvmeof/fio_test.sh: add more debug commands Add more commands to debug when fio fails: - nvme list-subsys /dev/nvme1n2 - nvme list from the initiator - nvme list | wc -l - nvme id-ns /dev/nvme1n2 Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit fd8fbea) monitoring: fix NVMeoFSubsystemNamespaceLimit Alert is not triggered as expected, change the query to fix that. 
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2282348 Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 4a7866a) mgr/cephadm/nvmeof: Add QOS timeslice field to NVMeOF configuration Fixes https://tracker.ceph.com/issues/69952 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 7b4af1f) Merge pull request ceph#60871 from leonidc/leonidc-epoch-filter Epoch filtering Reviewed-by: Samuel Just <sjust@redhat.com> Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com> Reviewed-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 3cdf529) mon/nvmeofgw*: fix no-listeners FSM, fix detection of no-listeners condition Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 66ca80e) restore proper no-listeners logic Signed-off-by: leonidc <leonidc@il.ibm.com>
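
The retry behaviour described in the do_checks() commit above (rerun the checks up to 5 times if any command fails) could look roughly like the sketch below. This is an illustration only, not the code from qa/tasks/nvmeof.py: `run_checks_with_retry`, the `checks` callables and the `delay` parameter are hypothetical names, and a failed command is assumed to surface as a Python exception.

```python
import time
from typing import Callable, Iterable


def run_checks_with_retry(checks: Iterable[Callable[[], None]],
                          retries: int = 5,
                          delay: float = 0.0) -> int:
    """Run every check; if any raises, rerun the whole set.

    Returns the attempt number on which all checks passed and raises
    RuntimeError once `retries` attempts have failed.
    """
    checks = list(checks)
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            for check in checks:
                check()
            return attempt
        except Exception as err:  # assumption: a failed command raises
            last_error = err
            if attempt < retries:
                time.sleep(delay)  # optional pause before the next attempt
    raise RuntimeError(
        f"checks still failing after {retries} attempts") from last_error
```

Rerunning the whole set, rather than only the failed command, keeps the reported state consistent across all checks within one attempt.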

mon/nvmeofgw*: fix HA usecase when gateway has no listeners: behaves like no-subsystems Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 47e7a24) mon/nvmeofgw*: monitors publish in nvme-gw show ana group responsible for namespace rebalance Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit c358483) nvmeofgw* : fix publishing rebalance index Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit ceb62c0) nvmeofgw*: 2 fixes - for duplicated optimized pathes and fix for GW startup 1. fix duplicated optimized host's pathes - trigger process_gw_down upon fast-gw reboot, removed old fast-reboot handlers 2. fix GW startup - trigger process_gw_down when expired WAIT_BLOCKLIST timer Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 4397c02) Merge pull request ceph#60871 from leonidc/leonidc-epoch-filter Epoch filtering Reviewed-by: Samuel Just <sjust@redhat.com> Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com> Reviewed-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 3cdf529) mon/nvmeofgw*: fix no-listeners FSM, fix detection of no-listeners condition Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 66ca80e) restore proper no-listeners logic Signed-off-by: leonidc <leonidc@il.ibm.com>

======================================== Resolves: rhbz#2350962 qa/tasks/nvmeof.py: add nvmeof gw-group to deployment The group was made a required parameter (`ceph orch apply nvmeof <pool> <group>`) in ceph#58860. That broke the `nvmeof` suite, so this PR fixes it. Right now, all gateways are deployed in a single group. Later, this will be changed to use multiple groups for a better test. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit c9a6fed) qa: Expand nvmeof thrasher and add nvmeof_namespaces.yaml job 1. qa/tasks/nvmeof.py: add other methods to stop nvmeof daemons 2. add qa/workunits/rbd/nvmeof_namespace_test.sh, which adds and deletes new namespaces. It is run in the nvmeof_namespaces.yaml job, where fio runs against other namespaces in the background. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 58d8be9) qa/suites/nvmeof/basic: add nvmeof_scalability test Add test to upscale/downscale nvmeof gateways. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit e5a9cda) qa: move nvmeof shell scripts to qa/workunits/nvmeof Move all scripts qa/workunits/rbd/nvmeof_*.sh to qa/workunits/nvmeof/*.sh Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 2ed818e) qa/suites/nvmeof: increase hosts in cluster setup In the "nvmeof" task, change the "client" config to "installer", which allows inputs like "host.a". nvmeof/basic: change 2-gateway-2-initiator to 4-gateway-2-initiator cluster nvmeof/thrash: change 3-gateway-1-initiator to 4-gateway-1-initiator cluster Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 4d97b1a) qa/suites/nvmeof: wait for service "nvmeof.mypool.mygroup0" This is because nvmeof gateway group names are now part of the service id. 
Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit da8e95c) labeler: add nvmeof labelers Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit d513cc5) qa/suites/nvmeof: use "latest" image of gateway and cli Change nvmeof gateway and cli image from 1.2 to "latest". Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 0bab553) qa/workunits/nvmeof/setup_subsystem.sh: use --no-group-append In newer version of nvmeof cli, "subsystem add" needs this tag to ensure subsystem name is value of --subsystem. Otherwise, in newer cli version, the gateway group is appended at the end of the subsystem name. This fixes the teuthology nvmeof suite (currently all jobs fails because of this). Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 303f18b) mon: add nvmeof healthchecks Add NVMeofGwMap::get_health_checks which raises NVMEOF_SINGLE_GATEWAY if any of the groups have 1 gateway. In NVMeofGwMon, call `encode_health` and `load_health` to register healthchecks. This will add nvmeof healthchecks to "ceph health" output. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 1cad040) mon: add warning NVMEOF_GATEWAY_DOWN In src/mon/NVMeofGwMap.cc, add warning NVMEOF_GATEWAY_DOWN when any gateway is in GW_UNAVAILABLE state. Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 0006599) qa/suites/nvmeof: add mtls test Add qa/workunits/nvmeof/mtls_test.sh which enables mtls config and redeploy, then verify and disables mtls config. 
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit fdc93ad) monitoring: add 2 nvmeof alerts to prometheus_alerts.yaml - `NVMeoFMissingListener`: trigger if all listeners are not created for each gateway in a subsystem - `NVMeoFZeroListenerSubsystem`: trigger if a subsystem has no listeners Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit f02e312) monitoring: add 2 new nvmeof alerts Add NVMeoFMissingListener and NVMeoFZeroListenerSubsystem alerts to prometheus_alerts.libsonnet. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 7994fea) monitoring: add tests for 2 new nvmeof alerts Add test for alerts NVMeoFMissingListener and NVMeoFZeroListenerSubsystem to test_alerts.yml. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit a878460) qa/suites/nvmeof: add nvmeof warnings to log-ignorelist Add NVMEOF_SINGLE_GATEWAY and NVMEOF_GATEWAY_DOWN warnings to nvmeof:thrash job's log-ignorelist Signed-off-by: Vallari Agrawal <val.agl002@gmail.com> (cherry picked from commit 73d5c01) qa/suites/nvmeof: fix nvmeof_namespaces.yaml When basic_tests.sh is executed in parallel with namespace_test.sh, sometimes namespace_test.sh starts before fio_test.sh which would break the test. So this change ensures "fio_test.sh" is started before and executed in parallel with "namespace_test.sh". Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 6e15b5e) qa/suite/nvmeof: add asserts to scalability_test.sh Add assertions to 'status_checks()' function. Use "apply" and "redeploy", instead of "orch rm" and "apply" to upscale/downscale gateways. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 9393509) qa/suite/nvmeof/thrash: increase number of thrashing - Run fio for 15 mins (instead of 10min). 
- nvmeof.py: change daemon_max_thrash_times default from 3 to 5 - nvmeof.py: run nvme list in do_checks() Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 51743e6) qa/suites/nvmeof/basic: use default image in nvmeof_initiator.yaml Instead of using quay.io/ceph/nvmeof:latest, use the default image in the ceph build. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit f670916) qa/suites/nvmeof/thrash: Add "is unavailable" to log-ignorelist This commit also: - Removes --rbd_iostat from thrasher fio - Logs iteration details before printing stats in nvmeof_thrasher Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit c0ca0eb) qa/suites/nvmeof/thrasher: use 120 subsystems and 8 ns each For the thrasher test: 1. Run it on 120 subsystems with 8 namespaces each 2. Run FIO for 20 mins (instead of 15 mins) 3. Run FIO for a few randomly picked devices (using `--random_devices 200`) Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e1983c5) qa/tasks/nvmeof.py: Improve thrasher and rbd image creation Create rbd images in one command using ";" to queue them, instead of running "cephadm shell -- rbd create" again and again for each image. Improve the method used to select to-be-thrashed daemons: use randint() and sample() instead of weights/skip. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 82118e1) qa/workunits/nvmeof/setup_subsystem.sh: add list_namespaces() func Add a list_namespaces function, which could be useful for debugging later. Remove the extra call of list_subsystems so it is only logged once, after the subsystems are completely set up. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 2030411) qa/workunits/nvmeof/basic_tests.sh: Assert number of devices Check the number of devices connected after connect-all. It should be equal to the number of namespaces created. 
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 7ee4677) qa/suites/nvmeof/thrash: add 10-subsys-90-namespace-no_huge_pages.yaml Add a test for no-huge-pages by using config "spdk_mem_size: 4096" in a setup with 10 subsystems and 90 namespaces each. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 09ade3d) monitoring: Add prometheus alert NVMeoFMultipleNamespacesOfRBDImage NVMeoFMultipleNamespacesOfRBDImage alerts the user if an RBD image is used for multiple namespaces. This is an important alert for cases where namespaces are created on the same image for different gateway groups. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 61b3289) mon/NVMeofGwMap: add healthcheck warning NVMEOF_GATEWAY_DELETING Add a warning when NVMeoF gateways are in DELETING state. This happens when there are namespaces under the deleted gateway's ANA group ID. The gateways are removed completely after users manually move these namespaces to another load balancing group, or if a new gateway is deployed on that host. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 571dd53) src/common/options/mon.yaml.in: add mon_nvmeofgw_delete_grace This config sets the delay before triggering the NVMEOF_GATEWAY_DELETING healthcheck warning, which is raised when NVMeoF gateways are in DELETING state for too long (indicating a problem in namespace load-balancing). The default value for this config is 15 mins. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 7b33f77) mon/NVMeofGwMap: add delay to NVMEOF_GATEWAY_DELETING warning Instead of triggering immediately, have this healthcheck trigger after some time has elapsed. This delay can be configured by mon_nvmeofgw_delete_grace. Track the time when gateways go into DELETING state in a new member var (of NVMeofGwMon) 'gws_deleting_time'. 
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 56cf512) qa/workunits/nvmeof/basic_tests.sh: fix connect-all assert There seems to be a change in the 'nvme list' json output which caused failures in asserts after the 'nvme connect-all' command. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 22f91cd) qa/tasks/nvmeof: Add --refresh flag in do_checks() cmds This ensures that the latest state of the services is displayed. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 023c209) qa: Add qa/suites/nvmeof/thrash/gateway-initiator-setup/2-subsys-8-namespace.yaml This allows running the nvmeof thrasher test on smaller configurations, which finish faster than the 120subsys-8ns config. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit d7551f7) qa/tasks/nvmeof.py: Add stop_and_join method to thrasher Also add nvme-gw show command output in do_checks() and revive daemons with 'ceph orch daemon start' in the revive_daemon() method. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 0b0f450) qa/workunits/nvmeof/fio_test.sh: fix fio filenames Filenames were provided to fio as nvme1n1:nvme1n2; they should be full paths (/dev/nvme1n1:/dev/nvme1n2). Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 06811a4) qa/tasks/nvmeof.py: Do not use 'systemctl start' in thrasher Instead use 'daemon start' in revive_daemon() to bring up gateways thrashed with 'systemctl stop'. This is because the 'systemctl start' method seems to cause temporary issues. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit b5e6a0c) qa/tasks/nvmeof.py: make seperate calls in do_checks() When running the 'nvme list-subsys <device>' command in do_checks(), make separate calls instead of combining the commands for all devices with '&&'. 
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 5a58114) qa/tasks/nvmeof.py: Fix do_checks() method All checks currently run on the initiator node; now run all "ceph" commands on one of the gateway hosts instead of on initiator nodes, and run the "nvme list" and "nvme list-subsys" checks on the initiator node. Add retry (5 times) to do_checks if any command fails. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 7dfd3d3) qa/tasks/nvmeof.py: Ignore systemctl_stop thrashing method Do not use the systemctl_stop method to thrash daemons; just use the 'ceph orch daemon stop' and 'ceph orch daemon rm' methods to thrash nvmeof gateways. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit d4aec58) qa/tasks/nvmeof.py: Add teardown() method Add a teardown method to remove the nvmeof service before the rest of the cluster tears down. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e8201d3) qa/suites/nvmeof: Remove watchdog from thrasher This commit does the following: 1. remove watchdog from thrasher 2. remove wait from fio_test 3. change thrasher switcher wait-time to 10 mins Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 76b4028) monitoring: add NVMeoFMaxGatewayGroups Add config NVMeoFMaxGatewayGroups to config.libsonnet and set it to 4 (groups). Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit c5c4b10) monitoring: add alert NVMeoFMaxGatewayGroups Add alert NVMeoFMaxGatewayGroups to prometheus_alerts.yml and prometheus_alerts.libsonnet. This alert indicates whether the maximum number of NVMeoF gateway groups has been reached in a cluster. 
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit ab4a1dd) monitoring: add tests for NVMeoFMaxGatewayGroups Add unit tests for alert NVMeoFMaxGatewayGroups in monitoring/ceph-mixin/tests_alerts/test_alerts.yml Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e5cb5db) qa/suites/nvmeof: use SCALING_DELAYS: '120' Increase delays for qa/workunits/nvmeof/scalability_test.sh as namespace rebalancing takes more time. After upscaling, gateway initially could be 'CREATED', it is a valid state during gateway initialization, but then the state should progress to 'AVAILABLE' within couple of seconds. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 3b9b290) qa/workunits/nvmeof/fio_test: Log cluster status if fio fails Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e450406) qa/suites/nvmeof: add more asserts to scalability_test Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 877c726) qa/suites/nvmeof: Run fio with scalability test Run fio in parallel with scalability test. Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e2f3bed) qa/workunits/nvmeof/fio_test.sh: add more debug commands Add more commands to debug when fio fails: - nvme list-subsys /dev/nvme1n2 - nvme list from the initiator - nvme list | wc -l - nvme id-ns /dev/nvme1n2 Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit fd8fbea) mon: Add nvmeof group/gateway name in "ceph -s" In "ceph status" command output, show gateway group names and gateway names. 
Before:
```
  services:
    mon: 4 daemons, quorum ceph-nvme-vm8,ceph-nvme-vm1,ceph-nvme-vm7,ceph-nvme-vm6 (age 71m)
    mgr: ceph-nvme-vm8.tgytdq(active, since 73m), standbys: ceph-nvme-vm6.tequqo, ceph-nvme-vm1.pxrofr, ceph-nvme-vm7.lbxrea
    osd: 4 osds: 4 up (since 70m), 4 in (since 70m)
    nvmeof: 4 gateways active (4 hosts)
```
After:
```
  services:
    mon: 4 daemons, quorum ceph-nvme-vm14,ceph-nvme-vm11,ceph-nvme-vm13,ceph-nvme-vm12 (age 17m)
    mgr: ceph-nvme-vm14.gjjgvq(active, since 19m), standbys: ceph-nvme-vm12.shbvpw, ceph-nvme-vm11.gucgiu, ceph-nvme-vm13.inzizw
    osd: 4 osds: 4 up (since 15m), 4 in (since 16m)
    nvmeof (mygroup1) : 2 gateways active (ceph-nvme-vm13.azfdpk, ceph-nvme-vm14.hdsoxl)
    nvmeof (mygroup2) : 2 gateways active (ceph-nvme-vm11.hnooxs, ceph-nvme-vm12.wcjcjs)
```
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit e3fab2a) mon: show count of active/total nvmeof gws in "ceph -s" Improve "ceph status" output for nvmeof service: 1. Group by service_id (<pool>.<group>) instead of just by gateway groups. 2. Show total gateway count from NVMeofGwMap, and count of active gateways. New output:
```
  services:
    mon: 4 daemons, quorum ceph-nvme-vm31,ceph-nvme-vm28,ceph-nvme-vm30,ceph-nvme-vm29 (age 16m)
    mgr: ceph-nvme-vm31.wnfclf(active, since 18m), standbys: ceph-nvme-vm29.iuwqin, ceph-nvme-vm28.lnnyui, ceph-nvme-vm30.fitwnw
    osd: 4 osds: 4 up (since 14m), 4 in (since 15m)
    nvmeof (mypool.mygroup1): 2 gateways: 1 active (ceph-nvme-vm30.kkcfux)
    nvmeof (mypool.mygroup2): 2 gateways: 2 active (ceph-nvme-vm28.mfqucr, ceph-nvme-vm29.hrizzl)
```
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 3065ffe) monitoring: fix NVMeoFSubsystemNamespaceLimit Alert is not triggered as expected, change the query to fix that. 
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2282348 Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com> (cherry picked from commit 4a7866a) mgr/cephadm: set service name for DaemonDescription object used during daemon removal What this specifically fixes is that the nvmeof post_remove function needs the service spec of the daemon's service to get the pool and group tied to the nvmeof daemon. We have been using the DaemonDescription "service_name" property to get the service name in order to get the spec. This works in a regular deployment. However, it is possible to make a placement like placement: hosts: - vm-00=nvmeof.a - vm-01=nvmeof.b and one of the nvmeof CI tests was doing so, which is why we saw this. That causes the nvmeof daemon names to be nvmeof.nvmeof.a and nvmeof.nvmeof.b, which do not include the service name at all. In this case, the service_name property on the DaemonDescription class ends up deriving the service names nvmeof.nvmeof.a and nvmeof.nvmeof.b respectively from the nvmeof daemons, which causes us to fail to find the spec in post_remove. This change makes it so we manually set the service name for the DaemonDescription object that gets passed to post_remove, based on the service name of the daemon object we get from the host cache, which still has the correct service name even if the daemon has a custom name. The nvmeof post_remove function then gets the correct service name and is able to find the spec. Additionally, we are now technically taking the daemon type and id from the DaemonDescription in our HostCache as well, but this is mostly just for consistency and should have no real impact. 
Fixes: https://tracker.ceph.com/issues/68962 Signed-off-by: Adam King <adking@redhat.com> (cherry picked from commit d8dae24) Add multi-cluster support (showMultiCluster=True) to alerts Following PR ceph#55495 fixing the dashboard in regards to multiple clusters storing their metrics in a single Prometheus instance, this PR addresses the issues for alerts. Fixes: https://tracker.ceph.com/issues/64321 Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de> (cherry picked from commit 810c706) mon/nvme: fix unused lambda capture warnings Signed-off-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit edb0321) src/nvmeof/NVMeofGwMonitorClient: remove MDS client, not needed Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit f806872) cephadm/nvmeof: fix ports when default values are overridden Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit e717a92) cephadm/nvmeof: support per-node gateway addresses Added gateway and discovery address maps to the service specification. These maps store per-node service addresses. The address is first searched in the map, then in the spec address configuration. If neither is defined, the host IP is used as a fallback. 
Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit 2f47f9d) cephadm/nvmeof: support no huge pages for nvmeof spdk depends on: ceph/ceph-nvmeof#898 Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit 38513cb) pybind/mgr/orchestrator/module.py: NvmeofServiceSpec service_id - make service_id better alligned with default/empty group (ceph@f6d552d) - fix service_id in nvmeof daemon add Signed-off-by: Alexander Indenbaum <aindenba@redhat.com> (cherry picked from commit e1612d0) python-common/ceph/deployment: add SPDK log level to nvmeof configuration Fixes https://tracker.ceph.com/issues/67258 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit d3cc237) mgr/cephadm: add SPDK log level to nvmeof configuration Fixes https://tracker.ceph.com/issues/67258 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 19399de) python-common/ceph/deployment: change SPDK RPC fields in nvmeof configuration Fixes https://tracker.ceph.com/issues/67629 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit d18e6fb) mgr/cephadm: change SPDK RPC fields in nvmeof configuration Fixes https://tracker.ceph.com/issues/67629 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit d208242) python-common/ceph/deployment: revert SPDK RPC fields in nvmeof configuration Fixes https://tracker.ceph.com/issues/67844 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit cb28d39) mgr/cephadm: revert SPDK RPC fields in nvmeof configuration Fixes https://tracker.ceph.com/issues/67844 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 11de53f) python-common/ceph/deployment: Add namespace netmask parameters to nvmeof configuration Fixes https://tracker.ceph.com/issues/68542 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit dd4b357) mgr/cephadm: Add namespace netmask parameters to nvmeof 
configuration Fixes https://tracker.ceph.com/issues/68542 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 0dcc207) python-common/ceph/deployment: Add resource limits to nvmeof configuration Fixes https://tracker.ceph.com/issues/68967 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 4269d7c) mgr/cephadm: Add resource limits to nvmeof configuration Fixes https://tracker.ceph.com/issues/68967 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 1807a55) Signed-off-by: Gil Bregman <gbregman@il.ibm.com> mgr/cephadm/nvmeof: Add auto rebalance fields to NVMeOF configuration Fixes https://tracker.ceph.com/issues/69176 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit bfc8fb6) mgr/cephadm/nvmeof: Rewrite NVMEoF fields validation. Fixes https://tracker.ceph.com/issues/69176 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 31283c0) mgr/cephadm/nvmeof: Add key verification field to NVMeOF configuration Fixes https://tracker.ceph.com/issues/69413 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 26a0f9a) Signed-off-by: Gil Bregman <gbregman@il.ibm.com> mgr/cephadm: change ceph-nvmeof gw image version to 1.4 Fixes https://tracker.ceph.com/issues/69099 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> mgr/cephadm/nvmeof: Add verify_listener_ip field to NVMeOF configuration and remove obsolete enable_key_encryption Fixes https://tracker.ceph.com/issues/69731 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 744b04a) mgr/cephadm/nvmeof: Add max_hosts field to NVMeOF configuration and update default values Fixes https://tracker.ceph.com/issues/69759 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 0d8bd4d) mgr/cephadm/nvmeof: Add SPDK iobuf options field to NVMeOF configuration Fixes https://tracker.ceph.com/issues/69554 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry 
picked from commit 42bac97) mgr/cephadm/nvmeof: Add QOS timeslice field to NVMeOF configuration Fixes https://tracker.ceph.com/issues/69952 Signed-off-by: Gil Bregman <gbregman@il.ibm.com> (cherry picked from commit 7b4af1f) mon/nvmeofgw*: fix HA usecase when gateway has no listeners: behaves like no-subsystems Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 47e7a24) mon/nvmeofgw*: monitors publish in nvme-gw show ana group responsible for namespace rebalance Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit c358483) nvmeofgw* : fix publishing rebalance index Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit ceb62c0) nvmeofgw*: 2 fixes - for duplicated optimized pathes and fix for GW startup 1. fix duplicated optimized host's pathes - trigger process_gw_down upon fast-gw reboot, removed old fast-reboot handlers 2. fix GW startup - trigger process_gw_down when expired WAIT_BLOCKLIST timer Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 4397c02) Merge pull request ceph#60871 from leonidc/leonidc-epoch-filter Epoch filtering Reviewed-by: Samuel Just <sjust@redhat.com> Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com> Reviewed-by: Ronen Friedman <rfriedma@redhat.com> (cherry picked from commit 3cdf529) mon/nvmeofgw*: fix no-listeners FSM, fix detection of no-listeners condition Signed-off-by: Leonid Chernin <leonidc@il.ibm.com> (cherry picked from commit 66ca80e) restore proper no-listeners logic Signed-off-by: leonidc <leonidc@il.ibm.com>
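
The delayed NVMEOF_GATEWAY_DELETING warning described in the commit messages above (record when a gateway enters DELETING, warn only after `mon_nvmeofgw_delete_grace`, default 15 minutes) can be illustrated with the sketch below. It is a Python illustration under stated assumptions, not the monitor's C++ code: `track_deleting` and `overdue_gateways` are hypothetical names, the dictionary stands in for the `gws_deleting_time` member, and whether the real comparison is strict at the boundary is not stated in the source.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# Default grace period as stated in the commit message (15 minutes).
DEFAULT_DELETE_GRACE = timedelta(minutes=15)


def track_deleting(deleting_since: Dict[str, datetime],
                   deleting_now: Iterable[str],
                   now: datetime) -> None:
    """Remember when each gateway first entered DELETING.

    Gateways that are no longer in DELETING are forgotten, so their
    timer restarts if they enter the state again later.
    """
    current = set(deleting_now)
    for gw in current:
        deleting_since.setdefault(gw, now)  # keep the earliest timestamp
    for gw in list(deleting_since):
        if gw not in current:
            del deleting_since[gw]


def overdue_gateways(deleting_since: Dict[str, datetime],
                     now: datetime,
                     grace: timedelta = DEFAULT_DELETE_GRACE) -> List[str]:
    """Return gateways that have been DELETING for longer than `grace`."""
    return sorted(gw for gw, since in deleting_since.items()
                  if now - since > grace)
```

A health check built on this would raise the warning only when `overdue_gateways` returns a non-empty list, so short-lived DELETING states during normal rebalancing stay silent.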

Naveenaidu · 2025-03-20T10:57:53Z

@leonidc While working on a tool that checks whether any new Ceph configuration option has been introduced but not yet included in the release notes, I came across this config option and noticed that the release notes have not been updated. I was curious whether this was an intentional decision or something that fell through the cracks. This information would help me tailor my changes.

Details of the tool: https://lists.ceph.io/hyperkitty/list/dev@ceph.io/thread/7PQ5GWNX4NVWYP2UZLC34BEUH6GSRYBT/

leonidc requested a review from a team as a code owner November 28, 2024 05:11

github-actions bot added core mon nvmeof tests labels Nov 28, 2024

leonidc force-pushed the leonidc-epoch-filter branch from 99809d7 to 56abaaf December 2, 2024 11:17

athanatos requested changes Dec 2, 2024

src/mon/NVMeofGwMap.h (outdated)

leonidc force-pushed the leonidc-epoch-filter branch from 56abaaf to 5cd2958 December 5, 2024 06:44

caroav requested a review from baum December 15, 2024 08:25

athanatos reviewed Dec 17, 2024

src/nvmeof/NVMeofGwMonitorClient.cc (outdated)

athanatos reviewed Dec 17, 2024

src/nvmeof/NVMeofGwMonitorClient.cc (outdated)

baum mentioned this pull request Dec 18, 2024

src/nvmeof/NVMeofGwMonitorClient: remove MDS client, not needed #61130

Merged

14 tasks