Skip to content

upgrade rules for NVMeofGW monitors and gateways#59240

Merged
caroav merged 3 commits intoceph:mainfrom
leonidc:wip-leonidc-20241508-upgrade-rules-centos9-only
Aug 28, 2024
Merged

upgrade rules for NVMeofGW monitors and gateways#59240
caroav merged 3 commits intoceph:mainfrom
leonidc:wip-leonidc-20241508-upgrade-rules-centos9-only

Conversation

@leonidc
Copy link
Contributor

@leonidc leonidc commented Aug 15, 2024

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

@leonidc leonidc requested a review from a team as a code owner August 15, 2024 14:49
@athanatos
Copy link
Contributor

commit message is missing subdirectory/component and can be much more explicit:

mon/NVMeofGw*: support upgrades from prior out-of-tree nvmeofha implementation (nvmeof-reef)

This commit adds upgrade support for users running an experimental nvmeofha implementation which
can be found in the nvmeof-reef branch in ceph.git.

Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>

@athanatos
Copy link
Contributor

athanatos commented Aug 15, 2024

We probably want to include the feature bit in this PR as well. Please cherry-pick "include/ceph_features: remove stray available marker" and " include/ceph_features: add NVMEOFHA feature bit" from https://github.com/athanatos/ceph/tree/sjust/wip-nvmeof-upgrade-59198 ordered before your commit. Keep them as separate commits as the commit messages have information in them (let me know if you need help with this). Then, incorporate "include/ceph_features: remove stray available marker" into your commit (either cherry-pick and squash or just find/replace SQUID manually, there are only 4 usages).

@leonidc leonidc force-pushed the wip-leonidc-20241508-upgrade-rules-centos9-only branch 2 times, most recently from 72e03c5 to f37b0a4 Compare August 19, 2024 12:43
@athanatos
Copy link
Contributor

jenkins test make check

@athanatos
Copy link
Contributor

athanatos commented Aug 19, 2024

The current commit ordering does not really make sense.

mon/NVMeofGw*: support upgrades from prior out-of-tree nvmeofha implementation (nvmeof-reef)

is the first commit, and it uses SQUID feature bits.

The next commit "mon/NVMEofHw*: use NVMEOFHA rather than SERVER_SQUID" does SERVER_SQUID -> NVMEOFHA. This should be squashed into "mon/NVMeofGw*: support upgrades from prior out-of-tree nvmeofha implementation (nvmeof-reef)" -- there is no reason for future readers to see a version of this code with the wrong feature bit.

The commit after that "include/ceph_features: add NVMEOFHA feature bit" is where the NVMEOFHA feature bit actually gets added. Without this commit, the prior two do not compile. Please put this one before the feature bit is used.

Finally, "mon/NVMeofGw*: code review fixes, limit number Gws in each pool&group to 16" should also be squashed into "mon/NVMeofGw*: support upgrades from prior out-of-tree nvmeofha implementation (nvmeof-reef)"

Less importantly, this branch also lacks "include/ceph_features: remove stray available marker"

I've repushed https://github.com/athanatos/ceph/tree/sjust/wip-nvmeof-upgrade-59198-commits-fixed with the above changes as well as fixes for the comments I make below -- please read over those fixes carefully. I also pushed https://github.com/athanatos/ceph/tree/sjust/wip-nvmeof-upgrade-59198-commits-fixed-squashed with the comment fixes squashed -- feel free to simply force-push https://github.com/athanatos/ceph/tree/sjust/wip-nvmeof-upgrade-59198-commits-fixed-squashed to this branch if you are ok with the changes.

for (auto &tm_itr:state.data) {
tmdata[tm_itr.first].timer_started = tm_itr.second.timer_started;
tmdata[tm_itr.first].timer_value = tm_itr.second.timer_value;
tmdata[tm_itr.first].end_time = tm_itr.second.end_time;
Copy link
Contributor

@athanatos athanatos Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dangerous. It sort of works at the moment due to the guard in cfg_add_gw and the corresponding enforcement in nvmeof-reef, but during the evolution of this PR, MAX_SUPPORTED_ANA_GROUPS has generally been considered to be for backward compatibility rather than a permanent commitment. This encoder should really be robust to a future where more than 16 GWs are supported in a group, and it would be a very easy thing to miss while making such a change. Rather than populating a fixed size array, simply check state.data for each loop through the following for loop. The branch I mention above has such a fix.

Copy link
Contributor

@athanatos athanatos Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you ever do feel you need such a fixed size array, please be certain to include ceph_assert guards to ensure that the indices are in bounds if they cannot be statically proven to be so within 10 lines or so of the usage. While crashing the mon is unacceptable, it's actually far preferable to the kind of undefined nonsense that can happen as a result of an out-of-bounds array write.

@athanatos
Copy link
Contributor

athanatos commented Aug 19, 2024

@leonidc leonidc force-pushed the wip-leonidc-20241508-upgrade-rules-centos9-only branch from f37b0a4 to e447024 Compare August 20, 2024 11:18
@athanatos
Copy link
Contributor

athanatos commented Aug 21, 2024

Sorry, one more fix. @gregsfortytwo points out above that populating ENCODE_START as I suggested

ENCODE_START(version, version, bl);

is dangerous because when you eventually want to bump version it will naturally bump the min_compat version as well (the second argument), which is unlikely to be desirable since presumably your future struct encoding changes will be simpler. I've pushed (once again) https://github.com/athanatos/ceph/tree/sjust/wip-nvmeof-upgrade-59198-commits-fixed with a fix for the above and https://github.com/athanatos/ceph/tree/sjust/wip-nvmeof-upgrade-59198-commits-fixed-squashed as a final branch with that extra commit squashed.

281ea85 is the specific change.

@athanatos athanatos force-pushed the wip-leonidc-20241508-upgrade-rules-centos9-only branch from e447024 to 8e3eee4 Compare August 21, 2024 05:38
@athanatos
Copy link
Contributor

@athanatos
Copy link
Contributor

jenkins test make check

baum pushed a commit to baum/ceph-nvmeof that referenced this pull request Aug 22, 2024
Should have been removed in caa9e7a.

Signed-off-by: Samuel Just <sjust@redhat.com>
Normally, we'd just use the SERVER_SQUID or SERVER_T flags instead of
using an extra feature bit.  However, the nvmeof ha monitor paxos
service has had a more complex development journey.  There are users
interested in using the nvmeof ha feature in squid, but it didn't make
the cutoff for backporting it.  There's an upstream nvmeof-squid branch
in the ceph.git repository with the patches backported for anyone
interested in building it.

However, that means that users of our normal stable releases will see
the feature added to the monitor one release after anyone who chooses to
use the nvmeof-squid branch.  We could disallow upgrades from
nvmeof-squid to T, but by adding a feature bit here we make such a
restriction unnecessary.

Signed-off-by: Samuel Just <sjust@redhat.com>
…mentation (nvmeof-reef)

This commit adds upgrade support for users running an experimental
nvmeofha implementation which can be found in the nvmeof-reef branch in
ceph.git.

Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
@leonidc leonidc force-pushed the wip-leonidc-20241508-upgrade-rules-centos9-only branch from 8e3eee4 to 3c50ef6 Compare August 26, 2024 07:03
@leonidc
Copy link
Contributor Author

leonidc commented Aug 26, 2024

@athanatos , All comments seems to be resolved, some of them resolved by your commit
So can you please approve this PR

@athanatos athanatos self-requested a review August 26, 2024 17:42
@caroav caroav merged commit 2e813d0 into ceph:main Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants