Skip to content

nvmeof: fix removeHost bug#6084

Merged
mergify[bot] merged 1 commit into
ceph:develfrom
gadididi:nvmeof/bug_fix_remove_host
Feb 18, 2026
Merged

nvmeof: fix removeHost bug#6084
mergify[bot] merged 1 commit into
ceph:develfrom
gadididi:nvmeof/bug_fix_remove_host

Conversation

@gadididi

Copy link
Copy Markdown
Contributor

ControllerUnPublishVolume() was searching in host list. but host list
in Namespace is not belong to hostNqn we use.
this host list is for Namespace Masking feature.
(out of scope here).
I caught it, when I created 2 PVCs + 2 pods
then I remove 1 pod and I saw it deleted the host
even though there is still 1 live pod.

so in our case, we just need to check the len of the ns list (for specific subsystemNqn)
if left more than 1, don't remove the host!

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer
    guide
    .
  • Reviewed the developer guide on Submitting a Pull
    Request
  • Pending release
    notes

    updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@gadididi gadididi self-assigned this Feb 17, 2026
@gadididi gadididi requested review from Copilot and nixpanic February 17, 2026 18:35
@gadididi gadididi added component/nvme-of Issues and PRs related to NVMe-oF. bug Something isn't working labels Feb 17, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in the NVMe-oF controller where hosts were being prematurely removed from subsystems, causing live pods to lose access to their volumes. The issue occurred when multiple PVCs on the same node were using volumes from the same subsystem - unpublishing one volume would incorrectly remove the host, breaking access for other pods still using the subsystem.

Changes:

  • Replaced incorrect namespace masking host check with a namespace count-based heuristic
  • Added logic to count other namespaces in the subsystem before deciding whether to remove a host
  • Updated log message to reflect the new logic

Comment thread internal/nvmeof/controller/controllerserver.go Outdated
Comment thread internal/nvmeof/controller/controllerserver.go Outdated
it was searching in host list. but host list
in Namespace is not belong to hostNqn we use.
this host list is for Namespace Masking feature.
(out of scope here).
I catched it, when I created 2 PVCs + 2 pods
then I remove 1 pod and I saw it deleted the host
even though there is still 1 live pod.

so in our case, we just need to check the len
of the ns list (for specific subsystemNqn)
if left more than 1, dont remove the host!

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
@gadididi gadididi force-pushed the nvmeof/bug_fix_remove_host branch from 831be85 to 15ad853 Compare February 18, 2026 06:54
@gadididi gadididi added the ci/skip/e2e skip running e2e CI jobs label Feb 18, 2026
@nixpanic nixpanic requested a review from a team February 18, 2026 12:50
@mergify mergify Bot added the queued label Feb 18, 2026
mergify Bot added a commit that referenced this pull request Feb 18, 2026
@mergify

mergify Bot commented Feb 18, 2026

Copy link
Copy Markdown
Contributor

Merge Queue Status

Rule: default


This pull request spent 38 minutes 17 seconds in the queue, including 37 minutes 55 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 2 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • any of:
    • all of:
      • base=devel
      • status-success=codespell
      • status-success=go-test
      • status-success=golangci-lint
      • status-success=lint-extras
      • status-success=mod-check
      • status-success=multi-arch-build
      • status-success=uncommitted-code-check
      • any of:
        • label=ci/skip/e2e
        • all of:
          • status-success=ci/centos/k8s-e2e-external-storage/1.33
          • status-success=ci/centos/k8s-e2e-external-storage/1.34
          • status-success=ci/centos/k8s-e2e-external-storage/1.35
          • status-success=ci/centos/mini-e2e-helm/k8s-1.33
          • status-success=ci/centos/mini-e2e-helm/k8s-1.34
          • status-success=ci/centos/mini-e2e-helm/k8s-1.35
          • status-success=ci/centos/mini-e2e/k8s-1.33
          • status-success=ci/centos/mini-e2e/k8s-1.34
          • status-success=ci/centos/mini-e2e/k8s-1.35
          • status-success=ci/centos/upgrade-tests-cephfs
          • status-success=ci/centos/upgrade-tests-rbd
    • all of:
      • base~=^(release-.+)$
      • status-success=codespell
      • status-success=go-test
      • status-success=golangci-lint
      • status-success=lint-extras
      • status-success=mod-check
      • status-success=multi-arch-build
      • status-success=uncommitted-code-check
      • any of:
        • label=ci/skip/e2e
        • all of:
          • status-success=ci/centos/k8s-e2e-external-storage/1.32
          • status-success=ci/centos/mini-e2e-helm/k8s-1.32
          • status-success=ci/centos/mini-e2e/k8s-1.32
          • status-success=ci/centos/k8s-e2e-external-storage/1.33
          • status-success=ci/centos/k8s-e2e-external-storage/1.34
          • status-success=ci/centos/mini-e2e-helm/k8s-1.33
          • status-success=ci/centos/mini-e2e-helm/k8s-1.34
          • status-success=ci/centos/mini-e2e/k8s-1.33
          • status-success=ci/centos/mini-e2e/k8s-1.34
          • status-success=ci/centos/upgrade-tests-cephfs
          • status-success=ci/centos/upgrade-tests-rbd
    • all of:
      • base=release-v3.15
      • status-success=codespell
      • status-success=go-test
      • status-success=golangci-lint
      • status-success=lint-extras
      • status-success=mod-check
      • status-success=multi-arch-build
      • status-success=uncommitted-code-check
      • any of:
        • label=ci/skip/e2e
        • all of:
          • status-success=ci/centos/k8s-e2e-external-storage/1.31
          • status-success=ci/centos/k8s-e2e-external-storage/1.32
          • status-success=ci/centos/mini-e2e-helm/k8s-1.31
          • status-success=ci/centos/mini-e2e-helm/k8s-1.32
          • status-success=ci/centos/mini-e2e/k8s-1.31
          • status-success=ci/centos/mini-e2e/k8s-1.32
          • status-success=ci/centos/k8s-e2e-external-storage/1.33
          • status-success=ci/centos/mini-e2e-helm/k8s-1.33
          • status-success=ci/centos/mini-e2e/k8s-1.33
          • status-success=ci/centos/upgrade-tests-cephfs
          • status-success=ci/centos/upgrade-tests-rbd
    • all of:
      • base=ci/centos
      • status-success=ci/centos/jjb-validate
      • status-success=ci/centos/job-validation

@mergify mergify Bot merged commit 64a8a26 into ceph:devel Feb 18, 2026
21 checks passed
@mergify mergify Bot removed the queued label Feb 18, 2026
@gadididi gadididi deleted the nvmeof/bug_fix_remove_host branch February 18, 2026 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ci/skip/e2e skip running e2e CI jobs component/nvme-of Issues and PRs related to NVMe-oF.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants