nvmeof: fix removeHost bug #6084
Merged
Pull request overview
This PR fixes a critical bug in the NVMe-oF controller where hosts were being prematurely removed from subsystems, causing live pods to lose access to their volumes. The issue occurred when multiple PVCs on the same node were using volumes from the same subsystem - unpublishing one volume would incorrectly remove the host, breaking access for other pods still using the subsystem.
Changes:
- Replaced incorrect namespace masking host check with a namespace count-based heuristic
- Added logic to count other namespaces in the subsystem before deciding whether to remove a host
- Updated log message to reflect the new logic
ControllerUnPublishVolume() was searching the host list of the Namespace, but that host list is not related to the hostNqn we use; it belongs to the Namespace Masking feature (out of scope here). I caught this when I created 2 PVCs and 2 pods, then removed 1 pod, and saw that the host was deleted even though 1 pod was still live. In our case we only need to check the length of the namespace list for the specific subsystemNqn: if more than 1 is left, don't remove the host. Signed-off-by: gadi-didi <gadi.didi@ibm.com>
831be85 to 15ad853
nixpanic
approved these changes
Feb 18, 2026
Madhu-1
approved these changes
Feb 18, 2026
Merge Queue Status
This pull request spent 38 minutes 17 seconds in the queue, including 37 minutes 55 seconds running CI.
ControllerUnPublishVolume() was searching the host list of the Namespace, but that host list is not related to the hostNqn we use.
That host list belongs to the Namespace Masking feature
(out of scope here).
I caught this when I created 2 PVCs and 2 pods,
then removed 1 pod, and saw that the host was deleted
even though 1 pod was still live.
So in our case we only need to check the length of the namespace list (for the specific subsystemNqn):
if more than 1 is left, don't remove the host.
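The check described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: the `Namespace` type and the `countOtherNamespaces` and `shouldRemoveHost` helpers are hypothetical names, and the sketch assumes the namespace list for the subsystem may still contain the volume being unpublished, so that volume is excluded by its ID before counting.

```go
package main

import "fmt"

// Namespace is a hypothetical, simplified view of one NVMe-oF namespace
// inside a subsystem. It is not the type used by the real controller.
type Namespace struct {
	NSID uint32
	// Hosts is the Namespace Masking host list. It is deliberately not
	// consulted below, because it does not describe the subsystem hostNqn.
	Hosts []string
}

// countOtherNamespaces returns how many namespaces in the subsystem remain
// besides the one that is being unpublished.
func countOtherNamespaces(namespaces []Namespace, unpublishedNSID uint32) int {
	count := 0
	for _, ns := range namespaces {
		if ns.NSID != unpublishedNSID {
			count++
		}
	}
	return count
}

// shouldRemoveHost reports whether the host may be removed from the
// subsystem: only when no other namespace is left in it.
func shouldRemoveHost(namespaces []Namespace, unpublishedNSID uint32) bool {
	return countOtherNamespaces(namespaces, unpublishedNSID) == 0
}

func main() {
	// Two PVCs in the same subsystem; one pod is deleted.
	twoVolumes := []Namespace{{NSID: 1}, {NSID: 2}}
	fmt.Println(shouldRemoveHost(twoVolumes, 1)) // false: keep the host

	// Only the volume being unpublished is left.
	lastVolume := []Namespace{{NSID: 2}}
	fmt.Println(shouldRemoveHost(lastVolume, 2)) // true: host can go
}
```

As in the PR, this is a heuristic: it counts namespaces per subsystem and does not verify which node each remaining namespace is actually used from.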
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>: retest the <job-name> after unrelated failure (please report the failure too!)