nvmeof: fix bug in cleanup NVMe-oF resources by gadididi · Pull Request #5936 · ceph/ceph-csi

gadididi · 2026-01-15T13:47:28Z

Describe what this PR does

Problem

When CreateVolume fails during NVMe-oF resource setup, the cleanup defer does not execute properly, leaving orphaned subsystems (instead of removing it) on the gateway that cause subsequent volume creation attempts to success (but should not success).

Root Cause

The issue occurred in the following sequence:

CreateVolume creates an RBD volume successfully
createNVMeoFResources() is called to set up NVMe-oF resources:
- Creates subsystem successfully
- Attempts to create listener - FAILS (e.g., "Invalid parameters")
createNVMeoFResources() returns nil, error <-- Bug
Deferred cleanup in CreateVolume was not called at all! <-- Another Bug
Cleanup never executes - subsystem remains on gateway
Retry attempt sees existing subsystem and assumes it's ready, but it's actually incomplete (missing listeners) <-- Another Bug
Volume creation succeeds on retry but with incorrect configuration (missing the listeners..)

Example from logs:

I0114 14:09:57.263554       1 utils.go:350] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 GRPC call: /csi.v1.Controller/CreateVolume
I0114 14:09:57.466641       1 nvmeof.go:248] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 Creating NVMe subsystem: nqn.2016-06.io.spdk:cnode1.rook-ceph on gateway rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc:5500
I0114 14:09:57.490502       1 nvmeof.go:277] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 Subsystem created successfully: nqn.2016-06.io.spdk:cnode1.rook-ceph
I0114 14:09:57.490521       1 controllerserver.go:569] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 Creating listener 0: rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc:4420 (rook-ceph-nvmeof-nvmeof-0)
I0114 14:09:57.490525       1 nvmeof.go:367] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 Adding listener rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc to subsystem nqn.2016-06.io.spdk:cnode1.rook-ceph
E0114 14:09:57.509716       1 controllerserver.go:152] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 NVMe-oF resource setup failed for volumeID 0001-0009-rook-ceph-0000000000000002-6a7447ca-ce4b-4fcf-a123-4ce22d9d5505: subsystem setup failed: failed to create listener 0 (rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc:4420 (rook-ceph-nvmeof-nvmeof-0)): gateway AddListener returned error (status=32602): Failure adding nqn.2016-06.io.spdk:cnode1.rook-ceph listener at rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc:4420: Invalid parameters
I0114 14:09:57.512304       1 omap.go:89] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 got omap values: (pool="nvmeofpool", namespace="", name="csi.volume.6a7447ca-ce4b-4fcf-a123-4ce22d9d5505"): map[csi.imageid:12971c079d96 csi.imagename:csi-vol-6a7447ca-ce4b-4fcf-a123-4ce22d9d5505 csi.volname:pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 csi.volume.owner:default]
I0114 14:09:57.533878       1 rbd_util.go:1548] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 rbd: snap rm nvmeofpool/csi-vol-6a7447ca-ce4b-4fcf-a123-4ce22d9d5505-temp@csi-vol-6a7447ca-ce4b-4fcf-a123-4ce22d9d5505 using mon 
I0114 14:09:57.536827       1 rbd_util.go:682] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 rbd: delete csi-vol-6a7447ca-ce4b-4fcf-a123-4ce22d9d5505-temp using mon 10.101.55.5:6789, pool nvmeofpool
I0114 14:09:57.538643       1 controllerserver.go:1106] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 deleting image csi-vol-6a7447ca-ce4b-4fcf-a123-4ce22d9d5505
I0114 14:09:57.538719       1 rbd_util.go:682] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 rbd: delete csi-vol-6a7447ca-ce4b-4fcf-a123-4ce22d9d5505 using mon 10.101.55.5:6789, pool nvmeofpool
I0114 14:09:57.557466       1 rbd_util.go:728] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 rbd: adding task to remove image "nvmeofpool/csi-vol-6a7447ca-ce4b-4fcf-a123-4ce22d9d5505" with id "12971c079d96" from trash
I0114 14:09:57.565129       1 rbd_util.go:756] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 rbd: successfully added task to move image "nvmeofpool/csi-vol-6a7447ca-ce4b-4fcf-a123-4ce22d9d5505" with id "12971c079d96" to trash
I0114 14:09:57.569464       1 omap.go:126] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 removed omap keys (pool="nvmeofpool", namespace="", name="csi.volumes.default"): [csi.volume.pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4]
E0114 14:09:57.569599       1 utils.go:355] ID: 15 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 GRPC error: rpc error: code = Internal desc = NVMe-oF setup failed: subsystem setup failed: failed to create listener 0 (rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc:4420 (rook-ceph-nvmeof-nvmeof-0)): gateway AddListener returned error (status=32602): Failure adding nqn.2016-06.io.spdk:cnode1.rook-ceph listener at rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc:4420: Invalid parameters
I0114 14:09:58.077290       1 utils.go:350] ID: 16 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 GRPC call: /csi.v1.Controller/CreateVolume
...
...
I0114 14:09:58.126102       1 controllerserver.go:1115] ID: 16 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 Connected to the gateway rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc:5500
I0114 14:09:58.126154       1 nvmeof.go:307] ID: 16 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 Checking if subsystem nqn.2016-06.io.spdk:cnode1.rook-ceph exists on gateway rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc:5500
I0114 14:09:58.133641       1 nvmeof.go:323] ID: 16 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 Subsystem nqn.2016-06.io.spdk:cnode1.rook-ceph exists
I0114 14:09:58.133669       1 controllerserver.go:699] ID: 16 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 subsystem nqn.2016-06.io.spdk:cnode1.rook-ceph and Listener [rook-ceph-nvmeof-nvmeof-0.rook-ceph.svc:4420 (rook-ceph-nvmeof-nvmeof-0)] for the subsystem were created
I0114 14:09:58.133686       1 nvmeof.go:112] ID: 16 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 Creating namespace for RBD nvmeofpool/csi-vol-df56be2d-81ee-45a9-97d1-f66152770812 in subsystem nqn.2016-06.io.spdk:cnode1.rook-ceph
I0114 14:09:58.801472       1 nvmeof.go:143] ID: 16 Req-ID: pvc-ba4f407d-fda1-4a11-afd1-d944eb4971c4 Namespace created with NSID: 1

you can see from the log that in the second try (by CSI mechanism) it passed and subsystem existed already, but it should not!!

Solution

1. Return nvmeofData on Error (Main Fix)

Modified createNVMeoFResources() to return nvmeofData instead of nil when errors occur after the struct is initialized:

// Before
if err := ensureSubsystem(...); err != nil {
    return nil, fmt.Errorf("subsystem setup failed: %w", err)  // ❌ Returns nil
}

// After
if err := ensureSubsystem(...); err != nil {
    return nvmeofData, fmt.Errorf("subsystem setup failed: %w", err)  // ✅ Returns nvmeofData
}

Why this fixes it: The cleanup defer now receives valid nvmeofData containing gateway connection info and subsystem details, allowing it to properly clean up the orphaned subsystem.

2. Add NamespaceID Guard

Added check in cleanupNVMeoFResources() to skip namespace deletion when it was never created:

if nvmeofData.NamespaceID > 0 {
    // Delete namespace
} else {
    log.DebugLog(ctx, "No namespace to delete (NamespaceID=0)")
}

Why this is needed: Since we now return nvmeofData even when namespace creation fails, NamespaceID may still be 0 (the default value). Attempting to delete namespace 0 would fail.

3. Fix ensureSubsystem to Create Listeners Even if Subsystem Exists

Modified ensureSubsystem() to attempt listener creation even when the subsystem already exists:

// Before
if !exists {
    err = gateway.CreateSubsystem(ctx, subsystemNQN, networkMask)
    if err != nil {
        return err
    }
    
    // Only create listeners if we just created the subsystem
    if networkMask == "" {
        for i, listener := range listeners { ... }
    }
}

// After
if !exists {
    err = gateway.CreateSubsystem(ctx, subsystemNQN, networkMask)
    if err != nil {
        return err
    }
}

// Create listeners regardless of whether subsystem was just created
// (handles retry scenarios and multiple PVCs with different listeners on same subsystem)
if networkMask == "" {
    for i, listener := range listeners { ... }
}

4. The defer was declared AFTER the call that could fail

// ❌ WRONG - defer declared after the call
nvmeofData, err = cs.createNVMeoFResources(...)
if err != nil {
    return nil, status.Errorf(...)  // Returns here, defer below never registered!
}
defer func() {  // ← This line never reached on error!
    if err == nil {
        return
    }
    cleanupErr := cs.cleanupNVMeoFResources(ctx, nvmeofData)
}()

Moved the cleanup defer to be declared BEFORE the call to createNVMeoFResources():

// ✅ CORRECT - defer declared before the call
var nvmeofData *nvmeof.NVMeoFVolumeData

defer func() {
    if err != nil && nvmeofData != nil {  // Added nil check
        cleanupErr := cs.cleanupNVMeoFResources(ctx, nvmeofData)
        if cleanupErr != nil {
            log.ErrorLog(ctx, "failed to cleanup NVMe-oF resources: %v", cleanupErr)
        }
    }
}()

nvmeofData, err = cs.createNVMeoFResources(...)
if err != nil {
    return nil, status.Errorf(...)  // Defer IS registered and will execute
}

Note: The RBD cleanup defer remains in its current position (after CreateVolume) because:

If RBD creation fails, nothing was created → no cleanup needed
If RBD creation succeeds, we need volumeID from the response for cleanup
NVMe-oF is different because partial resources (subsystem) can exist even on failure

Checklist:

Commit Message Formatting: Commit titles and messages follow
guidelines in the developer
guide.
Reviewed the developer guide on Submitting a Pull
Request
Pending release
notes
updated with breaking and/or notable changes for the next major release.
Documentation has been updated, if necessary.
Unit tests have been added, if necessary.
Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

/retest ci/centos/<job-name>: retest the <job-name> after unrelated
failure (please report the failure too!)

add another case, if the subsystem of the rquested ns (that we want to remove) does not exist, dont raise an error. Signed-off-by: gadi-didi <gadi.didi@ibm.com>

When CreateVolume fails during NVMe-oF resource setup (e.g., listener creation failure), the cleanup defer was not executing properly because createNVMeoFResources() returned nil for nvmeofData on error. This left orphaned subsystems on the gateway that prevented subsequent volume creation attempts. Changes: - Modified createNVMeoFResources() to return nvmeofData even on error (instead of nil), allowing cleanup to access gateway connection info and subsystem details - Added NamespaceID > 0 check in cleanupNVMeoFResources() to skip namespace deletion when it was never created (NamespaceID defaults to 0) Signed-off-by: gadi-didi <gadi.didi@ibm.com>

when there is some failure to remove some listener, it is ok to just print warning, because delete the subsystem will delete the listener. Signed-off-by: gadi-didi <gadi.didi@ibm.com>

before this change,if the subsystem exists,it would not create listeners. but maybe another pvc on same subsystem but with different listener\s was requested, it would not create the listener\s (in manual mode of adding the listeners) now, even the subsystem exists, if we are in manual listener mode try to add them (if they exist, it is ok) Signed-off-by: gadi-didi <gadi.didi@ibm.com>

the defer in CreateVolume() for cleanup the NVMe-oF resoureces was in incorrect place, so in case of createNVMeoFResources() failure it was not called at all. Signed-off-by: gadi-didi <gadi.didi@ibm.com>

nixpanic

Thanks, looks good to me.

nixpanic · 2026-01-19T09:01:40Z

 	// Step 3: Ensure subsystem exists (and listener)
 	if err := ensureSubsystem(ctx, gateway, nvmeofData.SubsystemNQN, networkMask, nvmeofData.ListenerInfo); err != nil {
-		return nil, fmt.Errorf("subsystem setup failed: %w", err)
+		return nvmeofData, fmt.Errorf("subsystem setup failed: %w", err)


I think it is nicer to not return nvmeofData in case of a failure. That would mean a cleanup needs to be done in this function too. It should make the code simpler to understand, but it also adds a little more complexity in this function 🤔

@nixpanic , I wanted to reuse the cleanupNVMeoFResources() from DeleteVolume().

So, if I return nil it means here:

defer func() { // Cleanup NVMe-oF resources on subsequent errors. // only if there was an error and nvmeofData is not nil, it means resources were created. if err != nil && nvmeofData != nil { log.DebugLog(ctx, "Cleaning up NVMe-oF resources for volume %q due to error: %v", volumeID, err) cleanupErr := cs.cleanupNVMeoFResources(ctx, nvmeofData) if cleanupErr != nil { log.ErrorLog(ctx, "failed to cleanup NVMe-oF resources for volume %q: %v", volumeID, cleanupErr) } } }()

I cannot to send to cs.cleanupNVMeoFResources() the nvmeofData because it will be empty. So, I will need to change the cleanupNVMeoFResources() to get subsystemNqn, NSid , and ListenerInfo..
(and the cleanupNVMeoFResources() that is called in the DeleteVolume() as well)
what do you think is the best way to clean in case of partial initialization?

My preference is to have each function cleanup the resources it allocated itself. That also means there is less re-use possible of functions like cleanupNVMeoFResources(). To structure it the way I would prefer, each function that allocates resources would probably get a cleanup-helper function, which then can get called from cleanupNVMeoFResources().

But I'm not sure that it really results in a simpler/cleaner approach... The current way is good enough for now, we can try to make it better maintainable later on.

Madhu-1 · 2026-01-20T10:35:58Z

@Mergifyio queue

mergify · 2026-01-20T10:36:06Z

queue

✅ The pull request has been merged automatically

Details

The pull request has been merged automatically at 6b688a9

mergify · 2026-01-20T10:36:06Z

gadididi requested a review from nixpanic January 15, 2026 13:47

gadididi self-assigned this Jan 15, 2026

gadididi added the component/nvme-of Issues and PRs related to NVMe-oF. label Jan 15, 2026

mergify Bot added the bug Something isn't working label Jan 15, 2026

gadididi force-pushed the nvmeof/fix_listener_bug branch from 6b69715 to 52cbada Compare January 15, 2026 15:25

gadididi marked this pull request as ready for review January 15, 2026 15:25

gadididi requested a review from Madhu-1 January 15, 2026 15:26

nixpanic reviewed Jan 15, 2026

View reviewed changes

Comment thread internal/nvmeof/nvmeof.go

Comment thread internal/nvmeof/controller/controllerserver.go

Comment thread internal/nvmeof/controller/controllerserver.go

nvmeof: fix DeleteNamespace to be idempotent

68e5e36

add another case, if the subsystem of the rquested ns (that we want to remove) does not exist, dont raise an error. Signed-off-by: gadi-didi <gadi.didi@ibm.com>

gadididi force-pushed the nvmeof/fix_listener_bug branch from 52cbada to 3c27975 Compare January 18, 2026 08:05

gadididi added 4 commits January 18, 2026 10:22

nvmeof: Enhanced cleanupEmptySubsystem() to be idempotent

c29026d

when there is some failure to remove some listener, it is ok to just print warning, because delete the subsystem will delete the listener. Signed-off-by: gadi-didi <gadi.didi@ibm.com>

nvmeof: fix defer location of cleanup nvmeof

8ebc1c2

the defer in CreateVolume() for cleanup the NVMe-oF resoureces was in incorrect place, so in case of createNVMeoFResources() failure it was not called at all. Signed-off-by: gadi-didi <gadi.didi@ibm.com>

gadididi force-pushed the nvmeof/fix_listener_bug branch from 3c27975 to 8ebc1c2 Compare January 18, 2026 08:22

gadididi requested a review from nixpanic January 18, 2026 08:25

nixpanic approved these changes Jan 19, 2026

View reviewed changes

nixpanic requested a review from a team January 19, 2026 09:02

Madhu-1 approved these changes Jan 20, 2026

View reviewed changes

Madhu-1 added the ci/skip/e2e skip running e2e CI jobs label Jan 20, 2026

mergify Bot added the queued label Jan 20, 2026

mergify Bot added a commit that referenced this pull request Jan 20, 2026

Merge of #5936

6266131

mergify Bot mentioned this pull request Jan 20, 2026

merge queue: embarking devel (982f093) and #5936 together #5948

Closed

78 tasks

mergify Bot merged commit 6b688a9 into ceph:devel Jan 20, 2026
21 of 22 checks passed

mergify Bot removed the queued label Jan 20, 2026

gadididi deleted the nvmeof/fix_listener_bug branch January 20, 2026 11:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nvmeof: fix bug in cleanup NVMe-oF resources #5936

nvmeof: fix bug in cleanup NVMe-oF resources #5936
mergify[bot] merged 5 commits into
ceph:develfrom
gadididi:nvmeof/fix_listener_bug

gadididi commented Jan 15, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nixpanic left a comment

Uh oh!

nixpanic Jan 19, 2026

Uh oh!

gadididi Jan 19, 2026

Uh oh!

nixpanic Jan 19, 2026

Uh oh!

Madhu-1 commented Jan 20, 2026

Uh oh!

mergify Bot commented Jan 20, 2026 •

edited

Loading

Uh oh!

mergify Bot commented Jan 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

gadididi commented Jan 15, 2026

Describe what this PR does

Problem

Root Cause

Solution

1. Return nvmeofData on Error (Main Fix)

2. Add NamespaceID Guard

3. Fix ensureSubsystem to Create Listeners Even if Subsystem Exists

4. The defer was declared AFTER the call that could fail

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nixpanic left a comment

Choose a reason for hiding this comment

Uh oh!

nixpanic Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

gadididi Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

nixpanic Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

Madhu-1 commented Jan 20, 2026

Uh oh!

mergify Bot commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ The pull request has been merged automatically

Uh oh!

mergify Bot commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge Queue Status

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mergify Bot commented Jan 20, 2026 •

edited

Loading

mergify Bot commented Jan 20, 2026 •

edited

Loading