Skip to content

nvmeof: Add mount cache and locking for safe nvme disconnect#6192

Merged
mergify[bot] merged 3 commits into
ceph:develfrom
gadididi:nvmeof/add_mnt_cache_nodeserver
Apr 9, 2026
Merged

nvmeof: Add mount cache and locking for safe nvme disconnect#6192
mergify[bot] merged 3 commits into
ceph:develfrom
gadididi:nvmeof/add_mnt_cache_nodeserver

Conversation

@gadididi

@gadididi gadididi commented Mar 22, 2026

Copy link
Copy Markdown
Contributor

Describe what this PR does

Adds an in-memory cache to track which nvme devices are currently mounted at staging paths. This eliminates expensive findmnt --list system calls on every unstage operation.

Structure:
The cache maintains bidirectional maps for fast lookups:

  • pathToDevice: staging path -> device path
  • deviceToPath: device path -> staging path (reverse)

Initialization:
Cache is populated at node server startup by scanning existing mounts, then kept up to date as volumes are staged and unstaged.

Thread Safety:
Protected by sync.Mutex. We use a simple mutex instead of sync.RWMutex because:

  • Operations are very fast ( map lookups or remove. just 1 call for get copy , which it is called once per delete pod.. )
  • Read/write ratio is - not read-heavy

Why copying the map is safe:
When GetAllDevices() returns a copy of the cache, callers get a consistent snapshot at that moment. The copy happens after cache updates (first, the pod unmounts, then deletes itself, and last fetch the copy of cache), so concurrent unstage operations each see current state. Even if two unstages run simultaneously, each removes its device from the cache first, then gets a fresh snapshot showing remaining devices.
So the last of them who call to GetAllDevices() will get the update list, and will know if disconnect or not. (NodeStage cannot run at this time !!)

Related PR##

after this pr: #6183 will be merged, group locking (between NodeStage\NodeUnStage) will added.

Future concerns

next step is to add group lock after #6183 will be merged.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer
    guide
    .
  • Reviewed the developer guide on Submitting a Pull
    Request
  • Pending release
    notes

    updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@gadididi gadididi requested a review from nixpanic March 22, 2026 15:03
@gadididi gadididi self-assigned this Mar 22, 2026
@gadididi gadididi added the component/nvme-of Issues and PRs related to NVMe-oF. label Mar 22, 2026
@gadididi gadididi requested review from Rakshith-R and Copilot March 23, 2026 06:14

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an in-memory mount cache for NVMe-oF staging mounts so NodeUnstage can avoid repeated findmnt --list calls and decide when it’s safe to disconnect NVMe controllers.

Changes:

  • Changed GetAllNVMeMountedDevices() to return device -> stagingTarget mappings (instead of a bool set).
  • Introduced MountCache (bidirectional device/path cache) and initialized it at NodeServer startup.
  • Updated NodeUnstage disconnect decision flow to use the in-memory cache snapshot.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
internal/nvmeof/util/mounter.go Changes mounted-device discovery to return device→mountpoint mapping.
internal/nvmeof/util/mounter_cache.go Adds new mutex-protected mount cache with bidirectional maps.
internal/nvmeof/nvmeof_initiator.go Updates disconnect interfaces/helpers to accept the new mounted-devices map type.
internal/nvmeof/nodeserver/nodeserver.go Initializes cache on startup; updates stage/unstage flows to maintain and consult the cache for disconnect decisions.

Comment thread internal/nvmeof/util/mounter_cache.go
Comment thread internal/nvmeof/util/mounter_cache.go Outdated
Comment thread internal/nvmeof/util/mounter_cache.go
Comment thread internal/nvmeof/util/mounter_cache.go Outdated
Comment on lines 515 to +525
transaction.isMounted = true

// Resolve real device and update cache
realDevice, err := nvmeutil.GetDeviceFromMountpoint(ctx, stagingTargetPath)
if err != nil {
log.WarningLog(ctx, "failed to resolve device: %v", err)
// Fallback - try to continue
realDevice = devicePath
}

ns.mountCache.Add(realDevice, stagingTargetPath)

Copilot AI Mar 23, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between the successful mount (transaction.isMounted = true) and the later ns.mountCache.Add call, there is a window where the namespace is mounted in the system but missing from mountCache. Because NodeStage/NodeUnstage are only locked per-volume (volumeLocks), a concurrent unstage of a different namespace in the same subsystem could take a snapshot that omits this mount and incorrectly disconnect shared controllers. To avoid false negatives, consider updating mountCache immediately after the mount succeeds (using the best-available device identity) or adding a safety validation (e.g., re-check mounts via findmnt) before disconnect decisions.

Suggested change
transaction.isMounted = true
// Resolve real device and update cache
realDevice, err := nvmeutil.GetDeviceFromMountpoint(ctx, stagingTargetPath)
if err != nil {
log.WarningLog(ctx, "failed to resolve device: %v", err)
// Fallback - try to continue
realDevice = devicePath
}
ns.mountCache.Add(realDevice, stagingTargetPath)
// Immediately record the mount in the cache using the best-available device identity
ns.mountCache.Add(devicePath, stagingTargetPath)
// Resolve real device and refine cache entry if possible
realDevice, err := nvmeutil.GetDeviceFromMountpoint(ctx, stagingTargetPath)
if err != nil {
log.WarningLog(ctx, "failed to resolve device: %v", err)
// Fallback - keep using devicePath for cache
} else if realDevice != "" && realDevice != devicePath {
// Update cache to use the canonical device identifier for this mountpoint
ns.mountCache.RemoveByMountPoint(stagingTargetPath)
ns.mountCache.Add(realDevice, stagingTargetPath)
}
transaction.isMounted = true

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because NodeStage/NodeUnstage are only locked per-volume (volumeLocks), a concurrent unstage of a different namespace in the same subsystem could take a snapshot that omits this mount and incorrectly disconnect shared controllers

As I said in the PR description, in the next PR I will add group lock(which locks per group type stage\unstage) . Once this PR will be merged: #6183

Comment thread internal/nvmeof/nodeserver/nodeserver.go
@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch 2 times, most recently from 8673fba to 077fce8 Compare March 24, 2026 14:22
@gadididi gadididi requested a review from Copilot March 24, 2026 14:24

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread internal/nvmeof/nodeserver/nodeserver.go
Comment thread internal/nvmeof/util/mounter.go Outdated
Comment thread internal/nvmeof/nodeserver/nodeserver.go Outdated
@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch from 077fce8 to 9558efb Compare March 25, 2026 07:46
Comment thread internal/nvmeof/util/mounter_cache.go Outdated
// MountCache is a thread-safe cache that maintains the mapping between staging paths and devices.
// Note: This map is 1:1, meaning each staging path corresponds to exactly one device.
// device can only be mounted to one staging path at a time, and each staging path can only have one device.
type MountCache struct {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a MountCache interface and mountCache struct that contains the actual implementation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread internal/nvmeof/util/mounter_cache.go
}

// Initialize mounted devices cache on startup to ensure we have an accurate view
mountedDevices, err := getNVMeMountedDevices(context.Background())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make getNVMeMountedDevices() a function of NodeServer (call it initNVMeMountedDevices()?) and add the path/device directly in that function. This removes the for-loop a little below and makes this function a little smaller.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch 2 times, most recently from 5e46539 to 6d03f95 Compare March 30, 2026 09:31
@gadididi gadididi requested a review from nixpanic March 30, 2026 09:31
Comment thread internal/nvmeof/nodeserver/nodeserver.go
@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch 2 times, most recently from 53bba11 to 2fe4143 Compare March 30, 2026 11:59
@gadididi gadididi requested a review from nixpanic March 30, 2026 12:23
@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch from 2fe4143 to 0803810 Compare March 30, 2026 12:23
@gadididi gadididi requested a review from Copilot March 30, 2026 12:40

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread internal/nvmeof/nodeserver/nodeserver.go Outdated
@gadididi gadididi force-pushed the nvmeof/add_mnt_cache_nodeserver branch from 0803810 to 353a958 Compare April 7, 2026 06:47
@gadididi gadididi requested review from a team and removed request for Rakshith-R April 7, 2026 06:47
@nixpanic

nixpanic commented Apr 8, 2026

Copy link
Copy Markdown
Member

@Mergifyio rebase

@ceph-csi-bot ceph-csi-bot force-pushed the nvmeof/add_mnt_cache_nodeserver branch from 353a958 to 2a614e0 Compare April 8, 2026 16:49
@mergify

mergify Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

Deprecation notice: This pull request comes from a fork and was rebased using bot_account impersonation. This capability will be removed on July 1, 2026. After this date, the rebase action will no longer be able to rebase fork pull requests with this configuration. Please switch to the update action/command to ensure compatibility going forward.

@mergify

mergify Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

rebase

✅ Branch has been successfully rebased

@nixpanic

nixpanic commented Apr 8, 2026

Copy link
Copy Markdown
Member

/test ci/centos/mini-e2e/k8s-1.35

@gadididi gadididi requested a review from a team April 8, 2026 20:09
@Madhu-1

Madhu-1 commented Apr 9, 2026

Copy link
Copy Markdown
Collaborator

@Mergifyio rebase

@Madhu-1

Madhu-1 commented Apr 9, 2026

Copy link
Copy Markdown
Collaborator

@Mergifyio queue

@mergify

mergify Bot commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

rebase

☑️ Nothing to do, the required conditions are not met

Details
  • any of:
    • #commits-behind > 0 [📌 rebase requirement]
    • -linear-history [📌 rebase requirement]
  • -closed [📌 rebase requirement]
  • -conflict [📌 rebase requirement]
  • queue-position = -1 [📌 rebase requirement]

@mergify

mergify Bot commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

Merge Queue Status

🛑 Queue command has been cancelled

@nixpanic nixpanic added the ok-to-test Label to trigger E2E tests label Apr 9, 2026
@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.35

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.34

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.35

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.34

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.33

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/mini-e2e/k8s-1.35

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/mini-e2e/k8s-1.34

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.33

@ceph-csi-bot

Copy link
Copy Markdown
Collaborator

/test ci/centos/mini-e2e/k8s-1.33

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Apr 9, 2026
@mergify

mergify Bot commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

Deprecation notice: This pull request comes from a fork and was queued with update_method=rebase and update_bot_account impersonation. This capability will be removed on July 1, 2026. After this date, the merge queue will no longer be able to rebase fork pull requests with this configuration. To avoid disruption, switch to update_method=merge in your queue rule.

@mergify mergify Bot added the queued label Apr 9, 2026
@mergify

mergify Bot commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

Merge Queue Status

  • Entered queue2026-04-09 10:25 UTC · Rule: default
  • Checks skipped · PR is already up-to-date
  • Merged2026-04-09 10:25 UTC · at 2a614e0a43d24b515412b2f3376ba5856fd68b1d

This pull request spent 15 seconds in the queue, including 1 second running CI.

Required conditions to merge

@mergify mergify Bot merged commit 736dc70 into ceph:devel Apr 9, 2026
41 checks passed
@gadididi gadididi deleted the nvmeof/add_mnt_cache_nodeserver branch April 9, 2026 10:26
@mergify mergify Bot removed the queued label Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/nvme-of Issues and PRs related to NVMe-oF.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants