Skip to content

Conversation

@kvaps
Copy link
Member

@kvaps kvaps commented Jan 12, 2026

What this PR does

Removes node-level RWX block validation from linstor-csi as controller-level check is sufficient. The controller already validates that all pods attached to RWX block volume belong to the same VM by extracting vmName from pod owner references (VirtualMachineInstance).

This simplifies the validation logic and fixes VM live migration issues.

Release note

[linstor] Remove node-level RWX block validation to fix VM live migration

Summary by CodeRabbit

  • Improvements

    • Enhanced RWX (read-write-many) block validation with VM-aware checks across node and controller flows, including support for hotplug-disk pods and stricter prevention of cross-VM block sharing.
    • Improved propagation and resolution of VM identity for attachments to ensure consistent validation.
  • Tests

    • Added comprehensive unit tests covering single/multiple pod scenarios, VM ownership, hotplug disks, upgrade paths, and legacy volumes.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 12, 2026

📝 Walkthrough

Walkthrough

Controller and node RWX block validation were extended to be VM-aware: controller-side validation now derives VM identity and propagates it in publish context; node-side validation consumes that VM identity to enforce single-VM RWX block usage. Hotplug-disk handling and unit tests were added.

Changes

Cohort / File(s) Summary
Driver constants
pkg/driver/driver.go
Added exported constants for VM ownership labeling: KubeVirtVMLabel and KubeVirtHotplugDiskLabel.
RWX validation logic
packages/system/linstor/images/linstor-csi/patches/001-rwx-validation.diff
Introduces validateRWXBlockAttachment flow for ControllerPublishVolume to derive VM name from pods/PVCs (including hotplug-disk pods) and to surface VM identity for RWX block attachments; enforces precondition failures on cross-VM block sharing.
Node-side validation integration
(changes in patch)
NodePublishVolume now reads VM identity from publish context and validates RWX block attachments on the node using that VM information.
Tests & test helpers
rwx_validation_test.go, test utility helpers (in patch)
New unit tests covering single/multiple pods, VM ownership scenarios, hotplug-disk pods, upgrade/legacy cases; added helpers to build unstructured Pods/PVs and extract VM names for tests.

Sequence Diagram

sequenceDiagram
    participant Client
    participant CSIController as CSI Controller
    participant ControllerVal as validateRWXBlockAttachment
    participant K8sAPI as Kubernetes API
    participant PubCtx as PublishContext
    participant Node as NodePublishVolume
    participant NodeVal as Node-side RWX Validation

    Client->>CSIController: ControllerPublishVolume(request)
    CSIController->>ControllerVal: validateRWXBlockAttachment(pvc)
    ControllerVal->>K8sAPI: List PV/PVC/Pods using PVC
    ControllerVal->>K8sAPI: Resolve VM name from Pod (labels/ownerRefs / hotplug)
    ControllerVal-->>CSIController: vmName or error
    CSIController->>PubCtx: Embed vmName in PublishContext
    Client->>Node: NodePublishVolume(request + PublishContext)
    Node->>PubCtx: Retrieve vmName
    Node->>NodeVal: validateRWXBlockOnNode(vmName)
    NodeVal->>K8sAPI: (optional) verify local attachments
    NodeVal-->>Node: validation result
    Node-->>Client: NodePublishVolume response
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibble logs and follow the trail,

VM names carried in context like mail,
RWX rules now tied to a single machine,
Hotplug friends join the validation scene,
Hooray — cleaner mounts for our storage tale!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: improving RWX validation for VM live migration, which directly relates to the changeset's core objective of enabling VMs to share block devices during live migration.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @kvaps, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers significant stability and functionality improvements to the LINSTOR ecosystem. It refines the validation logic for ReadWriteMany (RWX) block volumes, ensuring proper resource isolation and sharing for virtual machines. Concurrently, it rectifies a critical bug in the Piraeus server that previously led to TCP port conflicts during disk toggle operations, thereby preventing resource unavailability and improving overall system resilience.

Highlights

  • Improved RWX Block Validation: The linstor-csi driver now tracks vmName via publish_context for ReadWriteMany (RWX) block volumes. This enhancement allows multiple pods from the same VM to share a block device while preventing different VMs from accessing the same block device, which is crucial for scenarios like VM live migration.
  • Fix for Duplicate TCP Ports: A patch has been applied to the piraeus-server to resolve an issue causing duplicate TCP port assignments after toggle-disk operations. This was due to a redundant ensureStackDataExists() call in resetStoragePools() with an empty payload, leading to resources getting stuck in a StandAlone state.
  • Image Tag Updates: The image tags for piraeus-server and linstor-csi have been updated to incorporate the rebuilt images containing these new fixes and improvements.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two significant improvements: enhanced RWX block volume validation for KubeVirt live migration scenarios and a fix for duplicate TCP port assignments in piraeus-server. The RWX validation logic is comprehensive, with checks on both the controller and node sides. The fix for TCP ports appears correct based on the detailed explanation. My review focuses on the new node-side validation logic, where I've identified several high-severity issues. The current implementation fails open on file system errors, which could bypass the security checks. I have provided specific suggestions to adopt a "fail-closed" approach by returning errors, ensuring the system remains secure even when encountering unexpected I/O issues.

Comment on lines 274 to 276
+ if err := os.WriteFile(vmNameFile, []byte(vmName), 0o600); err != nil {
+ d.log.WithError(err).Warn("failed to save VM name for RWX validation")
+ }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If writing the .vmname file fails, the error is only logged, and the function proceeds to successfully return nil. This allows the mount to succeed without persisting the VM ownership information. Consequently, a subsequent mount request for a pod from a different VM will also be allowed, as it will not find the .vmname file and will fall into the "upgrade scenario" logic. This defeats the purpose of the validation. The function should return an error if it fails to write the tracking file.

			if err := os.WriteFile(vmNameFile, []byte(vmName), 0o600); err != nil {
				return fmt.Errorf("failed to save VM name for RWX validation: %w", err)
			}

Comment on lines 295 to 297
+ if writeErr := os.WriteFile(vmNameFile, []byte(vmName), 0o600); writeErr != nil {
+ d.log.WithError(writeErr).Warn("failed to save VM name for RWX validation")
+ }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to a previous point, if writing the .vmname file fails during the upgrade scenario, the error is only logged as a warning. This allows the mount to proceed but leaves the system in an inconsistent state where the VM ownership is not recorded. This would allow a pod from a different VM to mount the volume later. To ensure data integrity and security, an error should be returned to fail the mount operation.

				if writeErr := os.WriteFile(vmNameFile, []byte(vmName), 0o600); writeErr != nil {
					return fmt.Errorf("failed to save VM name for RWX validation during upgrade scenario: %w", writeErr)
				}

Comment on lines 315 to 317
+ d.log.WithError(err).Warn("failed to read existing VM name for RWX validation")
+
+ return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When reading the existing .vmname file fails with an error other than os.IsNotExist, the error is logged as a warning and the validation is skipped by returning nil. This is a "fail-open" approach. If the file is expected to be there but cannot be read (e.g., due to permissions), it's safer to "fail-closed" by returning an error. This prevents a potentially incorrect mount from proceeding.

		return fmt.Errorf("failed to read existing VM name for RWX validation: %w", err)

@kvaps kvaps force-pushed the fix/linstor-rwx-validation-and-tcp-ports branch from c52c342 to 27343c9 Compare January 12, 2026 21:25
@kvaps kvaps changed the title [linstor] Improve RWX validation and fix duplicate TCP ports [linstor] Improve RWX validation for VM live migration Jan 12, 2026
@kvaps kvaps force-pushed the fix/linstor-rwx-validation-and-tcp-ports branch from 27343c9 to 8c825c0 Compare January 12, 2026 21:26
@kvaps kvaps marked this pull request as ready for review January 12, 2026 21:27
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Jan 12, 2026
@kvaps kvaps added backport Should change be backported on previus release backport-previous labels Jan 12, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/system/linstor/images/linstor-csi/patches/001-rwx-validation.diff (2)

126-131: Consider using a field selector or label selector to reduce pod listing overhead.

Listing all pods in a namespace and filtering in-memory could become expensive in namespaces with many pods. While Kubernetes doesn't support field selectors for volume claims, you could consider adding a known label to pods that use LINSTOR volumes to enable more efficient querying.


625-631: Test assertion may fail for "pods in different namespace" case.

The assertion assert.NotEmpty(t, vmName) when len(tc.pods) > 0 could fail for test cases where pods exist but are in a different namespace than the PVC. In the "pods in different namespace should not conflict" test case, only 1 pod is in the "default" namespace (the one the PVC references), so this should still pass, but the assertion logic is fragile.

Consider making the assertion more precise by checking against expected vmName values in the test cases.

♻️ Suggested improvement

Add an expectedVMName field to the test case struct:

 	testCases := []struct {
 		name          string
 		pods          []*unstructured.Unstructured
 		pvcName       string
 		namespace     string
 		expectError   bool
 		errorMsg      string
+		expectedVMName string
 	}{

Then assert against the expected value:

 			} else {
 				assert.NoError(t, err)
-				// VM name is returned when there are pods using the volume
-				if len(tc.pods) > 0 {
-					assert.NotEmpty(t, vmName)
-				}
+				assert.Equal(t, tc.expectedVMName, vmName)
 			}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8025657 and 8c825c0.

📒 Files selected for processing (1)
  • packages/system/linstor/images/linstor-csi/patches/001-rwx-validation.diff
🧰 Additional context used
📓 Path-based instructions (1)
packages/system/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/system/linstor/images/linstor-csi/patches/001-rwx-validation.diff
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build
🔇 Additional comments (8)
packages/system/linstor/images/linstor-csi/patches/001-rwx-validation.diff (8)

9-17: LGTM!

The node-side validation is correctly placed before the actual mount operation, using the vmName from the publish context for RWX block volumes. The condition properly checks both block volume type and MULTI_NODE_MULTI_WRITER access mode.


26-36: LGTM!

The KubeVirt label constants and GVR definitions are correctly defined for the intended use case.


38-87: LGTM!

The function correctly handles both regular virt-launcher pods and hotplug disk pods. The owner reference traversal for hotplug disks is properly implemented with appropriate error handling.


345-372: LGTM!

The controller correctly validates RWX block attachments and propagates the VM name through the publish context to enable node-side validation.


452-633: LGTM - Comprehensive test coverage for validateRWXBlockAttachment.

The table-driven tests cover the key scenarios including live migration, strict mode, hotplug disks, and edge cases. The test helper functions are well-structured.


774-914: LGTM - Good test coverage for node-side validation.

The tests for validateRWXBlockOnNode properly cover first mount, second mount (same/different VM), upgrade scenarios, and legacy VolumeAttachment handling.


381-381: No issues found. Line 381 correctly defines VMName = linstor.ParameterNamespace + "/vm-name" with the proper linstor namespace prefix. No "linsterr" typo exists in the patch file.

Likely an incorrect or invalid review comment.


246-280: Clarify whether .vmname cleanup is intentionally delegated to kubelet or if explicit cleanup should be added to NodeUnpublishVolume.

The .vmname file is created during first mount but no explicit cleanup code exists in this patch. The design treats it as persistent metadata (mountCount deliberately excludes it), which works correctly for state validation across mount cycles. However, confirm whether relying on kubelet to clean up the mount base directory is the intended behavior, or if NodeUnpublishVolume should explicitly remove .vmname when the last mount is unpublished.

Remove node-level RWX block validation from linstor-csi as
controller-level check is sufficient. The controller already validates
that all pods attached to RWX block volume belong to the same VM by
extracting vmName from pod owner references (VirtualMachineInstance).

This simplifies the validation logic and fixes VM live migration issues.

Update linstor-csi image tag with rebuilt image containing the fix.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@kvaps kvaps force-pushed the fix/linstor-rwx-validation-and-tcp-ports branch from 8c825c0 to 74589ce Compare January 12, 2026 22:09
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jan 12, 2026
@kvaps kvaps changed the title [linstor] Improve RWX validation for VM live migration [linstor] Remove node-level RWX validation Jan 12, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
packages/system/linstor/images/linstor-csi/patches/001-rwx-validation.diff (2)

78-79: Consider using Debug log level for routine validation calls.

The Info level logging on line 79 and lines 171-187 may be too verbose for production environments where this function is called frequently. Line 217 already uses Debug level for similar logging.

♻️ Suggested change
 func (d Driver) validateRWXBlockAttachment(ctx context.Context, volumeID string) (string, error) {
-	d.log.WithField("volumeID", volumeID).Info("validateRWXBlockAttachment called")
+	d.log.WithField("volumeID", volumeID).Debug("validateRWXBlockAttachment called")

Also apply similar changes to the Info logs on lines 171-178 and 182-187.


109-113: Consider optimizing pod listing for large namespaces.

Listing all pods in a namespace without filters could be expensive in namespaces with many pods. While this may be acceptable for typical KubeVirt deployments, consider adding pagination or evaluating if field selectors on volume claims are feasible.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c825c0 and 74589ce.

📒 Files selected for processing (1)
  • packages/system/linstor/images/linstor-csi/patches/001-rwx-validation.diff
🧰 Additional context used
📓 Path-based instructions (1)
packages/system/**

📄 CodeRabbit inference engine (AGENTS.md)

Use Helm Chart umbrella pattern with vendored upstream charts in charts/ directory

Files:

  • packages/system/linstor/images/linstor-csi/patches/001-rwx-validation.diff
🔇 Additional comments (5)
packages/system/linstor/images/linstor-csi/patches/001-rwx-validation.diff (5)

9-19: LGTM!

The KubeVirt label constants and GroupVersionResource definitions are correctly defined and well-documented.


21-70: LGTM!

The VM name extraction logic correctly handles both regular virt-launcher pods and hotplug disk pods. The owner reference traversal for hotplug disks is appropriate, and error handling covers edge cases well.


229-236: LGTM!

The RWX block validation is correctly integrated into ControllerPublishVolume. Using FailedPrecondition status code is appropriate for validation failures, and discarding the returned vmName is intentional since only validation is needed at the controller level.


451-456: Verify test assertion for edge cases without VM labels.

The assertion assert.NotEmpty(t, vmName) when len(tc.pods) > 0 could fail if a test case has pods without VM labels that aren't filtered out (single pod without label scenario). Currently all test cases with expectError: false and pods have at least one pod with a VM label, so this should work. Consider adding a test case for a single pod without a VM label to explicitly document that behavior.


559-598: No issue found with ownerReferences boolean handling.

The fake dynamic client correctly converts the "controller": true boolean value to a *bool pointer when GetOwnerReferences() is called. The Kubernetes apimachinery library's runtime converter handles this transparently through standard JSON marshaling/unmarshaling. The test case "hotplug disk pod with virt-launcher (should succeed)" explicitly validates this scenario and expects it to succeed, confirming the conversion works as intended. The production code also includes defensive nil-checking with owner.Controller == nil, which safely handles any edge cases.

@kvaps kvaps merged commit 65fcee0 into main Jan 12, 2026
24 checks passed
@kvaps kvaps deleted the fix/linstor-rwx-validation-and-tcp-ports branch January 12, 2026 22:56
@github-actions
Copy link

Backport failed for release-0.40, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-0.40
git worktree add -d .worktree/backport-1851-to-release-0.40 origin/release-0.40
cd .worktree/backport-1851-to-release-0.40
git switch --create backport-1851-to-release-0.40
git cherry-pick -x 74589ce9151706ee6687ad7b0233eafcbb987297

@github-actions
Copy link

Backport failed for release-0.39, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin release-0.39
git worktree add -d .worktree/backport-1851-to-release-0.39 origin/release-0.39
cd .worktree/backport-1851-to-release-0.39
git switch --create backport-1851-to-release-0.39
git cherry-pick -x 74589ce9151706ee6687ad7b0233eafcbb987297

kvaps added a commit that referenced this pull request Jan 13, 2026
## What this PR does

Removes node-level RWX block validation from linstor-csi as
controller-level check is sufficient. The controller already validates
that all pods attached to RWX block volume belong to the same VM by
extracting vmName from pod owner references (VirtualMachineInstance).

This simplifies the validation logic and fixes VM live migration issues.

### Release note

```release-note
[linstor] Remove node-level RWX block validation to fix VM live migration
```

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Improvements**
* Enhanced RWX (read-write-many) block validation with VM-aware checks
across node and controller flows, including support for hotplug-disk
pods and stricter prevention of cross-VM block sharing.
* Improved propagation and resolution of VM identity for attachments to
ensure consistent validation.

* **Tests**
* Added comprehensive unit tests covering single/multiple pod scenarios,
VM ownership, hotplug disks, upgrade paths, and legacy volumes.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Should change be backported on previus release backport-previous bug Something isn't working size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants