Skip to content

Fix volume creation race + refactor (User)VolumeConfigController#12122

Merged
talos-bot merged 1 commit intosiderolabs:mainfrom
laurazard:user-volumeconfig-race
Nov 12, 2025
Merged

Fix volume creation race + refactor (User)VolumeConfigController#12122
talos-bot merged 1 commit intosiderolabs:mainfrom
laurazard:user-volumeconfig-race

Conversation

@laurazard
Copy link
Copy Markdown
Contributor

@laurazard laurazard commented Nov 3, 2025

Pull Request

Fixes #11992

What? (description)

Merges (System)VolumeConfig and UserVolumeConfig controllers into a single controller, w/ extensive refactors:

  • reuse "transformer" pattern (machine config -> VolumeConfig) for system+user volumes
  • reuses handleCustomVolumeConfig (now createVolume) for both as well
  • introduces a small internal type volumeResource to replace all the arguments passed to handleCustomConfig/createVolume
  • Merged tests from user_volume_config_test.go into volume_config_test.go
  • Removed the r.StartTrackingOutput/safe.CleanupOutputs logic, and reused the custom logic from the UserVolumeConfig controller for keeping track of VolumeConfigs/removing unused ones (added a label, block.SystemVolumeLabel for tracking)

Additionally, a small, internal volumeConfigBuilder convenience wrapper for building VolumeConfig resource modifier funcs is added:

newVolumeConfigBuilder().
	WithType(block.VolumeTypeDirectory).
	WithMount(block.MountSpec{
		TargetPath:   userVolumeConfig.Name(),
		ParentID:     constants.UserVolumeMountPoint,
		SelinuxLabel: constants.EphemeralSelinuxLabel,
		FileMode:     0o755,
		UID:          0,
		GID:          0,
		BindTarget:   pointer.To(userVolumeConfig.Name()),
	})

Why? (reasoning)

Merging VolumeConfig and UserVolumeConfig controllers removes the race condition between system and user volume config creation, and these controllers are quite similar already.

Acceptance

Please use the following checklist:

  • you linked an issue (if applicable)
  • you included tests (if applicable)
  • you ran conformance (make conformance)
  • you formatted your code (make fmt)
  • you linted your code (make lint)
  • you generated documentation (make docs)
  • you ran unit-tests (make unit-tests)

See make help for a description of the available targets.

@github-project-automation github-project-automation bot moved this to To Do in Planning Nov 3, 2025
@laurazard laurazard moved this from To Do to In Progress in Planning Nov 3, 2025
@laurazard laurazard self-assigned this Nov 3, 2025
@laurazard laurazard force-pushed the user-volumeconfig-race branch 15 times, most recently from 6ce183d to a666610 Compare November 4, 2025 11:31
@laurazard laurazard marked this pull request as ready for review November 4, 2025 11:40
@laurazard laurazard force-pushed the user-volumeconfig-race branch 3 times, most recently from 301aaa5 to 3f43ff9 Compare November 4, 2025 16:13
@laurazard laurazard force-pushed the user-volumeconfig-race branch 4 times, most recently from 79b3412 to ab822d2 Compare November 6, 2025 11:16
@laurazard laurazard requested a review from smira November 6, 2025 12:53
@smira
Copy link
Copy Markdown
Member

smira commented Nov 6, 2025

TestLVMActivation is failing 🤔 I wonder if it's my changes

that one is flaky if it fails saying 'locking failed'...

@laurazard laurazard force-pushed the user-volumeconfig-race branch from c97176d to ae80b65 Compare November 6, 2025 12:55
@smira smira moved this from In Progress to In Review in Planning Nov 6, 2025
Copy link
Copy Markdown
Member

@smira smira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🆒

}

var (
userVolumeTransformer = func(c configconfig.Config) ([]volumeResource, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dumb question, what is the benefit of this vs.

func userVolumeTransformer(c configconfig.Config) ([]volumeResource, error)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this right now, but we could also move these functions into internal/... sub-package, make them public, so we could unit-test them in isolation as well

this might provide some better grouping, remove some code from this package into smaller sub-packages

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about doing that too! I think I have a branch locally with a change similar to that + some unit tests, but then I decided I had to draw a line somewhere and leave that for another PR 😅

I'll push those changes anyway so I don't forget.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dumb question, what is the benefit of this vs.

No real benefit (at least, I don't think there is), just style – I'm happy to change it to either form. I think had it this way initially to kind of emphasize the fact that these should be grouped together. I also thought about adding a dummy type ConfigResourceTransformer struct{} that just does these, which is good for grouping, but I don't love the unnecessary bloat.

Moving these to another package like you suggested is a good approach, so I'll do that (in a separate PR, if that's alright).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #12196

@github-project-automation github-project-automation bot moved this from In Review to Approved in Planning Nov 6, 2025
@laurazard
Copy link
Copy Markdown
Contributor Author

/m

@laurazard laurazard force-pushed the user-volumeconfig-race branch 4 times, most recently from 1a2efbf to 4c22a9b Compare November 11, 2025 16:59
@laurazard
Copy link
Copy Markdown
Contributor Author

/m

Previously, system volumes (`META`, `STATE`, etc.) were created by
`VolumeConfigController` and user volumes were created by
`UserVolumeConfigController`. This resulted in these controllers
racing to create volumes, which could cause partitions to be created in
an incorrect order.

This patch fixes this potential race by merging these two controllers
into a single controller, and refactoring a lot of the similar code
paths into one single pipeline for volume config handling.

Signed-off-by: Laura Brehm <laurabrehm@hey.com>
@smira smira force-pushed the user-volumeconfig-race branch from 4c22a9b to 43f4e31 Compare November 12, 2025 08:11
@smira
Copy link
Copy Markdown
Member

smira commented Nov 12, 2025

/m

@talos-bot talos-bot merged commit 43f4e31 into siderolabs:main Nov 12, 2025
58 checks passed
@github-project-automation github-project-automation bot moved this from Approved to Done in Planning Nov 12, 2025
@laurazard laurazard removed the request for review from shanduur November 12, 2025 09:28
smira added a commit to smira/talos that referenced this pull request Feb 19, 2026
This is a regression introduced in siderolabs#12122 and siderolabs#12141.

Without this, only `kubelet` holds `/var/mnt`, so on kubelet restart,
Talos tries to unmount it, cascading into unmount of all user volumes,
which shouldn't be the case.

Fixes siderolabs#12797

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
smira added a commit to smira/talos that referenced this pull request Feb 19, 2026
This is a regression introduced in siderolabs#12122 and siderolabs#12141.

Without this, only `kubelet` holds `/var/mnt`, so on kubelet restart,
Talos tries to unmount it, cascading into unmount of all user volumes,
which shouldn't be the case.

Fixes siderolabs#12797

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
smira added a commit to smira/talos that referenced this pull request Feb 19, 2026
This is a regression introduced in siderolabs#12122 and siderolabs#12141.

Without this, only `kubelet` holds `/var/mnt`, so on kubelet restart,
Talos tries to unmount it, cascading into unmount of all user volumes,
which shouldn't be the case.

Fixes siderolabs#12797

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
smira added a commit to smira/talos that referenced this pull request Mar 6, 2026
This is a regression introduced in siderolabs#12122 and siderolabs#12141.

Without this, only `kubelet` holds `/var/mnt`, so on kubelet restart,
Talos tries to unmount it, cascading into unmount of all user volumes,
which shouldn't be the case.

Fixes siderolabs#12797

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit e5b0eb0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

VolumeConfig resources are processed after RawVolumeConfig resources

3 participants