
@jabr
Contributor

@jabr commented Oct 26, 2025

Update: I'm successfully running an uncloud service with GPU access using custom uc/uncloudd builds from this branch.

New tests pass and it compiles for me, but I have not yet tried this on a real deploy.

…es.reservations.devices` DeviceRequests to Docker container
@jabr changed the title from "feat: support DeviceRequests (service.gpus and service.deploy.resources.reservations.devices)" to "feat: support DeviceRequests (service.gpus and reservations.devices)" Oct 26, 2025
@jabr
Contributor Author

jabr commented Oct 26, 2025

I'm now running this build of uncloudd on my GPU server and have successfully deployed a test service using this uc build:

```yaml
services:
  nvidia:
    image: nvidia/cuda:12.8.1-base-ubuntu24.04
    command: tail -f /dev/null
    x-machines: [ gpu-server ]
    gpus: all
```

I then verified that it has GPU access by running this on gpu-server:

```shell
docker exec <container-id-of-nvidia-service> nvidia-smi
```
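For context, here is a minimal sketch of how a `gpus: all` value could be turned into a Docker Engine device request. It follows the Docker CLI's `--gpus` conventions (`all` becomes `Count: -1`, and the capability defaults to `gpu`). The `DeviceRequest` struct is a local mirror of the Engine API's `HostConfig.DeviceRequests` entry so the example builds without the Docker SDK, and `gpusToDeviceRequest` is a hypothetical helper, not the code in this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// DeviceRequest mirrors the fields of a Docker Engine API device request
// (HostConfig.DeviceRequests). Defined locally so this sketch has no dependencies.
type DeviceRequest struct {
	Driver       string
	Count        int // -1 means "all available devices"
	DeviceIDs    []string
	Capabilities [][]string
	Options      map[string]string
}

// gpusToDeviceRequest is a hypothetical helper that converts a compose-style
// `gpus` value ("all" or a positive device count) into a DeviceRequest.
func gpusToDeviceRequest(value string) (DeviceRequest, error) {
	req := DeviceRequest{
		// With Driver left empty, dockerd picks a registered device driver
		// that supports the requested capabilities.
		Capabilities: [][]string{{"gpu"}},
	}
	if value == "all" {
		req.Count = -1
		return req, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil || n < 1 {
		return DeviceRequest{}, fmt.Errorf("invalid gpus value %q", value)
	}
	req.Count = n
	return req, nil
}

func main() {
	req, err := gpusToDeviceRequest("all")
	if err != nil {
		panic(err)
	}
	fmt.Printf("count=%d capabilities=%v\n", req.Count, req.Capabilities) // prints count=-1 capabilities=[[gpu]]
}
```

The compose spec also accepts a list form of `gpus` (with `driver`, `count`, `device_ids`, `capabilities`); the sketch only covers the `all`/count shorthand used in the service above.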

Owner

@psviderski left a comment

Awesome work! ❤️

The one thing we need to update before merging is the comparison logic in EvalContainerSpecChange: https://github.com/psviderski/uncloud/blob/main/pkg/client/deploy/container.go#L91-L93

```go
	if !reflect.DeepEqual(current.Container.Resources, newResources) {
		return ContainerNeedsUpdate
	}
```

Before DeviceRequests were added to ContainerResources, every ContainerResources property was mutable (it could be updated on a running container), so if the Resources specs differed, we returned ContainerNeedsUpdate.

DeviceRequests are not mutable, so we need to compare them first and return ContainerNeedsRecreate if they differ, then return ContainerNeedsUpdate if any other Resources properties differ.
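A minimal sketch of that ordering, using simplified hypothetical types: `ContainerResources`, `DeviceReservation`, `evalResourcesChange`, and the `ContainerUpToDate` value are illustrative stand-ins, and only the `ContainerNeedsUpdate`/`ContainerNeedsRecreate` outcomes come from this thread. The immutable device reservations are checked first; only if they match are the remaining, updatable fields compared.

```go
package main

import (
	"fmt"
	"reflect"
)

// ContainerSpecStatus and its values are illustrative, not uncloud's definitions.
type ContainerSpecStatus string

const (
	ContainerUpToDate      ContainerSpecStatus = "up-to-date" // hypothetical
	ContainerNeedsUpdate   ContainerSpecStatus = "needs-update"
	ContainerNeedsRecreate ContainerSpecStatus = "needs-recreate"
)

// DeviceReservation is a simplified stand-in for a device request.
type DeviceReservation struct {
	Count        int
	Capabilities [][]string
}

// ContainerResources is a simplified, hypothetical resources spec: CPU and
// memory can be changed on a running container, device reservations cannot.
type ContainerResources struct {
	CPU                int64
	Memory             int64
	DeviceReservations []DeviceReservation
}

// evalResourcesChange compares the immutable field first, then the mutable rest.
func evalResourcesChange(current, desired ContainerResources) ContainerSpecStatus {
	// Device requests can only be set when a container is created.
	if !reflect.DeepEqual(current.DeviceReservations, desired.DeviceReservations) {
		return ContainerNeedsRecreate
	}
	// current and desired are value copies, so clearing the immutable field
	// here leaves the caller's structs untouched and limits the comparison
	// below to properties that can be updated in place.
	current.DeviceReservations, desired.DeviceReservations = nil, nil
	if !reflect.DeepEqual(current, desired) {
		return ContainerNeedsUpdate
	}
	return ContainerUpToDate
}

func main() {
	gpu := []DeviceReservation{{Count: -1, Capabilities: [][]string{{"gpu"}}}}
	cur := ContainerResources{CPU: 1000, Memory: 512 << 20}

	fmt.Println(evalResourcesChange(cur, ContainerResources{CPU: 2000, Memory: 512 << 20}))                          // prints needs-update
	fmt.Println(evalResourcesChange(cur, ContainerResources{CPU: 1000, Memory: 512 << 20, DeviceReservations: gpu})) // prints needs-recreate
}
```

If both a mutable field and the device reservations change, the sketch returns ContainerNeedsRecreate, since recreating the container applies every change anyway. Note that `reflect.DeepEqual` treats a nil slice and an empty slice as different, so a real implementation may want to normalize them first to avoid spurious recreates.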

@jabr
Contributor Author

jabr commented Oct 27, 2025

I renamed DeviceRequests to DeviceReservations and updated EvalContainerSpecChange to return ContainerNeedsRecreate instead of ContainerNeedsUpdate when they change.

@psviderski merged commit 3b4bcb7 into psviderski:main Oct 27, 2025
4 checks passed
@jabr
Contributor Author

jabr commented Nov 2, 2025

Just a heads-up: I apparently broke something in the last cleanup commit or two on this. The DeviceRequests aren't being sent to the servers in the spec anymore, so a subsequent redeploy of a service didn't have GPU access.

I'll take a look shortly. Apologies for not testing it again after the initial commit. 😢

@psviderski
Owner

No worries at all and thank you for raising this! Do you need any help with troubleshooting?

@jabr
Contributor Author

jabr commented Nov 2, 2025

Ah, never mind, false alarm.

I still had the first version of the daemon running on my server. It was still expecting the old DeviceRequests field name, which we renamed to DeviceReservations.

I updated both it and my local uc to a fresh build from main and it's working fine.

@jabr deleted the gpu-support branch November 13, 2025 06:33
2 participants