-
-
Notifications
You must be signed in to change notification settings - Fork 116
feat: support DeviceRequests (service.gpus and reservations.devices) #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…es.reservations.devices` DeviceRequests to Docker container
service.gpus and service.deploy.resources.reservations.devices)|
I'm now running this build of I then verified it has access to the GPU with (run on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! ❤️
Just one thing we need to update before merging is the comparison logic in EvalContainerSpecChange: https://github.com/psviderski/uncloud/blob/main/pkg/client/deploy/container.go#L91-L93
if !reflect.DeepEqual(current.Container.Resources, newResources) {
return ContainerNeedsUpdate
}Before adding DeviceRequests to ContainerResources, all ContainerResources properties were mutable, so if the Resources specs differ, we returned ContainerNeedsUpdate.
DeviceRequests are not mutable so we need to compare them first and return ContainerNeedsRecreate if they differ. Then return ContainerNeedsUpdate if other Resources properties differ.
…h compose nomenclature
…er recreate rather than update
|
I renamed |
|
Just a heads up that I apparently broke something in the last cleanup commit or two on this. The DeviceRequests aren't getting sent in the spec to the servers anymore so a subsequent redeploy of a service didn't have GPU access. I'll take a look shortly. Apologies for not testing it again after the initial commit. 😢 |
|
No worries at all and thank you for raising this! Do you need any help with troubleshooting? |
|
Ah, nevermind, false alarm. I still had the first version of the daemon running on my server. It was still expecting the I updated both it and my local |
Update: I'm successfully running an uncloud service with GPU access using a custom
uc/unclouddof this branch.New tests pass and it compiles for me
, but I have not yet tried this on a real deploy.