[RayTrain] ScalingConfig resources_per_worker input validation/error handling #49372
Closed
Labels
P2 (Important issue, but not time-critical), community-backlog, enhancement (Request for new feature and/or capability), train (Ray Train Related Issue)
Description
Add an error handler to help users identify when they have passed an invalid resource name in resources_per_worker (e.g. misspelling a resource as "cpu" or "Memory", or adding a key that does not exist).
Currently, if you provide something like "memory" misspelt as "Memory", Ray complains that your cluster lacks resources, even if you are requesting less than the available amount.
This change adds a simple error check that tells users when they have provided a misspelt or invalid resource name.
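To make the idea concrete, here is a rough sketch of what such a check could look like. It is not the actual Ray implementation: the helper name validate_resources_per_worker, the custom_resources parameter, and the assumption that "CPU", "GPU", and "memory" are the built-in names are all hypothetical choices for illustration. The similarity cutoff of 0.8 is also arbitrary.

```python
from difflib import get_close_matches

# Assumption: the built-in resource names expected in resources_per_worker.
BUILTIN_RESOURCES = ("CPU", "GPU", "memory")


def validate_resources_per_worker(resources, custom_resources=()):
    """Raise ValueError if a key looks like a misspelt or unknown resource name.

    Hypothetical helper, not part of Ray. `custom_resources` lists custom
    resource names that the cluster is known to define.
    """
    allowed = set(BUILTIN_RESOURCES) | set(custom_resources)
    # Map lower-cased names back to their canonical spelling for suggestions.
    by_lower = {name.lower(): name for name in sorted(allowed)}
    problems = []
    for name in resources:
        if name in allowed:
            continue
        # Compare case-insensitively so "cpu" and "Memory" get a suggestion.
        close = get_close_matches(name.lower(), list(by_lower), n=1, cutoff=0.8)
        if close:
            problems.append(f"{name!r} (did you mean {by_lower[close[0]]!r}?)")
        else:
            problems.append(f"{name!r} (unknown resource name)")
    if problems:
        raise ValueError(
            "Invalid keys in resources_per_worker: " + ", ".join(problems)
        )
```

With this sketch, passing {"CPU": 1, "Memory": 1e9} raises a ValueError that points at 'Memory' and suggests 'memory', instead of surfacing later as a misleading "insufficient resources" message. Because Ray also supports custom resources, a real check would need some way to learn which custom names are valid; here that is left to the caller via custom_resources.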
(See slack thread for issue inspiration: https://ray.slack.com/archives/C053M5UBEVD/p1734471893141579)
Use case
When performing a training run, users should be able to quickly identify a mistyped or misnamed ScalingConfig input.