[refactor cluster-task-manage 1/n] separate resource reporting logic into helper class #22215
Also, we think there are two bugs in the scheduler resource reporting logic:
```cpp
namespace raylet {

/// Helper class that reports resource_load and resource_load_by_shape to gcs.
class SchedulerResourceReporter {
  // ...
};

}  // namespace raylet
```
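To make the two reported quantities concrete, here is a minimal, self-contained sketch of what such a helper could compute from a raylet's queue: the number of queued tasks per distinct resource shape (the idea behind `resource_load_by_shape`) and the summed demand per resource (the idea behind `resource_load`). `ResourceShape`, `QueuedTask`, and `SchedulerResourceReporterSketch` are hypothetical stand-ins, not Ray's actual types or the class added in this PR, which works with Ray's own task and protobuf types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: a task's resource request, e.g. {"CPU": 1, "GPU": 1}.
// This is NOT Ray's real type.
using ResourceShape = std::map<std::string, double>;

// Hypothetical stand-in for a task waiting in the raylet's queue.
struct QueuedTask {
  ResourceShape shape;
};

// Sketch of a reporter that derives both load views from a task queue.
class SchedulerResourceReporterSketch {
 public:
  explicit SchedulerResourceReporterSketch(std::vector<QueuedTask> queue)
      : queue_(std::move(queue)) {}

  // Count of queued tasks per distinct resource shape.
  std::map<ResourceShape, int> LoadByShape() const {
    std::map<ResourceShape, int> by_shape;
    for (const auto &task : queue_) {
      ++by_shape[task.shape];
    }
    return by_shape;
  }

  // Total demand per resource name, summed over all queued tasks.
  ResourceShape TotalLoad() const {
    ResourceShape total;
    for (const auto &[shape, count] : LoadByShape()) {
      for (const auto &[resource, amount] : shape) {
        total[resource] += amount * count;
      }
    }
    return total;
  }

 private:
  std::vector<QueuedTask> queue_;
};
```

For a queue of two `{CPU: 1}` tasks and one `{CPU: 1, GPU: 1}` task, `LoadByShape()` yields two entries (counts 2 and 1) and `TotalLoad()` yields `CPU: 3, GPU: 1`. Keeping the per-shape view matters because an autoscaler needs to know which node types would fit the pending work, not only the summed totals.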
Is this for the local scheduler or the cluster?
Just want to make sure I understand it correctly: after this one, we only report tasks without pending lease requests to the local raylet. Tasks with pending lease requests will be reported by the raylets that receive those lease requests. Combined, the GCS should see an almost accurate resource requirement.
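The counting rule described above can be modeled in a few lines: each raylet skips tasks for which it has already sent a lease request elsewhere, the receiving raylet counts that request in its own queue, and summing all reports counts every task exactly once. This is a hypothetical model for illustration only; `QueuedTask::has_pending_lease_request`, `LocalLoadByShape`, and `AggregateLoad` are invented names, not Ray APIs, and the model does not cover requests still in transit between raylets.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a task's resource request; not Ray's real type.
using ResourceShape = std::map<std::string, double>;

struct QueuedTask {
  ResourceShape shape;
  // Hypothetical flag: true if this raylet already sent a lease request for
  // the task to another raylet, which then reports the demand instead.
  bool has_pending_lease_request = false;
};

// Demand one raylet reports: only tasks it has not forwarded elsewhere.
std::map<ResourceShape, int> LocalLoadByShape(const std::vector<QueuedTask> &queue) {
  std::map<ResourceShape, int> by_shape;
  for (const auto &task : queue) {
    if (!task.has_pending_lease_request) {
      ++by_shape[task.shape];
    }
  }
  return by_shape;
}

// Cluster-wide view: sum every raylet's report. Since each task is counted by
// exactly one raylet in this model, the sum equals total pending demand.
std::map<ResourceShape, int> AggregateLoad(
    const std::vector<std::map<ResourceShape, int>> &reports) {
  std::map<ResourceShape, int> total;
  for (const auto &report : reports) {
    for (const auto &[shape, count] : report) {
      total[shape] += count;
    }
  }
  return total;
}
```

If raylet A queues one local `{CPU: 1}` task and has forwarded a lease request for a second one to raylet B, A reports 1, B reports the received request as 1, and the aggregate is 2. If A also counted its forwarded task, the aggregate would double count it as 3.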
Let's remove that comment. The code doesn't seem to make that assumption. https://sourcegraph.com/github.com/ray-project/ray/-/blob/python/ray/autoscaler/_private/monitor.py?L85
I think that line reference changed? But if you're referring to the worker backlog reporting, I think the lease request will get counted, just at the other raylet? (There is a short race condition while it's in transit and not counted anywhere, I guess.)
…into helper class (ray-project#22215) Separate Scheduler Resource Reporting logic into a separate class for better readability and maintainability.
Why are these changes needed?
Move the scheduler resource reporting logic into its own class for better readability and maintainability.
Related issue number
Checks
I've run `scripts/format.sh` to lint the changes in this PR.