Closed
Labels
bug: Something that is supposed to be working, but isn't
Description
What is the problem?
Ray version and other system information (Python version, TensorFlow version, OS):
There are several problems with the new scheduler used in the raylet.
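- Wrong allocation result for a `soft` resource constraint in `ClusterResourceScheduler::AllocateResourceInstances` ([New scheduler] Fix new scheduler bug #9467).
In the following code (copied from master), the first `for` loop sets `(*allocation)[i] = 1.;` and `available[i] = 0;`, but the second `for` loop may then overwrite that allocation with `(*allocation)[i] = available[i];`, because `available[i]` may already have been set to 0 by the first loop. The `if` branch has the same problem: when the first loop satisfies the demand exactly, `remaining_demand` is 0, so the very first instance satisfies `available[i] >= remaining_demand` and `(*allocation)[0] = remaining_demand;` resets it to 0, discarding any full unit the first loop placed there.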
```cpp
if (remaining_demand >= 1.) {
  for (size_t i = 0; i < available.size(); i++) {
    if (available[i] == 1.) {
      // Allocate a full unit-capacity instance.
      (*allocation)[i] = 1.;
      available[i] = 0;
      remaining_demand -= 1.;
    }
    if (remaining_demand < 1.) {
      break;
    }
  }
}
if (soft) {
  // Just get as many resources as available.
  for (size_t i = 0; i < available.size(); i++) {
    if (available[i] >= remaining_demand) {
      available[i] -= remaining_demand;
      (*allocation)[i] = remaining_demand;
      return true;
    } else {
      // If the first loop already set (*allocation)[i] = 1. here,
      // available[i] is now 0 and this line resets the allocation to 0.
      (*allocation)[i] = available[i];
      remaining_demand -= available[i];
      available[i] = 0;
    }
  }
  return true;
}
```
- We should call `new_resource_scheduler_->RemoveNode(...)` in `NodeManager::NodeRemoved` to remove the resource information of the removed node. Otherwise, the new scheduler may make wrong scheduling decisions based on a node that no longer exists. ([New scheduler] Fix new scheduler bug #9467)
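To illustrate why the missing `RemoveNode` call matters, here is a minimal toy model, assuming a scheduler that keeps a per-node map of available resources and picks any node with enough capacity. `ToyResourceScheduler`, `AddOrUpdateNode`, `PickNode`, `OnNodeRemoved`, and `DemoStaleNode` are hypothetical names for illustration only; they are not Ray's actual `ClusterResourceScheduler` or `NodeManager` interfaces.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical, heavily simplified model of the scheduler's view of
// per-node available resources (not Ray's real API).
class ToyResourceScheduler {
 public:
  void AddOrUpdateNode(const std::string &node_id, double available_cpus) {
    nodes_[node_id] = available_cpus;
  }

  // Drops all resource information about a node.
  void RemoveNode(const std::string &node_id) { nodes_.erase(node_id); }

  // Returns some node with at least `cpus` available, or std::nullopt.
  std::optional<std::string> PickNode(double cpus) const {
    for (const auto &entry : nodes_) {
      if (entry.second >= cpus) {
        return entry.first;
      }
    }
    return std::nullopt;
  }

 private:
  std::unordered_map<std::string, double> nodes_;
};

// Hypothetical stand-in for NodeManager::NodeRemoved: besides its other
// cleanup, it must also tell the scheduler to forget the node.
void OnNodeRemoved(ToyResourceScheduler &scheduler, const std::string &node_id) {
  scheduler.RemoveNode(node_id);
}

void DemoStaleNode() {
  ToyResourceScheduler scheduler;
  scheduler.AddOrUpdateNode("node_a", 4.0);
  // "node_a" dies. Until RemoveNode is called, its stale entry is still picked.
  assert(scheduler.PickNode(2.0) == std::string("node_a"));
  OnNodeRemoved(scheduler, "node_a");
  // After removal, no node can satisfy the request.
  assert(!scheduler.PickNode(2.0).has_value());
}
```

In this model, skipping the removal step leaves the dead node's capacity visible, so the scheduler keeps choosing it; the toy needs C++17 for `std::optional`.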
Reproduction (REQUIRED)
Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):
If we cannot run your script, we cannot fix your issue.
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.
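A standalone reproduction of the first problem can be sketched by copying the soft-constraint path quoted above into free functions, so the overwrite can be observed without building Ray. `AllocateFullUnits`, `AllocateSoftBuggy`, `AllocateSoftFixed`, and `DemoAllocation` are hypothetical names, and the sketch assumes the allocation vector starts zeroed. The "fixed" variant is one possible fix (accumulating with `+=` instead of overwriting); it is not confirmed to be the change merged in #9467.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// First loop from the quoted snippet: take whole unit-capacity instances.
// Returns the demand that is still unmet.
double AllocateFullUnits(double remaining_demand, std::vector<double> &available,
                         std::vector<double> *allocation) {
  if (remaining_demand >= 1.) {
    for (std::size_t i = 0; i < available.size(); i++) {
      if (available[i] == 1.) {
        (*allocation)[i] = 1.;
        available[i] = 0;
        remaining_demand -= 1.;
      }
      if (remaining_demand < 1.) {
        break;
      }
    }
  }
  return remaining_demand;
}

// Soft path as quoted: assignments with `=` can clobber a full unit.
bool AllocateSoftBuggy(double demand, std::vector<double> &available,
                       std::vector<double> *allocation) {
  allocation->assign(available.size(), 0.);  // Assumed zeroed start.
  double remaining_demand = AllocateFullUnits(demand, available, allocation);
  for (std::size_t i = 0; i < available.size(); i++) {
    if (available[i] >= remaining_demand) {
      available[i] -= remaining_demand;
      (*allocation)[i] = remaining_demand;  // May reset a 1. to 0.
      return true;
    }
    (*allocation)[i] = available[i];  // May reset a 1. to 0.
    remaining_demand -= available[i];
    available[i] = 0;
  }
  return true;
}

// One possible fix: accumulate into the allocation instead of overwriting it.
bool AllocateSoftFixed(double demand, std::vector<double> &available,
                       std::vector<double> *allocation) {
  allocation->assign(available.size(), 0.);  // Assumed zeroed start.
  double remaining_demand = AllocateFullUnits(demand, available, allocation);
  for (std::size_t i = 0; i < available.size(); i++) {
    if (available[i] >= remaining_demand) {
      available[i] -= remaining_demand;
      (*allocation)[i] += remaining_demand;  // Keeps any full unit taken.
      return true;
    }
    (*allocation)[i] += available[i];
    remaining_demand -= available[i];
    available[i] = 0;
  }
  return true;
}

void DemoAllocation() {
  std::vector<double> available = {1., 0.5};
  std::vector<double> allocation;
  AllocateSoftBuggy(1.5, available, &allocation);
  assert(allocation[0] == 0. && allocation[1] == 0.5);  // Full unit lost.

  available = {1., 0.5};
  AllocateSoftFixed(1.5, available, &allocation);
  assert(allocation[0] == 1. && allocation[1] == 0.5);
}
```

With `available = {1., 0.5}` and a demand of 1.5, the quoted logic yields `{0., 0.5}`, silently dropping the full unit taken by the first loop, while the fixed variant yields `{1., 0.5}`. With `available = {1., 1., 0.3}` and a demand of 2.0, the quoted logic yields `{0., 1., 0.}`: the first loop meets the demand exactly, `remaining_demand` is 0, and the `if` branch resets index 0.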