Marking a plasma manager as dead does not mark its local scheduler as dead. #569

@robertnishihara

Description

The file monitor-008015.err on the head node looks like this.

WARNING:root:Timed out b'plasma_manager'
WARNING:root:Removed b'plasma_manager', client ID 00fb29d393f227ce044542f05065560325fb72fd
WARNING:root:Marked 1274 objects as lost.

The entry of ray.global_state.client_table() for this node is the following.

'172.31.30.57': [
  {'ClientType': 'plasma_manager',
   'DBClientID': '00fb29d393f227ce044542f05065560325fb72fd',
   'Deleted': True},
  {'AuxAddress': '172.31.30.57:11227',
   'ClientType': 'local_scheduler',
   'DBClientID': '46139b8d82494ce2480dfd37d98b05fea6da1984',
   'Deleted': False,
   'LocalSchedulerSocketName': '/tmp/scheduler40743926',
   'NumCPUs': 8.0,
   'NumGPUs': 0.0}]

So the plasma manager has been marked as dead, but the local scheduler on the same node has not.
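To make the inconsistency concrete, here is a minimal sketch that scans a client table in the layout shown above for nodes in exactly this half-dead state. The helper name `find_half_dead_nodes` is hypothetical and not part of Ray's API; it only assumes the dict shape returned by `ray.global_state.client_table()` as printed in this issue.

```python
def find_half_dead_nodes(client_table):
    """Return node IPs whose plasma_manager is marked Deleted
    while a co-located local_scheduler is not."""
    half_dead = []
    for ip, clients in client_table.items():
        dead_manager = any(
            c["ClientType"] == "plasma_manager" and c.get("Deleted", False)
            for c in clients
        )
        live_scheduler = any(
            c["ClientType"] == "local_scheduler" and not c.get("Deleted", False)
            for c in clients
        )
        if dead_manager and live_scheduler:
            half_dead.append(ip)
    return half_dead


# Using the entry from this issue (IDs abbreviated for clarity):
table = {
    "172.31.30.57": [
        {"ClientType": "plasma_manager",
         "DBClientID": "00fb29d393f227ce044542f05065560325fb72fd",
         "Deleted": True},
        {"ClientType": "local_scheduler",
         "DBClientID": "46139b8d82494ce2480dfd37d98b05fea6da1984",
         "Deleted": False},
    ]
}
print(find_half_dead_nodes(table))  # ['172.31.30.57']
```

If the monitor marked both components of a node dead together, this scan would always return an empty list.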

When I run new workloads, it looks like tasks are scheduled on the node with the "dead" plasma manager. Note that when I run `ps aux | grep "plasma_manager "` on the relevant node, the manager process seems to still be alive.

What is the intended behavior here? If Ray thinks that the manager is dead, then shouldn't we stop assigning work to that node?
