The many_tasks.py long running stress test crashes after an hour. #4256

@robertnishihara

The relevant test is https://github.com/ray-project/ray/blob/master/ci/long_running_tests/workloads/many_tasks.py.

The issue appears to be that the node runs out of memory because the first raylet has many GB of messages buffered and waiting to be written to the other raylets. I determined this by looking at debug_state.txt.

Labels

bug: Something that is supposed to be working, but isn't
