[Datasets] [Bug] workers and actors leaking in trivial dataset use #21999
Closed
Labels: P1 (issue that should be fixed within a few weeks), bug (something that is supposed to be working, but isn't)
Description
Search before asking
- I searched the issues and found no similar issues.
Ray Component
Ray Clusters
What happened + What you expected to happen
I have a stock cluster and I'm testing datasets. After running the code below, an actor named ray::_DesignatedBlockOwner persists, along with three ray::IDLE workers. Over repeated runs, these accumulate and will eventually consume all the memory in the cluster.
I also eventually receive warnings such as these:
The actor '_DesignatedBlockOwner' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.
The remote function 'ray.data.read_api.remote_read' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.
The remote function '__main__.foo' has been exported 100 times. It's possible that this warning is accidental, but this may indicate that the same remote function is being defined repeatedly from within many tasks and exported to all of the workers. This can be a performance issue and can be resolved by defining the remote function on the driver instead. See https://github.com/ray-project/ray/issues/6240 for more discussion.
Versions / Dependencies
ray-1.9.2
python-3.7.7 (and 3.9.5)
pyarrow-6.0.1
ubuntu-21.04 (default ray containers)
ami-029536273cb04d4d9; so-called deep learning ami (ubuntu) version 55
Reproduction script
import ray

@ray.remote
def foo():
    return ray.data.range_arrow(10000, parallelism=1)

ray.init(address="auto")
ref = foo.remote()
print(ray.get(ref))
ray.shutdown()

Anything else
The code leaks three workers and one actor every time it is invoked against the cluster.
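One rough way to watch the leak grow (a sketch, not part of the original report; it assumes shell access to a cluster node) is to grep the node's process list for Ray worker titles, since Ray renames worker processes after the task or actor they run:

```shell
# Run the reproduction script a few times first, then on a cluster node:

# List the leaked processes by their Ray-assigned titles.
ps aux | grep -E 'ray::(IDLE|_DesignatedBlockOwner)' | grep -v grep

# Count idle workers; per the report, expect this to grow by about
# three on every invocation of the script against the cluster.
ps aux | grep 'ray::IDLE' | grep -c -v grep
```

If the count climbs monotonically across runs and never drops back after the driver calls ray.shutdown(), that matches the leak described above.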
Are you willing to submit a PR?
- Yes I am willing to submit a PR!