[Core] Resources are not returned when implicitly capturing object reference with max_call arg. #10960

@rkooo567

What is the problem?

Reported in #10739.

When running the script below, GPU resources are never returned. There are a couple of interesting behaviors:

  1. There is a local reference held by our internal methods.

     [Screenshot: Screen Shot 2020-09-22 at 4 15 23 PM]

  2. When we move X_id and int_ref into the task, or when we pass them as arguments, the error disappears.

  3. When we remove max_calls, the error disappears.

It might be an unknown bug in reference counting, but I am not sure.
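For comparison, here is a minimal sketch of the variant where the references are passed as arguments instead of being captured from the enclosing scope. It relies on Ray resolving an ObjectRef passed as a top-level task argument to its value before the task body runs. The helper names `array_length`, `add_twenty`, and `main` are made up for this sketch, and the task bodies are kept as plain functions so they can be checked without a cluster. This was not run against the affected Ray version.

```python
import time

SLEEP_SECONDS = 5  # same delay as the reproduction script


def array_length(x, sleep_seconds=SLEEP_SECONDS):
    """Task body: x arrives as the array itself, not as an ObjectRef."""
    time.sleep(sleep_seconds)
    return len(x)


def add_twenty(value, sleep_seconds=SLEEP_SECONDS):
    """Task body: value arrives as the stored int, not as an ObjectRef."""
    time.sleep(sleep_seconds)
    return value + 20


def main():
    # Imported here so the plain functions above stay importable without Ray.
    import numpy as np
    import ray

    ray.init(num_cpus=2, num_gpus=1)
    X_id = ray.put(np.zeros(int(1e6)))
    int_ref = ray.put(11)

    # Same resource options as the reproduction script.
    opts = dict(num_cpus=1, num_gpus=0.25, max_calls=1)
    do_thing = ray.remote(**opts)(array_length)
    do_sum = ray.remote(**opts)(add_twenty)

    for i in range(3):
        # The refs are passed explicitly; nothing is captured by closure.
        print(ray.get(do_thing.remote(X_id)))
        print(ray.get(do_sum.remote(int_ref)))
        print(i, ray.available_resources())

# Call main() to run this against a local Ray instance.
```

If the observation above holds, the GPU entry printed by `ray.available_resources()` should return to its full value after each iteration of this variant.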

cc @stephanie-wang @ericl

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

import time

import numpy as np
import ray

ray.init(num_cpus=2, num_gpus=1)

sleep_seconds = 5

# Objects created on the driver; the tasks below capture these refs implicitly.
X = np.zeros(int(1e6))
X_id = ray.put(X)

int_ref = ray.put(11)

@ray.remote(num_cpus=1, num_gpus=0.25, max_calls=1)
def do_thing():
    time.sleep(sleep_seconds)
    return len(ray.get(X_id))

@ray.remote(num_cpus=1, num_gpus=0.25, max_calls=1)
def do_sum():
    time.sleep(sleep_seconds)
    return ray.get(int_ref) + 20

for i in range(3):
    print(ray.get(do_thing.remote()))
    print(ray.get(do_sum.remote()))
    print(i, ray.available_resources())

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
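To make the symptom easier to spot than by reading the printed dictionaries, one could diff the cluster totals against what is currently available. The sketch below is a hypothetical helper (`unreturned_resources` is not part of Ray); it assumes both inputs are plain `{name: amount}` dicts such as those returned by `ray.cluster_resources()` and `ray.available_resources()`, and it treats a missing key in the available dict as zero.

```python
def unreturned_resources(total, available, names=("CPU", "GPU"), tol=1e-9):
    """Return {name: amount} for resources that are still held.

    total and available are {name: amount} dicts. Only the resource
    names listed in `names` are compared; a name absent from
    `available` is treated as fully in use.
    """
    held = {}
    for name in names:
        if name not in total:
            continue
        missing = total[name] - available.get(name, 0.0)
        if missing > tol:
            held[name] = missing
    return held


# Example with hand-written dicts (not real cluster output):
# unreturned_resources({"CPU": 2.0, "GPU": 1.0}, {"CPU": 2.0, "GPU": 0.5})
# With Ray running, the inputs would be ray.cluster_resources() and
# ray.available_resources(), called while no tasks are in flight.
```

A non-empty result while no tasks are running would indicate resources that were not returned.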

Metadata

Labels

P1 (Issue that should be fixed within a few weeks), bug (Something that is supposed to be working; but isn't)
