PoolActor now uses num_cpus=0 to avoid any deadlock #22048

Merged
pcmoritz merged 6 commits into ray-project:master from czgdp1807:deadlock on Feb 8, 2022

Conversation

@czgdp1807 (Contributor) commented Feb 2, 2022

Why are these changes needed?

#21488 (comment):

> We discussed this issue in a bit more detail and came to the conclusion that we should set the CPU resource requirement for each actor in the actor pool to 0, to make the Ray Pool compatible/same behavior as the Python multiprocessing pool. Would that work for you @yogeveran ? (very similar to solution 4 mentioned above, but with 0.0 instead of 0.1, so it works in all cases).
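For context, the stdlib behavior being matched can be seen with a nested pool. A minimal sketch (using the thread-backed `multiprocessing.dummy.Pool`, since a real nested process pool is disallowed for daemonic workers; `poolit_inner`/`poolit_outer` are illustrative names, not from this PR):

```python
from multiprocessing.dummy import Pool  # thread-backed Pool with the same API

def poolit_inner(x):
    return x * x

def poolit_outer(n):
    # A pool task that itself creates a pool: stdlib pool workers hold no
    # CPU reservation, so the nesting cannot starve itself of slots.
    with Pool(2) as pool:
        return list(pool.map(poolit_inner, range(n)))

with Pool(2) as pool:
    results = pool.map(poolit_outer, range(2, 4))

print(results)  # [[0, 1], [0, 1, 4]]
```

With a zero-CPU reservation per pool actor, Ray's `ray.util.multiprocessing.Pool` can support the same nesting even on a single-CPU cluster.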

Related issue number

Closes #21488

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@czgdp1807 czgdp1807 added the bug Something that is supposed to be working; but isn't label Feb 2, 2022
@czgdp1807 (Contributor, Author) commented:

cc: @pcmoritz @wuisawesome

@wuisawesome (Contributor) left a comment:


Can we add the repro from #21488 as a test case?

@czgdp1807 czgdp1807 requested a review from wuisawesome February 3, 2022 11:26
```python
with Pool(ray_address="auto") as pool:
    return list(pool.map(poolit_a, range(2, 4, 1)))
```

```python
ray.init(num_cpus=1)
```
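The deadlock this repro exercises can be modeled without Ray at all. A toy sketch (assumptions: the semaphore stands in for the cluster's CPU slots and `reservation` for each pool actor's `num_cpus`; none of these names come from the PR):

```python
import threading

def inner_pool_can_start(total_cpus: int, reservation: int, wait: float = 0.1) -> bool:
    """Toy model (not Ray): an outer pool worker holds `reservation` CPU
    slots while an inner pool worker, launched from inside it, tries to
    take the same number of slots from what is left."""
    cpu = threading.Semaphore(total_cpus)

    def reserve(n: int) -> bool:
        got = [cpu.acquire(timeout=wait) for _ in range(n)]
        if not all(got):
            for g in got:  # roll back partial reservations
                if g:
                    cpu.release()
            return False
        return True

    assert reserve(reservation)  # outer worker takes its slots and keeps them
    outcome = []
    t = threading.Thread(target=lambda: outcome.append(reserve(reservation)))
    t.start()
    t.join()
    return outcome[0]

# With 1 CPU and 1-CPU pool actors, the inner pool can never start: deadlock.
assert inner_pool_can_start(total_cpus=1, reservation=1) is False
# With 0-CPU pool actors (this PR's change), nesting always works.
assert inner_pool_can_start(total_cpus=1, reservation=0) is True
```

Before this change, the outer pool's actor held the single CPU while the inner pool's actor waited for it forever; with a zero-CPU reservation the inner actor can always be scheduled, matching `multiprocessing.Pool`.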
A contributor commented:


nit: can we initialize ray with a fixture if possible? Ideally we can use ray_start_shared_local_modes?

@pcmoritz pcmoritz merged commit 0f2a222 into ray-project:master Feb 8, 2022
@czgdp1807 czgdp1807 deleted the deadlock branch February 8, 2022 10:15
simonsays1980 pushed a commit to simonsays1980/ray that referenced this pull request Feb 27, 2022
Development

Successfully merging this pull request may close these issues.

[Core] [Bug] Recursive ray.util.multiprocessing.Pool deadlock

3 participants