Skip to content

Fix concurrent actor starting too many threads.#14927

Merged
raulchen merged 12 commits intoray-project:masterfrom
antgroup:fix_concurrent_call
Apr 1, 2021
Merged

Fix concurrent actor starting too many threads.#14927
raulchen merged 12 commits intoray-project:masterfrom
antgroup:fix_concurrent_call

Conversation

@raulchen
Copy link
Copy Markdown
Contributor

Why are these changes needed?

Fixes #14894

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@raulchen raulchen requested review from edoakes and kfstorm March 25, 2021 08:05
Copy link
Copy Markdown
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but see comments on tests

Comment on lines +582 to +584
RAY_LOG(INFO) << "Setting actor as async with max_concurrency=" << max_concurrency
<< ", creating new fiber thread.";
pool_.reset(new BoundedExecutor(max_concurrency));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we assert here that the pool is only ever created once?

assert r1 == r2 == r3


def test_actor_max_concurrency(ray_start_regular):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also test for the number of threads actually started? You could make a bunch of calls that return a thread ID (using https://docs.python.org/3/library/threading.html#threading.current_thread) and assert the set is the expected length. We should also do it for multiple callers (to catch the bug you identified).

self.concurrent_tasks += 1
self.max_concurrency = max(self.max_concurrency,
self.concurrent_tasks)
time.sleep(0.5)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's avoid sleeping here please. you can use the ray.test_utils.SignalActor to control when to block/unblock explicitly

@raulchen
Copy link
Copy Markdown
Contributor Author

@edoakes thanks, all comments are addressed

Copy link
Copy Markdown
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

concurrent actor starts too many threads

2 participants