[core] Ref counting for actor handles#7434
[core] Ref counting for actor handles#7434stephanie-wang merged 23 commits intoray-project:masterfrom
Conversation
|
Can one of the admins verify this patch? |
src/ray/core_worker/core_worker.cc
Outdated
| status = local_raylet_client_->SubmitTask(task_spec); | ||
| } | ||
|
|
||
| *actor_object_id = return_ids[0]; |
There was a problem hiding this comment.
Is it true that this value will always be the same as ObjectID::ForTaskReturn(TaskID::ForActorCreationTask(actor_id), 0)? I didn't see any random bits in this.
If so, maybe we can generate the id on the fly instead of storing it.
There was a problem hiding this comment.
Oh hmm yeah I didn't see that function. That should work for C++.
We'll still have to store it in Python, though, since we're relying on the ObjectID's python ref to track when the local ref goes out of scope and when the handle gets passed into a task.
|
Test FAILed. |
|
Test FAILed. |
|
Could you do the callback when the actor handle goes out of scope and treat
it as the id?
…On Tue, Mar 3, 2020, 10:01 PM Stephanie Wang ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/ray/core_worker/core_worker.cc
<#7434 (comment)>:
> }
+
+ *actor_object_id = return_ids[0];
Oh hmm yeah I didn't see that function. That should work for C++.
We'll still have to store it in Python, though, since we're relying on the
ObjectID's python ref to track when the local ref goes out of scope and
when the handle gets passed into a task.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#7434?email_source=notifications&email_token=AAADUSSFHRHCJRRYFS2ITPLRFXVCHA5CNFSM4LAXOCG2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCX3TFIQ#discussion_r387463181>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAADUSU3BMKIL63CPLRF62TRFXVCHANCNFSM4LAXOCGQ>
.
|
Yes, but that won't cover the cases where the actor handle gets passed or nested inside other objects. If you have suggestions on how to do it, I can try it, but this seemed much simpler to me. |
|
Right, I guess I was thinking you can synthesize the object id on the fly
in hooks for those cases too. Not sure how much code that adds.
…On Tue, Mar 3, 2020, 10:29 PM Stephanie Wang ***@***.***> wrote:
Could you do the callback when the actor handle goes out of scope and
treat it as the id?
Yes, but that won't cover the cases where the actor handle gets passed or
nested inside other objects. If you have suggestions on how to do it, I can
try it, but this seemed much simpler to me.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#7434?email_source=notifications&email_token=AAADUSVJYSVKPPYZE4ZA2SLRFXYM3A5CNFSM4LAXOCG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENWREVQ#issuecomment-594350678>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAADUSTKIH4MYICYDGTLHGLRFXYM3ANCNFSM4LAXOCGQ>
.
|
|
Test FAILed. |
|
Hmm let me try it and see. It might be good to separate out the logic about
actor handles more.
…On Tue, Mar 3, 2020 at 10:41 PM Eric Liang ***@***.***> wrote:
Right, I guess I was thinking you can synthesize the object id on the fly
in hooks for those cases too. Not sure how much code that adds.
On Tue, Mar 3, 2020, 10:29 PM Stephanie Wang ***@***.***>
wrote:
> Could you do the callback when the actor handle goes out of scope and
> treat it as the id?
>
> Yes, but that won't cover the cases where the actor handle gets passed or
> nested inside other objects. If you have suggestions on how to do it, I
can
> try it, but this seemed much simpler to me.
>
> —
> You are receiving this because you commented.
> Reply to this email directly, view it on GitHub
> <
#7434?email_source=notifications&email_token=AAADUSVJYSVKPPYZE4ZA2SLRFXYM3A5CNFSM4LAXOCG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENWREVQ#issuecomment-594350678
>,
> or unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/AAADUSTKIH4MYICYDGTLHGLRFXYM3ANCNFSM4LAXOCGQ
>
> .
>
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#7434?email_source=notifications&email_token=AATREBA6E23XMEV7YHKWK6LRFXZYPA5CNFSM4LAXOCG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENWR42Q#issuecomment-594353770>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AATREBGQWPSZUN5VCUH5GE3RFXZYPANCNFSM4LAXOCGQ>
.
|
edoakes
left a comment
There was a problem hiding this comment.
Looks great!
The changes to the Exit() logic look nice too - might be better to separate that into a different PR though. I think @kfstorm was going to add the same logic in #7346. Maybe you could separate the Exit() logic here into a separate PR and then both this and #7346 can rebase off of that?
LMK when you update with change that Eric suggested and I'll look again.
python/ray/actor.py
Outdated
| state = worker.core_worker.serialize_actor_handle( | ||
| self._ray_actor_id) | ||
| state = (state, self._ray_actor_creation_return_id) |
There was a problem hiding this comment.
| state = worker.core_worker.serialize_actor_handle( | |
| self._ray_actor_id) | |
| state = (state, self._ray_actor_creation_return_id) | |
| state = (worker.core_worker.serialize_actor_handle( | |
| self._ray_actor_id), self._ray_actor_creation_return_id) |
There was a problem hiding this comment.
So cool you can do multi-line suggestions now!
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Sure that's fine - it's nice to try to separate independent/standalone changes into separate PRs whenever possible but obviously not a big deal in this case given that both changes are relatively small. Having another look now. |
edoakes
left a comment
There was a problem hiding this comment.
Everything looks good except for the comment I left about overwriting the pending force kills.
Also, one thing to keep in mind is it might be nice to extend the changes here to add an API to check when an actor creation task has finished without submitting a new task. Are the currently changes compatible with that? Seems like we just need to be able to resolve the object ID associated with an actor handle in the same way that we do for serialized object IDs. This actually might have been easier with the previous iteration that stored the object ID :/
python/ray/serialization.py
Outdated
| return obj._serialization_helper(True) | ||
| serialized, actor_handle_id = obj._serialization_helper() | ||
| # Update ref counting for the actor handle | ||
| if self.is_in_band_serialization(): |
There was a problem hiding this comment.
Looks like this code block is verbatim the same as in the object_id_serializer - might be nice to separate into another method so they maintain parity?
| bool force_kill) { | ||
| absl::MutexLock lock(&mu_); | ||
| pending_force_kills_.insert(actor_id); | ||
| pending_force_kills_[actor_id] = force_kill; |
There was a problem hiding this comment.
If I'm understanding this correctly, I think it would cause issues for the following example:
a = Actor.remote()
hanging_id = a.hang_forever.remote()
ray.kill(a)
del a
ray.get(hanging_id)
In this case the user would expect hanging_id to return an error because it was cancelled by the force kill, but the code here may overwrite the pending force kill with a graceful one, leading the actor to hang.
Er I'm not sure if I totally understand, but yes, I was thinking that eventually we would like to do something similar where we "resolve" an actor handle by asking its owner about the status of the actor creation task. I was actually thinking this separation would be good so that it's clear what the differences are between resolving actor handles vs normal objects. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
Thanks, took a quick look ant it looks good to me. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Test FAILed. |
|
Note that this PR broke the streaming CI. |
Why are these changes needed?
Before, actors were kept alive as long as the creator's reference to the actor was active. This meant that if the creator exited, the actor might exit even though other handles to the actor were still active.
This PR reuses the ref counting for normal
ObjectIDs to implement ref counting for actor handles by adding anObjectIDto the PythonActorHandleobject. After this PR, an actor will be kept alive as the process that created it is still alive and there are any references to the actor's handle in scope.Because distributed reference counting (#6945) is still feature-flagged off, this will only cover cases where an actor handle is passed by the original creator. If the actor handle is passed by a task/actor that did not originally create it, then the actor may exit early.
Related issue number
Doesn't fix it yet, but one step closer to #6370
Closes #3472
Checks
scripts/format.shto lint the changes in this PR.