[PlacementGroup]Fix destroy placement group bug#13320
[PlacementGroup]Fix destroy placement group bug#13320ffbin wants to merge 4 commits intoray-project:masterfrom
Conversation
| const PlacementGroupID &placement_group_id, | ||
| StatusCallback on_placement_group_removed) { | ||
| RAY_CHECK(on_placement_group_removed); | ||
| RAY_LOG(INFO) << "Removing placement group " << placement_group_id; |
There was a problem hiding this comment.
The printing frequency of these logs is very low. In order to facilitate the problem location, it is set to INFO.
There was a problem hiding this comment.
Hmm I still feel like this log is unnecessary. Isn't it better adding a log when the placement group is actually deleted instead?
|
Hmm I am not sure if this semantic is a good idea. Think about this scenario.
In this case, isn't it creating placement groups whenever we restart the actor? So if we restart 5 times, there are 4 placement groups that are not removed and retrieveable, but we don't need them because the actor created a new one. |
rkooo567
left a comment
There was a problem hiding this comment.
Request change until the discussion is resolved.
Hi @rkooo567 , I actually encountered this problem in the production environment, two detached actors (A and B). Actor A creates PG, and then uses PG to create actor B. Actor A is killed, causing PG to be deleted and actor B to exit without restarting. So I think this is definitely a problem that needs to be solved, but the scenario you mentioned really can't be solved very well. I generally suggest users not to create PG in the constructor, because at present, the actor constructor fails and users can't perceive it. |
|
Hi @rkooo567 Pls help take a look, thx :) |
rkooo567
left a comment
There was a problem hiding this comment.
Can you update a doc to specify this behavior? (saying if the actor is restarted, the placement group created from it is not removed).
It'll be also great if you add a warning not to create a placement group in a constructor.
cc @richardliaw Will this behavior change cause any issue?
| const PlacementGroupID &placement_group_id, | ||
| StatusCallback on_placement_group_removed) { | ||
| RAY_CHECK(on_placement_group_removed); | ||
| RAY_LOG(INFO) << "Removing placement group " << placement_group_id; |
There was a problem hiding this comment.
Hmm I still feel like this log is unnecessary. Isn't it better adding a log when the placement group is actually deleted instead?
|
Hey @ffbin I thought more about it, and I think we shouldn't push this change. I think the scenario you mentioned is not a bug because we have the exact same behavior for actor handles. (imagine two detached actors. actor A creates a actor handle C and pass it to actor B. Current pg behavior is consistent with this). I think we cannot "encourage" users not to create pgs in the constructor. This is a really bad UX since many users don't see docs, and this can lead them to unexpected placement group leaks which are nearly impossible to debug in their end. I think the right solution is to keep the current behavior, but we implement this (#10508), so that users can clearly know why the actor B was killed, or we just don't fate-share actors with the placement group. But since we have diverged opinions, I'd like to hear what @ericl thinks about this. |
|
It sounds like what you actually want here is to "detach" the placement group. Should we just support |
|
That's a good idea. Serve also asked about the need for detached placement groups (for better fault tolerance). And I think that's the best solution here I believe. |
Support lifetime="detached" for placement group is a good idea, i will open a new PR for it. |
|
Can we close PR and instead merging the detached pg PR? |
|
Close this PR and instead merging the detached pg PR. |
Why are these changes needed?
At present, if the placement group creator actor restarts, the placement group will be destroyed. We should destroy placement group only when the placement group creator actor is dead.
Related issue number
Checks
scripts/format.shto lint the changes in this PR.