[Object Spilling] Multi node file spilling#13336
rkooo567 wants to merge 3 commits into ray-project:master
Conversation
@stephanie-wang Can you take a look at the logic part before I start cleaning up the code? I need to
Just skimmed the code so far, and I'm a bit surprised there are changes to the ObjectManager. Is there a way we can keep the changes more contained to the LocalObjectManager or the Python-based spill/restore code? I guess I was imagining something like this:
It seems like we don't need many (any?) changes for steps 1 and 3, and just need to implement step 2. Or maybe I'm missing something?
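As a rough illustration of the proposed flow (a minimal sketch; the class and method names here are illustrative stand-ins, not Ray's actual API), spilling records where the object went, restoring is done only by the node that spilled it, and other nodes then pull the restored copy through the normal object transfer path:

```python
# Minimal sketch of the proposed design (illustrative names, not Ray's API).
class LocalObjectManager:
    def __init__(self, node_id, external_storage):
        self.node_id = node_id
        self.storage = external_storage  # shared dict standing in for disk/S3
        self.spilled_urls = {}           # object_id -> spill URL on this node

    def spill(self, object_id, value):
        # Step 1: write the object to external storage and remember the URL.
        url = f"spill://{self.node_id}/{object_id}"
        self.storage[url] = value
        self.spilled_urls[object_id] = url
        return url

    def restore(self, object_id):
        # Step 2: only the node that spilled the object reads it back;
        # remote nodes would reach this via an RPC, then pull normally.
        return self.storage[self.spilled_urls[object_id]]
```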
    RAY_CHECK(!local_objects_.empty() || used_memory_ == 0);
    ray::Status status =
        object_directory_->ReportObjectRemoved(object_id, self_node_id_, object_info);
    if (!is_object_spilled_locally_(object_id)) {
Without this, remote nodes will never be able to find that this node has an object because the location is removed when the object is evicted.
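The guard can be sketched like this (hypothetical Python stand-ins for the C++ `ObjectDirectory`; `on_object_evicted` and the `spilled_locally` flag are illustrative names, not Ray's): the directory entry is removed on eviction only when the evicted copy was not also spilled to this node's disk, so remote nodes can still discover the spilled copy.

```python
class ObjectDirectory:
    """Toy stand-in for the object directory: object_id -> node locations."""

    def __init__(self):
        self.locations = {}  # object_id -> set of node ids holding the object

    def report_object_added(self, object_id, node_id):
        self.locations.setdefault(object_id, set()).add(node_id)

    def report_object_removed(self, object_id, node_id):
        self.locations.get(object_id, set()).discard(node_id)


def on_object_evicted(directory, object_id, node_id, spilled_locally):
    # Keep the location entry alive if the object was spilled on this node,
    # so remote pulls can still find (and trigger a restore of) the object.
    if not spilled_locally:
        directory.report_object_removed(object_id, node_id)
```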
    }
    // If the object was spilled on this node, we should try pulling it.
    // In this case, the object will be pulled from external storage on the local node.
    if (is_object_spilled_locally_(object_id)) {
This is essential because HandlePull assumes that the object already exists on the local node. In the object spilling case that assumption does not hold, so we need another pull here.
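The extra pull path described here might look roughly like this (a sketch only; the function names and signature are invented for illustration, not Ray's HandlePull): serve the object if it is already local, and otherwise restore it from external storage first when it was spilled on this node.

```python
def handle_pull(local_store, spilled_locally, restore_from_disk, object_id):
    # Sketch of the extra pull path (illustrative, not Ray's HandlePull).
    if object_id in local_store:
        # Normal case: the object is already in the local object store.
        return local_store[object_id]
    if spilled_locally(object_id):
        # Spilled case: bring the object back from external storage first.
        local_store[object_id] = restore_from_disk(object_id)
        return local_store[object_id]
    # Not here and not spilled here: fall back to the normal remote pull.
    return None
```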
I think that's one of the possible implementations. In that case, I think we should also store which node spilled the object alongside the URL (so that we can send the RPC). The current implementation is based on https://docs.google.com/document/d/1kN5D50kBkUp_xs8nTGHexxvn88zkZ3rKB22Vnc3rDMI/edit#heading=h.yy0u8n49j8ns, and it was chosen for its simplicity (and it is more intuitive imo). This implementation restores the object only when it is spilled locally. But it turned out that there were some edge cases, because our Pull/Push logic assumes the object already exists in the local object store, and handling them requires some changes in the object manager. That said, I am not against your idea, since the current implementation is more complex than expected. Thoughts @ericl?
Hmm, I agree @stephanie-wang's solution is simpler given this PR. If I understand correctly, we just need to add a new RPC and keep retrying it, right? Also, can the raylet address be encoded in the URL?
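Encoding the raylet address in the spill URL could be as simple as a query parameter (a sketch of one possible convention; the `node_id` parameter and these helper names are assumptions, not what Ray ships):

```python
from urllib.parse import urlparse, parse_qs


def make_spill_url(base_url, node_id):
    # Append the spilling raylet's id as a query parameter (illustrative).
    return f"{base_url}?node_id={node_id}"


def parse_spill_url(url):
    # Recover the storage location and the node that spilled the object.
    parsed = urlparse(url)
    node_id = parse_qs(parsed.query)["node_id"][0]
    base = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    return base, node_id
```

Any node that learns the URL can then tell which raylet to send the restore RPC to, without a separate directory field.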
@stephanie-wang I think we should also figure out how to avoid restoring the same object again and again when there are multiple requests from many nodes. This means we should keep track of which objects are being restored and return early from a restore request if one is already in progress. @ericl Yeah, that's also possible. I feel like just having one more field is probably simpler than encoding it in the URL (since we would need to pass the IP address to the IO worker). Besides, I don't think the current solution is much more complicated than the other? Both solutions require changes to the pull mechanism: Stephanie's solution adds a new path for pulling an object that was previously handled only by Pull, and debugging might be harder because the work is split across two moving components, whereas the current solution needs changes in the pulling logic to take spilled objects into account.
This seems orthogonal to multi-node file spilling. Doesn't this just mean we should deduplicate all restore requests in the LocalObjectManager, not just ones that are reloading from disk?
I think the advantage of what I suggested is that it keeps the logic related to spilling contained to the LocalObjectManager. This design seems to leak details related to spilling to the ObjectManager.
Yeah, you are right that we need it for every storage backend. But I am not sure we need dedup with the current approach, because I believe the pull is already deduped and we restore objects based on that pull (lmk if that's wrong; then we need to add dedup logic). I agree the new approach has a better abstraction barrier, and I don't think there are any more edge cases beyond this dedup issue. How about this: I will keep this PR open, try implementing the new approach, and see if it is simple enough. If so, I will make a PR with that and close this one. If there are tricky edge cases, I will probably go with the current approach.
I think either way, we should dedupe at the LocalObjectManager because the Pull itself can duplicate calls (e.g., if the timer fires again before the restoration is complete).
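A minimal sketch of that dedup inside the LocalObjectManager (hypothetical names, assuming single-threaded event-loop execution as in the raylet): track in-flight restores so a duplicate request, e.g. from the pull timer firing again, becomes a no-op until the first restore finishes.

```python
class RestoreDeduper:
    """Track in-flight restores so duplicate requests become no-ops (sketch)."""

    def __init__(self):
        self.in_progress = set()

    def try_start(self, object_id):
        # True  -> caller should kick off the restore now.
        # False -> a restore for this object is already in flight
        #          (e.g. the pull retry timer fired again).
        if object_id in self.in_progress:
            return False
        self.in_progress.add(object_id)
        return True

    def finish(self, object_id):
        # Called when the restore completes (or fails), allowing retries.
        self.in_progress.discard(object_id)
```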
Sounds good, feel free to open another PR for the other approach and assign me again.
Why are these changes needed?
NOTE: Not done yet. Waiting for Stephanie's review just to make sure we are on the same page in terms of implementation before making changes that are necessary (e.g., cleaning stuff)
Related issue number
Checks
scripts/format.sh to lint the changes in this PR.