[Object Spilling] Multi node file spilling#13336

Closed
rkooo567 wants to merge 3 commits intoray-project:masterfrom
rkooo567:multi-node-file-spilling

Conversation

@rkooo567
Contributor

@rkooo567 rkooo567 commented Jan 11, 2021

Why are these changes needed?

NOTE: Not done yet. Waiting for Stephanie's review just to make sure we are on the same page in terms of implementation before making the remaining necessary changes (e.g., cleanup).

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@rkooo567
Contributor Author

@stephanie-wang Can you take a look at the logic part before I start cleaning up the code? I still need to:

  • Remove spilled URL field from object location protobuf
  • Fix broken cpp unit tests

@rkooo567 rkooo567 changed the title [WIP] Multi node file spilling [Object Spilling] Multi node file spilling Jan 12, 2021
@stephanie-wang
Contributor

Just skimmed the code so far, and I'm a bit surprised there are changes to the ObjectManager. Is there a way we can keep the changes more contained to the LocalObjectManager or the Python-based spill/restore code? I guess I was imagining something like this:

  1. On spilled URL found, ObjectManager asks LocalObjectManager to restore.
  2. Instead of restoring directly, LocalObjectManager (or the Python IO worker) sends an RPC to the remote raylet to restore
  3. On object location found, ObjectManager pulls.

It seems like we don't need many (any?) changes for steps 1 and 3, and just need to implement step 2. Or maybe I'm missing something?
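The three steps above could be sketched roughly like this. This is a minimal illustration of the proposed flow, not Ray's actual API; all class, method, and field names here are hypothetical:

```python
# Hypothetical sketch of the proposed restore flow: the LocalObjectManager
# restores locally spilled objects itself, and forwards restores for
# remotely spilled objects to the raylet that spilled them (step 2).
# The normal pull path (step 3) then fetches the object once its
# location is updated. Names are illustrative, not Ray's real code.

class LocalObjectManager:
    def __init__(self, self_node_id, rpc_client_factory):
        self.self_node_id = self_node_id
        # rpc_client_factory: node_id -> RPC client for that raylet
        self.rpc_client_factory = rpc_client_factory

    def restore_spilled_object(self, object_id, spilled_url, spilled_node_id):
        if spilled_node_id == self.self_node_id:
            # Spilled locally: the IO worker reads from external storage.
            return self._restore_from_external_storage(object_id, spilled_url)
        # Spilled on another node: ask that raylet to restore it there.
        client = self.rpc_client_factory(spilled_node_id)
        return client.restore_spilled_object(object_id, spilled_url)

    def _restore_from_external_storage(self, object_id, url):
        # Placeholder for handing the URL to a spill IO worker.
        return ("restored-locally", object_id, url)
```

The point of the sketch is that the ObjectManager never needs to know about spilling: it only sees "restore this, then pull it," and the routing decision lives entirely in the LocalObjectManager.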

@stephanie-wang stephanie-wang added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 12, 2021
RAY_CHECK(!local_objects_.empty() || used_memory_ == 0);
ray::Status status =
    object_directory_->ReportObjectRemoved(object_id, self_node_id_, object_info);
if (!is_object_spilled_locally_(object_id)) {
Contributor Author

Without this, remote nodes would never be able to find that this node has the object, because the location is removed when the object is evicted.

}
// If the object was spilled on this node, we should try pulling it.
// In this case, the object will be pulled from external storage on the local node.
if (is_object_spilled_locally_(object_id)) {
Contributor Author


This is essential because HandlePull assumes the object already exists on the local node. In the object spilling case that assumption does not hold, so we need another pull here.

@rkooo567
Contributor Author

I think that's one of the possible implementations. In that case, I think we should also store which node spilled the object along with the URL (so that we can send the RPC).

The current implementation is based on https://docs.google.com/document/d/1kN5D50kBkUp_xs8nTGHexxvn88zkZ3rKB22Vnc3rDMI/edit#heading=h.yy0u8n49j8ns, and it was chosen due to its simplicity (and more intuitive imo).

This implementation restores the object only when it is spilled locally. But it turned out there were some edge cases, because our Pull/Push logic assumes the object already exists in the local object store, and handling them requires some changes in the object manager.

That said, I am not against your idea, since the current implementation is more complex than expected. Thoughts @ericl?

@ericl
Contributor

ericl commented Jan 12, 2021

Hmm, I agree @stephanie-wang's solution is simpler given this PR. If I understand correctly, we just need to add a new RPC and keep retrying it, right? Also, could the raylet address be encoded in the URL?
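The "encode the raylet address in the URL" idea could look something like the sketch below. The `spill://` scheme and the helper names are made up for illustration; this is not an actual Ray URL format:

```python
# Sketch of encoding the spilling raylet's address into the spilled URL,
# so no extra protobuf field is needed: the restoring node can parse the
# URL to learn which raylet to send the restore RPC to. The spill://
# scheme and these helpers are hypothetical.

def encode_spilled_url(node_ip, node_port, file_path):
    # file_path is an absolute path on the spilling node.
    return f"spill://{node_ip}:{node_port}{file_path}"

def decode_spilled_url(url):
    assert url.startswith("spill://"), "not a spilled-object URL"
    rest = url[len("spill://"):]
    addr, _, path = rest.partition("/")
    ip, _, port = addr.partition(":")
    return ip, int(port), "/" + path
```

The trade-off discussed below is exactly this: one extra field in the object location message versus parsing the address back out of the URL before handing the path to the IO worker.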

@rkooo567
Contributor Author

rkooo567 commented Jan 12, 2021

@stephanie-wang I think we should also figure out how to avoid restoring the same object again and again when there are multiple requests from many nodes. This means we should keep track of which objects are being restored and return early from the restore request if one is already in progress.

@ericl Yeah, that's also possible. I feel like just adding one more field is probably simpler than encoding it in the URL (since we would have to pass the IP address to the IO worker anyway).

Besides, I don't think the current solution is too complicated compared to the other. Both solutions require changes to the pull mechanism: Stephanie's solution adds a new path to pull the object, which was previously only done by Pull, and debugging might be harder because the flow spans two different moving components, whereas the current solution changes some pulling logic to take spilled objects into account.

@stephanie-wang
Contributor

@stephanie-wang I think we should also figure out how to avoid restoring the same object again and again when there are multiple requests from many nodes. This means we should keep track of which objects are being restored and return early from the restore request if one is already in progress.

This seems orthogonal to multi-node file spilling. Doesn't this just mean we should deduplicate all restore requests in the LocalObjectManager, not just the ones that are reloading from disk?

Besides, I don't think the current solution is too complicated compared to the other. Both solutions require changes to the pull mechanism: Stephanie's solution adds a new path to pull the object, which was previously only done by Pull, and debugging might be harder because the flow spans two different moving components, whereas the current solution changes some pulling logic to take spilled objects into account.

I think the advantage of what I suggested is that it keeps the logic related to spilling contained to the LocalObjectManager. This design seems to leak details related to spilling to the ObjectManager.

@rkooo567
Contributor Author

This seems orthogonal to multi-node file spilling. Doesn't this just mean we should deduplicate all restore requests in the LocalObjectManager, not just the ones that are reloading from disk?

Yeah, you are right that we need it for every storage backend. But I am not sure we need dedup with the current approach, because I believe the pull is already deduped and we restore objects based on that pull (let me know if that's wrong; then we need to add dedup logic).

I agree the new approach has a better abstraction barrier, and I don't think there are any more edge cases beyond this dedup stuff. How about this: I will keep this PR open, try implementing the new approach, and see if it is simple enough. If so, I will make a PR with that and close this one. If there are tricky edge cases, I will probably go with the current approach.

@stephanie-wang
Contributor

Yeah, you are right that we need it for every storage backend. But I am not sure we need dedup with the current approach, because I believe the pull is already deduped and we restore objects based on that pull (let me know if that's wrong; then we need to add dedup logic).

I think either way, we should dedupe at the LocalObjectManager, because the Pull itself can issue duplicate calls (e.g., if the timer fires again before the restoration is complete).
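The dedup being discussed could be sketched as follows: track in-flight restores so a re-fired pull timer does not kick off a second restore of the same object. The class and callback names are hypothetical, not Ray's real code:

```python
# Sketch of restore-request deduplication for a LocalObjectManager-like
# component. A restore is started at most once per object; duplicate
# requests while one is in flight are ignored, and the in-flight marker
# is cleared from the restore-complete callback. Names are illustrative.

class RestoreDeduper:
    def __init__(self, start_restore):
        # start_restore(object_id, url) kicks off an async restore,
        # e.g. by handing the URL to a spill IO worker.
        self._start_restore = start_restore
        self._in_flight = set()

    def restore(self, object_id, url):
        if object_id in self._in_flight:
            return False  # dedupe: this object is already being restored
        self._in_flight.add(object_id)
        self._start_restore(object_id, url)
        return True

    def on_restore_complete(self, object_id):
        # Called from the restore callback once the object is back in
        # the local object store; later pulls hit the store directly.
        self._in_flight.discard(object_id)
```

This is also why the dedup belongs in the LocalObjectManager rather than the pull path: the pull timer can legitimately fire multiple times, and the restore layer is the one place that knows whether a restore is already running.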

I agree the new approach has a better abstraction barrier, and I don't think there are any more edge cases beyond this dedup stuff. How about this: I will keep this PR open, try implementing the new approach, and see if it is simple enough. If so, I will make a PR with that and close this one. If there are tricky edge cases, I will probably go with the current approach.

Sounds good, feel free to open another PR for the other approach and assign me again.

@rkooo567 rkooo567 closed this Jan 22, 2021