[core][rdt] Reuse previous metadata if transferring the same tensor list with nixl #58263
dayshah merged 15 commits into ray-project:master
    nixl_agent_meta: Optional[bytes] = None

    __eq__ = object.__eq__
    __hash__ = object.__hash__
without this the equality operator wouldn't work?
i think it's needed by the hashset
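For context, here is a minimal standard-library sketch of why these two lines matter. `PlainMeta` and `IdentityMeta` are hypothetical stand-ins, not the classes in this PR. A `@dataclass` with the default `eq=True` and no explicit `__hash__` gets `__hash__` set to `None`, so its instances can't be used as dict keys or set members. Assigning `object.__eq__` and `object.__hash__` in the class body restores identity-based comparison and hashing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlainMeta:
    # Default dataclass: field-wise __eq__ is generated and __hash__ becomes None.
    nixl_agent_meta: Optional[bytes] = None


@dataclass
class IdentityMeta:
    nixl_agent_meta: Optional[bytes] = None

    # Explicit definitions are left alone by @dataclass, so instances
    # compare and hash by identity, like plain objects.
    __eq__ = object.__eq__
    __hash__ = object.__hash__


def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False


a = IdentityMeta(b"x")
b = IdentityMeta(b"x")
counts = {a: 1}  # works because IdentityMeta is hashable
```

With identity semantics, two metadata objects with equal fields are still distinct keys, which is probably what a per-object reference count wants.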
    )
    gpu_object_store._managed_meta_nixl[obj_id] = ret
    gpu_object_store._managed_meta_counts_nixl[ret] = 1
    return ret
I think the AI is right here: aren't these 2 dicts under locks?
    break

    nixl_agent.release_xfer_handle(xfer_handle)
    nixl_agent.deregister_memory(local_descs)
why do we need to deregister on recv now?
It's for the memory registered by the receiver.
    )
    if not is_same_tensors:
        raise ValueError(
            f"The duplicate object {dst_obj_id} does not have the same tensors as the source object {src_obj_id}."
This error will get raised to users, right? Might want to say something a bit clearer, like:
"Some of the tensors in this object are still in scope as part of another RDT object. Ensure that ObjectRef({src_object_id}) is out of scope before creating this object."
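A rough sketch of how the check could read with the suggested wording. The helper name `check_same_tensors`, its parameters, and the identity-based comparison are assumptions for illustration only; the PR may compute `is_same_tensors` differently.

```python
from typing import Any, Sequence


def check_same_tensors(
    src_obj_id: str,
    src_tensors: Sequence[Any],
    dst_obj_id: str,
    dst_tensors: Sequence[Any],
) -> None:
    """Raise ValueError unless both objects hold exactly the same tensors.

    Hypothetical helper: "same" here means same length and the same
    Python objects in the same order.
    """
    is_same_tensors = len(src_tensors) == len(dst_tensors) and all(
        s is d for s, d in zip(src_tensors, dst_tensors)
    )
    if not is_same_tensors:
        raise ValueError(
            f"Some of the tensors in object {dst_obj_id} are still in scope "
            "as part of another RDT object. Ensure that "
            f"ObjectRef({src_obj_id}) is out of scope before creating this object."
        )
```

Passing a strict subset of the source tensors would hit the error path, which is the case the requested test should cover.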
    @pytest.mark.parametrize("ray_start_regular", [{"num_gpus": 2}], indirect=True)
    def test_send_duplicate_tensor(ray_start_regular):
Can you add a test that checks that we throw an error when a different tensor subset is passed?
Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: Dhyey Shah <dhyey2019@gmail.com>
Signed-off-by: Dhyey Shah <dhyey2019@gmail.com>
…ist with nixl (ray-project#58263) ## Description For nixl, reuse previous metadata if transferring the same tensor list. This is to avoid repeated `register_memory` before `deregister_memory` --------- Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu>
…ist with nixl (ray-project#58263) ## Description For nixl, reuse previous metadata if transferring the same tensor list. This is to avoid repeated `register_memory` before `deregister_memory` --------- Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…ist with nixl (ray-project#58263) ## Description For nixl, reuse previous metadata if transferring the same tensor list. This is to avoid repeated `register_memory` before `deregister_memory` --------- Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: Future-Outlier <eric901201@gmail.com>
…same tensor list with nixl (ray-project#58309) Cherry-picking ray-project#58263 for 2.51.1 release. Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Qiaolin Yu <liin1211@outlook.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu>
…ist with nixl (ray-project#58263) ## Description For nixl, reuse previous metadata if transferring the same tensor list. This is to avoid repeated `register_memory` before `deregister_memory` --------- Signed-off-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Dhyey Shah <dhyey2019@gmail.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Signed-off-by: peterxcli <peterxcli@gmail.com>
Description
For nixl, reuse previous metadata if transferring the same tensor list. This is to avoid repeated `register_memory` before `deregister_memory`.
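The idea in the description can be sketched as a small reference-counted cache. Everything here is hypothetical: `FakeAgent` only counts calls and borrows the method names `register_memory` and `deregister_memory` (it does not reproduce the real NIXL signatures), and `MetaCache` is not the data structure used in the PR. The sketch assumes callers keep the tensors alive between `acquire` and `release`, since the key is built from `id()`.

```python
import threading
from typing import Any, Dict, List, Sequence, Tuple


class FakeAgent:
    """Stand-in for a NIXL agent; it only counts calls."""

    def __init__(self) -> None:
        self.registered = 0
        self.deregistered = 0

    def register_memory(self, tensors: Sequence[Any]) -> Tuple[int, ...]:
        self.registered += 1
        return tuple(id(t) for t in tensors)  # placeholder "descriptors"

    def deregister_memory(self, descs: Tuple[int, ...]) -> None:
        self.deregistered += 1


class MetaCache:
    """Register a tensor list once and reuse the result until all users release it."""

    def __init__(self, agent: FakeAgent) -> None:
        self._agent = agent
        self._lock = threading.Lock()  # guards _entries
        # key -> [descriptors, reference count]
        self._entries: Dict[Tuple[int, ...], List[Any]] = {}

    def acquire(self, tensors: Sequence[Any]) -> Tuple[int, ...]:
        key = tuple(id(t) for t in tensors)
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                # First transfer of this tensor list: register it.
                entry = [self._agent.register_memory(tensors), 0]
                self._entries[key] = entry
            entry[1] += 1
            return entry[0]

    def release(self, tensors: Sequence[Any]) -> None:
        key = tuple(id(t) for t in tensors)
        with self._lock:
            entry = self._entries[key]
            entry[1] -= 1
            if entry[1] == 0:
                # Last user is gone: deregister and forget the entry.
                self._agent.deregister_memory(entry[0])
                del self._entries[key]
```

Acquiring the same list twice triggers one registration, and deregistration happens only after the second release.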