On top of
- #3909
Prerequisite to:
- #3911
# What
- Set up the infrastructure needed for IPC handle exchange and caching.
- Add an `Expr` node `hir::ShareMemHandles` to represent this op. We
cannot embed the op in the Send/Recv semantics because we need to group
the handle exchange between matching sends and recvs to avoid deadlocks.
# How
Most of the implementation is in `multidevice/ipc_handle.cpp`:
- Define the class `IpcHandle`, representing the IPC handle that is
exchanged. Each handle is paired with a semaphore, which is a local
CUDA buffer allocated on the exporter's device.
- Define `IpcHandleCache`, which handles exchanging and caching the IPC
handles. The cache is keyed on a combination of runtime and
symbolic ingredients: `(runtime peer, at::Tensor, Expr*)`. This keying
allows an arbitrary number of p2p comms between pairs of ranks.
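
For reviewers, here is a minimal sketch of the caching idea only, not the code in this PR. It assumes a plain `std::map` keyed on `(peer, buffer address, expr address)`. `FakeExpr`, `FakeHandle`, and `HandleCacheSketch` are hypothetical stand-ins for `Expr`, `IpcHandle`, and `IpcHandleCache`; a raw buffer pointer stands in for the `at::Tensor`, and the `exchange` callback models the actual handle exchange.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>

// Hypothetical stand-in for a symbolic Expr node (e.g. one Send or Recv).
struct FakeExpr {};

// Hypothetical stand-in for an exchanged IPC handle.
struct FakeHandle {
  int peer;
  const void* buffer;
};

class HandleCacheSketch {
 public:
  // Key mirrors (runtime peer, tensor, Expr*); addresses are stored as
  // integers so that ordering inside the map is well defined.
  using Key = std::tuple<int, std::uintptr_t, std::uintptr_t>;

  // Returns the cached handle for (peer, buffer, expr).
  // `exchange` is invoked only on a cache miss.
  const FakeHandle& get(int peer, const void* buffer, const FakeExpr* expr,
                        const std::function<FakeHandle()>& exchange) {
    assert(peer >= 0);
    Key key{peer, reinterpret_cast<std::uintptr_t>(buffer),
            reinterpret_cast<std::uintptr_t>(expr)};
    auto it = cache_.find(key);
    if (it == cache_.end()) {
      it = cache_.emplace(key, exchange()).first;
    }
    return it->second;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  std::map<Key, FakeHandle> cache_;
};
```

Because the `Expr*` is part of the key, two distinct sends to the same peer on the same buffer get separate entries, which is what lets several p2p comms coexist between one pair of ranks in this sketch.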
This PR is a small self-contained part belonging to the larger PR