Skip to content

[CudaIpc 1/3]P2PCommunication: add backend type#3909

Merged
samnordmann merged 3 commits intomainfrom
add_backend_type_to_p2p_comm
Feb 21, 2025
Merged

[CudaIpc 1/3]P2PCommunication: add backend type#3909
samnordmann merged 3 commits intomainfrom
add_backend_type_to_p2p_comm

Conversation

@samnordmann
Copy link
Collaborator

@samnordmann samnordmann commented Feb 17, 2025

This PR is a small self-contained part belonging to the larger PR

What

  • Add the backend type as an argument to P2PCommunication*

@github-actions
Copy link

github-actions bot commented Feb 17, 2025

Review updated until commit afc6d11

Description

  • Add backend type to P2PCommunication constructor

  • Update getWorld call to include backend type

  • Modify toString and toInlineString methods

  • Set default backend to kNccl in P2PCommunication


Changes walkthrough 📝

Relevant files
Enhancement
executor.cpp
Update getWorld call                                                                         

csrc/host_ir/executor.cpp

  • Update getWorld call to include backend type
+1/-1     
communication.cpp
Modify P2PCommunication constructor and string methods     

csrc/multidevice/communication.cpp

  • Add backend parameter to P2PCommunication constructor
  • Rename toString to toInlineString and modify it
  • Update toString to call toInlineString
  • +8/-5     
    communication.h
    Add backend parameter and method                                                 

    csrc/multidevice/communication.h

  • Add backend parameter to P2PCommunication constructor with default
    value
  • Add backend() method to retrieve backend attribute
  • +6/-1     

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Backend Usage

    Ensure that the communicator_->getWorld(communication->backend()) call is correctly implemented and that getWorld supports the backend parameter.

    communicator_->getWorld(communication->backend()),
    String Representation

    Verify that the toInlineString and toString methods correctly handle the new backend attribute and that the output format is consistent with the rest of the codebase.

    std::string P2PCommunication::toInlineString(const int indent_size) const {
      std::stringstream ss;
      indent(ss, indent_size) << "P2PCommunication " << name() << " ("
                              << "type=" << type() << ", "
                              << "buffer=" << buffer() << ", "
                              << "peer=" << peer() << ", "
                              << "backend=" << backend() << ")";
      return ss.str();
    }
    
    std::string P2PCommunication::toString(int indent_size) const {
      return toInlineString(indent_size) + "\n";
    }
    Default Backend

    Confirm that the default backend (CommunicatorBackend::kNccl) is appropriate and that it does not introduce any unintended behavior.

    CommunicatorBackend backend = CommunicatorBackend::kNccl);

    @samnordmann
    Copy link
    Collaborator Author

    !test

    @samnordmann samnordmann changed the title [CudaIpc 1/3]P2PCommunication: add backend, src and dst [CudaIpc 1/3]P2PCommunication: add backend type Feb 17, 2025
    @samnordmann
    Copy link
    Collaborator Author

    !test

    @samnordmann samnordmann merged commit 644eaec into main Feb 21, 2025
    49 of 54 checks passed
    @samnordmann samnordmann deleted the add_backend_type_to_p2p_comm branch February 21, 2025 13:04
    samnordmann added a commit that referenced this pull request Mar 12, 2025
    On top of:
    - #3910
    - #3909
    - #3908
    
    Pending on issue:
    - #3907
    samnordmann added a commit that referenced this pull request Apr 14, 2025
    On top of 
    - #3909
    
    prerequesite to:
    - #3911
    
    # What
    - Set up the infrastructure needed for ipc handle exchange and caching
    - Add an `Expr` node `hir::ShareMemHandles` to represent this op. We
    cannot embed the op in the Send/Recv semantics because we need to group
    the handle exchange between matching sends and recv to avoid deadlocks
    
    # How
    Most of the implementation is in `multidevice/ipc_handle.cpp`
    - Define the class `IpcHandle` representing the ipc handle that is
    exchanged. This class is supplemented with a semaphore, which is a local
    cuda buffer allocated on the exporter's device.
    - Define `IpcHandleCache` which handles exchanging and caching the ipc
    handles. Caching is made on with respect to a combination of runtime and
    symbolic ingredients: `(runtime peer, at::Tensor, Expr*)`. This caching
    allows to have arbitrary number of p2p comms between pairs of ranks.
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants