
[Feature] Support Pipeline Parallelism (PP) in Piecewise CUDA Graph #14515

@baonudesifeizhai

Description



Motivation

Pipeline Parallelism (PP) is currently disabled in Piecewise CUDA Graph, limiting scalability for large models. Regular CUDA Graph already supports PP, so we should implement the same support for piecewise CUDA graph.

Tasks

  1. Remove PP check in can_run_piecewise_cuda_graph()
  2. Add pp_proxy_tensors support in replay_prepare() and replay()
  3. Handle PPProxyTensors output (follow CudaGraphRunner pattern)

Related: [Feature] Roadmap for Prefill (Piecewise) CUDA Graph #11490
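The tasks can be illustrated with a toy, pure-Python model of the replay pattern. A captured graph can only read from and write to fixed buffers. So before replay, incoming PP proxy tensors are copied into the prefix of static buffers sized at capture time. After replay, the padded outputs are trimmed to the real token count. Non-last ranks return proxy tensors and the last rank returns logits. This is a minimal sketch under stated assumptions, not SGLang's implementation. All names here (ProxyTensors, Logits, PiecewiseGraphRunnerSketch, graph_fn) are hypothetical. Python lists stand in for GPU tensors, and a plain function stands in for the captured graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Buffers = Dict[str, List[float]]


@dataclass
class ProxyTensors:
    # Hypothetical stand-in for PP proxy tensors (e.g. hidden_states, residual)
    tensors: Buffers


@dataclass
class Logits:
    # Hypothetical stand-in for the last rank's logits output
    values: List[float]


class PiecewiseGraphRunnerSketch:
    """Toy model: static input buffers, a 'graph' that only touches them,
    and a replay that slices padded outputs back to the real size."""

    def __init__(self, max_tokens: int, keys: List[str], is_last_rank: bool,
                 graph_fn: Callable[[Buffers], Buffers]):
        self.max_tokens = max_tokens
        self.is_last_rank = is_last_rank
        # Static buffers allocated once at "capture" time, sized to max_tokens
        self.static_proxy: Buffers = {k: [0.0] * max_tokens for k in keys}
        self.graph_fn = graph_fn
        self.num_tokens = 0

    def replay_prepare(self, num_tokens: int,
                       pp_proxy_tensors: Optional[ProxyTensors]) -> None:
        if num_tokens > self.max_tokens:
            # A real runner would fall back to eager execution here
            raise ValueError("batch exceeds captured size")
        self.num_tokens = num_tokens
        if pp_proxy_tensors is not None:
            # Copy stage inputs into the prefix of the fixed buffers in place;
            # entries past num_tokens keep stale values and are ignored later
            for key, buf in self.static_proxy.items():
                src = pp_proxy_tensors.tensors[key][:num_tokens]
                buf[: len(src)] = src

    def replay(self, num_tokens: int,
               pp_proxy_tensors: Optional[ProxyTensors] = None):
        self.replay_prepare(num_tokens, pp_proxy_tensors)
        out = self.graph_fn(self.static_proxy)  # outputs padded to max_tokens
        if self.is_last_rank:
            return Logits(out["logits"][: self.num_tokens])
        # Non-last ranks forward proxy tensors trimmed to the real token count
        return ProxyTensors({k: v[: self.num_tokens] for k, v in out.items()})
```

In this model, removing the PP check (task 1) only means that a non-None pp_proxy_tensors is allowed to reach replay_prepare() instead of forcing a fallback. The in-place prefix copy matters because replacing a buffer object would break the fixed addresses the captured graph depends on.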

