Motivation
Pipeline parallelism (PP) is currently disabled for piecewise CUDA graphs, which limits scalability for large models. Regular CUDA graphs already support PP, so piecewise CUDA graphs should get the same support.
Tasks
Remove the PP check in can_run_piecewise_cuda_graph()
Add pp_proxy_tensors support in replay_prepare() and replay()
Handle PPProxyTensors output (follow the CudaGraphRunner pattern)
Related: [Feature] Roadmap for Prefill (Piecewise) CUDA Graph #11490
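The replay tasks follow a common pattern for captured graphs. A captured graph reads from fixed buffers, so incoming PP activations must be copied into those buffers before replay. On every rank except the last, the outputs are then trimmed back to the real token count and handed on as proxy tensors. The following is a minimal sketch of that pattern only, under stated assumptions. ProxyTensors, ToyPiecewiseRunner and _graph are hypothetical names, not SGLang's API. Plain Python lists stand in for GPU tensors, and a toy function stands in for graph replay.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class ProxyTensors:
    """Hypothetical stand-in for a PP proxy-tensor bundle: named activations
    handed from one pipeline stage to the next."""
    tensors: Dict[str, List[float]]


class ToyPiecewiseRunner:
    """Hypothetical runner for a non-first pipeline stage. Plain lists stand
    in for GPU tensors; _graph() stands in for replaying captured graphs."""

    def __init__(self, max_tokens: int, is_last_rank: bool):
        self.max_tokens = max_tokens
        self.is_last_rank = is_last_rank
        # Static buffers are allocated once, at capture time. A captured graph
        # reads fixed memory, so new inputs must be copied into these buffers.
        self.static_proxy: Dict[str, List[float]] = {
            "hidden_states": [0.0] * max_tokens,
            "residual": [0.0] * max_tokens,
        }

    def replay_prepare(self, num_tokens: int, pp_proxy: Optional[ProxyTensors]) -> None:
        # A non-first stage cannot run without activations from upstream.
        assert pp_proxy is not None, "pp_proxy is required on non-first PP ranks"
        for name, values in pp_proxy.tensors.items():
            # Slice assignment copies the real tokens in place and keeps the
            # buffer object and its length unchanged (with torch tensors this
            # would be buf[:n].copy_(src[:n])).
            self.static_proxy[name][:num_tokens] = values[:num_tokens]

    def _graph(self) -> Dict[str, List[float]]:
        # Toy "layer": residual += hidden; hidden = 2 * residual.
        # Runs over the full padded buffer, as a captured graph would.
        hs = self.static_proxy["hidden_states"]
        res = self.static_proxy["residual"]
        new_res = [h + r for h, r in zip(hs, res)]
        return {"hidden_states": [2.0 * x for x in new_res], "residual": new_res}

    def replay(
        self, num_tokens: int, pp_proxy: Optional[ProxyTensors] = None
    ) -> Union[List[float], ProxyTensors]:
        self.replay_prepare(num_tokens, pp_proxy)
        out = self._graph()
        if self.is_last_rank:
            # Last stage: return final hidden states (stand-in for logits).
            return out["hidden_states"][:num_tokens]
        # Intermediate stage: wrap outputs as proxy tensors for the next rank,
        # trimming the padded tail back to the real token count.
        return ProxyTensors({k: v[:num_tokens] for k, v in out.items()})


runner = ToyPiecewiseRunner(max_tokens=4, is_last_rank=False)
out = runner.replay(2, ProxyTensors({"hidden_states": [1.0, 2.0], "residual": [0.5, 0.5]}))
print(out.tensors)  # {'hidden_states': [3.0, 5.0], 'residual': [1.5, 2.5]}
```

Positions past num_tokens in the static buffers may hold stale values from earlier replays. Slicing the outputs keeps that padding from leaking into the next stage.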
Related resources
No response