[RFC] Cuda Graph Runner Backend Refactor #23004

@Oasis-Git

Background

We currently have three cuda graph implementations: full cuda graph, torch-compile-based piecewise cuda graph (TCPiecewiseCudaGraph), and breakable cuda graph. Their runners duplicate a lot of code. We propose refactoring them into one unified structure so that the cuda graph code is easier for users to follow.

Goal

Support flexible cuda graph backends across (decode, prefill) x (full, breakable, torch-compile-based pcg). Enable breakable cuda graph for prefill by default.

Proposed Design

Part 1: Cuda Graph Implementation

1. Runner

  • PrefillCudaGraphRunner: manages cuda graph execution for the prefill phase
  • DecodeCudaGraphRunner: manages cuda graph execution for the decode phase

Both inherit from BaseCudaGraphRunner.

2. Backend

  • FullCudaGraphBackend: captures the full model in a single cuda graph
  • BreakableCudaGraphBackend: supports breaking out of the cuda graph for dynamic ops
  • TCPiecewiseCudaGraphBackend: torch-compile-based piecewise cuda graph

Combination: each runner owns a pluggable BaseCudaGraphBackend, giving a clean cross product of (prefill, decode) x (full, breakable, tcpcg).
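
The runner/backend split above can be sketched as follows. This is only a minimal illustration of the composition, not the implementation from the linked PRs: the class names come from this RFC, while the `capture`/`replay` methods, the registry dicts, `build_runner`, and the string-returning stand-in for a forward pass are all hypothetical. No real CUDA capture happens here.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable

ForwardFn = Callable[[int], str]  # hypothetical stand-in for a model forward pass
ReplayFn = Callable[[], str]


class BaseCudaGraphBackend(ABC):
    """Decides *how* a forward function is captured (hypothetical interface)."""

    @abstractmethod
    def capture(self, forward_fn: ForwardFn, batch_size: int) -> ReplayFn:
        """Return a callable that replays forward_fn for this batch size."""


class FullCudaGraphBackend(BaseCudaGraphBackend):
    def capture(self, forward_fn: ForwardFn, batch_size: int) -> ReplayFn:
        # Stand-in: a real backend would record the whole model into one graph.
        return lambda: forward_fn(batch_size)


class BreakableCudaGraphBackend(BaseCudaGraphBackend):
    def capture(self, forward_fn: ForwardFn, batch_size: int) -> ReplayFn:
        # Stand-in: a real backend would break out of the graph for dynamic ops.
        return lambda: forward_fn(batch_size)


class TCPiecewiseCudaGraphBackend(BaseCudaGraphBackend):
    def capture(self, forward_fn: ForwardFn, batch_size: int) -> ReplayFn:
        # Stand-in: a real backend would capture torch-compiled pieces.
        return lambda: forward_fn(batch_size)


class BaseCudaGraphRunner:
    """Decides *when* graphs are captured and replayed; owns one backend."""

    phase = "base"

    def __init__(self, backend: BaseCudaGraphBackend, batch_sizes: Iterable[int]):
        self.backend = backend
        self.batch_sizes = sorted(set(batch_sizes))
        self._graphs: Dict[int, ReplayFn] = {}

    def capture(self, forward_fn: ForwardFn) -> None:
        # One captured graph per configured batch size.
        for bs in self.batch_sizes:
            self._graphs[bs] = self.backend.capture(forward_fn, bs)

    def replay(self, batch_size: int) -> str:
        return self._graphs[batch_size]()


class PrefillCudaGraphRunner(BaseCudaGraphRunner):
    phase = "prefill"


class DecodeCudaGraphRunner(BaseCudaGraphRunner):
    phase = "decode"


# Hypothetical registries; the keys mirror the names used in this RFC.
RUNNERS = {"prefill": PrefillCudaGraphRunner, "decode": DecodeCudaGraphRunner}
BACKENDS = {
    "full": FullCudaGraphBackend,
    "breakable": BreakableCudaGraphBackend,
    "tcpcg": TCPiecewiseCudaGraphBackend,
}


def build_runner(phase: str, backend: str, batch_sizes: Iterable[int]) -> BaseCudaGraphRunner:
    """Build any cell of the (prefill, decode) x (full, breakable, tcpcg) grid."""
    return RUNNERS[phase](BACKENDS[backend](), batch_sizes)
```

With this shape, adding a new backend or phase means adding one class and one registry entry, and every runner/backend combination is available without a dedicated subclass per pair.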

Part 2: Arguments

Two types of arguments are supported:

Config-based: full control via a JSON config per phase (quoted so the shell passes it as a single argument), e.g.
--cuda-graph-mode '{"decode": "full", "prefill": "breakable"}'

Convenience flags: shorthand arguments that translate to the corresponding config, e.g.

  • --prefill-disable-cuda-graph → {"prefill": "disabled"}
  • --decode-disable-cuda-graph → {"decode": "disabled"}
  • --prefill-cuda-graph-bs → sets batch sizes for prefill
  • --decode-cuda-graph-bs → sets batch sizes for decode
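
One way the convenience flags could be folded into the per-phase config is sketched below with `argparse`. The flag names and the `"full"`, `"breakable"`, and `"disabled"` values come from this RFC. Everything else is an assumption for illustration: the `"tcpcg"` config value, the rule that a disable flag overrides the JSON config, the use of the planned default mode as the fallback, and the `resolve_cuda_graph_config` helper itself.

```python
import argparse
import json
from typing import Dict, List, Optional

# Assumed value set; "tcpcg" as a config string is a guess, not confirmed by the RFC.
VALID_MODES = {"full", "breakable", "tcpcg", "disabled"}
# Default mode proposed in the plan of this RFC.
DEFAULT_MODE = {"decode": "full", "prefill": "breakable"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Config-based argument: a JSON object keyed by phase.
    parser.add_argument("--cuda-graph-mode", type=json.loads, default=None)
    # Convenience flags that translate into the config above.
    parser.add_argument("--prefill-disable-cuda-graph", action="store_true")
    parser.add_argument("--decode-disable-cuda-graph", action="store_true")
    parser.add_argument("--prefill-cuda-graph-bs", type=int, nargs="+", default=None)
    parser.add_argument("--decode-cuda-graph-bs", type=int, nargs="+", default=None)
    return parser


def resolve_cuda_graph_config(argv: List[str]) -> Dict[str, dict]:
    """Merge the JSON config and the shorthand flags into one structure."""
    args = build_parser().parse_args(argv)

    mode: Dict[str, str] = dict(DEFAULT_MODE)
    if args.cuda_graph_mode is not None:
        if not isinstance(args.cuda_graph_mode, dict):
            raise ValueError("--cuda-graph-mode must be a JSON object")
        mode.update(args.cuda_graph_mode)

    # Assumed precedence: an explicit disable flag wins over the JSON config.
    if args.prefill_disable_cuda_graph:
        mode["prefill"] = "disabled"
    if args.decode_disable_cuda_graph:
        mode["decode"] = "disabled"

    for phase, value in mode.items():
        if phase not in DEFAULT_MODE:
            raise ValueError(f"unknown phase: {phase!r}")
        if value not in VALID_MODES:
            raise ValueError(f"unknown cuda graph mode for {phase}: {value!r}")

    batch_sizes: Dict[str, Optional[List[int]]] = {
        "prefill": args.prefill_cuda_graph_bs,
        "decode": args.decode_cuda_graph_bs,
    }
    return {"mode": mode, "batch_sizes": batch_sizes}
```

For example, passing `--prefill-disable-cuda-graph --decode-cuda-graph-bs 1 2 4` would resolve to decode running in `full` mode with batch sizes 1, 2, and 4, and prefill `disabled`. A batch-size entry of `None` means the flag was not given, leaving the choice to whatever default the runner uses.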

Plan

  1. Breakable cuda graph: [Experimental] Breakable Piecewise Cuda Graph #22218
  2. Cuda graph runner/backend refactor: [Refactor] Cuda Graph Runner/Backend Refactor #23906
  3. Enable the default mode ({"decode": "full", "prefill": "breakable"})
