Background
Currently we have three cuda graph implementations: full cuda graph, torch-compile-based piecewise cuda graph (TCPiecewiseCudaGraph), and breakable cuda graph. There is a lot of duplicated code across their runners. We want to refactor and unify them so that the cuda graph code structure is easier for users to understand.
Goal
Support flexible cuda graph backends across the full (decode, prefill) x (full, breakable, torch-compile-based pcg) matrix, and enable the breakable cuda graph for prefill by default.
Proposed Design
Part 1: Cuda Graph Implementation
1. Runner
PrefillCudaGraphRunner: manages cuda graph execution for the prefill phase
DecodeCudaGraphRunner: manages cuda graph execution for the decode phase
Both inherit from BaseCudaGraphRunner.
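The runner split could be sketched as below. This is a minimal illustration of the proposed inheritance, not the actual implementation; the method names (capture, replay, run_eager) and the backend interface are assumptions for the sake of the example.

```python
from abc import ABC, abstractmethod

class BaseCudaGraphRunner(ABC):
    """Shared capture/replay bookkeeping for both phases (illustrative)."""

    def __init__(self, backend):
        self.backend = backend  # pluggable BaseCudaGraphBackend (hypothetical)
        self.graphs = {}        # padded batch size -> captured graph handle

    @abstractmethod
    def capture(self, batch_size):
        """Capture a graph for one padded batch size."""

    def replay(self, batch_size, inputs):
        # Fall back to eager execution when no graph was captured
        # for this batch size.
        graph = self.graphs.get(batch_size)
        if graph is None:
            return self.backend.run_eager(inputs)
        return self.backend.replay(graph, inputs)

class PrefillCudaGraphRunner(BaseCudaGraphRunner):
    def capture(self, batch_size):
        self.graphs[batch_size] = self.backend.capture_prefill(batch_size)

class DecodeCudaGraphRunner(BaseCudaGraphRunner):
    def capture(self, batch_size):
        self.graphs[batch_size] = self.backend.capture_decode(batch_size)
```

The point of the shared base is that padding, graph lookup, and the eager fallback live in one place, while each phase only specializes how capture is driven.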
2. Backend
FullCudaGraphBackend: captures the full model in a single cuda graph
BreakableCudaGraphBackend: supports breaking out of the cuda graph for dynamic ops
TCPiecewiseCudaGraphBackend: torch-compile-based piecewise cuda graph
Combination: each runner owns a pluggable BaseCudaGraphBackend, giving a clean cross product of (prefill, decode) x (full, breakable, tcpcg).
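The cross product could be wired up through a small lookup, as in the sketch below. All class and function names here are placeholders standing in for the real runners and backends; only the (phase, mode) composition idea comes from the design above.

```python
# Placeholder classes; the real backends/runners carry capture and replay logic.
class FullCudaGraphBackend: ...
class BreakableCudaGraphBackend: ...
class TCPiecewiseCudaGraphBackend: ...

class PrefillCudaGraphRunner:
    def __init__(self, backend):
        self.backend = backend

class DecodeCudaGraphRunner:
    def __init__(self, backend):
        self.backend = backend

# The (prefill, decode) x (full, breakable, tcpcg) cross product
# is resolved through two lookup tables instead of 6 bespoke classes.
BACKENDS = {
    "full": FullCudaGraphBackend,
    "breakable": BreakableCudaGraphBackend,
    "tcpcg": TCPiecewiseCudaGraphBackend,
}
RUNNERS = {"prefill": PrefillCudaGraphRunner, "decode": DecodeCudaGraphRunner}

def make_runner(phase: str, mode: str):
    """Build the runner for `phase` with the backend selected by `mode`."""
    return RUNNERS[phase](BACKENDS[mode]())
```

Because the backend is injected rather than inherited, adding a fourth backend only touches the BACKENDS table, not the runner classes.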
Part 2: Arguments
Two types of arguments are supported:
Config-based: full control via a JSON config per phase, e.g. --cuda-graph-mode {"decode": "full", "prefill": "breakable"}
Convenience flags: shorthand arguments that translate to the corresponding config, e.g.
--prefill-disable-cuda-graph → {"prefill": "disabled"}
--decode-disable-cuda-graph → {"decode": "disabled"}
--prefill-cuda-graph-bs → sets batch sizes for prefill
--decode-cuda-graph-bs → sets batch sizes for decode
Plan
Enable {"decode": "full", "prefill": "breakable"} by default.
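A sketch of how the JSON config and the convenience flags described in Part 2 could be merged is shown below. The function name and flag-merging semantics are assumptions for illustration; only the default of {"decode": "full", "prefill": "breakable"} and the flag-to-config translations come from the design above.

```python
import json

# Default from the plan: full cuda graph for decode,
# breakable cuda graph for prefill.
DEFAULT_MODE = {"decode": "full", "prefill": "breakable"}

def resolve_cuda_graph_mode(cuda_graph_mode=None,
                            prefill_disable=False,
                            decode_disable=False):
    """Merge the per-phase JSON config with convenience flags (hypothetical)."""
    mode = dict(DEFAULT_MODE)
    if cuda_graph_mode:
        # --cuda-graph-mode takes a JSON object keyed by phase.
        mode.update(json.loads(cuda_graph_mode))
    # Convenience flags translate to the corresponding config entries.
    if prefill_disable:
        mode["prefill"] = "disabled"
    if decode_disable:
        mode["decode"] = "disabled"
    return mode
```

For example, passing only '{"decode": "breakable"}' would keep the prefill default, while --prefill-disable-cuda-graph would override whatever the config says for prefill.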