
Introduce CUDA graph debug mode with breakable CUDA graph #19102

Merged
ch-wan merged 8 commits into main from shiyang/breakable_cg on Apr 11, 2026

@cctry
Collaborator

cctry commented Feb 21, 2026

Introduce Breakable CUDA Graph: a lightweight mechanism for inserting graph breaks into CUDA graph capture. Marked operations run eagerly between captured graph segments, while everything else stays graph-captured.

CUDA graph debug mode

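```shell
# Debug mode: all ops run eagerly through the graph capture/replay path
python -m sglang.launch_server --model meta-llama/Llama-3-8B --debug-cuda-graph
```

```python
# Selective graph breaks in model code
# (a later commit in this PR renames non_graph to eager_on_graph)
from sglang.srt.model_executor.breakable_cuda_graph.breakable_cuda_graph import non_graph

@non_graph(enable=True)
def my_dynamic_op(x):
    # some_incompatible_op is a placeholder for an op that cannot be graph-captured
    return some_incompatible_op(x)
```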

Breakable CUDA graph

When SGLANG_USE_BREAKABLE_CUDA_GRAPH is enabled, the decode CUDA graph becomes breakable. The overhead is minimal when no graph breaks are inserted.
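The PR does not say which values of `SGLANG_USE_BREAKABLE_CUDA_GRAPH` count as "enabled". The hypothetical helper below assumes the common convention of treating `1`, `true`, `yes`, and `on` (case-insensitive) as on and everything else, including an unset variable, as off. sglang's own environment-variable handling may differ.

```python
import os
from typing import Mapping, Optional


def use_breakable_cuda_graph(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical check for SGLANG_USE_BREAKABLE_CUDA_GRAPH (assumed truthy spellings)."""
    env = os.environ if environ is None else environ
    value = env.get("SGLANG_USE_BREAKABLE_CUDA_GRAPH", "")
    return value.strip().lower() in ("1", "true", "yes", "on")


print(use_breakable_cuda_graph({"SGLANG_USE_BREAKABLE_CUDA_GRAPH": "1"}))  # True
print(use_breakable_cuda_graph({}))                                         # False
```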

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@cctry
Collaborator Author

cctry commented Feb 21, 2026

/tag-and-rerun-ci

Collaborator

ch-wan left a comment

Left some minor comments. Also, can we apply --debug-cuda-graph to piecewise CUDA graph as well?

Comment thread python/sglang/srt/model_executor/breakable_cuda_graph/breakable_cuda_graph.py Outdated
Comment thread python/sglang/srt/model_executor/breakable_cuda_graph/cuda_utils.py Outdated
Comment thread python/sglang/srt/model_executor/breakable_cuda_graph/breakable_cuda_graph.py Outdated
Comment thread python/sglang/srt/server_args.py
Comment thread python/sglang/srt/model_executor/breakable_cuda_graph/breakable_cuda_graph.py Outdated
Comment thread python/sglang/srt/model_executor/breakable_cuda_graph/breakable_cuda_graph.py Outdated
@BBuf
Collaborator

BBuf commented Mar 20, 2026

Can #20910 solve the debug issue you encountered?

ch-wan and others added 4 commits April 9, 2026 00:50
- Fix shared mutable ContextVar default ([] -> None) to prevent cross-context leaks
- Fix structured output writeback with _copy_output for dataclasses/dicts/tensors
- Fix replay no-break path using destroyed graph handle (last_graph -> last_graph_exec)
- Add thread-safe wait_stream hook with lock + refcount
- Add graph exec cleanup in __del__ to prevent GPU resource leaks
- Add HIP/ROCm guards in server_args and cuda_graph_runner
- Add clear error messages for missing cuda-python and incompatible modes
- Rename non_graph -> eager_on_graph, BreakableCUDAGraphContext -> BreakableCUDAGraphCapture
- Add __init__.py for breakable_cuda_graph package
- Add unit tests (11 tests covering capture/replay, breaks, _copy_output, break_graph)
- Add documentation (docs/advanced_features/breakable_cuda_graph.md)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
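The commit list mentions a thread-safe `wait_stream` hook built on a lock plus a refcount, but does not say what the hook does. The sketch below shows only that generic pattern, applied to a fake class: the first user installs a monkey-patch, nested or concurrent users share it, and the last user restores the original. `RefCountedPatch`, `FakeStream`, and `make_hook` are hypothetical; patching a real CUDA stream method is not shown.

```python
import threading
from typing import Any, Callable, Optional


class RefCountedPatch:
    """Shared monkey-patch: installed by the first user, restored by the last.

    The lock serializes install/restore so concurrent entries cannot wrap the
    attribute twice or restore it while another user still relies on it.
    """

    def __init__(self, owner: Any, attr: str,
                 make_replacement: Callable[[Callable], Callable]) -> None:
        self._owner = owner
        self._attr = attr
        self._make_replacement = make_replacement
        self._lock = threading.Lock()
        self._refcount = 0
        self._original: Optional[Callable] = None

    def __enter__(self) -> "RefCountedPatch":
        with self._lock:
            if self._refcount == 0:  # first user installs the hook
                self._original = getattr(self._owner, self._attr)
                setattr(self._owner, self._attr,
                        self._make_replacement(self._original))
            self._refcount += 1
        return self

    def __exit__(self, *exc_info: Any) -> bool:
        with self._lock:
            self._refcount -= 1
            if self._refcount == 0:  # last user restores the original
                setattr(self._owner, self._attr, self._original)
                self._original = None
        return False  # never swallow exceptions


class FakeStream:  # hypothetical stand-in for a stream class
    def wait_stream(self, other: Any) -> str:
        return "original"


def make_hook(original: Callable) -> Callable:
    def hooked(self: Any, other: Any) -> str:
        return "hooked:" + original(self, other)
    return hooked


stream = FakeStream()
patch = RefCountedPatch(FakeStream, "wait_stream", make_hook)
with patch:
    with patch:  # nested entry reuses the already-installed hook
        print(stream.wait_stream(None))  # hooked:original
    print(stream.wait_stream(None))      # hooked:original (one user left)
print(stream.wait_stream(None))          # original
```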
github-actions bot added the documentation label Apr 10, 2026
@ch-wan ch-wan merged commit f855a0b into main Apr 11, 2026
159 of 176 checks passed
@ch-wan ch-wan deleted the shiyang/breakable_cg branch April 11, 2026 07:36
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026
…t#19102)

Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Cheng Wan <chwan@rice.edu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
…t#19102)

Co-authored-by: Cheng Wan <54331508+ch-wan@users.noreply.github.com>
Co-authored-by: Cheng Wan <chwan@rice.edu>

Labels

documentation, run-ci

3 participants