[Feature] support ACLGraph

### Checklist

- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 2. Please use English, otherwise it will be closed.

### Motivation

# 1. Motivation

Now that several features have been merged, SGLang can run on Ascend servers in eager mode. To get better performance, we now need to implement ACLGraph or NPUGraph support.

Goals
Goal 1: Define an `NPUGraphRunner` class for SGLang that provides the basic functionality and supports Llama or Qwen models.

Goal 2: Adapt to TP/DP, GraphTree, and dynamic-shape scenarios, including memory reuse.

Goal 3: Improve performance based on `torch.compile`.

# 2. Technical Design

- Workflow Phases

<img width="1916" height="1268" alt="Image" src="https://github.com/user-attachments/assets/095ca9a9-97e9-4f47-bc8b-f3a0f9900ef6" />


- Key messages

  We have [torch_npu.npu.NPUGraph](https://gitee.com/ascend/pytorch/blob/master/torch_npu/npu/graphs.py#L246), whose interface and functionality are similar to those of [torch.cuda.CUDAGraph](https://docs.pytorch.org/docs/stable/generated/torch.cuda.CUDAGraph.html).

  For details at the RTS level, see this [document](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/developmentguide/appdevg/aclcppdevg/aclcppdevg_000519.html).
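
The capture-once, replay-many pattern that both graph APIs follow can be illustrated with a minimal pure-Python sketch. `ToyGraph`, `static_input`, `static_output`, `forward`, and `run` are hypothetical names invented for illustration; this is not the `torch_npu` or `torch.cuda` API, and a real graph records device kernels rather than Python callables. The point it shows is that replay always reads and writes the same fixed buffers, so new inputs have to be copied into them first.

```python
from typing import Callable, List


class ToyGraph:
    """Hypothetical stand-in for a device graph object (not a real torch_npu class)."""

    def __init__(self) -> None:
        self._ops: List[Callable[[], None]] = []

    def capture(self, fn: Callable[[], None]) -> None:
        # Run the work once and remember it; a real graph records device kernels.
        fn()
        self._ops.append(fn)

    def replay(self) -> None:
        # Re-run the recorded work against the same buffers.
        for op in self._ops:
            op()


# Static buffers: allocated once and reused by every replay.
static_input: List[float] = [0.0] * 4
static_output: List[float] = [0.0] * 4


def forward() -> None:
    # Reads and writes the fixed buffers in place, as captured kernels would.
    for i, x in enumerate(static_input):
        static_output[i] = 2.0 * x + 1.0


graph = ToyGraph()
graph.capture(forward)


def run(batch: List[float]) -> List[float]:
    # Copy new data into the static input, replay, then read the static output.
    if len(batch) > len(static_input):
        raise ValueError("batch exceeds the captured buffer size")
    static_input[: len(batch)] = batch
    graph.replay()
    return static_output[: len(batch)]
```

Calling `run([1.0, 2.0])` replays the captured work on the first two slots and ignores the rest of the buffer, which is the same idea as padding a small batch up to a captured size.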

# 3. Roadmap

- Phase 1: Basic support for dense models

   Implement `NPUGraphRunner` with reference to `CUDAGraphRunner`, with one special case to handle:

   We use the [torch_npu.npu_fused_infer_attention_score](https://www.hiascend.com/doc_center/source/zh/Pytorch/60RC2/apiref/apilist/ptaoplist_000787.html) API, which takes a host_list input, so we have to update its value every time the graph runs by using `torch_npu.npu.NPUGraph.update`. For more details, see [task update](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/developmentguide/appdevg/aclcppdevg/aclcppdevg_000519.html#ZH-CN_TOPIC_0000002284281029__section4544173013324).

- Phase 2: Basic support for MoE models
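
The Phase 1 special case, a host-side input that is fixed at capture time and must be refreshed before the graph runs again, can be sketched in plain Python as follows. `ToyCapturedAttention`, `host_seq_lens`, and `decode_step` are hypothetical names, and this `update` method only mimics the idea; it is not the signature of `torch_npu.npu.NPUGraph.update`. The sketch assumes the host-side list holds per-request sequence lengths, which the issue does not state.

```python
from typing import List


class ToyCapturedAttention:
    """Hypothetical stand-in for one captured attention task with a host-side list."""

    def __init__(self, max_batch: int) -> None:
        self.max_batch = max_batch
        # Host-side value recorded at capture time; replay reuses it as-is.
        self.host_seq_lens: List[int] = [0] * max_batch

    def update(self, seq_lens: List[int]) -> None:
        # Refresh the host-side value, padding unused slots with zeros.
        if len(seq_lens) > self.max_batch:
            raise ValueError("batch exceeds the captured size")
        padding = [0] * (self.max_batch - len(seq_lens))
        self.host_seq_lens = list(seq_lens) + padding

    def replay(self) -> int:
        # Toy "kernel": total number of KV tokens the task would attend over.
        return sum(self.host_seq_lens)


def decode_step(task: ToyCapturedAttention, seq_lens: List[int]) -> int:
    # Update first, then replay; skipping the update would reuse stale lengths.
    task.update(seq_lens)
    return task.replay()
```

Replaying without calling `update` returns the result for the previous step's lengths, which is the failure mode the per-replay update is meant to prevent.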



### Related resources

_No response_
