[Feature] support ACLGraph #8030

@ping1jing2

1. Motivation

Now that several features have been merged, SGLang can run on Ascend servers in eager mode. To get better performance, we now need to implement ACLGraph or NPUGraph support.

Goals
Goal 1: Define an NPUGraphRunner class for SGLang that provides the basic functions and supports Llama or Qwen models.

Goal 2: Adapt to TP/DP, GraphTree, and dynamic-shape scenarios, including memory reuse.

Goal 3: Improve performance based on torch.compile.

2. Technical Design

  • Workflow Phases
[Image: workflow phases]

3. Roadmap

  • Phase 1: Basic support for dense models

    Implement NPUGraphRunner with reference to CUDAGraphRunner, but we need to handle a special case:

    Because we use the torch_npu.npu_fused_infer_attention_score API, which takes a host_list input, we have to update its value each time using torch_npu.npu.NPUGraph.update. For more details, please refer to task update.

  • Phase 2: Basic support for MoE models
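
The Phase 1 special case can be illustrated with a small pure-Python sketch. It does not use torch_npu: `ToyGraph`, `ToyGraphRunner`, and `toy_attention` are hypothetical names invented for illustration, and the real signatures of `torch_npu.npu.NPUGraph.update` and `torch_npu.npu_fused_infer_attention_score` are not reproduced here. The sketch assumes the runner captures one graph per batch size and pads each incoming batch to the nearest captured size, and that a host-side list recorded at capture time stays stale unless it is refreshed before each replay.

```python
import bisect
from typing import Callable, List

Kernel = Callable[[List[int], List[int]], List[int]]


def toy_attention(tokens: List[int], seq_lens: List[int]) -> List[int]:
    # Stand-in for an attention kernel whose result depends on host-side lengths.
    return [t * n for t, n in zip(tokens, seq_lens)]


class ToyGraph:
    """Hypothetical stand-in for a captured device graph (not a torch_npu class)."""

    def __init__(self, kernel: Kernel, batch_size: int) -> None:
        self.kernel = kernel
        # Static input buffer: every replay reads from this same list.
        self.input_buffer = [0] * batch_size
        # Host-side argument recorded at capture time.
        self.host_list = [0] * batch_size

    def update(self, host_list: List[int]) -> None:
        # Refresh the host-side argument stored in the captured graph.
        self.host_list = list(host_list)

    def replay(self) -> List[int]:
        return self.kernel(self.input_buffer, self.host_list)


class ToyGraphRunner:
    """Hypothetical runner: one graph per captured batch size, padded replay."""

    def __init__(self, kernel: Kernel, capture_batch_sizes: List[int]) -> None:
        self.capture_batch_sizes = sorted(capture_batch_sizes)
        self.graphs = {bs: ToyGraph(kernel, bs) for bs in self.capture_batch_sizes}

    def replay(self, tokens: List[int], seq_lens: List[int]) -> List[int]:
        raw_bs = len(tokens)
        if raw_bs > self.capture_batch_sizes[-1]:
            raise ValueError("batch too large for any captured graph")
        # Pick the smallest captured batch size that fits the request.
        index = bisect.bisect_left(self.capture_batch_sizes, raw_bs)
        padded_bs = self.capture_batch_sizes[index]
        graph = self.graphs[padded_bs]
        pad = padded_bs - raw_bs
        # Copy inputs into the static buffer in place, padding the tail.
        graph.input_buffer[:] = tokens + [0] * pad
        # Without this update, replay would reuse the lengths seen last time.
        graph.update(seq_lens + [1] * pad)
        # Drop the padded rows before returning.
        return graph.replay()[:raw_bs]
```

With captured batch sizes `[1, 2, 4, 8]`, a batch of three requests replays the size-4 graph; calling `replay` twice with different `seq_lens` gives different results only because `update` is called before each replay.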
