Fix InternVL and vision attention for non-CUDA backends (e.g. XPU)#19997
hnyls2002 merged 9 commits into sgl-project:main from
Conversation
@mingfeima and @airMeng, I am currently working with @yangw1234 on these pull requests. I was wondering if you could help us get these reviewed.
@jmunetong Thank you for your help. Would you mind adding at least one test case to the XPU CI to avoid this breaking again? You can refer to https://github.com/sgl-project/sglang/tree/main/test/srt/xpu cc validation leader @MingxuZh
@airMeng Should be passing now. I forgot to rebase from a different PR where we modified some CUDA calls that were breaking the XPU test.
@jmunetong The InternVL-specific changes are gone; you need to add them back. And I think @airMeng is asking you to add a test case similar to this one: https://github.com/sgl-project/sglang/blob/main/test/srt/xpu/test_deepseek_ocr.py. You may also refer to this file: https://github.com/sgl-project/sglang/blob/main/test/registered/vlm/test_vision_openai_server_a.py#L112
/tag-and-rerun-ci |
…gl-project#19997) Co-authored-by: Yang Wang <mr.yang.wang@outlook.com>
This commit introduces comprehensive ROCm wheel building infrastructure for SGLang, targeting AWS S3 for internal distribution.

scripts/check_aiter_version.sh:
- Smart AITER version detection from docker/rocm.Dockerfile
- Handles both version tags (e.g., v0.1.12.post1) and commit SHAs
- Checks S3 for existing wheels to avoid unnecessary rebuilds
- Returns rebuild decision and detected version

.github/workflows/release-whl-sglang-rocm.yml:
- Unified workflow with 4 stages and explicit job dependencies
- Stage 1: Check if AITER needs rebuild (conditional)
- Stage 2: Build AITER wheels for rocm700/rocm720 (only if needed)
- Stage 3: Build sglang wheels for both ROCm versions
- Stage 4: Upload to S3 with proper directory structure

1. S3 structure: clean separation of HTML indices and wheel files
   - simple/: HTML indices (PEP 503 compliant)
   - packages/: actual wheel files
   - Relative links: ../../packages/pkg/file.whl
2. AITER workflow: integrated as a conditional stage
   - Triggered by docker/rocm.Dockerfile changes
   - Smart rebuild: only when the version changes
   - AITER must complete before the sglang build
3. Version format: standard Python versioning
   - Release: 0.5.9
   - Nightly: 0.5.10.dev20260421+g4cf4f08
   - Note: AITER and sglang-kernel keep the +rocm suffix (compiled binaries)
4. AWS secrets: AMD_* naming convention
   - AMD_AWS_ACCESS_KEY_ID
   - AMD_AWS_SECRET_ACCESS_KEY
   - AMD_S3_BUCKET_NAME
5. Workflow triggers:
   - Daily schedule (3 AM UTC)
   - Push to docker/rocm.Dockerfile
   - Manual dispatch with options

Related-to: sgl-project#19997
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
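The version-detection and rebuild-decision logic in scripts/check_aiter_version.sh can be sketched as follows. This is a hypothetical Python equivalent for illustration only: the real script is shell, and the regexes, function names, and the 8-character SHA truncation here are assumptions, not the script's actual behavior.

```python
import re

def classify_aiter_ref(ref: str) -> str:
    """Classify an AITER reference pulled from docker/rocm.Dockerfile as
    either a release tag (e.g. v0.1.12.post1) or a bare commit SHA."""
    if re.fullmatch(r"v\d+(\.\d+)*(\.(post|dev|rc)\d+)?", ref):
        return "tag"
    if re.fullmatch(r"[0-9a-f]{7,40}", ref):
        return "sha"
    return "unknown"

def wheel_needs_rebuild(ref: str, existing_versions: set[str]) -> bool:
    """Rebuild only when no wheel for this version is already in S3.
    Tags are normalized by dropping the leading 'v'; SHAs are shortened
    (the 8-char truncation is an assumption for this sketch)."""
    if classify_aiter_ref(ref) == "tag":
        version = ref.lstrip("v")
    else:
        version = ref[:8]
    return version not in existing_versions
```

The point of the check is simply to skip the expensive AITER build whenever the Dockerfile still pins a version whose wheel already exists in the bucket.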
Add pyproject_rocm.toml with ROCm/HIP-specific dependencies:
- Inherits runtime_common, diffusion_common, tracing, test extras
- Defines rocm700 and rocm720 extras with pinned packages:
* torch, triton, torchaudio, torchvision from repo.radeon.com
* sglang-kernel from GitHub releases
* amd-aiter (discovered via --extra-index-url)
* mooncake-transfer-engine-non-cuda
- Defines srt_hip and diffusion_hip for HIP runtime
- Removed non-ROCm architectures (HPU, MUSA, MPS)
Users install with:
pip install 'sglang[srt_hip,rocm700]' \
--extra-index-url https://aioss-pypi-prod.s3.amazonaws.com/sglang/rocm700/simple/
setuptools-scm configured with local_scheme='no-local-version' to suppress
+rocm suffix on sglang wheels (standard Python versioning).
Related-to: sgl-project#19997
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
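The setuptools-scm setting described above would look roughly like this in pyproject_rocm.toml (a sketch; the actual section layout and surrounding options in the file may differ):

```toml
[tool.setuptools_scm]
# Drop the local version segment (e.g. "+rocm") so sglang wheels use
# standard Python versioning such as 0.5.10.dev20260421.
local_scheme = "no-local-version"
```

With this scheme, only the compiled-binary packages (AITER, sglang-kernel) retain a `+rocm` local suffix, while pure-Python sglang wheels stay PEP 440 "standard" versions.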
Motivation
InternVL and vision attention currently assume CUDA: they use a hardcoded "cuda" device and .cuda() calls, and the vision attention backend selection does not handle XPU. This prevents running InternVL and vision models on Intel XPU (and other non-CUDA devices). This PR makes both components device-agnostic so they work on the configured backend (CUDA, XPU, etc.).

Modifications
python/sglang/srt/multimodal/processors/internvl.py:
- Import get_device from sglang.srt.utils.
- Replace device="cuda" with device=get_device() in normalization and preprocessing.
- Replace .cuda() and .to("cuda") with .to(get_device()) for image/video tensors and input_ids so tensors are created on the actual backend device.

python/sglang/srt/layers/attention/vision.py:
- Import is_xpu and set _is_xpu = is_xpu() alongside the existing _is_cuda, _is_npu, _is_hip.
- VisionTritonAttention: use cu_seqlens.to(q.device) and seq_lens.to(q.device) instead of .cuda() so tensors follow the model device (works on XPU and other backends).
- VisionAttention backend selection: add elif _is_xpu: backend = "triton_attn" so XPU uses the Triton attention backend instead of falling through to SDPA or unsupported paths.

Accuracy Tests
This PR does not change model forward or kernel math; it only changes device placement and backend selection for non-CUDA. No new accuracy test results are required. Existing InternVL and vision model behavior on CUDA is unchanged; on XPU, models can now run with correct device placement and backend.
Benchmarking and Profiling
Not applicable. Changes are for correctness and multi-device support (device placement and backend selection). No intentional inference-speed changes; benchmarking can be done by maintainers on XPU if needed.
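The device-placement and backend-selection changes described under Modifications follow a common pattern, sketched below. This is a minimal illustration, not sglang's actual code: get_device here is a simplified stand-in for sglang.srt.utils.get_device, and the functions, signatures, and the "fa3" default are assumptions.

```python
import torch

def get_device() -> str:
    # Simplified stand-in for sglang.srt.utils.get_device:
    # prefer CUDA, then XPU, then fall back to CPU.
    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    return "cpu"

def preprocess(pixel_values: torch.Tensor, input_ids: torch.Tensor):
    # Before: pixel_values.cuda() and input_ids.to("cuda") hardwired CUDA.
    # After: tensors are placed on whatever backend is configured.
    device = get_device()
    return pixel_values.to(device), input_ids.to(device)

def select_vision_backend(_is_cuda: bool, _is_hip: bool, _is_xpu: bool) -> str:
    # Backend selection with the new XPU branch; the non-XPU defaults
    # shown here are illustrative, not sglang's real choices.
    if _is_cuda:
        return "fa3"
    elif _is_hip:
        return "triton_attn"
    elif _is_xpu:
        return "triton_attn"  # new: XPU uses the Triton attention backend
    return "sdpa"
```

The same idea covers the VisionTritonAttention fix: replacing `cu_seqlens.cuda()` with `cu_seqlens.to(q.device)` makes the metadata tensors follow the query tensor's device instead of assuming CUDA.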