
Fix InternVL and vision attention for non-CUDA backends (e.g. XPU)#19997

Merged
hnyls2002 merged 9 commits into sgl-project:main from jmunetong:internvl-jm
Mar 15, 2026

Conversation

@jmunetong
Contributor

Motivation

InternVL and vision attention currently assume CUDA: they use hardcoded "cuda" and .cuda(), and the vision attention backend selection does not handle XPU. This prevents running InternVL and vision models on Intel XPU (and other non-CUDA devices). This PR makes both components device-agnostic so they work on the configured backend (CUDA, XPU, etc.).

Modifications

  • python/sglang/srt/multimodal/processors/internvl.py

    • Import get_device from sglang.srt.utils.
    • Replace all hardcoded device="cuda" with device=get_device() in normalization and preprocessing.
    • Replace .cuda() and .to("cuda") with .to(get_device()) for image/video tensors and input_ids so tensors are created on the actual backend device.
  • python/sglang/srt/layers/attention/vision.py

    • Import is_xpu and set _is_xpu = is_xpu() alongside existing _is_cuda, _is_npu, _is_hip.
    • In VisionTritonAttention: use cu_seqlens.to(q.device) and seq_lens.to(q.device) instead of .cuda() so tensors follow the model device (works on XPU and other backends).
    • In VisionAttention backend selection: add elif _is_xpu: backend = "triton_attn" so XPU uses the Triton attention backend instead of falling through to SDPA or unsupported paths.
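The pattern behind these bullets can be sketched in a few lines. This is a hedged illustration, not the actual diff: the boolean stub flags stand in for sglang's platform checks (`is_cuda`, `is_hip`, `is_npu`, `is_xpu`), and every backend name except `"triton_attn"` is an illustrative placeholder.

```python
def select_vision_backend(is_cuda: bool, is_hip: bool,
                          is_npu: bool, is_xpu: bool) -> str:
    """Pick a vision attention backend from platform flags.

    Mirrors the shape of the change in this PR: XPU now maps explicitly
    to the Triton backend instead of falling through to SDPA or an
    unsupported path. The non-XPU choices here are placeholders.
    """
    if is_cuda:
        return "fa3"          # placeholder CUDA choice for this sketch
    if is_hip or is_npu:
        return "sdpa"         # placeholder fallback
    if is_xpu:
        return "triton_attn"  # the new branch added by this PR
    return "sdpa"             # generic fallback (e.g. CPU)


def to_model_device(tensor, device):
    """Device-agnostic placement: the PR replaces `.cuda()` / `.to("cuda")`
    with `.to(get_device())`. Here `tensor` is anything with a `.to()`
    method (a torch.Tensor in the real code)."""
    return tensor.to(device)
```

The analogous fix inside `VisionTritonAttention` moves auxiliary tensors with `cu_seqlens.to(q.device)` rather than `.cuda()`, so they always follow the device the query tensor already lives on.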

Accuracy Tests

This PR does not change model forward or kernel math; it only changes device placement and backend selection for non-CUDA. No new accuracy test results are required. Existing InternVL and vision model behavior on CUDA is unchanged; on XPU, models can now run with correct device placement and backend.

Benchmarking and Profiling

Not applicable. Changes are for correctness and multi-device support (device placement and backend selection). No intentional inference-speed changes; benchmarking can be done by maintainers on XPU if needed.

@github-actions bot added the Multi-modal (multi-modal language model) label Mar 6, 2026
@jmunetong
Contributor Author

@mingfeima and @airMeng, I am currently working with @yangw1234 on these pull requests. I was wondering if you could help us get these reviewed.

@airMeng
Collaborator

airMeng commented Mar 6, 2026

@jmunetong Thank you for your help. Would you mind adding at least one test case to the XPU CI so this doesn't break again? You can refer to https://github.com/sgl-project/sglang/tree/main/test/srt/xpu

cc validation leader @MingxuZh

@jmunetong
Contributor Author

@airMeng Should be passing now. I forgot to rebase from a different PR where we modified some CUDA calls that were breaking the XPU test.

@yangw1234
Contributor

yangw1234 commented Mar 9, 2026

> @airMeng Should be passing now. I forgot to rebase from a different PR where we modified some CUDA calls that were breaking the XPU test.

@jmunetong The InternVL-specific changes are gone; you need to add them back.

And I think @airMeng is asking you to add a test case similar to this one https://github.com/sgl-project/sglang/blob/main/test/srt/xpu/test_deepseek_ocr.py.

You may also refer to this file https://github.com/sgl-project/sglang/blob/main/test/registered/vlm/test_vision_openai_server_a.py#L112
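A rough sketch of what such an XPU vision test could look like. This is a hypothetical outline using only the standard library; the real tests in the referenced files launch the server with helpers from `sglang.test.test_utils`, and the model id, image URL, and `SGLANG_TEST_URL` environment variable below are all assumptions for illustration.

```python
import json
import os
import unittest
import urllib.request


def build_vision_chat_payload(model: str, image_url: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload with one image attachment,
    in the shape the vision server tests send. All values are
    caller-supplied; nothing here is sglang-specific."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "temperature": 0,
    }


class TestInternVLOnXPU(unittest.TestCase):
    """Skeleton test; assumes a server was already launched on XPU and
    its base URL exported as SGLANG_TEST_URL (hypothetical variable)."""

    @unittest.skipUnless(os.environ.get("SGLANG_TEST_URL"), "no server running")
    def test_image_chat(self):
        payload = build_vision_chat_payload(
            "OpenGVLab/InternVL2-2B",       # assumed model id
            "https://example.com/cat.jpg",  # assumed image
            "Describe this image.",
        )
        req = urllib.request.Request(
            os.environ["SGLANG_TEST_URL"] + "/v1/chat/completions",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        # A non-empty completion is enough for a smoke test.
        self.assertTrue(body["choices"][0]["message"]["content"])
```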

@hnyls2002
Collaborator

/tag-and-rerun-ci

@hnyls2002 hnyls2002 merged commit 7458407 into sgl-project:main Mar 15, 2026
192 of 223 checks passed
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
akao-amd added a commit to akao-amd/sglang that referenced this pull request May 5, 2026
This commit introduces comprehensive ROCm wheel building infrastructure
for SGLang, targeting AWS S3 for internal distribution.

scripts/check_aiter_version.sh:
  - Smart AITER version detection from docker/rocm.Dockerfile
  - Handles both version tags (e.g., v0.1.12.post1) and commit SHAs
  - Checks S3 for existing wheels to avoid unnecessary rebuilds
  - Returns rebuild decision and detected version

.github/workflows/release-whl-sglang-rocm.yml:
  - Unified workflow with 4 stages and explicit job dependencies
  - Stage 1: Check if AITER needs rebuild (conditional)
  - Stage 2: Build AITER wheels for rocm700/rocm720 (only if needed)
  - Stage 3: Build sglang wheels for both ROCm versions
  - Stage 4: Upload to S3 with proper directory structure

1. S3 Structure: Clean separation of HTML indices and wheel files
   - simple/: HTML indices (PEP 503 compliant)
   - packages/: Actual wheel files
   - Relative links: ../../packages/pkg/file.whl
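The relative-link layout in point 1 can be illustrated with a minimal PEP 503 index generator. This sketch is not the workflow's actual script, and the package and file names in the usage are made up:

```python
from html import escape


def render_simple_index(package: str, wheel_files: list) -> str:
    """Render a minimal PEP 503 'simple' index page for one package.

    Links are relative (../../packages/<pkg>/<file>) so HTML served
    under simple/<pkg>/ can point at wheels stored separately under
    packages/, matching the S3 layout described above.
    """
    links = "\n".join(
        f'    <a href="../../packages/{escape(package)}/{escape(f)}">'
        f"{escape(f)}</a><br/>"
        for f in wheel_files
    )
    return (
        "<!DOCTYPE html>\n<html>\n  <body>\n"
        f"{links}\n"
        "  </body>\n</html>\n"
    )
```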

2. AITER Workflow: Integrated as conditional stage
   - Triggered by docker/rocm.Dockerfile changes
   - Smart rebuild: only when version changes
   - AITER must complete before sglang build

3. Version Format: Standard Python versioning
   - Release: 0.5.9
   - Nightly: 0.5.10.dev20260421+g4cf4f08
   - Note: AITER and sglang-kernel keep +rocm suffix (compiled binaries)

4. AWS Secrets: AMD_* naming convention
   - AMD_AWS_ACCESS_KEY_ID
   - AMD_AWS_SECRET_ACCESS_KEY
   - AMD_S3_BUCKET_NAME

5. Workflow Triggers:
   - Daily schedule (3 AM UTC)
   - Push to docker/rocm.Dockerfile
   - Manual dispatch with options

Related-to: sgl-project#19997

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
akao-amd added a commit to akao-amd/sglang that referenced this pull request May 5, 2026
Add pyproject_rocm.toml with ROCm/HIP-specific dependencies:

- Inherits runtime_common, diffusion_common, tracing, test extras
- Defines rocm700 and rocm720 extras with pinned packages:
  * torch, triton, torchaudio, torchvision from repo.radeon.com
  * sglang-kernel from GitHub releases
  * amd-aiter (discovered via --extra-index-url)
  * mooncake-transfer-engine-non-cuda

- Defines srt_hip and diffusion_hip for HIP runtime
- Removed non-ROCm architectures (HPU, MUSA, MPS)

Users install with:
  pip install 'sglang[srt_hip,rocm700]' \
    --extra-index-url https://aioss-pypi-prod.s3.amazonaws.com/sglang/rocm700/simple/

setuptools-scm configured with local_scheme='no-local-version' to suppress
+rocm suffix on sglang wheels (standard Python versioning).
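The setuptools-scm behavior described here corresponds to a pyproject fragment along these lines (a sketch of the documented `local_scheme` option, not the commit's exact file):

```toml
[tool.setuptools_scm]
# Drop the local version segment (e.g. "+rocm") so sglang wheels carry
# standard public versions; see the setuptools-scm local_scheme docs.
local_scheme = "no-local-version"
```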

Related-to: sgl-project#19997

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>