
Release SGLang Server AL2023 DLC#6179

Merged
Jyothirmaikottu merged 30 commits into main from release-sglangamzn2023
Jun 11, 2026

@sirutBuasai

Member

Purpose

Test Plan

Test Result


Toggle if you are merging into master Branch

By default, docker image builds and tests are disabled. Two ways to run builds and tests:

  1. Using dlc_developer_config.toml
  2. Using this PR description (currently only supported for PyTorch, TensorFlow, vLLM, and base images)

How to use the helper utility for updating dlc_developer_config.toml

Assuming your remote is called origin (you can find out more with git remote -v)...

  • Run default builds and tests for a particular buildspec - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -cp origin

  • Enable specific tests for a buildspec or set of buildspecs - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -t sanity_tests -cp origin

  • Restore TOML file when ready to merge

python src/prepare_dlc_dev_environment.py -rcp origin

NOTE: If you are creating a PR for a new framework version, make sure the local, standard, rc, and efa SageMaker tests pass by enabling them in the dlc_developer_config.toml file:

  • sagemaker_remote_tests = true
  • sagemaker_efa_tests = true
  • sagemaker_rc_tests = true
  • sagemaker_local_tests = true

How to use PR description

Use the code block below to uncomment commands and run the PR CodeBuild jobs. There are two commands available:
  • # /buildspec <buildspec_path>
    • e.g.: # /buildspec pytorch/training/buildspec.yml
    • If this line is commented out, dlc_developer_config.toml will be used.
  • # /tests <test_list>
    • e.g.: # /tests sanity security ec2
    • If this line is commented out, it will run the default set of tests (same as the defaults in dlc_developer_config.toml): sanity, security, ec2, ecs, eks, sagemaker, sagemaker-local.
# /buildspec <buildspec_path>
# /tests <test_list>
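
The rules above (a commented-out command is ignored, a missing /tests line falls back to the default test list) can be illustrated with a small sketch. This is not the repository's actual parser, which is not shown on this page; `parse_pr_commands` and `DEFAULT_TESTS` are hypothetical names, and a `buildspec` of `None` stands for "use dlc_developer_config.toml instead".

```python
# Default test list as stated in the PR template text.
DEFAULT_TESTS = ["sanity", "security", "ec2", "ecs", "eks", "sagemaker", "sagemaker-local"]


def parse_pr_commands(description: str) -> dict:
    """Hypothetical helper: extract uncommented /buildspec and /tests commands."""
    result = {"buildspec": None, "tests": list(DEFAULT_TESTS)}
    for raw_line in description.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            # Commented-out commands are ignored, so the defaults stay in place.
            continue
        if line.startswith("/buildspec "):
            result["buildspec"] = line.split(maxsplit=1)[1].strip()
        elif line.startswith("/tests "):
            result["tests"] = line.split()[1:]
    return result


if __name__ == "__main__":
    body = "/buildspec pytorch/training/buildspec.yml\n# /tests sanity security ec2"
    print(parse_pr_commands(body))
```

In this example the buildspec command is active while the tests command is still commented out, so the default test list applies.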
Toggle if you are merging into main Branch

PR Checklist

  • [] I ran pre-commit run --all-files locally before creating this PR. (Read DEVELOPMENT.md for details).

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>
sirutBuasai and others added 27 commits June 2, 2026 15:07


FP8 models (qwen3.5-35b-a3b-fp8, qwen3-coder-next-fp8) require fp8e4nv
which is only supported on Hopper (sm_90+). The gpu-l40s-4gpu-runners
label doesn't exist, causing fallback to gpu-efa-runners (A100 sm_80).
LLaMA 3.3 70B OOMs on A100 runners. Move all three to gpu-h100-8gpu-runners
with tp=8 and appropriate memory settings.

Add CVE-2026-42504 to security allowlist: go/stdlib MIME header CPU
exhaustion in mooncake libetcd_wrapper.so, same root cause as existing
Go stdlib entries.

PyTorch 2.11.0+cu130 bundles an older nvidia-cutlass-dsl that has
incompatible MLIR bindings with FlashInfer 0.6.11.post1's rmsnorm_cute
kernel. Force-reinstall cutlass-dsl>=4.5.2 after torch re-pin to ensure
compatible GPUModuleOp API during CUDA graph capture.

Upstream SGLang applies the same fix (sgl-project/sglang#25958).
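
A minimal sketch of how the version floor from this commit message could be checked at runtime, assuming the installed distribution is named nvidia-cutlass-dsl as written above. The helper names are hypothetical, and the parser only compares leading numeric components, so pre-release suffixes are ignored. This is not how the image build enforces the pin.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version named in the commit message (cutlass-dsl>=4.5.2).
MIN_CUTLASS_DSL = (4, 5, 2)


def parse_release(ver: str) -> tuple:
    """Return the leading numeric components of a version string as a tuple.

    Local suffixes ("+cu130") and non-numeric segments ("post1") are dropped.
    """
    parts = []
    for piece in ver.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def cutlass_dsl_is_compatible(installed=None) -> bool:
    """Hypothetical check: is the installed cutlass-dsl at or above the floor?"""
    if installed is None:
        try:
            installed = version("nvidia-cutlass-dsl")
        except PackageNotFoundError:
            return False
    return parse_release(installed) >= MIN_CUTLASS_DSL


if __name__ == "__main__":
    print(cutlass_dsl_is_compatible())
```

Passing an explicit version string makes the check easy to exercise without the package installed.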

Benchmark run 27228675384 surfaced three distinct failures:

- qwen3.5-35b-a3b-fp8 / qwen3-coder-next-fp8: tp=8 shards the FP8 MoE
  gate/up output_size to 64, which is not divisible by block_n=128
  ("output_size ... not divisible by weight quantization block_n=128").
  Revert to tp=4 — the intended sharding for these FP8 models.

- qwen3-32b: shared gpu-efa-runners pod had a leftover process holding
  port 8000 ("address already in use" -> warmup timeout). Move to a
  dedicated gpu-h100-8gpu-runners pod to avoid the collision.

llama-3.3-70b stays at tp=8 (dense model, no block-quant constraint,
needs the memory headroom).
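
The tp=8 versus tp=4 arithmetic can be worked through in a short sketch. The full output size of 512 is inferred from the commit message (a per-shard size of 64 at tp=8), not stated in it, and the function names are hypothetical. SGLang's real validation may check more than this.

```python
# Inferred, not stated in the source: 64 columns per shard at tp=8 implies 512 in total.
FULL_OUTPUT_SIZE = 64 * 8


def shard_size(output_size: int, tp: int) -> int:
    """Per-rank output size when a weight is split evenly across tp ranks."""
    if output_size % tp != 0:
        raise ValueError(f"output_size {output_size} is not divisible by tp={tp}")
    return output_size // tp


def is_block_aligned(output_size: int, tp: int, block_n: int = 128) -> bool:
    """True when each shard is a whole number of quantization blocks."""
    return shard_size(output_size, tp) % block_n == 0


if __name__ == "__main__":
    for tp in (4, 8):
        print(tp, shard_size(FULL_OUTPUT_SIZE, tp), is_block_aligned(FULL_OUTPUT_SIZE, tp))
```

Under that assumption, tp=8 leaves each rank with 64 columns, less than one 128-wide block, while tp=4 leaves exactly one block per rank.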
… CUDA graph

All gpu-h100-8gpu-runners benchmark jobs failed at server startup with
'[Errno 98] address already in use' on port 8000; port 8000 is occupied
on those pods. Remove the SGLANG_PORT=8000 override from the five GPU
models so they use the SGLang default (30000), matching the x86 jobs
that already pass.
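
A pre-flight check for this kind of collision could look like the sketch below, which uses only the Python standard library. `port_is_free` is a hypothetical helper and is not part of the benchmark workflow in this PR; it tries to bind the port and treats EADDRINUSE (errno 98 on Linux) as "occupied".

```python
import errno
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # Any other error (for example a permission problem) is not a port collision.
        return True


if __name__ == "__main__":
    for candidate in (8000, 30000):
        print(candidate, "free" if port_is_free(candidate) else "in use")
```

The check is inherently racy, since another process can claim the port between the probe and the server start, so it is a diagnostic aid rather than a guarantee.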

Also add --disable-piecewise-cuda-graph to qwen3-32b: it crashed during
warmup_compile with 'FusedAddRMSNorm ... illegal memory access' while
capturing the experimental piecewise CUDA graph (same workaround as
llama-3.3-70b).
Jyothirmaikottu merged commit 5705459 into main on Jun 11, 2026
197 of 198 checks passed
sirutBuasai deleted the release-sglangamzn2023 branch on June 11, 2026 at 18:45