Release SGLang Server AL2023 DLC by sirutBuasai · Pull Request #6179 · aws/deep-learning-containers

sirutBuasai · 2026-06-02T21:53:14Z

Purpose

Test Plan

Test Result

Toggle if you are merging into master Branch

By default, docker image builds and tests are disabled. Two ways to run builds and tests:

- Using dlc_developer_config.toml
- Using this PR description (currently only supported for PyTorch, TensorFlow, vllm, and base images)

How to use the helper utility for updating dlc_developer_config.toml

Assuming your remote is called origin (you can find out more with git remote -v)...

Run default builds and tests for a particular buildspec - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -cp origin

Enable specific tests for a buildspec or set of buildspecs - also commits and pushes changes to remote; Example:

python src/prepare_dlc_dev_environment.py -b </path/to/buildspec.yml> -t sanity_tests -cp origin

Restore TOML file when ready to merge

python src/prepare_dlc_dev_environment.py -rcp origin

NOTE: If you are creating a PR for a new framework version, please ensure success of the local, standard, rc, and efa sagemaker tests by updating the dlc_developer_config.toml file:

sagemaker_remote_tests = true
sagemaker_efa_tests = true
sagemaker_rc_tests = true
sagemaker_local_tests = true
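Before pushing, the four flags above can be checked mechanically. The following is a minimal sketch, not part of the repository: the helper name `missing_sagemaker_flags` and the `[test]` table in the sample are assumptions, and it uses a simple line scan rather than a full TOML parser.

```python
import re

REQUIRED_FLAGS = (
    "sagemaker_remote_tests",
    "sagemaker_efa_tests",
    "sagemaker_rc_tests",
    "sagemaker_local_tests",
)

# Matches boolean assignments such as `sagemaker_rc_tests = true  # note`.
FLAG_LINE = re.compile(r"^\s*(\w+)\s*=\s*(true|false)\s*(?:#.*)?$")


def missing_sagemaker_flags(toml_text):
    """Return the required flags that are absent or not set to true."""
    enabled = set()
    for line in toml_text.splitlines():
        match = FLAG_LINE.match(line)
        if match and match.group(2) == "true":
            enabled.add(match.group(1))
    return [name for name in REQUIRED_FLAGS if name not in enabled]


# Hypothetical file contents; the table name is an assumption.
sample = """
[test]
sagemaker_remote_tests = true
sagemaker_efa_tests = true
sagemaker_rc_tests = false
sagemaker_local_tests = true
"""
print(missing_sagemaker_flags(sample))  # ['sagemaker_rc_tests']
```

An empty list means all four SageMaker test suites are enabled.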

How to use PR description

Use the code block below to uncomment commands and run the PR CodeBuild jobs. There are two commands available:

# /buildspec <buildspec_path>
- e.g.: # /buildspec pytorch/training/buildspec.yml
- If this line is commented out, dlc_developer_config.toml will be used.
# /tests <test_list>
- e.g.: # /tests sanity security ec2
- If this line is commented out, it will run the default set of tests (same as the defaults in dlc_developer_config.toml): sanity, security, ec2, ecs, eks, sagemaker, sagemaker-local.

# /buildspec <buildspec_path>
# /tests <test_list>
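To illustrate how the two commands behave when uncommented, here is a rough sketch of a parser for the PR description. It is an assumption about the behaviour described above, not the CI's actual implementation; the function name `parse_pr_commands` is hypothetical.

```python
import re

# An active command starts the line with "/"; a leading "#" keeps it disabled.
COMMAND = re.compile(r"^/(buildspec|tests)\s+(.+)$")


def parse_pr_commands(description):
    """Collect uncommented /buildspec and /tests commands from a PR body."""
    commands = {}
    for line in description.splitlines():
        match = COMMAND.match(line.strip())
        if not match:
            continue  # commented-out command or unrelated text
        name, argument = match.groups()
        # /tests takes a space-separated list; /buildspec takes one path.
        commands[name] = argument.split() if name == "tests" else argument
    return commands


body = """
# /buildspec pytorch/training/buildspec.yml
/tests sanity security ec2
"""
print(parse_pr_commands(body))  # {'tests': ['sanity', 'security', 'ec2']}
```

In this sketch a missing `buildspec` key corresponds to falling back to dlc_developer_config.toml, and a missing `tests` key corresponds to running the default test set.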

Toggle if you are merging into main Branch

PR Checklist

[] I ran pre-commit run --all-files locally before creating this PR. (Read DEVELOPMENT.md for details).

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

FP8 models (qwen3.5-35b-a3b-fp8, qwen3-coder-next-fp8) require fp8e4nv, which is only supported on Hopper (sm_90+). The gpu-l40s-4gpu-runners label doesn't exist, causing fallback to gpu-efa-runners (A100 sm_80). LLaMA 3.3 70B OOMs on A100 runners. Move all three to gpu-h100-8gpu-runners with tp=8 and appropriate memory settings.

Add CVE-2026-42504 to security allowlist: go/stdlib MIME header CPU exhaustion in mooncake libetcd_wrapper.so, same root cause as existing Go stdlib entries.

PyTorch 2.11.0+cu130 bundles an older nvidia-cutlass-dsl that has incompatible MLIR bindings with FlashInfer 0.6.11.post1's rmsnorm_cute kernel. Force-reinstall cutlass-dsl>=4.5.2 after torch re-pin to ensure compatible GPUModuleOp API during CUDA graph capture. Upstream SGLang applies the same fix (sgl-project/sglang#25958).

Benchmark run 27228675384 surfaced three distinct failures:

- qwen3.5-35b-a3b-fp8 / qwen3-coder-next-fp8: tp=8 shards the FP8 MoE gate/up output_size to 64, which is not divisible by block_n=128 ("output_size ... not divisible by weight quantization block_n=128"). Revert to tp=4, the intended sharding for these FP8 models.
- qwen3-32b: shared gpu-efa-runners pod had a leftover process holding port 8000 ("address already in use" -> warmup timeout). Move to a dedicated gpu-h100-8gpu-runners pod to avoid the collision.

llama-3.3-70b stays at tp=8 (dense model, no block-quant constraint, needs the memory headroom).
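The block-quantization constraint in this commit message comes down to simple arithmetic. The sketch below is illustrative only: the unsharded size of 512 is inferred from the message (64 per shard at tp=8), and `shard_is_block_aligned` is a hypothetical helper, not an SGLang function.

```python
def shard_is_block_aligned(output_size, tp, block_n=128):
    """Check that each tensor-parallel shard is a whole number of quant blocks."""
    if output_size % tp != 0:
        return False  # the dimension cannot be split evenly across ranks
    return (output_size // tp) % block_n == 0


# Inferred from the commit message: tp=8 leaves 64 per shard, so 64 * 8 = 512.
FULL_OUTPUT_SIZE = 64 * 8

for tp in (4, 8):
    per_shard = FULL_OUTPUT_SIZE // tp
    print(f"tp={tp}: shard={per_shard}, aligned={shard_is_block_aligned(FULL_OUTPUT_SIZE, tp)}")
# tp=4: shard=128, aligned=True
# tp=8: shard=64, aligned=False
```

Under that assumption, tp=4 gives each rank exactly one 128-wide block, while tp=8 gives half a block, which matches the reported error.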

… CUDA graph

All gpu-h100-8gpu-runners benchmark jobs failed at server startup with '[Errno 98] address already in use' on port 8000; port 8000 is occupied on those pods. Remove the SGLANG_PORT=8000 override from the five GPU models so they use the SGLang default (30000), matching the x86 jobs that already pass.

Also add --disable-piecewise-cuda-graph to qwen3-32b: it crashed during warmup_compile with 'FusedAddRMSNorm ... illegal memory access' while capturing the experimental piecewise CUDA graph (same workaround as llama-3.3-70b).
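The '[Errno 98] address already in use' failure can be detected before the server is launched. This is a minimal sketch using only the Python standard library; the helper names are hypothetical and this is not the mechanism the benchmark workflow uses.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False  # typically EADDRINUSE: another process holds the port
        return True


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        return probe.getsockname()[1]


if __name__ == "__main__":
    for candidate in (8000, 30000):
        print(candidate, "free" if port_is_free(candidate) else "in use")
```

Both helpers are subject to a race: another process can claim the port between the check and the server start, so a check like this narrows the failure window rather than removing it.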

sirutBuasai added 2 commits June 2, 2026 14:22

add sglang amzn2023 autorelease

2801c4a

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

update sglang to cuda 13

d990c53

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

aws-deep-learning-containers-ci Bot added the authorized label Jun 2, 2026

sirutBuasai and others added 27 commits June 2, 2026 15:07

update build script

ef3fd56

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

fix tagging

8ea4c5c

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

fix dockerfile path

c704ed8

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

add cuda ref

f6cdfdd

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

update cron

8756c91

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

fix quoting

cc58dfa

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

fix linking

11541d7

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

clean tags

ade0b45

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

ad sglang amzn2023 allowlist

ea8dcc7

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

fix telemetry

63db872

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

fix telemetry

b7b9eaf

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

update model throughput threshold

e97d0e8

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

udpate sglang port

e8bcd51

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

mode port killing mechanism

b2f5a3f

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

kill port 30000

2e91955

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

add port randomization

8dfb4b4

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

revert port to 8000

05df9b8

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

revert port

114681d

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

debug port logs

9d9ef3f

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

check ports

a4c22d0

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

fix debug

61dff33

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

fix debug

0b9fbec

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

revert temp changes

348a785

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

revert debug statemtns

7ebe17f

Signed-off-by: sirutBuasai <sirutbuasai27@outlook.com>

Jyothirmaikottu approved these changes Jun 11, 2026

Jyothirmaikottu merged commit 5705459 into main Jun 11, 2026
197 of 198 checks passed

sirutBuasai deleted the release-sglangamzn2023 branch June 11, 2026 18:45

Jyothirmaikottu merged 30 commits into main from release-sglangamzn2023
