Skip to content

Conversation

@jgangani
Copy link
Collaborator

@jgangani jgangani commented Dec 2, 2025

Update to GPTOSS Aggregation performance. All 3 seq lens.

Passed E2E test: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19850790932

@cquil11 cquil11 merged commit 06667a5 into main Dec 2, 2025
2 checks passed
@cquil11 cquil11 deleted the jgangani_update_gptoss_agg branch December 2, 2025 23:01
rkarhila-amd pushed a commit to rkarhila-amd/InferenceMAX that referenced this pull request Dec 3, 2025
* Update GPTOSS B200 AGG

* set dp attention env vars

* Add DP attn comment

---------

Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
cquil11 added a commit that referenced this pull request Dec 5, 2025
… 0.5.5.post2 for AMD MI355 (#247)

* Change AMD MI355 docker image to lmsysorg/sglang:v0.5.5.post2-rocm700-mi35x for dsr1-fp8

* Adjust preview for dark mode and light mode (#250)

* Adjust preview for dark mode and light mode

picture element for better display based on color scheme.

* Rounded

* Update image alt text in README.md

* adding ISL/OSL to collect results table summary (#249)

Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>

* chore: refactor Docker runner launch to be like SLURM (#227)

* initial poc

* remove -d flag when launching docker container

* syntax error

* compatibility fixes

* add correct endpoint prefix

* remove reference env var

* run vllm serve in background

* unescape sequences

* stop vllm to stdout after it stops

* stop vllm to stdout after it stops pt 2

* get rid of docker stop as no longer in detatched

* clone bench serving to tmp dir

* clone bench serving to tmp dir pt 2

* add explanatory comment

* cleaning up

* cleaning up

* adding mi355x refactor

* adding h200 initial refactor

* different way to see server logs

* cleanup

* now fail if server fails

* starting on b200

* doign b200

* reverting erroneous change

* fixing b200

* fixing b200 pt 2

* updating mi300

* updating mi300 pt 2

* updating mi300 pt 3 -- remove detached mode

* cleaning up mi355x

* fixing mi300x and updating 325x

* reverting max conc to 512 on gptoss fp4 b200 docker

* fixing mi300x and updating 325x

* cleanng up

* add wait for h200 slurm dsr1

* max num seqs back to 512 for gptoss fpr b200 docker

* fix port issue for dsr1 mi300x docker

* fix mi355x docker NUM_PROMPTS

* adding prop of failure for server logs

* add utils function for benchmark

* add utils function for benchmark

* function-ize the waiting for server to start

* dont show arg parsing set -x

* dont show arg parsing set +x oops

* dont show arg parsing set +x oops

* capture server pid

* nebdius dont scancel

* changes to comments in benchmark lib . sh

* Update benchmarks/dsr1_fp4_mi355x_docker.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update .github/workflows/benchmark-tmpl.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* adding back whitespace

* adding back whitespace

* adding back whitespace

* remove tg launch script

* Update benchmarks/gptoss_fp4_h100_docker.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update benchmarks/dsr1_fp8_mi325x_docker.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update benchmarks/dsr1_fp8_mi355x_docker.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update benchmarks/gptoss_fp4_b200_trt_slurm.sh

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Audit and correct required environment variables documentation in all benchmark scripts (#252)

* Initial plan

* Update required env vars documentation in all benchmark scripts

Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>

* Fix required env vars - remove NF, PREFILL_SIZE, and correct PORT/PORT_OFFSET

Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>

* Remove internally-calculated vars from required env vars (EXTRA_CONFIG_FILE, MAX_NUM_TOKENS, MOE_BACKEND)

Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: cquil11 <60715037+cquil11@users.noreply.github.com>

* removing oci node rebase with main

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>

* Bump actions/checkout from 5.0.0 to 6.0.0 in the github-actions group (#253)

Bumps the github-actions group with 1 update: [actions/checkout](https://github.com/actions/checkout).


Updates `actions/checkout` from 5.0.0 to 6.0.0
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@08c6903...1af3b93)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add b200 DGXC node to b200 runners list (#245)

* Bump actions/setup-python in the github-actions group (#259)

Bumps the github-actions group with 1 update: [actions/setup-python](https://github.com/actions/setup-python).


Updates `actions/setup-python` from 6.0.0 to 6.1.0
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@e797f83...83679a8)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: 6.1.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat: refresh GB200 SGLang DSR1 submission (#257)

* Bumps DSR1 SGLang code

* update how we get the resulting log files

---------

Co-authored-by: Elnifio <elnifio0519@gmail.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>

* Update GPTOSS B200 TRTLLM (#266)

* Update GPTOSS B200 AGG

* set dp attention env vars

* Add DP attn comment

---------

Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>

* Fixed community container for MI35x dsr1 for fp8 for real

* Update dsr1_fp4_mi355x_docker.sh with env flags

* Update dsr1_fp4_mi355x_slurm.sh

* from post2 to post3 for fp8

* tidy formatting

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Austen Stone <austenstone@github.com>
Co-authored-by: Jatin Gangani <5560074+jgangani@users.noreply.github.com>
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ankur Singh <ankusingh@nvidia.com>
Co-authored-by: yunzhoul-nv <yunzhoul@nvidia.com>
Co-authored-by: Elnifio <elnifio0519@gmail.com>
Co-authored-by: ppalanga <ppalanga@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

4 participants