Add benchmark script for GPTOSS FP4 B200 TRT-LLM #256
cquil11
left a comment
lgtm
7c86d28 to 482ad30
Reminder:
So for this PR, you will add something like the following entry to the bottom of

    - config-keys:
        - gptoss-fp4-b200-trt
      description: |
        - Add benchmark script for GPTOSS FP4 B200 TRT-LLM
      PR: https://github.com/InferenceMAX/InferenceMAX/pull/256

Then add the
cquil11
left a comment
see comment about perf-changelog.yaml
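The entry shape from the reminder can be sanity-checked before pushing. The following is a minimal sketch, not the repository's own validation: the field names (`config-keys`, `description`, `PR`) are taken from the reminder, while the function name `validate_changelog_entry`, the rule that all three fields are required, and the URL check are assumptions made for illustration. It operates on an already-parsed entry (a plain dict), so it needs no YAML library.

```python
from typing import Any

# Field names come from the reminder; treating all three as required is an assumption.
REQUIRED_KEYS = ("config-keys", "description", "PR")


def validate_changelog_entry(entry: Any) -> list[str]:
    """Return a list of problems; an empty list means the entry looks well-formed."""
    if not isinstance(entry, dict):
        return ["entry must be a mapping"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]

    # config-keys, when present, should be a non-empty list of non-empty strings.
    keys = entry.get("config-keys")
    if keys is not None:
        valid = (
            isinstance(keys, list)
            and len(keys) > 0
            and all(isinstance(k, str) and k for k in keys)
        )
        if not valid:
            problems.append("config-keys must be a non-empty list of strings")

    # Hypothetical rule: the PR field should be an https link.
    pr = entry.get("PR")
    if pr is not None and not (isinstance(pr, str) and pr.startswith("https://")):
        problems.append("PR must be an https URL")

    return problems


# The entry from the reminder, written as the dict a YAML parser would produce.
example = {
    "config-keys": ["gptoss-fp4-b200-trt"],
    "description": "- Add benchmark script for GPTOSS FP4 B200 TRT-LLM\n",
    "PR": "https://github.com/InferenceMAX/InferenceMAX/pull/256",
}

if __name__ == "__main__":
    print(validate_changelog_entry(example))
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue with an entry in a single pass.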
cquil11
left a comment
I added perf changelog so lgtm
@ankursingh-nv where are we on this? I added the perf changelog and kicked off test run here
* Add benchmark script for GPTOSS FP4 B200 TRT-LLM
* make changes to perf changelog

---------

Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
* Initial commit, for #304
* Allow testing on own PR
* condense workflow
* Rename Workflow
* Use environments
* Changed environment location
* Stricter activation
* Test replies
* Test replies
* Use token for comment perm
* Forgot validation
* feat: performance changelog triggered runs (as opposed to nightly) (#267) [skip-sweep]
  * add logic for event driven runs new single workflow that runs on merge to main, new perg-changelog.yaml to track performance changes, new logic to parse changelog, removed cron job in full sweep schedulers
  * testing pt 1
  * raise error if yaml diff in perf changelog is not valid
  * remove unused imports in process_changelog.py
  * config data key fix
  * raise error if test-config subprocess fails to run
  * backfill changelog
  * backfill changelog pt 2
  * backfill changelog pt 3
  * backfill changelog pt 4
  * backfill changelog pt 5
  * backfill changelog pt 6
  * add always() condition to upload changelog metadata
  * backfill changelog pt 7 (test)
  * backfill changelog pt 8 (revert test)
  * backfill changelog pt 9
  * backfill changelog pt 11
  * change if condition for jobs in run sweep workflow
  * debugging run sweep workflow
  * debugging run sweep workflow pt 2
  * debugging run sweep workflow pt 3 (revert)
  * debugging run sweep workflow pt 4
  * debugging run sweep workflow pt 5
  * debugging run sweep workflow pt 6
  * debugging run sweep workflow pt 7
  * add always() condition to upload changelog metadata (add back, this got removed)
  * add bmk prefix to results
  * backfill changelog official
  * for concurrency group, use more unique sha
* chore(deps): bump the github-actions group across 1 directory with 3 updates (#331)

  Bumps the github-actions group with 3 updates in the / directory: [actions/checkout](https://github.com/actions/checkout), [actions/upload-artifact](https://github.com/actions/upload-artifact) and [actions/download-artifact](https://github.com/actions/download-artifact).

  Updates `actions/checkout` from 6.0.0 to 6.0.1
  - [Release notes](https://github.com/actions/checkout/releases)
  - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
  - [Commits](actions/checkout@v6...8e8c483)

  Updates `actions/upload-artifact` from 5.0.0 to 6.0.0
  - [Release notes](https://github.com/actions/upload-artifact/releases)
  - [Commits](actions/upload-artifact@330a01c...b7c566a)

  Updates `actions/download-artifact` from 6.0.0 to 7.0.0
  - [Release notes](https://github.com/actions/download-artifact/releases)
  - [Commits](actions/download-artifact@018cc2c...37930b1)

  ---
  updated-dependencies:
  - dependency-name: actions/checkout
    dependency-version: 6.0.1
    dependency-type: direct:production
    update-type: version-update:semver-patch
    dependency-group: github-actions
  - dependency-name: actions/upload-artifact
    dependency-version: 6.0.0
    dependency-type: direct:production
    update-type: version-update:semver-major
    dependency-group: github-actions
  - dependency-name: actions/download-artifact
    dependency-version: 7.0.0
    dependency-type: direct:production
    update-type: version-update:semver-major
    dependency-group: github-actions
  ...

  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix: add final newline to original perf-changelog.yaml so that there wont be erroneous negative diff [skip-sweep] (#333)
* Update MI355x Deepseek-R1 FP4 SGLang Image to v0.5.6.post1 (#330)
  * Update amd-master.yaml
  * Update perf-changelog.yaml
  * Update dsr1_fp4_mi355x_docker.sh
  * Update dsr1_fp4_mi355x_docker.sh

  ---------

  Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
* TOCTOU
* Test new env
* Ready for merge
* Add benchmark script for GPTOSS FP4 B200 TRT-LLM (#256)
  * Add benchmark script for GPTOSS FP4 B200 TRT-LLM
  * make changes to perf changelog

  ---------

  Co-authored-by: Cameron Quilici <cjquilici@gmail.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ppalanga <ppalanga@amd.com>
Co-authored-by: Ankur Singh <ankusingh@nvidia.com>
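The "final newline" fix in the commit list above (#333) is easier to follow with a small demonstration. This is a sketch under the assumption that the changelog diff is line-based; it uses Python's standard `difflib` rather than whatever the repository's `process_changelog.py` actually does, and the helper name `removed_lines` is made up for illustration. When the old file lacks a trailing newline, appending an entry changes the old last line (it gains a newline), so a line-based diff reports it as removed even though nothing was deleted.

```python
import difflib


def removed_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines of old_text that a line-based diff reports as removed."""
    old = old_text.splitlines(keepends=True)
    new = new_text.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)

    removed: list[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean old[i1:i2] no longer appears as-is.
        if tag in ("delete", "replace"):
            removed.extend(old[i1:i2])
    return removed


appended = "- config-keys:\n    - new-entry\n"

# Old file WITHOUT a final newline: the last line shows up as removed.
without_newline = "- config-keys:\n    - old-entry"
# Old file WITH a final newline: appending is a pure addition.
with_newline = without_newline + "\n"

if __name__ == "__main__":
    print(removed_lines(without_newline, without_newline + "\n" + appended))
    print(removed_lines(with_newline, with_newline + appended))
```

Any check that rejects changelog diffs containing removed lines would therefore flag the first case, which is consistent with the motivation stated in the commit title.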
Addressed all the requested changes in #245
Successful Run: https://github.com/InferenceMAX/InferenceMAX/actions/runs/19715912675
cc @cquil11 @kedarpotdar-nv