
Conversation

@cquil11
Collaborator

@cquil11 cquil11 commented Dec 2, 2025

Design Doc: Changelog Triggered Only Runs

Solves: #298 (comment)

Test full sweep: https://github.com/InferenceMAX/InferenceMAX/actions/runs/20081821674
Test with changelog: https://github.com/InferenceMAX/InferenceMAX/actions/runs/20218553330

Introduction and Motivation

InferenceMAX prides itself on running regularly to keep up with the fast pace of software changes. However, software release builds and release-candidate builds don't typically happen on a _nightly_ basis, so it is unnecessary to perform a full sweep every single night. Instead, it is sufficient to run only the configs (from the master configs) whose performance is affected by any particular change.

Implementation

After the major refactors (#251 and #145), all run information is defined in source-of-truth master configuration files. These files are parsed by the utils/matrix_logic/generate_sweep_configs.py script, whose JSON output is then passed to a benchmark-*-tmpl.yml workflow. This sets up the proposal nicely, as each configuration has a “key” string associated with it. We propose a high-level “performance changelog” at the root of the repo (perf-changelog.yaml) that developers will use to specify the config keys whose performance is affected by a particular PR.

The main change here can be summarized as follows: instead of running each config every night, configs will only be run on an “as needed” basis (i.e., when a change is made that affects their performance). This will more efficiently use compute resources while leading to a faster feedback cycle.

Example Developer Workflow

As an example, suppose a developer updates some code that will affect the performance of the dsr1-fp4-b200-sglang config. Perhaps they change the version of the SGLang serving image. The developer should then append the following to the end of perf-changelog.yaml:
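For illustration, an entry might look like this. The field names below are assumptions, not the actual schema defined by perf-changelog.yaml; only the config key dsr1-fp4-b200-sglang comes from the example above.

```yaml
# Hypothetical changelog entry -- field names are illustrative.
- pr: <PR number>
  configs:
    - dsr1-fp4-b200-sglang
  reason: Bump SGLang serving image version
```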

Then, upon merge to main, the run-sweep.yml workflow is triggered (if and only if a change to perf-changelog.yaml is detected). This workflow then runs a script called process-changelog.py, which takes the following arguments:

  • Path to the perf changelog file
  • A base Git ref (e.g., main/HEAD~1)
  • A head Git ref (e.g., main/HEAD)

The script then computes the diff between the base and head versions of the perf changelog. If there is any negative diff, the script fails and so does the workflow (there should never be a negative diff, since the changelog is meant to be an append-only record of all perf changes, growing from the bottom). The positive diff is pre-processed to remove the line-diff indicators and then loaded into a list of dicts representing the changes. A function then retrieves the list of all config keys to run and passes them to generate_sweep_configs.py with the new subcommand test-config. This subcommand takes a list of config keys (from the master config YAMLs) and retrieves the cases to run for each config. Once all results are retrieved, we organize them by sequence length and single/multi node, dump them to stdout, and direct them to an output of the invoking job in the workflow.
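The append-only diff check described above can be sketched in a few lines. This is a minimal sketch using the standard library's difflib, not the actual process-changelog.py; it only shows the "fail on negative diff, collect added lines" logic.

```python
import difflib

def added_changelog_lines(base_text: str, head_text: str) -> list[str]:
    """Return lines added to the changelog; raise if any line was removed.

    The changelog is append-only, so any '-' line in the diff (other than
    the unified-diff file headers) is an error.
    """
    diff = difflib.unified_diff(
        base_text.splitlines(), head_text.splitlines(), lineterm=""
    )
    added = []
    for line in diff:
        if line.startswith("+++") or line.startswith("---"):
            continue  # unified-diff file headers, not content
        if line.startswith("-"):
            raise ValueError(f"negative diff in changelog: {line!r}")
        if line.startswith("+"):
            added.append(line[1:])  # strip the '+' diff indicator
    return added
```

The added lines would then be parsed as YAML into the list of change dicts; that step is omitted here.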

Workflow Mechanisms

The run-sweep.yml workflow will be the main workflow used for running “official” sweeps. As described above, it runs on any merge to main that includes an _addition_ to perf-changelog.yaml. The workflow begins with a setup job that retrieves which configs to run based on the diff:
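A sketch of what that setup job could look like. The job name, script path, checkout depth, and output wiring are assumptions, not the repo's actual workflow:

```yaml
# Illustrative only -- the real run-sweep.yml may differ.
jobs:
  setup:
    runs-on: ubuntu-latest
    outputs:
      configs: ${{ steps.changelog.outputs.configs }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2   # need the previous commit to diff the changelog
      - id: changelog
        run: |
          python process-changelog.py perf-changelog.yaml HEAD~1 HEAD \
            >> "$GITHUB_OUTPUT"
```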

The output of process-changelog.py is split by single/multi node and sequence length. Recall that this is due to the 256-job matrix generation limit that GitHub Actions imposes; if this limit didn't exist, we would split only by single and multi node. Note that this implementation is not very scalable if we add more sequence lengths and/or more models: each new sequence length requires hard-coding another sweep-single/multi-node-XkYk job, and each new model increases the number of configs in every scenario above, eventually forcing a further split by model. We hope to have a more permanent solution in the near future (likely involving a sub-workflow). However, this is not a major issue, since we don't anticipate needing to run “full” sweeps often, if at all.

We also note that the GitHub Actions engineers are close to implementing lazy loading for workflows with many jobs, so the 10s loading timeout will hopefully no longer be an issue.

Testing

To allow sweeps to be tested and validated before the “official” sweep is merged and run, developers can enable sweeps on their pull request by ensuring three things:

  1. The PR is marked as “ready for review”
  2. The PR includes valid changes to perf-changelog.yaml
  3. The PR includes the label sweep-enabled

This will trigger the run-sweep.yml workflow on the PR branch.

Ensuing Frontend Changes

These changes have many frontend implications. First, we will describe the high-level considerations (what the UI displays), then the low-level considerations (how data is scraped from the artifacts).

In order to not delay main InferenceMAX development, the plan is to “freeze” the InferenceMAX UI on the December 9th run and implement the frontend changes over the course of the following week.

High-Level Considerations

Currently, the InferenceMAX frontend displays runs for each date, with an option to view historical runs via a date selector. This method assumes that a full sweep is run each day. However, with this proposal, we will run configurations relatively infrequently (only when necessary), so it no longer makes sense to display runs for every date in the dropdown selector. We propose the following: in the “Select a run date” dropdown calendar, make only the dates on which a run was triggered clickable. When such a date is selected, there are two scenarios:

  1. For the configs that were _unchanged_ by the triggered run (i.e., the configs that were not specified in the perf-changelog.yaml change associated with this run), carry forward the data points from that config’s most recent run. Note that we must make sure all runs are completely successful (i.e., all jobs completed successfully) so as not to carry forward incomplete run data.
  2. For the configs that were _updated_ by the triggered run, display the new data points associated with the run.
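The carry-forward rule above can be sketched as follows. The function and field names are hypothetical, not the frontend's actual API; it assumes per-config runs keyed by ISO date (which sort lexically).

```python
def points_for_date(runs_by_config: dict[str, dict[str, list]], date: str) -> dict:
    """For each config, return the newest run on or before `date`,
    flagged as 'new' only if it landed exactly on that date
    (otherwise it is carried forward from an earlier run)."""
    view = {}
    for config, runs in runs_by_config.items():
        eligible = [d for d in runs if d <= date]  # ISO dates compare correctly as strings
        if not eligible:
            continue  # config has no data yet on this date
        latest = max(eligible)
        view[config] = {"data": runs[latest], "new": latest == date}
    return view
```

A carried-forward entry (`"new": False`) would then be rendered differently from fresh data, per the differentiation discussed below.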

With this strategy, we should differentiate the data points/frontier line so that it is obvious to the user, on any given date, which data is _new_ and which is carried forward. Implementation details will be left up to Akira.

We should also get rid of reliability data (at least in its current form). If we want to keep it, we can simply display “success rate per GPU SKU” filtered over the “last X runs,” or something similar. Even so, reliability largely loses its meaning now that we aren't running every night: when running via a trigger, we will want to test rigorously _before_ merging so that the “official” run succeeds.

Low-Level Considerations

Recall that we currently run three separate workflows each night (separated by sequence length: 1k1k, 1k8k, and 8k1k). With this proposal, we will run _all_ desired configs in a _single_ run. The resulting artifact will be called results_all (as opposed to being split by model and sequence length, e.g., results_dsr1_1k1k). This shouldn't be too difficult to implement; in fact, it should make things easier, since we won't have to aggregate all results into a single file.

Further, we need to ensure backwards compatibility such that the old workflows can still be processed correctly.

Additional Considerations

We plan on back-filling perf-changelog.yaml to reflect perf changes since the launch of InferenceMAX v1. We will not run CI on these back-filled changes – they exist only so that the changes are recorded. To accomplish this, we add a mechanism to the run-sweep.yml workflow to skip running the actual CI pipeline upon merge to main, via the following condition on the setup job:
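A condition along these lines would do it. This is a sketch; the actual expression in run-sweep.yml may differ.

```yaml
jobs:
  setup:
    # Skip the sweep when the merge commit message opts out.
    if: ${{ !contains(github.event.head_commit.message, '[skip-sweep]') }}
```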

Now, if the merge commit message contains the “[skip-sweep]” string, CI will not run. This allows developers to “force” alterations to perf-changelog.yaml without actually running CI.

@cquil11 cquil11 force-pushed the diff-only-runs branch 2 times, most recently from 0051fe8 to 184b4ae Compare December 4, 2025 17:31
@cquil11 cquil11 changed the base branch from main to multinode-integration December 4, 2025 17:31
Base automatically changed from multinode-integration to main December 5, 2025 20:51
@cquil11 cquil11 force-pushed the diff-only-runs branch 2 times, most recently from 22ddb79 to 9acd1e7 Compare December 5, 2025 21:07
@functionstackx functionstackx moved this to In Progress in InferenceMAX Board Dec 7, 2025
@cquil11 cquil11 marked this pull request as ready for review December 9, 2025 23:20
@cquil11 cquil11 requested a review from a team as a code owner December 9, 2025 23:20
@github-actions

github-actions bot commented Dec 9, 2025

📊 Line Count Report

File: utils/matrix_logic/generate_sweep_configs.py

Total Lines: 682

Base Lines: 570

Change: +112 lines 📈

@cquil11 cquil11 temporarily deployed to fork-pr-validation December 9, 2025 23:23 — with GitHub Actions Inactive
@cquil11 cquil11 changed the title [WIP]: Diff only runs feat: performance changelog triggered runs (as opposed to nightly) Dec 10, 2025
new single workflow that runs on merge to main, new perg-changelog.yaml to track performance changes, new logic to parse changelog, removed cron job in full sweep schedulers
@functionstackx
Contributor

@cquil11 do the result filenames change, since it is now one mega run instead of 3 separate jobs? The frontend regex is based on filenames.

Also, shouldn't all 3 of these files be deleted since they're not needed anymore?

@functionstackx
Contributor

@cquil11 can you show a run where process_result.py & the GitHub summary & utils/collect_results.py all show similar results before and after?

Please include a link to the test run & screenshots after you unzip the file.

@cquil11
Collaborator Author

cquil11 commented Dec 10, 2025

@functionstackx yes, this will have frontend implications, which I briefly mentioned in the PR description.
I was planning on keeping these for a bit to still allow developers to re-run failed workflows in the legacy way. I can remove them if you think that's best.

can you show a run where process_result.py & the GitHub summary & utils/collect_results.py all show similar results before and after?
please include a link to the test run & screenshots after you unzip the file

yes, one is finishing up now:
here is the run: https://github.com/InferenceMAX/InferenceMAX/actions/runs/20081821674
here is the aggregated results_all with all data points: https://github.com/InferenceMAX/InferenceMAX/actions/runs/20081821674/artifacts/4830384278

Contributor

@functionstackx functionstackx left a comment


left some comments, mostly around validation of the correctness of this PR

Collaborator

@chunfangamd chunfangamd left a comment


Please refer to the messages in the code

@cquil11 cquil11 temporarily deployed to fork-pr-validation December 12, 2025 14:42 — with GitHub Actions Inactive
@cquil11 cquil11 merged commit 9c0fcc9 into main Dec 15, 2025
11 checks passed
@cquil11 cquil11 deleted the diff-only-runs branch December 15, 2025 15:18
@github-project-automation github-project-automation bot moved this from In Progress to Done in InferenceMAX Board Dec 15, 2025
@cquil11 cquil11 restored the diff-only-runs branch December 16, 2025 20:35
Oseltamivir pushed a commit that referenced this pull request Dec 17, 2025
) [skip-sweep]

* add logic for event driven runs

new single workflow that runs on merge to main, new perg-changelog.yaml to track performance changes, new logic to parse changelog, removed cron job in full sweep schedulers

* testing pt 1

* raise error if yaml diff in perf changelog is not valid

* remove unused imports in process_changelog.py

* config data key fix

* raise error if test-config subprocess fails to run

* backfill changelog

* backfill changelog pt 2

* backfill changelog pt 3

* backfill changelog pt 4

* backfill changelog pt 5

* backfill changelog pt 6

* add always() condition to upload changelog metadata

* backfill changelog pt 7 (test)

* backfill changelog pt 8 (revert test)

* backfill changelog pt 9

* backfill changelog pt 11

* change if condition for jobs in run sweep workflow

* debugging run sweep workflow

* debugging run sweep workflow pt 2

* debugging run sweep workflow pt 3 (revert)

* debugging run sweep workflow pt 4

* debugging run sweep workflow pt 5

* debugging run sweep workflow pt 6

* debugging run sweep workflow pt 7

* add always() condition to upload changelog metadata (add back, this got removed)

* add bmk prefix to results

* backfill changelog official

* for concurrency group, use more unique sha
cquil11 added a commit that referenced this pull request Dec 17, 2025
* Initial commit, for #304

* Allow testing on own PR

* condense workflow

* Rename Workflow

* Use environments

* Changed environment location

* Stricter activation

* Test replies

* Test replies

* Use token for comment perm

* Forgot validation

* feat: performance changelog triggered runs (as opposed to nightly) (#267) [skip-sweep]

* chore(deps): bump the github-actions group across 1 directory with 3 updates (#331)

Bumps the github-actions group with 3 updates in the / directory: [actions/checkout](https://github.com/actions/checkout), [actions/upload-artifact](https://github.com/actions/upload-artifact) and [actions/download-artifact](https://github.com/actions/download-artifact).


Updates `actions/checkout` from 6.0.0 to 6.0.1
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v6...8e8c483)

Updates `actions/upload-artifact` from 5.0.0 to 6.0.0
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@330a01c...b7c566a)

Updates `actions/download-artifact` from 6.0.0 to 7.0.0
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@018cc2c...37930b1)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
- dependency-name: actions/upload-artifact
  dependency-version: 6.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: actions/download-artifact
  dependency-version: 7.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: add final newline to original perf-changelog.yaml so that there wont be erroneous negative diff [skip-sweep] (#333)

* Update MI355x Deepseek-R1 FP4  SGLang Image to v0.5.6.post1 (#330)

* Update amd-master.yaml

* Update perf-changelog.yaml

* Update dsr1_fp4_mi355x_docker.sh

* Update dsr1_fp4_mi355x_docker.sh

---------

Co-authored-by: Cameron Quilici <cjquilici@gmail.com>

* TOCTOU

* Test new env

* Ready for merge

* Add benchmark script for GPTOSS FP4 B200 TRT-LLM (#256)

* Add benchmark script for GPTOSS FP4 B200 TRT-LLM

* make changes to perf changelog

---------

Co-authored-by: Cameron Quilici <cjquilici@gmail.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Cameron Quilici <cjquilici@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ppalanga <ppalanga@amd.com>
Co-authored-by: Ankur Singh <ankusingh@nvidia.com>
@functionstackx functionstackx deleted the diff-only-runs branch January 11, 2026 19:50
