ci(bencher): enforce PR thresholds and grant checks: write #11883

Merged
zkochan merged 1 commit into main from bencher-pr-thresholds on May 23, 2026

@zkochan zkochan commented May 23, 2026

Member

Summary

  • Add `--start-point-clone-thresholds` to the non-main upload arms in `benchmark.yml`, `pacquet-integrated-benchmark.yml`, and `pacquet-integrated-benchmark-comment.yml`, so PR / feature-branch records inherit the thresholds configured on main in the Bencher UI. Pair it with `--err` so the workflow fails when a sample breaches the upper boundary; without `--err`, a regression is recorded but the GitHub check stays green.
  • Add `checks: write` to all three workflows. On `push: main` (no `--ci-number`, not a `pull_request` event) Bencher falls back to creating a GitHub Check on the commit; without the permission the upload step exits 1 with `Failed to create GitHub Check`, which is what is currently happening on main.

Main-branch uploads still skip the threshold/--err flags on purpose: by the time main fails, the regression has already landed.
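
The branch-dependent flag selection described above can be sketched in shell. This is a minimal illustration, not the code from the workflows: the function name `bencher_threshold_args` is hypothetical, and passing `--start-point main` alongside `--start-point-clone-thresholds` is an assumption about how the non-main arms name their baseline.

```shell
# Hypothetical helper: prints the extra `bencher run` flags for a branch,
# one per line. The function name is illustrative, not from the workflows.
bencher_threshold_args() {
  branch="$1"
  if [ "$branch" = "main" ]; then
    # Main uploads skip the threshold flags on purpose: they only record data.
    return 0
  fi
  # Non-main uploads: use main as the start point (assumed), copy its
  # thresholds, and exit non-zero when an alert is generated.
  printf '%s\n' --start-point main --start-point-clone-thresholds --err
}

# Example: show the flags that would be added for a PR branch.
bencher_threshold_args "pr/11883"
```

For `main` the helper prints nothing, so the upload keeps its existing arguments unchanged.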

This branch was forked from main, so once the workflows run, its own benchmark results can be checked against the thresholds and compared with the main baseline.

Test plan

  • Configure Percentage thresholds in Bencher UI for main/pnpm/Latency and main/pacquet/Latency (upper boundary 0.20, min samples 10, max samples 30).
  • After merge (or via workflow_dispatch from this branch): confirm the next push: main run completes without the Failed to create GitHub Check error and shows a Check on the commit.
  • Dispatch Benchmarks and Pacquet integrated benchmark against this branch (or open a follow-up PR with an intentional perf regression) and confirm Bencher reports the upper-boundary breach and the job exits non-zero.
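
As a rough illustration of what an upper boundary of 0.20 means in the first step, the check can be sketched as below. It assumes a Percentage threshold flags a sample that exceeds the baseline mean by more than 20%; the function name is hypothetical, and Bencher derives the baseline from the branch's historical samples (here, between 10 and 30 of them) rather than from a single run, so the numbers are only a stand-in taken from the cold-cache restore figures reported in this conversation.

```shell
# Hypothetical check: succeeds (exit 0) when `value` is above
# mean * (1 + pct), i.e. when the assumed upper boundary is breached.
exceeds_upper_boundary() {
  awk -v mean="$1" -v value="$2" -v pct="$3" \
    'BEGIN { exit !(value > mean * (1 + pct)) }'
}

# 2321.88 ms baseline, 2350.28 ms sample, 20% upper boundary.
if exceeds_upper_boundary 2321.88 2350.28 0.20; then
  echo "breach"
else
  echo "within boundary"
fi
```

With a 2321.88 ms baseline the boundary sits at roughly 2786 ms, so a 2350.28 ms sample stays within it.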

Written by an agent (Claude Code, claude-opus-4-7).

Summary by CodeRabbit

  • Chores
    • Improved benchmark regression reporting: workflow now surfaces regressions as errors for better visibility during pull request reviews
    • Enhanced threshold management: pull requests automatically inherit benchmark thresholds from the main branch for consistent comparison
    • Upgraded GitHub integration: benchmark results now fully integrate with GitHub's check system for improved workflow visibility

- Add `--start-point-clone-thresholds` to the non-main upload arms
  so PR/feature branches inherit thresholds configured on main; pair
  it with `--err` so a sample over the upper boundary fails the job.
- Add `checks: write` to the three workflows that call `bencher run`.
  On main pushes (no `--ci-number`, not a PR event) Bencher falls back
  to creating a GitHub Check on the commit; without the permission it
  exits 1 with "Failed to create GitHub Check".
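
The fallback described in the second bullet can be modelled as a small decision function. This is only a sketch of the behaviour as stated in this commit message, not Bencher's actual logic; the function name and the `comment` / `check` labels are hypothetical.

```shell
# Hypothetical model of where results are reported, per the description above:
# a pull_request event or an explicit CI number leads to a PR comment;
# otherwise a GitHub Check is created, which needs `checks: write`.
report_target() {
  event="$1"
  ci_number="$2"
  if [ "$event" = "pull_request" ] || [ -n "$ci_number" ]; then
    echo "comment"
  else
    echo "check"
  fi
}

# A push to main with no CI number falls through to the Check path.
report_target "push" ""
```

Under this reading, only the `check` path depends on the new permission, which matches the failure being seen on main pushes.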

@coderabbitai

coderabbitai Bot commented May 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 03fc6139-53e7-4dbb-90b5-774d01da01c4

📥 Commits

Reviewing files that changed from the base of the PR and between 4088de0 and 42bb020.

📒 Files selected for processing (3)
  • .github/workflows/benchmark.yml
  • .github/workflows/pacquet-integrated-benchmark-comment.yml
  • .github/workflows/pacquet-integrated-benchmark.yml
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run benchmark on ubuntu-latest
  • GitHub Check: Analyze (javascript)
  • GitHub Check: Compile & Lint
🔇 Additional comments (3)
.github/workflows/benchmark.yml (1)

31-31: LGTM!

Also applies to: 127-149

.github/workflows/pacquet-integrated-benchmark-comment.yml (1)

29-29: LGTM!

Also applies to: 158-172

.github/workflows/pacquet-integrated-benchmark.yml (1)

51-51: LGTM!

Also applies to: 338-353


📝 Walkthrough

Three GitHub Actions benchmark workflows are enhanced with checks: write permission and extended Bencher CLI flags. The --start-point-clone-thresholds flag enables PR branches to inherit performance thresholds from main, while --err surfaces regressions as workflow errors. Two workflows are reformatted to use multi-line argument construction.

Changes

Benchmark workflow enhancements

| Layer / File(s) | Summary |
|---|---|
| Permission expansion for GitHub checks writing: `.github/workflows/benchmark.yml`, `.github/workflows/pacquet-integrated-benchmark-comment.yml`, `.github/workflows/pacquet-integrated-benchmark.yml` | All three workflows are granted `checks: write` permission alongside existing permissions, enabling Bencher to write GitHub Checks output for regression reporting. |
| Bencher regression detection and threshold inheritance: `.github/workflows/benchmark.yml`, `.github/workflows/pacquet-integrated-benchmark-comment.yml`, `.github/workflows/pacquet-integrated-benchmark.yml` | All three `bencher run` invocations are extended with `--start-point-clone-thresholds` (to inherit configured thresholds from main for PR branches) and `--err` (to surface regressions as errors). Two workflows are reformatted into multi-line argument arrays for clarity. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • pnpm/pnpm#11875: Modifies the same Bencher-upload sections in benchmark workflows and bencher command argument construction for branch and start-point handling.

Poem

🐰 Three workflows, one mission so clear,
Benchmarks now check with fresh-found cheer!
Thresholds inherited, regressions exposed,
Main's wisdom shared when PR branches proposed!
checks: write grants the power to report,
A rabbit's delight—performance support! 🏃‍♂️✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and specifically summarizes the main changes: adding threshold enforcement via `--start-point-clone-thresholds` and `--err`, and granting `checks: write` permission. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions

Contributor

Integrated-Benchmark Report (Linux)

Scenario: Isolated linker: fresh restore, cold cache + cold store

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| pacquet@HEAD | 2.350 ± 0.103 | 2.238 | 2.502 | 1.01 ± 0.05 |
| pacquet@main | 2.322 ± 0.065 | 2.230 | 2.413 | 1.00 |
| pnpm | 4.537 ± 0.075 | 4.437 | 4.655 | 1.95 ± 0.06 |

BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 2.3502830936600008,
      "stddev": 0.10279984077340087,
      "median": 2.3222429619600002,
      "user": 2.81672868,
      "system": 3.46245154,
      "min": 2.2375798329600003,
      "max": 2.5023931049600003,
      "times": [
        2.35498251396,
        2.5023931049600003,
        2.2895034099600005,
        2.42350979896,
        2.4890570699600003,
        2.2720729099600003,
        2.4263596299600003,
        2.26724772896,
        2.24012493696,
        2.2375798329600003
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 2.32187790036,
      "stddev": 0.06474385030324165,
      "median": 2.33550308646,
      "user": 2.73499808,
      "system": 3.4453162400000004,
      "min": 2.22973320696,
      "max": 2.41264889396,
      "times": [
        2.41264889396,
        2.23445012696,
        2.33499297096,
        2.3410116329600004,
        2.3991390689600003,
        2.2745589619600004,
        2.33601320196,
        2.22973320696,
        2.3746351739600002,
        2.28159576496
      ]
    },
    {
      "command": "pnpm",
      "mean": 4.537254470360001,
      "stddev": 0.07459892171558272,
      "median": 4.54326139346,
      "user": 7.65196818,
      "system": 4.002413839999999,
      "min": 4.4372579089599995,
      "max": 4.65468278096,
      "times": [
        4.65468278096,
        4.59426766196,
        4.51839921496,
        4.5984458759599995,
        4.4372579089599995,
        4.50407990096,
        4.568123571959999,
        4.44252524696,
        4.58952553596,
        4.46523700496
      ]
    }
  ]
}

Scenario: Isolated linker: fresh restore, hot cache + hot store

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| pacquet@HEAD | 670.8 ± 47.2 | 637.6 | 802.5 | 1.01 ± 0.08 |
| pacquet@main | 664.8 ± 20.4 | 639.0 | 697.4 | 1.00 |
| pnpm | 2435.3 ± 148.1 | 2270.4 | 2753.7 | 3.66 ± 0.25 |

BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 0.6708084764400001,
      "stddev": 0.047240236745938545,
      "median": 0.66072281024,
      "user": 0.37954557999999994,
      "system": 1.43517426,
      "min": 0.63758168024,
      "max": 0.80246104524,
      "times": [
        0.80246104524,
        0.65856663124,
        0.66187004524,
        0.66215016824,
        0.64658823624,
        0.66977004224,
        0.65957557524,
        0.66233284324,
        0.6471884972399999,
        0.63758168024
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 0.6648497021399999,
      "stddev": 0.02043169834858673,
      "median": 0.65836854374,
      "user": 0.37647668,
      "system": 1.4398314600000002,
      "min": 0.6389551072399999,
      "max": 0.69737859524,
      "times": [
        0.68593009624,
        0.6389551072399999,
        0.69737859524,
        0.6468575022399999,
        0.65901990924,
        0.65771717824,
        0.69342408324,
        0.66596846724,
        0.65360194024,
        0.64964414224
      ]
    },
    {
      "command": "pnpm",
      "mean": 2.43530926094,
      "stddev": 0.1481373263485777,
      "median": 2.41473290724,
      "user": 2.9190334799999995,
      "system": 2.19278166,
      "min": 2.2703722862399998,
      "max": 2.75374672424,
      "times": [
        2.3187324022399998,
        2.2703722862399998,
        2.28225344724,
        2.75374672424,
        2.4605422622399997,
        2.35298852124,
        2.5019534962399996,
        2.36892355224,
        2.50363245024,
        2.5399474672399998
      ]
    }
  ]
}

Scenario: Isolated linker: fresh install, cold cache + cold store

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| pacquet@HEAD | 4.981 ± 0.121 | 4.811 | 5.231 | 1.00 |
| pacquet@main | 5.052 ± 0.205 | 4.752 | 5.389 | 1.01 ± 0.05 |
| pnpm | 6.352 ± 0.152 | 6.042 | 6.617 | 1.28 ± 0.04 |

BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 4.9810691868,
      "stddev": 0.121280682407395,
      "median": 4.9621373679000005,
      "user": 6.77160982,
      "system": 3.50315648,
      "min": 4.8108650464,
      "max": 5.2308265784,
      "times": [
        4.9211465104,
        4.9691413284,
        4.8108650464,
        4.9712631984,
        4.9250876744,
        4.909152966400001,
        4.9610340454,
        5.148933829400001,
        5.2308265784,
        4.9632406904
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 5.051945107400001,
      "stddev": 0.20454059186307916,
      "median": 5.0785759509,
      "user": 6.808686019999999,
      "system": 3.5192217799999996,
      "min": 4.7519255404,
      "max": 5.3894233194000005,
      "times": [
        5.3894233194000005,
        5.1070102154,
        5.0036677544,
        5.068471183400001,
        5.3129088354,
        5.0886807184,
        4.8634214264,
        5.1211762354,
        4.7519255404,
        4.8127658454
      ]
    },
    {
      "command": "pnpm",
      "mean": 6.352407902400001,
      "stddev": 0.15205407973003585,
      "median": 6.362504315900001,
      "user": 10.49427922,
      "system": 4.390154379999999,
      "min": 6.0421095224000005,
      "max": 6.6169474554,
      "times": [
        6.3489738944,
        6.206308331400001,
        6.0421095224000005,
        6.3760347374,
        6.346234394400001,
        6.4068275374,
        6.3973043044,
        6.3148436644000006,
        6.4684951824,
        6.6169474554
      ]
    }
  ]
}

Scenario: Isolated linker: fresh install, hot cache + hot store

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| pacquet@HEAD | 4.040 ± 0.110 | 3.842 | 4.217 | 1.00 |
| pacquet@main | 4.121 ± 0.118 | 3.963 | 4.380 | 1.02 ± 0.04 |
| pnpm | 4.233 ± 0.107 | 4.154 | 4.513 | 1.05 ± 0.04 |

BENCHMARK_REPORT.json
{
  "results": [
    {
      "command": "pacquet@HEAD",
      "mean": 4.040380751980001,
      "stddev": 0.10956843588089388,
      "median": 4.03451855718,
      "user": 4.388032920000001,
      "system": 2.2139571399999998,
      "min": 3.84191133968,
      "max": 4.21692350968,
      "times": [
        4.028736975679999,
        4.21692350968,
        4.05177583068,
        3.9509969266800002,
        4.02812827968,
        4.15463901568,
        3.95580159568,
        3.84191133968,
        4.04030013868,
        4.134593907679999
      ]
    },
    {
      "command": "pacquet@main",
      "mean": 4.12069823848,
      "stddev": 0.11819397151077456,
      "median": 4.1203162961799995,
      "user": 4.48911702,
      "system": 2.22456244,
      "min": 3.96288766968,
      "max": 4.37951931868,
      "times": [
        4.08028023068,
        4.37951931868,
        4.13095970068,
        3.96288766968,
        4.170711992679999,
        4.00918251568,
        4.12580481568,
        4.208768527679999,
        4.02403983668,
        4.114827776679999
      ]
    },
    {
      "command": "pnpm",
      "mean": 4.233272924979999,
      "stddev": 0.1065494973898752,
      "median": 4.20804441368,
      "user": 5.18660462,
      "system": 2.61768664,
      "min": 4.154485870679999,
      "max": 4.51251677268,
      "times": [
        4.27431497768,
        4.26310997568,
        4.20751348868,
        4.154485870679999,
        4.16655527868,
        4.51251677268,
        4.21559589768,
        4.208575338679999,
        4.15934210768,
        4.17071954168
      ]
    }
  ]
}

@github-actions

Contributor

🐰 Bencher Report

Branch: pr/11883
Testbed: pacquet

⚠️ WARNING: No Threshold found!

Without a Threshold, no Alerts will ever be generated.

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds flag.

Click to view all benchmark results

| Benchmark | Latency (ms) | Threshold |
|---|---|---|
| isolated-linker.fresh-install.cold-cache.cold-store | 4,981.07 | ⚠️ NO THRESHOLD |
| isolated-linker.fresh-install.hot-cache.hot-store | 4,040.38 | ⚠️ NO THRESHOLD |
| isolated-linker.fresh-restore.cold-cache.cold-store | 2,350.28 | ⚠️ NO THRESHOLD |
| isolated-linker.fresh-restore.hot-cache.hot-store | 670.81 | ⚠️ NO THRESHOLD |

🐰 View full continuous benchmarking report in Bencher

@zkochan zkochan merged commit bf581bb into main May 23, 2026
16 checks passed
@zkochan zkochan deleted the bencher-pr-thresholds branch May 23, 2026 16:49