ci(supply-chain): use merge-base diff (A...B) so upstream drift doesn't false-positive#13411
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the supply-chain audit workflow’s diff computation to use merge-base (A...B) semantics so fork PR scans only evaluate changes introduced by the PR branch, avoiding false positives caused by upstream drift on main.
Changes:
- Switch
git diff "$BASE".."$HEAD"togit diff "$BASE"..."$HEAD"for the main diff used by pattern scanners. - Switch
.pthand install-hook filename detection to also use the merge-base diff range. - Preserve existing detection regexes and scan/fail behavior (critical-only).
Comments suppressed due to low confidence (2)
.github/workflows/supply-chain-audit.yml:60
git diff --name-onlywill list files that were deleted as well as added/modified. Currently this means a PR that removes a.pthfile would still trigger the “added or modified” finding. Consider either filtering the diff to additions/modifications only (e.g., via a diff-filter) or adjusting the finding text to explicitly include deletions.
PTH_FILES=$(git diff --name-only "$BASE"..."$HEAD" | grep '\.pth$' || true)
if [ -n "$PTH_FILES" ]; then
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: .pth file added or modified
Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required.
.github/workflows/supply-chain-audit.yml:103
- Same as the
.pthcheck:git diff --name-onlywill include deleted files. If a PR deletessetup.py/setup.cfg/sitecustomize.py/etc., this will still trigger the “added or modified” critical finding. Either filter out deletions or update the finding wording so it matches what’s being detected.
SETUP_HITS=$(git diff --name-only "$BASE"..."$HEAD" | grep -E '(^|/)(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
if [ -n "$SETUP_HITS" ]; then
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: Install-hook file added or modified
These files can execute code during package installation or interpreter startup.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thanks for the review @copilot-pull-request-reviewer — the main overview verdict is clean, which is the important signal. For the low-confidence / suppressed comment about Happy to file a follow-up if maintainers want stricter diff-filter semantics. |
66a0bb8 to
8de0c68
Compare
8de0c68 to
f49de84
Compare
|
Rebased onto current Fresh evidence this still bites: PR #24582 just got its The match is upstream's edit to |
…'t false-positive GitHub re-resolves `pull_request.base.sha` to the current upstream HEAD at workflow-trigger time, not at branch-off time. The supply-chain scanner's `git diff "$BASE".."$HEAD"` (two-dot) therefore includes every commit upstream merged after the PR contributor's last push. Concrete failure mode the scanner is currently hitting: if upstream modifies `hermes_cli/setup.py` (which happens often — ~20 commits in the last 14 days alone), every open PR re-scanned after that point will be flagged with `### 🚨 CRITICAL: Install-hook file added or modified`, even though the PR itself doesn't touch `setup.py`. The "Fail on critical findings" step then blocks the PR until the next scanner run. Switching to three-dot (`A...B`, which git rewrites to `git diff $(merge-base A B)..B`) restricts the diff to changes the PR branch actually introduced. The two file-list scans (`.pth` and install-hook) and the line-level DIFF all get the same fix, so B64_EXEC_HITS / PROC_HITS see consistent input. Evidence on a live PR (NousResearch#24582): 2-dot: hermes_cli/setup.py (false positive — touched by upstream) 3-dot: (empty) (correct — PR doesn't touch any install-hook) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f49de84 to
2a66dab
Compare
What does this PR do?
The supply-chain scan has been failing on every fork PR since
19db7fa3(Apr 19) — not because of real attack patterns, but because the diff computation includes all of main's upstream drift since the PR branch was forked.BASE..HEAD(two-dot) shows the full diff between the two trees in both directions. When main moves forward between when the PR was opened and when CI runs, files changed upstream appear in the scan as though the PR had touched them. Concrete measurement on one of my open fork PRs: two-dot returns 551 files; three-dot returns 2 — the PR's actual changes. The install-hook regex then matcheshermes_cli/setup.py(modified upstream) and the workflow fails.Why this only just surfaced: before
19db7fa3the fail condition wascritical == 'true'. WARNING-tier checks bumpedfound=truebut notcritical=true, so main-drift false positives surfaced only as a PR comment.19db7fa3simplified the fail tofound == 'true', unmasking the pre-existingBASE..HEADbug.The fix: three-character edit — change
"$BASE".."$HEAD"to"$BASE"..."$HEAD"in 3 places.A...B=diff(merge-base(A, B), B), which matches what GitHub's "Files changed" tab shows.Related Issue
Fixes #13411 / surfaces from the failing supply-chain scan on every open fork PR.
Type of Change
Changes Made
.github/workflows/supply-chain-audit.yml— changegit diff "$BASE".."$HEAD"to"$BASE"..."$HEAD"in theDIFF=,PTH_FILES=, andSETUP_HITS=lines. The other two scan checks (B64_EXEC_HITS,PROC_HITS) build on the already-corrected$DIFFvariable so they inherit the fix without their own edit. Detection regexes unchanged. Warning-vs-critical split unchanged. Fork-PR comment fallback unchanged. BASE/HEAD env derivation unchanged.How to Test
supply-chain-audit.ymlon any currently-red fork PR — scan now reports zero matches (e.g. fix(mcp): auto-reconnect + retry once when the transport session expires (#13383) #13406, fix(telegram): log document/video send failures instead of printing to stdout (#13356) #13387, test(interrupt-propagation): seed provider attr on bare agent to match __init__ #12908, fix(slack): key thread-context cache by team_id to prevent cross-workspace leak (#12421) #12889 all match nothing underA...B)."$BASE"..."$HEAD"on a branch with synthetic.pthaddition — matches as expected (no detection regression)..pthfile: fail → fail (correct)setup.py: fail → fail (correct)base64.b64decode() + exec(): fail → fail (correct)hermes_cli/setup.py: FAIL (false positive) → passChecklist
Code
fix(scope):,feat(scope):, etc.)git diffdocumented semanticsDocumentation & Housekeeping
docs/, docstrings) — N/Acli-config.yaml.exampleif I added/changed config keys — N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — N/A, behavior of the workflow narrows false-positives onlyubuntu-latestRelated / Positioning
git diff A..Bvsgit diff A...B— https://git-scm.com/docs/gitrevisions#_dotted_range_notationsA...Bdoes not skip attack-shaped commits introduced on the PR branch; it only excludes main's own commits, which are main's responsibility. Behaviour is identical toA..Bwhen the PR branch is up-to-date with main (merge-base equals base).