Skip to content

fix(ci): supply-chain scanner uses asymmetric two-dot diff, false-positives on stale-base PRs#23592

Closed
li0near wants to merge 1 commit into
NousResearch:mainfrom
li0near:fix/supply-chain-scanner-uses-asymmetric-diff
Closed

fix(ci): supply-chain scanner uses asymmetric two-dot diff, false-positives on stale-base PRs#23592
li0near wants to merge 1 commit into
NousResearch:mainfrom
li0near:fix/supply-chain-scanner-uses-asymmetric-diff

Conversation

@li0near

@li0near li0near commented May 11, 2026

Copy link
Copy Markdown
Contributor

Summary

The supply-chain audit workflow scans diffs with git diff "$BASE".."$HEAD"
(asymmetric two-dot) in three places. When a PR's base SHA is older than
upstream main at CI run time — which is the common case, since GitHub
re-resolves pull_request.base.sha to current upstream HEAD on every
trigger — the asymmetric diff includes every commit upstream landed
between the contributor's last push and CI. The install-hook regex then
matches files like hermes_cli/setup.py that upstream added, not the PR,
and the scan exits 1 as a CRITICAL finding.

Evidence

PR #21929 (a 2-file change to hermes_constants.py + its tests) was
rebased onto upstream d62808c373. By the time CI ran, upstream had
advanced 48 commits to a63a2b7c78. The webhook recorded BASE = a63a2b7c,
HEAD = 0f816e8a (the rebased branch tip).

Reproduced locally against those exact SHAs:

=== TWO-DOT (current) ===
files matched:    71
install-hook hits: 1
hermes_cli/setup.py        ← added by upstream, not by PR #21929

=== THREE-DOT (this PR) ===
files matched:    2
install-hook hits: 0

=== Files that PR #21929 actually changed (ground truth) ===
hermes_constants.py
tests/test_hermes_constants.py

The same shape applies to PR #21747 and #21428 (both currently showing the
false-positive on their CI). Any external contributor whose PR sits open
long enough for upstream to advance will hit it.

Root cause

git diff A..B (two dots) shows changes between the trees at A and B —
including any commits A has that B doesn't. git diff A...B (three dots)
shows changes from the merge-base of A and B to B — i.e. only what the
PR's branch path actually introduced. For a security scanner whose stated
job is to flag what the contributor added, three-dot is the correct
semantic. It's also what the GitHub PR UI uses for its diff view.

Fix

Single character on each of the three git diff invocations: ......

-DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
+DIFF=$(git diff "$BASE"..."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
...
-PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
+PTH_FILES=$(git diff --name-only "$BASE"..."$HEAD" | grep '\.pth$' || true)
...
-SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '...' || true)
+SETUP_HITS=$(git diff --name-only "$BASE"..."$HEAD" | grep -E '...' || true)

Added an inline comment explaining the choice so the next reader doesn't
"simplify" it back.

Why three-dot is strictly safer for this scanner

  • Adding a .pth / setup.py / base64+exec payload in this PR
    still on the branch path from merge-base to HEAD. Still caught.
  • Force-pushing malicious code post-open — head SHA changes, CI
    re-runs, merge-base recomputed, new commits caught.
  • Merging upstream into the branch — merge commits are on the path
    from merge-base to HEAD. Still caught.

There is no scenario where two-dot catches something three-dot misses
under this threat model. Two-dot's only "extra" coverage is upstream
commits the PR doesn't include — the noise this PR removes.

Workflow consistency

The workflow's own header comment notes that low-signal heuristics were
deliberately removed because "they fired on nearly every PR and trained
reviewers to ignore the scanner."
The current two-dot bug actively works
against that goal — every external contributor whose PR isn't merged
within minutes hits it.

Test plan

Workflow shell logic isn't easily unit-testable in isolation. Manual
reproduction:

  1. Pick any open PR whose base SHA is behind upstream HEAD by ≥1 commit
    that touches a setup.py / .pth / install-hook file.
  2. Run both diff forms locally:
    git diff BASE..HEAD --name-only | grep -E '...(setup\.py|...)$'
    git diff BASE...HEAD --name-only | grep -E '...(setup\.py|...)$'
    
  3. Two-dot matches upstream's setup.py; three-dot doesn't.

Verified for PR #21929 SHAs (head 0f816e8a, base re-resolved to
a63a2b7c at CI trigger time on 2026-05-10).

Platforms tested

N/A — workflow runs on ubuntu-latest only.

Related

Fixes the false-positive currently red on PRs #21428, #21747, #21929 (and
likely most other open external PRs with stale bases).

The content scan and the two file-name regex checks all use
`git diff "$BASE".."$HEAD"` (two dots, asymmetric). When a PR's
base SHA falls behind upstream main between push and CI run,
GitHub re-resolves `pull_request.base.sha` to current upstream
HEAD — the asymmetric diff then includes every commit upstream
landed in between, and the install-hook regex matches files like
`hermes_cli/setup.py` that upstream added, not the PR.

Switch all three uses to three-dot (merge-base..HEAD) so we only
see what THIS PR contributed. This is the standard "PR-only diff"
form and what GitHub itself uses for the diff view in the PR UI.

Verified against PR NousResearch#21929 (head 0f816e8, base re-resolved to
a63a2b7 at CI trigger time):
  two-dot:   71 files diffed, 1 install-hook match (false positive)
  three-dot:  2 files diffed, 0 install-hook matches (correct)

Threat-model coverage is preserved:
- adding a payload in this PR: still on path from merge-base to
  HEAD, still caught
- force-pushing malicious code: head SHA changes, CI re-runs,
  merge-base recomputed, caught
- merging upstream into the branch: merge commits on the branch
  path, caught
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists duplicate This issue or pull request already exists labels May 11, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #13411 — same fix (two-dot → three-dot diff in supply-chain-audit.yml to prevent false positives from upstream drift).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

duplicate This issue or pull request already exists P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants