ci: advisory lockfile supply-chain audit (no install-script changes) by danielhanchen · Pull Request #5604 · unslothai/unsloth

danielhanchen · 2026-05-19T08:13:23Z

Summary

Adds a fast, focused supply-chain audit of every checked-in npm and cargo lockfile, plus a new GitHub Actions workflow that runs it on every PR touching one. Default mode is advisory: only public indicator-of-compromise strings, versions on the public known-malicious list, and structurally broken lockfiles fail the build. Structural anomalies (missing integrity hashes, non-default registry, etc.) surface as ::warning:: annotations without gating merges.

This is the minimum-blast-radius alternative to PR #5479. It keeps studio/setup.sh, studio/setup.ps1, build.sh, and pyproject.toml untouched so the existing install path keeps working unchanged. The audit only READS lockfiles; it never executes anything in them and never calls the network.

What is in this PR

scripts/lockfile_supply_chain_audit.py (843 lines)
- npm package-lock.json audit: registry origin, integrity presence, known IOC strings, known-malicious version pins (BLOCKED_NPM_VERSIONS).
- cargo Cargo.lock audit: registry source, checksum presence, known cargo IOC strings.
- --strict flag promotes every finding (e.g. missing integrity) to blocking; default mode only blocks on the high-signal kinds.
- unreadable-lockfile finding kind so a permissions error surfaces cleanly instead of crashing CI with a raw traceback.
- UNSLOTH_LOCKFILE_AUDIT_SKIP honours a justification value (>=5 chars, not 1/true/yes); a trivially-set value is rejected with a loud warning and the audit runs normally.
.github/workflows/lockfile-audit.yml (~30 second job)
- Triggers on PR / push paths touching any of the four lockfiles, the audit script, or this workflow.
- Daily schedule at 05:37 UTC so newly-published IOCs surface without waiting for a PR.
- concurrency.cancel-in-progress so stacked pushes do not pile up runners.
studio/package.json + studio/package-lock.json
- 10-line manifest pinning @tauri-apps/cli@2.10.1 for the desktop release path; lockfile holder so npm can resolve a pinned Tauri CLI without changing the install scripts.
studio/backend/core/data_recipe/oxc-validator/package-lock.json
- Matches the existing package.json (oxc-parser@^0.123.0, oxlint@^1.51.0), so the audit covers the data-recipe validator's runtime.
.gitignore
- Per-path opt-in (!studio/...package-lock.json) for the three Studio surfaces while keeping the global ignore for stray lockfiles.

What is intentionally NOT in this PR

No changes to studio/setup.sh, studio/setup.ps1, or build.sh.
No bun lockfiles (existing bun-based install behaviour is unchanged; this PR adds zero new constraints to the bun side).
No changes to pyproject.toml.
No changes to existing workflows; security-audit.yml already invokes this script and will see the new defaults transparently.

Test plan

python3 scripts/lockfile_supply_chain_audit.py -> exit 0, scans 3 npm + 1 cargo lockfile.
Negative: missing lockfile -> rc=1 (blocking).
Negative: unreadable lockfile (chmod 000) -> rc=1 (blocking), no traceback.
Negative: lockfile containing a known IOC string (tanstack_runner.js) -> rc=1 (blocking).
Negative: lockfile pinning a known-malicious version (@tanstack/history@1.161.9) -> rc=1 (blocking).
Advisory: lockfile with a missing-integrity entry -> rc=0 in default mode (printed as ::warning::); rc=1 under --strict.
YAML parse of the new workflow file.
CI Lockfile supply-chain audit job is green on this PR.

Why this design

PR #5479 made --frozen-lockfile mandatory and changed install scripts on three OSes; that risks breaking real installs if any of the three lockfiles drifts. This PR commits the lockfiles and the auditor without changing how anything is installed. The audit is advisory by default so it adds zero gating risk to merges, and the only failure conditions are public IOCs or versions someone has already publicly confirmed are malicious.

Adds a fast, focused workflow that scans every checked-in npm and cargo lockfile on PRs touching one. Default behaviour is advisory: only public indicator-of-compromise strings, versions on the public known-malicious list, and structurally broken lockfiles fail the build. Structural anomalies (missing integrity hashes, non-default registry, etc.) surface as ::warning:: annotations without gating merges, so reviewers see the audit result inline on every PR without changing the existing install behaviour. Also commits the two missing npm lockfiles the audit needs: studio/package-lock.json (Tauri CLI holder for desktop release) and studio/backend/core/data_recipe/oxc-validator/package-lock.json (oxc-parser runtime for the data-recipe validator). studio/setup.sh, studio/setup.ps1, build.sh, and pyproject.toml are intentionally left alone so the existing install path keeps working unchanged. Audit script behaviour: default mode -> exits 1 only on blocked-known-malicious, known-ioc-string, malformed-lockfile, missing-lockfile, unreadable-lockfile, or missing-toml-parser --strict -> promotes every finding to blocking (opt-in) Adds a try/except around lockfile reads so a permissions error prints a finding instead of crashing CI with a raw traceback.

for more information, see https://pre-commit.ci

gemini-code-assist

Code Review

This pull request enhances the lockfile supply chain audit script by introducing a distinction between blocking and advisory findings, adding a --strict mode, and improving error handling for missing or unreadable lockfiles. It also expands the default set of monitored lockfiles and adds new lockfiles for the studio components. Feedback was provided regarding the use of GitHub Actions workflow commands (::warning:: and ::error::), which do not support multi-line strings; the reviewer suggested escaping newlines as %0A to ensure full finding details are visible in the UI.

gemini-code-assist · 2026-05-19T08:16:50Z

+            print(file = sys.stderr)
+
+    if not blocking:
+        print(


GitHub Actions workflow commands (like ::warning::) do not support multi-line strings in the message field. Since the Finding class __str__ method returns a multi-line string, only the first line (the kind and path) will be displayed in the GitHub UI annotation, while the package and detail information will be lost. To fix this, you should escape the newlines as %0A when printing for GitHub Actions.

Suggested change

print(

print(f"::warning::{str(f).replace('\n', '%0A')}", file = sys.stderr)

Fixed in 4b35d2c. Added a _gha_escape() helper that applies the full workflow-commands escape set (% -> %25 first so the subsequent %0A / %0D escapes are not double-encoded, then \r -> %0D, then \n -> %0A), wrapped both the ::warning:: and ::error:: print sites with it, and pinned the behavior with two new regression tests (test_gha_escape_collapses_finding_to_one_line + test_advisory_finding_emitted_as_single_line_annotation). The full Finding (kind+path / package / detail) now renders inside the GH Actions annotation instead of being truncated at the first newline.

gemini-code-assist · 2026-05-19T08:16:51Z

-        "either a structural lockfile anomaly or a public indicator-of-"
-        "compromise. Investigate before running `npm ci` or `cargo fetch`.",
+        "[lockfile-audit] Refusing to proceed. Each blocking finding "
+        "above is either a public indicator-of-compromise, a known-"


Similar to the advisory findings above, the ::error:: command requires newlines to be escaped as %0A to ensure the full finding details (package and detail fields) are visible in the GitHub Actions UI annotations. Without escaping, the annotation will be truncated after the first line.

Suggested change

"above is either a public indicator-of-compromise, a known-"

print(f"::error::{str(f).replace('\n', '%0A')}", file = sys.stderr)

Same fix as the sibling thread (4b35d2c) covers this ::error:: site too -- both call sites now go through _gha_escape().

`scripts/lockfile_supply_chain_audit.py` now classifies `non-registry-cargo-source` as an advisory finding by default (returns exit 0 with a `::warning::` annotation) rather than unconditionally blocking with exit 1. Update the existing `test_malicious_cargo_lockfile_refused` to pass --strict so it keeps verifying the "refuse to install" behavior it is named for, and add a second test that pins the default-mode behavior: advisory finding emitted, exit code 0.

`::warning::` and `::error::` workflow commands truncate the annotation message at the first newline unless the message is %-encoded per the workflow-commands spec. Since `Finding.__str__` returns three lines (kind+path, package, detail), the package and detail fields were being dropped from the GitHub Actions UI. Add a `_gha_escape()` helper that applies the spec'd escapes (`%` -> `%25`, then `\r` -> `%0D`, then `\n` -> `%0A`; the `%` replacement must happen first so the subsequent escapes are not double-encoded), wrap every Finding rendered into a workflow command with it, and pin both the helper and the end-to-end single-line emission with two new regression tests. Caught by gemini-code-assist on PR #5604.

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17e0c9dd4a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T12:58:44Z

+        "malformed-lockfile",
+        "missing-lockfile",
+        "unreadable-lockfile",
+        "missing-toml-parser",
+    }


Treat unsupported lockfiles as blocking

When a package-lock.json is downgraded to lockfileVersion: 1, audit_npm_lockfile() records unsupported-lockfile-version and then cannot inspect the old-format dependencies tree for blocked package versions. Because this kind is absent from BLOCKING_KINDS, the new default CI mode exits 0 with only a warning, so a PR can replace a v2/v3 lockfile with an unaudited v1 lockfile and still pass the audit. Please include unsupported-lockfile-version in the blocking set.

Useful? React with 👍 / 👎.

danielhanchen · 2026-05-21T13:02:38Z

Reviewed end-to-end and this looks good as a clean, regression-free addition.

Diff is exactly the 7 declared files (CI workflow, .gitignore allowlist, scripts/, tests/, three lockfiles, the tiny Tauri-CLI lockfile-holder studio/package.json). No install.sh / install.ps1 / Studio runtime touched.
tests/security/test_lockfile_supply_chain_audit.py: 12 passed in 0.63s.
python3 scripts/lockfile_supply_chain_audit.py on this branch: exit 0, "0 findings across 3 npm + 1 cargo lockfile(s)".
Workflow runs with contents: read, persist-credentials: false, 5-minute timeout, no --strict flag — advisory mode is preserved so merges cannot be accidentally blocked by structural anomalies.
_gha_escape() (scripts/lockfile_supply_chain_audit.py:392-401) addresses the gemini-code-assist comment about CRLF / % rendering in ::warning:: annotations.

Also pinned cross-OS in danielhanchen#137 so the audit script's parser stays green on macos-14 and windows-latest, not just ubuntu.

danielhanchen · 2026-05-21T13:09:44Z

Cross-OS validation finished in danielhanchen#137.

tests/security/test_lockfile_supply_chain_audit.py (plus running the audit script in default advisory mode):

Runner	Result
`ubuntu-latest`	12 passed in 0.59s, `scripts/lockfile_supply_chain_audit.py` exit 0
`macos-14`	12 passed in 0.52s, audit exit 0
`windows-latest`	12 passed in 0.83s, audit exit 0

Each PR ran the same staged source files before, which went stale when the upstream PR commits advanced. Refactor to one job per PR with an actions/checkout of that PR's head ref, so cross-OS validation always uses the latest commit: - PR unslothai#5603 sandbox -> studio-sandbox-hardening - PR unslothai#5620 parser parity -> studio-tools-multi-format-v2 - PR unslothai#5696 mtp reload guards -> followup-mtp-reload-guards (unslothai#5582 followup) - PR unslothai#5695 lockfile audit -> followup-lockfile-audit-regressions (unslothai#5604 followup) 4 jobs x 3 OSes = 12 runs; Windows = 4 (below the 5-concurrent cap). cancel-in-progress per (workflow, ref) keeps iteration cheap. All tests stay CPU-only and rely on the CUDA spoof harness in tests/conftest.py + tests/_zoo_aggressive_cuda_spoof.py, so no real GPU is required on any runner.

…nslothai#5604) * ci: add advisory lockfile supply-chain audit Adds a fast, focused workflow that scans every checked-in npm and cargo lockfile on PRs touching one. Default behaviour is advisory: only public indicator-of-compromise strings, versions on the public known-malicious list, and structurally broken lockfiles fail the build. Structural anomalies (missing integrity hashes, non-default registry, etc.) surface as ::warning:: annotations without gating merges, so reviewers see the audit result inline on every PR without changing the existing install behaviour. Also commits the two missing npm lockfiles the audit needs: studio/package-lock.json (Tauri CLI holder for desktop release) and studio/backend/core/data_recipe/oxc-validator/package-lock.json (oxc-parser runtime for the data-recipe validator). studio/setup.sh, studio/setup.ps1, build.sh, and pyproject.toml are intentionally left alone so the existing install path keeps working unchanged. Audit script behaviour: default mode -> exits 1 only on blocked-known-malicious, known-ioc-string, malformed-lockfile, missing-lockfile, unreadable-lockfile, or missing-toml-parser --strict -> promotes every finding to blocking (opt-in) Adds a try/except around lockfile reads so a permissions error prints a finding instead of crashing CI with a raw traceback. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test(security): update cargo regression test for advisory mode `scripts/lockfile_supply_chain_audit.py` now classifies `non-registry-cargo-source` as an advisory finding by default (returns exit 0 with a `::warning::` annotation) rather than unconditionally blocking with exit 1. Update the existing `test_malicious_cargo_lockfile_refused` to pass --strict so it keeps verifying the "refuse to install" behavior it is named for, and add a second test that pins the default-mode behavior: advisory finding emitted, exit code 0. * audit: escape Finding for GH Actions annotations `::warning::` and `::error::` workflow commands truncate the annotation message at the first newline unless the message is %-encoded per the workflow-commands spec. Since `Finding.__str__` returns three lines (kind+path, package, detail), the package and detail fields were being dropped from the GitHub Actions UI. Add a `_gha_escape()` helper that applies the spec'd escapes (`%` -> `%25`, then `\r` -> `%0D`, then `\n` -> `%0A`; the `%` replacement must happen first so the subsequent escapes are not double-encoded), wrap every Finding rendered into a workflow command with it, and pin both the helper and the end-to-end single-line emission with two new regression tests. Caught by gemini-code-assist on PR unslothai#5604. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

danielhanchen and others added 2 commits May 19, 2026 08:12

[pre-commit.ci] auto fixes from pre-commit.com hooks

45c19b1

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

danielhanchen and others added 3 commits May 19, 2026 10:34

[pre-commit.ci] auto fixes from pre-commit.com hooks

17e0c9d

for more information, see https://pre-commit.ci

danielhanchen marked this pull request as ready for review May 19, 2026 12:56

danielhanchen requested a review from rolandtannous as a code owner May 19, 2026 12:56

danielhanchen merged commit dd0b557 into main May 19, 2026
30 of 33 checks passed

danielhanchen deleted the ci/lockfile-audit-advisory branch May 19, 2026 12:56

chatgpt-codex-connector Bot reviewed May 19, 2026

View reviewed changes

This was referenced May 19, 2026

ci: enforce npm ci on Studio install paths (follow-up to #5604) #5616

Draft

validate-may21-prs: cross-OS CI for May 18-21 PR cohort danielhanchen/unsloth-staging-2#137

Open

danielhanchen mentioned this pull request May 22, 2026

studio: harden lockfile audit followups for #5604 #5695

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci: advisory lockfile supply-chain audit (no install-script changes)#5604

ci: advisory lockfile supply-chain audit (no install-script changes)#5604
danielhanchen merged 5 commits into
mainfrom
ci/lockfile-audit-advisory

danielhanchen commented May 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

danielhanchen May 19, 2026

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

danielhanchen May 19, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 19, 2026

Uh oh!

danielhanchen commented May 21, 2026

Uh oh!

danielhanchen commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	print(
	print(f"::warning::{str(f).replace('\n', '%0A')}", file = sys.stderr)

	"above is either a public indicator-of-compromise, a known-"
	print(f"::error::{str(f).replace('\n', '%0A')}", file = sys.stderr)

Uh oh!

Conversation

danielhanchen commented May 19, 2026

Summary

What is in this PR

What is intentionally NOT in this PR

Test plan

Why this design

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

danielhanchen commented May 21, 2026

Uh oh!

danielhanchen commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant