ci: advisory lockfile supply-chain audit (no install-script changes)#5604
Conversation
Adds a fast, focused workflow that scans every checked-in npm and
cargo lockfile on PRs touching one. Default behaviour is advisory:
only public indicator-of-compromise strings, versions on the public
known-malicious list, and structurally broken lockfiles fail the
build. Structural anomalies (missing integrity hashes, non-default
registry, etc.) surface as ::warning:: annotations without gating
merges, so reviewers see the audit result inline on every PR
without changing the existing install behaviour.
Also commits the two missing npm lockfiles the audit needs:
studio/package-lock.json (Tauri CLI holder for desktop release)
and studio/backend/core/data_recipe/oxc-validator/package-lock.json
(oxc-parser runtime for the data-recipe validator). studio/setup.sh,
studio/setup.ps1, build.sh, and pyproject.toml are intentionally
left alone so the existing install path keeps working unchanged.
Audit script behaviour:
default mode -> exits 1 only on blocked-known-malicious,
known-ioc-string, malformed-lockfile,
missing-lockfile, unreadable-lockfile, or
missing-toml-parser
--strict -> promotes every finding to blocking (opt-in)
Adds a try/except around lockfile reads so a permissions error
prints a finding instead of crashing CI with a raw traceback.
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Code Review
This pull request enhances the lockfile supply chain audit script by introducing a distinction between blocking and advisory findings, adding a --strict mode, and improving error handling for missing or unreadable lockfiles. It also expands the default set of monitored lockfiles and adds new lockfiles for the studio components. Feedback was provided regarding the use of GitHub Actions workflow commands (::warning:: and ::error::), which do not support multi-line strings; the reviewer suggested escaping newlines as %0A to ensure full finding details are visible in the UI.
| print(file = sys.stderr) | ||
|
|
||
| if not blocking: | ||
| print( |
There was a problem hiding this comment.
GitHub Actions workflow commands (like ::warning::) do not support multi-line strings in the message field. Since the Finding class __str__ method returns a multi-line string, only the first line (the kind and path) will be displayed in the GitHub UI annotation, while the package and detail information will be lost. To fix this, you should escape the newlines as %0A when printing for GitHub Actions.
| print( | |
| print(f"::warning::{str(f).replace('\n', '%0A')}", file = sys.stderr) |
There was a problem hiding this comment.
Fixed in 4b35d2c. Added a _gha_escape() helper that applies the full workflow-commands escape set (% -> %25 first so the subsequent %0A / %0D escapes are not double-encoded, then \r -> %0D, then \n -> %0A), wrapped both the ::warning:: and ::error:: print sites with it, and pinned the behavior with two new regression tests (test_gha_escape_collapses_finding_to_one_line + test_advisory_finding_emitted_as_single_line_annotation). The full Finding (kind+path / package / detail) now renders inside the GH Actions annotation instead of being truncated at the first newline.
| "either a structural lockfile anomaly or a public indicator-of-" | ||
| "compromise. Investigate before running `npm ci` or `cargo fetch`.", | ||
| "[lockfile-audit] Refusing to proceed. Each blocking finding " | ||
| "above is either a public indicator-of-compromise, a known-" |
There was a problem hiding this comment.
Similar to the advisory findings above, the ::error:: command requires newlines to be escaped as %0A to ensure the full finding details (package and detail fields) are visible in the GitHub Actions UI annotations. Without escaping, the annotation will be truncated after the first line.
| "above is either a public indicator-of-compromise, a known-" | |
| print(f"::error::{str(f).replace('\n', '%0A')}", file = sys.stderr) |
There was a problem hiding this comment.
Same fix as the sibling thread (4b35d2c) covers this ::error:: site too -- both call sites now go through _gha_escape().
`scripts/lockfile_supply_chain_audit.py` now classifies `non-registry-cargo-source` as an advisory finding by default (returns exit 0 with a `::warning::` annotation) rather than unconditionally blocking with exit 1. Update the existing `test_malicious_cargo_lockfile_refused` to pass --strict so it keeps verifying the "refuse to install" behavior it is named for, and add a second test that pins the default-mode behavior: advisory finding emitted, exit code 0.
`::warning::` and `::error::` workflow commands truncate the annotation message at the first newline unless the message is %-encoded per the workflow-commands spec. Since `Finding.__str__` returns three lines (kind+path, package, detail), the package and detail fields were being dropped from the GitHub Actions UI. Add a `_gha_escape()` helper that applies the spec'd escapes (`%` -> `%25`, then `\r` -> `%0D`, then `\n` -> `%0A`; the `%` replacement must happen first so the subsequent escapes are not double-encoded), wrap every Finding rendered into a workflow command with it, and pin both the helper and the end-to-end single-line emission with two new regression tests. Caught by gemini-code-assist on PR #5604.
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17e0c9dd4a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "malformed-lockfile", | ||
| "missing-lockfile", | ||
| "unreadable-lockfile", | ||
| "missing-toml-parser", | ||
| } |
There was a problem hiding this comment.
Treat unsupported lockfiles as blocking
When a package-lock.json is downgraded to lockfileVersion: 1, audit_npm_lockfile() records unsupported-lockfile-version and then cannot inspect the old-format dependencies tree for blocked package versions. Because this kind is absent from BLOCKING_KINDS, the new default CI mode exits 0 with only a warning, so a PR can replace a v2/v3 lockfile with an unaudited v1 lockfile and still pass the audit. Please include unsupported-lockfile-version in the blocking set.
Useful? React with 👍 / 👎.
|
Reviewed end-to-end and this looks good as a clean, regression-free addition.
Also pinned cross-OS in danielhanchen#137 so the audit script's parser stays green on macos-14 and windows-latest, not just ubuntu. |
|
Cross-OS validation finished in danielhanchen#137.
|
Each PR ran the same staged source files before, which went stale when the upstream PR commits advanced. Refactor to one job per PR with an actions/checkout of that PR's head ref, so cross-OS validation always uses the latest commit: - PR unslothai#5603 sandbox -> studio-sandbox-hardening - PR unslothai#5620 parser parity -> studio-tools-multi-format-v2 - PR unslothai#5696 mtp reload guards -> followup-mtp-reload-guards (unslothai#5582 followup) - PR unslothai#5695 lockfile audit -> followup-lockfile-audit-regressions (unslothai#5604 followup) 4 jobs x 3 OSes = 12 runs; Windows = 4 (below the 5-concurrent cap). cancel-in-progress per (workflow, ref) keeps iteration cheap. All tests stay CPU-only and rely on the CUDA spoof harness in tests/conftest.py + tests/_zoo_aggressive_cuda_spoof.py, so no real GPU is required on any runner.
…nslothai#5604) * ci: add advisory lockfile supply-chain audit Adds a fast, focused workflow that scans every checked-in npm and cargo lockfile on PRs touching one. Default behaviour is advisory: only public indicator-of-compromise strings, versions on the public known-malicious list, and structurally broken lockfiles fail the build. Structural anomalies (missing integrity hashes, non-default registry, etc.) surface as ::warning:: annotations without gating merges, so reviewers see the audit result inline on every PR without changing the existing install behaviour. Also commits the two missing npm lockfiles the audit needs: studio/package-lock.json (Tauri CLI holder for desktop release) and studio/backend/core/data_recipe/oxc-validator/package-lock.json (oxc-parser runtime for the data-recipe validator). studio/setup.sh, studio/setup.ps1, build.sh, and pyproject.toml are intentionally left alone so the existing install path keeps working unchanged. Audit script behaviour: default mode -> exits 1 only on blocked-known-malicious, known-ioc-string, malformed-lockfile, missing-lockfile, unreadable-lockfile, or missing-toml-parser --strict -> promotes every finding to blocking (opt-in) Adds a try/except around lockfile reads so a permissions error prints a finding instead of crashing CI with a raw traceback. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test(security): update cargo regression test for advisory mode `scripts/lockfile_supply_chain_audit.py` now classifies `non-registry-cargo-source` as an advisory finding by default (returns exit 0 with a `::warning::` annotation) rather than unconditionally blocking with exit 1. Update the existing `test_malicious_cargo_lockfile_refused` to pass --strict so it keeps verifying the "refuse to install" behavior it is named for, and add a second test that pins the default-mode behavior: advisory finding emitted, exit code 0. * audit: escape Finding for GH Actions annotations `::warning::` and `::error::` workflow commands truncate the annotation message at the first newline unless the message is %-encoded per the workflow-commands spec. Since `Finding.__str__` returns three lines (kind+path, package, detail), the package and detail fields were being dropped from the GitHub Actions UI. Add a `_gha_escape()` helper that applies the spec'd escapes (`%` -> `%25`, then `\r` -> `%0D`, then `\n` -> `%0A`; the `%` replacement must happen first so the subsequent escapes are not double-encoded), wrap every Finding rendered into a workflow command with it, and pin both the helper and the end-to-end single-line emission with two new regression tests. Caught by gemini-code-assist on PR unslothai#5604. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Summary
Adds a fast, focused supply-chain audit of every checked-in npm and cargo lockfile, plus a new GitHub Actions workflow that runs it on every PR touching one. Default mode is advisory: only public indicator-of-compromise strings, versions on the public known-malicious list, and structurally broken lockfiles fail the build. Structural anomalies (missing integrity hashes, non-default registry, etc.) surface as
::warning::annotations without gating merges.This is the minimum-blast-radius alternative to PR #5479. It keeps
studio/setup.sh,studio/setup.ps1,build.sh, andpyproject.tomluntouched so the existing install path keeps working unchanged. The audit only READS lockfiles; it never executes anything in them and never calls the network.What is in this PR
scripts/lockfile_supply_chain_audit.py(843 lines)package-lock.jsonaudit: registry origin, integrity presence, known IOC strings, known-malicious version pins (BLOCKED_NPM_VERSIONS).Cargo.lockaudit: registry source, checksum presence, known cargo IOC strings.--strictflag promotes every finding (e.g. missing integrity) to blocking; default mode only blocks on the high-signal kinds.unreadable-lockfilefinding kind so a permissions error surfaces cleanly instead of crashing CI with a raw traceback.UNSLOTH_LOCKFILE_AUDIT_SKIPhonours a justification value (>=5 chars, not1/true/yes); a trivially-set value is rejected with a loud warning and the audit runs normally..github/workflows/lockfile-audit.yml(~30 second job)concurrency.cancel-in-progressso stacked pushes do not pile up runners.studio/package.json+studio/package-lock.json@tauri-apps/cli@2.10.1for the desktop release path; lockfile holder so npm can resolve a pinned Tauri CLI without changing the install scripts.studio/backend/core/data_recipe/oxc-validator/package-lock.jsonpackage.json(oxc-parser@^0.123.0,oxlint@^1.51.0), so the audit covers the data-recipe validator's runtime..gitignore!studio/...package-lock.json) for the three Studio surfaces while keeping the global ignore for stray lockfiles.What is intentionally NOT in this PR
studio/setup.sh,studio/setup.ps1, orbuild.sh.pyproject.toml.security-audit.ymlalready invokes this script and will see the new defaults transparently.Test plan
python3 scripts/lockfile_supply_chain_audit.py-> exit 0, scans 3 npm + 1 cargo lockfile.tanstack_runner.js) -> rc=1 (blocking).@tanstack/history@1.161.9) -> rc=1 (blocking).::warning::); rc=1 under--strict.Lockfile supply-chain auditjob is green on this PR.Why this design
PR #5479 made
--frozen-lockfilemandatory and changed install scripts on three OSes; that risks breaking real installs if any of the three lockfiles drifts. This PR commits the lockfiles and the auditor without changing how anything is installed. The audit is advisory by default so it adds zero gating risk to merges, and the only failure conditions are public IOCs or versions someone has already publicly confirmed are malicious.