fix(security): drop POST allow rule from huggingface preset (#1432)#1663
Conversation
The huggingface preset allowed both GET and POST on `huggingface.co/**`. The POST rule let any sandboxed agent that happened to find an HF token in the environment publish models, datasets, and create repositories on the Hub via /api/repos/create and friends — far beyond what a download-only `from_pretrained` workflow needs. Inference traffic for the HF Inference Provider router flows through `router.huggingface.co` (declared separately in the same preset), not `huggingface.co`, so removing the POST rule from huggingface.co doesn't break the inference path. cdn-lfs.huggingface.co stays GET-only as before. Adds two regression tests in test/validate-blueprint.test.ts: - huggingface.co MUST NOT have a POST allow rule (catches a re-add) - huggingface.co MUST retain the GET rule (catches an over-correction that would break the download path the preset exists to enable) Refs: NVIDIA#1432
📝 WalkthroughWalkthroughRemoved POST method access from the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/validate-blueprint.test.ts (1)
234-243: Make the GET regression test non-vacuous.Consider asserting at least one
huggingface.coendpoint here too, so this test still fails meaningfully if endpoint resolution breaks later.Suggested patch
it("regression `#1432`: huggingface.co retains GET so downloads still work", () => { const endpoints = presetEndpoints().filter((ep) => ep.host === "huggingface.co"); + expect(endpoints.length).toBeGreaterThan(0); for (const ep of endpoints) { const rules = Array.isArray(ep.rules) ? ep.rules : []; const hasGet = rules.some( (r) => r && r.allow && typeof r.allow.method === "string" && r.allow.method.toUpperCase() === "GET",🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/validate-blueprint.test.ts` around lines 234 - 243, The test "regression `#1432`: huggingface.co retains GET so downloads still work" currently only checks that each discovered huggingface.co endpoint has a GET rule but doesn't assert that any endpoints were found; add an explicit assertion that the endpoints array is non-empty (e.g., expect(endpoints.length).toBeGreaterThan(0)) right after calling presetEndpoints() so the test fails if endpoint resolution returns zero results; reference the existing symbols endpoints and presetEndpoints() when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/validate-blueprint.test.ts`:
- Around line 234-243: The test "regression `#1432`: huggingface.co retains GET so
downloads still work" currently only checks that each discovered huggingface.co
endpoint has a GET rule but doesn't assert that any endpoints were found; add an
explicit assertion that the endpoints array is non-empty (e.g.,
expect(endpoints.length).toBeGreaterThan(0)) right after calling
presetEndpoints() so the test fails if endpoint resolution returns zero results;
reference the existing symbols endpoints and presetEndpoints() when making this
change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 98d94879-72f6-4cff-8187-4cdef5de9a78
📒 Files selected for processing (2)
nemoclaw-blueprint/policies/presets/huggingface.yamltest/validate-blueprint.test.ts
|
✨ Thanks for submitting this PR, which proposes a way to improve security by restricting access to Hugging Face and may prevent unauthorized model uploads. Possibly related open issues: |
cv
left a comment
There was a problem hiding this comment.
Maintainer review — all gates pass.
| Gate | Status |
|---|---|
| CI | ✅ green (check-pr-limit + CodeRabbit) |
| Conflicts | ✅ MERGEABLE |
| CodeRabbit major/critical | ✅ none |
| Risky code tested | ✅ two regression tests |
Security pass:
- Correctly removes
POST /**fromhuggingface.co— principle of least privilege. - Inference traffic flows through
router.huggingface.co(retains POST) — no functional breakage. - Grep confirms no internal NemoClaw flow
POSTs tohuggingface.co. - Regression tests block re-adding the rule.
Non-blocking nitpick (CodeRabbit): The GET-retention test at line 234 would be stronger with expect(endpoints.length).toBeGreaterThan(0) like the POST test already has — prevents vacuous pass if endpoint resolution breaks. Not blocking since the POST test already guards this.
Ready for merge.
) ## Summary The gate checker and triage scripts treated "all present checks green" as passing, even when only 2 of ~9 checks existed. This caused premature approvals on fork PRs where workflows hadn't been triggered yet. ### Root cause Fork PRs from first-time contributors need a maintainer to click "Approve and run" before `pull_request` workflows execute. Until then, only `pull_request_target` checks (`check-pr-limit`) and external bots (`CodeRabbit`) appear in `statusCheckRollup`. The scripts saw 2/2 green and reported CI as passing. A secondary bug: GitHub's `statusCheckRollup` returns two shapes — `CheckRun` (`name`/`status`/`conclusion`) and `StatusContext` (`context`/`state`). The scripts only read CheckRun fields, so CodeRabbit (a StatusContext) was always treated as "pending" even when `state` was `SUCCESS`. ### Changes - **`check-gates.ts`**: Add `REQUIRED_CHECK_NAMES` (`checks`, `commit-lint`, `dco-check`) validation. Add `StatusCheck` union type to correctly handle both `CheckRun` and `StatusContext` shapes. CI gate now fails with `"required check(s) not found — workflows may need approval"` when expected checks are absent. - **`triage.ts`**: Add same required-check validation so triage does not score unapproved-workflow PRs as `review-ready`. - **`MERGE-GATE.md`**: Add "Missing required checks" as first bullet in Step 2 interpretation guidance. ### Before / After | PR scenario | Before | After | |---|---|---| | Fork PR, workflows not approved (2 checks) | "All 2 checks green" ✅ | "3 required check(s) not found — workflows may need approval" ❌ | | Fork PR, workflows running, dco-check failing | "1 pending" (CodeRabbit misread) | "3 failing check(s): dco-check: FAILURE, ..." ❌ | | Internal PR, all 12 checks green | "1 pending" (CodeRabbit misread) | "All 12 checks green" ✅ | ### Test plan - [x] Verified against PR #1660 (fork, workflows not approved) — correctly reports missing checks - [x] Verified against PR #1663 (fork, workflows approved, dco-check failing) — correctly reports failures - [x] Verified against PR #1683 (internal, all green) — correctly reports all 12 green - [x] Triage script correctly classifies #1660 as `salvage-now` with `failing-checks` reason instead of `review-ready` Signed-off-by: Carlos Villela <cvillela@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Enhanced merge gate to require specific CI checks be present and completed before a PR can be approved; missing required checks will block approval until workflows finish and validation is re-run. * Improved CI evaluation to better distinguish pending vs failed states across different check types. * PRs missing required check contexts are now classified as not-green. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…1719) ## Summary - Remove `github` and `github_rest_api` from baseline network policy docs (now opt-in preset, #1660) - Add `github` preset to presets tables in customize-network-policy.md and best-practices.md - Add `brave` and `brew` presets to best-practices.md - Update `huggingface` description (download-only + inference router, #1663) - Update `npm` and `pypi` descriptions (GET-only, publishing blocked, #1672) - Fix binary-scoping example to reference `github` as a preset, not baseline - Add `docs/.docs-skip` exclusion file for suppressing docs on experimental/unreleased features - Update `nemoclaw-contributor-update-docs` skill with skip-features, skip-terms, and agent matrix filtering rules - Regenerate `nemoclaw-user-*` skills from updated docs ## Test plan - [x] `make docs` builds without warnings - [x] All pre-commit and pre-push hooks pass - [x] Skip-term scan passes (zero violations in branch diff) - [ ] Verify rendered pages in docs site preview 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Introduced skip-list configuration for controlling documentation generation based on commit patterns and excluded terminology. * **Documentation** * Updated network policy presets with new `github`, `brave`, and `brew` options. * Refined preset descriptions for `huggingface`, `npm`, and `pypi` to clarify access scopes. * Clarified that GitHub access is no longer baseline and requires applying the `github` preset during onboarding. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
) (NVIDIA#1663) ## Summary - Remove the `POST /**` rule from `huggingface.co` in the huggingface preset, leaving the host download-only. - Add regression tests asserting `huggingface.co` has no POST allow rule and still retains the GET rule. Fixes NVIDIA#1432. ## Why The huggingface preset allowed both GET and POST on `huggingface.co/**`. The POST rule let any sandboxed agent that found an HF token in the environment publish models, datasets, and create repositories on the Hub via `/api/repos/create` and friends — far beyond what a download-only `from_pretrained` workflow needs. Same shape as NVIDIA#1437 (sentry POST) and matches the principle of least privilege the preset system is built around. Inference traffic flows through `router.huggingface.co` (declared separately in the same preset), not `huggingface.co`, so removing the POST rule from `huggingface.co` does not break the inference path. `cdn-lfs.huggingface.co` stays GET-only as before. ## Test plan - [x] `npx vitest run test/validate-blueprint.test.ts` → 29/29 passing. - [x] Negative-case verification: stash the preset edit, re-run — the new regression test fails as expected, then passes again after pop. - [ ] Maintainer should sanity-check that no internal NemoClaw flow `POST`s to `huggingface.co` (grep currently shows the only reference is the preset file itself). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Security** * Updated huggingface.co endpoint access control to restrict to read-only operations; POST method access has been removed while GET access remains available and functional * Added regression tests to validate that access control rules are properly enforced for huggingface.co endpoints and prevent unauthorized write operations <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…IDIA#1707) ## Summary The gate checker and triage scripts treated "all present checks green" as passing, even when only 2 of ~9 checks existed. This caused premature approvals on fork PRs where workflows hadn't been triggered yet. ### Root cause Fork PRs from first-time contributors need a maintainer to click "Approve and run" before `pull_request` workflows execute. Until then, only `pull_request_target` checks (`check-pr-limit`) and external bots (`CodeRabbit`) appear in `statusCheckRollup`. The scripts saw 2/2 green and reported CI as passing. A secondary bug: GitHub's `statusCheckRollup` returns two shapes — `CheckRun` (`name`/`status`/`conclusion`) and `StatusContext` (`context`/`state`). The scripts only read CheckRun fields, so CodeRabbit (a StatusContext) was always treated as "pending" even when `state` was `SUCCESS`. ### Changes - **`check-gates.ts`**: Add `REQUIRED_CHECK_NAMES` (`checks`, `commit-lint`, `dco-check`) validation. Add `StatusCheck` union type to correctly handle both `CheckRun` and `StatusContext` shapes. CI gate now fails with `"required check(s) not found — workflows may need approval"` when expected checks are absent. - **`triage.ts`**: Add same required-check validation so triage does not score unapproved-workflow PRs as `review-ready`. - **`MERGE-GATE.md`**: Add "Missing required checks" as first bullet in Step 2 interpretation guidance. ### Before / After | PR scenario | Before | After | |---|---|---| | Fork PR, workflows not approved (2 checks) | "All 2 checks green" ✅ | "3 required check(s) not found — workflows may need approval" ❌ | | Fork PR, workflows running, dco-check failing | "1 pending" (CodeRabbit misread) | "3 failing check(s): dco-check: FAILURE, ..." ❌ | | Internal PR, all 12 checks green | "1 pending" (CodeRabbit misread) | "All 12 checks green" ✅ | ### Test plan - [x] Verified against PR NVIDIA#1660 (fork, workflows not approved) — correctly reports missing checks - [x] Verified against PR NVIDIA#1663 (fork, workflows approved, dco-check failing) — correctly reports failures - [x] Verified against PR NVIDIA#1683 (internal, all green) — correctly reports all 12 green - [x] Triage script correctly classifies NVIDIA#1660 as `salvage-now` with `failing-checks` reason instead of `review-ready` Signed-off-by: Carlos Villela <cvillela@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Enhanced merge gate to require specific CI checks be present and completed before a PR can be approved; missing required checks will block approval until workflows finish and validation is re-run. * Improved CI evaluation to better distinguish pending vs failed states across different check types. * PRs missing required check contexts are now classified as not-green. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…VIDIA#1719) ## Summary - Remove `github` and `github_rest_api` from baseline network policy docs (now opt-in preset, NVIDIA#1660) - Add `github` preset to presets tables in customize-network-policy.md and best-practices.md - Add `brave` and `brew` presets to best-practices.md - Update `huggingface` description (download-only + inference router, NVIDIA#1663) - Update `npm` and `pypi` descriptions (GET-only, publishing blocked, NVIDIA#1672) - Fix binary-scoping example to reference `github` as a preset, not baseline - Add `docs/.docs-skip` exclusion file for suppressing docs on experimental/unreleased features - Update `nemoclaw-contributor-update-docs` skill with skip-features, skip-terms, and agent matrix filtering rules - Regenerate `nemoclaw-user-*` skills from updated docs ## Test plan - [x] `make docs` builds without warnings - [x] All pre-commit and pre-push hooks pass - [x] Skip-term scan passes (zero violations in branch diff) - [ ] Verify rendered pages in docs site preview 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Introduced skip-list configuration for controlling documentation generation based on commit patterns and excluded terminology. * **Documentation** * Updated network policy presets with new `github`, `brave`, and `brew` options. * Refined preset descriptions for `huggingface`, `npm`, and `pypi` to clarify access scopes. * Clarified that GitHub access is no longer baseline and requires applying the `github` preset during onboarding. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
) (NVIDIA#1663) ## Summary - Remove the `POST /**` rule from `huggingface.co` in the huggingface preset, leaving the host download-only. - Add regression tests asserting `huggingface.co` has no POST allow rule and still retains the GET rule. Fixes NVIDIA#1432. ## Why The huggingface preset allowed both GET and POST on `huggingface.co/**`. The POST rule let any sandboxed agent that found an HF token in the environment publish models, datasets, and create repositories on the Hub via `/api/repos/create` and friends — far beyond what a download-only `from_pretrained` workflow needs. Same shape as NVIDIA#1437 (sentry POST) and matches the principle of least privilege the preset system is built around. Inference traffic flows through `router.huggingface.co` (declared separately in the same preset), not `huggingface.co`, so removing the POST rule from `huggingface.co` does not break the inference path. `cdn-lfs.huggingface.co` stays GET-only as before. ## Test plan - [x] `npx vitest run test/validate-blueprint.test.ts` → 29/29 passing. - [x] Negative-case verification: stash the preset edit, re-run — the new regression test fails as expected, then passes again after pop. - [ ] Maintainer should sanity-check that no internal NemoClaw flow `POST`s to `huggingface.co` (grep currently shows the only reference is the preset file itself). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Security** * Updated huggingface.co endpoint access control to restrict to read-only operations; POST method access has been removed while GET access remains available and functional * Added regression tests to validate that access control rules are properly enforced for huggingface.co endpoints and prevent unauthorized write operations <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…IDIA#1707) ## Summary The gate checker and triage scripts treated "all present checks green" as passing, even when only 2 of ~9 checks existed. This caused premature approvals on fork PRs where workflows hadn't been triggered yet. ### Root cause Fork PRs from first-time contributors need a maintainer to click "Approve and run" before `pull_request` workflows execute. Until then, only `pull_request_target` checks (`check-pr-limit`) and external bots (`CodeRabbit`) appear in `statusCheckRollup`. The scripts saw 2/2 green and reported CI as passing. A secondary bug: GitHub's `statusCheckRollup` returns two shapes — `CheckRun` (`name`/`status`/`conclusion`) and `StatusContext` (`context`/`state`). The scripts only read CheckRun fields, so CodeRabbit (a StatusContext) was always treated as "pending" even when `state` was `SUCCESS`. ### Changes - **`check-gates.ts`**: Add `REQUIRED_CHECK_NAMES` (`checks`, `commit-lint`, `dco-check`) validation. Add `StatusCheck` union type to correctly handle both `CheckRun` and `StatusContext` shapes. CI gate now fails with `"required check(s) not found — workflows may need approval"` when expected checks are absent. - **`triage.ts`**: Add same required-check validation so triage does not score unapproved-workflow PRs as `review-ready`. - **`MERGE-GATE.md`**: Add "Missing required checks" as first bullet in Step 2 interpretation guidance. ### Before / After | PR scenario | Before | After | |---|---|---| | Fork PR, workflows not approved (2 checks) | "All 2 checks green" ✅ | "3 required check(s) not found — workflows may need approval" ❌ | | Fork PR, workflows running, dco-check failing | "1 pending" (CodeRabbit misread) | "3 failing check(s): dco-check: FAILURE, ..." ❌ | | Internal PR, all 12 checks green | "1 pending" (CodeRabbit misread) | "All 12 checks green" ✅ | ### Test plan - [x] Verified against PR NVIDIA#1660 (fork, workflows not approved) — correctly reports missing checks - [x] Verified against PR NVIDIA#1663 (fork, workflows approved, dco-check failing) — correctly reports failures - [x] Verified against PR NVIDIA#1683 (internal, all green) — correctly reports all 12 green - [x] Triage script correctly classifies NVIDIA#1660 as `salvage-now` with `failing-checks` reason instead of `review-ready` Signed-off-by: Carlos Villela <cvillela@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Enhanced merge gate to require specific CI checks be present and completed before a PR can be approved; missing required checks will block approval until workflows finish and validation is re-run. * Improved CI evaluation to better distinguish pending vs failed states across different check types. * PRs missing required check contexts are now classified as not-green. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…VIDIA#1719) ## Summary - Remove `github` and `github_rest_api` from baseline network policy docs (now opt-in preset, NVIDIA#1660) - Add `github` preset to presets tables in customize-network-policy.md and best-practices.md - Add `brave` and `brew` presets to best-practices.md - Update `huggingface` description (download-only + inference router, NVIDIA#1663) - Update `npm` and `pypi` descriptions (GET-only, publishing blocked, NVIDIA#1672) - Fix binary-scoping example to reference `github` as a preset, not baseline - Add `docs/.docs-skip` exclusion file for suppressing docs on experimental/unreleased features - Update `nemoclaw-contributor-update-docs` skill with skip-features, skip-terms, and agent matrix filtering rules - Regenerate `nemoclaw-user-*` skills from updated docs ## Test plan - [x] `make docs` builds without warnings - [x] All pre-commit and pre-push hooks pass - [x] Skip-term scan passes (zero violations in branch diff) - [ ] Verify rendered pages in docs site preview 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Introduced skip-list configuration for controlling documentation generation based on commit patterns and excluded terminology. * **Documentation** * Updated network policy presets with new `github`, `brave`, and `brew` options. * Refined preset descriptions for `huggingface`, `npm`, and `pypi` to clarify access scopes. * Clarified that GitHub access is no longer baseline and requires applying the `github` preset during onboarding. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
POST /**rule fromhuggingface.coin the huggingface preset, leaving the host download-only.huggingface.cohas no POST allow rule and still retains the GET rule.Fixes #1432.
Why
The huggingface preset allowed both GET and POST on
huggingface.co/**. The POST rule let any sandboxed agent that found an HF token in the environment publish models, datasets, and create repositories on the Hub via/api/repos/createand friends — far beyond what a download-onlyfrom_pretrainedworkflow needs. Same shape as #1437 (sentry POST) and matches the principle of least privilege the preset system is built around.Inference traffic flows through
router.huggingface.co(declared separately in the same preset), nothuggingface.co, so removing the POST rule fromhuggingface.codoes not break the inference path.cdn-lfs.huggingface.costays GET-only as before.Test plan
npx vitest run test/validate-blueprint.test.ts→ 29/29 passing.POSTs tohuggingface.co(grep currently shows the only reference is the preset file itself).Summary by CodeRabbit