fix(security): add authenticated reverse proxy for local Ollama#1922
Conversation
Ollama has no built-in auth and binding to 0.0.0.0 exposes it to the network (CWE-668, #1140). This adds an authenticated reverse proxy so Ollama stays on localhost while containers can still reach it. - Add scripts/ollama-auth-proxy.js — Node.js proxy on 0.0.0.0:11435 that validates a per-instance Bearer token before forwarding to Ollama on 127.0.0.1:11434. Health check (GET /api/tags) is exempt. Uses crypto.timingSafeEqual for timing-safe token comparison. - Bind Ollama to 127.0.0.1 instead of 0.0.0.0 during onboard - Start the auth proxy after Ollama, with stale proxy cleanup and startup verification - Route sandbox inference through proxy port (11435) with the generated token as the OpenAI API key credential - Gate macOS hint on process.platform === "darwin" - Add OLLAMA_PROXY_PORT (11435) to ports.ts - Add 7 e2e tests and CI job for the proxy - Update unit tests for new port and error messages Reimplements the approach from #679 (closed in favor of #1104) against the current TypeScript codebase, addressing CodeRabbit findings from the original PR (timing-safe comparison, stale proxy cleanup, startup verification). Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an Ollama authentication reverse proxy, integrates it into onboarding and runtime to start/ensure/use the proxy and token, routes container reachability checks through the proxy, updates tests and adds an e2e proxy test plus a CI job to run that test. Changes
Sequence Diagram(s)sequenceDiagram
participant Container as Client Container
participant Proxy as Ollama Auth Proxy\n(Port 11435)
participant Backend as Ollama Backend\n(Port 11434)
rect rgba(0, 200, 100, 0.5)
Note over Container,Backend: Authenticated Request Flow
Container->>Proxy: POST /api/generate\nAuthorization: Bearer {token}
Proxy->>Proxy: Validate token (timing-safe)
Proxy->>Backend: Forward request (strip auth/host)
Backend->>Proxy: Response stream
Proxy->>Container: Response stream
end
rect rgba(200, 100, 0, 0.5)
Note over Container,Proxy: Unauthorized Request Flow
Container->>Proxy: POST /api/generate (no/invalid token)
Proxy->>Container: HTTP 401 Unauthorized
end
rect rgba(100, 150, 200, 0.5)
Note over Container,Backend: Health Check (Exempted)
Container->>Proxy: GET /api/tags (no auth)
Proxy->>Backend: Forward health check
Backend->>Proxy: Model list
Proxy->>Container: Model list
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
✨
Related open issues: Possibly related open PRs:
Possibly related open issues: |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
.github/workflows/pr.yaml (1)
54-65: Gate this E2E job behind the existingchangesfilter.Right now this runs on docs-only PRs too, so it bypasses the fast path the workflow already has for expensive jobs. If that is not intentional, add the same
needs: [checks, changes]/if: needs.changes.outputs.code == 'true'guard used bysandbox-images-and-e2e.♻️ Suggested change
test-e2e-ollama-proxy: + needs: [checks, changes] + if: needs.changes.outputs.code == 'true' runs-on: ubuntu-latest timeout-minutes: 5🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/pr.yaml around lines 54 - 65, The E2E job "test-e2e-ollama-proxy" should be gated by the existing changes filter: add the same needs and conditional used by "sandbox-images-and-e2e" by adding needs: [checks, changes] to the job definition and adding if: needs.changes.outputs.code == 'true' so the job only runs when code changes are present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 1487-1522: Persist the generated ollamaProxyToken into the
onboarding session and handle resume: when startOllamaAuthProxy generates the
token, save it to onboardSession.ollamaProxyToken (or a similarly named field)
and reuse that value in getOllamaProxyToken instead of only the in-memory
variable; when resuming (e.g. in setupInference / after setupNim is skipped)
detect if the session provider is "ollama-local" and if
onboardSession.ollamaProxyToken exists reuse it and ensure the proxy is running,
otherwise call startOllamaAuthProxy to recreate the proxy and update
onboardSession.ollamaProxyToken so resumed runs always have a valid token and
proxy.
- Around line 1491-1499: The cleanup currently kills any PID from
runCapture(`lsof -ti :${OLLAMA_PROXY_PORT}`), which may terminate unrelated
services; instead track and validate the proxy process before killing: when you
spawn the proxy (where the proxy is started), record its PID (or write it to a
known file) and on cleanup read that PID and verify its command line contains
the proxy marker (e.g., "ollama-auth-proxy.js") using a ps lookup (or filter
lsof output by command) before calling run(`kill ...`); update the cleanup block
(the code using run, runCapture and OLLAMA_PROXY_PORT) to only kill the verified
PID and fall back to no-op if verification fails.
In `@test/e2e-ollama-proxy.sh`:
- Around line 17-20: The cleanup function and EXIT trap must guard against unset
PID variables to avoid unbound-variable errors under set -u; update cleanup (and
any EXIT trap invocation) to safely reference MOCK_PID and PROXY_PID using
parameter expansion or existence checks (e.g. ${MOCK_PID-} / ${PROXY_PID-} or if
[ -n "${MOCK_PID-}" ] ) before calling kill, so cleanup() only attempts kill
when the PID variables are set and non-empty.
---
Nitpick comments:
In @.github/workflows/pr.yaml:
- Around line 54-65: The E2E job "test-e2e-ollama-proxy" should be gated by the
existing changes filter: add the same needs and conditional used by
"sandbox-images-and-e2e" by adding needs: [checks, changes] to the job
definition and adding if: needs.changes.outputs.code == 'true' so the job only
runs when code changes are present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: f4037d3e-ad05-4c00-a2da-d3698480e911
📒 Files selected for processing (7)
.github/workflows/pr.yamlscripts/ollama-auth-proxy.jssrc/lib/local-inference.test.tssrc/lib/local-inference.tssrc/lib/onboard.tssrc/lib/ports.tstest/e2e-ollama-proxy.sh
- Persist proxy token to ~/.nemoclaw/ollama-proxy-token (mode 0600) so it survives process restarts and onboard --resume - Add ensureOllamaAuthProxy() called on sandbox connect to auto-restart the proxy after host reboots - Restore WSL2 compatibility: skip proxy on WSL2 where Docker reaches the host directly (#1104), use OLLAMA_CONTAINER_PORT that adapts per platform - Fix e2e test: replace invalid 0.0.0.0 reachability test with localhost liveness check (0.0.0.0 as destination routes to loopback on both Linux and macOS) Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- killStaleProxy() now verifies process command contains
"ollama-auth-proxy" before killing, avoiding termination of
unrelated services on the same port
- Initialize MOCK_PID/PROXY_PID and guard EXIT trap with ${:-}
to prevent unbound-variable errors under set -u
Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/e2e-ollama-proxy.sh (1)
17-20:⚠️ Potential issue | 🟡 MinorGuard
cleanupagainst unset PIDs underset -u.Lines [17]-[20] can throw an unbound-variable error if the script exits before Lines [47] or [55], which obscures the real failure.
🐚 Suggested fix
+MOCK_PID="" +PROXY_PID="" + cleanup() { - kill "$MOCK_PID" 2>/dev/null || true - kill "$PROXY_PID" 2>/dev/null || true + [ -n "${MOCK_PID:-}" ] && kill "$MOCK_PID" 2>/dev/null || true + [ -n "${PROXY_PID:-}" ] && kill "$PROXY_PID" 2>/dev/null || true }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e-ollama-proxy.sh` around lines 17 - 20, The cleanup function calls kill on MOCK_PID and PROXY_PID which will cause an unbound-variable error under set -u if those PIDs were never set; update cleanup to test each PID variable safely (e.g. use parameter expansion like "${MOCK_PID-}" / "${PROXY_PID-}" or an explicit [ -n "${MOCK_PID-}" ] check) before calling kill, and ensure you only attempt kill when the variable is non-empty to avoid masking the original error and to clean up processes reliably.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e-ollama-proxy.sh`:
- Around line 89-96: The test currently calls the auth-exempt endpoint /api/tags
so it doesn't validate auth; change the request in test/e2e-ollama-proxy.sh that
builds BODY (using CORRECT_AUTH, TOKEN, PROXY_PORT) to call a protected endpoint
such as /api/generate instead of /api/tags, use the correct HTTP method and JSON
payload required by /api/generate in the curl invocation (keep the -H
"Authorization: $CORRECT_AUTH" header), then assert the response BODY contains
the expected generation marker and keep the same pass/fail logic.
---
Duplicate comments:
In `@test/e2e-ollama-proxy.sh`:
- Around line 17-20: The cleanup function calls kill on MOCK_PID and PROXY_PID
which will cause an unbound-variable error under set -u if those PIDs were never
set; update cleanup to test each PID variable safely (e.g. use parameter
expansion like "${MOCK_PID-}" / "${PROXY_PID-}" or an explicit [ -n
"${MOCK_PID-}" ] check) before calling kill, and ensure you only attempt kill
when the variable is non-empty to avoid masking the original error and to clean
up processes reliably.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: cc76b675-5b18-435b-bf2e-7b78105bf763
📒 Files selected for processing (6)
package.jsonsrc/lib/local-inference.test.tssrc/lib/local-inference.tssrc/lib/onboard.tssrc/nemoclaw.tstest/e2e-ollama-proxy.sh
✅ Files skipped from review due to trivial changes (1)
- package.json
🚧 Files skipped from review as they are similar to previous changes (3)
- src/lib/local-inference.test.ts
- src/lib/onboard.ts
- src/lib/local-inference.ts
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 1535-1536: The code currently generates ollamaProxyToken in
startOllamaAuthProxy() and immediately calls persistProxyToken(ollamaProxyToken)
which can leave a stale token if the user cancels or chooses a different
provider; change this so the token is only persisted after the provider
selection/validation is committed for the Ollama branch (i.e., persist inside
the final path that sets provider to "ollama-local" / after model selection
succeeds), or alternatively add a cleanup path that clears the persisted token
when the onboarding flow leaves the Ollama branch; update references to
ensureOllamaAuthProxy() usage so it only finds a persisted token when Ollama was
actually chosen and ensure ollamaProxyToken is persisted/cleared consistently
with that flow.
- Around line 1546-1579: The current readiness check in ensureOllamaAuthProxy
uses an unauthenticated GET /api/tags (via runCapture) which can pass against an
unrelated listener; replace it with a probe that verifies the specific proxy
instance — either (A) perform an authenticated probe using the persisted token
(call runCapture with the Authorization header / token-bound endpoint so the
request only succeeds against a proxy that accepts that token, using
loadPersistedProxyToken before probing and set ollamaProxyToken accordingly) or
(B) verify the actual proxy process identity (store/read the proxy PID/command
when starting in run(...) and in ensureOllamaAuthProxy validate that PID is
running and matches the expected command). Update ensureOllamaAuthProxy to
return only when the authenticated/token-bound probe or PID/command check
succeeds; use runCapture/run and the existing OLLAMA_PROXY_PORT,
ollamaProxyToken, loadPersistedProxyToken, and scripts/ollama-auth-proxy.js
symbols to locate and change the logic.
- Around line 1493-1499: persistProxyToken currently relies on
fs.writeFileSync(..., { mode: 0o600 }) which only sets permissions on new files;
update persistProxyToken to explicitly reset permissions after writing by
calling fs.chmodSync(PROXY_TOKEN_PATH, 0o600) following the write to ensure the
token file at PROXY_TOKEN_PATH always has restrictive 0600 permissions
regardless of prior file state.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 3df64d1d-08b3-4771-8705-2cfecf8e4dfd
📒 Files selected for processing (2)
src/lib/onboard.tstest/e2e-ollama-proxy.sh
✅ Files skipped from review due to trivial changes (1)
- test/e2e-ollama-proxy.sh
- Test 5 now uses POST /api/generate (protected) instead of GET /api/tags (auth-exempt) to verify token auth actually works - Gate test-e2e-ollama-proxy CI job behind changes filter to skip on docs-only PRs Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/pr.yaml:
- Around line 62-63: Replace the mutable ref for the setup-node action: locate
the uses entry "uses: actions/setup-node@v6" in the workflow and change it to
the equivalent immutable commit SHA (e.g., "uses:
actions/setup-node@<commit-sha> # v6") so the action is pinned to a specific
commit; ensure the chosen SHA matches the v6 release you intend to track.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 1db7bb91-c8e0-468d-8b3d-1e535b29ff28
📒 Files selected for processing (2)
.github/workflows/pr.yamltest/e2e-ollama-proxy.sh
✅ Files skipped from review due to trivial changes (1)
- test/e2e-ollama-proxy.sh
…, proxy identity check - Delay persisting proxy token to disk until ollama-local is confirmed in setupInference, so backing out to another provider doesn't leave a stale token that resurrects the proxy on non-Ollama sandboxes - ensureOllamaAuthProxy now verifies the proxy accepts our token via an authenticated POST (not auth-exempt GET /api/tags), preventing false positives from stale proxies or unrelated listeners - Add chmodSync after writeFileSync to ensure 0600 on existing files (Node.js mode option only applies on file creation) Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive e2e test using real Ollama (not mocks): - Install Ollama, pull qwen2.5:0.5b (small CPU model) - Start Ollama on 127.0.0.1 only, start auth proxy on 0.0.0.0:11435 - Token auth: reject unauthenticated, reject wrong token, accept correct - Real inference: /v1/chat/completions and /api/generate through proxy - Token persistence: file exists, 0600 permissions, content matches - Proxy recovery: kill, verify dead, restart from persisted token, verify inference works after restart - Container reachability: Docker container can reach proxy at host.openshell.internal:11435, cannot reach Ollama directly on 11434 Triggered via workflow_dispatch (manual) — not on every PR due to Ollama download and CPU inference time (~3-4 min). Also updates gpu-e2e test comments for auth proxy (127.0.0.1 binding). Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses Aaron's review: "Add an end-to-end onboard/connect regression test that provisions ollama-local, persists the token, restarts/ensures the proxy, and verifies container-reachable inference through 11435." Adds Phase 4.5 to the existing gpu-e2e test (runs after onboard, before inference): - Token file persisted at ~/.nemoclaw/ollama-proxy-token - Token file permissions 600 - Auth proxy running on :11435 - Proxy rejects unauthenticated POST (401) - Proxy accepts persisted token - Container reaches proxy at host.openshell.internal:11435 - Proxy recovery: kill proxy, verify restart via ensureOllamaAuthProxy Updates Phase 5 comments: inference path now goes through auth proxy (:11435) → Ollama (:11434). Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous recovery test called `nemoclaw status` to trigger ensureOllamaAuthProxy, but that function is only called on `sandboxConnect`, not `status`. Fix by restarting the proxy directly from the persisted token (simulating what ensureOllamaAuthProxy does on connect after a reboot). Also adds verification that the recovered proxy accepts the original persisted token. Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents lingering proxy process on reused runners after Phase 4.5g manually restarts the proxy for recovery testing. Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Refresh user-facing docs against the 34 commits merged between v0.0.17 and v0.0.18. Highlights: - Replace the Ollama 0.0.0.0 binding guidance with the new authenticated reverse proxy on 127.0.0.1:11435 (#1922). - Document the compatible-endpoint provider defaulting to /v1/chat/completions and the NEMOCLAW_PREFERRED_API=openai-responses opt-in (#1984). - Add the new nemoclaw upgrade-sandboxes command with --check, --auto, and --yes flags (#1943). - Note the cross-sandbox messaging overlap warning and 409 detection in nemoclaw <name> status (#1953). - Document the messaging-token rotation auto-rebuild flow (#1967). - Cover new troubleshooting entries for the Ollama auth proxy, IPv6 localhost resolution, orphan SSH port-forward cleanup on re-onboard, and rotated messaging credentials (#1978, #1950). - Note tar failure exit code for nemoclaw debug --output (#1770) and the orphaned openshell process cleanup in nemoclaw uninstall (#1940). Also: - Extend docs/.docs-skip to exclude the experimental sandbox-mgmt shields and config commands (#1976). - Fix a sphinx-autobuild infinite rebuild loop in docs/conf.py by writing docs/project.json only when its contents change. - Bump the docs version switcher preferred entry to 0.0.18. - Regenerate nemoclaw-user-* agent skills from docs/. Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> Made-with: Cursor
## Summary Refresh user-facing documentation against the 34 commits merged between v0.0.17 and v0.0.18, bump the docs version switcher to v0.0.18, and fix a `sphinx-autobuild` infinite-rebuild loop triggered by `docs/conf.py`. ## Changes - **Ollama authenticated reverse proxy** (#1922): Replace the `0.0.0.0:11434` guidance in `docs/inference/use-local-inference.md` with the new token-gated proxy on `127.0.0.1:11435`, including persisted token, health-check exemption, and sandbox provider wiring. Replace the matching troubleshooting entry in `docs/reference/troubleshooting.md`. - **Compatible-endpoint default API path** (#1984): Document that the compatible-endpoint provider now defaults to `/v1/chat/completions` and update `NEMOCLAW_PREFERRED_API` to describe `openai-responses` as the opt-in instead of `openai-completions`. Updates in `use-local-inference.md`, `switch-inference-providers.md`, and `troubleshooting.md`. - **`nemoclaw upgrade-sandboxes` command** (#1943): Add a new reference entry in `docs/reference/commands.md` covering `--check`, `--auto`, and `--yes` flags. - **Messaging token rotation auto-rebuild** (#1967, #1953): Note the automatic rebuild behavior and cross-sandbox overlap warning in `docs/deployment/set-up-telegram-bridge.md`, `commands.md`, and `troubleshooting.md`. - **Other troubleshooting additions**: - `localhost` → `127.0.0.1` IPv6 note (#1978) - Orphan SSH port-forward cleanup on re-onboard (#1950) - Orphan `openshell` process cleanup in `nemoclaw uninstall` (#1940) - Non-zero exit on tar failure in `nemoclaw debug --output` (#1770) - **Skip list**: Extend `docs/.docs-skip` to exclude the experimental sandbox-mgmt shields and config commands feature (#1976), which was explicitly merged as not-yet-documented. - **Build stability**: `docs/conf.py` now writes `docs/project.json` only when contents change, so `make docs-live` / `sphinx-autobuild` no longer detects its own generated file as a source change and enters an infinite rebuild loop. - **Version switcher**: Bump `docs/versions1.json` and `docs/project.json` preferred entry to v0.0.18 so this refresh renders under the new version. - **Agent skills**: Regenerate `nemoclaw-user-*` skills from `docs/` with `scripts/docs-to-skills.py`. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [x] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes (ran via pre-commit hook on staged files) - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure - [x] AI-assisted — tool: Cursor --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> Made with [Cursor](https://cursor.com) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Added `nemoclaw upgrade-sandboxes` command to rebuild sandboxes when base-image digests change. * Introduced authenticated reverse proxy for local Ollama inference with token-based access control. * Automatic sandbox backup, recreation, and restore when messaging credentials are updated. * Cross-sandbox messaging token overlap detection with status warnings. * **Improvements** * Compatible-endpoint provider now defaults to `/v1/chat/completions` API path. * Enhanced troubleshooting documentation with new diagnostics sections. * **Documentation** * Updated onboarding and configuration guides. * Expanded version documentation to 0.0.18. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
The argv migration inadvertently changed OLLAMA_HOST from 127.0.0.1 to 0.0.0.0, reverting the security fix in PR #1922 which moved Ollama to localhost and added an authenticated reverse proxy on 0.0.0.0:11435. Restore 127.0.0.1 on both the non-WSL Linux and macOS install paths.
## Summary Fixes the remaining open issue in #709 — local Ollama inference from inside sandbox containers returns HTTP 403/401 even with `local-inference` policy enabled. **Root cause:** PR #1922 introduced an authenticated reverse proxy on port **11435**, but PR #2000's `local-inference` policy preset only allows port **11434** (direct Ollama) and **8000** (vLLM). On non-WSL Linux systems, container traffic is routed to port 11435 (`src/lib/local-inference.ts:21`), which the policy blocks. **Changes:** - **`local-inference.yaml`**: Add port 11435 (auth proxy) endpoint so containers can reach the proxy on non-WSL systems - **`onboard.ts`**: Upgrade proxy startup failure from a soft warning to a hard error with actionable diagnostics — prevents onboarding from completing with a broken provider config - **`policies.test.ts`**: Assert port 11435 is present in the preset to prevent regression ## Test plan - [x] All 79 policy tests pass (including updated port assertion) - [x] All 34 local-inference unit tests pass - [x] 125/129 onboard tests pass (4 pre-existing timeout failures in unrelated `--from` test) - [x] Ollama proxy recovery tests pass - [ ] Manual: onboard with Local Ollama on Linux Docker CE, verify inference works through proxy <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added support for an additional local inference network endpoint (host internal, port 11435). * **Bug Fixes** * Onboarding now fails gracefully when the local auth proxy cannot start and logs actionable diagnostic hints. * **Tests** * Updated tests and mocks to validate the new endpoint and the revised onboarding behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Closes #709. Supersedes #1140.
Ollama has no built-in auth. Today onboarding tells users to bind
0.0.0.0:11434, which exposes Ollama to the entire network with no authentication (CWE-668, flagged in #1140). This reimplements the auth proxy approach from #679 against the current TypeScript codebase.Fix — keep Ollama on localhost, add an authenticated proxy:
Auth proxy (
scripts/ollama-auth-proxy.js): Lightweight Node.js reverse proxy on0.0.0.0:11435that validates a per-instance Bearer token before forwarding to Ollama on127.0.0.1:11434. Usescrypto.timingSafeEqualfor timing-safe comparison.GET /api/tagsis exempt for container health checks.Onboard integration: Ollama binds to
127.0.0.1instead of0.0.0.0. Proxy starts automatically after Ollama with a random 24-byte token. Stale proxies from previous runs are cleaned up. Startup is verified before proceeding.Inference routing: Sandbox provider uses proxy port (11435) with the generated token as the
OPENAI_API_KEYcredential. OpenShell's L7 proxy injects the token at egress.Platform fix: macOS hint ("On macOS, local inference also depends on OpenShell host routing support.") now gated on
process.platform === "darwin"instead of always shown.Test plan
nemoclaw onboardwith Local Ollama, verify inference works through proxy🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests
Chores