
fix(security): add authenticated reverse proxy for local Ollama#1922

Merged
brandonpelfrey merged 16 commits into main from fix/ollama-auth-proxy on Apr 16, 2026

Conversation

@prekshivyas prekshivyas commented Apr 15, 2026

Contributor

Summary

Closes #709. Supersedes #1140.

Ollama has no built-in auth. Today onboarding tells users to bind 0.0.0.0:11434, which exposes Ollama to the entire network with no authentication (CWE-668, flagged in #1140). This reimplements the auth proxy approach from #679 against the current TypeScript codebase.

Fix — keep Ollama on localhost, add an authenticated proxy:

  1. Auth proxy (scripts/ollama-auth-proxy.js): Lightweight Node.js reverse proxy on 0.0.0.0:11435 that validates a per-instance Bearer token before forwarding to Ollama on 127.0.0.1:11434. Uses crypto.timingSafeEqual for timing-safe comparison. GET /api/tags is exempt for container health checks.

  2. Onboard integration: Ollama binds to 127.0.0.1 instead of 0.0.0.0. Proxy starts automatically after Ollama with a random 24-byte token. Stale proxies from previous runs are cleaned up. Startup is verified before proceeding.

  3. Inference routing: Sandbox provider uses proxy port (11435) with the generated token as the OPENAI_API_KEY credential. OpenShell's L7 proxy injects the token at egress.

  4. Platform fix: macOS hint ("On macOS, local inference also depends on OpenShell host routing support.") now gated on process.platform === "darwin" instead of always shown.
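
The proxy behavior in item 1 can be sketched roughly as follows. This is a minimal illustration and not the contents of scripts/ollama-auth-proxy.js: the function names (`tokensMatch`, `isAuthorized`, `createAuthProxy`) are hypothetical, and hashing both tokens before `crypto.timingSafeEqual` is one assumed way to satisfy its equal-length requirement.

```javascript
// Sketch of a token-gated reverse proxy in front of a localhost-only Ollama.
// Ports and the GET /api/tags exemption come from the PR description; the
// function names and the hashing step are illustrative assumptions.
const http = require("node:http");
const crypto = require("node:crypto");

// Compare the presented token with the expected one in constant time.
// Both sides are hashed first so timingSafeEqual always receives
// equal-length buffers (it throws on a length mismatch).
function tokensMatch(presented, expected) {
  const a = crypto.createHash("sha256").update(String(presented)).digest();
  const b = crypto.createHash("sha256").update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}

// Decide whether a request may pass. Only GET /api/tags skips auth.
function isAuthorized(method, url, authHeader, expectedToken) {
  const path = String(url).split("?")[0];
  if (method === "GET" && path === "/api/tags") return true;
  const match = /^Bearer (.+)$/.exec(authHeader || "");
  return match !== null && tokensMatch(match[1], expectedToken);
}

// Build (but do not start) the proxy server.
function createAuthProxy({ token, backendPort = 11434 }) {
  return http.createServer((req, res) => {
    if (!isAuthorized(req.method, req.url, req.headers.authorization, token)) {
      res.writeHead(401, { "Content-Type": "text/plain" });
      res.end("Unauthorized");
      return;
    }
    // Drop the client's credentials and Host header before forwarding.
    const headers = { ...req.headers };
    delete headers.authorization;
    delete headers.host;
    const upstream = http.request(
      { host: "127.0.0.1", port: backendPort, method: req.method, path: req.url, headers },
      (backendRes) => {
        res.writeHead(backendRes.statusCode || 502, backendRes.headers);
        backendRes.pipe(res); // stream the backend response back unchanged
      },
    );
    upstream.on("error", () => {
      if (!res.headersSent) res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Bad Gateway");
    });
    req.pipe(upstream);
  });
}
```

Under these assumptions, onboarding would call something like `createAuthProxy({ token }).listen(11435, "0.0.0.0")` with a token from `crypto.randomBytes(24)`; the token's text encoding is not stated in the PR, so hex or base64 would both be guesses.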

Test plan

  • 7 e2e tests (mock backend, no real Ollama needed): unauthenticated rejection, wrong token rejection, correct token proxying, health check exemption, POST-to-health-check requires auth
  • 34 unit tests passing (updated port expectations and error messages)
  • All pre-commit hooks pass (gitleaks, shellcheck, ESLint, commitlint, TypeScript)
  • Manual: nemoclaw onboard with Local Ollama, verify inference works through proxy

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Token-gated local authentication proxy for Ollama with persisted credential, automatic lifecycle management, and container-access routing; unauthenticated GET /api/tags remains allowed.
    • Host-side connect now ensures the proxy is running after reconnects.
  • Tests

    • Added end-to-end tests validating proxy routing and authentication behavior.
    • Updated local tests to expect proxy-mediated access.
  • Chores

    • CI job added to run proxy e2e tests; minor devDependency reordering.

Ollama has no built-in auth and binding to 0.0.0.0 exposes it to the
network (CWE-668, #1140). This adds an authenticated reverse proxy so
Ollama stays on localhost while containers can still reach it.

- Add scripts/ollama-auth-proxy.js — Node.js proxy on 0.0.0.0:11435
  that validates a per-instance Bearer token before forwarding to
  Ollama on 127.0.0.1:11434. Health check (GET /api/tags) is exempt.
  Uses crypto.timingSafeEqual for timing-safe token comparison.
- Bind Ollama to 127.0.0.1 instead of 0.0.0.0 during onboard
- Start the auth proxy after Ollama, with stale proxy cleanup and
  startup verification
- Route sandbox inference through proxy port (11435) with the
  generated token as the OpenAI API key credential
- Gate macOS hint on process.platform === "darwin"
- Add OLLAMA_PROXY_PORT (11435) to ports.ts
- Add 7 e2e tests and CI job for the proxy
- Update unit tests for new port and error messages

Reimplements the approach from #679 (closed in favor of #1104) against
the current TypeScript codebase, addressing CodeRabbit findings from
the original PR (timing-safe comparison, stale proxy cleanup, startup
verification).

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds an Ollama authentication reverse proxy, integrates it into onboarding and runtime to start/ensure/use the proxy and token, routes container reachability checks through the proxy, updates tests and adds an e2e proxy test plus a CI job to run that test.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Proxy Server Implementation: scripts/ollama-auth-proxy.js | New Node.js reverse proxy requiring OLLAMA_PROXY_TOKEN (timing-safe compare), forwards to backend at 127.0.0.1:${OLLAMA_BACKEND_PORT}, strips authorization and host, allows unauthenticated GET /api/tags, returns 401/502 on auth/backend failures. |
| Onboarding & Runtime: src/lib/onboard.ts, src/nemoclaw.ts | Adds proxy token persistence (~/.nemoclaw/ollama-proxy-token), start/kill/ensure helpers for the auth proxy, changes non-WSL Ollama binding to 127.0.0.1, uses proxy token as OPENAI_API_KEY for non-WSL ollama-local, exports ensureOllamaAuthProxy, and calls it during sandbox connect. |
| Local Inference Routing & Tests: src/lib/local-inference.ts, src/lib/local-inference.test.ts | Introduces OLLAMA_CONTAINER_PORT (uses OLLAMA_PROXY_PORT when not WSL); updates base URL and container reachability check to host.openshell.internal:${OLLAMA_CONTAINER_PORT}; tests parameterized for the port and expect proxy-related failure text. |
| Port Configuration: src/lib/ports.ts | Added exported OLLAMA_PROXY_PORT (env NEMOCLAW_OLLAMA_PROXY_PORT, default 11435). |
| E2E Test Script & CI: test/e2e-ollama-proxy.sh, .github/workflows/pr.yaml | Adds e2e script that launches a mock backend and the auth proxy to validate auth success/failure and health-check exemption; adds test-e2e-ollama-proxy job to PR workflow to run the script when code changes. |
| Miscellaneous: package.json | Reordered a devDependency entry for ajv (no version change). |
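
The port selection summarized in the table could look something like the sketch below. The env var name and the 11434/11435 defaults come from the summary; `parsePort`, `containerPort`, and `containerBaseUrl` are hypothetical helpers, and falling back to the direct Ollama port on WSL is an assumption drawn from the "uses OLLAMA_PROXY_PORT when not WSL" wording.

```javascript
// Sketch of port resolution. parsePort, containerPort and containerBaseUrl
// are hypothetical helper names, not the actual exports of ports.ts.
const OLLAMA_BACKEND_PORT = 11434; // Ollama itself, bound to 127.0.0.1

// Read a TCP port from an env value, falling back when unset or invalid.
function parsePort(raw, fallback) {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(n) && n > 0 && n < 65536 ? n : fallback;
}

const OLLAMA_PROXY_PORT = parsePort(process.env.NEMOCLAW_OLLAMA_PROXY_PORT, 11435);

// Containers talk to the proxy, except on WSL where the proxy is skipped
// (assumed here to mean the direct Ollama port is used instead).
function containerPort({ isWsl }) {
  return isWsl ? OLLAMA_BACKEND_PORT : OLLAMA_PROXY_PORT;
}

// Base address a sandbox container would use to reach local inference.
function containerBaseUrl({ isWsl }) {
  return `http://host.openshell.internal:${containerPort({ isWsl })}`;
}
```

Keeping the WSL decision in one helper means the reachability check and the provider base URL cannot drift apart, which appears to be the point of introducing OLLAMA_CONTAINER_PORT.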

Sequence Diagram(s)

sequenceDiagram
    participant Container as Client Container
    participant Proxy as Ollama Auth Proxy\n(Port 11435)
    participant Backend as Ollama Backend\n(Port 11434)

    rect rgba(0, 200, 100, 0.5)
    Note over Container,Backend: Authenticated Request Flow
    Container->>Proxy: POST /api/generate\nAuthorization: Bearer {token}
    Proxy->>Proxy: Validate token (timing-safe)
    Proxy->>Backend: Forward request (strip auth/host)
    Backend->>Proxy: Response stream
    Proxy->>Container: Response stream
    end

    rect rgba(200, 100, 0, 0.5)
    Note over Container,Proxy: Unauthorized Request Flow
    Container->>Proxy: POST /api/generate (no/invalid token)
    Proxy->>Container: HTTP 401 Unauthorized
    end

    rect rgba(100, 150, 200, 0.5)
    Note over Container,Backend: Health Check (Exempted)
    Container->>Proxy: GET /api/tags (no auth)
    Proxy->>Backend: Forward health check
    Backend->>Proxy: Model list
    Proxy->>Container: Model list
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I guard the tunnel on eleven-three-five,

Tokens snug while health checks may thrive.
I strip the headers, forward the call,
Reject the pretenders, let models stand tall.
A hopping proxy, so onboarding won't stall.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding an authenticated reverse proxy for local Ollama to improve security and container reachability. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #709's objective of enabling containerized components to reach local Ollama without binding to 0.0.0.0 via a lightweight authenticated proxy. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the authenticated proxy implementation: proxy script, onboarding integration, inference routing updates, tests, and CI job; no unrelated modifications. |


Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wscurran wscurran added security Potential vulnerability, unsafe behavior, or access risk Local Models labels Apr 15, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
.github/workflows/pr.yaml (1)

54-65: Gate this E2E job behind the existing changes filter.

Right now this runs on docs-only PRs too, so it bypasses the fast path the workflow already has for expensive jobs. If that is not intentional, add the same needs: [checks, changes] / if: needs.changes.outputs.code == 'true' guard used by sandbox-images-and-e2e.

♻️ Suggested change
  test-e2e-ollama-proxy:
+   needs: [checks, changes]
+   if: needs.changes.outputs.code == 'true'
    runs-on: ubuntu-latest
    timeout-minutes: 5
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/pr.yaml around lines 54 - 65, The E2E job
"test-e2e-ollama-proxy" should be gated by the existing changes filter: add the
same needs and conditional used by "sandbox-images-and-e2e" by adding needs:
[checks, changes] to the job definition and adding if:
needs.changes.outputs.code == 'true' so the job only runs when code changes are
present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 1487-1522: Persist the generated ollamaProxyToken into the
onboarding session and handle resume: when startOllamaAuthProxy generates the
token, save it to onboardSession.ollamaProxyToken (or a similarly named field)
and reuse that value in getOllamaProxyToken instead of only the in-memory
variable; when resuming (e.g. in setupInference / after setupNim is skipped)
detect if the session provider is "ollama-local" and if
onboardSession.ollamaProxyToken exists reuse it and ensure the proxy is running,
otherwise call startOllamaAuthProxy to recreate the proxy and update
onboardSession.ollamaProxyToken so resumed runs always have a valid token and
proxy.
- Around line 1491-1499: The cleanup currently kills any PID from
runCapture(`lsof -ti :${OLLAMA_PROXY_PORT}`), which may terminate unrelated
services; instead track and validate the proxy process before killing: when you
spawn the proxy (where the proxy is started), record its PID (or write it to a
known file) and on cleanup read that PID and verify its command line contains
the proxy marker (e.g., "ollama-auth-proxy.js") using a ps lookup (or filter
lsof output by command) before calling run(`kill ...`); update the cleanup block
(the code using run, runCapture and OLLAMA_PROXY_PORT) to only kill the verified
PID and fall back to no-op if verification fails.

In `@test/e2e-ollama-proxy.sh`:
- Around line 17-20: The cleanup function and EXIT trap must guard against unset
PID variables to avoid unbound-variable errors under set -u; update cleanup (and
any EXIT trap invocation) to safely reference MOCK_PID and PROXY_PID using
parameter expansion or existence checks (e.g. ${MOCK_PID-} / ${PROXY_PID-} or if
[ -n "${MOCK_PID-}" ] ) before calling kill, so cleanup() only attempts kill
when the PID variables are set and non-empty.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f4037d3e-ad05-4c00-a2da-d3698480e911

📥 Commits

Reviewing files that changed from the base of the PR and between f079a37 and 2f505e1.

📒 Files selected for processing (7)
  • .github/workflows/pr.yaml
  • scripts/ollama-auth-proxy.js
  • src/lib/local-inference.test.ts
  • src/lib/local-inference.ts
  • src/lib/onboard.ts
  • src/lib/ports.ts
  • test/e2e-ollama-proxy.sh

prekshivyas and others added 2 commits April 15, 2026 14:49
- Persist proxy token to ~/.nemoclaw/ollama-proxy-token (mode 0600)
  so it survives process restarts and onboard --resume
- Add ensureOllamaAuthProxy() called on sandbox connect to auto-restart
  the proxy after host reboots
- Restore WSL2 compatibility: skip proxy on WSL2 where Docker reaches
  the host directly (#1104), use OLLAMA_CONTAINER_PORT that adapts
  per platform
- Fix e2e test: replace invalid 0.0.0.0 reachability test with
  localhost liveness check (0.0.0.0 as destination routes to loopback
  on both Linux and macOS)

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- killStaleProxy() now verifies process command contains
  "ollama-auth-proxy" before killing, avoiding termination of
  unrelated services on the same port
- Initialize MOCK_PID/PROXY_PID and guard EXIT trap with ${:-}
  to prevent unbound-variable errors under set -u

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
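
The identity check described in this commit can be sketched as below. The helper names are hypothetical and the real killStaleProxy() may be structured differently; the only detail taken from the commit is that the process command must contain "ollama-auth-proxy" before anything is killed.

```javascript
// Sketch: only kill a PID found on the proxy port if it is verifiably the
// proxy. Helper names are illustrative, not the PR's actual functions.
const { execFileSync } = require("node:child_process");

// True when a command line looks like the auth proxy script.
function isProxyCommand(commandLine) {
  return typeof commandLine === "string" && commandLine.includes("ollama-auth-proxy");
}

// Look up a PID's command line with ps; returns null if it cannot be read.
function commandForPid(pid) {
  try {
    return execFileSync("ps", ["-p", String(pid), "-o", "command="], { encoding: "utf8" }).trim();
  } catch {
    return null;
  }
}

// Kill the PID only if verification succeeds; otherwise leave it alone.
// The lookup is injectable so the decision logic can be tested without ps.
function killIfStaleProxy(pid, lookup = commandForPid) {
  if (!Number.isInteger(pid) || pid <= 0) return false;
  if (!isProxyCommand(lookup(pid))) return false;
  try {
    process.kill(pid, "SIGTERM");
    return true;
  } catch {
    return false; // already gone, or not permitted to signal it
  }
}
```

Failing closed (returning `false` when the command line cannot be read) matches the reviewer's suggestion to fall back to a no-op when verification fails.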

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
test/e2e-ollama-proxy.sh (1)

17-20: ⚠️ Potential issue | 🟡 Minor

Guard cleanup against unset PIDs under set -u.

Lines [17]-[20] can throw an unbound-variable error if the script exits before Lines [47] or [55], which obscures the real failure.

🐚 Suggested fix
+MOCK_PID=""
+PROXY_PID=""
+
 cleanup() {
-  kill "$MOCK_PID" 2>/dev/null || true
-  kill "$PROXY_PID" 2>/dev/null || true
+  [ -n "${MOCK_PID:-}" ] && kill "$MOCK_PID" 2>/dev/null || true
+  [ -n "${PROXY_PID:-}" ] && kill "$PROXY_PID" 2>/dev/null || true
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e-ollama-proxy.sh` around lines 17 - 20, The cleanup function calls
kill on MOCK_PID and PROXY_PID which will cause an unbound-variable error under
set -u if those PIDs were never set; update cleanup to test each PID variable
safely (e.g. use parameter expansion like "${MOCK_PID-}" / "${PROXY_PID-}" or an
explicit [ -n "${MOCK_PID-}" ] check) before calling kill, and ensure you only
attempt kill when the variable is non-empty to avoid masking the original error
and to clean up processes reliably.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e-ollama-proxy.sh`:
- Around line 89-96: The test currently calls the auth-exempt endpoint /api/tags
so it doesn't validate auth; change the request in test/e2e-ollama-proxy.sh that
builds BODY (using CORRECT_AUTH, TOKEN, PROXY_PORT) to call a protected endpoint
such as /api/generate instead of /api/tags, use the correct HTTP method and JSON
payload required by /api/generate in the curl invocation (keep the -H
"Authorization: $CORRECT_AUTH" header), then assert the response BODY contains
the expected generation marker and keep the same pass/fail logic.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: cc76b675-5b18-435b-bf2e-7b78105bf763

📥 Commits

Reviewing files that changed from the base of the PR and between 8ba4eca and 1916df2.

📒 Files selected for processing (6)
  • package.json
  • src/lib/local-inference.test.ts
  • src/lib/local-inference.ts
  • src/lib/onboard.ts
  • src/nemoclaw.ts
  • test/e2e-ollama-proxy.sh
✅ Files skipped from review due to trivial changes (1)
  • package.json
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/lib/local-inference.test.ts
  • src/lib/onboard.ts
  • src/lib/local-inference.ts


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 1535-1536: The code currently generates ollamaProxyToken in
startOllamaAuthProxy() and immediately calls persistProxyToken(ollamaProxyToken)
which can leave a stale token if the user cancels or chooses a different
provider; change this so the token is only persisted after the provider
selection/validation is committed for the Ollama branch (i.e., persist inside
the final path that sets provider to "ollama-local" / after model selection
succeeds), or alternatively add a cleanup path that clears the persisted token
when the onboarding flow leaves the Ollama branch; update references to
ensureOllamaAuthProxy() usage so it only finds a persisted token when Ollama was
actually chosen and ensure ollamaProxyToken is persisted/cleared consistently
with that flow.
- Around line 1546-1579: The current readiness check in ensureOllamaAuthProxy
uses an unauthenticated GET /api/tags (via runCapture) which can pass against an
unrelated listener; replace it with a probe that verifies the specific proxy
instance — either (A) perform an authenticated probe using the persisted token
(call runCapture with the Authorization header / token-bound endpoint so the
request only succeeds against a proxy that accepts that token, using
loadPersistedProxyToken before probing and set ollamaProxyToken accordingly) or
(B) verify the actual proxy process identity (store/read the proxy PID/command
when starting in run(...) and in ensureOllamaAuthProxy validate that PID is
running and matches the expected command). Update ensureOllamaAuthProxy to
return only when the authenticated/token-bound probe or PID/command check
succeeds; use runCapture/run and the existing OLLAMA_PROXY_PORT,
ollamaProxyToken, loadPersistedProxyToken, and scripts/ollama-auth-proxy.js
symbols to locate and change the logic.
- Around line 1493-1499: persistProxyToken currently relies on
fs.writeFileSync(..., { mode: 0o600 }) which only sets permissions on new files;
update persistProxyToken to explicitly reset permissions after writing by
calling fs.chmodSync(PROXY_TOKEN_PATH, 0o600) following the write to ensure the
token file at PROXY_TOKEN_PATH always has restrictive 0600 permissions
regardless of prior file state.
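
The permissions fix in the last comment can be illustrated with a short sketch. The two function names mirror the ones mentioned in the review, but the bodies are assumptions, and the token path is passed in as a parameter here instead of using the PROXY_TOKEN_PATH constant.

```javascript
// Sketch of token persistence with explicit permission reset.
const fs = require("node:fs");
const path = require("node:path");

// Write the token and force 0600 even if the file already existed with
// looser permissions (writeFileSync's mode only applies on creation).
function persistProxyToken(tokenPath, token) {
  fs.mkdirSync(path.dirname(tokenPath), { recursive: true });
  fs.writeFileSync(tokenPath, token, { mode: 0o600 });
  fs.chmodSync(tokenPath, 0o600);
}

// Read the token back, or return null when nothing usable is persisted.
function loadPersistedProxyToken(tokenPath) {
  try {
    const token = fs.readFileSync(tokenPath, "utf8").trim();
    return token.length > 0 ? token : null;
  } catch {
    return null;
  }
}
```

A pre-existing 0644 file is the case the extra `chmodSync` covers: without it, the rewritten token would stay world-readable.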

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3df64d1d-08b3-4771-8705-2cfecf8e4dfd

📥 Commits

Reviewing files that changed from the base of the PR and between 1916df2 and adfb16e.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/e2e-ollama-proxy.sh
✅ Files skipped from review due to trivial changes (1)
  • test/e2e-ollama-proxy.sh

- Test 5 now uses POST /api/generate (protected) instead of GET
  /api/tags (auth-exempt) to verify token auth actually works
- Gate test-e2e-ollama-proxy CI job behind changes filter to skip
  on docs-only PRs

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr.yaml:
- Around line 62-63: Replace the mutable ref for the setup-node action: locate
the uses entry "uses: actions/setup-node@v6" in the workflow and change it to
the equivalent immutable commit SHA (e.g., "uses:
actions/setup-node@<commit-sha> # v6") so the action is pinned to a specific
commit; ensure the chosen SHA matches the v6 release you intend to track.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 1db7bb91-c8e0-468d-8b3d-1e535b29ff28

📥 Commits

Reviewing files that changed from the base of the PR and between adfb16e and 962caa6.

📒 Files selected for processing (2)
  • .github/workflows/pr.yaml
  • test/e2e-ollama-proxy.sh
✅ Files skipped from review due to trivial changes (1)
  • test/e2e-ollama-proxy.sh

…, proxy identity check

- Delay persisting proxy token to disk until ollama-local is confirmed
  in setupInference, so backing out to another provider doesn't leave
  a stale token that resurrects the proxy on non-Ollama sandboxes
- ensureOllamaAuthProxy now verifies the proxy accepts our token via
  an authenticated POST (not auth-exempt GET /api/tags), preventing
  false positives from stale proxies or unrelated listeners
- Add chmodSync after writeFileSync to ensure 0600 on existing files
  (Node.js mode option only applies on file creation)

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
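
One way to picture the authenticated readiness probe from this commit is the sketch below. It is not the PR's ensureOllamaAuthProxy: the helper names are hypothetical, `/api/show` is only a placeholder for any non-exempt route, and pairing the authenticated POST with an unauthenticated one is an added assumption to rule out listeners that answer every request.

```javascript
// Sketch: decide whether the listener on the proxy port is a token-gated
// proxy that accepts our token. Names and the probe path are assumptions.
const http = require("node:http");

// POST to a protected path and resolve with the HTTP status, or null when
// the connection fails or times out.
function probeStatus(port, token, path = "/api/show") {
  return new Promise((resolve) => {
    const headers = { "Content-Type": "application/json" };
    if (token) headers.Authorization = `Bearer ${token}`;
    const req = http.request(
      { host: "127.0.0.1", port, method: "POST", path, headers, timeout: 2000 },
      (res) => {
        res.resume(); // discard the body, only the status matters
        resolve(res.statusCode ?? null);
      },
    );
    req.on("timeout", () => req.destroy());
    req.on("error", () => resolve(null));
    req.end("{}");
  });
}

// Ours if unauthenticated requests get 401 and our token does not.
function acceptsOurToken(statusWithToken, statusWithoutToken) {
  return statusWithoutToken === 401 && statusWithToken !== null && statusWithToken !== 401;
}

async function proxyIsOurs(port, token) {
  const [withToken, withoutToken] = await Promise.all([
    probeStatus(port, token),
    probeStatus(port, null),
  ]);
  return acceptsOurToken(withToken, withoutToken);
}
```

In this sketch a 502 with the token still counts as "ours", since it means the proxy accepted the token but Ollama was unreachable; whether the real implementation treats that case the same way is not stated in the commit.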
prekshivyas and others added 7 commits April 16, 2026 08:53
Comprehensive e2e test using real Ollama (not mocks):
- Install Ollama, pull qwen2.5:0.5b (small CPU model)
- Start Ollama on 127.0.0.1 only, start auth proxy on 0.0.0.0:11435
- Token auth: reject unauthenticated, reject wrong token, accept correct
- Real inference: /v1/chat/completions and /api/generate through proxy
- Token persistence: file exists, 0600 permissions, content matches
- Proxy recovery: kill, verify dead, restart from persisted token,
  verify inference works after restart
- Container reachability: Docker container can reach proxy at
  host.openshell.internal:11435, cannot reach Ollama directly on 11434

Triggered via workflow_dispatch (manual) — not on every PR due to
Ollama download and CPU inference time (~3-4 min).

Also updates gpu-e2e test comments for auth proxy (127.0.0.1 binding).

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses Aaron's review: "Add an end-to-end onboard/connect regression
test that provisions ollama-local, persists the token, restarts/ensures
the proxy, and verifies container-reachable inference through 11435."

Adds Phase 4.5 to the existing gpu-e2e test (runs after onboard,
before inference):
- Token file persisted at ~/.nemoclaw/ollama-proxy-token
- Token file permissions 600
- Auth proxy running on :11435
- Proxy rejects unauthenticated POST (401)
- Proxy accepts persisted token
- Container reaches proxy at host.openshell.internal:11435
- Proxy recovery: kill proxy, verify restart via ensureOllamaAuthProxy

Updates Phase 5 comments: inference path now goes through auth proxy
(:11435) → Ollama (:11434).

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous recovery test called `nemoclaw status` to trigger
ensureOllamaAuthProxy, but that function is only called on
`sandboxConnect`, not `status`. Fix by restarting the proxy
directly from the persisted token (simulating what
ensureOllamaAuthProxy does on connect after a reboot).

Also adds verification that the recovered proxy accepts the
original persisted token.

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents lingering proxy process on reused runners after Phase 4.5g
manually restarts the proxy for recovery testing.

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@brandonpelfrey brandonpelfrey merged commit 4f30b9d into main Apr 16, 2026
9 checks passed
@prekshivyas prekshivyas deleted the fix/ollama-auth-proxy branch April 16, 2026 17:36
miyoungc added a commit that referenced this pull request Apr 17, 2026
Refresh user-facing docs against the 34 commits merged between v0.0.17
and v0.0.18. Highlights:

- Replace the Ollama 0.0.0.0 binding guidance with the new authenticated
  reverse proxy on 127.0.0.1:11435 (#1922).
- Document the compatible-endpoint provider defaulting to
  /v1/chat/completions and the NEMOCLAW_PREFERRED_API=openai-responses
  opt-in (#1984).
- Add the new nemoclaw upgrade-sandboxes command with --check, --auto,
  and --yes flags (#1943).
- Note the cross-sandbox messaging overlap warning and 409 detection in
  nemoclaw <name> status (#1953).
- Document the messaging-token rotation auto-rebuild flow (#1967).
- Cover new troubleshooting entries for the Ollama auth proxy, IPv6
  localhost resolution, orphan SSH port-forward cleanup on re-onboard,
  and rotated messaging credentials (#1978, #1950).
- Note tar failure exit code for nemoclaw debug --output (#1770) and the
  orphaned openshell process cleanup in nemoclaw uninstall (#1940).

Also:

- Extend docs/.docs-skip to exclude the experimental sandbox-mgmt
  shields and config commands (#1976).
- Fix a sphinx-autobuild infinite rebuild loop in docs/conf.py by
  writing docs/project.json only when its contents change.
- Bump the docs version switcher preferred entry to 0.0.18.
- Regenerate nemoclaw-user-* agent skills from docs/.

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
Made-with: Cursor
miyoungc added a commit that referenced this pull request Apr 17, 2026
## Summary

Refresh user-facing documentation against the 34 commits merged between
v0.0.17 and v0.0.18, bump the docs version switcher to v0.0.18, and fix
a `sphinx-autobuild` infinite-rebuild loop triggered by `docs/conf.py`.

## Changes

- **Ollama authenticated reverse proxy** (#1922): Replace the
  `0.0.0.0:11434` guidance in `docs/inference/use-local-inference.md` with
  the new token-gated proxy on `127.0.0.1:11435`, including persisted
  token, health-check exemption, and sandbox provider wiring. Replace the
  matching troubleshooting entry in `docs/reference/troubleshooting.md`.
- **Compatible-endpoint default API path** (#1984): Document that the
  compatible-endpoint provider now defaults to `/v1/chat/completions` and
  update `NEMOCLAW_PREFERRED_API` to describe `openai-responses` as the
  opt-in instead of `openai-completions`. Updates in
  `use-local-inference.md`, `switch-inference-providers.md`, and
  `troubleshooting.md`.
- **`nemoclaw upgrade-sandboxes` command** (#1943): Add a new reference
  entry in `docs/reference/commands.md` covering `--check`, `--auto`, and
  `--yes` flags.
- **Messaging token rotation auto-rebuild** (#1967, #1953): Note the
  automatic rebuild behavior and cross-sandbox overlap warning in
  `docs/deployment/set-up-telegram-bridge.md`, `commands.md`, and
  `troubleshooting.md`.
- **Other troubleshooting additions**:
  - `localhost` → `127.0.0.1` IPv6 note (#1978)
  - Orphan SSH port-forward cleanup on re-onboard (#1950)
  - Orphan `openshell` process cleanup in `nemoclaw uninstall` (#1940)
  - Non-zero exit on tar failure in `nemoclaw debug --output` (#1770)
- **Skip list**: Extend `docs/.docs-skip` to exclude the experimental
  sandbox-mgmt shields and config commands feature (#1976), which was
  explicitly merged as not-yet-documented.
- **Build stability**: `docs/conf.py` now writes `docs/project.json`
  only when contents change, so `make docs-live` / `sphinx-autobuild` no
  longer detects its own generated file as a source change and enters an
  infinite rebuild loop.
- **Version switcher**: Bump `docs/versions1.json` and
  `docs/project.json` preferred entry to v0.0.18 so this refresh renders
  under the new version.
- **Agent skills**: Regenerate `nemoclaw-user-*` skills from `docs/`
  with `scripts/docs-to-skills.py`.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes (ran via pre-commit hook on
staged files)
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure

- [x] AI-assisted — tool: Cursor

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

Made with [Cursor](https://cursor.com)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added `nemoclaw upgrade-sandboxes` command to rebuild sandboxes when
    base-image digests change.
  * Introduced authenticated reverse proxy for local Ollama inference with
    token-based access control.
  * Automatic sandbox backup, recreation, and restore when messaging
    credentials are updated.
  * Cross-sandbox messaging token overlap detection with status warnings.

* **Improvements**
  * Compatible-endpoint provider now defaults to `/v1/chat/completions`
    API path.
  * Enhanced troubleshooting documentation with new diagnostics sections.

* **Documentation**
  * Updated onboarding and configuration guides.
  * Expanded version documentation to 0.0.18.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
ericksoa added a commit that referenced this pull request Apr 20, 2026


The argv migration inadvertently changed OLLAMA_HOST from 127.0.0.1 to
0.0.0.0, reverting the security fix in PR #1922 which moved Ollama to
localhost and added an authenticated reverse proxy on 0.0.0.0:11435.
Restore 127.0.0.1 on both the non-WSL Linux and macOS install paths.
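
A small guard of the kind that could catch this regression is sketched below. `ollamaServeEnv` and `bindsToLoopback` are hypothetical names, and how the real install paths pass OLLAMA_HOST (argv, env, or a service file) is not shown in this commit message.

```javascript
// Sketch: build the environment for `ollama serve` so it listens on
// loopback only, plus a check that rejects wildcard binds.
// Both helpers are illustrative, not functions from the codebase.
function ollamaServeEnv(baseEnv = process.env, port = 11434) {
  return { ...baseEnv, OLLAMA_HOST: `127.0.0.1:${port}` };
}

// Returns true only for an explicit 127.0.0.1 bind. An unset or empty
// OLLAMA_HOST is treated as unverified and returns false.
function bindsToLoopback(env) {
  const host = String(env.OLLAMA_HOST || "");
  return host === "127.0.0.1" || host.startsWith("127.0.0.1:");
}
```

A unit test asserting `bindsToLoopback(ollamaServeEnv())` on each install path would fail as soon as a refactor reintroduced `0.0.0.0`.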
ericksoa added a commit that referenced this pull request Apr 21, 2026
## Summary

Fixes the remaining open issue in #709 — local Ollama inference from
inside sandbox containers returns HTTP 403/401 even with
`local-inference` policy enabled.

**Root cause:** PR #1922 introduced an authenticated reverse proxy on
port **11435**, but PR #2000's `local-inference` policy preset only
allows port **11434** (direct Ollama) and **8000** (vLLM). On non-WSL
Linux systems, container traffic is routed to port 11435
(`src/lib/local-inference.ts:21`), which the policy blocks.

**Changes:**
- **`local-inference.yaml`**: Add port 11435 (auth proxy) endpoint so
containers can reach the proxy on non-WSL systems
- **`onboard.ts`**: Upgrade proxy startup failure from a soft warning to
a hard error with actionable diagnostics — prevents onboarding from
completing with a broken provider config
- **`policies.test.ts`**: Assert port 11435 is present in the preset to
prevent regression
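
The regression assertion described for `policies.test.ts` might be approximated as follows. The preset shape shown here (an `endpoints` array of host/port pairs) is purely hypothetical, since the actual schema of `local-inference.yaml` is not included in this PR text.

```javascript
// Sketch: check that a policy preset allows every port local inference
// needs. The object below is a made-up stand-in for the parsed YAML.
const localInferencePreset = {
  endpoints: [
    { host: "host.openshell.internal", port: 11434 }, // direct Ollama
    { host: "host.openshell.internal", port: 11435 }, // auth proxy
    { host: "host.openshell.internal", port: 8000 }, // vLLM
  ],
};

// Collect the set of ports the preset allows.
function allowedPorts(preset) {
  return new Set((preset.endpoints || []).map((endpoint) => endpoint.port));
}

// List required ports that the preset does not allow.
function missingPorts(preset, required) {
  const allowed = allowedPorts(preset);
  return required.filter((port) => !allowed.has(port));
}
```

Asserting that `missingPorts(preset, [11434, 11435, 8000])` is empty would have flagged the original gap, where the preset allowed 11434 and 8000 but not the proxy port.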

## Test plan

- [x] All 79 policy tests pass (including updated port assertion)
- [x] All 34 local-inference unit tests pass
- [x] 125/129 onboard tests pass (4 pre-existing timeout failures in
unrelated `--from` test)
- [x] Ollama proxy recovery tests pass
- [ ] Manual: onboard with Local Ollama on Linux Docker CE, verify
inference works through proxy

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Added support for an additional local inference network endpoint (host
    internal, port 11435).

* **Bug Fixes**
  * Onboarding now fails gracefully when the local auth proxy cannot start
    and logs actionable diagnostic hints.

* **Tests**
  * Updated tests and mocks to validate the new endpoint and the revised
    onboarding behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran added area: cli Command line interface, flags, terminal UX, or output area: local-models Local model providers, downloads, launch, or connectivity area: providers Inference provider integrations and provider behavior bug-fix PR fixes a bug or regression and removed priority: high labels Jun 3, 2026

Labels

  • area: cli (Command line interface, flags, terminal UX, or output)
  • area: local-models (Local model providers, downloads, launch, or connectivity)
  • area: providers (Inference provider integrations and provider behavior)
  • bug-fix (PR fixes a bug or regression)
  • security (Potential vulnerability, unsafe behavior, or access risk)

Development

Successfully merging this pull request may close these issues.

[Ubuntu + Docker CE] Linux onboarding for Ollama does not explain required container-reachable bind address

5 participants