fix(onboard): validate proxy token before trusting persisted file#2642
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughWhen a persisted proxy PID exists, ensureOllamaAuthProxy() now probes the running proxy with the persisted bearer token. If the probe fails the proxy is killed and restarted with the persisted token; startup readiness uses up to 10 authenticated probes instead of a fixed delay. Tests and an e2e script cover token-divergence scenarios. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Application
participant Ensure as ensureOllamaAuthProxy()
participant File as Persisted Token File
participant Probe as isProxyTokenValid()
participant Curl as curl (probe)
participant Proxy as Ollama Auth Proxy
participant Kill as killStaleProxy()
participant Start as spawnOllamaAuthProxy()
App->>Ensure: invoke ensureOllamaAuthProxy()
Ensure->>File: load persisted token
File-->>Ensure: return token
Ensure->>Ensure: check recorded PID / process
alt proxy process running
Ensure->>Probe: probe proxy with token
Probe->>Curl: curl -H "Authorization: Bearer {token}" /v1/models (timeout)
Curl->>Proxy: HTTP GET /v1/models
alt probe succeeds
Proxy-->>Curl: 200 OK
Curl-->>Probe: success
Probe-->>Ensure: token valid
Ensure-->>App: reuse existing proxy
else probe fails (401/error)
Proxy-->>Curl: 401 / error
Curl-->>Probe: failure
Probe-->>Ensure: token invalid
Ensure->>Kill: killStaleProxy()
Kill->>Proxy: terminate old process
Ensure->>Start: start proxy with persisted token
Start-->>Proxy: new auth proxy running
Ensure-->>App: proxy restarted with file token
end
else no running proxy
Ensure->>Start: start proxy with persisted token
Start-->>Proxy: new auth proxy running
Ensure-->>App: proxy started
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~18 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard-ollama-proxy.ts (1)
197-209:⚠️ Potential issue | 🟠 MajorWait for the replacement proxy to become ready before returning.
The mismatch path still hands control back after a fixed one-second sleep, but both callers use
ensureOllamaAuthProxy()immediately before reading the token and sending traffic. Poll untilisProxyTokenValid(token)succeeds, or fail fast after a timeout, so the restart path doesn't reintroduce the same race you’re fixing.🔧 Possible fix
if (isOllamaProxyProcess(pid)) { if (isProxyTokenValid(token)) { ollamaProxyToken = token; return; } // Proxy is running but rejects the persisted token — restart it. killStaleProxy(); } // Proxy not running (or token mismatch) — restart with the persisted token. ollamaProxyToken = token; spawnOllamaAuthProxy(token); - sleep(1); + for (let attempt = 0; attempt < 10; attempt++) { + if (isProxyTokenValid(token)) return; + sleep(1); + } + console.error(" Error: Ollama auth proxy did not become ready after restart.");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard-ollama-proxy.ts` around lines 197 - 209, The restart path currently returns after a fixed sleep causing a race; update ensureOllamaAuthProxy so that after spawnOllamaAuthProxy(token) it actively polls isProxyTokenValid(token) (e.g., loop with short sleep intervals) until it becomes true or a bounded timeout elapses, set ollamaProxyToken only after validation succeeds, and on timeout fail fast (throw or return an error) instead of returning immediately; keep references to isProxyTokenValid, spawnOllamaAuthProxy, killStaleProxy, ensureOllamaAuthProxy and ollamaProxyToken to locate the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-ollama-auth-proxy-e2e.sh`:
- Around line 450-505: The divergence check must be a hard assertion: replace
the soft log that occurs when OLD_TOKEN_OK/NEW_TOKEN_OK doesn't show divergence
with an explicit failure using fail "Divergence not reproduced — aborting test",
and remove unused capture variables OLD_TOKEN_RESULT, NEW_TOKEN_RESULT and
FIXED_RESULT; instead run the curl probes directly to set OLD_TOKEN_OK and
NEW_TOKEN_OK (and later FIXED_OK) without assigning their stdout/stderr to
variables, then assert that OLD_TOKEN_OK is true and NEW_TOKEN_OK is false
before proceeding to the simulated ensureOllamaAuthProxy restart logic.
---
Outside diff comments:
In `@src/lib/onboard-ollama-proxy.ts`:
- Around line 197-209: The restart path currently returns after a fixed sleep
causing a race; update ensureOllamaAuthProxy so that after
spawnOllamaAuthProxy(token) it actively polls isProxyTokenValid(token) (e.g.,
loop with short sleep intervals) until it becomes true or a bounded timeout
elapses, set ollamaProxyToken only after validation succeeds, and on timeout
fail fast (throw or return an error) instead of returning immediately; keep
references to isProxyTokenValid, spawnOllamaAuthProxy, killStaleProxy,
ensureOllamaAuthProxy and ollamaProxyToken to locate the changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 22ac496a-0a84-4838-a76b-b3fa5b8e004f
📒 Files selected for processing (2)
src/lib/onboard-ollama-proxy.tstest/e2e/test-ollama-auth-proxy-e2e.sh
25c7294 to
fa442e7
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/e2e/test-ollama-auth-proxy-e2e.sh (1)
456-468:⚠️ Potential issue | 🟠 MajorMake the divergence phase fail fast and drop the unused curl captures.
This still lets the test continue when the old/new token split is not reproduced, so the regression can go green without ever hitting the restart path.
OLD_TOKEN_RESULT,NEW_TOKEN_RESULT, andFIXED_RESULTare also unused and trip ShellCheck. As per coding guidelines:**/*.sh: Shell scripts must include ShellCheck compliance (.shellcheckrcenforced at root), be formatted withshfmt, have shebangs, and be executable.🧪 Tighten the regression check
- OLD_TOKEN_RESULT=$(curl -sf --max-time 3 \ - -H "Authorization: Bearer $ORIGINAL_TOKEN" \ - "http://localhost:${PROXY_PORT}/v1/models" 2>&1) && OLD_TOKEN_OK=true || OLD_TOKEN_OK=false + if curl -sf --max-time 3 \ + -H "Authorization: Bearer $ORIGINAL_TOKEN" \ + "http://localhost:${PROXY_PORT}/v1/models" >/dev/null 2>&1; then + OLD_TOKEN_OK=true + else + OLD_TOKEN_OK=false + fi - NEW_TOKEN_RESULT=$(curl -sf --max-time 3 \ - -H "Authorization: Bearer $DIVERGENT_TOKEN" \ - "http://localhost:${PROXY_PORT}/v1/models" 2>&1) && NEW_TOKEN_OK=true || NEW_TOKEN_OK=false + if curl -sf --max-time 3 \ + -H "Authorization: Bearer $DIVERGENT_TOKEN" \ + "http://localhost:${PROXY_PORT}/v1/models" >/dev/null 2>&1; then + NEW_TOKEN_OK=true + else + NEW_TOKEN_OK=false + fi if [ "$OLD_TOKEN_OK" = true ] && [ "$NEW_TOKEN_OK" = false ]; then pass "Confirmed: proxy running with old token, rejects new token (divergence exists)" else - info "Token state: old=$OLD_TOKEN_OK new=$NEW_TOKEN_OK (may already be fixed)" + fail "Token divergence phase did not reproduce the expected old/new token split" fi @@ - FIXED_RESULT=$(curl -sf --max-time 3 \ - -H "Authorization: Bearer $DIVERGENT_TOKEN" \ - "http://localhost:${PROXY_PORT}/v1/models" 2>&1) && FIXED_OK=true || FIXED_OK=false + if curl -sf --max-time 3 \ + -H "Authorization: Bearer $DIVERGENT_TOKEN" \ + "http://localhost:${PROXY_PORT}/v1/models" >/dev/null 2>&1; then + FIXED_OK=true + else + FIXED_OK=false + fiAlso applies to: 497-499
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-ollama-auth-proxy-e2e.sh` around lines 456 - 468, The divergence check currently captures curl output into OLD_TOKEN_RESULT/NEW_TOKEN_RESULT (unused) and lets the test continue if divergence isn't reproduced; remove those unused captures and make the check fail fast: call curl with -sf --max-time 3 and only use its exit status to set OLD_TOKEN_OK and NEW_TOKEN_OK (or test curl directly), then if the expected state (OLD_TOKEN_OK=true and NEW_TOKEN_OK=false) is not met, call fail/exit 1 immediately (instead of info) so the regression stops and subsequent restart path runs; also remove any references to FIXED_RESULT to avoid ShellCheck warnings and keep the pass/info logic only for the success case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/e2e/test-ollama-auth-proxy-e2e.sh`:
- Around line 456-468: The divergence check currently captures curl output into
OLD_TOKEN_RESULT/NEW_TOKEN_RESULT (unused) and lets the test continue if
divergence isn't reproduced; remove those unused captures and make the check
fail fast: call curl with -sf --max-time 3 and only use its exit status to set
OLD_TOKEN_OK and NEW_TOKEN_OK (or test curl directly), then if the expected
state (OLD_TOKEN_OK=true and NEW_TOKEN_OK=false) is not met, call fail/exit 1
immediately (instead of info) so the regression stops and subsequent restart
path runs; also remove any references to FIXED_RESULT to avoid ShellCheck
warnings and keep the pass/info logic only for the success case.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c6342c5f-6299-4d48-aa38-a2813640cac9
📒 Files selected for processing (3)
src/lib/onboard-ollama-proxy.tstest/e2e/test-ollama-auth-proxy-e2e.shtest/ollama-proxy-recovery.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/onboard-ollama-proxy.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard-ollama-proxy.ts`:
- Around line 195-207: The restart path must always run the cleanup so stale
auth-proxy processes don't hold the port when the PID file is missing or
invalid; modify the logic around loadPersistedProxyPid(),
isOllamaProxyProcess(), isProxyTokenValid(), killStaleProxy(),
spawnOllamaAuthProxy(), and ollamaProxyToken so that before calling
spawnOllamaAuthProxy(token) you unconditionally run killStaleProxy() (not only
in the branch where isOllamaProxyProcess(pid) is true), then set
ollamaProxyToken = token and spawn; this ensures stale processes are killed even
when the PID file is absent or corrupted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e5f27b31-2429-4cb9-bcfa-6df2fa3adb1e
📒 Files selected for processing (3)
src/lib/onboard-ollama-proxy.tstest/e2e/test-ollama-auth-proxy-e2e.shtest/ollama-proxy-recovery.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e/test-ollama-auth-proxy-e2e.sh
4a17e94 to
aa62ebb
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/lib/onboard-ollama-proxy.ts (1)
195-207:⚠️ Potential issue | 🟠 MajorRun stale-proxy cleanup unconditionally before restart.
At Line 205 restart can still happen without cleanup when PID is missing/corrupt, leaving an old proxy on the port and causing restart to fail or stay divergent.
Suggested fix
const pid = loadPersistedProxyPid(); - if (isOllamaProxyProcess(pid)) { - if (isProxyTokenValid(token)) { - ollamaProxyToken = token; - return; - } - // Proxy is running but rejects the persisted token — restart it. - killStaleProxy(); - } + if (isOllamaProxyProcess(pid) && isProxyTokenValid(token)) { + ollamaProxyToken = token; + return; + } + // Always clean stale proxies (PID-file and port-scan paths) before spawning. + killStaleProxy(); // Proxy not running (or token mismatch) — restart with the persisted token. ollamaProxyToken = token; spawnOllamaAuthProxy(token);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard-ollama-proxy.ts` around lines 195 - 207, The restart logic may skip stale-proxy cleanup when the persisted PID is missing or corrupt, so always invoke killStaleProxy before starting a new proxy: in the block handling restart (where ollamaProxyToken is set and spawnOllamaAuthProxy(token) is called), ensure killStaleProxy() is executed unconditionally whenever you intend to restart the proxy (regardless of loadPersistedProxyPid() or isOllamaProxyProcess(pid) results), then set ollamaProxyToken = token and call spawnOllamaAuthProxy(token); keep checks using loadPersistedProxyPid, isOllamaProxyProcess, and isProxyTokenValid to avoid unnecessary restarts but ensure cleanup runs before spawn.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-ollama-auth-proxy-e2e.sh`:
- Around line 469-472: The failure branch that restores the token file uses a
top-level "return" which is invalid outside functions; change that to "exit 1"
so the script aborts properly after echoing "$ORIGINAL_TOKEN" >"$TOKEN_FILE".
Update the block that checks OLD_TOKEN_OK/NEW_TOKEN_OK (the failure message
referencing OLD_TOKEN_OK and NEW_TOKEN_OK) to call exit 1 instead of return,
ensuring the token is restored to TOKEN_FILE (using ORIGINAL_TOKEN) before
exiting.
---
Duplicate comments:
In `@src/lib/onboard-ollama-proxy.ts`:
- Around line 195-207: The restart logic may skip stale-proxy cleanup when the
persisted PID is missing or corrupt, so always invoke killStaleProxy before
starting a new proxy: in the block handling restart (where ollamaProxyToken is
set and spawnOllamaAuthProxy(token) is called), ensure killStaleProxy() is
executed unconditionally whenever you intend to restart the proxy (regardless of
loadPersistedProxyPid() or isOllamaProxyProcess(pid) results), then set
ollamaProxyToken = token and call spawnOllamaAuthProxy(token); keep checks using
loadPersistedProxyPid, isOllamaProxyProcess, and isProxyTokenValid to avoid
unnecessary restarts but ensure cleanup runs before spawn.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 84cd0c10-c8a9-4ea6-83fd-3204638d8d7a
📒 Files selected for processing (3)
src/lib/onboard-ollama-proxy.tstest/e2e/test-ollama-auth-proxy-e2e.shtest/ollama-proxy-recovery.test.ts
2b45c71 to
12f7c8d
Compare
ensureOllamaAuthProxy() trusted that the persisted token file matched the running proxy without validation. After a failed re-onboard, the proxy could run with token-A while the file stores token-B, causing persistent HTTP 401 on all inference calls. Add isProxyTokenValid() that probes the running proxy with the file token via curl. If the proxy rejects it, kill and restart with the correct token. Also adds token divergence regression test to ollama-proxy-e2e. Fixes NVIDIA#2553 Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com> Made-with: Cursor
12f7c8d to
1695442
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e/test-ollama-auth-proxy-e2e.sh (1)
448-449: Replace the unconditionalxxddependency with Node.js crypto for divergent token generation.Line 448 uses
xxdwithout validating availability. Since Node.js is already a script prerequisite, use its crypto module for consistent environment contracts:Suggested change
-DIVERGENT_TOKEN="divergent-$(date +%s)-$(head -c 16 /dev/urandom | xxd -p)" +DIVERGENT_TOKEN="divergent-$(date +%s)-$(node -e 'console.log(require("node:crypto").randomBytes(16).toString("hex"))')"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-ollama-auth-proxy-e2e.sh` around lines 448 - 449, The DIVERGENT_TOKEN generation uses an external xxd binary (DIVERGENT_TOKEN="divergent-$(date +%s)-$(head -c 16 /dev/urandom | xxd -p)"); replace that with a Node.js-built token to avoid the xxd dependency: invoke node to produce 16 random bytes as hex combined with the existing $(date +%s) prefix (i.e., generate the hex via Node's crypto.randomBytes(16).toString('hex') and assemble the same "divergent-<timestamp>-<hex>" value) and assign it to the DIVERGENT_TOKEN variable so the shell script relies only on Node.js for randomness.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/e2e/test-ollama-auth-proxy-e2e.sh`:
- Around line 448-449: The DIVERGENT_TOKEN generation uses an external xxd
binary (DIVERGENT_TOKEN="divergent-$(date +%s)-$(head -c 16 /dev/urandom | xxd
-p)"); replace that with a Node.js-built token to avoid the xxd dependency:
invoke node to produce 16 random bytes as hex combined with the existing $(date
+%s) prefix (i.e., generate the hex via Node's
crypto.randomBytes(16).toString('hex') and assemble the same
"divergent-<timestamp>-<hex>" value) and assign it to the DIVERGENT_TOKEN
variable so the shell script relies only on Node.js for randomness.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 52c4dba3-9a35-4018-929e-29726931c3b7
📒 Files selected for processing (3)
src/lib/onboard-ollama-proxy.tstest/e2e/test-ollama-auth-proxy-e2e.shtest/ollama-proxy-recovery.test.ts
ericksoa
left a comment
There was a problem hiding this comment.
Blocking issue in src/lib/onboard-ollama-proxy.ts:172: isProxyTokenValid() uses curl -sf against /v1/models, so it treats every non-2xx response as an invalid token. The proxy validates auth before forwarding to Ollama, which means a correct token plus an unavailable/unready Ollama backend returns 502, but this code interprets that exactly like 401 Unauthorized. On sandbox connect/rebuild with Ollama down, we will kill and restart a healthy proxy, then wait through the readiness loop and log Ollama auth proxy did not become ready even though the token was valid and the proxy was fine.
Please distinguish auth failure from backend failure here. For example, probe without -f, read the HTTP status code, and only treat 401 as token mismatch; connection failure can still mean the proxy is missing. A unit test covering “correct token but backend 502 does not restart” and “wrong token 401 restarts” would close the gap.
Local verification from /tmp/nemoclaw-pr-2642-review: npm run build:cli, npx vitest run test/ollama-proxy-recovery.test.ts --testTimeout 60000, and git diff --check all passed. CI is green except wsl-e2e, which is still pending.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
ericksoa
left a comment
There was a problem hiding this comment.
Approved after follow-up fix in adad64a9. The proxy token probe now distinguishes auth rejection from backend readiness, so an accepted token with a backend 502 no longer causes a healthy proxy restart.
Local verification in /tmp/nemoclaw-pr-2642-review:
npm run build:clinpx vitest run test/ollama-proxy-recovery.test.ts --testTimeout 60000npm ci --ignore-scripts && npm run buildinnemoclaw/npx vitest run test/ssrf-parity.test.ts nemoclaw/src/commands/migration-state.test.ts --testTimeout 60000npx vitest run test/onboard.test.ts --testNamePattern "builds the sandbox without uploading an external OpenClaw config file" --testTimeout 120000npm test -- --testTimeout 120000bash -n test/e2e/test-ollama-auth-proxy-e2e.shgit diff --check
CI is green except wsl-e2e, which Aaron said can be ignored for this PR.
…IDIA#2642) ## Summary Fix persistent HTTP 401 on inference after re-onboarding with local Ollama. The auth proxy could run with a stale token while the persisted file stored a different one. ## Related Issue Fixes NVIDIA#2553 ## Changes - **Code fix** (`src/lib/onboard-ollama-proxy.ts`): Added `isProxyTokenValid()` that probes the running proxy with the persisted file token via curl. `ensureOllamaAuthProxy()` now validates the token before trusting it — if the running proxy rejects the file token, it kills and restarts the proxy with the correct token. - **Regression test** (`test/e2e/test-ollama-auth-proxy-e2e.sh`): New "Token Divergence" phase that writes a divergent token to the file while the proxy runs with the old one, then simulates the fix (probe → kill → restart), and verifies inference works with the new token. ## Type of Change - Code change (feature, bug fix, or refactor) ## Verification - `npx prek run --all-files` passes - Ollama Auth Proxy E2E: PASS on fork CI (1m 14s) - No secrets, API keys, or credentials committed ## AI Disclosure - AI-assisted — tool: Cursor --- Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com> Made with [Cursor](https://cursor.com) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Authentication proxy now verifies that the persisted bearer token is accepted by a running proxy; kills and restarts stale proxies and performs repeated readiness probes, logging an error if readiness never occurs. * **Tests** * Added an end-to-end regression that creates token/file divergence and verifies detection and automatic recovery. * Enhanced tests to simulate probe and restart behaviors for recovery scenarios. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Truong Nguyen <tgnguyen@nvidia.com> Signed-off-by: Aaron Erickson <aerickson@nvidia.com> Co-authored-by: Aaron Erickson 🦞 <aerickson@nvidia.com>
Summary
Fix persistent HTTP 401 on inference after re-onboarding with local Ollama. The auth proxy could run with a stale token while the persisted file stored a different one.
Related Issue
Fixes #2553
Changes
src/lib/onboard-ollama-proxy.ts): AddedisProxyTokenValid()that probes the running proxy with the persisted file token via curl.ensureOllamaAuthProxy()now validates the token before trusting it — if the running proxy rejects the file token, it kills and restarts the proxy with the correct token.test/e2e/test-ollama-auth-proxy-e2e.sh): New "Token Divergence" phase that writes a divergent token to the file while the proxy runs with the old one, then simulates the fix (probe → kill → restart), and verifies inference works with the new token.Type of Change
Verification
npx prek run --all-filespassesAI Disclosure
Signed-off-by: Truong Nguyen tgnguyen@nvidia.com
Made with Cursor
Summary by CodeRabbit
Bug Fixes
Tests