fix(onboard): classify TLS certificate errors during sandbox create#1936
Conversation
Add "tls_cert_mismatch" classification to classifySandboxCreateFailure() so that TLS/certificate errors (e.g. "invalid peer certificate: BadSignature") are detected instead of falling through to "unknown". When this failure kind is detected, printSandboxCreateRecoveryHints() now tells the user to re-trust the gateway certificate with `openshell gateway trust -g nemoclaw` before resuming onboarding. Closes NVIDIA#1933 Signed-off-by: John Liu <lijohn@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRecognizes TLS certificate/handshake verification failures during sandbox creation as Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/validation.ts (1)
69-71: Tighten TLS pattern to avoid false cert-mismatch classification.Line 69 currently matches any
tls.*error/ssl.*error, which can include non-certificate TLS failures and trigger a misleading re-trust hint. Consider narrowing this branch to certificate/trust-specific phrases.♻️ Proposed regex refinement
- if (/invalid peer certificate|BadSignature|handshake verification failed|certificate verify failed|tls.*error|ssl.*error/i.test(text)) { + if ( + /invalid peer certificate|BadSignature|handshake verification failed|certificate verify failed|SSL certificate problem|x509: certificate|unknown authority/i.test( + text, + ) + ) { return { kind: "tls_cert_mismatch", uploadedToGateway }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/validation.ts` around lines 69 - 71, The current regex in the TLS branch is too broad and matches generic "tls.*error" / "ssl.*error", causing non-cert TLS failures to be classified as kind: "tls_cert_mismatch"; update the pattern used in the conditional that returns { kind: "tls_cert_mismatch", uploadedToGateway } to only match certificate/trust-specific phrases (e.g., include words like "certificate", "cert", "x509", "trust", "verify", "handshake verification failed", "certificate verify failed", "BadSignature" etc.) and remove the generic tls.*error and ssl.*error alternatives so only true certificate/verification errors trigger this branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/lib/validation.ts`:
- Around line 69-71: The current regex in the TLS branch is too broad and
matches generic "tls.*error" / "ssl.*error", causing non-cert TLS failures to be
classified as kind: "tls_cert_mismatch"; update the pattern used in the
conditional that returns { kind: "tls_cert_mismatch", uploadedToGateway } to
only match certificate/trust-specific phrases (e.g., include words like
"certificate", "cert", "x509", "trust", "verify", "handshake verification
failed", "certificate verify failed", "BadSignature" etc.) and remove the
generic tls.*error and ssl.*error alternatives so only true
certificate/verification errors trigger this branch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 438ae893-fad6-422d-a03b-9f791a791be3
📒 Files selected for processing (3)
src/lib/build-context.tssrc/lib/validation.test.tssrc/lib/validation.ts
|
✨ Thanks for submitting this pull request, which proposes a way to fix the issue with TLS certificate errors not being properly detected and handled during onboarding. This change could help improve the security and reliability of the onboarding process. Possibly related open issues: |
Automated PR review summaryReviewed PR #1936: fix(onboard): classify TLS certificate errors during sandbox create Recommendation
Installation and setup findings
What was validated
Failing tests and unresolved impactFailing test 1: Broad tls/ssl regex overmatches non-certificate failures
Passing tests and why they matteredPassing test 1: TLS certificate strings classify to tls_cert_mismatch
Passing test 2: TLS mismatch emits trust-and-resume recovery hints
Bottom line
Failed test specifics |
…ation Remove overly broad 'tls.*error|ssl.*error' patterns that incorrectly classified generic TLS/SSL transport errors (e.g. connection refused by proxy, unsupported protocol version) as tls_cert_mismatch. Replace with specific certificate-related patterns: 'SSL certificate problem', 'x509: certificate', and 'unknown authority'. Add regression tests proving non-certificate TLS/SSL errors remain classified as 'unknown'. Signed-off-by: John Liu <lijohn@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/validation.ts`:
- Around line 75-80: The sandbox-detection branch short-circuits TLS certificate
error handling: in validate logic where you check /Created sandbox:/i and return
{ kind: "sandbox_create_incomplete", uploadedToGateway: true }, move the
certificate-error regex check (the long regex matching invalid peer
certificate|BadSignature|handshake verification failed|certificate verify
failed|SSL certificate problem|x509: certificate|unknown authority) to run
before the Created sandbox check, or alter the sandbox branch to explicitly
exclude cert-related matches (i.e., only return "sandbox_create_incomplete" when
the text matches Created sandbox and does not match the cert regex); update the
code paths that return { kind: "tls_cert_mismatch", uploadedToGateway } and {
kind: "sandbox_create_incomplete", uploadedToGateway: true } accordingly so cert
failures are detected and returned as tls_cert_mismatch instead of being masked
by sandbox_create_incomplete.
🪄 Autofix (Beta)
❌ Autofix failed (check again to retry)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5c25c6d8-05ed-4b88-95c1-b831a5eea582
📒 Files selected for processing (2)
src/lib/validation.test.tssrc/lib/validation.ts
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. An unexpected error occurred while generating fixes: Not Found - https://docs.github.com/rest/git/refs#get-a-reference |
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. An unexpected error occurred while generating fixes: Not Found - https://docs.github.com/rest/git/refs#get-a-reference |
Move the TLS certificate error regex check before the 'Created sandbox:' branch so that cert failures are never masked by sandbox_create_incomplete when both patterns appear in the output. Add regression test proving cert errors win when output contains both 'Created sandbox:' and a TLS verification failure. Signed-off-by: John Liu <lijohn@nvidia.com>
|
@jneeee please ensure future commits/changes have signed commits. I've administratively merged this change to get this one in. Normally we will be blocked without signed commits. |
## Summary Refreshes the daily docs from NemoClaw commits merged in the past 24 hours and advances the docs metadata from 0.0.29 to 0.0.31, the next version after tag v0.0.30. The updates cover documented behavior gaps found in the merged PRs listed below. ## Related Issue None. ## Changes - `docs/versions1.json` and `docs/project.json`: bump the preferred docs version to `0.0.31` for daily release preparation after latest tag `v0.0.30`. - `docs/reference/commands.md`: document non-interactive Brave Search validation fallback from #2511 / 9bfe30b, missing `--from <Dockerfile>` path validation from #2597 / 7186834, and `logs` reading OpenShell audit events from #2590 / e225dfb. - `docs/inference/use-local-inference.md`: document local inference reachability retry and host-side fallback from #2453 / 9dbe855, plus compatible-endpoint timeout coverage from #2583 / b4ef3db. - `docs/reference/troubleshooting.md`: document source-install shim fallback from #2520 / 01a177c, TLS gateway trust recovery from #1936 / 24725d2, compatible-endpoint timeout coverage from #2583 / b4ef3db, local reachability diagnostics from #2453 / 9dbe855, and host proxy `NO_PROXY` injection from #2662 / b4df07e. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [ ] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Additional verification: - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --dry-run` passed. - `git diff --check` passed. - Pre-push hooks passed through markdownlint, docs-to-skills, JSON checks, gitleaks, and version sync before `Test (skills YAML)` failed because this fresh worktree lacked `vitest/config`. - `npx prek run --all-files` could not run from the fresh worktree because `npx prek` resolved to a missing `prek@*` package; downloading `@j178/prek` was not approved. - `npm test` could not complete from the fresh worktree because dependencies and compiled `dist/lib/*` artifacts were absent. ## AI Disclosure - [x] AI-assisted — tool: OpenAI Codex --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Version updated to 0.0.31 * Local inference onboarding now includes retry logic for container reachability checks * Web search setup failure handling clarified with fallback guidance * Dockerfile path validation timing documented * Logging behavior clarified for concurrent stream reading * New TLS/certificate troubleshooting section added * Install path and proxy configuration troubleshooting updated <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
…VIDIA#1936) Add "tls_cert_mismatch" classification to classifySandboxCreateFailure() so that TLS/certificate errors (e.g. "invalid peer certificate: BadSignature") are detected instead of falling through to "unknown". When this failure kind is detected, printSandboxCreateRecoveryHints() now tells the user to re-trust the gateway certificate with `openshell gateway trust -g nemoclaw` before resuming onboarding. Closes NVIDIA#1933 <!-- markdownlint-disable MD041 --> ## Summary <!-- 1-3 sentences: what this PR does and why. --> ## Related Issue <!-- Fixes #NNN or Closes #NNN. Remove this section if none. --> ## Changes <!-- Bullet list of key changes. --> ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification <!-- Check each item you ran and confirmed. Leave unchecked items you skipped. --> - [x] `npx prek run --all-files` passes - [x] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [ ] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure <!-- If an AI agent authored or co-authored this PR, check the box and name the tool. Remove this section for fully human-authored PRs. --> - [ ] AI-assisted — tool: Claude code, Hermes --- <!-- DCO sign-off required by CI. Run: git config user.name && git config user.email --> Signed-off-by: John Liu <lijohn@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved detection and messaging for TLS certificate/handshake mismatches during sandbox creation; users now receive TLS-specific recovery hints (including guidance to trust the gateway and to rerun onboarding with --resume) instead of generic retry messages. * **Tests** * Added tests covering TLS certificate/handshake failure cases and negative scenarios to ensure correct classification and prevent misidentification. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: John Liu <lijohn@nvidia.com> Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com>
## Summary Refreshes the daily docs from NemoClaw commits merged in the past 24 hours and advances the docs metadata from 0.0.29 to 0.0.31, the next version after tag v0.0.30. The updates cover documented behavior gaps found in the merged PRs listed below. ## Related Issue None. ## Changes - `docs/versions1.json` and `docs/project.json`: bump the preferred docs version to `0.0.31` for daily release preparation after latest tag `v0.0.30`. - `docs/reference/commands.md`: document non-interactive Brave Search validation fallback from NVIDIA#2511 / 9bfe30b, missing `--from <Dockerfile>` path validation from NVIDIA#2597 / 7186834, and `logs` reading OpenShell audit events from NVIDIA#2590 / e225dfb. - `docs/inference/use-local-inference.md`: document local inference reachability retry and host-side fallback from NVIDIA#2453 / 9dbe855, plus compatible-endpoint timeout coverage from NVIDIA#2583 / b4ef3db. - `docs/reference/troubleshooting.md`: document source-install shim fallback from NVIDIA#2520 / 01a177c, TLS gateway trust recovery from NVIDIA#1936 / 24725d2, compatible-endpoint timeout coverage from NVIDIA#2583 / b4ef3db, local reachability diagnostics from NVIDIA#2453 / 9dbe855, and host proxy `NO_PROXY` injection from NVIDIA#2662 / b4df07e. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [ ] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Additional verification: - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --dry-run` passed. - `git diff --check` passed. - Pre-push hooks passed through markdownlint, docs-to-skills, JSON checks, gitleaks, and version sync before `Test (skills YAML)` failed because this fresh worktree lacked `vitest/config`. - `npx prek run --all-files` could not run from the fresh worktree because `npx prek` resolved to a missing `prek@*` package; downloading `@j178/prek` was not approved. - `npm test` could not complete from the fresh worktree because dependencies and compiled `dist/lib/*` artifacts were absent. ## AI Disclosure - [x] AI-assisted — tool: OpenAI Codex --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Version updated to 0.0.31 * Local inference onboarding now includes retry logic for container reachability checks * Web search setup failure handling clarified with fallback guidance * Dockerfile path validation timing documented * Logging behavior clarified for concurrent stream reading * New TLS/certificate troubleshooting section added * Install path and proxy configuration troubleshooting updated <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
|
@brandonpelfrey Thank you! I will ensure that my other commits are in a signed state. |
Add "tls_cert_mismatch" classification to classifySandboxCreateFailure() so that TLS/certificate errors (e.g. "invalid peer certificate: BadSignature") are detected instead of falling through to "unknown".
When this failure kind is detected, printSandboxCreateRecoveryHints() now tells the user to re-trust the gateway certificate with
openshell gateway trust -g nemoclawbefore resuming onboarding.Closes #1933
Summary
Related Issue
Changes
Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)AI Disclosure
Signed-off-by: John Liu lijohn@nvidia.com
Summary by CodeRabbit
Bug Fixes
Tests