fix(onboard): classify TLS certificate errors during sandbox create#1936

Merged
brandonpelfrey merged 5 commits into
NVIDIA:mainfrom
jneeee:fix/1933-tls-cert-error-classification
Apr 29, 2026
Conversation

@jneeee

@jneeee jneeee commented Apr 16, 2026

Contributor

Add "tls_cert_mismatch" classification to classifySandboxCreateFailure() so that TLS/certificate errors (e.g. "invalid peer certificate: BadSignature") are detected instead of falling through to "unknown".

When this failure kind is detected, printSandboxCreateRecoveryHints() now tells the user to re-trust the gateway certificate with openshell gateway trust -g nemoclaw before resuming onboarding.

Closes #1933

Summary

Related Issue

Changes

Type of Change

  • [x] Code change (feature, bug fix, or refactor)
  • [ ] Code change with doc updates
  • [ ] Doc only (prose changes, no code sample modifications)
  • [ ] Doc only (includes code sample changes)

Verification

  • [x] npx prek run --all-files passes
  • [x] npm test passes
  • [ ] Tests added or updated for new or changed behavior
  • [ ] No secrets, API keys, or credentials committed
  • [ ] Docs updated for user-facing behavior changes
  • [ ] make docs builds without warnings (doc changes only)
  • [ ] Doc pages follow the style guide (doc changes only)
  • [ ] New doc pages include SPDX header and frontmatter (new pages only)

AI Disclosure

  • AI-assisted — tool: Claude code, Hermes

Signed-off-by: John Liu <lijohn@nvidia.com>

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection and messaging for TLS certificate/handshake mismatches during sandbox creation; users now receive TLS-specific recovery hints (including guidance to trust the gateway and to rerun onboarding with --resume) instead of generic retry messages.
  • Tests

    • Added tests covering TLS certificate/handshake failure cases and negative scenarios to ensure correct classification and prevent misidentification.

Add "tls_cert_mismatch" classification to classifySandboxCreateFailure()
so that TLS/certificate errors (e.g. "invalid peer certificate:
BadSignature") are detected instead of falling through to "unknown".

When this failure kind is detected, printSandboxCreateRecoveryHints()
now tells the user to re-trust the gateway certificate with
`openshell gateway trust -g nemoclaw` before resuming onboarding.

Closes NVIDIA#1933

Signed-off-by: John Liu <lijohn@nvidia.com>
@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: db38ad13-b587-4a41-9c14-1c08b9b79f6c

📥 Commits

Reviewing files that changed from the base of the PR and between 83b966d and 73a8cbc.

📒 Files selected for processing (2)
  • src/lib/validation.test.ts
  • src/lib/validation.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/validation.ts

Walkthrough

Recognizes TLS certificate/handshake verification failures during sandbox creation as tls_cert_mismatch; adds tests for diverse TLS error message patterns; and prints TLS-specific recovery hints instructing openshell gateway trust -g nemoclaw and to retry onboarding with --resume.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Failure Classification Logic (`src/lib/validation.ts`) | Adds `"tls_cert_mismatch"` to `SandboxCreateFailure.kind` and updates `classifySandboxCreateFailure()` to detect TLS/certificate/handshake error text (e.g., `invalid peer certificate`, `BadSignature`, `certificate verify failed`), returning `{ kind: "tls_cert_mismatch", uploadedToGateway }`. |
| Classification Tests (`src/lib/validation.test.ts`) | Adds test cases asserting that various TLS certificate/handshake error message variants map to `tls_cert_mismatch`, includes negative cases in which unrelated TLS/transport errors remain `unknown`, and covers a case where `Created sandbox:` is present. |
| Recovery Hints (`src/lib/build-context.ts`) | Updates `printSandboxCreateRecoveryHints()` to handle `failure.kind === "tls_cert_mismatch"` by printing a TLS-specific hint and fix (`openshell gateway trust -g nemoclaw`), instructing the user to rerun onboard with `--resume`, and then returning early. |
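
The classification described in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code in `src/lib/validation.ts`: the names `FailureKindSketch`, `SandboxCreateFailureSketch`, and `classifySketch` are hypothetical, the union lists only the kinds mentioned in this thread, and `uploadedToGateway` is passed in as a parameter because the thread does not show how the real function derives it. The pattern is the certificate-specific one quoted later in the review.

```typescript
// Hypothetical sketch: only the failure kinds named in this PR thread are listed.
type FailureKindSketch = "tls_cert_mismatch" | "sandbox_create_incomplete" | "unknown";

interface SandboxCreateFailureSketch {
  kind: FailureKindSketch;
  uploadedToGateway: boolean;
}

// Certificate-specific phrases quoted in the review comments on this PR.
const CERT_ERROR_PATTERN =
  /invalid peer certificate|BadSignature|handshake verification failed|certificate verify failed|SSL certificate problem|x509: certificate|unknown authority/i;

// Assumption: the caller supplies uploadedToGateway; the real function takes only the text.
function classifySketch(output: string, uploadedToGateway = false): SandboxCreateFailureSketch {
  if (CERT_ERROR_PATTERN.test(output)) {
    return { kind: "tls_cert_mismatch", uploadedToGateway };
  }
  // Anything that is not recognised falls through to "unknown".
  return { kind: "unknown", uploadedToGateway };
}

console.log(classifySketch("invalid peer certificate: BadSignature").kind);
console.log(classifySketch("TLS error: connection refused by proxy").kind);
```

Returning a small object with a string-literal `kind` lets callers such as the recovery-hint printer branch on the failure type without re-parsing the error text.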

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nosed the broken cert one night,
a BadSignature gave me a fright.
Trust the gateway, hop this plea:
openshell gateway trust -g nemoclaw — then resume with glee.
🥕✨

🚥 Pre-merge checks | ✅ 3 | ❌ 3

❌ Failed checks (3 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues check | ⚠️ Warning | The PR addresses detection of TLS certificate errors but the implementation includes overly broad pattern matching that misclassifies non-certificate TLS/SSL errors as tls_cert_mismatch. | Narrow the TLS certificate detection patterns to be certificate-specific and add regression tests ensuring generic TLS/SSL errors remain classified as unknown. |
| Description check | ⚠️ Warning | The PR lacks detailed description of the implementation, the pattern matching approach, and the known false-positive issue flagged in the automated review. | Add a description explaining the pattern matching strategy, acknowledge the false-positive concern, and clarify which patterns are specifically for certificate errors. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding TLS certificate error classification to the sandbox create failure handling. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to detecting and handling TLS certificate errors during sandbox creation as specified in issue #1933. |


@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
src/lib/validation.ts (1)

69-71: Tighten TLS pattern to avoid false cert-mismatch classification.

Line 69 currently matches any tls.*error / ssl.*error, which can include non-certificate TLS failures and trigger a misleading re-trust hint. Consider narrowing this branch to certificate/trust-specific phrases.

♻️ Proposed regex refinement
-  if (/invalid peer certificate|BadSignature|handshake verification failed|certificate verify failed|tls.*error|ssl.*error/i.test(text)) {
+  if (
+    /invalid peer certificate|BadSignature|handshake verification failed|certificate verify failed|SSL certificate problem|x509: certificate|unknown authority/i.test(
+      text,
+    )
+  ) {
     return { kind: "tls_cert_mismatch", uploadedToGateway };
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/validation.ts` around lines 69 - 71, The current regex in the TLS
branch is too broad and matches generic "tls.*error" / "ssl.*error", causing
non-cert TLS failures to be classified as kind: "tls_cert_mismatch"; update the
pattern used in the conditional that returns { kind: "tls_cert_mismatch",
uploadedToGateway } to only match certificate/trust-specific phrases (e.g.,
include words like "certificate", "cert", "x509", "trust", "verify", "handshake
verification failed", "certificate verify failed", "BadSignature" etc.) and
remove the generic tls.*error and ssl.*error alternatives so only true
certificate/verification errors trigger this branch.
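
To make the over-matching concrete, the sketch below runs the original and the proposed patterns over a few strings taken from this thread. The constant names `BROAD` and `NARROW` and the helper `compare` are illustrative only; the two regular expressions are copied from the diff above.

```typescript
// Pattern as first submitted: includes the generic tls.*error / ssl.*error alternatives.
const BROAD =
  /invalid peer certificate|BadSignature|handshake verification failed|certificate verify failed|tls.*error|ssl.*error/i;

// Pattern proposed in the review: certificate/trust-specific phrases only.
const NARROW =
  /invalid peer certificate|BadSignature|handshake verification failed|certificate verify failed|SSL certificate problem|x509: certificate|unknown authority/i;

const samples: string[] = [
  "invalid peer certificate: BadSignature",
  "certificate verify failed",
  "TLS error: connection refused by proxy",
  "SSL error: unsupported protocol version",
];

interface MatchRow {
  input: string;
  broad: boolean;
  narrow: boolean;
}

// Reports, per input, whether each pattern would trigger the tls_cert_mismatch branch.
function compare(inputs: string[]): MatchRow[] {
  return inputs.map((input) => ({
    input,
    broad: BROAD.test(input),
    narrow: NARROW.test(input),
  }));
}

for (const row of compare(samples)) {
  console.log(JSON.stringify(row));
}
```

Both patterns match the first two strings; only the broad pattern matches the proxy and protocol-version errors, which is the false positive the review describes.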
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 438ae893-fad6-422d-a03b-9f791a791be3

📥 Commits

Reviewing files that changed from the base of the PR and between bf99008 and b3774f2.

📒 Files selected for processing (3)
  • src/lib/build-context.ts
  • src/lib/validation.test.ts
  • src/lib/validation.ts

@wscurran wscurran added bug Something fails against expected or documented behavior Getting Started labels Apr 16, 2026
@wscurran

Contributor

✨ Thanks for submitting this pull request, which proposes a way to fix the issue with TLS certificate errors not being properly detected and handled during onboarding. This change could help improve the security and reliability of the onboarding process.


@brandonpelfrey

Collaborator

Automated PR review summary

Reviewed PR #1936: fix(onboard): classify TLS certificate errors during sandbox create

Recommendation

  • Recommendation: Requires Changes
  • Highest observed severity: medium
  • Block merge: yes
  • Why: The highest-risk issue is misleading recovery guidance: arbitrary TLS/SSL errors such as proxy refusal or unsupported protocol are now classified as gateway certificate mismatches, causing users to run certificate trust remediation for the wrong problem. That weakens diagnosability of onboarding failures and makes the new classification less trustworthy.
  • Reviewer summary: Reviewed the installed PR behavior with targeted Node/runtime probes after re-establishing sandbox access. Intended TLS certificate strings classify correctly and trigger the new trust-and-resume hint, but adversarial inputs show the added tls.*error|ssl.*error regex incorrectly labels unrelated TLS/SSL transport/protocol errors as tls_cert_mismatch, which would mislead users during onboarding.

Installation and setup findings

  • Installation completed successfully from the local repository source. I used the root installer with local-repo environment overrides, passed the NVIDIA build key through non-interactively for the build provider, let onboarding create the my-assistant sandbox, then verified it with nemoclaw list/status, openshell sandbox ssh-config, an SSH exec returning 2+2=4, and an in-sandbox openclaw agent probe returning 4.

What was validated

  • The PR revision was checked out in an isolated review environment.
  • The local checkout was installed using the repository installer flow as closely as the environment allowed.
  • Adversarial, PR-specific probes were then run against the installed environment and relevant repository context.
  • Diff summary:
 .../SKILL.md                                       |   91 -
 .../scripts/normalize-title-tags.ts                |  297 --
 .agents/skills/nemoclaw-skills-guide/SKILL.md      |    7 +-
 .agents/skills/nemoclaw-user-agent-skills/SKILL.md |   13 -
 .../nemoclaw-user-configure-inference/SKILL.md     |  221 +-
 .../references/inference-options.md                |   28 +-
 .../nemoclaw-user-configure-security/SKILL.md      |  148 +-
 .../references/best-practices.md                   |   16 +-
 .../references/credential-storage.md               |    2 +-
 .../references/openclaw-controls.md                |    4 +-
 .../skills/nemoclaw-user-deploy-remote/SKILL.md    |   51 +-
 .agents/skills/nemoclaw-user-get-started/SKILL.md  |  327 +-
 .../references/prerequisites.md                    |   52 -
 .../{windows-preparation.md => windows-setup.md}   |   12 +-
 .../skills/nemoclaw-user-manage-policy/SKILL.md    |  170 +-
 .../skills/nemoclaw-user-monitor-sandbox/SKILL.md  |   14 +-
 .agents/skills/nemoclaw-user-overview/SKILL.md     |  202 +-
 .../nemoclaw-user-overview/references/ecosystem.md |    6 +-
 .../references/how-it-works.md                     |   12 +-
 .../nemoclaw-user-overvi
...[truncated]

Failing tests and unresolved impact

Failing test 1: Broad tls/ssl regex overmatches non-certificate failures

  • What was tested: The new matcher only catches certificate trust problems, not arbitrary TLS/SSL transport or protocol errors.
  • Why it matters: If false, onboarding can recommend certificate re-trust for unrelated failures, misleading users and obscuring the real cause.
  • Observed result: TLS error: connection refused by proxy and SSL error: unsupported protocol version both classified as tls_cert_mismatch, showing false positives from the broad tls.*error|ssl.*error pattern. See below.
  • Command: node /tmp/pr1936-probe2.mjs
  • Recommended follow-up coverage: Add regression tests proving non-certificate TLS/SSL errors stay unknown or map to a separate failure kind; also cover uploaded-to-gateway text variants if those matter operationally.

Passing tests and why they mattered

Passing test 1: TLS certificate strings classify to tls_cert_mismatch

  • What was tested: Representative TLS verification failures are now classified as tls_cert_mismatch instead of unknown.
  • Why it mattered: If false, users still receive generic onboarding recovery for gateway certificate changes and the PR's main fix is ineffective.
  • Observed result: Installed runtime returned tls_cert_mismatch for invalid peer certificate: BadSignature, handshake verification failed, and certificate verify failed.
  • Command: node /tmp/pr1936-probe.mjs
  • Recommended follow-up coverage: Keep regression coverage for these exact user-observed strings because classification depends on fragile substring matching.

Passing test 2: TLS mismatch emits trust-and-resume recovery hints

  • What was tested: When classification returns tls_cert_mismatch, recovery output tells users to re-trust the gateway certificate and resume onboarding.
  • Why it mattered: If false, the new classification would not improve the real user recovery path.
  • Observed result: TLS case printed openshell gateway trust -g nemoclaw and nemoclaw onboard --resume; unknown case still printed generic recovery text.
  • Command: node /tmp/pr1936-probe.mjs
  • Recommended follow-up coverage: Add or retain an integration/regression test for the printed hint text since this is the observable UX contract of the PR.
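
A regression test for the hint text is easier to write if the lines are built separately from the printing. The sketch below assumes that split; `recoveryHintsSketch` and `printHintsSketch` are hypothetical names, and the surrounding wording of each hint is invented for illustration. Only the two commands, `openshell gateway trust -g nemoclaw` and `nemoclaw onboard --resume`, come from the observed output above.

```typescript
type HintKind = "tls_cert_mismatch" | "unknown";

// Builds the hint lines instead of printing them, so a test can assert on the text.
function recoveryHintsSketch(kind: HintKind): string[] {
  if (kind === "tls_cert_mismatch") {
    // Early return: TLS failures get their own guidance and skip the generic text.
    return [
      "Hint: the gateway certificate may need to be re-trusted.", // illustrative wording
      "Fix:  openshell gateway trust -g nemoclaw",
      "Then: nemoclaw onboard --resume",
    ];
  }
  // Placeholder for the generic recovery text, whose wording is not shown in this thread.
  return ["Hint: sandbox creation failed; retry onboarding."];
}

// Thin printing wrapper; the logger is injectable so tests can capture output.
function printHintsSketch(kind: HintKind, log: (line: string) => void = console.log): void {
  for (const line of recoveryHintsSketch(kind)) {
    log(line);
  }
}

printHintsSketch("tls_cert_mismatch");
```

With this shape, a test can check that the TLS case mentions both commands and that the `unknown` case does not, without spawning the CLI.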

Bottom line

  • Based on the install evidence and adversarial probes, this PR should not be approved as-is.

Failed test specifics

$ export PATH="$HOME/.local/bin:$HOME/.npm-global/bin:$PATH"; cd /workspace/nemoclaw && cat > /tmp/pr1936-probe2.mjs <<'EOF'
import { classifySandboxCreateFailure } from 'file:///workspace/nemoclaw/dist/lib/validation.js';
const samples = [
  'TLS error: connection refused by proxy',
  'SSL error: unsupported protocol version',
  'Uploaded sandbox image to gateway successfully. TLS error later during notify',
  'uploaded sandbox image to gateway\nssl error: peer closed connection'
];
for (const s of samples) {
  const c = classifySandboxCreateFailure(s);
  console.log(JSON.stringify({ input: s, kind: c.kind, uploadedToGateway: c.uploadedToGateway }));
}
EOF
node /tmp/pr1936-probe2.mjs > /tmp/pr1936-probe2.out 2>&1; tail -n 80 /tmp/pr1936-probe2.out

{"input":"TLS error: connection refused by proxy","kind":"tls_cert_mismatch","uploadedToGateway":false}
{"input":"SSL error: unsupported protocol version","kind":"tls_cert_mismatch","uploadedToGateway":false}
{"input":"Uploaded sandbox image to gateway successfully. TLS error later during notify","kind":"tls_cert_mismatch","uploadedToGateway":false}
{"input":"uploaded sandbox image to gateway\nssl error: peer closed connection","kind":"tls_cert_mismatch","uploadedToGateway":false}

…ation

Remove overly broad 'tls.*error|ssl.*error' patterns that incorrectly
classified generic TLS/SSL transport errors (e.g. connection refused by
proxy, unsupported protocol version) as tls_cert_mismatch.

Replace with specific certificate-related patterns: 'SSL certificate
problem', 'x509: certificate', and 'unknown authority'.

Add regression tests proving non-certificate TLS/SSL errors remain
classified as 'unknown'.

Signed-off-by: John Liu <lijohn@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 29, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/validation.ts`:
- Around line 75-80: The sandbox-detection branch short-circuits TLS certificate
error handling: in validate logic where you check /Created sandbox:/i and return
{ kind: "sandbox_create_incomplete", uploadedToGateway: true }, move the
certificate-error regex check (the long regex matching invalid peer
certificate|BadSignature|handshake verification failed|certificate verify
failed|SSL certificate problem|x509: certificate|unknown authority) to run
before the Created sandbox check, or alter the sandbox branch to explicitly
exclude cert-related matches (i.e., only return "sandbox_create_incomplete" when
the text matches Created sandbox and does not match the cert regex); update the
code paths that return { kind: "tls_cert_mismatch", uploadedToGateway } and {
kind: "sandbox_create_incomplete", uploadedToGateway: true } accordingly so cert
failures are detected and returned as tls_cert_mismatch instead of being masked
by sandbox_create_incomplete.
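
The masking problem comes down to check order: whichever branch runs first wins. The sketch below models the branches as an ordered rule list to show the effect. The rule-list structure and the names `Rule`, `CERT_RULE`, `CREATED_RULE`, and `firstMatch` are assumptions made for illustration (the thread suggests the real code uses consecutive `if` branches), and the sample output string is invented.

```typescript
type Kind = "tls_cert_mismatch" | "sandbox_create_incomplete" | "unknown";

interface Rule {
  pattern: RegExp;
  kind: Kind;
}

const CERT_RULE: Rule = {
  pattern:
    /invalid peer certificate|BadSignature|handshake verification failed|certificate verify failed|SSL certificate problem|x509: certificate|unknown authority/i,
  kind: "tls_cert_mismatch",
};

const CREATED_RULE: Rule = {
  pattern: /Created sandbox:/i,
  kind: "sandbox_create_incomplete",
};

// The first matching rule wins, so list order decides which failure is reported.
function firstMatch(rules: Rule[], text: string): Kind {
  for (const rule of rules) {
    if (rule.pattern.test(text)) {
      return rule.kind;
    }
  }
  return "unknown";
}

// Hypothetical output that contains both markers.
const output = "Created sandbox: my-assistant\nerror: invalid peer certificate: BadSignature";

console.log(firstMatch([CREATED_RULE, CERT_RULE], output)); // sandbox_create_incomplete (cert error masked)
console.log(firstMatch([CERT_RULE, CREATED_RULE], output)); // tls_cert_mismatch
```

Placing the certificate rule first matches the first option suggested in the comment; the alternative of excluding certificate matches inside the sandbox branch would give the same result for this input.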

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c25c6d8-05ed-4b88-95c1-b831a5eea582

📥 Commits

Reviewing files that changed from the base of the PR and between 91a7860 and 83b966d.

📒 Files selected for processing (2)
  • src/lib/validation.test.ts
  • src/lib/validation.ts

Comment thread src/lib/validation.ts Outdated
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Contributor

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

An unexpected error occurred while generating fixes: Not Found - https://docs.github.com/rest/git/refs#get-a-reference

Move the TLS certificate error regex check before the 'Created sandbox:'
branch so that cert failures are never masked by sandbox_create_incomplete
when both patterns appear in the output.

Add regression test proving cert errors win when output contains both
'Created sandbox:' and a TLS verification failure.

Signed-off-by: John Liu <lijohn@nvidia.com>
@brandonpelfrey brandonpelfrey enabled auto-merge (squash) April 29, 2026 19:31
@brandonpelfrey brandonpelfrey merged commit 24725d2 into NVIDIA:main Apr 29, 2026
11 checks passed
@brandonpelfrey

Collaborator

@jneeee please make sure future commits are signed. I've administratively merged this change to get it in; normally, unsigned commits block the merge.

@miyoungc miyoungc mentioned this pull request Apr 30, 2026
13 tasks
miyoungc added a commit that referenced this pull request Apr 30, 2026
## Summary
Refreshes the daily docs from NemoClaw commits merged in the past 24
hours and advances the docs metadata from 0.0.29 to 0.0.31, the next
version after tag v0.0.30.
The updates cover documented behavior gaps found in the merged PRs
listed below.

## Related Issue
None.

## Changes
- `docs/versions1.json` and `docs/project.json`: bump the preferred docs
version to `0.0.31` for daily release preparation after latest tag
`v0.0.30`.
- `docs/reference/commands.md`: document non-interactive Brave Search
validation fallback from #2511 / 9bfe30b, missing `--from <Dockerfile>`
path validation from #2597 / 7186834, and `logs` reading OpenShell
audit events from #2590 / e225dfb.
- `docs/inference/use-local-inference.md`: document local inference
reachability retry and host-side fallback from #2453 / 9dbe855, plus
compatible-endpoint timeout coverage from #2583 / b4ef3db.
- `docs/reference/troubleshooting.md`: document source-install shim
fallback from #2520 / 01a177c, TLS gateway trust recovery from #1936 /
24725d2, compatible-endpoint timeout coverage from #2583 / b4ef3db,
local reachability diagnostics from #2453 / 9dbe855, and host proxy
`NO_PROXY` injection from #2662 / b4df07e.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Additional verification:
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --dry-run` passed.
- `git diff --check` passed.
- Pre-push hooks passed through markdownlint, docs-to-skills, JSON
checks, gitleaks, and version sync before `Test (skills YAML)` failed
because this fresh worktree lacked `vitest/config`.
- `npx prek run --all-files` could not run from the fresh worktree
because `npx prek` resolved to a missing `prek@*` package; downloading
`@j178/prek` was not approved.
- `npm test` could not complete from the fresh worktree because
dependencies and compiled `dist/lib/*` artifacts were absent.

## AI Disclosure
- [x] AI-assisted — tool: OpenAI Codex

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
  * Version updated to 0.0.31
  * Local inference onboarding now includes retry logic for container
    reachability checks
  * Web search setup failure handling clarified with fallback guidance
  * Dockerfile path validation timing documented
  * Logging behavior clarified for concurrent stream reading
  * New TLS/certificate troubleshooting section added
  * Install path and proxy configuration troubleshooting updated

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…VIDIA#1936)

Add "tls_cert_mismatch" classification to classifySandboxCreateFailure()
so that TLS/certificate errors (e.g. "invalid peer certificate:
BadSignature") are detected instead of falling through to "unknown".

When this failure kind is detected, printSandboxCreateRecoveryHints()
now tells the user to re-trust the gateway certificate with `openshell
gateway trust -g nemoclaw` before resuming onboarding.

Closes NVIDIA#1933

<!-- markdownlint-disable MD041 -->
## Summary
<!-- 1-3 sentences: what this PR does and why. -->

## Related Issue
<!-- Fixes #NNN or Closes #NNN. Remove this section if none. -->

## Changes
<!-- Bullet list of key changes. -->

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
<!-- Check each item you ran and confirmed. Leave unchecked items you
skipped. -->
- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [ ] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure
<!-- If an AI agent authored or co-authored this PR, check the box and
name the tool. Remove this section for fully human-authored PRs. -->
- [ ] AI-assisted — tool: Claude code, Hermes

---
<!-- DCO sign-off required by CI. Run: git config user.name && git
config user.email -->
Signed-off-by: John Liu <lijohn@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
  * Improved detection and messaging for TLS certificate/handshake
    mismatches during sandbox creation; users now receive TLS-specific
    recovery hints (including guidance to trust the gateway and to rerun
    onboarding with --resume) instead of generic retry messages.

* **Tests**
  * Added tests covering TLS certificate/handshake failure cases and
    negative scenarios to ensure correct classification and prevent
    misidentification.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: John Liu <lijohn@nvidia.com>
Co-authored-by: Brandon Pelfrey <bpelfrey@nvidia.com>
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
@jneeee

jneeee commented Apr 30, 2026

Contributor Author

@brandonpelfrey Thank you! I will ensure that my other commits are in a signed state.

@jneeee jneeee deleted the fix/1933-tls-cert-error-classification branch May 21, 2026 05:20
@wscurran wscurran added area: install Install, setup, prerequisites, or uninstall flow area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow bug-fix PR fixes a bug or regression and removed Getting Started labels Jun 3, 2026
@wscurran wscurran removed the bug Something fails against expected or documented behavior label Jun 8, 2026
Labels

area: install Install, setup, prerequisites, or uninstall flow area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow bug-fix PR fixes a bug or regression

Development

Successfully merging this pull request may close these issues.

bug(onboard): Onboard does not detect or recover from TLS cert errors during sandbox create

3 participants