fix(cli): handle local gateway doctor and large backups#4912
Conversation
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
|
Warning Review limit reached
More reviews will be available in 14 minutes and 3 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds a host-level gateway inspection fallback (pgrep + ss) used when Docker inspect fails, moves host command capture into its own module, exports the gateway pgrep pattern, and increases tar-listing subprocess buffer to 256 MiB with 20k-entry regression tests. ChangesHost Gateway Process Detection Fallback
Tar Archive Buffer Management
Sequence Diagram(s)sequenceDiagram
participant DoctorCLI as Doctor
participant DockerInspect as DockerInspect
participant OpenShell as OpenShell
participant Pgrep as pgrep
participant SS as ss
participant DoctorReport as DoctorReport
DoctorCLI->>DockerInspect: inspect legacy gateway container
DockerInspect-->>DoctorCLI: fails (status !== 0)
DoctorCLI->>OpenShell: query named gateway connected?
OpenShell-->>DoctorCLI: namedGatewayConnected
DoctorCLI->>Pgrep: pgrep HOST_GATEWAY_PGREP_PATTERN
Pgrep-->>DoctorCLI: process found / not found
DoctorCLI->>SS: ss check GATEWAY_PORT
SS-->>DoctorCLI: port listening / not listening
DoctorCLI->>DoctorReport: emit Local gateway process ok / info / skipped checks
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 1 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/actions/sandbox/doctor.ts (1)
206-210: ⚖️ Poor tradeoffConsider checking pgrep/ss availability before relying on them.
While the current implementation gracefully falls through to the existing Docker container failure when pgrep or ss fail, the user experience could be improved by detecting when these tools are unavailable versus when the gateway is genuinely not running. However, since this is a fallback path that only runs when Docker inspect fails, and the existing behavior (reporting Docker container failure) is acceptable, this is a low-priority enhancement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/actions/sandbox/doctor.ts` around lines 206 - 210, The current fallback uses captureHostCommand("pgrep", ...) and captureHostCommand("ss", ...) without verifying those binaries exist; update the logic in the diagnostic block that computes processCheck/portCheck (references: captureHostCommand, HOST_GATEWAY_PGREP_PATTERN, GATEWAY_PORT, processCheck, portCheck, processRunning, portListening) to first detect missing/failed utilities and treat that as "tool unavailable" rather than "gateway not running" — e.g., after calling captureHostCommand inspect processCheck.status and processCheck.stderr (and same for portCheck) for command-not-found or permission errors and set separate flags like pgrepAvailable/ssAvailable; if a tool is unavailable, surface a clearer message or skip that check when deciding processRunning/portListening so the code can fall back to Docker diagnostics or produce a specific "pgrep/ss unavailable" hint to the user.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/actions/sandbox/doctor.ts`:
- Around line 206-210: The current fallback uses captureHostCommand("pgrep",
...) and captureHostCommand("ss", ...) without verifying those binaries exist;
update the logic in the diagnostic block that computes processCheck/portCheck
(references: captureHostCommand, HOST_GATEWAY_PGREP_PATTERN, GATEWAY_PORT,
processCheck, portCheck, processRunning, portListening) to first detect
missing/failed utilities and treat that as "tool unavailable" rather than
"gateway not running" — e.g., after calling captureHostCommand inspect
processCheck.status and processCheck.stderr (and same for portCheck) for
command-not-found or permission errors and set separate flags like
pgrepAvailable/ssAvailable; if a tool is unavailable, surface a clearer message
or skip that check when deciding processRunning/portListening so the code can
fall back to Docker diagnostics or produce a specific "pgrep/ss unavailable"
hint to the user.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 90f8d5ba-5f48-4ab8-baed-0b8eea486a98
📒 Files selected for processing (5)
src/lib/actions/sandbox/doctor.tssrc/lib/onboard/host-gateway-process.tssrc/lib/state/sandbox.tstest/cli/doctor-gateway-token.test.tstest/security-sandbox-tar-traversal.test.ts
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/cli/doctor-gateway-token.test.ts (1)
87-90: 💤 Low valueConsider adding a brief comment explaining the purpose.
The
parseJsonPrefixfunction handles doctor output that may have trailing content (e.g., stderr) after the JSON. A one-line comment would clarify this intent for future maintainers.📝 Suggested comment
+// Extracts JSON from output that may have trailing stderr after the final closing brace. function parseJsonPrefix<T>(output: string): T { const end = output.lastIndexOf("\n}"); return JSON.parse(end >= 0 ? output.slice(0, end + 2) : output) as T; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/cli/doctor-gateway-token.test.ts` around lines 87 - 90, The helper function parseJsonPrefix silently trims trailing non-JSON content from command output before parsing; add a one-line comment above parseJsonPrefix explaining it extracts the JSON prefix from CLI output (handles cases where the CLI appends extra stdout/stderr after the JSON, so we search for the last "\n}" and parse up to that) to clarify intent for future maintainers and reference parseJsonPrefix in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/cli/doctor-gateway-token.test.ts`:
- Around line 87-90: The helper function parseJsonPrefix silently trims trailing
non-JSON content from command output before parsing; add a one-line comment
above parseJsonPrefix explaining it extracts the JSON prefix from CLI output
(handles cases where the CLI appends extra stdout/stderr after the JSON, so we
search for the last "\n}" and parse up to that) to clarify intent for future
maintainers and reference parseJsonPrefix in the comment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 17f14f53-a1f5-4085-9b38-cb875ea52191
📒 Files selected for processing (5)
src/lib/actions/sandbox/doctor.tssrc/lib/onboard/host-gateway-process.tssrc/lib/state/sandbox.tstest/cli/doctor-gateway-token.test.tstest/security-sandbox-tar-traversal.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- src/lib/onboard/host-gateway-process.ts
- src/lib/state/sandbox.ts
- src/lib/actions/sandbox/doctor.ts
Selective E2E Results — ✅ All requested jobs passedRun: 27096769346
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/actions/sandbox/doctor-host-command.ts`:
- Line 27: The return logic for status in doctor-host-command.ts uses
result.status ?? (result.error ? 1 : 0) which treats a signal-terminated process
(result.signal set, result.status null) as success; update the expression to
consider result.signal as a failure too (e.g., set non-zero when result.signal
is present) by using result.status ?? (result.signal ? 1 : (result.error ? 1 :
0)), ensuring you reference the same result.status, result.signal and
result.error identifiers and the returned status field in the function that
builds the probe result.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5729251b-9533-42e6-9ac2-69ef006cf912
📒 Files selected for processing (4)
src/lib/actions/sandbox/doctor-gateway-fallback.tssrc/lib/actions/sandbox/doctor-host-command.tssrc/lib/actions/sandbox/doctor.tstest/cli/doctor-gateway-token.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- test/cli/doctor-gateway-token.test.ts
- src/lib/actions/sandbox/doctor.ts
Selective E2E Results — ✅ All requested jobs passedRun: 27097579627
|
Selective E2E Results — ✅ All requested jobs passedRun: 27097993741
|
Selective E2E Results — ✅ All requested jobs passedRun: 27098301659
|
Summary
Salvages the intended fixes from #4583 on a clean branch.
nemoclaw doctornow accepts a verified localopenshell-gatewayprocess when legacy container inspection fails, and tar listing validation has enough buffer headroom for large sandbox state backups.Changes
doctorand report a local gateway as healthy only when both the process and gateway port are present.spawnSyncbuffer used by path validation and hard-link rejection.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Carlos Villela cvillela@nvidia.com
Summary by CodeRabbit
New Features
Tests