fix(preflight): detect docker group permission issue and suggest usermod#1614
Conversation
When Docker is installed and the systemd service is active but the daemon is unreachable, the most likely cause is a permissions/group issue rather than a stopped service. Detect this case and suggest `sudo usermod -aG docker $USER && newgrp docker` instead of the generic "Start Docker" remediation. Closes #50 Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughPreflight logic now differentiates "Docker installed but daemon unreachable": if the Docker service is active/enabled, the remediation plan proposes adding the user to the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Preflight
participant systemctl
participant DockerDaemon
User->>Preflight: run "nemoclaw onboard" preflight
Preflight->>systemctl: check is-active/is-enabled docker
systemctl-->>Preflight: active/enabled or inactive/disabled
Preflight->>DockerDaemon: attempt docker daemon ping
alt daemon reachable
DockerDaemon-->>Preflight: reachable
Preflight-->>User: continue onboarding
else daemon unreachable
alt service active/enabled
Preflight-->>User: planAction(docker_group_permission) with sudo usermod -aG docker $USER, newgrp/docker re-login, then onboard
else service inactive/disabled
Preflight-->>User: planAction(start_docker) with sudo systemctl start/enable docker
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsTimed out fetching pipeline failures after 30000ms Comment |
The install-preflight test that simulates Docker-installed-but-unreachable was picking up the real systemctl on CI, which reported Docker as active. This triggered the new docker-group remediation instead of the expected "Start Docker" path. Add a systemctl stub that reports docker as inactive so the test exercises the correct code path. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/preflight.ts`:
- Around line 365-372: The RemediationAction for "docker_group_permission" in
src/lib/preflight.ts relies on shell state between two commands but there is no
executor yet; update the remediation design so the executor runs these steps
atomically: either change the RemediationAction for kind "sudo" / id
"docker_group_permission" to use a single shell invocation combining "sudo
usermod -aG docker $USER && newgrp docker && nemoclaw onboard", or split into
two distinct actions (e.g., "add_user_to_docker_group" and a separate
"run_onboard_after_relogin") and include an explicit manual-followup note so the
second step isn’t expected to inherit group changes. Ensure the executor (when
implemented) documents that RemediationAction.commands may be executed in a
single shell when order/state must be preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 80cfd749-8c8e-4066-8642-3e0a7f4036ae
📒 Files selected for processing (2)
src/lib/preflight.test.tssrc/lib/preflight.ts
| kind: "sudo", | ||
| reason: | ||
| "Docker is installed and the service is running, but the current user cannot reach the daemon. " + | ||
| "This usually means your user is not in the docker group.", | ||
| commands: [ | ||
| "sudo usermod -aG docker $USER && newgrp docker", | ||
| "nemoclaw onboard", | ||
| ], |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== RemediationAction / RemediationKind definitions ==="
rg -n -C2 '\binterface\s+RemediationAction\b|\btype\s+RemediationKind\b' -t ts
echo
echo "=== Producers of the docker-group remediation ==="
rg -n -C3 'docker_group_permission|planHostRemediation\(' -t ts
echo
echo "=== Consumers of action.kind and action.commands ==="
rg -n -C4 '\.kind\b|case\s+"sudo"|===\s*"sudo"|\.commands\b' -t ts -t tsx
echo
echo "=== Potential process/shell executors ==="
rg -n -C4 'spawn\(|exec\(|execFile\(|execa\(|runCapture\(' -t ts -t tsxRepository: NVIDIA/NemoClaw
Length of output: 4067
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Complete RemediationAction interface ==="
sed -n '109,130p' src/lib/preflight.ts
echo
echo "=== Complete docker_group_permission action definition ==="
sed -n '362,373p' src/lib/preflight.ts
echo
echo "=== Consumers of action.kind and action.commands ==="
rg -n -A5 'action\.kind|action\.commands|case.*sudo' -t ts
echo
echo "=== Executors / how commands are run ==="
rg -n -B2 -A6 'spawn\(|execFile\(|execa\(' -t tsRepository: NVIDIA/NemoClaw
Length of output: 1280
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search for executor/runner code handling RemediationAction ==="
rg -n 'function.*executeAction|function.*runAction|function.*runRemediatio' -t ts
echo
echo "=== Look for where commands are iterated/executed ==="
rg -n 'for.*commands|commands\.forEach|\.map.*command' -t ts | head -30
echo
echo "=== Test expectations for docker_group_permission execution ==="
sed -n '378,410p' src/lib/preflight.test.ts
echo
echo "=== Search main entry point or command execution ==="
rg -n 'main\(|async run\(' -t ts | head -20Repository: NVIDIA/NemoClaw
Length of output: 125
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== All references to RemediationAction in codebase ==="
rg -n 'RemediationAction' -t ts
echo
echo "=== Full test case for docker_group_permission ==="
sed -n '378,420p' src/lib/preflight.test.ts
echo
echo "=== Check for any execution/spawn in bin or CLI ==="
find . -name 'bin' -o -name 'cli' -o -name 'cmd' -type d 2>/dev/null | head -5
echo
echo "=== List all TypeScript files to understand structure ==="
git ls-files --cached | grep -E '\.ts$' | head -20Repository: NVIDIA/NemoClaw
Length of output: 2638
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Contents of bin directory ==="
ls -la bin/ 2>/dev/null || echo "No bin directory"
echo
echo "=== Check for any execution/command running logic ==="
rg -n 'spawn|execSync|exec\(|execa' -t ts | head -20
echo
echo "=== Search for TODO or FIXME related to remediation ==="
rg -n 'TODO|FIXME' src/lib/preflight.ts | grep -i 'remedi\|command\|action' || echo "No relevant TODOs"
echo
echo "=== Check imports and exports in preflight.ts ==="
head -50 src/lib/preflight.ts | grep -E '^import|^export'
echo
echo "=== Check if planHostRemediation is exported and used elsewhere ==="
rg -n 'planHostRemediation' -t tsRepository: NVIDIA/NemoClaw
Length of output: 3049
Clarify the remediation action design before implementing the executor.
Currently, no executor code exists for RemediationAction. When the executor is implemented, the docker_group_permission action needs careful handling: the first command (sudo usermod -aG docker $USER && newgrp docker) depends on shell state that the second command (nemoclaw onboard) must inherit. If the executor spawns commands separately (one per array element), the second command will lose the group membership refresh. Consider either combining both steps into a single shell invocation or splitting them into separate actions with explicit manual follow-up instructions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/lib/preflight.ts` around lines 365 - 372, The RemediationAction for
"docker_group_permission" in src/lib/preflight.ts relies on shell state between
two commands but there is no executor yet; update the remediation design so the
executor runs these steps atomically: either change the RemediationAction for
kind "sudo" / id "docker_group_permission" to use a single shell invocation
combining "sudo usermod -aG docker $USER && newgrp docker && nemoclaw onboard",
or split into two distinct actions (e.g., "add_user_to_docker_group" and a
separate "run_onboard_after_relogin") and include an explicit manual-followup
note so the second step isn’t expected to inherit group changes. Ensure the
executor (when implemented) documents that RemediationAction.commands may be
executed in a single shell when order/state must be preserved.
Address CodeRabbit review: newgrp replaces the current shell, so combining it with usermod in a single && chain would prevent any subsequent command from running. Split into three discrete steps so each is clear when displayed as manual instructions. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…A/NemoClaw into fix/docker-group-detection
kjw3
left a comment
There was a problem hiding this comment.
CodeRabbit call out looks worth addressing, but otherwise this looks sound
## Summary - Document `nemoclaw credentials list` and `nemoclaw credentials reset` commands in commands reference (#1597) - Add `--dry-run` flag documentation for `policy-add` (#1276) - Update policy presets table: remove `docker` (#1647), add `brave` and `brew`, update HuggingFace endpoint (#1540) - Document `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` env var for local providers (#1620) - Document `NEMOCLAW_PROXY_HOST`/`NEMOCLAW_PROXY_PORT` env vars (#1563) - Add troubleshooting entries for Docker group permissions (#1614), sandbox survival after gateway restart (#1587), and proxy configuration - Regenerate `nemoclaw-user-*` skills from updated docs ## Test plan - [x] `make docs` builds without warnings - [x] All pre-commit and pre-push hooks pass - [ ] Verify rendered pages in docs site preview 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added `nemoclaw credentials list` command to display stored credential names * Added `nemoclaw credentials reset <KEY>` command with `--yes` flag to remove credentials * Added `--dry-run` flag for policy-add to preview endpoint changes * New policy presets: `brave` and `brew` * New configuration options: `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT`, `NEMOCLAW_PROXY_HOST`, and `NEMOCLAW_PROXY_PORT` * **Documentation** * Expanded troubleshooting guides for Docker permissions, sandbox connectivity, local inference timeouts, and proxy configuration <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mod (NVIDIA#1614) ## Summary - When Docker is installed and the systemd service is active but the daemon is unreachable, suggest `sudo usermod -aG docker $USER && newgrp docker` instead of the generic "Start Docker" remediation - Adds a new `docker_group_permission` remediation action (kind: `sudo`, blocking) for this case - Existing `start_docker` path remains for when the service is inactive ## Test plan - [x] New test: `suggests usermod when docker service is active but daemon is unreachable` - [x] Existing test renamed and still passes: `recommends starting docker when installed but unreachable and service inactive` - [x] `make check` passes (all linters + tests) - [x] Full `vitest run` passes (1178 tests, 0 failures) Closes NVIDIA#50 <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Expanded and refined tests to cover Docker unreachable scenarios (service inactive vs. active-but-unreachable) and added a test stub to enforce expected remediation paths. * **Bug Fixes** * Remediation updated: when Docker is installed and the service is active but the daemon is unreachable on Linux, recommend adjusting Docker group permissions (with re-login) instead of always proposing to start the service. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
## Summary - Document `nemoclaw credentials list` and `nemoclaw credentials reset` commands in commands reference (NVIDIA#1597) - Add `--dry-run` flag documentation for `policy-add` (NVIDIA#1276) - Update policy presets table: remove `docker` (NVIDIA#1647), add `brave` and `brew`, update HuggingFace endpoint (NVIDIA#1540) - Document `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` env var for local providers (NVIDIA#1620) - Document `NEMOCLAW_PROXY_HOST`/`NEMOCLAW_PROXY_PORT` env vars (NVIDIA#1563) - Add troubleshooting entries for Docker group permissions (NVIDIA#1614), sandbox survival after gateway restart (NVIDIA#1587), and proxy configuration - Regenerate `nemoclaw-user-*` skills from updated docs ## Test plan - [x] `make docs` builds without warnings - [x] All pre-commit and pre-push hooks pass - [ ] Verify rendered pages in docs site preview 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added `nemoclaw credentials list` command to display stored credential names * Added `nemoclaw credentials reset <KEY>` command with `--yes` flag to remove credentials * Added `--dry-run` flag for policy-add to preview endpoint changes * New policy presets: `brave` and `brew` * New configuration options: `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT`, `NEMOCLAW_PROXY_HOST`, and `NEMOCLAW_PROXY_PORT` * **Documentation** * Expanded troubleshooting guides for Docker permissions, sandbox connectivity, local inference timeouts, and proxy configuration <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
sudo usermod -aG docker $USER && newgrp dockerinstead of the generic "Start Docker" remediationdocker_group_permissionremediation action (kind:sudo, blocking) for this casestart_dockerpath remains for when the service is inactiveTest plan
suggests usermod when docker service is active but daemon is unreachablerecommends starting docker when installed but unreachable and service inactivemake checkpasses (all linters + tests)vitest runpasses (1178 tests, 0 failures)Closes #50
Summary by CodeRabbit
Tests
Bug Fixes