fix(preflight): detect docker group permission issue and suggest usermod#1614

Merged
ericksoa merged 5 commits into main from fix/docker-group-detection
Apr 8, 2026

@ericksoa ericksoa commented Apr 8, 2026

Contributor

Summary

  • When Docker is installed and the systemd service is active but the daemon is unreachable, suggest `sudo usermod -aG docker $USER && newgrp docker` instead of the generic "Start Docker" remediation
  • Adds a new `docker_group_permission` remediation action (kind: `sudo`, blocking) for this case
  • Existing `start_docker` path remains for when the service is inactive

Test plan

  • New test: `suggests usermod when docker service is active but daemon is unreachable`
  • Existing test renamed and still passes: `recommends starting docker when installed but unreachable and service inactive`
  • `make check` passes (all linters + tests)
  • Full `vitest run` passes (1178 tests, 0 failures)

Closes #50

Summary by CodeRabbit

  • Tests

    • Expanded and refined tests to cover Docker unreachable scenarios (service inactive vs. active-but-unreachable) and added a test stub to enforce expected remediation paths.
  • Bug Fixes

    • Remediation updated: when Docker is installed and the service is active but the daemon is unreachable on Linux, recommend adjusting Docker group permissions (with re-login) instead of always proposing to start the service.

When Docker is installed and the systemd service is active but the daemon
is unreachable, the most likely cause is a permissions/group issue rather
than a stopped service. Detect this case and suggest
`sudo usermod -aG docker $USER && newgrp docker` instead of the generic
"Start Docker" remediation.

Closes #50

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
coderabbitai Bot commented Apr 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 35d4150e-6e18-45b7-97c3-205f1f4eda23

📥 Commits

Reviewing files that changed from the base of the PR and between 7191177 and 9ab1500.

📒 Files selected for processing (2)
  • src/lib/preflight.test.ts
  • src/lib/preflight.ts
✅ Files skipped from review due to trivial changes (1)
  • src/lib/preflight.test.ts

Walkthrough

Preflight logic now differentiates "Docker installed but daemon unreachable": if the Docker service is active/enabled, the remediation plan proposes adding the user to the docker group; otherwise it falls back to the existing "start Docker" remediation.
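
The branching described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/lib/preflight.ts`: only the action ids, `kind`, `reason`, `commands`, and `dockerServiceActive` appear in this thread, so the function name `planDockerRemediation`, the `DockerState` shape, the `id` and `blocking` fields, and the `start_docker` reason and commands are assumptions. The `docker_group_permission` commands are the ones quoted in the review excerpt further down.

```typescript
// Hypothetical shapes for illustration; field names beyond `kind`, `reason`,
// and `commands` are assumptions, not the project's actual interface.
interface RemediationAction {
  id: string;
  kind: "sudo" | "manual";
  blocking: boolean;
  reason: string;
  commands: string[];
}

interface DockerState {
  dockerInstalled: boolean;
  dockerReachable: boolean;
  dockerServiceActive: boolean;
}

// Returns a remediation only for "installed but daemon unreachable";
// every other state is out of scope for this sketch and yields null.
function planDockerRemediation(state: DockerState): RemediationAction | null {
  if (!state.dockerInstalled || state.dockerReachable) {
    return null;
  }
  if (state.dockerServiceActive) {
    // Service is running, so a stopped daemon is unlikely; suspect permissions.
    return {
      id: "docker_group_permission",
      kind: "sudo",
      blocking: true,
      reason:
        "Docker is installed and the service is running, but the current user cannot reach the daemon. " +
        "This usually means your user is not in the docker group.",
      commands: ["sudo usermod -aG docker $USER && newgrp docker", "nemoclaw onboard"],
    };
  }
  // Service is inactive: fall back to starting Docker (commands assumed).
  return {
    id: "start_docker",
    kind: "sudo",
    blocking: true,
    reason: "Docker is installed but the service is not running.",
    commands: ["sudo systemctl start docker", "nemoclaw onboard"],
  };
}
```

Keeping the decision in a pure function of the detected state is what lets the two tests mentioned in this PR differ only in the value of `dockerServiceActive`.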

Changes

| Cohort / File(s) | Summary |
|---|---|
| Preflight logic: `src/lib/preflight.ts` | When Docker is installed but the daemon is unreachable, branch on `dockerServiceActive`: if true, emit a `docker_group_permission` action (`sudo usermod -aG docker $USER`, suggest `newgrp docker`/relogin, then `nemoclaw onboard`); otherwise keep the `start_docker` action (systemctl path). |
| Preflight tests: `src/lib/preflight.test.ts` | Adjusted existing test to cover "Docker installed but unreachable" with `dockerServiceActive: false`; added a new test for "Docker service active/enabled but daemon unreachable" asserting a blocking `docker_group_permission` sudo action and a reason mentioning the Docker group. |
| Installer test stub: `test/install-preflight.test.js` | Added a `systemctl` stub in the test PATH to return inactive/disabled for Docker checks so the installer preflight test follows the "start Docker" remediation path in that test context. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Preflight
    participant systemctl
    participant DockerDaemon

    User->>Preflight: run "nemoclaw onboard" preflight
    Preflight->>systemctl: check is-active/is-enabled docker
    systemctl-->>Preflight: active/enabled or inactive/disabled
    Preflight->>DockerDaemon: attempt docker daemon ping
    alt daemon reachable
        DockerDaemon-->>Preflight: reachable
        Preflight-->>User: continue onboarding
    else daemon unreachable
        alt service active/enabled
            Preflight-->>User: planAction(docker_group_permission) with sudo usermod -aG docker $USER, newgrp docker or re-login, then onboard
        else service inactive/disabled
            Preflight-->>User: planAction(start_docker) with sudo systemctl start/enable docker
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I sniffed the daemon, quiet and sly,
The service stood up while sockets deny.
"Add me to docker," I wiggle and plead,
A sudo usermod — then newgrp to succeed. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: detecting Docker group permission issues and suggesting usermod as the remediation action. |
| Linked Issues check | ✅ Passed | The PR directly addresses issue #50 by distinguishing between inactive Docker service and unreachable daemon due to group permissions, implementing the docker_group_permission remediation when the service is active but daemon is unreachable. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to fixing the Docker group permission detection in preflight checks; no unrelated modifications are present in the three modified files. |

Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms


The install-preflight test that simulates Docker-installed-but-unreachable
was picking up the real systemctl on CI, which reported Docker as active.
This triggered the new docker-group remediation instead of the expected
"Start Docker" path. Add a systemctl stub that reports docker as inactive
so the test exercises the correct code path.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
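
One way to build such a stub is sketched below. This is an illustration under stated assumptions, not the stub in `test/install-preflight.test.js`: the helper names `createInactiveSystemctlStub` and `queryDockerService`, the stub script body, and the exit status of 3 are all hypothetical, and the sketch assumes a POSIX `sh` and an executable temp directory.

```typescript
import { spawnSync } from "node:child_process";
import { chmodSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { delimiter, join } from "node:path";

// Create a throwaway directory holding a fake `systemctl` that always reports
// the unit as inactive. The script body is an assumption for illustration.
function createInactiveSystemctlStub(): string {
  const stubDir = mkdtempSync(join(tmpdir(), "systemctl-stub-"));
  const stubPath = join(stubDir, "systemctl");
  writeFileSync(stubPath, '#!/bin/sh\necho "inactive"\nexit 3\n');
  chmodSync(stubPath, 0o755);
  return stubDir;
}

// Run `systemctl is-active docker` in a shell whose PATH starts with stubDir,
// so the stub shadows any real systemctl installed on the host or CI runner.
function queryDockerService(stubDir: string): { status: number | null; output: string } {
  const env = { ...process.env, PATH: `${stubDir}${delimiter}${process.env.PATH ?? ""}` };
  const result = spawnSync("sh", ["-c", "systemctl is-active docker"], { env, encoding: "utf8" });
  return { status: result.status, output: result.stdout.trim() };
}
```

Because the stub directory is placed first on `PATH`, the result no longer depends on whether the machine running the test has Docker active, which is the CI flakiness this commit describes.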

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/preflight.ts`:
- Around line 365-372: The RemediationAction for "docker_group_permission" in
src/lib/preflight.ts relies on shell state between two commands but there is no
executor yet; update the remediation design so the executor runs these steps
atomically: either change the RemediationAction for kind "sudo" / id
"docker_group_permission" to use a single shell invocation combining "sudo
usermod -aG docker $USER && newgrp docker && nemoclaw onboard", or split into
two distinct actions (e.g., "add_user_to_docker_group" and a separate
"run_onboard_after_relogin") and include an explicit manual-followup note so the
second step isn’t expected to inherit group changes. Ensure the executor (when
implemented) documents that RemediationAction.commands may be executed in a
single shell when order/state must be preserved.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 80cfd749-8c8e-4066-8642-3e0a7f4036ae

📥 Commits

Reviewing files that changed from the base of the PR and between adbea05 and f24273b.

📒 Files selected for processing (2)
  • src/lib/preflight.test.ts
  • src/lib/preflight.ts

Comment thread src/lib/preflight.ts
Comment on lines +365 to +372
    kind: "sudo",
    reason:
      "Docker is installed and the service is running, but the current user cannot reach the daemon. " +
      "This usually means your user is not in the docker group.",
    commands: [
      "sudo usermod -aG docker $USER && newgrp docker",
      "nemoclaw onboard",
    ],

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== RemediationAction / RemediationKind definitions ==="
rg -n -C2 '\binterface\s+RemediationAction\b|\btype\s+RemediationKind\b' -t ts

echo
echo "=== Producers of the docker-group remediation ==="
rg -n -C3 'docker_group_permission|planHostRemediation\(' -t ts

echo
echo "=== Consumers of action.kind and action.commands ==="
# ripgrep's built-in "ts" type already matches *.tsx; there is no separate "tsx" type
rg -n -C4 '\.kind\b|case\s+"sudo"|===\s*"sudo"|\.commands\b' -t ts

echo
echo "=== Potential process/shell executors ==="
rg -n -C4 'spawn\(|exec\(|execFile\(|execa\(|runCapture\(' -t ts

Repository: NVIDIA/NemoClaw

Length of output: 4067


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Complete RemediationAction interface ==="
sed -n '109,130p' src/lib/preflight.ts

echo
echo "=== Complete docker_group_permission action definition ==="
sed -n '362,373p' src/lib/preflight.ts

echo
echo "=== Consumers of action.kind and action.commands ==="
rg -n -A5 'action\.kind|action\.commands|case.*sudo' -t ts

echo
echo "=== Executors / how commands are run ==="
rg -n -B2 -A6 'spawn\(|execFile\(|execa\(' -t ts

Repository: NVIDIA/NemoClaw

Length of output: 1280


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Search for executor/runner code handling RemediationAction ==="
rg -n 'function.*executeAction|function.*runAction|function.*runRemediatio' -t ts

echo
echo "=== Look for where commands are iterated/executed ==="
rg -n 'for.*commands|commands\.forEach|\.map.*command' -t ts | head -30

echo
echo "=== Test expectations for docker_group_permission execution ==="
sed -n '378,410p' src/lib/preflight.test.ts

echo
echo "=== Search main entry point or command execution ==="
rg -n 'main\(|async run\(' -t ts | head -20

Repository: NVIDIA/NemoClaw

Length of output: 125


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== All references to RemediationAction in codebase ==="
rg -n 'RemediationAction' -t ts

echo
echo "=== Full test case for docker_group_permission ==="
sed -n '378,420p' src/lib/preflight.test.ts

echo
echo "=== Check for any execution/spawn in bin or CLI ==="
find . -type d \( -name 'bin' -o -name 'cli' -o -name 'cmd' \) 2>/dev/null | head -5

echo
echo "=== List all TypeScript files to understand structure ==="
git ls-files --cached | grep -E '\.ts$' | head -20

Repository: NVIDIA/NemoClaw

Length of output: 2638


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Contents of bin directory ==="
ls -la bin/ 2>/dev/null || echo "No bin directory"

echo
echo "=== Check for any execution/command running logic ==="
rg -n 'spawn|execSync|exec\(|execa' -t ts | head -20

echo
echo "=== Search for TODO or FIXME related to remediation ==="
rg -n 'TODO|FIXME' src/lib/preflight.ts | grep -i 'remedi\|command\|action' || echo "No relevant TODOs"

echo
echo "=== Check imports and exports in preflight.ts ==="
head -50 src/lib/preflight.ts | grep -E '^import|^export'

echo
echo "=== Check if planHostRemediation is exported and used elsewhere ==="
rg -n 'planHostRemediation' -t ts

Repository: NVIDIA/NemoClaw

Length of output: 3049


Clarify the remediation action design before implementing the executor.

Currently, no executor code exists for RemediationAction. When the executor is implemented, the docker_group_permission action needs careful handling: the first command (sudo usermod -aG docker $USER && newgrp docker) depends on shell state that the second command (nemoclaw onboard) must inherit. If the executor spawns commands separately (one per array element), the second command will lose the group membership refresh. Consider either combining both steps into a single shell invocation or splitting them into separate actions with explicit manual follow-up instructions.
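
The underlying concern, that state set by one command does not carry over when each command runs in its own process, can be demonstrated with a small sketch. Everything here is hypothetical: no executor exists in the repository, the helper names `runSeparately` and `runInOneShell` are invented, and a plain shell variable stands in for the group membership change. The sketch assumes a POSIX `sh` and does not model `newgrp`, which additionally starts a new shell.

```typescript
import { spawnSync } from "node:child_process";

// Run each command in its own `sh -c` process, as a naive executor might.
// State such as shell variables is discarded when each process exits.
function runSeparately(commands: readonly string[]): string[] {
  return commands.map((command) =>
    spawnSync("sh", ["-c", command], { encoding: "utf8" }).stdout.trim(),
  );
}

// Run all commands in a single shell so later commands see earlier state.
function runInOneShell(commands: readonly string[]): string {
  return spawnSync("sh", ["-c", commands.join(" && ")], { encoding: "utf8" }).stdout.trim();
}

// The first command sets state; the second reports whether it can see it.
const demoCommands = ["MARKER=set", 'echo "${MARKER:-unset}"'];
```

With these inputs, `runSeparately(demoCommands)` reports `unset` for the second command while `runInOneShell(demoCommands)` reports `set`, which is the difference the review comment asks the future executor to account for.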

ericksoa added 3 commits April 8, 2026 09:06
Address CodeRabbit review: newgrp replaces the current shell, so
combining it with usermod in a single && chain would prevent any
subsequent command from running. Split into three discrete steps
so each is clear when displayed as manual instructions.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
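
The final command strings are not shown in this thread, so the following is only a guess at what three discrete steps displayed as manual instructions could look like. Both the `dockerGroupSteps` array and the `formatManualSteps` helper are hypothetical.

```typescript
// Assumed three-step split: change group membership, refresh the group in
// the current session, then rerun onboarding. Not taken from the repository.
const dockerGroupSteps: readonly string[] = [
  "sudo usermod -aG docker $USER",
  "newgrp docker",
  "nemoclaw onboard",
];

// Number each command so it reads as a separate manual instruction.
function formatManualSteps(commands: readonly string[]): string[] {
  return commands.map((command, index) => `${index + 1}. ${command}`);
}
```

Listing `newgrp docker` as its own step avoids the problem noted in the commit message: since it replaces the current shell, nothing chained after it with `&&` would run in the original session.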
@ericksoa ericksoa self-assigned this Apr 8, 2026

@kjw3 kjw3 left a comment

The CodeRabbit callout looks worth addressing, but otherwise this looks sound.

@ericksoa ericksoa merged commit 966a69d into main Apr 8, 2026
10 checks passed
@ericksoa ericksoa deleted the fix/docker-group-detection branch April 8, 2026 17:17
miyoungc added a commit that referenced this pull request Apr 9, 2026
## Summary
- Document `nemoclaw credentials list` and `nemoclaw credentials reset`
commands in commands reference (#1597)
- Add `--dry-run` flag documentation for `policy-add` (#1276)
- Update policy presets table: remove `docker` (#1647), add `brave` and
`brew`, update HuggingFace endpoint (#1540)
- Document `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` env var for local
providers (#1620)
- Document `NEMOCLAW_PROXY_HOST`/`NEMOCLAW_PROXY_PORT` env vars (#1563)
- Add troubleshooting entries for Docker group permissions (#1614),
sandbox survival after gateway restart (#1587), and proxy configuration
- Regenerate `nemoclaw-user-*` skills from updated docs

## Test plan
- [x] `make docs` builds without warnings
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify rendered pages in docs site preview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **New Features**
  * Added `nemoclaw credentials list` command to display stored credential names
  * Added `nemoclaw credentials reset <KEY>` command with `--yes` flag to remove credentials
  * Added `--dry-run` flag for policy-add to preview endpoint changes
  * New policy presets: `brave` and `brew`
  * New configuration options: `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT`, `NEMOCLAW_PROXY_HOST`, and `NEMOCLAW_PROXY_PORT`

* **Documentation**
  * Expanded troubleshooting guides for Docker permissions, sandbox connectivity, local inference timeouts, and proxy configuration

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
…mod (NVIDIA#1614)

gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
@wscurran wscurran added the area: packaging (Packages, images, registries, installers, or distribution), bug-fix (PR fixes a bug or regression), and platform: container (Affects Docker, containerd, Podman, or images) labels and removed the area: packaging and Docker labels on Jun 3, 2026
Labels

bug-fix (PR fixes a bug or regression)
platform: container (Affects Docker, containerd, Podman, or images)

Development

Successfully merging this pull request may close these issues.

Fedora - Docker - Onboarding cannot find Docker

3 participants