fix(cli): add sudo to lsof and kill hints in onboarding port check by tienedev · Pull Request #734 · NVIDIA/NemoClaw

tienedev · 2026-03-23T21:04:09Z

Summary

Non-root users cannot see root-owned listeners via lsof, so the port-conflict hints shown during onboarding silently fail. This PR adds sudo to the lsof and kill hints displayed to users, and updates the troubleshooting docs to match.

The programmatic lsof call in preflight.js is intentionally left without sudo — adding it there would trigger an interactive password prompt during onboarding, which is a worse UX. The current approach (try without sudo, fall through gracefully, then suggest the user run it manually with sudo) is the right tradeoff.

Related Issue

Fixes #726

Changes

Extracted inline hint logic into a dedicated formatPortConflictHint() helper in onboard.js for testability
Added sudo prefix to lsof and kill commands in the user-facing hints (both "unknown process" and "known process without PID" paths)
Updated docs/reference/troubleshooting.md with sudo commands
Regenerated .agents/skills/docs/ via docs-to-skills.py
Added 3 unit tests covering all hint branches

Type of Change

Code change with doc updates.

Testing

npm test passes.
3 new tests added in test/onboard.test.js covering all formatPortConflictHint() branches: unknown process, known with PID, known without PID.

Checklist

General

I have read and followed the contributing guide.

Code Changes

Tests added or updated for new or changed behavior.
No secrets, API keys, or credentials committed.
Doc pages updated for any user-facing behavior changes.

Doc Changes

Cross-references and links verified.

Summary by CodeRabbit

Documentation
- Added comprehensive guide for backing up and restoring workspace files
- Added reference documentation describing workspace Markdown files and persistence behavior
- Added warning callouts about permanent file deletion when destroying sandboxes
Bug Fixes
- Updated port conflict troubleshooting commands to use elevated privileges for improved reliability
Tests
- Added unit tests for port conflict error message formatting

…ixes #726) Non-root users cannot see root-owned listeners with lsof, so the onboarding error message now suggests `sudo lsof` and `sudo kill` instead of the unprivileged variants. The troubleshooting docs are updated to match. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-03-23T21:04:24Z

📝 Walkthrough

Walkthrough

This pull request addresses a port conflict issue by adding sudo requirements to troubleshooting commands, refactoring port conflict hint generation into a reusable helper function with unit tests, and adding comprehensive documentation for workspace files, backup procedures, and the destroy command's permanent deletion behavior.

Changes

Cohort / File(s)	Summary
Troubleshooting Documentation `docs/reference/troubleshooting.md`, `.agents/skills/docs/nemoclaw-reference/references/troubleshooting.md`	Updated port conflict identification and termination commands to require `sudo` prefix for `lsof` and `kill` operations.
Workspace Documentation `.agents/skills/docs/nemoclaw-workspace/SKILL.md`, `.agents/skills/docs/nemoclaw-workspace/references/workspace-files.md`, `.agents/skills/docs/nemoclaw-reference/references/commands.md`	Added new documentation files explaining workspace Markdown files, their location, persistence behavior, and manual/scripted backup/restore procedures. Added warning callout to `nemoclaw destroy` command about permanent file deletion.
Port Conflict Helper Refactoring `bin/lib/onboard.js`	Extracted inline port conflict hint generation logic into new exported `formatPortConflictHint()` helper function that formats conflict metadata into human-readable instructions with appropriate `sudo` commands.
Unit Tests `test/onboard.test.js`	Added three test cases covering `formatPortConflictHint()` behavior for unknown processes, known processes with PIDs, and known processes without PIDs.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐰 Hops of wisdom, bound in care,
Sudo commands float through the air!
Workspace treasures tucked away safe,
With backups made, no need for haste. 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly corresponds to the main code change: adding sudo to lsof and kill hints in the onboarding port check logic.
Linked Issues check	✅ Passed	The code changes comprehensively address issue `#726`: sudo added to lsof and kill commands in user-facing hints, new helper function extracted for testability, and accompanying documentation and tests provided.
Out of Scope Changes check	✅ Passed	All changes are in scope: code updates to address the issue, documentation updates for both user-facing hints and troubleshooting references, and comprehensive test coverage for the new helper function.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

docs/reference/troubleshooting.md (1)
98-103: ⚠️ Potential issue | 🟡 Minor

Missing sudo on the force-terminate command.

Line 102 mentions kill -9 <PID> without sudo, but if sudo kill <PID> is required (line 99), then sudo kill -9 <PID> should also be required for consistency when force-terminating root-owned processes.
📝 Suggested fix
-If the process does not exit, use `kill -9 <PID>` to force-terminate it.
+If the process does not exit, use `sudo kill -9 <PID>` to force-terminate it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/reference/troubleshooting.md` around lines 98 - 103, The troubleshooting
doc shows "kill -9 <PID>" without sudo while earlier using "sudo kill <PID>";
update the force-terminate example so it consistently uses sudo (i.e., change
"kill -9 <PID>" to "sudo kill -9 <PID>") near the "sudo lsof -i :18789" and
"sudo kill <PID>" lines to ensure commands will work for root-owned processes.
.agents/skills/docs/nemoclaw-reference/references/troubleshooting.md (1)
72-77: ⚠️ Potential issue | 🟡 Minor

Missing sudo on the force-terminate command (same as main docs).

Line 76 mentions kill -9 <PID> without sudo. For consistency with the updated sudo kill <PID> on line 73, this should also use sudo kill -9 <PID>.
📝 Suggested fix
-If the process does not exit, use `kill -9 <PID>` to force-terminate it.
+If the process does not exit, use `sudo kill -9 <PID>` to force-terminate it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/skills/docs/nemoclaw-reference/references/troubleshooting.md around
lines 72 - 77, Update the force-terminate command to use sudo for consistency:
change the plain "kill -9 <PID>" mentioned in the troubleshooting text to "sudo
kill -9 <PID>" so it matches the earlier "sudo kill <PID>" usage; edit the
string in the troubleshooting paragraph (the line containing the kill -9
instruction) accordingly.

🧹 Nitpick comments (2)

bin/lib/onboard.js (1)
323-346: Minor inconsistency in lsof flags between branches.

Line 337 uses sudo lsof -i :${port} -sTCP:LISTEN -P -n (with -P -n flags for numeric output), while line 343 uses sudo lsof -i :${port} -sTCP:LISTEN (without -P -n). Consider using consistent flags in both branches for a uniform user experience.
🔧 Suggested fix for consistency
   } else {
     lines.push(`     Could not identify the process using port ${port}.`);
-    lines.push(`     Run: sudo lsof -i :${port} -sTCP:LISTEN`);
+    lines.push(`     Run: sudo lsof -i :${port} -sTCP:LISTEN -P -n`);
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/onboard.js` around lines 323 - 346, The hint builder
formatPortConflictHint has inconsistent lsof flags: it uses "sudo lsof -i
:${port} -sTCP:LISTEN -P -n" when portCheck.process is known but omits "-P -n"
in the fallback branch; update the fallback branch to include the same "-P -n"
flags so both branches produce the same lsof command and consistent numeric
output (modify the string in the else branch where it currently returns "sudo
lsof -i :${port} -sTCP:LISTEN" to include "-P -n").
.agents/skills/docs/nemoclaw-workspace/SKILL.md (1)
88-99: Resolve markdownlint MD014 warnings in command examples.

$ prompts are used without command output in several console blocks, which matches the lint warnings and will keep docs lint noisy/failing depending on CI policy.
🧹 One clean approach (switch to bash blocks for command-only snippets)
-```console
-$ SANDBOX=my-assistant
-$ BACKUP_DIR=~/.nemoclaw/backups/$(date +%Y%m%d-%H%M%S)
-$ mkdir -p "$BACKUP_DIR"
+```bash
+SANDBOX=my-assistant
+BACKUP_DIR=~/.nemoclaw/backups/$(date +%Y%m%d-%H%M%S)
+mkdir -p "$BACKUP_DIR"
 ...
-```
+```
Also applies to: 105-115, 133-135, 139-141, 162-163
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/skills/docs/nemoclaw-workspace/SKILL.md around lines 88 - 99, The
console code blocks in SKILL.md use ```console with leading "$ " prompts (e.g.,
lines containing "$ SANDBOX=...", "$ BACKUP_DIR=...", "$ mkdir -p ..."), which
triggers markdownlint MD014; change those ```console fences to ```bash and
remove the leading "$ " prompt prefixes in each command example (including the
other affected blocks around the examples referenced) so the blocks contain only
raw commands like SANDBOX=..., BACKUP_DIR=..., mkdir -p ...; ensure all similar
occurrences (the blocks around the other ranges noted) are updated consistently.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/skills/docs/nemoclaw-reference/references/commands.md:
- Around line 85-87: Replace the plain-text reference "Back Up and Restore (see
the `nemoclaw-workspace` skill)" with an explicit Markdown link that points to
the exact Back Up and Restore section or the nemoclaw-workspace skill doc;
update the sentence in commands.md (the warning block containing "Destroying a
sandbox..." and the phrase "Back Up and Restore") to use the Markdown link
syntax [Back Up and
Restore](<URL-or-relative-path-to-nemoclaw-workspace#back-up-and-restore>) so
users can click directly to the relevant section, keeping the
`nemoclaw-workspace` inline code mention if desired.

In @.agents/skills/docs/nemoclaw-workspace/SKILL.md:
- Line 3: Update the frontmatter "description" field in SKILL.md to fix grammar
and clarity: replace the current broken string with a concise sentence such as
"How to back up and restore OpenClaw workspace files before destructive
operations; explains what workspace files are, where they live, and how they
persist across sandbox restarts. Use for agents.md, backup/restore, identity.md,
memory.md, nemoclaw, nemoclaw backup, and nemoclaw restore." Ensure you modify
the description key in the file's frontmatter (the "description" field) to this
improved wording.

---

Outside diff comments:
In @.agents/skills/docs/nemoclaw-reference/references/troubleshooting.md:
- Around line 72-77: Update the force-terminate command to use sudo for
consistency: change the plain "kill -9 <PID>" mentioned in the troubleshooting
text to "sudo kill -9 <PID>" so it matches the earlier "sudo kill <PID>" usage;
edit the string in the troubleshooting paragraph (the line containing the kill
-9 instruction) accordingly.

In `@docs/reference/troubleshooting.md`:
- Around line 98-103: The troubleshooting doc shows "kill -9 <PID>" without sudo
while earlier using "sudo kill <PID>"; update the force-terminate example so it
consistently uses sudo (i.e., change "kill -9 <PID>" to "sudo kill -9 <PID>")
near the "sudo lsof -i :18789" and "sudo kill <PID>" lines to ensure commands
will work for root-owned processes.

---

Nitpick comments:
In @.agents/skills/docs/nemoclaw-workspace/SKILL.md:
- Around line 88-99: The console code blocks in SKILL.md use ```console with
leading "$ " prompts (e.g., lines containing "$ SANDBOX=...", "$
BACKUP_DIR=...", "$ mkdir -p ..."), which triggers markdownlint MD014; change
those ```console fences to ```bash and remove the leading "$ " prompt prefixes
in each command example (including the other affected blocks around the examples
referenced) so the blocks contain only raw commands like SANDBOX=...,
BACKUP_DIR=..., mkdir -p ...; ensure all similar occurrences (the blocks around
the other ranges noted) are updated consistently.

In `@bin/lib/onboard.js`:
- Around line 323-346: The hint builder formatPortConflictHint has inconsistent
lsof flags: it uses "sudo lsof -i :${port} -sTCP:LISTEN -P -n" when
portCheck.process is known but omits "-P -n" in the fallback branch; update the
fallback branch to include the same "-P -n" flags so both branches produce the
same lsof command and consistent numeric output (modify the string in the else
branch where it currently returns "sudo lsof -i :${port} -sTCP:LISTEN" to
include "-P -n").

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a7f58d9-6a96-4dca-9b28-a270fccc0942

📥 Commits

Reviewing files that changed from the base of the PR and between 5ac3575 and 92117bc.

📒 Files selected for processing (7)

.agents/skills/docs/nemoclaw-reference/references/commands.md
.agents/skills/docs/nemoclaw-reference/references/troubleshooting.md
.agents/skills/docs/nemoclaw-workspace/SKILL.md
.agents/skills/docs/nemoclaw-workspace/references/workspace-files.md
bin/lib/onboard.js
docs/reference/troubleshooting.md
test/onboard.test.js

coderabbitai · 2026-03-23T21:06:04Z

+> **Warning:** Destroying a sandbox permanently deletes all files inside it, including
+> workspace files (see the `nemoclaw-workspace` skill) (SOUL.md, USER.md, IDENTITY.md, AGENTS.md, MEMORY.md, and daily memory notes).
+> Back up your workspace first by following the instructions at Back Up and Restore (see the `nemoclaw-workspace` skill).


⚠️ Potential issue | 🟡 Minor

Make “Back Up and Restore” an explicit Markdown link.

The warning is useful, but the current plain-text reference is harder to navigate than a direct link to the exact section/skill doc.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/skills/docs/nemoclaw-reference/references/commands.md around lines 85 - 87, Replace the plain-text reference "Back Up and Restore (see the `nemoclaw-workspace` skill)" with an explicit Markdown link that points to the exact Back Up and Restore section or the nemoclaw-workspace skill doc; update the sentence in commands.md (the warning block containing "Destroying a sandbox..." and the phrase "Back Up and Restore") to use the Markdown link syntax [Back Up and Restore](<URL-or-relative-path-to-nemoclaw-workspace#back-up-and-restore>) so users can click directly to the relevant section, keeping the `nemoclaw-workspace` inline code mention if desired.

coderabbitai · 2026-03-23T21:06:04Z

@@ -0,0 +1,169 @@
+---
+name: nemoclaw-workspace
+description: Hows to back up and restore OpenClaw workspace files before destructive operations. Also covers whats workspace files are, where they live, and how they persist across sandbox restarts. Use when agents.md, back restore workspace files, backup, identity.md, memory.md, nemoclaw, nemoclaw backup, nemoclaw restore.


⚠️ Potential issue | 🟡 Minor

Fix frontmatter description grammar and phrasing.

The description string has multiple typos/awkward phrases (Hows, whats, back restore workspace files) that reduce discoverability and polish in generated docs.

✏️ Proposed wording

-description: Hows to back up and restore OpenClaw workspace files before destructive operations. Also covers whats workspace files are, where they live, and how they persist across sandbox restarts. Use when agents.md, back restore workspace files, backup, identity.md, memory.md, nemoclaw, nemoclaw backup, nemoclaw restore. +description: How to back up and restore OpenClaw workspace files before destructive operations. Also covers what workspace files are, where they live, and how they persist across sandbox restarts. Use when querying AGENTS.md, backup/restore workspace files, IDENTITY.md, MEMORY.md, nemoclaw, nemoclaw backup, or nemoclaw restore.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.agents/skills/docs/nemoclaw-workspace/SKILL.md at line 3, Update the frontmatter "description" field in SKILL.md to fix grammar and clarity: replace the current broken string with a concise sentence such as "How to back up and restore OpenClaw workspace files before destructive operations; explains what workspace files are, where they live, and how they persist across sandbox restarts. Use for agents.md, backup/restore, identity.md, memory.md, nemoclaw, nemoclaw backup, and nemoclaw restore." Ensure you modify the description key in the file's frontmatter (the "description" field) to this improved wording.

…IDIA#734/NVIDIA#735) CodeQL flagged the symlink-target paths in the two symlink-skip tests as insecure temp file creation (predictable `os.tmpdir() + pid` paths let a coresident attacker pre-create the target). Switch both to the existing `makeTempDir()` helper, which uses `mkdtempSync` for an unguessable directory name. Cleanup also simplifies to the afterEach hook since both temp dirs are now tracked in `tmpDirs`. Tests still pass (16/16). Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

tienedev and others added 2 commits March 23, 2026 19:26

docs: regenerate skill docs after troubleshooting sudo update

92117bc

coderabbitai Bot reviewed Mar 23, 2026

View reviewed changes

tienedev closed this Mar 23, 2026

tienedev deleted the fix/726-lsof-sudo-onboarding branch March 23, 2026 21:14

wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cli): add sudo to lsof and kill hints in onboarding port check#734

fix(cli): add sudo to lsof and kill hints in onboarding port check#734
tienedev wants to merge 2 commits into
NVIDIA:mainfrom
tienedev:fix/726-lsof-sudo-onboarding

tienedev commented Mar 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 23, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Mar 23, 2026

Uh oh!

coderabbitai Bot Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tienedev commented Mar 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Testing

Checklist

General

Code Changes

Doc Changes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tienedev commented Mar 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 23, 2026 •

edited

Loading