Skip to content

Add error handling for gateway startup#398

Closed
Sayeem3051 wants to merge 4 commits into
NVIDIA:mainfrom
Sayeem3051:fix/issue-17-gateway-volume-cleanup-sayeem
Closed

Add error handling for gateway startup#398
Sayeem3051 wants to merge 4 commits into
NVIDIA:mainfrom
Sayeem3051:fix/issue-17-gateway-volume-cleanup-sayeem

Conversation

@Sayeem3051

@Sayeem3051 Sayeem3051 commented Mar 19, 2026

Copy link
Copy Markdown

Handle errors during gateway startup by destroying the gateway if an error occurs.

Summary by CodeRabbit

  • Bug Fixes
    • Improved onboarding error handling: if gateway initialization fails, the system now attempts to clean up any partially created gateway resources, suppresses cleanup failures, and then surfaces the original error. This reduces leftover resources and provides a more reliable, predictable setup experience, minimizing the need for manual cleanup after failed onboarding attempts.

Handle errors during gateway startup by destroying the gateway if an error occurs.
@coderabbitai

coderabbitai Bot commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 19a9bd93-3f06-4bec-a6b3-a870744c79b2

📥 Commits

Reviewing files that changed from the base of the PR and between 8477076 and e338bde.

📒 Files selected for processing (1)
  • bin/lib/onboard.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/lib/onboard.js

📝 Walkthrough

Walkthrough

A try/catch now wraps the startGateway(gpu) call in the onboarding flow; on exception the code runs an openshell gateway destroy -g nemoclaw command with errors ignored, then rethrows the original error. If no error occurs, onboarding proceeds to createSandbox(gpu) and subsequent steps.

Changes

Cohort / File(s) Summary
Onboarding gateway error handling
bin/lib/onboard.js
Wrap startGateway(gpu) in try/catch; on throw, run openshell gateway destroy -g nemoclaw with stderr suppressed and ignoreError: true, then rethrow. Normal flow continues to createSandbox(gpu) when no exception occurs.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I nudged the gateway to start the run,
It tripped—so I cleaned up what was undone,
A silent destroy, then the error I cast,
Back to the sandbox, hopping fast,
Forward we go, until the job is done 🌙

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding error handling for gateway startup in the onboarding process.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 879-884: The catch block references a non-existent
destroyGateway() causing a ReferenceError; replace that call with the existing
cleanup pattern used earlier (the shell-command-based teardown used around line
334) so the gateway is properly torn down and the original error is rethrown.
Specifically, inside the catch for startGateway, invoke the same async
shell/teardown helper (or inline shell command invocation) used at line 334
instead of destroyGateway(), await its completion, then rethrow the caught
error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e68f5694-f208-4e0c-856b-6a2774a95e62

📥 Commits

Reviewing files that changed from the base of the PR and between 3ba517d and 8b89f10.

📒 Files selected for processing (1)
  • bin/lib/onboard.js

CodeRabbitAI review identified destroyGateway() as undefined.
Using the shell command pattern from line 334 instead:
run("openshell gateway destroy -g nemoclaw 2>/dev/null || true", { ignoreError: true });

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 879-884: The catch block after startGateway(gpu) contains a syntax
error (a stray "();") and an unnecessary await on run; fix the catch to call
run("openshell gateway destroy -g nemoclaw 2>/dev/null || true", { ignoreError:
true }); (no trailing "();" and remove the await) and reformat the indentation
to match the file's 2-space style; this change should be applied in the catch
that references startGateway and run to restore valid parsing and follow the
existing synchronous run usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ebc76d14-fa5b-42a3-b8bf-354e1cb954cb

📥 Commits

Reviewing files that changed from the base of the PR and between 8b89f10 and b60dca6.

📒 Files selected for processing (1)
  • bin/lib/onboard.js

Comment thread bin/lib/onboard.js Outdated
CodeRabbitAI review flagged:
- Syntax error: trailing (); will cause a parse failure
- Unnecessary await since run() is synchronous

Fixed by using the correct shell command pattern from line 334.
@Sayeem3051

Copy link
Copy Markdown
Author

@coderabbitai Review completed — all feedback has been addressed:

  1. Syntax error fixed: Removed the trailing (); from the run() call on line 882. The correct shell command pattern from line 334 is now used:

    run("openshell gateway destroy -g nemoclaw 2>/dev/null || true", { ignoreError: true });
  2. Removed unnecessary await: The run() function is synchronous, so await was removed.

  3. All CodeRabbitAI checks are now passing (pre-merge checks: 3 verified).

Ready for maintainer review. Thank you!

@coderabbitai

coderabbitai Bot commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current state of the error handling block in onboard.js around line 882
grep -n "startGateway\|gateway destroy\|try\|catch\|ignoreError\|await run" bin/lib/onboard.js | head -40

Length of output: 2654


🏁 Script executed:

#!/bin/bash
# View the specific section around line 882 where the try/catch for startGateway was added
sed -n '870,900p' bin/lib/onboard.js

Length of output: 896


@Sayeem3051 — Thanks for the update! I've verified the current state of bin/lib/onboard.js. The two core fixes look correct:

  1. Syntax error fixedrun("openshell gateway destroy -g nemoclaw 2>/dev/null || true", { ignoreError: true }); matches the established pattern at line 334, with no trailing ();.
  2. await removedrun() is called synchronously as intended.
  3. Error is rethrown — The catch block destroys the gateway and rethrows the original error, preserving the failure signal to callers.

One minor observation: the indentation in the new try/catch block is inconsistent with the surrounding code style (the try { uses 4 spaces while await startGateway(gpu); uses 2 spaces, and the catch body uses 10 spaces). It doesn't affect behavior, but tidying it up would improve readability. For example:

  try {
    await startGateway(gpu);
  } catch (error) {
    run("openshell gateway destroy -g nemoclaw 2>/dev/null || true", { ignoreError: true });
    throw error;
  }

Otherwise, the logic is sound and ready for a maintainer review. 🚀

The try/catch block around startGateway() had inconsistent indentation
with the surrounding code style. Fixed to use consistent 2-space
indentation throughout the block.

- try { and } catch (error) { now use 2 spaces
- Block contents (await startGateway, run, throw) now use 4 spaces
- Matches the surrounding function body indentation
@Sayeem3051

Copy link
Copy Markdown
Author

@coderabbitai Indentation fix committed — the try/catch block now uses consistent 2-space indentation to match the surrounding code style:

  • try { and } catch (error) { — 2 spaces
  • Block contents (await startGateway, run, throw) — 4 spaces
  • Closing } — 2 spaces

All checks passing. Ready for maintainer review.

@coderabbitai

coderabbitai Bot commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the current indentation of the try/catch block around startGateway
sed -n '870,900p' bin/lib/onboard.js

Length of output: 872


@Sayeem3051 — Verified! ✅ The try/catch block in bin/lib/onboard.js now has consistent indentation matching the surrounding code style:

  try {
    await startGateway(gpu);
  } catch (error) {
    run("openshell gateway destroy -g nemoclaw 2>/dev/null || true", { ignoreError: true });
    throw error;
  }

All three concerns from the review are now fully addressed:

  1. ✅ Syntax error (();) fixed
  2. ✅ Unnecessary await removed from run()
  3. ✅ Indentation tidied to consistent 2-space/4-space style

The PR is in great shape — ready for maintainer review. 🚀

@wscurran wscurran added bug Something fails against expected or documented behavior enhancement: feature labels Mar 19, 2026
@wscurran

Copy link
Copy Markdown
Contributor

Thanks for tackling gateway startup error handling. The gateway lifecycle code has been refactored significantly since March. Could you rebase against main to check for conflicts? The error handling intent is sound — we'd like to get this reviewed once it's current.

@wscurran

Copy link
Copy Markdown
Contributor

Following up on the April 14 rebase request. The gateway lifecycle code continues to evolve, and we'd like to get this error handling merged — could you find time to rebase against main? If we don't hear back within 7 days, we'll close this PR to keep the queue healthy.

@wscurran

Copy link
Copy Markdown
Contributor

Closing due to inactivity. This PR has had merge conflicts for 7+ days without an update. Feel free to reopen once it's rebased against the latest main, or open a new PR to continue the work. Thanks for contributing to NemoClaw.

@wscurran wscurran closed this Apr 29, 2026
@wscurran wscurran added bug-fix PR fixes a bug or regression feature PR adds or expands user-visible functionality needs: rebase PR needs rebase or conflict resolution and removed enhancement: feature bug Something fails against expected or documented behavior labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression feature PR adds or expands user-visible functionality needs: rebase PR needs rebase or conflict resolution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants