Skip to content

fix(hermes): clear Tirith retry marker before command dispatch#4666

Merged
cv merged 2 commits into
mainfrom
fix/4511_hermes_tirith_marker_cleanup
Jun 3, 2026
Merged

fix(hermes): clear Tirith retry marker before command dispatch#4666
cv merged 2 commits into
mainfrom
fix/4511_hermes_tirith_marker_cleanup

Conversation

@chengjiew

@chengjiew chengjiew commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Hermes startup now clears a retryable Tirith download_failed marker before dispatching explicit sandbox commands. This lets restart/status command paths run the documented marker cleanup instead of skipping it and leaving .tirith-install-failed behind.

Related Issue

Fixes #4511

Changes

  • Move retry_tirith_marker_if_needed ahead of explicit command dispatch in both root and non-root Hermes startup paths.
  • Add a behavioral Hermes startup harness that plants a download_failed marker and verifies explicit command dispatch sees the marker already removed.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Focused checks run:

  • npm test -- --run test/hermes-start.test.ts passes
  • npm run source-shape:check passes
  • git diff --check passes

Full local checks attempted but did not pass due existing environment/setup blockers: missing plugin json5 dependency under nemoclaw/node_modules, local dist/ artifact failures in a direct npm test run before build, macOS OpenShell standalone asset expectations, localhost fixture HTTP 403s, locale warnings, and e2e-scenario timeout failures.


Signed-off-by: Chengjie Wang chengjiew@nvidia.com

Summary by CodeRabbit

  • Bug Fixes

    • Ensures retry marker cleanup happens before explicit command dispatch so startup commands run reliably and fallback retries can proceed as intended.
  • Tests

    • Added tests that validate marker cleanup and messaging for both root and non-root execution paths, preventing leftover markers from affecting subsequent runs.

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
@chengjiew chengjiew self-assigned this Jun 2, 2026
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3eb19ae1-b0d8-4f0e-a030-08cbb2562c3d

📥 Commits

Reviewing files that changed from the base of the PR and between 5cbcaf6 and 38e3e6d.

📒 Files selected for processing (1)
  • agents/hermes/start.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • agents/hermes/start.sh

📝 Walkthrough

Walkthrough

Relocates the Tirith retry-marker cleanup to run before explicit-command exec/privilege-drop in both non-root and root startup paths, and adds tests that extract and execute those dispatch blocks to verify the marker is removed and the fallback-retry message is emitted.

Changes

Tirith Marker Cleanup on Command Dispatch

Layer / File(s) Summary
Move marker cleanup before exec in startup paths
agents/hermes/start.sh
retry_tirith_marker_if_needed is moved to run before the conditional exec/privilege-drop branches in both non-root (lines 958–963) and root (lines 1006–1011) startup paths when NEMOCLAW_CMD is set, making the marker retry logic reachable.
Test helpers and marker cleanup verification
test/hermes-start.test.ts
Adds extractTirithDispatchBlock and runTirithExplicitCommandDispatch helpers (lines 135–191) and a Vitest test (lines 526–537) that seeds a retryable .tirith-install-failed marker, runs both non-root and root dispatch blocks, asserts success, confirms the marker is removed, and checks stderr for the fallback-retry message.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

fix

Suggested reviewers

  • cv

Poem

🐰 A stubborn marker hid at night,
I hopped and nudged till dawn's first light.
Moved the check up front, before the run —
Now boot is tidy, marker gone, job done!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: moving the Tirith retry marker cleanup before command dispatch to fix the issue where it was unreachable.
Linked Issues check ✅ Passed The code changes directly address all three stated objectives from issue #4511: removing the download_failed marker during startup, emitting retry/cleanup messages, and restoring the documented recovery path for explicit commands.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the Tirith marker cleanup issue: moving function calls in the startup script and adding corresponding test coverage with no unrelated modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/4511_hermes_tirith_marker_cleanup

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 0 new items found

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@wscurran wscurran added area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression labels Jun 3, 2026
@wscurran

wscurran commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

@wscurran wscurran added the integration: hermes Hermes integration behavior label Jun 3, 2026
@cv cv added the v0.0.58 Release target label Jun 3, 2026
@cv cv merged commit 1ada907 into main Jun 3, 2026
30 checks passed
@cv cv deleted the fix/4511_hermes_tirith_marker_cleanup branch June 3, 2026 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression integration: hermes Hermes integration behavior v0.0.58 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Sandbox] Hermes startup leaves download_failed Tirith marker after restart

3 participants