fix: recover Hermes Tirith startup marker#3797
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Hermes startup handling to detect and optionally retry build-time Tirith install failures, integrates a sandbox diagnostics collector for Hermes startup timeouts, and includes unit tests plus generated e2e parity updates. ChangesHermes Tirith Bootstrap Retry and Diagnostics
Sequence DiagramsequenceDiagram
participant OnboardCLI as Onboard CLI
participant GatewayProbe as Gateway Health Probe
participant StartSh as agents/hermes/start.sh
participant SandboxFS as Sandbox Filesystem
participant PythonTools as tools.tirith_security (Python)
participant RunCapture as runCaptureOpenshell
participant Diagnostics as collectHermesStartupDiagnostics()
participant FailReporter as failAgentSetup()
OnboardCLI->>GatewayProbe: wait for gateway (timeout)
StartSh->>SandboxFS: stat .tirith-install-failed
alt marker present and regular file
StartSh->>SandboxFS: read reason
alt reason == download_failed
StartSh->>PythonTools: invoke runtime tirith install
PythonTools-->>StartSh: install success/failure
StartSh->>SandboxFS: rm marker on success
else
StartSh->>OnboardCLI: log non-retryable reason
end
else marker is symlink
StartSh->>OnboardCLI: log unsafe marker warning
end
alt Gateway still unhealthy after timeout
GatewayProbe->>RunCapture: request diagnostics
RunCapture->>Diagnostics: run diagnostics script in sandbox
Diagnostics-->>RunCapture: redacted marker info + tailed logs
RunCapture->>FailReporter: pass diagnostics as detail lines
FailReporter->>OnboardCLI: report timeout with details
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
Selective E2E Results — ❌ Some jobs failedRun: 26099397364
|
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26099797965
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/docs/parity-map.yaml (1)
1-1:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd missing SPDX license header to this YAML file.
This file is missing the required SPDX copyright/license header.
As per coding guidelines, "Every source file must include an SPDX license header for copyright and Apache-2.0 license".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/docs/parity-map.yaml` at line 1, Add the missing SPDX license header to the top of the YAML file so every source file includes copyright and license info; insert a single-line SPDX header (e.g., "SPDX-FileCopyrightText: [copyright holders]" and "SPDX-License-Identifier: Apache-2.0") immediately above the existing root key (the "scripts:" entry) so the header appears as the first lines of parity-map.yaml.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/e2e/docs/parity-map.yaml`:
- Line 1: Add the missing SPDX license header to the top of the YAML file so
every source file includes copyright and license info; insert a single-line SPDX
header (e.g., "SPDX-FileCopyrightText: [copyright holders]" and
"SPDX-License-Identifier: Apache-2.0") immediately above the existing root key
(the "scripts:" entry) so the header appears as the first lines of
parity-map.yaml.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fae99d2d-767c-4138-960a-56e735a785e7
📒 Files selected for processing (2)
test/e2e/docs/parity-inventory.generated.jsontest/e2e/docs/parity-map.yaml
✅ Files skipped from review due to trivial changes (1)
- test/e2e/docs/parity-inventory.generated.json
Selective E2E Results — ✅ All requested jobs passedRun: 26100131745
|
|
Thanks for jumping on this — the direction is right, especially the Step 7 diagnostics and the fact that retry failure remains non-blocking. I do think this needs one simplification pass before merge, though. Request changes: avoid duplicating Tirith recovery in
|
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26102493171
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/hermes-start.test.ts`:
- Around line 16-17: The regex in extractShellFunctionFromSource interpolates
the variable name directly into new RegExp and should escape regex
metacharacters to avoid ReDoS and unintended matches; modify
extractShellFunctionFromSource to first escape `name` (e.g. replace
/[.*+?^${}()|[\]\\]/g with '\\$&') and then build the RegExp using the
escapedName so the pattern becomes new RegExp(`${escapedName}\\(\\)
\\{([\\s\\S]*?)^\\}`, "m").
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 29578fbb-f10d-4958-b294-2a6a77fc5413
📒 Files selected for processing (3)
agents/hermes/start.shtest/e2e/docs/parity-map.yamltest/hermes-start.test.ts
Selective E2E Results — ✅ All requested jobs passedRun: 26102731689
|
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26103316766
|
|
@jyaunches addressed in For the Tirith startup path, I reduced I also rewrote the startup tests around that narrower contract: one test proves On the parity files, I tried to split them out by restoring Validation after the simplification included |
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26116747872
|
Summary
/sandbox/.hermes/.tirith-install-failedcontainsdownload_failed.Fixes #3793
NVBug: 6190755
Validation
bash -n agents/hermes/start.shnpm run build:clinpm run typecheck:clinpx vitest run test/hermes-start.test.ts src/lib/agent/onboard.test.tsnpx vitest run --project cli test/nemoclaw-start.test.ts test/ssrf-parity.test.tsSigned-off-by: Aaron Erickson aerickson@nvidia.com
Summary by CodeRabbit
New Features
Tests
Documentation