fix(hermes): suspend kanban dispatcher under shields-up by laitingsheng · Pull Request #4703 · NVIDIA/NemoClaw

laitingsheng · 2026-06-03T06:34:13Z

Summary

Under shields-up the Hermes config root /sandbox/.hermes is root-owned read-only, so the gateway's embedded kanban dispatcher fails every tick when kanban_db.connect() runs PRAGMA journal_mode=WAL against the locked kanban.db. The dispatcher reconnects each cycle, filling /tmp/gateway.log with sqlite3.OperationalError: attempt to write a readonly database tracebacks while the health endpoint stays 200, so the breakage is silent. This change exports the upstream-supported HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override when the config root is locked, cleanly suspending the dispatcher and notifier loops without touching upstream Hermes.

Related Issue

Fixes #4299

Changes

agents/hermes/start.sh: new apply_shields_up_runtime_env helper that detects the shields-up posture via the existing hermes_config_root_is_locked probe and exports HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 if the caller has not set it. Called in both the non-root and root entrypoint paths, after config-integrity verification and before gateway launch so the gateway child inherits it.
test/hermes-start.test.ts: three new cases asserting the override fires under shields-up, stays off when shields are down, and preserves a caller-supplied value.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

`npx prek run --all-files` passes
`npm test` passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
`npm run docs` builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

New Features
- Hermes now automatically disables the kanban dispatcher in the gateway when the configuration root is locked; manual overrides provided by users are preserved.
Tests
- Added tests covering dispatcher behavior for locked, unlocked, and user-override scenarios, including emitted stderr messages for the forced disablement.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai · 2026-06-03T06:34:24Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 884465e6-1ffa-4bd5-af21-ebe3251c9527

📥 Commits

Reviewing files that changed from the base of the PR and between 54d4f7a and bb1bf59.

📒 Files selected for processing (1)

test/hermes-start.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

test/hermes-start.test.ts

📝 Walkthrough

Walkthrough

Hermes startup now defines apply_shields_up_runtime_env() to detect locked config roots and disable the kanban dispatcher by setting HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 unless caller-supplied. This function is integrated into both non-root and root startup paths before provider placeholder refresh, with tests verifying locked/unlocked and override-preservation behavior.

Changes

Shields-up kanban dispatcher override

Layer / File(s)	Summary
Shields-up kanban dispatcher override function and startup integration `agents/hermes/start.sh`	Defines `apply_shields_up_runtime_env()` which checks whether the Hermes config root is locked and injects `HERMES_KANBAN_DISPATCH_IN_GATEWAY=0` (with Shields-up logging) unless already set. The function is called in both non-root (line 850) and root (line 899) startup branches before provider placeholder refresh and messaging configuration.
Test coverage for shields-up override behavior `test/hermes-start.test.ts`	Adds `LOCKED_HERMES_CONFIG_STAT_MOCK`, updates runtime script generation to reuse the mock, and introduces `runShieldsUpRuntimeEnv` plus a Vitest suite with three assertions: dispatcher forced to `0` when config root is locked, unchanged when unlocked, and caller-supplied overrides preserved under shields-up.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Integration: Hermes, Sandbox

Suggested reviewers

miyoungc

Poem

"I nibble logs beneath the moonlit ground,
Shields rise high where quiet files are found.
Hermes hushes kanban's frantic drum,
No scribbled tracebacks beating like a drum.
Tests hop close — the garden sleeps, all sound."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(hermes): suspend kanban dispatcher under shields-up' clearly and concisely summarizes the main change—implementing a fix to suspend the kanban dispatcher when the Hermes config is locked under shields-up.
Linked Issues check	✅ Passed	The pull request fully addresses all coding requirements from issue `#4299`: it detects shields-up via the config root lock probe, exports the upstream-supported environment override HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 to suspend the dispatcher, and includes comprehensive tests validating the override behavior under locked/unlocked configurations.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the issue: the helper function and its integration in the startup script detect and handle shields-up conditions, and the tests validate the new functionality without introducing unrelated modifications.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/4299-kanban-dispatcher-shields-up

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-03T06:34:46Z

PR Review Advisor

Findings: 0 needs attention, 3 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 2 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: agents/hermes/start.sh shields-up kanban dispatcher workaround: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: apply_shields_up_runtime_env is a localized runtime-env workaround; the linked issue offers three possible resolutions, and this PR implements the env-disable route without runtime log validation.
Shields-up dispatcher suspension can be bypassed by preset env (agents/hermes/start.sh:327): The new helper only sets HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 when the variable is unset or empty. A caller-supplied value such as 1 is preserved under the locked config-root posture, which can re-enable the embedded dispatcher in exactly the read-only shields-up state this patch is intended to protect. That may be intentional compatibility, but it weakens the runtime policy if shields-up is supposed to be authoritative.
- Recommendation: If shields-up must force the safe posture, set HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 unconditionally when hermes_config_root_is_locked is true, or at least reject/log-and-override enabling values. If manual override is intentional, document who is allowed to supply it and why it is safe despite the readonly kanban.db failure mode.
- Evidence: agents/hermes/start.sh:325-331 exports 0 only inside `if [ -z "${HERMES_KANBAN_DISPATCH_IN_GATEWAY:-}" ]`; test/hermes-start.test.ts:543-549 explicitly verifies that preset value `1` is preserved under shields-up.
Issue's gateway-log acceptance path is not runtime-validated (test/hermes-start.test.ts:523): The tests prove the helper exports the environment default for a mocked locked config root, but they do not launch a Hermes gateway or assert the linked issue's observable requirement that /tmp/gateway.log under shields-up stops producing Traceback/FATAL/Aborting output. The fix also depends on upstream Hermes honoring HERMES_KANBAN_DISPATCH_IN_GATEWAY=0, which is not verified here.
- Recommendation: Add or identify a targeted runtime/integration validation that starts the Hermes gateway in a locked config-root posture, confirms the gateway child inherits HERMES_KANBAN_DISPATCH_IN_GATEWAY=0, and checks the gateway log for absence of the repeated kanban sqlite Traceback/FATAL/Aborting pattern. If full gateway validation is too heavy for unit tests, document the manual/runtime validation tied to this workaround.
- Evidence: Issue [DGX Spark][Sandbox] Hermes gateway kanban dispatcher logs repeated Traceback under shields-up — sqlite3 OperationalError on readonly database #4299 states `gateway.log` under shields-up `MUST NOT contain "Traceback", "FATAL", or "Aborting"`; the new tests in test/hermes-start.test.ts:523-549 only inspect the shell helper's printed `KANBAN=` value and stderr message.

🌱 Nice ideas

None.

Since last review details

Current findings:

Source-of-truth review needed: agents/hermes/start.sh shields-up kanban dispatcher workaround: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: apply_shields_up_runtime_env is a localized runtime-env workaround; the linked issue offers three possible resolutions, and this PR implements the env-disable route without runtime log validation.
Shields-up dispatcher suspension can be bypassed by preset env (agents/hermes/start.sh:327): The new helper only sets HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 when the variable is unset or empty. A caller-supplied value such as 1 is preserved under the locked config-root posture, which can re-enable the embedded dispatcher in exactly the read-only shields-up state this patch is intended to protect. That may be intentional compatibility, but it weakens the runtime policy if shields-up is supposed to be authoritative.
- Recommendation: If shields-up must force the safe posture, set HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 unconditionally when hermes_config_root_is_locked is true, or at least reject/log-and-override enabling values. If manual override is intentional, document who is allowed to supply it and why it is safe despite the readonly kanban.db failure mode.
- Evidence: agents/hermes/start.sh:325-331 exports 0 only inside `if [ -z "${HERMES_KANBAN_DISPATCH_IN_GATEWAY:-}" ]`; test/hermes-start.test.ts:543-549 explicitly verifies that preset value `1` is preserved under shields-up.
Issue's gateway-log acceptance path is not runtime-validated (test/hermes-start.test.ts:523): The tests prove the helper exports the environment default for a mocked locked config root, but they do not launch a Hermes gateway or assert the linked issue's observable requirement that /tmp/gateway.log under shields-up stops producing Traceback/FATAL/Aborting output. The fix also depends on upstream Hermes honoring HERMES_KANBAN_DISPATCH_IN_GATEWAY=0, which is not verified here.
- Recommendation: Add or identify a targeted runtime/integration validation that starts the Hermes gateway in a locked config-root posture, confirms the gateway child inherits HERMES_KANBAN_DISPATCH_IN_GATEWAY=0, and checks the gateway log for absence of the repeated kanban sqlite Traceback/FATAL/Aborting pattern. If full gateway validation is too heavy for unit tests, document the manual/runtime validation tied to this workaround.
- Evidence: Issue [DGX Spark][Sandbox] Hermes gateway kanban dispatcher logs repeated Traceback under shields-up — sqlite3 OperationalError on readonly database #4299 states `gateway.log` under shields-up `MUST NOT contain "Traceback", "FATAL", or "Aborting"`; the new tests in test/hermes-start.test.ts:523-549 only inspect the shell helper's printed `KANBAN=` value and stderr message.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

github-actions · 2026-06-03T06:37:21Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

github-actions · 2026-06-03T06:37:22Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

github-actions · 2026-06-03T06:41:39Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26868069659
Target ref: 54d4f7a4c5836e06e385c3f134df02182db291e4
Workflow ref: main
Requested jobs: hermes-root-entrypoint-smoke-e2e,hermes-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
hermes-e2e	✅ success
hermes-root-entrypoint-smoke-e2e	✅ success

coderabbitai

🧹 Nitpick comments (2)

test/hermes-start.test.ts (2)
484-501: ⚡ Quick win

Extract the duplicated stat() mock into a shared helper.

This block is a near-verbatim copy of the stat() mock in runHermesGatewayRuntimeCleanup (Lines 222-235). The two must stay byte-for-byte in sync to keep shields-up detection consistent; if hermes_config_root_is_locked's stat invocations change, both copies have to be updated, which is easy to miss.

Consider hoisting it to a module-level constant (e.g., LOCKED_CONFIG_STAT_MOCK) reused by both helpers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/hermes-start.test.ts` around lines 484 - 501, The duplicated multi-line
stat() mock (currently in the local variable statMock and duplicated in
runHermesGatewayRuntimeCleanup) should be extracted to a single module-level
constant (e.g., LOCKED_CONFIG_STAT_MOCK) and both places should reference that
constant; update the test file to replace the inline statMock definition with a
reference to LOCKED_CONFIG_STAT_MOCK and update runHermesGatewayRuntimeCleanup
to use the same constant so the byte-for-byte mock remains in one place and is
reused by hermes_config_root_is_locked and the other helper.
510-518: ⚡ Quick win

Runtime replication: id mock isn’t needed for this snippet; consider aligning set flags.

apply_shields_up_runtime_env (and its dependency hermes_config_root_is_locked / hermes_config_path_is_locked) never branches on id -u in agents/hermes/start.sh, so the lack of an id() mock in test/hermes-start.test.ts lines 510-518 shouldn’t change behavior.

The helper script uses set -uo pipefail while agents/hermes/start.sh uses set -euo pipefail; for strict fidelity (and to catch unexpected command failures), align the helper to set -euo pipefail—though in these extracted functions, stat failures are already masked (... || ... || true) and apply_shields_up_runtime_env is explicitly guarded with || return 0.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/hermes-start.test.ts` around lines 510 - 518, Remove the unnecessary
`id` mock from the test snippet because apply_shields_up_runtime_env and its
dependent functions hermes_config_root_is_locked / hermes_config_path_is_locked
never branch on `id -u`; instead, update the helper invocation that currently
uses "set -uo pipefail" to "set -euo pipefail" so the extracted functions mirror
agents/hermes/start.sh's strict flags and will fail on unexpected command errors
(note these functions already mask stat failures and
apply_shields_up_runtime_env is guarded with `|| return 0`).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/hermes-start.test.ts`:
- Around line 484-501: The duplicated multi-line stat() mock (currently in the
local variable statMock and duplicated in runHermesGatewayRuntimeCleanup) should
be extracted to a single module-level constant (e.g., LOCKED_CONFIG_STAT_MOCK)
and both places should reference that constant; update the test file to replace
the inline statMock definition with a reference to LOCKED_CONFIG_STAT_MOCK and
update runHermesGatewayRuntimeCleanup to use the same constant so the
byte-for-byte mock remains in one place and is reused by
hermes_config_root_is_locked and the other helper.
- Around line 510-518: Remove the unnecessary `id` mock from the test snippet
because apply_shields_up_runtime_env and its dependent functions
hermes_config_root_is_locked / hermes_config_path_is_locked never branch on `id
-u`; instead, update the helper invocation that currently uses "set -uo
pipefail" to "set -euo pipefail" so the extracted functions mirror
agents/hermes/start.sh's strict flags and will fail on unexpected command errors
(note these functions already mask stat failures and
apply_shields_up_runtime_env is guarded with `|| return 0`).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 421d89c6-2db5-47dc-b4d8-a777f97d1a80

📥 Commits

Reviewing files that changed from the base of the PR and between 46bb3a0 and 54d4f7a.

📒 Files selected for processing (2)

agents/hermes/start.sh
test/hermes-start.test.ts

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…tcher-shields-up Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

fix(hermes): suspend kanban dispatcher under shields-up

54d4f7a

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

laitingsheng added fix integration: hermes Hermes integration behavior labels Jun 3, 2026

laitingsheng added 2 commits June 3, 2026 07:44

test(hermes): extract shared locked-config stat mock

a5a8dff

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Merge remote-tracking branch 'origin/main' into fix/4299-kanban-dispa…

bb1bf59

…tcher-shields-up Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added bug-fix PR fixes a bug or regression v0.0.58 Release target labels Jun 3, 2026

wscurran removed fix v0.0.58 Release target labels Jun 3, 2026

laitingsheng added the v0.0.58 Release target label Jun 3, 2026

wscurran requested a review from cv June 3, 2026 16:49

cv self-assigned this Jun 3, 2026

cv approved these changes Jun 3, 2026

View reviewed changes

cv merged commit 5fcfcd9 into main Jun 3, 2026
34 of 35 checks passed

cv deleted the fix/4299-kanban-dispatcher-shields-up branch June 3, 2026 19:47

miyoungc mentioned this pull request Jun 4, 2026

docs: refresh 0.0.58 release docs and refresh skills #4743

Merged

coderabbitai Bot mentioned this pull request Jun 5, 2026

fix(hermes): keep remote secrets out of sandbox surfaces #4771

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(hermes): suspend kanban dispatcher under shields-up#4703

fix(hermes): suspend kanban dispatcher under shields-up#4703
cv merged 3 commits into
mainfrom
fix/4299-kanban-dispatcher-shields-up

laitingsheng commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 3, 2026 •

edited

Loading

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

github-actions Bot commented Jun 3, 2026 •

edited

Loading

E2E Recommendation Advisor

Uh oh!

github-actions Bot commented Jun 3, 2026 •

edited

Loading

E2E Scenario Advisor

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

laitingsheng commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Uh oh!

github-actions Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Uh oh!

github-actions Bot commented Jun 3, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

laitingsheng commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

github-actions Bot commented Jun 3, 2026 •

edited

Loading

github-actions Bot commented Jun 3, 2026 •

edited

Loading

github-actions Bot commented Jun 3, 2026 •

edited

Loading