Parent epic: #3588
Goal
Migrate the `hermes` E2E coverage area into the layered scenario framework without porting legacy scripts line for line. Add the missing domain primitive layer first, then move assertions into scenario plans and suites with stable assertion IDs.
This issue is also the refresh point for Hermes-related bugs discovered after it was created. It should become a spec-ready issue for `/vd-spec` and later validation-spec work: each scenario may be expected to PASS or FAIL depending on whether the underlying product bug is already fixed, has a fix in flight, or is still open. Failing scenario tests are acceptable when they reproduce a real current Hermes bug.
Legacy / current coverage to absorb
- test-hermes-e2e.sh
- test-hermes-inference-switch.sh
- test-hermes-discord-e2e.sh
- test-hermes-slack-e2e.sh
- test-rebuild-hermes.sh, where behavior is Hermes-specific rather than generic rebuild coverage
- Hermes-relevant assertions currently living in shared messaging/channel/security helpers, including:
  - test/e2e/lib/discord-gateway-proof.sh
  - test/e2e/lib/slack-api-proof.sh
  - test/e2e/lib/security-posture-assertions.sh
  - test/e2e/lib/inference-switch-retry.sh
Architecture contract
- Add or extend the domain primitive library `test/e2e/validation_suites/lib/hermes.sh`.
- Helpers must consume `$E2E_CONTEXT_DIR/context.env`; suites must not reinstall, onboard, or rediscover setup state.
- Add or extend suite family entries in `test/e2e/validation_suites/suites.yaml`.
- Add onboarding profiles, test plans, or onboarding assertions only when the behavior belongs before expected-state validation.
- Emit stable assertion IDs of the form `<layer>.<domain>.<behavior>`.
- Update the current coverage metadata/reporting source in `test/e2e/docs/` and the resolver/reporting tests. If `parity-map.yaml` has been removed or replaced, update the successor generated/static metadata instead of recreating stale infrastructure.
- Preserve compatibility with the existing `run-scenario.sh <id> --plan-only` behavior.
- Do not hide known current product bugs as retired coverage. Represent them as runnable scenarios with explicit expected-outcome metadata (`expected_pass`, `expected_fail_current_bug`, or equivalent), linked to the source issue/PR.
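The contract above can be sketched as a few shell primitives: load setup state once, validate assertion IDs, and emit one machine-readable result line per assertion. This is a minimal, hypothetical sketch; the function names (`hermes_load_context`, `hermes_valid_assertion_id`, `hermes_emit`) and the result-line format are assumptions, not the existing contents of `lib/hermes.sh`.

```bash
# Hypothetical sketch of primitives for test/e2e/validation_suites/lib/hermes.sh.
# Function names and the result-line format are illustrative assumptions.

# Load setup state recorded by onboarding. Suites only read it; they never
# reinstall, onboard, or rediscover the sandbox.
hermes_load_context() {
  if [[ -z "${E2E_CONTEXT_DIR:-}" ]]; then
    echo "E2E_CONTEXT_DIR is not set" >&2
    return 1
  fi
  local ctx="$E2E_CONTEXT_DIR/context.env"
  if [[ ! -r "$ctx" ]]; then
    echo "missing context file: $ctx" >&2
    return 1
  fi
  # shellcheck source=/dev/null
  source "$ctx"
}

# IDs follow <layer>.<domain>.<behavior>; the behavior part may itself contain
# dots (runtime.gateway-health), so require at least three dot-separated segments.
hermes_valid_assertion_id() {
  local re='^[a-z]+\.[a-z]+(\.[a-z0-9-]+)+$'
  [[ "$1" =~ $re ]]
}

# Emit one result line: "<status> <assertion-id> <expectation> [detail]".
hermes_emit() {
  local status="$1" id="$2" expectation="$3" detail="${4:-}"
  if ! hermes_valid_assertion_id "$id"; then
    echo "invalid assertion id: $id" >&2
    return 2
  fi
  printf '%s\n' "$status $id $expectation${detail:+ $detail}"
}
```

A suite step would call `hermes_load_context` once, run its probe, and report through a call such as `hermes_emit PASS expected.hermes.runtime.gateway-health expected_pass`; rejecting malformed IDs at emit time keeps the coverage report keyed on stable identifiers.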
Required Hermes scenario families
1. Hermes baseline runtime
Scenarios should validate a successfully onboarded Hermes sandbox without reinstalling:
- `expected.hermes.runtime.gateway-health` — Hermes gateway is reachable and reports healthy.
- `expected.hermes.runtime.agent-home` — expected Hermes paths exist and are readable/writable only where intended.
- `expected.hermes.runtime.env-integrity` — `/sandbox/.hermes/.env` is present, credentials are resolved through the intended boundary, and no secret values are printed in scenario logs.
- `expected.hermes.runtime.security-posture` — after capability drop / restricted execution, startup does not rewrite RC files or fail on root-owned/writable-path assumptions.
Coverage source: test-hermes-e2e.sh, #3891 / PR #3914.
Initial expected outcome: PASS on current main for landed behavior from #3914.
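For `env-integrity`, one way to keep secrets out of scenario logs is to have the helper inspect only variable names, and to treat any `.env` value that shows up in a log as a leak. A hypothetical sketch: it assumes simple `KEY=value` lines (optionally prefixed with `export`) and ignores values shorter than 8 characters to avoid coincidental matches.

```bash
# Hypothetical helpers for expected.hermes.runtime.env-integrity.
# Assumes simple KEY=value lines (optionally prefixed with "export").

# Print only the variable NAMES defined in an env file, never the values.
hermes_env_keys() {
  local env_file="${1:-/sandbox/.hermes/.env}"
  if [[ ! -r "$env_file" ]]; then
    echo "env file missing or unreadable: $env_file" >&2
    return 1
  fi
  sed -nE 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' "$env_file"
}

# Return 0 (leak found) if any env value of 8+ characters appears in the log.
# Only the file paths are reported, so the leaked value is not echoed again.
hermes_log_leaks_env_values() {
  local env_file="$1" log_file="$2" value
  while IFS= read -r value; do
    value="${value#\"}"   # strip surrounding double quotes, if any
    value="${value%\"}"
    (( ${#value} >= 8 )) || continue   # short values like "1" match by coincidence
    if grep -qF -- "$value" "$log_file"; then
      echo "env value from $env_file found in $log_file" >&2
      return 0
    fi
  done < <(sed -nE 's/^[[:space:]]*(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*=//p' "$env_file")
  return 1
}
```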
2. Hermes inference switching and provider routing
Scenarios should validate route/config correctness separately from external model availability:
- `expected.hermes.inference.switch-route-state` — `nemohermes inference set` updates the intended OpenShell/provider route.
- `expected.hermes.inference.env-immutable-on-switch` — `.env` is not rewritten by inference switching (its hash is unchanged).
- `expected.hermes.inference.gateway-pid-stable` — Hermes gateway process remains running during the switch.
- `expected.hermes.inference.inference-local-chat` — `https://inference.local/v1/chat/completions` works from inside the sandbox after the switch.
- `expected.hermes.inference.hermes-api-chat` — Hermes API chat endpoint still responds after the switch.
- `expected.hermes.inference.external-timeout-classification` — an external model endpoint timeout is not misclassified as a product regression when route/config checks have already passed.
Coverage source: test-hermes-inference-switch.sh, #4111, #4145, PR #4152, PR #4158.
Initial expected outcome: PASS or expected external-failure classification on current main. The scenario must distinguish product failures from external provider timeout/flake.
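The `.env` immutability and timeout-classification checks could look like the sketch below. It is hypothetical: the function names and output labels are illustrative, curl's exit status 28 (operation timed out) is used as the timeout signal, and treating HTTP 504 as an upstream timeout is an additional assumption.

```bash
# Hypothetical helpers for expected.hermes.inference.* assertions.

# Portable SHA-256 of a file (sha256sum on Linux, shasum on macOS).
hermes_file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Classify an in-sandbox chat probe.
#   $1: 1 if route/config assertions already passed, else 0
#   $2: curl exit status (28 = operation timed out)
#   $3: HTTP status code reported by curl (000 if no response)
hermes_classify_chat_probe() {
  local route_ok="$1" curl_rc="$2" http="$3"
  if [[ "$route_ok" != 1 ]]; then
    echo FAIL_PRODUCT        # wrong route/config is a product failure regardless of the model
  elif [[ "$curl_rc" == 0 && "$http" == 2[0-9][0-9] ]]; then
    echo PASS
  elif [[ "$curl_rc" == 28 || "$http" == 504 ]]; then
    echo EXTERNAL_TIMEOUT    # upstream model was slow after the route was proven correct
  else
    echo FAIL_PRODUCT
  fi
}
```

A suite would compare digests of `/sandbox/.hermes/.env` taken before and after the switch; ordering the checks so that `EXTERNAL_TIMEOUT` is only reachable once `route_ok` is 1 is what keeps a provider flake from masking a routing regression.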
3. Hermes messaging: Discord
Scenarios should cover both configuration parity and live/fake gateway behavior where secrets/runners allow:
- `expected.hermes.discord.config-schema` — Hermes config contains the Discord account/channel/plugin fields required by the current Hermes/OpenClaw runtime.
- `expected.hermes.discord.policy-egress` — Discord REST and Gateway/WebSocket egress use the expected OpenShell policy/proxy path.
- `expected.hermes.discord.gateway-connects` — the Discord gateway path can connect using a fake or live Discord test harness.
- `expected.hermes.discord.empty-user-allowlist-open-dm-policy` — when a guild/server is configured and the Discord user allowlist is empty, generated config should not fall back to confusing pairing behavior; it should represent the intended open-to-guild-members policy.
- `expected.hermes.discord.no-openclaw-pairing-copy` — NemoHermes-only Discord UX must not instruct users to approve pairing through an `openclaw` command.
- `expected.hermes.discord.plugin-entry-registered` — if shared channel generation applies to Hermes, selected Discord channel config must register the plugin entry needed for startup; if this is OpenClaw-only, the scenario metadata must explicitly classify it out of Hermes scope and link the equivalent OpenClaw scenario.
Coverage source: test-hermes-discord-e2e.sh, #4070 / PR #4126, #4246, prior Discord facade/gateway coverage.
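Initial expected outcome: per the issue inventory below, #4070 (PR #4126 open) should FAIL until fixed, and #4246 should FAIL if shared channel generation applies to Hermes; otherwise it is classified out of Hermes scope with evidence.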
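For `no-openclaw-pairing-copy`, a text check over captured user-facing output is enough to turn the UX rule into an assertion. This is a hypothetical heuristic: it flags any line that mentions pairing and also invokes an `openclaw` subcommand; the exact wording it needs to catch is an assumption.

```bash
# Hypothetical check for expected.hermes.discord.no-openclaw-pairing-copy.
# Reads captured user-facing text (e.g. onboarding output) on stdin. Heuristic:
# a line that mentions pairing AND invokes an `openclaw` subcommand is a failure.
hermes_assert_no_openclaw_pairing_copy() {
  local text
  text="$(cat)"
  if grep -iE 'pair' <<<"$text" \
      | grep -qiE '(^|[^[:alnum:]_-])openclaw[[:space:]]+[a-z]'; then
    echo "FAIL expected.hermes.discord.no-openclaw-pairing-copy" >&2
    return 1
  fi
  echo "PASS expected.hermes.discord.no-openclaw-pairing-copy"
}
```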
4. Hermes messaging: Slack
Scenarios should cover Slack config, token handling, socket startup, and reconnect behavior:
- `expected.hermes.slack.config-enabled` — selected Slack channel produces an enabled channel config and required token placeholders.
- `expected.hermes.slack.provider-state` — Slack bot/app tokens are present as OpenShell-resolved providers, not plaintext secrets.
- `expected.hermes.slack.socket-mode-starts` — Hermes/OpenClaw runtime starts Slack Socket Mode and attempts `wss-primary.slack.com` through the expected policy/proxy path.
- `expected.hermes.slack.no-secret-leak` — Slack tokens are not emitted to logs, generated config, scenario artifacts, or failure output.
- `expected.hermes.slack.idle-reconnect-delivers-first-mention` — after idle socket reconnect, the first inbound @mention is delivered to Hermes instead of being silently dropped.
Coverage source: test-hermes-slack-e2e.sh, #4189 / PR #4222, #3582.
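Initial expected outcome: … `enabled: true`, env vars present, and plugin installed. #4189 should FAIL until Slack generated channel config reliably starts the channel. Per the issue inventory below, #3582 should FAIL, or be live-secret gated, until fixed.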
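`no-secret-leak` can be checked by scanning logs, generated config, and artifacts for token-shaped strings. A hypothetical sketch that relies on Slack's token prefixes (`xoxb-` for bot tokens, `xoxp-` for user tokens, `xapp-` for app-level tokens); the regex, its minimum length, and the assumption that placeholders never match it are illustrative choices.

```bash
# Hypothetical scanner for expected.hermes.slack.no-secret-leak.
# Slack bot tokens start with "xoxb-", user tokens with "xoxp-", and app-level
# (Socket Mode) tokens with "xapp-". Placeholders are assumed not to look like this.
HERMES_SLACK_TOKEN_RE='(xox[a-z]|xapp)-[0-9A-Za-z-]{10,}'

# Usage: hermes_scan_slack_tokens <file-or-dir>...
# Lists offending file paths (never the token itself); returns 1 on any hit.
hermes_scan_slack_tokens() {
  local hits
  hits="$(grep -rlE -- "$HERMES_SLACK_TOKEN_RE" "$@" 2>/dev/null)"
  if [[ -n "$hits" ]]; then
    echo "FAIL expected.hermes.slack.no-secret-leak; token-shaped strings in:" >&2
    printf '%s\n' "$hits" >&2
    return 1
  fi
  echo "PASS expected.hermes.slack.no-secret-leak"
}
```

Reporting only file paths matters here: a leak detector that prints the matched token would itself violate the assertion in failure output.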
5. Hermes messaging: Telegram
Scenarios should cover Telegram tool dispatch and onboarding guidance:
- `expected.hermes.telegram.first-message-tool-dispatch` — the first inbound Telegram message must be handled by registered tool dispatch, not leaked as raw `send_message` pseudo-call text.
- `expected.hermes.telegram.single-polling-loop` — gateway startup must not produce concurrent `getUpdates` polling loops that conflict and prevent `sendMessage`.
- `expected.hermes.telegram.privacy-mode-guidance` — group-chat onboarding/post-onboard guidance surfaces Telegram privacy-mode and remove/re-add requirements for new bots.
- `expected.hermes.telegram.group-message-preconditions` — validation metadata records when live group-message testing is blocked by bot privacy mode or missing test secrets.
Coverage source: #3893 / PR #4175, #4067 / PR #3925, #4068 / PR #4107.
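Initial expected outcome: per the issue inventory below, #3893 / PR #4175 should FAIL until fixed; #4067 / PR #3925 should PASS or provide gated live-secret evidence; #4068 / PR #4107 should PASS for the guidance assertion.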
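`group-message-preconditions` is about recording why a live test did not run, rather than letting it fail for reasons unrelated to Hermes. A hypothetical gate; the environment variable names (`TELEGRAM_BOT_TOKEN`, `TELEGRAM_TEST_GROUP_ID`, `TELEGRAM_BOT_PRIVACY_MODE`) are illustrative, not existing runner secrets. With Telegram privacy mode enabled, a bot receives only a subset of group messages, which is why the gate treats it as a precondition.

```bash
# Hypothetical gate for expected.hermes.telegram.group-message-preconditions.
# Variable names are illustrative assumptions, not existing runner secrets.
hermes_telegram_group_gate() {
  local missing=()
  [[ -n "${TELEGRAM_BOT_TOKEN:-}" ]] || missing+=(TELEGRAM_BOT_TOKEN)
  [[ -n "${TELEGRAM_TEST_GROUP_ID:-}" ]] || missing+=(TELEGRAM_TEST_GROUP_ID)
  if (( ${#missing[@]} > 0 )); then
    echo "deferred_platform_or_secret missing=${missing[*]}"
  elif [[ "${TELEGRAM_BOT_PRIVACY_MODE:-enabled}" != disabled ]]; then
    # Privacy mode hides most group traffic from the bot, so a live group test
    # would fail for reasons unrelated to Hermes. Record it instead.
    echo "deferred_platform_or_secret privacy_mode=${TELEGRAM_BOT_PRIVACY_MODE:-enabled}"
  else
    echo "runnable"
  fi
}
```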
6. Hermes rebuild and durable state
Scenarios should cover Hermes-specific rebuild behavior, not generic rebuild assertions already owned by #3814:
- `expected.hermes.rebuild.provider-credential-reused` — rebuild preflight succeeds when provider credentials exist in the OpenShell gateway even if the host env is empty.
- `expected.hermes.rebuild.messaging-config-preserved` — rebuild preserves configured Hermes messaging channels and provider hashes.
- `expected.hermes.rebuild.dashboard-forward-released` — rebuild/channel stop-start flows do not fail because the old dashboard/API port forward is still host-bound.
- `expected.hermes.rebuild.post-rebuild-health` — Hermes gateway/API is healthy after rebuild.
Coverage source: test-rebuild-hermes.sh, #3895 / PR #3918, #4146 / PR #4144, prior Hermes rebuild fixes.
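Initial expected outcome: per the issue inventory below, #3895 / PR #3918 should FAIL until fixed, and #4146 / PR #4144 should FAIL until the harness or fix lands, unless it is proven fixed.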
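`messaging-config-preserved` can be expressed as an order-insensitive comparison of snapshots taken before and after rebuild. A hypothetical sketch; the tab-separated `channel` / `provider-hash` snapshot format and the function name are assumptions, and collecting the snapshot from the sandbox is left to the caller.

```bash
# Hypothetical helper for expected.hermes.rebuild.messaging-config-preserved.
# Each snapshot holds "channel<TAB>provider-hash" lines captured before/after rebuild.
hermes_compare_messaging_snapshots() {
  local before="$1" after="$2" drift
  # Sort both sides so line order from the collector does not matter.
  drift="$(diff <(sort "$before") <(sort "$after"))"
  if [[ -n "$drift" ]]; then
    echo "FAIL expected.hermes.rebuild.messaging-config-preserved" >&2
    printf '%s\n' "$drift" >&2   # hashes only, no secret values
    return 1
  fi
  echo "PASS expected.hermes.rebuild.messaging-config-preserved"
}
```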
7. Hermes policy and network boundaries
Scenarios should validate Hermes-specific policy behavior and provider path coverage:
- `expected.hermes.policy.inactive-messaging-not-preenabled` — inactive messaging policies are not enabled in the Hermes sandbox policy by default.
- `expected.hermes.policy.managed-inference-anthropic-messages-path` — Hermes managed inference policy allows the Anthropic-compatible `/v1/messages` path when that provider is selected.
- `expected.hermes.policy.venv-python-egress` — outbound requests from `/opt/hermes/.venv/bin/python` use the intended policy allowlist/proxy path and are not stuck behind an interactive approval gate.
- `expected.hermes.policy.no-phantom-allowlist` — Hermes policy does not include unrelated permissive endpoints or binaries without explicit opt-in.
Coverage source: #3981 / PR #3984, #4230, #3225, related Hermes policy-additions changes.
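Initial expected outcome: per the issue inventory below, #3981 / PR #3984 should PASS and #3225 should FAIL or be platform-gated until fixed; #4230 should FAIL until … `/v1/messages` path coverage is fixed.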
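`no-phantom-allowlist` reduces to a set difference between the endpoints a Hermes policy actually allows and those explicitly opted in. A hypothetical sketch that assumes both lists have already been extracted as `host:port` lines; parsing the real policy format is not shown.

```bash
# Hypothetical helper for expected.hermes.policy.no-phantom-allowlist.
# $1: endpoints the policy allows; $2: endpoints explicitly opted in.
# Both are newline-separated "host:port" lists extracted upstream.
hermes_phantom_endpoints() {
  local actual="$1" allowed="$2"
  # comm -23 keeps lines found only in the first sorted input: unexpected endpoints.
  comm -23 <(sort -u "$actual") <(sort -u "$allowed")
}
```

An empty result means the assertion passes; any printed line is a phantom entry to report against the scenario ID.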
8. Hermes provider compatibility
Scenarios should cover provider-specific runtime behavior after onboard smoke succeeds:
- `expected.hermes.provider.anthropic-compatible-chat` — after Anthropic-compatible provider onboard succeeds, an in-sandbox Hermes chat succeeds through the managed inference path.
- `expected.hermes.provider.gemini-tool-schema-compatible` — after Gemini provider onboard succeeds, Hermes tool schemas are accepted and chat succeeds.
- `expected.hermes.provider.onboard-smoke-not-sufficient` — validation distinguishes host-side onboard smoke success from in-sandbox Hermes runtime chat success.
Coverage source: #4230, #4232.
Initial expected outcome: both #4230 and #4232 should FAIL on current main until product fixes land.
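The point of `onboard-smoke-not-sufficient` is that a green host-side smoke must not stand in for a green in-sandbox chat. A hypothetical classifier that keeps the two results separate; the input vocabulary and output labels are illustrative.

```bash
# Hypothetical classifier for expected.hermes.provider.onboard-smoke-not-sufficient.
#   $1: host-side onboard smoke result (pass|fail)
#   $2: in-sandbox Hermes chat result through managed inference (pass|fail)
hermes_classify_provider_run() {
  case "$1/$2" in
    pass/pass) echo "PASS" ;;
    pass/fail) echo "FAIL runtime-chat-failed-after-smoke-pass" ;;  # smoke alone would be green
    fail/*)    echo "FAIL onboard-smoke-failed" ;;
    *)         echo "ERROR unknown-input $1/$2"; return 2 ;;
  esac
}
```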
9. Hermes security / shields / TUI usability
Scenarios should cover Hermes security controls and interactive usability where the platform supports them:
- `expected.hermes.security.shields-up-down-macos-vm-driver` — on macOS Docker Desktop / OpenShell VM driver, `nemohermes <sandbox> shields up/down` should not call the non-existent `openshell-cluster-nemoclaw` k3s container.
- `expected.hermes.security.shields-config-locked` — after shields up, Hermes config is actually locked and status reflects the true sandbox filesystem state.
- `expected.hermes.tui.history-writable` — Hermes TUI does not spam permission errors for `/sandbox/.hermes/.hermes_history`, and `/exit` exits cleanly.
Coverage source: #4245, #2432, Hermes security posture assertions from #3891 / PR #3914.
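Initial expected outcome: per the issue inventory below, #4245 should FAIL on the macOS VM-driver profile until fixed, #2432 should FAIL or be platform-gated until fixed, and the #3891 / PR #3914 security posture assertions should PASS.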
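The macOS VM-driver shields scenario has to be platform-gated, and it should fail loudly when the flow touches the non-existent container. A hypothetical sketch, assuming the suite can capture the commands the shields flow executes into a trace file and label the platform (`Darwin`) and driver (`vm`); neither label is an existing framework value.

```bash
# Hypothetical check for expected.hermes.security.shields-up-down-macos-vm-driver.
# $1: platform label, $2: driver label, $3: file listing commands the shields flow ran.
hermes_check_shields_trace() {
  local os="$1" driver="$2" trace="$3"
  if [[ "$os" != Darwin || "$driver" != vm ]]; then
    echo "SKIP deferred_platform_or_secret (needs macOS + OpenShell VM driver)"
    return 0
  fi
  if grep -qF 'openshell-cluster-nemoclaw' "$trace"; then
    echo "FAIL expected.hermes.security.shields-up-down-macos-vm-driver (#4245)"
    return 1
  fi
  echo "PASS expected.hermes.security.shields-up-down-macos-vm-driver"
}
```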
Recent Hermes-related issue inventory to encode in coverage metadata
| Issue | Status at refresh | Fix / PR signal | Scenario expectation |
|---|---|---|---|
| #3891 | Closed | PR #3914 merged; added test-hermes-e2e.sh security posture coverage | PASS |
| #3893 | Open | PR #4175 open; unit coverage only | FAIL until fixed |
| #3895 | Open | PR #3918 open; no E2E yet | FAIL until fixed |
| #3981 | Closed | PR #3984 merged; unit/policy coverage | PASS |
| #4067 | Closed | PR #3925 merged; agent runtime dependency update | PASS or gated live-secret evidence |
| #4068 | Closed | PR #4107 merged; docs/onboarding guidance | PASS for guidance assertion |
| #4070 | Open | PR #4126 open; unit regression only | FAIL until fixed |
| #4111 | Closed | Nightly Hermes inference-switch timeout; later E2E hardening in PR #4152/#4158 | PASS or expected external-failure classification |
| #4145 | Closed | Nightly inference-switch timeout; later E2E hardening in PR #4152/#4158 | PASS or expected external-failure classification |
| #4146 | Closed, but PR #4144 still open at refresh | Channels stop/start rebuild port-forward race | FAIL until harness/fix lands, or prove fixed |
| #4189 | Open | PR #4222 closed/unmerged; unit-only attempt | FAIL until fixed |
| #4230 | Open | No fix PR found | FAIL until fixed |
| #4232 | Open | No fix PR found | FAIL until fixed |
| #4245 | Open | No fix PR found | FAIL on macOS VM-driver profile until fixed |
| #4246 | Open | No fix PR found | FAIL if shared/Hermes-applicable; otherwise classify out of scope with evidence |
| #3582 | Open (older issue, still Hermes-relevant) | No fix PR found | FAIL or live-secret gated until fixed |
| #3225 | Open (older issue, still Hermes-relevant) | PR #3228 closed/unmerged | FAIL or platform-gated until fixed |
| #2432 | Open (older issue, still Hermes-relevant) | PR #2473 open | FAIL or platform-gated until fixed |
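The inventory's expectation column follows a mostly mechanical rule that coverage metadata could encode. The sketch below is hypothetical: the function name, the input vocabulary, and letting any platform/secret gate take precedence over issue state are assumptions, and rows with softer wording (such as "PASS or expected external-failure classification") still need human judgment.

```bash
# Hypothetical mapping from inventory columns to expected-outcome metadata.
#   $1: issue state (open|closed)
#   $2: fix PR state (merged|open|closed|none)
#   $3: optional gate (platform|secret); empty when the scenario can always run
hermes_expected_outcome() {
  local issue="$1" fix="$2" gate="${3:-}"
  if [[ -n "$gate" ]]; then
    echo "deferred_platform_or_secret"
  elif [[ "$issue" == closed && "$fix" == merged ]]; then
    echo "expected_pass"
  else
    # Open issues, and closed issues whose fix has not merged (e.g. #4146 / PR #4144),
    # stay runnable as failing scenarios instead of being retired.
    echo "expected_fail_current_bug"
  fi
}
```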
Validation expectations
The validation spec generated from this issue should include both passing and failing expectations:
- For landed fixes, scenario execution should be GREEN on current main or explicitly explain platform/secret gating.
- For open product bugs, scenario execution should be RED on current main and the failure message should point to the linked issue.
- For in-flight PRs, validation should run against both current main and the PR branch when practical:
  - main should reproduce the failure;
  - the PR branch should flip the scenario to pass.
- For live messaging scenarios requiring Slack/Discord/Telegram credentials, provide a fake-provider/fake-gateway assertion where possible and mark the live assertion with runner/secret requirements.
- For external provider flakes, separate route/config assertions from live model availability so transient model/API outages do not mask product regressions.
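For in-flight PRs, the two runs can be reduced to a single verdict per scenario. A hypothetical sketch; the verdict labels are illustrative and not part of the scenario framework.

```bash
# Hypothetical verdict for an in-flight fix, given one scenario's result on each ref.
#   $1: result on current main (pass|fail)
#   $2: result on the PR branch (pass|fail)
hermes_fix_verdict() {
  case "$1/$2" in
    fail/pass) echo "fix-confirmed" ;;        # main reproduces the bug; PR flips it green
    fail/fail) echo "fix-not-effective" ;;
    pass/pass) echo "bug-not-reproduced" ;;   # the scenario may not exercise the bug yet
    pass/fail) echo "pr-regression" ;;
    *)         echo "invalid-input"; return 2 ;;
  esac
}
```

Only `fix-confirmed` satisfies both bullets; `bug-not-reproduced` is worth surfacing separately because it means the scenario is not yet a faithful reproduction.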
Acceptance criteria
- Domain primitive helpers exist and are used by migrated suite steps.
- Highest-value assertions from the listed legacy Hermes coverage are mapped to stable scenario assertion IDs.
- All Hermes-related issues in the inventory above are represented by a scenario, expected-failure scenario, or explicit out-of-scope classification with evidence.
- Known open Hermes bugs are allowed to produce failing scenario runs; they must not be silently retired.
- Remaining legacy assertions are explicitly classified as `covered`, `expected_fail_current_bug`, `deferred_platform_or_secret`, or `retired`, with layer/domain metadata.
- Scenario framework tests pass for resolver/schema/suite/coverage-report validation.
- The coverage report makes this domain visible as covered, expected-failing, deferred, or retired.
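The per-domain view of the coverage report can be sketched as a tally over a two-column TSV of assertion IDs and classifications. The input format and function name are assumptions; the four classification values are the ones listed in the acceptance criteria above, and any other value fails the report rather than being counted silently.

```bash
# Hypothetical coverage-report summary.
# Input: TSV lines "<assertion-id>\t<classification>"; prints one "name=count" per class.
hermes_coverage_summary() {
  awk -F'\t' '
    BEGIN {
      split("covered expected_fail_current_bug deferred_platform_or_secret retired", k, " ")
      for (i in k) { ok[k[i]] = 1; n[k[i]] = 0 }
      bad = 0
    }
    NF == 0 { next }
    !($2 in ok) { printf "unknown classification for %s: %s\n", $1, $2 > "/dev/stderr"; bad = 1; next }
    { n[$2]++ }
    END {
      for (i = 1; i <= 4; i++) printf "%s=%d\n", k[i], n[k[i]]
      exit bad
    }
  ' "$1"
}
```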