test(e2e): migrate Hermes feature coverage to scenario suites #3811

Parent epic: #3588

## Goal

Migrate the `hermes` E2E coverage area into the layered scenario framework without porting legacy scripts line-for-line. Add the missing primitive layer first, then move assertions into scenario plans/suites with stable IDs.

This issue is also the refresh point for Hermes-related bugs discovered after this issue was created. It should become a spec-ready issue for `/vd-spec` and later validation-spec work: scenarios may be expected to PASS or FAIL depending on whether the product bug is already fixed, in flight, or still open. Failing scenario tests are acceptable when they reproduce a real current Hermes bug.

## Legacy / current coverage to absorb

- `test-hermes-e2e.sh`
- `test-hermes-inference-switch.sh`
- `test-hermes-discord-e2e.sh`
- `test-hermes-slack-e2e.sh`
- `test-rebuild-hermes.sh` where behavior is Hermes-specific rather than generic rebuild coverage
- Hermes-relevant assertions currently living in shared messaging/channel/security helpers, including:
  - `test/e2e/lib/discord-gateway-proof.sh`
  - `test/e2e/lib/slack-api-proof.sh`
  - `test/e2e/lib/security-posture-assertions.sh`
  - `test/e2e/lib/inference-switch-retry.sh`

## Architecture contract

- Add or extend the domain primitive library: `test/e2e/validation_suites/lib/hermes.sh`.
- Helpers must consume `$E2E_CONTEXT_DIR/context.env`; suites must not reinstall, onboard, or rediscover setup state.
- Add/extend suite family entries in `test/e2e/validation_suites/suites.yaml`.
- Add onboarding profiles/test plans/onboarding assertions only when the behavior belongs before expected-state validation.
- Emit stable assertion IDs using `<layer>.<domain>.<behavior>`.
- Update the current coverage metadata/reporting source in `test/e2e/docs/` and the resolver/reporting tests. If `parity-map.yaml` has been removed or replaced, update the successor generated/static metadata instead of recreating stale infrastructure.
- Preserve compatibility with existing `run-scenario.sh <id> --plan-only` behavior.
- Do not hide known current product bugs as retired coverage. Represent them as runnable scenarios with explicit expected outcome metadata (`expected_pass`, `expected_fail_current_bug`, or equivalent), linked to the source issue/PR.
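
A minimal sketch of how a primitive in `test/e2e/validation_suites/lib/hermes.sh` could consume `$E2E_CONTEXT_DIR/context.env` and build IDs in the `<layer>.<domain>.<behavior>` form. The function names and the `SANDBOX_NAME` key mentioned below are hypothetical and only illustrate the contract; they are not existing framework APIs.

```shell
#!/usr/bin/env bash
# Sketch only: function names here are hypothetical, not existing helpers.

# Load setup state written by onboarding instead of rediscovering it.
hermes_load_context() {
  if [ -z "${E2E_CONTEXT_DIR:-}" ]; then
    echo "E2E_CONTEXT_DIR is not set" >&2
    return 1
  fi
  local ctx="$E2E_CONTEXT_DIR/context.env"
  if [ ! -r "$ctx" ]; then
    echo "context file not readable: $ctx" >&2
    return 1
  fi
  # shellcheck disable=SC1090
  . "$ctx"
}

# Compose an assertion ID as <layer>.<domain>.<behavior>.
hermes_assertion_id() {
  printf '%s.%s.%s\n' "$1" "$2" "$3"
}

# Accept only lowercase, dot-separated IDs with at least three segments.
hermes_assertion_id_valid() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9-]+(\.[a-z0-9-]+){2,}$'
}
```

For example, `hermes_assertion_id expected hermes runtime.gateway-health` yields `expected.hermes.runtime.gateway-health`, and a suite step would call `hermes_load_context` once and then read values such as a hypothetical `SANDBOX_NAME` rather than reinstalling or re-onboarding.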

## Required Hermes scenario families

### 1. Hermes baseline runtime

Scenarios should validate a successfully onboarded Hermes sandbox without reinstalling:

- `expected.hermes.runtime.gateway-health` — Hermes gateway is reachable and reports healthy.
- `expected.hermes.runtime.agent-home` — expected Hermes paths exist and are readable/writable only where intended.
- `expected.hermes.runtime.env-integrity` — `/sandbox/.hermes/.env` is present, credentials are resolved through the intended boundary, and no secret values are printed in scenario logs.
- `expected.hermes.runtime.security-posture` — after capability drop / restricted execution, startup does not rewrite RC files or fail on root-owned/writable-path assumptions.

Coverage source: `test-hermes-e2e.sh`, #3891 / PR #3914.

Initial expected outcome: PASS on current main for landed behavior from #3914.

### 2. Hermes inference switching and provider routing

Scenarios should validate route/config correctness separately from external model availability:

- `expected.hermes.inference.switch-route-state` — `nemohermes inference set` updates the intended OpenShell/provider route.
- `expected.hermes.inference.env-immutable-on-switch` — `.env` hash is not rewritten by inference switching.
- `expected.hermes.inference.gateway-pid-stable` — Hermes gateway process remains running during switch.
- `expected.hermes.inference.inference-local-chat` — `https://inference.local/v1/chat/completions` works from inside the sandbox after switch.
- `expected.hermes.inference.hermes-api-chat` — Hermes API chat endpoint still responds after switch.
- `expected.hermes.inference.external-timeout-classification` — external model endpoint timeout is not misclassified as a product regression when route/config checks already passed.

Coverage source: `test-hermes-inference-switch.sh`, #4111, #4145, PR #4152, PR #4158.

Initial expected outcome: PASS or expected external-failure classification on current main. The scenario must distinguish product failures from external provider timeout/flake.
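
As a rough illustration of two checks above, the sketch below hashes a file before and after a switch (for `env-immutable-on-switch`) and classifies a chat result only after the route/config check has been evaluated (for `external-timeout-classification`). The function names, the `ok|timeout|error` status values, and the choice to treat non-timeout chat errors as product failures are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Sketch only: names and status vocabulary are hypothetical.

# Print the SHA-256 of a file, using whichever tool is available.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Succeed only if the file still matches the hash recorded before the switch.
env_unchanged() {
  local file="$1" before="$2"
  [ "$(file_sha256 "$file")" = "$before" ]
}

# route_ok: 1 if route/config assertions passed, anything else otherwise.
# chat_status: ok | timeout | error (outcome of the live model call).
classify_inference_result() {
  local route_ok="$1" chat_status="$2"
  if [ "$route_ok" != "1" ]; then
    echo "product_failure"
    return 0
  fi
  case "$chat_status" in
    ok)      echo "pass" ;;
    timeout) echo "external_failure" ;;
    *)       echo "product_failure" ;;
  esac
}
```

A step would record `before="$(file_sha256 /sandbox/.hermes/.env)"` prior to the switch and call `env_unchanged` afterwards. The hash is compared, never the file contents, so no secret values reach the scenario log.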

### 3. Hermes messaging: Discord

Scenarios should cover both configuration parity and live/fake gateway behavior where secrets/runners allow:

- `expected.hermes.discord.config-schema` — Hermes config contains the Discord account/channel/plugin fields required by current Hermes/OpenClaw runtime.
- `expected.hermes.discord.policy-egress` — Discord REST and Gateway/WebSocket egress use the expected OpenShell policy/proxy path.
- `expected.hermes.discord.gateway-connects` — the Discord gateway path can connect using a fake or live Discord test harness.
- `expected.hermes.discord.empty-user-allowlist-open-dm-policy` — when a guild/server is configured and the Discord user allowlist is empty, generated config should not fall back to confusing pairing behavior; it should represent the intended open-to-guild-members policy.
- `expected.hermes.discord.no-openclaw-pairing-copy` — NemoHermes-only Discord UX must not instruct users to approve pairing through an `openclaw` command.
- `expected.hermes.discord.plugin-entry-registered` — if shared channel generation applies to Hermes, selected Discord channel config must register the plugin entry needed for startup; if this is OpenClaw-only, the scenario metadata must explicitly classify it out of Hermes scope and link the equivalent OpenClaw scenario.

Coverage source: `test-hermes-discord-e2e.sh`, #4070 / PR #4126, #4246, prior Discord facade/gateway coverage.

Initial expected outcome:

- Existing gateway/config parity should PASS where the legacy test already passes.
- #4070 should FAIL on current main until PR #4126 or equivalent lands.
- #4246 should FAIL if the same missing plugin-entry path applies to Hermes/shared channel generation; otherwise mark as explicitly out-of-scope for Hermes with evidence.

### 4. Hermes messaging: Slack

Scenarios should cover Slack config, token handling, socket startup, and reconnect behavior:

- `expected.hermes.slack.config-enabled` — selected Slack channel produces an enabled channel config and required token placeholders.
- `expected.hermes.slack.provider-state` — Slack bot/app tokens are present as OpenShell-resolved providers, not plaintext secrets.
- `expected.hermes.slack.socket-mode-starts` — Hermes/OpenClaw runtime starts Slack Socket Mode and attempts `wss-primary.slack.com` through the expected policy/proxy path.
- `expected.hermes.slack.no-secret-leak` — Slack tokens are not emitted to logs, generated config, scenario artifacts, or failure output.
- `expected.hermes.slack.idle-reconnect-delivers-first-mention` — after idle socket reconnect, the first inbound @mention is delivered to Hermes instead of being silently dropped.

Coverage source: `test-hermes-slack-e2e.sh`, #4189 / PR #4222, #3582.

Initial expected outcome:

- Basic config/token/no-secret assertions may PASS depending on current main.
- #4189 should FAIL until Slack generated channel config reliably starts the channel.
- #3582 should FAIL or be marked platform/live-secret gated until socket reconnect delivery is fixed/proven.
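
One possible shape for the `no-secret-leak` check is sketched below: it scans an artifact directory for a known token value and reports only the file names that contain it. The function name and the artifact-directory argument are hypothetical. The token is fed to `grep` on standard input so that it does not appear in the process argument list.

```shell
#!/usr/bin/env bash
# Sketch only: scan_for_secret is a hypothetical helper name.

# Return 0 if the secret value is absent from every file under dir,
# 1 if it is found (file names only are printed), 2 on bad input.
scan_for_secret() {
  local secret="$1" dir="$2" hits
  if [ -z "$secret" ]; then
    # An empty fixed-string pattern would match every line.
    echo "refusing to scan for an empty secret" >&2
    return 2
  fi
  hits="$(printf '%s\n' "$secret" | grep -rlF -f /dev/stdin -- "$dir" || true)"
  if [ -n "$hits" ]; then
    echo "secret value found in:" >&2
    printf '%s\n' "$hits" >&2
    return 1
  fi
  return 0
}
```

The same helper could back the `env-integrity` assertion for scenario logs. It only detects verbatim values, so encoded or truncated tokens would need a separate check.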

### 5. Hermes messaging: Telegram

Scenarios should cover Telegram tool dispatch and onboarding guidance:

- `expected.hermes.telegram.first-message-tool-dispatch` — the first inbound Telegram message must be handled by registered tool dispatch, not leaked as raw `send_message` pseudo-call text.
- `expected.hermes.telegram.single-polling-loop` — gateway startup must not produce concurrent `getUpdates` polling loops that conflict and prevent `sendMessage`.
- `expected.hermes.telegram.privacy-mode-guidance` — group-chat onboarding/post-onboard guidance surfaces Telegram privacy-mode and remove/re-add requirements for new bots.
- `expected.hermes.telegram.group-message-preconditions` — validation metadata records when live group-message testing is blocked by bot privacy mode or missing test secrets.

Coverage source: #3893 / PR #4175, #4067 / PR #3925, #4068 / PR #4107.

Initial expected outcome:

- #4067 and #4068 should PASS for landed fixes when exercised in the right profile, or report live-secret/platform gating with evidence.
- #3893 should FAIL until PR #4175 or equivalent lands.

### 6. Hermes rebuild and durable state

Scenarios should cover Hermes-specific rebuild behavior, not generic rebuild assertions already owned by #3814:

- `expected.hermes.rebuild.provider-credential-reused` — rebuild preflight succeeds when provider credentials exist in OpenShell gateway even if host env is empty.
- `expected.hermes.rebuild.messaging-config-preserved` — rebuild preserves configured Hermes messaging channels and provider hashes.
- `expected.hermes.rebuild.dashboard-forward-released` — rebuild/channel stop-start flows do not fail because the old dashboard/API port forward is still host-bound.
- `expected.hermes.rebuild.post-rebuild-health` — Hermes gateway/API is healthy after rebuild.

Coverage source: `test-rebuild-hermes.sh`, #3895 / PR #3918, #4146 / PR #4144, prior Hermes rebuild fixes.

Initial expected outcome:

- #3895 should FAIL until PR #3918 or equivalent lands.
- #4146 should FAIL if the old port-forward race still reproduces in the scenario profile, and PASS once the harness/fix lands.
- Previously fixed rebuild preservation behavior should PASS.

### 7. Hermes policy and network boundaries

Scenarios should validate Hermes-specific policy behavior and provider path coverage:

- `expected.hermes.policy.inactive-messaging-not-preenabled` — inactive messaging policies are not enabled in Hermes sandbox policy by default.
- `expected.hermes.policy.managed-inference-anthropic-messages-path` — Hermes managed inference policy allows Anthropic-compatible `/v1/messages` when that provider is selected.
- `expected.hermes.policy.venv-python-egress` — `/opt/hermes/.venv/bin/python` outbound requests use the intended policy allowlist/proxy path and are not stuck behind an interactive approval gate.
- `expected.hermes.policy.no-phantom-allowlist` — Hermes policy does not include unrelated permissive endpoints or binaries without explicit opt-in.

Coverage source: #3981 / PR #3984, #4230, #3225, related Hermes policy-additions changes.

Initial expected outcome:

- #3981 should PASS for the landed fix.
- #4230 should FAIL until `/v1/messages` path coverage is fixed.
- #3225 should FAIL or be platform-gated until the venv Python allowlist behavior is proven fixed.

### 8. Hermes provider compatibility

Scenarios should cover provider-specific runtime behavior after onboard smoke succeeds:

- `expected.hermes.provider.anthropic-compatible-chat` — after Anthropic-compatible provider onboard succeeds, an in-sandbox Hermes chat succeeds through the managed inference path.
- `expected.hermes.provider.gemini-tool-schema-compatible` — after Gemini provider onboard succeeds, Hermes tool schemas are accepted and chat succeeds.
- `expected.hermes.provider.onboard-smoke-not-sufficient` — validation distinguishes host-side onboard smoke success from in-sandbox Hermes runtime chat success.

Coverage source: #4230, #4232.

Initial expected outcome: both #4230 and #4232 should FAIL on current main until product fixes land.

### 9. Hermes security / shields / TUI usability

Scenarios should cover Hermes security controls and interactive usability where platform supports it:

- `expected.hermes.security.shields-up-down-macos-vm-driver` — on macOS Docker Desktop / OpenShell VM driver, `nemohermes <sandbox> shields up/down` should not call the non-existent `openshell-cluster-nemoclaw` k3s container.
- `expected.hermes.security.shields-config-locked` — after shields up, Hermes config is actually locked and status reflects true sandbox filesystem state.
- `expected.hermes.tui.history-writable` — Hermes TUI does not spam permission errors for `/sandbox/.hermes/.hermes_history`, and `/exit` can exit cleanly.

Coverage source: #4245, #2432, Hermes security posture assertions from #3891 / PR #3914.

Initial expected outcome:

- #4245 should FAIL on macOS Docker Desktop until fixed.
- #2432 should FAIL or be marked platform-gated until fixed/proven.
- Existing Linux security-posture assertions from #3914 should PASS.

## Recent Hermes-related issue inventory to encode in coverage metadata

| Issue | Status at refresh | Fix / PR signal | Scenario expectation |
| --- | --- | --- | --- |
| #3891 | Closed | PR #3914 merged; added `test-hermes-e2e.sh` security posture coverage | PASS |
| #3893 | Open | PR #4175 open; unit coverage only | FAIL until fixed |
| #3895 | Open | PR #3918 open; no E2E yet | FAIL until fixed |
| #3981 | Closed | PR #3984 merged; unit/policy coverage | PASS |
| #4067 | Closed | PR #3925 merged; agent runtime dependency update | PASS or live-secret gated with evidence |
| #4068 | Closed | PR #4107 merged; docs/onboarding guidance | PASS for guidance assertion |
| #4070 | Open | PR #4126 open; unit regression only | FAIL until fixed |
| #4111 | Closed | nightly Hermes inference-switch timeout; later E2E hardening in PR #4152/#4158 | PASS or expected external-failure classification |
| #4145 | Closed | nightly inference-switch timeout; later E2E hardening in PR #4152/#4158 | PASS or expected external-failure classification |
| #4146 | Closed | PR #4144 still open at refresh; channels stop/start rebuild port-forward race | FAIL until harness/fix lands or the fix is proven |
| #4189 | Open | PR #4222 closed/unmerged; unit-only attempt | FAIL until fixed |
| #4230 | Open | no fix PR found | FAIL until fixed |
| #4232 | Open | no fix PR found | FAIL until fixed |
| #4245 | Open | no fix PR found | FAIL on macOS VM-driver profile until fixed |
| #4246 | Open | no fix PR found | FAIL if shared/Hermes-applicable; otherwise classify out-of-scope with evidence |
| #3582 | Open older issue, still Hermes-relevant | no fix PR found | FAIL or live-secret gated until fixed |
| #3225 | Open older issue, still Hermes-relevant | PR #3228 closed/unmerged | FAIL or platform-gated until fixed |
| #2432 | Open older issue, still Hermes-relevant | PR #2473 open | FAIL or platform-gated until fixed |

## Validation expectations

The validation spec generated from this issue should include both passing and failing expectations:

- For landed fixes, scenario execution should be GREEN on current main or explicitly explain platform/secret gating.
- For open product bugs, scenario execution should be RED on current main and the failure message should point to the linked issue.
- For in-flight PRs, validation should run against both current main and the PR branch when practical:
  - main should reproduce the failure;
  - PR branch should flip the scenario to pass.
- For live messaging scenarios requiring Slack/Discord/Telegram credentials, provide a fake-provider/fake-gateway assertion where possible and mark the live assertion with runner/secret requirements.
- For external provider flakes, separate route/config assertions from live model availability so transient model/API outages do not mask product regressions.
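
The expectations above amount to comparing each scenario's expected-outcome metadata with its actual result. A minimal sketch follows, assuming the `expected_pass` / `expected_fail_current_bug` values from the architecture contract. The function name and the verdict labels (`ok`, `regression`, `known-bug`, `unexpected-pass`) are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: scenario_verdict and its verdict labels are hypothetical.

# expected: expected_pass | expected_fail_current_bug
# actual:   pass | fail
# issue:    optional linked issue reference, e.g. "#3893"
scenario_verdict() {
  local expected="$1" actual="$2" issue="${3:-}"
  case "$expected:$actual" in
    expected_pass:pass)
      echo "ok" ;;
    expected_pass:fail)
      echo "regression" ;;
    expected_fail_current_bug:fail)
      # RED on main, as intended; point at the linked issue.
      echo "known-bug${issue:+ $issue}" ;;
    expected_fail_current_bug:pass)
      # The bug may be fixed; the metadata should be refreshed.
      echo "unexpected-pass${issue:+ $issue}" ;;
    *)
      echo "invalid expected/actual pair: $expected:$actual" >&2
      return 2 ;;
  esac
}
```

Run against main, `scenario_verdict expected_fail_current_bug fail "#3893"` prints `known-bug #3893`. Run against a fix branch, the same scenario passing yields `unexpected-pass #3893`, which is the signal that the PR flips the scenario.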

## Acceptance criteria

- Domain primitive helpers exist and are used by migrated suite steps.
- Highest-value assertions from the listed legacy Hermes coverage are mapped to stable scenario assertion IDs.
- All Hermes-related issues in the inventory above are represented by a scenario, expected-failure scenario, or explicit out-of-scope classification with evidence.
- Known open Hermes bugs are allowed to produce failing scenario runs; they must not be silently retired.
- Remaining legacy assertions are explicitly classified as `covered`, `expected_fail_current_bug`, `deferred_platform_or_secret`, or `retired` with layer/domain metadata.
- Scenario framework tests pass for resolver/schema/suite/coverage-report validation.
- The coverage report makes this domain visible as covered, expected-failing, deferred, or retired.
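
To illustrate the classification criterion, the sketch below rejects any legacy assertion whose status is not one of the four allowed values. The input format (`<assertion-id> <status>` per line on standard input) and the function names are assumptions for this example, not the actual coverage metadata schema.

```shell
#!/usr/bin/env bash
# Sketch only: the line format and function names are hypothetical.

# Succeed only for the four classification values named in the criteria.
coverage_status_valid() {
  case "$1" in
    covered|expected_fail_current_bug|deferred_platform_or_secret|retired)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Read "<assertion-id> <status>" lines from stdin.
# Report every unclassified entry and return 1 if any were found.
check_coverage_lines() {
  local id status bad=0
  while read -r id status; do
    [ -z "$id" ] && continue
    if ! coverage_status_valid "$status"; then
      echo "unclassified: $id ($status)" >&2
      bad=1
    fi
  done
  return "$bad"
}
```

A coverage-report test could pipe the legacy assertion list through `check_coverage_lines` so that an entry with a missing or misspelled status fails the run instead of being silently dropped.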

