Skip to content

fix(messaging): refresh stale messaging render plans#5268

Merged
cv merged 1 commit into
mainfrom
fix/hermes-rebuild-stale-render
Jun 12, 2026
Merged

fix(messaging): refresh stale messaging render plans#5268
cv merged 1 commit into
mainfrom
fix/hermes-rebuild-stale-render

Conversation

@sandl99

@sandl99 sandl99 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Refresh rebuild messaging plans when a stored sandbox plan has active channels but no agent render entries. This fixes stale Hermes Discord rebuild state without hand-populating compiled render fixtures in the e2e test.

Fix

Closes #5263

Changes

  • Rebuild manifest-derived render entries for existing rebuild plans that are missing active channel render output.
  • Preserve existing credential hashes when refreshed credential bindings are merged back into the stored plan.
  • Add a regression test for stale Hermes Discord rebuild plans with empty agentRender.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

Targeted checks run before PR creation:

  • npx @biomejs/biome check --write src/lib/messaging/compiler/workflow-planner.ts src/lib/messaging/compiler/workflow-planner.test.ts passes
  • npx vitest run src/lib/messaging/compiler/workflow-planner.test.ts test/messaging-build-applier.test.ts passes

Full npx prek -a was started but did not complete before interruption; it remains pending.

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: San Dang sdang@nvidia.com

Summary by CodeRabbit

  • Bug Fixes
    • Improved rebuild plan handling to regenerate missing manifest entries that become stale, ensuring channel configurations are properly restored
    • Enhanced credential binding preservation during rebuild operations, ensuring authentication details remain intact across configuration updates

@sandl99 sandl99 self-assigned this Jun 12, 2026
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR enhances the workflow planner's rebuild mechanism to detect and regenerate missing manifest render entries from stale plans while preserving credential binding state. Two helper functions support detection and merging logic, integrated into the main rebuild flow with comprehensive test coverage.

Changes

Rebuild Plan Manifest Refresh

Layer / File(s) Summary
Helper functions for rebuild detection and merge
src/lib/messaging/compiler/workflow-planner.ts
Three new internal helpers: planMissingActiveChannelRender detects when active, unrendered channels need regeneration; credentialBindingKey builds a composite key from channelId and providerEnvKey; preserveCredentialBindingHashes restores credential hash values from existing plans to refreshed plans.
Rebuild plan compilation and credential merging
src/lib/messaging/compiler/workflow-planner.ts
buildRebuildPlanFromSandboxEntry now normalizes disabled channels, detects missing agentRender entries, conditionally recompiles with merged credential availability (from normalized plan, sandbox entry, and caller), then merges results while preserving credential binding hashes.
Rebuild plan test coverage
src/lib/messaging/compiler/workflow-planner.test.ts
New test verifies buildRebuildPlanFromSandboxEntry regenerates missing Discord env-lines render for ~/.hermes/.env from a stale rebuild plan, preserves credential hash mapping, and includes openshell:resolve:env: token resolution.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Suggested labels

v0.0.63

Poem

🐰 A stale plan once crumbled and lost,
But helpers now detect the cost—
Merge and preserve those hashes true,
While render entries fresh and new! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(messaging): refresh stale messaging render plans' directly aligns with the PR's main objective of refreshing rebuild messaging plans when stored sandbox plans have stale render entries.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/hermes-rebuild-stale-render

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: rebuild-hermes-e2e
Optional E2E: channels-stop-start-hermes-e2e, messaging-providers-e2e, rebuild-openclaw-e2e

Dispatch hint: rebuild-hermes-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • rebuild-hermes-e2e (high; Docker/OpenShell sandbox rebuild with NVIDIA_API_KEY, timeout 60 minutes): Directly matches this regression surface: the existing Hermes rebuild E2E seeds a stale messaging plan with empty agentRender/buildSteps and a Discord credential hash, runs real nemoclaw rebuild, and asserts the Hermes .env/config placeholder and messaging config are restored without re-exporting the token.

Optional E2E

  • channels-stop-start-hermes-e2e (high; multi-channel Hermes sandbox lifecycle, timeout 75 minutes): Adjacent confidence for Hermes channel lifecycle rebuild behavior: stop/start exercises cached messaging credentials, disabled/active channel state, and rebuild rendering after channel mutations.
  • messaging-providers-e2e (high; full messaging provider chain with fake tokens, timeout 75 minutes): Broad end-to-end validation of messaging provider placeholders, credential isolation, L7 proxy rewriting, and rendered channel config for Discord/Slack/Telegram/WeChat/WhatsApp paths affected by planner output.
  • rebuild-openclaw-e2e (very high; old OpenClaw image build plus real sandbox rebuild): Optional generic-agent guard because the planner change is not Hermes-specific; this verifies OpenClaw rebuild still preserves messaging/network policy state and credential-backed channel behavior.

New E2E recommendations

  • stale-messaging-render-rebuild (medium): Existing direct coverage is embedded in a costly Hermes old-version rebuild. Add a focused current-image E2E that seeds an existing sandbox registry entry with active messaging channels but missing agentRender/buildSteps, runs nemoclaw <sandbox> rebuild --yes, and asserts rendered config plus credential hashes are restored for both Hermes and OpenClaw.
    • Suggested test: focused-stale-messaging-render-rebuild-e2e

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: rebuild-hermes-e2e

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: token-rotation-vitest
Optional Vitest E2E scenarios: rebuild-openclaw-vitest

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=token-rotation-vitest

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required Vitest E2E scenarios

  • token-rotation-vitest: The PR changes messaging workflow rebuild planning, including credential hash preservation and refreshed render entries for active messaging channels. The wired free-standing token rotation Vitest job exercises real messaging credential bindings, stored hashes, provider rebuild/reuse behavior, and multi-channel messaging plans through the live CLI path.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=token-rotation-vitest

Optional Vitest E2E scenarios

  • rebuild-openclaw-vitest: Adjacent rebuild coverage: this live Vitest job exercises the real OpenClaw rebuild flow and preserved sandbox state. It is optional because the primary changed surface is messaging-channel plan refresh and credential hash preservation, which token-rotation-vitest targets more directly.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=rebuild-openclaw-vitest

Relevant changed files

  • src/lib/messaging/compiler/workflow-planner.ts
  • src/lib/messaging/compiler/workflow-planner.test.ts

@github-actions

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 2 needs attention, 4 worth checking, 0 nice ideas
Top item: Refresh detection misses partial stale messaging plan outputs

Review findings

🛠️ Needs attention

  • Refresh detection misses partial stale messaging plan outputs (src/lib/messaging/compiler/workflow-planner.ts:405): The new stale-plan detector only asks whether an active channel has any agentRender entry. That satisfies the new empty-agentRender test, but it does not catch a stored plan that has one render for a channel while missing other manifest-derived renders. For example, Hermes Discord has discord-hermes-env, discord-hermes-config, and discord-hermes-platform render entries; a plan containing only the env render would bypass refresh and remain stale.
    • Recommendation: Compare the stored channel outputs against the manifest-derived rebuild plan at the output identity level, such as renderId/target/kind for agentRender and equivalent identities for other compiled plan sections, or otherwise broaden the stale predicate to cover all manifest-owned outputs required by the acceptance scope.
    • Evidence: planMissingActiveChannelRender() builds a Set from plan.agentRender.map(entry => entry.channelId) and returns false as soon as any render exists for the channel. The PR body says it will rebuild manifest-derived render entries for plans missing active channel render output, while Discord Hermes declares multiple render IDs.
  • Changed monoliths exceed the repository growth guard: The deterministic monolith check flags both changed files as growing by more than 20 lines: workflow-planner.ts grows by 56 lines and workflow-planner.test.ts grows by 49 lines. This adds recovery logic and fixture setup to already-large files without extraction or offsetting cleanup.
    • Recommendation: Extract the new normalization/preservation helpers into a focused plan-normalization utility or otherwise offset the growth with local simplification. For tests, use a small fixture/helper to keep workflow-planner.test.ts from continuing to grow as a monolith.
    • Evidence: Monolith deltas: src/lib/messaging/compiler/workflow-planner.ts baseLines 444, headLines 500, delta 56; src/lib/messaging/compiler/workflow-planner.test.ts baseLines 726, headLines 775, delta 49.

🔎 Worth checking

  • Source-of-truth review needed: Messaging rebuild stale-plan recovery in buildRebuildPlanFromSandboxEntry(): The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: New code adds a tolerant recovery branch based on planMissingActiveChannelRender(), with no linked issue clauses and no source-boundary documentation in the changed files.
  • Stale blueprint sections can bypass rebuild normalization (src/lib/messaging/compiler/workflow-planner.ts:109): Messaging plans drive sandbox-rendered config, network policy intent, build/package-install steps, state updates, health checks, and credential placeholders. The new recovery path triggers only when active channels have no render entry at all, so stored plans with present render entries but missing buildSteps, networkPolicy entries, stateUpdates, healthChecks, or partial render entries are returned unchanged. That can leave the sandbox blueprint less complete than the current manifests intend.
    • Recommendation: Normalize stale rebuild plans against a freshly compiled manifest plan for the configured channels, or explicitly validate all manifest-derived sections before deciding to return the stored plan unchanged. Add tests for OpenClaw package-install build steps and policy/health output if those are expected to be recovered.
    • Evidence: buildRebuildPlanFromSandboxEntry() returns normalizedPlan immediately when !planMissingActiveChannelRender(normalizedPlan). The added test clears buildSteps but uses Hermes Discord, where buildSteps are expected to be empty, so missing build-step recovery is not exercised.
  • Regression coverage only proves empty agentRender recovery (src/lib/messaging/compiler/workflow-planner.test.ts:449): The added unit test covers the exact empty-agentRender case and credential-hash preservation, but it does not cover partial render staleness, stale OpenClaw build steps, policy/health/state recovery, disabled-channel behavior, or runtime rebuild behavior. Those are the paths most likely to reveal whether the recovered plan matches the current manifest-derived sandbox blueprint.
    • Recommendation: Add behavior-specific tests for partial Hermes Discord render drift, OpenClaw plugin channel build-step recovery, and a fresh-compile equivalence check for manifest-derived sections while preserving credential hashes and persisted inputs. Identify a targeted runtime/integration validation for the rebuild path.
    • Evidence: The new test constructs stalePlan with agentRender: [] and buildSteps: [], then asserts only one env-lines render and the preserved DISCORD_BOT_TOKEN hash.
  • Recovery workaround does not define the source boundary for stale plans (src/lib/messaging/compiler/workflow-planner.ts:99): This is localized compatibility/recovery behavior for an invalid persisted state, but the patch does not establish where that stale state is produced, why the source cannot be fixed here, or when the workaround can be removed. Without that boundary, future changes may assume rebuild fully normalizes plans even though this predicate only handles one stale shape.
    • Recommendation: Document or encode the source boundary for legacy/stale stored plans, and prefer a normalization approach that makes invalid manifest-derived sections impossible after rebuild. If this must remain a compatibility path, add a regression that proves the producer path cannot create the stale state again or that rebuild fully repairs all supported stale shapes.
    • Evidence: The PR body refers to stale Hermes Discord rebuild state, but the diff only adds a sentinel recovery check and does not identify the producer, migration boundary, or removal condition.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — Add a unit test where a Hermes Discord stored plan contains only discord-hermes-env and is missing discord-hermes-config and discord-hermes-platform; rebuild should refresh all expected render IDs.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
  • **Runtime validation** — Add a unit test where an OpenClaw Discord or Slack stored plan has agentRender present but buildSteps: []; rebuild should restore required package-install build steps if stale build-step recovery is intended.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
  • **Runtime validation** — Add a unit or integration-style test that rebuilds from a stale sandbox entry and verifies the resulting manifest-derived render IDs, build step IDs, network policy entries, state updates, and health checks match a freshly compiled rebuild plan while preserving credential hashes and persisted input values.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
  • **Runtime validation** — Add a disabled-channel test showing that missing render for a disabled channel does not unexpectedly re-enable the channel or render active config.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
  • **Runtime validation** — Identify targeted runtime/integration validation for the rebuild behavior that exercises the sandbox messaging rebuild path without relying on external E2E status in this review.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
  • **Regression coverage only proves empty agentRender recovery** — Add behavior-specific tests for partial Hermes Discord render drift, OpenClaw plugin channel build-step recovery, and a fresh-compile equivalence check for manifest-derived sections while preserving credential hashes and persisted inputs. Identify a targeted runtime/integration validation for the rebuild path.
  • **Acceptance clause:** This fixes stale Hermes Discord rebuild state without hand-populating compiled render fixtures in the e2e test. — add test evidence or identify existing coverage. A unit test constructs stale Hermes Discord state and verifies a regenerated env-lines render. No changed runtime/e2e validation is present in the diff, and the deterministic test-depth context recommends targeted runtime/integration validation for this runtime/sandbox path.
  • **Acceptance clause:** Rebuild manifest-derived render entries for existing rebuild plans that are missing active channel render output. — add test evidence or identify existing coverage. The implementation rebuilds when a channel has zero render entries, but it does not detect partial missing render output for channels with multiple manifest render entries, such as Hermes Discord.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@sandl99 sandl99 added nightly-e2e Nightly E2E test failures area: messaging Messaging channels, bridges, manifests, or channel lifecycle labels Jun 12, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27392012712
Target ref: fix/hermes-rebuild-stale-render
Requested jobs: rebuild-hermes-e2e,rebuild-hermes-stale-base-e2e
Summary: 2 passed, 0 failed, 0 cancelled, 0 skipped

Job Result
rebuild-hermes-e2e ✅ success
rebuild-hermes-stale-base-e2e ✅ success

@sandl99 sandl99 requested a review from cv June 12, 2026 03:22
@sandl99 sandl99 changed the title fix(cli): refresh stale messaging render plans fix(messaging): refresh stale messaging render plans Jun 12, 2026
@sandl99

sandl99 commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Reply to PR reviewer

Your point is good, but agentRender will be cleaned up after my migration, so I'll keep as it for now and clean up later.

@sandl99 sandl99 added the v0.0.64 Release target label Jun 12, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27392454550
Target ref: fix/hermes-rebuild-stale-render
Requested jobs: rebuild-hermes-e2e,messaging-providers-e2e,rebuild-openclaw-e2e,messaging-providers-e2e,channels-add-remove-e2e,channels-stop-start-openclaw-e2e,channels-stop-start-hermes-e2e,hermes-discord-e2e,hermes-slack-e2e
Summary: 8 passed, 0 failed, 0 cancelled, 0 skipped

Job Result
channels-add-remove-e2e ✅ success
channels-stop-start-hermes-e2e ✅ success
channels-stop-start-openclaw-e2e ✅ success
hermes-discord-e2e ✅ success
hermes-slack-e2e ✅ success
messaging-providers-e2e ✅ success
rebuild-hermes-e2e ✅ success
rebuild-openclaw-e2e ✅ success

@cv cv merged commit 4c28094 into main Jun 12, 2026
194 checks passed
@cv cv deleted the fix/hermes-rebuild-stale-render branch June 12, 2026 04:06
cv pushed a commit that referenced this pull request Jun 12, 2026
## Summary
- Add v0.0.64 release notes from the release announcement and link them
to the relevant deeper docs.
- Document that custom policy presets recorded through `policy-add
--from-file` and `--from-dir` survive snapshot restore and sandbox
recreation.
- Refresh generated NemoClaw user skills from the current source docs.

## Source summary
- #5104 -> `docs/manage-sandboxes/backup-restore.mdx`,
`docs/network-policy/customize-network-policy.mdx`: Documents custom
policy presets preserved through snapshot restore.
- #4955 -> `docs/about/release-notes.mdx`: Adds release-note coverage
for Brave web-search pinning and `BRAVE_API_KEY` placeholder
preservation.
- #5116, #5269 -> `docs/about/release-notes.mdx`: Adds release-note
coverage for Docker-driver gateway health and rootfs guard stability.
- #5241, #5085 -> `docs/about/release-notes.mdx`: Adds release-note
coverage for chat-completions provider selection and Nemotron Ultra 550B
tool-less request compatibility.
- #5268, #5210, #5257 -> `docs/about/release-notes.mdx`: Adds
release-note coverage for messaging render plan refresh, OpenClaw
scope-upgrade approval recovery, and Hermes WhatsApp bridge dependency
setup.
- Current source docs -> `.agents/skills/`: Regenerates user-skill
references so agent-facing guidance matches the source documentation.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `npm run docs`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit/pre-push hooks: markdownlint, gitleaks, docs-to-skills
verification, TypeScript CLI, and skills YAML checks passed.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Clarified sandbox snapshot restore preserves custom policy presets and
restores them without original files.
* Switched sandbox setup and remote deployment guidance to Docker-based
workflows and emphasized remote onboarding flow.
* Expanded troubleshooting for gateway recovery, Docker GPU/WSL issues,
and onboarding resume.
* Added/updated CLI docs: advanced maintenance, session export,
upload/download wrappers, and status recovery guidance.
* Added v0.0.64 release notes and links to NemoClaw Community; fixed
command reference formatting.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: messaging Messaging channels, bridges, manifests, or channel lifecycle nightly-e2e Nightly E2E test failures v0.0.64 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nightly-e2e: Hermes rebuild test fixture missing Discord agentRender entries

2 participants