fix(messaging): refresh stale messaging render plans by sandl99 · Pull Request #5268 · NVIDIA/NemoClaw

sandl99 · 2026-06-12T03:09:52Z

Summary

Refresh rebuild messaging plans when a stored sandbox plan has active channels but no agent render entries. This fixes stale Hermes Discord rebuild state without hand-populating compiled render fixtures in the e2e test.

Fix

Closes #5263

Changes

Rebuild manifest-derived render entries for existing rebuild plans that are missing active channel render output.
Preserve existing credential hashes when refreshed credential bindings are merged back into the stored plan.
Add a regression test for stale Hermes Discord rebuild plans with empty agentRender.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Targeted checks run before PR creation:

npx @biomejs/biome check --write src/lib/messaging/compiler/workflow-planner.ts src/lib/messaging/compiler/workflow-planner.test.ts passes
npx vitest run src/lib/messaging/compiler/workflow-planner.test.ts test/messaging-build-applier.test.ts passes

Full npx prek -a was started but did not complete before interruption; it remains pending.

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: San Dang sdang@nvidia.com

Summary by CodeRabbit

Bug Fixes
- Improved rebuild plan handling to regenerate missing manifest entries that become stale, ensuring channel configurations are properly restored
- Enhanced credential binding preservation during rebuild operations, ensuring authentication details remain intact across configuration updates

coderabbitai · 2026-06-12T03:10:06Z

📝 Walkthrough

Walkthrough

This PR enhances the workflow planner's rebuild mechanism to detect and regenerate missing manifest render entries from stale plans while preserving credential binding state. Two helper functions support detection and merging logic, integrated into the main rebuild flow with comprehensive test coverage.

Changes

Rebuild Plan Manifest Refresh

Layer / File(s)	Summary
Helper functions for rebuild detection and merge `src/lib/messaging/compiler/workflow-planner.ts`	Three new internal helpers: `planMissingActiveChannelRender` detects when active, unrendered channels need regeneration; `credentialBindingKey` builds a composite key from `channelId` and `providerEnvKey`; `preserveCredentialBindingHashes` restores credential hash values from existing plans to refreshed plans.
Rebuild plan compilation and credential merging `src/lib/messaging/compiler/workflow-planner.ts`	`buildRebuildPlanFromSandboxEntry` now normalizes disabled channels, detects missing `agentRender` entries, conditionally recompiles with merged credential availability (from normalized plan, sandbox entry, and caller), then merges results while preserving credential binding hashes.
Rebuild plan test coverage `src/lib/messaging/compiler/workflow-planner.test.ts`	New test verifies `buildRebuildPlanFromSandboxEntry` regenerates missing Discord `env-lines` render for `~/.hermes/.env` from a stale rebuild plan, preserves credential hash mapping, and includes `openshell:resolve:env:` token resolution.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

nightly-e2e: Hermes rebuild test fixture missing Discord agentRender entries #5263: Main changes add logic to detect missing active channel renders and rebuild to regenerate Discord agentRender entries (including env-lines) while preserving credential hashes, which directly addresses the retrieved nightly-e2e failure about missing Discord agentRender entries.

Suggested labels

v0.0.63

Poem

🐰 A stale plan once crumbled and lost,
But helpers now detect the cost—
Merge and preserve those hashes true,
While render entries fresh and new! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(messaging): refresh stale messaging render plans' directly aligns with the PR's main objective of refreshing rebuild messaging plans when stored sandbox plans have stale render entries.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/hermes-rebuild-stale-render

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-12T03:12:33Z

E2E Advisor Recommendation

Required E2E: rebuild-hermes-e2e
Optional E2E: channels-stop-start-hermes-e2e, messaging-providers-e2e, rebuild-openclaw-e2e

Dispatch hint: rebuild-hermes-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

rebuild-hermes-e2e (high; Docker/OpenShell sandbox rebuild with NVIDIA_API_KEY, timeout 60 minutes): Directly matches this regression surface: the existing Hermes rebuild E2E seeds a stale messaging plan with empty agentRender/buildSteps and a Discord credential hash, runs real nemoclaw rebuild, and asserts the Hermes .env/config placeholder and messaging config are restored without re-exporting the token.

Optional E2E

channels-stop-start-hermes-e2e (high; multi-channel Hermes sandbox lifecycle, timeout 75 minutes): Adjacent confidence for Hermes channel lifecycle rebuild behavior: stop/start exercises cached messaging credentials, disabled/active channel state, and rebuild rendering after channel mutations.
messaging-providers-e2e (high; full messaging provider chain with fake tokens, timeout 75 minutes): Broad end-to-end validation of messaging provider placeholders, credential isolation, L7 proxy rewriting, and rendered channel config for Discord/Slack/Telegram/WeChat/WhatsApp paths affected by planner output.
rebuild-openclaw-e2e (very high; old OpenClaw image build plus real sandbox rebuild): Optional generic-agent guard because the planner change is not Hermes-specific; this verifies OpenClaw rebuild still preserves messaging/network policy state and credential-backed channel behavior.

New E2E recommendations

stale-messaging-render-rebuild (medium): Existing direct coverage is embedded in a costly Hermes old-version rebuild. Add a focused current-image E2E that seeds an existing sandbox registry entry with active messaging channels but missing agentRender/buildSteps, runs nemoclaw <sandbox> rebuild --yes, and asserts rendered config plus credential hashes are restored for both Hermes and OpenClaw.
- Suggested test: focused-stale-messaging-render-rebuild-e2e

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: rebuild-hermes-e2e

github-actions · 2026-06-12T03:12:33Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: token-rotation-vitest
Optional Vitest E2E scenarios: rebuild-openclaw-vitest

Dispatch required Vitest E2E scenarios:

gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=token-rotation-vitest

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required Vitest E2E scenarios

token-rotation-vitest: The PR changes messaging workflow rebuild planning, including credential hash preservation and refreshed render entries for active messaging channels. The wired free-standing token rotation Vitest job exercises real messaging credential bindings, stored hashes, provider rebuild/reuse behavior, and multi-channel messaging plans through the live CLI path.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=token-rotation-vitest

Optional Vitest E2E scenarios

rebuild-openclaw-vitest: Adjacent rebuild coverage: this live Vitest job exercises the real OpenClaw rebuild flow and preserved sandbox state. It is optional because the primary changed surface is messaging-channel plan refresh and credential hash preservation, which token-rotation-vitest targets more directly.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=rebuild-openclaw-vitest

Relevant changed files

src/lib/messaging/compiler/workflow-planner.ts
src/lib/messaging/compiler/workflow-planner.test.ts

github-actions · 2026-06-12T03:13:03Z

PR Review Advisor

Findings: 2 needs attention, 4 worth checking, 0 nice ideas
Top item: Refresh detection misses partial stale messaging plan outputs

Review findings

🛠️ Needs attention

Refresh detection misses partial stale messaging plan outputs (src/lib/messaging/compiler/workflow-planner.ts:405): The new stale-plan detector only asks whether an active channel has any agentRender entry. That satisfies the new empty-agentRender test, but it does not catch a stored plan that has one render for a channel while missing other manifest-derived renders. For example, Hermes Discord has discord-hermes-env, discord-hermes-config, and discord-hermes-platform render entries; a plan containing only the env render would bypass refresh and remain stale.
- Recommendation: Compare the stored channel outputs against the manifest-derived rebuild plan at the output identity level, such as renderId/target/kind for agentRender and equivalent identities for other compiled plan sections, or otherwise broaden the stale predicate to cover all manifest-owned outputs required by the acceptance scope.
- Evidence: planMissingActiveChannelRender() builds a Set from plan.agentRender.map(entry => entry.channelId) and returns false as soon as any render exists for the channel. The PR body says it will rebuild manifest-derived render entries for plans missing active channel render output, while Discord Hermes declares multiple render IDs.
Changed monoliths exceed the repository growth guard: The deterministic monolith check flags both changed files as growing by more than 20 lines: workflow-planner.ts grows by 56 lines and workflow-planner.test.ts grows by 49 lines. This adds recovery logic and fixture setup to already-large files without extraction or offsetting cleanup.
- Recommendation: Extract the new normalization/preservation helpers into a focused plan-normalization utility or otherwise offset the growth with local simplification. For tests, use a small fixture/helper to keep workflow-planner.test.ts from continuing to grow as a monolith.
- Evidence: Monolith deltas: src/lib/messaging/compiler/workflow-planner.ts baseLines 444, headLines 500, delta 56; src/lib/messaging/compiler/workflow-planner.test.ts baseLines 726, headLines 775, delta 49.

🔎 Worth checking

Source-of-truth review needed: Messaging rebuild stale-plan recovery in buildRebuildPlanFromSandboxEntry(): The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: New code adds a tolerant recovery branch based on planMissingActiveChannelRender(), with no linked issue clauses and no source-boundary documentation in the changed files.
Stale blueprint sections can bypass rebuild normalization (src/lib/messaging/compiler/workflow-planner.ts:109): Messaging plans drive sandbox-rendered config, network policy intent, build/package-install steps, state updates, health checks, and credential placeholders. The new recovery path triggers only when active channels have no render entry at all, so stored plans with present render entries but missing buildSteps, networkPolicy entries, stateUpdates, healthChecks, or partial render entries are returned unchanged. That can leave the sandbox blueprint less complete than the current manifests intend.
- Recommendation: Normalize stale rebuild plans against a freshly compiled manifest plan for the configured channels, or explicitly validate all manifest-derived sections before deciding to return the stored plan unchanged. Add tests for OpenClaw package-install build steps and policy/health output if those are expected to be recovered.
- Evidence: buildRebuildPlanFromSandboxEntry() returns normalizedPlan immediately when !planMissingActiveChannelRender(normalizedPlan). The added test clears buildSteps but uses Hermes Discord, where buildSteps are expected to be empty, so missing build-step recovery is not exercised.
Regression coverage only proves empty agentRender recovery (src/lib/messaging/compiler/workflow-planner.test.ts:449): The added unit test covers the exact empty-agentRender case and credential-hash preservation, but it does not cover partial render staleness, stale OpenClaw build steps, policy/health/state recovery, disabled-channel behavior, or runtime rebuild behavior. Those are the paths most likely to reveal whether the recovered plan matches the current manifest-derived sandbox blueprint.
- Recommendation: Add behavior-specific tests for partial Hermes Discord render drift, OpenClaw plugin channel build-step recovery, and a fresh-compile equivalence check for manifest-derived sections while preserving credential hashes and persisted inputs. Identify a targeted runtime/integration validation for the rebuild path.
- Evidence: The new test constructs stalePlan with agentRender: [] and buildSteps: [], then asserts only one env-lines render and the preserved DISCORD_BOT_TOKEN hash.
Recovery workaround does not define the source boundary for stale plans (src/lib/messaging/compiler/workflow-planner.ts:99): This is localized compatibility/recovery behavior for an invalid persisted state, but the patch does not establish where that stale state is produced, why the source cannot be fixed here, or when the workaround can be removed. Without that boundary, future changes may assume rebuild fully normalizes plans even though this predicate only handles one stale shape.
- Recommendation: Document or encode the source boundary for legacy/stale stored plans, and prefer a normalization approach that makes invalid manifest-derived sections impossible after rebuild. If this must remain a compatibility path, add a regression that proves the producer path cannot create the stale state again or that rebuild fully repairs all supported stale shapes.
- Evidence: The PR body refers to stale Hermes Discord rebuild state, but the diff only adds a sentinel recovery check and does not identify the producer, migration boundary, or removal condition.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — Add a unit test where a Hermes Discord stored plan contains only discord-hermes-env and is missing discord-hermes-config and discord-hermes-platform; rebuild should refresh all expected render IDs.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
**Runtime validation** — Add a unit test where an OpenClaw Discord or Slack stored plan has agentRender present but buildSteps: []; rebuild should restore required package-install build steps if stale build-step recovery is intended.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
**Runtime validation** — Add a unit or integration-style test that rebuilds from a stale sandbox entry and verifies the resulting manifest-derived render IDs, build step IDs, network policy entries, state updates, and health checks match a freshly compiled rebuild plan while preserving credential hashes and persisted input values.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
**Runtime validation** — Add a disabled-channel test showing that missing render for a disabled channel does not unexpectedly re-enable the channel or render active config.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
**Runtime validation** — Identify targeted runtime/integration validation for the rebuild behavior that exercises the sandbox messaging rebuild path without relying on external E2E status in this review.. The changed planner output affects runtime sandbox rebuild blueprints, including rendered agent config, credential placeholders, build steps, and network policy intent. Unit coverage proves the empty-render case but not runtime rebuild behavior or partial stale plan normalization.
**Regression coverage only proves empty agentRender recovery** — Add behavior-specific tests for partial Hermes Discord render drift, OpenClaw plugin channel build-step recovery, and a fresh-compile equivalence check for manifest-derived sections while preserving credential hashes and persisted inputs. Identify a targeted runtime/integration validation for the rebuild path.
**Acceptance clause:** This fixes stale Hermes Discord rebuild state without hand-populating compiled render fixtures in the e2e test. — add test evidence or identify existing coverage. A unit test constructs stale Hermes Discord state and verifies a regenerated env-lines render. No changed runtime/e2e validation is present in the diff, and the deterministic test-depth context recommends targeted runtime/integration validation for this runtime/sandbox path.
**Acceptance clause:** Rebuild manifest-derived render entries for existing rebuild plans that are missing active channel render output. — add test evidence or identify existing coverage. The implementation rebuilds when a channel has zero render entries, but it does not detect partial missing render output for channels with multiple manifest render entries, such as Hermes Discord.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

github-actions · 2026-06-12T03:21:46Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27392012712
Target ref: fix/hermes-rebuild-stale-render
Requested jobs: rebuild-hermes-e2e,rebuild-hermes-stale-base-e2e
Summary: 2 passed, 0 failed, 0 cancelled, 0 skipped

Job	Result
rebuild-hermes-e2e	✅ success
rebuild-hermes-stale-base-e2e	✅ success

sandl99 · 2026-06-12T03:29:59Z

Reply to PR reviewer

Your point is good, but agentRender will be cleaned up after my migration, so I'll keep as it for now and clean up later.

github-actions · 2026-06-12T03:52:48Z

Selective E2E Results — ✅ All requested jobs passed

Run: 27392454550
Target ref: fix/hermes-rebuild-stale-render
Requested jobs: rebuild-hermes-e2e,messaging-providers-e2e,rebuild-openclaw-e2e,messaging-providers-e2e,channels-add-remove-e2e,channels-stop-start-openclaw-e2e,channels-stop-start-hermes-e2e,hermes-discord-e2e,hermes-slack-e2e
Summary: 8 passed, 0 failed, 0 cancelled, 0 skipped

Job	Result
channels-add-remove-e2e	✅ success
channels-stop-start-hermes-e2e	✅ success
channels-stop-start-openclaw-e2e	✅ success
hermes-discord-e2e	✅ success
hermes-slack-e2e	✅ success
messaging-providers-e2e	✅ success
rebuild-hermes-e2e	✅ success
rebuild-openclaw-e2e	✅ success

## Summary - Add v0.0.64 release notes from the release announcement and link them to the relevant deeper docs. - Document that custom policy presets recorded through `policy-add --from-file` and `--from-dir` survive snapshot restore and sandbox recreation. - Refresh generated NemoClaw user skills from the current source docs. ## Source summary - #5104 -> `docs/manage-sandboxes/backup-restore.mdx`, `docs/network-policy/customize-network-policy.mdx`: Documents custom policy presets preserved through snapshot restore. - #4955 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Brave web-search pinning and `BRAVE_API_KEY` placeholder preservation. - #5116, #5269 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Docker-driver gateway health and rootfs guard stability. - #5241, #5085 -> `docs/about/release-notes.mdx`: Adds release-note coverage for chat-completions provider selection and Nemotron Ultra 550B tool-less request compatibility. - #5268, #5210, #5257 -> `docs/about/release-notes.mdx`: Adds release-note coverage for messaging render plan refresh, OpenClaw scope-upgrade approval recovery, and Hermes WhatsApp bridge dependency setup. - Current source docs -> `.agents/skills/`: Regenerates user-skill references so agent-facing guidance matches the source documentation. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` - `npm run build:cli` - `npm run typecheck:cli` - Commit/pre-push hooks: markdownlint, gitleaks, docs-to-skills verification, TypeScript CLI, and skills YAML checks passed.  ## Summary by CodeRabbit * **Documentation** * Clarified sandbox snapshot restore preserves custom policy presets and restores them without original files. * Switched sandbox setup and remote deployment guidance to Docker-based workflows and emphasized remote onboarding flow. * Expanded troubleshooting for gateway recovery, Docker GPU/WSL issues, and onboarding resume. * Added/updated CLI docs: advanced maintenance, session export, upload/download wrappers, and status recovery guidance. * Added v0.0.64 release notes and links to NemoClaw Community; fixed command reference formatting.

fix(cli): refresh stale messaging render plans

6eb3b51

sandl99 self-assigned this Jun 12, 2026

sandl99 added nightly-e2e Nightly E2E test failures area: messaging Messaging channels, bridges, manifests, or channel lifecycle labels Jun 12, 2026

sandl99 requested a review from cv June 12, 2026 03:22

sandl99 changed the title ~~fix(cli): refresh stale messaging render plans~~ fix(messaging): refresh stale messaging render plans Jun 12, 2026

cv approved these changes Jun 12, 2026

View reviewed changes

sandl99 added the v0.0.64 Release target label Jun 12, 2026

cv merged commit 4c28094 into main Jun 12, 2026
194 checks passed

cv deleted the fix/hermes-rebuild-stale-render branch June 12, 2026 04:06

coderabbitai Bot mentioned this pull request Jun 12, 2026

refactor(messaging): compact persisted messaging plans #5328

Merged

12 tasks

miyoungc mentioned this pull request Jun 12, 2026

docs: refresh v0.0.64 release docs #5358

Merged

Conversation

sandl99 commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Suggested labels

Poem

Uh oh!

github-actions Bot commented Jun 12, 2026

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 12, 2026

Vitest E2E Scenario Recommendation

Vitest E2E Scenario Advisor

Required Vitest E2E scenarios

Optional Vitest E2E scenarios

Relevant changed files

Uh oh!

github-actions Bot commented Jun 12, 2026

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

github-actions Bot commented Jun 12, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

sandl99 commented Jun 12, 2026

Reply to PR reviewer

Uh oh!

github-actions Bot commented Jun 12, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sandl99 commented Jun 12, 2026 •

edited

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading