
fix(hermes): normalize Telegram send_message pseudo-call leaks #4175

Merged
cv merged 7 commits into main from fix/3893_telegram-tool-call-leak-dco on Jun 3, 2026
Conversation

@chengjiew

@chengjiew chengjiew commented May 25, 2026

Contributor

Summary

Fixes the NemoHermes first Telegram reply path so a raw send_message: "to telegram: ..." pseudo-tool response is not delivered to the user as plain text.

The PR adds both a first-turn prompt guard for messaging platforms and a narrow final-response normalizer for targeted raw messaging pseudo-calls.

Related Issue

Fixes #3893

Changes

  • Normalize whole-response raw messaging pseudo-calls such as send_message: "to telegram: Hello" into the actual message body before gateway delivery.
  • Add first-turn NemoClaw context telling messaging agents to reply to the current chat with normal assistant text, and reserve send_message for explicit cross-platform delivery requests.
  • Add Hermes plugin regression coverage for the normalizer and Telegram first-turn context.
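
The whole-response normalization described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the plugin's code: the function names and both regexes are hypothetical, and the real helpers in `agents/hermes/plugin/__init__.py` may differ. The point it shows is that only a response consisting entirely of the pseudo-call is rewritten, so ordinary replies that merely mention `send_message` are left alone.

```python
import re

# Hypothetical patterns (assumptions for illustration only).
# The whole response must be one pseudo-call: send_message: <payload>
_RAW_SEND_RE = re.compile(r"^\s*send_message:\s*(?P<payload>.+?)\s*$", re.DOTALL)
# The payload must look like: to <platform>: <body>
_TARGET_RE = re.compile(
    r"^to\s+(?P<platform>[a-z_]+):\s*(?P<body>.*)$",
    re.DOTALL | re.IGNORECASE,
)


def strip_matching_quotes(text: str) -> str:
    """Remove one pair of matching surrounding quotes, if present."""
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        return text[1:-1].strip()
    return text


def normalize_raw_send_message(response: str) -> str:
    """Return the message body for a whole-response pseudo-call, else the input."""
    match = _RAW_SEND_RE.match(response)
    if match is None:
        return response  # not a pseudo-call at all
    payload = strip_matching_quotes(match.group("payload"))
    target = _TARGET_RE.match(payload)
    if target is None:
        return response  # no "to <platform>:" prefix, leave untouched
    body = target.group("body").strip()
    return body or response  # never replace a reply with an empty string


print(normalize_raw_send_message('send_message: "to telegram: Hello"'))
```

Anchoring the pattern at both ends keeps the fallback narrow: a reply such as `Sure. send_message: "to telegram: Hello"` does not match and is delivered unchanged.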

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Additional verification run locally:

  • npm test -- test/hermes-plugin-handlers.test.ts test/generate-hermes-config.test.ts
  • python3 -m py_compile agents/hermes/plugin/__init__.py
  • npm run build:cli
  • git diff --check HEAD~1..HEAD

Note: local pre-commit was not used for the final commit because its CLI coverage hook recursively triggered hooks through a temporary Git fixture. The targeted checks above were run explicitly after rebasing onto latest origin/main.


Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

Summary by CodeRabbit

  • Bug Fixes

    • Prevented raw tool-like messaging strings from appearing in chat when Hermes is used as a gateway for messaging platforms.
  • Improvements

    • Normalized and cleaned messaging outputs so assistant replies appear as natural chat text; cross-platform sends occur only when explicitly requested and platform-aware behavior is maintained across sessions.
  • Tests

    • Added automated tests to verify messaging normalization and correct platform-specific reply behavior.

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 25, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 25, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 94c4bf01-b088-4092-b418-c56d918490b3

📥 Commits

Reviewing files that changed from the base of the PR and between f43e6e3 and fb02ff2.

📒 Files selected for processing (2)
  • agents/hermes/plugin/__init__.py
  • test/hermes-plugin-handlers.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/hermes-plugin-handlers.test.ts
  • agents/hermes/plugin/__init__.py

📝 Walkthrough

Detects and extracts intended message bodies from raw send_message: pseudo-tool outputs for same-platform deliveries, patches the agent think-block stripper to return normalized text, and injects a platform-aware NemoClaw context to avoid raw tool-like final responses; tests cover normalization and context injection.

Changes

Messaging Platform Response Handling

| Layer / File(s) | Summary |
| --- | --- |
| Message Response Normalization and Patching (`agents/hermes/plugin/__init__.py`) | Adds re import, patch guard, platform identifiers and regexes, helpers to strip quotes and extract message bodies, current-platform tracking, and installs a guarded monkeypatch on run_agent.AIAgent._strip_think_blocks at register, _pre_llm_call, and on_session_start. |
| Context Instruction for Messaging Platforms (`agents/hermes/plugin/__init__.py`) | Refactors NemoClaw context construction to build ordered context lines and conditionally append a platform-specific instruction that tells the model to reply with normal assistant text for known messaging adapters and to avoid emitting raw send_message: / to <platform>: strings as final chat output. |
| Test Validation (`test/hermes-plugin-handlers.test.ts`) | Adds two Vitest tests (via Python harness): one verifies targeted/untargeted send_message normalization, cross-platform and unknown-platform blocking, and patched _strip_think_blocks behavior; the other verifies _pre_llm_call injects the Telegram-first-turn reply constraint. |
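
The "guarded monkeypatch" mentioned in the walkthrough can be illustrated with a small sketch. Everything here is an assumption made for illustration: `FakeAgent` stands in for `run_agent.AIAgent`, and `_PATCH_FLAG` and `install_patch` are hypothetical names. It shows one way to wrap `_strip_think_blocks` so that repeated installs (for example at register, `_pre_llm_call`, and `on_session_start`) do not stack wrappers.

```python
import functools
import re

_PATCH_FLAG = "_messaging_patch_installed"  # hypothetical guard attribute


class FakeAgent:
    """Stand-in for run_agent.AIAgent; the real class is not reproduced here."""

    def _strip_think_blocks(self, text: str) -> str:
        return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)


def install_patch(agent_cls, normalize) -> bool:
    """Wrap agent_cls._strip_think_blocks once; return True if newly installed."""
    original = agent_cls._strip_think_blocks
    if getattr(original, _PATCH_FLAG, False):
        return False  # already wrapped, so do nothing

    @functools.wraps(original)
    def patched(self, text, *args, **kwargs):
        # Run the original stripper first, then normalize its output.
        return normalize(original(self, text, *args, **kwargs))

    setattr(patched, _PATCH_FLAG, True)
    agent_cls._strip_think_blocks = patched
    return True


first = install_patch(FakeAgent, str.strip)
second = install_patch(FakeAgent, str.strip)
print(first, second)
```

Marking the wrapper itself, rather than keeping a separate module-level boolean, means the guard still behaves correctly if the class is reloaded or replaced between calls.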

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through logs at break of dawn,
Found raw calls where chat belonged,
With regex teeth and patching paw,
I trimmed the quotes and fixed the flaw,
Now Hermes greets with proper song.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main fix: normalizing Telegram send_message pseudo-call leaks, which aligns perfectly with the core objective to prevent raw send_message text from leaking to users. |
| Linked Issues check | ✅ Passed | The code changes fully address issue #3893 by normalizing send_message pseudo-calls for Telegram and adding first-turn context instructions to prevent raw tool text from appearing in user-facing responses. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to Hermes plugin initialization and test coverage, directly addressing the Telegram send_message leak issue without introducing unrelated modifications. |


@github-actions

github-actions Bot commented May 25, 2026

Contributor

E2E Advisor Recommendation

Required E2E: hermes-e2e, hermes-discord-e2e, hermes-slack-e2e
Optional E2E: messaging-providers-e2e, hermes-inference-switch-e2e

Dispatch hint: hermes-e2e,hermes-discord-e2e,hermes-slack-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • hermes-e2e (high): Required because the Hermes plugin changed. This job runs install → onboard --agent hermes → sandbox health → live inference, validating plugin load and real Hermes assistant behavior in an OpenShell sandbox.
  • hermes-discord-e2e (high): Required because the change is explicitly platform-aware messaging response handling. This is existing Hermes Discord E2E coverage for the Hermes messaging onboarding/config/gateway path and is one of the closest real-channel checks for the new current-platform guard.
  • hermes-slack-e2e (high): Required because cross-platform send_message normalization now depends on the active messaging platform. Existing Hermes Slack E2E validates Hermes Slack onboarding, policy, provider placeholder handling, and sandbox integration for this affected platform.

Optional E2E

  • messaging-providers-e2e (high): Optional adjacent confidence for the broader Telegram/Discord/Slack messaging credential provider and L7 proxy chain. The PR does not change credential injection or network policy, but the runtime behavior is messaging-related.
  • hermes-inference-switch-e2e (high): Optional confidence that Hermes remains healthy after inference configuration changes, since the modified pre_llm_call context includes provider/model/gateway state used during live inference turns.

New E2E recommendations

  • hermes-telegram-first-message (high): The PR specifically references a first-turn messaging race and examples use Telegram, but there is no existing Hermes Telegram E2E job analogous to hermes-discord-e2e or hermes-slack-e2e. Add a Hermes Telegram E2E that onboards Hermes with Telegram enabled and verifies first inbound reply text is delivered as normal assistant text, while cross-platform raw send_message output is not silently delivered to the wrong chat.
    • Suggested test: Add hermes-telegram-e2e using a fake or hermetic Telegram gateway path plus an optional real-token inbound reply mode.
  • hermes-messaging-response-normalization (medium): Current unit tests cover the normalizer and hook anchoring with stubs, but existing E2Es do not appear to assert that real Hermes gateway first-message responses avoid leaking raw send_message: ... final text.
    • Suggested test: Extend Hermes messaging E2Es to assert first-turn Discord/Slack/Telegram assistant replies do not surface raw send_message pseudo-calls and preserve cross-platform targets as an error/dispatch path.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: hermes-e2e,hermes-discord-e2e,hermes-slack-e2e

@github-actions

github-actions Bot commented May 25, 2026

Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-hermes-discord, ubuntu-repo-cloud-hermes-slack
Optional scenario E2E: ubuntu-repo-cloud-hermes

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes-discord
  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes-slack

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • ubuntu-repo-cloud-hermes-discord: Hermes plugin changes alter messaging-platform grounding and raw send_message response normalization; the Hermes Discord scenario exercises the Hermes messaging adapter path on a dispatchable scenario route.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes-discord
  • ubuntu-repo-cloud-hermes-slack: Hermes plugin changes alter messaging response filtering and cross-platform target handling; the Hermes Slack scenario provides targeted coverage for a second Hermes messaging adapter path.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes-slack

Optional scenario E2E

  • ubuntu-repo-cloud-hermes: Optional baseline Hermes scenario to verify the plugin still loads and the Hermes sandbox remains healthy outside messaging-adapter onboarding.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes

Relevant changed files

  • agents/hermes/plugin/__init__.py

@github-actions

github-actions Bot commented May 25, 2026

Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • PR review advisor unavailable: The automated advisor could not complete: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt
    • Recommendation: Re-run the PR Review Advisor or perform a manual review.
    • Evidence: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agents/hermes/plugin/__init__.py`:
- Around line 1108-1111: The broad except Exception around the dynamic import of
"run_agent" (the __import__("run_agent", fromlist=["AIAgent"]) call) should be
narrowed to only catch import-related errors so other failures surface; change
the handler to catch ImportError and ModuleNotFoundError (and return False
there) instead of catching Exception so AttributeError or other runtime errors
during import are not swallowed.
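
A narrowed handler along the lines this comment asks for might look like the sketch below. It is illustrative only: `load_agent_class` is a hypothetical helper, it returns `None` rather than `False`, and it uses `importlib.import_module` instead of the `__import__(..., fromlist=[...])` call quoted above. `ModuleNotFoundError` is a subclass of `ImportError`, so catching `ImportError` covers both.

```python
import importlib


def load_agent_class(module_name: str = "run_agent", class_name: str = "AIAgent"):
    """Return the named class, or None when the module cannot be imported.

    Only import failures are swallowed. A missing attribute or any other
    runtime error raised while importing still propagates to the caller.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also catches ModuleNotFoundError
        return None
    return getattr(module, class_name)  # AttributeError is not hidden


# A module that does not exist is treated as "patch not applicable".
print(load_agent_class("module_that_does_not_exist_xyz"))
```

Keeping the `getattr` outside the `try` block is the design choice that matters here: a renamed or missing class then fails loudly instead of silently disabling the patch.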

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d6dbc18b-0cb7-482b-83ca-c4d54be0aece

📥 Commits

Reviewing files that changed from the base of the PR and between 50c208b and 2e9e2b5.

📒 Files selected for processing (2)
  • agents/hermes/plugin/__init__.py
  • test/hermes-plugin-handlers.test.ts

Comment thread agents/hermes/plugin/__init__.py
Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26434471925
Target ref: fix/3893_telegram-tool-call-leak-dco
Workflow ref: main
Requested jobs: hermes-e2e,hermes-discord-e2e
Summary: 2 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| hermes-discord-e2e | ✅ success |
| hermes-e2e | ✅ success |

@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26459374275
Target ref: f43e6e342f19a5d8cb3c7dd2dd560709b7d9297c
Workflow ref: main
Requested jobs: hermes-slack-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| hermes-slack-e2e | ✅ success |

@wscurran wscurran added fix integration: telegram Telegram integration or channel behavior labels May 26, 2026
@wscurran

Contributor

@wscurran wscurran added the integration: hermes Hermes integration behavior label May 26, 2026
@cv cv added v0.0.52 Release target v0.0.53 Release target and removed v0.0.50 Release target v0.0.52 Release target labels May 26, 2026

@cv cv left a comment

Collaborator


Requesting changes based on the PR Review Advisor rubric.

The global raw send_message normalizer can misroute cross-platform pseudo-calls. _RAW_MESSAGING_TARGET_RE accepts many platforms and _install_messaging_response_patch() installs the normalization globally on AIAgent._strip_think_blocks() without knowing the current chat platform, first-turn state, or whether the user explicitly asked to send a separate cross-platform message. A malformed response like send_message: "to slack: ..." from a Telegram-origin chat could be stripped to plain text and delivered back to Telegram instead of remaining a tool-dispatch/error case.

Please restrict the fallback to the current platform/current-chat readiness case (ideally with platform/turn context), and add a negative test proving a non-current-platform target is not silently normalized into the current chat. Also document the source-of-truth rationale for the workaround and add/cite runtime validation through the Hermes gateway first-message path.

@ericksoa ericksoa added v0.0.55 and removed v0.0.53 Release target labels May 27, 2026
@jyaunches jyaunches added R2 v0.0.56 Release target and removed v0.0.55 labels May 29, 2026
@cv cv added the v0.0.57 Release target label Jun 1, 2026
@cv cv removed the v0.0.56 Release target label Jun 1, 2026
cjagwani and others added 2 commits June 2, 2026 12:37
…4175)

Addresses the CHANGES_REQUESTED review by @cv on PR #4175. The original
patch installed `_normalize_raw_messaging_tool_response` globally on
`AIAgent._strip_think_blocks` with no current-platform context, so a stray
response like `send_message: "to slack: ..."` emitted in a Telegram chat
would be silently normalized to plain text and delivered back to Telegram
— misrouting the cross-platform send_message intent into the wrong chat.

This commit:

- Adds `_current_messaging_platform` module state + `_set_/_get_` helpers,
  validated against the `_MESSAGING_PLATFORMS` allowlist.
- Extends `_normalize_raw_messaging_tool_response` to take an optional
  `current_platform` argument and leave the raw `send_message:` text intact
  whenever the target platform doesn't match the current chat or the
  current platform is unknown — so dispatch / error paths still surface
  upstream rather than getting silently delivered into the wrong chat.
- Wires the patched `_strip_think_blocks` closure to read
  `_get_current_messaging_platform()`.
- Updates `_pre_llm_call` to call `_set_current_messaging_platform` on
  every turn (not gated on context injection), so the normalizer has a
  current-platform anchor even on non-first / non-grounding turns.
- Expands the docstring to call out (a) `agents/hermes/run_agent.py` as
  source of truth for send_message routing, (b) this normalizer as
  defense-in-depth output filtering, (c) `hermes-e2e`, `hermes-discord-e2e`,
  and `hermes-slack-e2e` as runtime validation through the gateway
  first-message path.
- Adds negative-test coverage in `test/hermes-plugin-handlers.test.ts`
  for the cross-platform-blocked and unknown-platform-blocked cases plus
  positive coverage of the same-platform normalization through the class
  patch.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Co-authored-by: Chengjie Wang <chengjiew@nvidia.com>
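
The platform anchoring this commit message describes can be sketched as below. This is a hedged approximation, not the committed code: the allowlist contents, the regex, and the un-prefixed function names are assumptions, and only the general shape (module state validated against an allowlist, plus an optional `current_platform` argument) comes from the commit message.

```python
import re
from typing import Optional

# Assumed subset of the real _MESSAGING_PLATFORMS allowlist.
_MESSAGING_PLATFORMS = frozenset({"telegram", "discord", "slack"})
_current_platform: Optional[str] = None

# Hypothetical whole-response pattern: send_message: "to <platform>: <body>"
_PSEUDO_CALL_RE = re.compile(
    r'^\s*send_message:\s*"?\s*to\s+(?P<platform>\w+):\s*(?P<body>.*?)\s*"?\s*$',
    re.DOTALL | re.IGNORECASE,
)


def set_current_platform(platform: Optional[str]) -> None:
    """Record the current chat platform, or None if it is not allowlisted."""
    global _current_platform
    name = (platform or "").strip().lower()
    _current_platform = name if name in _MESSAGING_PLATFORMS else None


def get_current_platform() -> Optional[str]:
    return _current_platform


def normalize(response: str, current_platform: Optional[str] = None) -> str:
    """Extract the body only when the target matches the current chat platform."""
    match = _PSEUDO_CALL_RE.match(response)
    if match is None or current_platform is None:
        return response  # unknown platform: leave the raw text intact
    if match.group("platform").lower() != current_platform:
        return response  # cross-platform target: do not misroute it
    return match.group("body") or response


set_current_platform("Telegram")
print(normalize('send_message: "to telegram: Hello"', get_current_platform()))
print(normalize('send_message: "to slack: Hello"', get_current_platform()))
```

Returning the raw text for mismatched or unknown platforms is what keeps dispatch and error paths visible upstream, which is the behavior the review asked for.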
@cv cv added v0.0.58 Release target and removed v0.0.57 Release target labels Jun 3, 2026
@wscurran wscurran added area: cli Command line interface, flags, terminal UX, or output area: messaging Messaging channels, bridges, manifests, or channel lifecycle bug-fix PR fixes a bug or regression feature PR adds or expands user-visible functionality and removed NemoClaw CLI labels Jun 3, 2026
cjagwani and others added 2 commits June 3, 2026 11:59
…m anchor

Addresses @cv's review ask #4 on PR #4175 — runtime validation through the
Hermes gateway first-message path — by adding an integration-style test
that drives the actual `_pre_llm_call` → `AIAgent._strip_think_blocks`
chain Hermes uses, rather than only exercising the normalizer in isolation.

The new test:
- Stubs `run_agent.AIAgent._strip_think_blocks` as a real method.
- Calls `_pre_llm_call(platform="telegram", is_first_turn=True)` to simulate
  the first Telegram turn arriving via the gateway hook — this both sets
  the platform anchor and installs the patch via the same code path Hermes
  uses at runtime.
- Calls `AIAgent._strip_think_blocks(...)` twice to assert (a) same-platform
  body is extracted, (b) cross-platform `to slack: ...` is preserved.
- Calls `_pre_llm_call(platform="discord", is_first_turn=True)` to simulate
  a turn on a different platform and asserts the anchor refreshes per-turn
  (Discord body extracted, telegram target preserved).

Combined with the existing isolated-unit cases and the cited
`hermes-e2e` / `hermes-discord-e2e` / `hermes-slack-e2e` scenarios, this
gives both "added" and "cited" runtime validation for the gateway
first-message path.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Co-authored-by: Chengjie Wang <chengjiew@nvidia.com>
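
The scenario this commit message walks through can be approximated with a self-contained harness. It is a sketch under stated assumptions: `AIAgent` here is a local stub rather than `run_agent.AIAgent`, `_state` and `_RAW_RE` are hypothetical, and `_pre_llm_call` is reduced to the two effects the message names (setting the platform anchor and installing the patch). The `is_first_turn` argument is accepted but unused in this simplified version.

```python
import re


class AIAgent:
    """Local stub; the real run_agent.AIAgent is not reproduced here."""

    def _strip_think_blocks(self, text: str) -> str:
        return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)


_state = {"platform": None, "patched": False}  # hypothetical module state
_RAW_RE = re.compile(r'^\s*send_message:\s*"to\s+(\w+):\s*(.*)"\s*$', re.DOTALL)


def _pre_llm_call(platform: str, is_first_turn: bool = False) -> None:
    """Refresh the platform anchor on every turn and install the patch once."""
    _state["platform"] = platform
    if _state["patched"]:
        return
    original = AIAgent._strip_think_blocks

    def patched(self, text):
        cleaned = original(self, text)
        match = _RAW_RE.match(cleaned)
        if match and match.group(1).lower() == _state["platform"]:
            return match.group(2)  # same-platform body is extracted
        return cleaned  # anything else is preserved as-is

    AIAgent._strip_think_blocks = patched
    _state["patched"] = True


def run_scenario():
    """Drive a Telegram turn, then a Discord turn, and collect the outputs."""
    agent = AIAgent()
    results = []
    _pre_llm_call(platform="telegram", is_first_turn=True)
    results.append(agent._strip_think_blocks('<think>plan</think>send_message: "to telegram: Hi"'))
    results.append(agent._strip_think_blocks('send_message: "to slack: Hi"'))
    _pre_llm_call(platform="discord", is_first_turn=True)
    results.append(agent._strip_think_blocks('send_message: "to discord: Yo"'))
    results.append(agent._strip_think_blocks('send_message: "to telegram: Hi"'))
    return results
```

Because the wrapper reads `_state["platform"]` at call time rather than capturing it when the patch is installed, the second `_pre_llm_call` changes the outcome without reinstalling anything.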
@cjagwani

cjagwani commented Jun 3, 2026

Contributor

@cv The normalizer is now anchored to the current chat: _current_messaging_platform is set on every _pre_llm_call and read by the patched _strip_think_blocks, so cross-platform or unknown-platform targets pass through untouched.

Your to slack: from a Telegram chat case is pinned by negative tests at both the direct-API and class-patch layers, plus a new integration test that drives _pre_llm_call → _strip_think_blocks across a Telegram → Discord switch. Docstring names agents/hermes/run_agent.py as authoritative for routing and cites hermes-e2e / hermes-discord-e2e / hermes-slack-e2e as gateway coverage.

The #3893 readiness-gate suggestion is a different architectural approach to the same symptom — left out of scope here. Happy to file a follow-up if you'd rather pursue it. Re-requesting review.

@cjagwani cjagwani requested a review from cv June 3, 2026 19:10
@cv cv enabled auto-merge (squash) June 3, 2026 19:11
@cv cv disabled auto-merge June 3, 2026 19:21
@cv cv merged commit 8637a78 into main Jun 3, 2026
18 checks passed
@cv cv deleted the fix/3893_telegram-tool-call-leak-dco branch June 3, 2026 19:22
@wscurran wscurran removed the feature PR adds or expands user-visible functionality label Jun 9, 2026

Labels

  • area: cli (Command line interface, flags, terminal UX, or output)
  • area: messaging (Messaging channels, bridges, manifests, or channel lifecycle)
  • bug-fix (PR fixes a bug or regression)
  • integration: hermes (Hermes integration behavior)
  • integration: telegram (Telegram integration or channel behavior)
  • v0.0.58 (Release target)

Development

Successfully merging this pull request may close these issues.

[Brev][Agent&Skills] NemoHermes first Telegram message leaks raw send_message tool call as text instead of executing it

7 participants