
fix(hermes): route Anthropic messages through managed inference#4402

Merged: cv merged 3 commits into main from fix/4230_hermes_anthropic_messages_policy on Jun 4, 2026

@chengjiew (Contributor) commented May 28, 2026

Summary

  • allow Hermes managed inference policy to POST to /v1/messages for Anthropic Messages API providers
  • pass NEMOCLAW_INFERENCE_API into Hermes sandbox config generation and map the Anthropic Messages and OpenAI Responses API modes to Hermes api_mode values
  • install the Hermes native anthropic extra in the base image and add regression tests for the policy, config, and provisioning paths
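The policy change in the first bullet is a method-and-path allowlist update. The sketch below, assuming a simplified rule shape with exact path matching, shows why a sandbox on the old rules got a 403 for Anthropic-style calls. The `AllowRule` type, the `checkRequest` function, and both rule lists (including the assumption that the pre-fix list only held chat completions) are hypothetical illustrations, not the actual schema or contents of agents/hermes/policy-additions.yaml.

```typescript
// Hypothetical allow-rule shape; the real OpenShell policy schema may differ.
interface AllowRule {
  method: string;
  path: string;
}

// ASSUMPTION: before the fix, managed inference only allowed chat completions.
const oldManagedInference: AllowRule[] = [
  { method: "POST", path: "/v1/chat/completions" },
];

// After the fix: Anthropic Messages and OpenAI Responses are also allowed.
const newManagedInference: AllowRule[] = [
  ...oldManagedInference,
  { method: "POST", path: "/v1/messages" },
  { method: "POST", path: "/v1/responses" },
];

// Returns "allowed" when any rule matches the method and exact path,
// otherwise the denial message seen in the E2E evidence.
function checkRequest(rules: AllowRule[], method: string, path: string): string {
  const verb = method.toUpperCase();
  const allowed = rules.some((rule) => rule.method === verb && rule.path === path);
  return allowed ? "allowed" : "403 connection not allowed by policy";
}

console.log(checkRequest(oldManagedInference, "POST", "/v1/messages")); // 403 connection not allowed by policy
console.log(checkRequest(newManagedInference, "POST", "/v1/messages")); // allowed
```

A real policy engine may also match hosts, ports, and path patterns; the sketch only models the method and path dimension that this PR touches.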

Fixes #4230

Validation

  • npm run build:cli
  • ./node_modules/.bin/vitest run src/lib/onboard/dockerfile-patch.test.ts test/generate-hermes-config.test.ts test/sandbox-provisioning.test.ts test/validate-blueprint.test.ts test/validate-config-schemas.test.ts
  • npm run validate:configs
  • git diff --check origin/main

E2E evidence

  • On Linux, a Hermes sandbox running the old policy reproduced the "403 connection not allowed by policy" error for Anthropic-compatible routing.
  • On Linux, a clean sandbox with the fix generated api_mode: anthropic_messages and imported anthropic==0.87.0, and the fake Anthropic server observed POST /v1/messages from Hermes without the policy 403.

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Added support for Anthropic Messages as an inference backend and dynamic routing of inference requests based on deployment settings.
  • Configuration

    • The inference API can now be specified at build time and is exposed at runtime; the base image now includes the Anthropic extra, and policies now allow POST /v1/messages and POST /v1/responses for managed inference.
  • Tests

    • Added regression, routing, sandbox provisioning, and validation tests, including a failure case for unsupported inference API values.

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
@coderabbitai (bot) commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8dd6b3ea-f82e-4f47-808a-7bec59622f79

📥 Commits

Reviewing files that changed from the base of the PR and between a934490 and a0d7dd1.

📒 Files selected for processing (5)
  • agents/hermes/Dockerfile.base
  • agents/hermes/config/hermes-config.ts
  • agents/hermes/policy-additions.yaml
  • test/generate-hermes-config.test.ts
  • test/validate-blueprint.test.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • agents/hermes/Dockerfile.base
  • test/generate-hermes-config.test.ts
  • test/validate-blueprint.test.ts
  • agents/hermes/policy-additions.yaml

📝 Walkthrough


Adds Anthropic Messages support to Hermes: accepts NEMOCLAW_INFERENCE_API at build time and exposes it at runtime, installs the anthropic uv extra, maps inference API types to Hermes api_mode in config generation, and allows POST /v1/messages in the managed_inference network policy. Tests added for Docker, config, and policy.

Changes

Anthropic Messages API Integration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Docker build configuration and dependencies | agents/hermes/Dockerfile, agents/hermes/Dockerfile.base, test/sandbox-provisioning.test.ts | Adds ARG NEMOCLAW_INFERENCE_API and propagates it via ENV; adds anthropic to HERMES_UV_EXTRAS default; tests verify Dockerfile/base values are present. |
| Config generation and API mode routing | agents/hermes/config/hermes-config.ts, test/generate-hermes-config.test.ts | Adds local hermesApiMode() helper and refactors buildHermesConfig to build modelConfig and conditionally set model.api_mode from settings.inferenceApi; tests assert anthropic_messages and codex_responses mappings and error on unsupported values. |
| Network policy and validation | agents/hermes/policy-additions.yaml, test/validate-blueprint.test.ts | Adds POST /v1/messages and POST /v1/responses allow rules to managed_inference policy and test fixtures/assertions validating the ordered rules and endpoint fields for the Hermes sandbox policy. |
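The config-generation row can be illustrated with a minimal sketch. The `anthropic_messages` and `codex_responses` targets and the error on unsupported values come from the summary above; the NEMOCLAW_INFERENCE_API spellings (`anthropic-messages`, `openai-responses`, `openai-completions`), the `ModelConfig` shape, and the function names are assumptions for illustration, not the code in hermes-config.ts.

```typescript
// api_mode values named in the test summary above.
type HermesApiMode = "anthropic_messages" | "codex_responses";

// ASSUMPTION: illustrative inference API spellings; null means "leave unset".
const API_MODE_BY_INFERENCE_API = new Map<string, HermesApiMode | null>([
  ["openai-completions", null],
  ["anthropic-messages", "anthropic_messages"],
  ["openai-responses", "codex_responses"],
]);

// Hypothetical stand-in for the hermesApiMode() helper.
function hermesApiModeSketch(inferenceApi?: string): HermesApiMode | null {
  if (!inferenceApi) return null; // not configured: keep the Hermes default
  const mode = API_MODE_BY_INFERENCE_API.get(inferenceApi);
  if (mode === undefined) {
    throw new Error(`Unsupported inference API: ${inferenceApi}`);
  }
  return mode;
}

// Hypothetical model config; only api_mode is relevant here.
interface ModelConfig {
  name: string;
  api_mode?: HermesApiMode;
}

// Builds modelConfig first, then sets api_mode only when a mode applies.
function buildModelConfigSketch(name: string, inferenceApi?: string): ModelConfig {
  const modelConfig: ModelConfig = { name };
  const apiMode = hermesApiModeSketch(inferenceApi);
  if (apiMode !== null) {
    modelConfig.api_mode = apiMode;
  }
  return modelConfig;
}
```

For example, `buildModelConfigSketch("m", "anthropic-messages")` yields `{ name: "m", api_mode: "anthropic_messages" }`, while an unrecognized value throws instead of silently falling back to chat completions. Using a `Map` rather than a plain object lookup avoids accidentally accepting inherited keys such as `toString`.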

Sequence Diagram(s)

sequenceDiagram
  participant User as User/Client
  participant Hermes as Hermes runtime
  participant ConfigGen as config generator
  participant InferenceGW as inference.local (gateway)

  User->>Hermes: send message
  Hermes->>ConfigGen: read model config (NEMOCLAW_INFERENCE_API / api_mode)
  Hermes->>InferenceGW: POST /v1/messages or POST /v1/chat/completions (per api_mode)
  InferenceGW->>Hermes: model response
  Hermes->>User: deliver response
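The diagram's last hop depends on api_mode. A minimal sketch of that selection follows, assuming `codex_responses` maps to the OpenAI Responses path (consistent with the POST /v1/responses allow rule described above) and that an unset api_mode falls back to chat completions; `inferencePath` and `inferenceUrl` are hypothetical names, not Hermes functions.

```typescript
type ApiMode = "anthropic_messages" | "codex_responses" | undefined;

// Illustrative mapping from api_mode to the managed-inference path.
// ASSUMPTION: codex_responses uses the OpenAI Responses path.
function inferencePath(apiMode: ApiMode): string {
  switch (apiMode) {
    case "anthropic_messages":
      return "/v1/messages";
    case "codex_responses":
      return "/v1/responses";
    default:
      // No api_mode: OpenAI-style chat completions.
      return "/v1/chat/completions";
  }
}

// inference.local is the gateway host shown in the diagram.
function inferenceUrl(apiMode: ApiMode): string {
  return new URL(inferencePath(apiMode), "https://inference.local").toString();
}

console.log(inferenceUrl("anthropic_messages")); // https://inference.local/v1/messages
```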

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

  • NVIDIA/NemoClaw#4718 — Modifies the same Hermes config path; both PRs update model config handling and related defaults.

Suggested labels

Provider: Anthropic

Suggested reviewers

  • cv
  • cjagwani

Poem

🐰 I hop with joy, configs aligned,
Build args threaded, rules defined,
Anthropic paths no longer barred,
Messages travel fast and hard,
A tiny rabbit's cheer — no 403 mind!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically describes the primary change: enabling Anthropic messages routing through managed inference policy by fixing the /v1/messages policy path. |
| Linked Issues check | ✅ Passed | All coding requirements from #4230 are met: policy allows /v1/messages, NEMOCLAW_INFERENCE_API is passed to Hermes config, api_mode mapping supports Anthropic and OpenAI Responses, anthropic extra is installed, and regression tests validate the full path. |
| Out of Scope Changes check | ✅ Passed | All changes directly support #4230 objectives: policy fixes, config mapping, provisioning, and comprehensive regression tests are all in-scope for the Anthropic messages routing fix. |


@chengjiew added the bug, platform: macos, provider: anthropic, integration: hermes, enhancement: inference, and NV QA labels on May 28, 2026
@github-actions (bot) commented May 28, 2026

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

@github-actions (bot) commented May 28, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

@github-actions (bot) commented May 28, 2026

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • PR review advisor unavailable: The automated advisor could not complete: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt
    • Recommendation: Re-run the PR Review Advisor or perform a manual review.
    • Evidence: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@wscurran added the area: inference, bug-fix, and feature labels and removed the enhancement: inference label on Jun 3, 2026
@cv added the v0.0.58 label on Jun 3, 2026
@cv self-assigned this on Jun 3, 2026
@cv added the v0.0.59 label and removed the v0.0.58 label on Jun 4, 2026
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv self-requested a review on June 4, 2026 19:30
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv merged commit c5ce66c into main on Jun 4, 2026
31 checks passed
@cv deleted the fix/4230_hermes_anthropic_messages_policy branch on June 4, 2026 19:59
cv pushed a commit that referenced this pull request Jun 5, 2026
## Summary
- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the
current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR
labels, and shared `$$nemoclaw` command placeholders.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC
permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint,
docs-to-skills verification, gitleaks, commitlint, and skills YAML
tests.

## Source Summary
- #3679, #4437, #4681, #4766, #4772, #4775, #4786 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`,
`docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27
compatibility, runtime path pinning, plugin registry recovery, live
gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`,
`docs/inference/inference-options.mdx`,
`docs/inference/use-local-inference.mdx`,
`docs/inference/switch-inference-providers.mdx`: Document the release
inference changes covering Local NIM waits, Hermes Anthropic routing,
Nemotron 3 Ultra, the current Ollama starter fallback, and Spark
managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`,
`docs/security/credential-storage.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`: Capture permission healing,
gateway-stored credential reuse, cross-sandbox messaging credential
conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`:
Regenerate the user skill references from the updated source docs.
- Follow-up maintenance ->
`.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`,
`.coderabbit.yaml`: Add release-prep area labels for docs and skills
PRs, and teach docs review guidance that `$$nemoclaw` is the correct
shared command placeholder for examples that work across agent aliases.

Note: the `documentation` label was not present in the repository, so
this PR is labeled with `v0.0.59` only.

## Summary by CodeRabbit

* **Documentation**
  * Updated default model for local Ollama inference setup to qwen3.5:9b
  * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
  * Clarified credential storage and reuse behavior for post-deployment (day-two) operations
  * Added v0.0.59 release notes covering OpenClaw compatibility, inference options, Hermes messaging sync, and troubleshooting
  * Clarified CLI selection guidance and updated OpenClaw version example in status output
  * Revised release-prep instructions and docs review guidance for CLI alias usage
cv pushed a commit that referenced this pull request Jun 5, 2026
## Summary
- Resolve the managed inference API family during `nemoclaw inference
set` / `nemohermes inference set` before patching in-sandbox config.
- Set Hermes `model.api_mode` for Anthropic Messages and OpenAI
Responses routes, and clear stale `api_mode` when switching back to
OpenAI-style chat completions.
- Preserve the Bedrock Runtime adapter exception: same-provider
compatible-Anthropic routes that were resolved as OpenAI-compatible stay
on `/v1/chat/completions`.
- Add hermetic Anthropic Messages switch coverage for both Hermes and
OpenClaw: the E2E scripts can register a compatible Anthropic mock
provider, verify `/v1/messages` through `inference.local`, then exercise
the agent path after the switch.
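The api_mode bookkeeping in the second bullet can be sketched as an immutable patch, assuming the API family has already been resolved. That resolution step is where the Bedrock Runtime exception applies, so a route resolved as OpenAI-compatible simply arrives here as chat completions. `ApiFamily`, `HermesModelConfig`, and `patchApiMode` are hypothetical names, not the code in `src/lib/actions/inference-set.ts`.

```typescript
// ASSUMPTION: illustrative names for the resolved API family.
type ApiFamily = "anthropic-messages" | "openai-responses" | "openai-completions";

// Hypothetical slice of the in-sandbox Hermes model config.
interface HermesModelConfig {
  name: string;
  api_mode?: "anthropic_messages" | "codex_responses";
}

// Returns a patched copy; the input config is left untouched.
function patchApiMode(model: HermesModelConfig, family: ApiFamily): HermesModelConfig {
  const next: HermesModelConfig = { ...model };
  delete next.api_mode; // clear any stale value from the previous route
  if (family === "anthropic-messages") {
    next.api_mode = "anthropic_messages";
  } else if (family === "openai-responses") {
    next.api_mode = "codex_responses";
  }
  // "openai-completions": api_mode stays unset (chat completions default).
  return next;
}
```

Clearing first and then setting means a switch from Anthropic back to chat completions cannot leave `api_mode: anthropic_messages` behind, which is the stale-state case the bullet describes.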

## Why
#4809 reported a `403 connection not allowed by policy` while the agent
was calling `https://inference.local`, so the right fix is not to open
direct sandbox egress to the upstream inference host. #4402 fixed fresh
Hermes onboarding by allowing managed `/v1/messages` and baking
`api_mode: anthropic_messages`. This PR covers the remaining
runtime-switch path so both Hermes and OpenClaw keep using
OpenShell-managed inference correctly after `inference set`.
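The reasoning above implies a simple invariant: after any switch, the agent's inference base URL should still point at the managed gateway rather than at an upstream provider host. A hypothetical check for that invariant, assuming the base URL is available as a string, could look like this; `usesManagedInference` is illustrative and not part of NemoClaw.

```typescript
// Gateway host that OpenShell-managed inference is served from.
const MANAGED_INFERENCE_HOST = "inference.local";

// Returns true when the URL targets the managed gateway; unparsable
// URLs are treated as unmanaged.
function usesManagedInference(baseUrl: string): boolean {
  try {
    return new URL(baseUrl).hostname === MANAGED_INFERENCE_HOST;
  } catch (e) {
    return false;
  }
}
```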

## References
- Refs #4809
- Related #4230
- Builds on #4402

## Validation
- `npx vitest run src/lib/actions/inference-set.test.ts`
- `npx vitest run src/lib/actions/inference-set.test.ts
src/lib/inference/config.test.ts test/generate-hermes-config.test.ts
test/generate-openclaw-config.test.ts` (initial combined run hit two
existing 5s per-test timeouts in
`test/generate-openclaw-config.test.ts`; rerun below passed with a
larger timeout)
- `npx vitest run test/generate-openclaw-config.test.ts --testTimeout
20000`
- `npx vitest run test/validate-e2e-coverage.test.ts`
- `shellcheck test/e2e/test-hermes-inference-switch.sh
test/e2e/test-openclaw-inference-switch.sh
test/e2e/lib/anthropic-switch-provider.sh
test/e2e/lib/inference-switch-retry.sh`
- `bash -n test/e2e/test-hermes-inference-switch.sh
test/e2e/test-openclaw-inference-switch.sh
test/e2e/lib/anthropic-switch-provider.sh
test/e2e/lib/inference-switch-retry.sh`
- `npx biome check src/lib/actions/inference-set.ts
src/lib/actions/inference-set.test.ts`
- `npm run build:cli`
- `npm run validate:configs`
- `git diff --check`
- PR checks green on head `e21952d57e8ef23caa266d6862e7367ec3bd3814`,
including commit-lint and DCO.
- Targeted E2E run `27014755537` passed both new agent-path proofs:
  - `hermes-anthropic-inference-switch-e2e / run`
  - `openclaw-anthropic-inference-switch-e2e / run`



## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added support for switching between OpenAI and Anthropic inference API modes in sandbox configurations.

* **Tests**
  * Introduced nightly E2E test jobs for validating Anthropic inference switching across agents.
  * Expanded test coverage for inference API configuration validation and provider switching scenarios.
  * Added mock Anthropic provider support for local E2E testing.

* **Chores**
  * Updated CI/CD workflow to include new inference-switch E2E test jobs and orchestration.


---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@wscurran removed the bug and feature labels on Jun 8, 2026

Labels

  • area: inference (Inference routing, serving, model selection, or outputs)
  • bug-fix (PR fixes a bug or regression)
  • integration: hermes (Hermes integration behavior)
  • NV QA (Bugs found by the NVIDIA QA Team)
  • platform: macos (Affects macOS, including Apple Silicon)
  • provider: anthropic (Anthropic or Claude provider behavior)
  • v0.0.59 (Release target)

Projects

None yet

3 participants