fix(hermes): route Anthropic messages through managed inference by chengjiew · Pull Request #4402 · NVIDIA/NemoClaw

chengjiew · 2026-05-28T04:40:55Z

Summary

- Allow Hermes managed inference policy to POST to `/v1/messages` for Anthropic Messages API providers.
- Pass `NEMOCLAW_INFERENCE_API` into Hermes sandbox config generation and map Anthropic/OpenAI Responses API modes for Hermes.
- Install the Hermes native `anthropic` extra in the base image and add regression tests for the policy/config/provisioning path.

Fixes #4230

Validation

- `npm run build:cli`
- `./node_modules/.bin/vitest run src/lib/onboard/dockerfile-patch.test.ts test/generate-hermes-config.test.ts test/sandbox-provisioning.test.ts test/validate-blueprint.test.ts test/validate-config-schemas.test.ts`
- `npm run validate:configs`
- `git diff --check origin/main`

E2E evidence

- Before the fix: a Linux Hermes sandbox on the old policy reproduced `403 connection not allowed by policy` for Anthropic-compatible routing.
- After the fix: a clean Linux sandbox generated `api_mode: anthropic_messages`, imported `anthropic==0.87.0`, and the fake Anthropic server observed `POST /v1/messages` from Hermes without the policy 403.
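
The before/after behaviour in this evidence can be modelled as an allow-list check. The sketch below is illustrative only: the `Rule` shape, the `isAllowed` and `statusFor` helpers, and the contents of `oldRules` are assumptions made for the example, not the real policy schema or the previous contents of `policy-additions.yaml`.

```typescript
// Hypothetical model of a managed_inference allow-list; not the real policy schema.
interface Rule {
  method: string;
  path: string;
}

// Assumed pre-fix rule set: only OpenAI-style chat completions are allowed.
const oldRules: Rule[] = [{ method: "POST", path: "/v1/chat/completions" }];

// Rule set after this PR, per the summary: /v1/messages and /v1/responses are added.
const newRules: Rule[] = [
  ...oldRules,
  { method: "POST", path: "/v1/messages" },
  { method: "POST", path: "/v1/responses" },
];

// Returns true when some rule matches the request method and path exactly.
function isAllowed(rules: Rule[], method: string, path: string): boolean {
  return rules.some(
    (rule) => rule.method === method.toUpperCase() && rule.path === path,
  );
}

// Maps the allow decision to the status a caller would see (403 when denied).
function statusFor(rules: Rule[], method: string, path: string): number {
  return isAllowed(rules, method, path) ? 200 : 403;
}
```

Under these assumptions, `statusFor(oldRules, "POST", "/v1/messages")` yields the 403 from the old-policy reproduction, while the same call against `newRules` is allowed.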

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

Summary by CodeRabbit

New Features
- Added support for Anthropic Messages as an inference backend and dynamic routing of inference requests based on deployment settings.
Configuration
- Inference API can now be specified at build/runtime; base image now includes Anthropic extras and policies now allow POST /v1/messages and POST /v1/responses for managed inference.
Tests
- Added regression, routing, sandbox provisioning, and validation tests, including a failure case for unsupported inference API values.

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

coderabbitai · 2026-05-28T04:41:07Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8dd6b3ea-f82e-4f47-808a-7bec59622f79

📥 Commits

Reviewing files that changed from the base of the PR and between a934490 and a0d7dd1.

📒 Files selected for processing (5)

agents/hermes/Dockerfile.base
agents/hermes/config/hermes-config.ts
agents/hermes/policy-additions.yaml
test/generate-hermes-config.test.ts
test/validate-blueprint.test.ts

🚧 Files skipped from review as they are similar to previous changes (4)

agents/hermes/Dockerfile.base
test/generate-hermes-config.test.ts
test/validate-blueprint.test.ts
agents/hermes/policy-additions.yaml

📝 Walkthrough

Adds Anthropic Messages support to Hermes: accepts NEMOCLAW_INFERENCE_API at build time and exposes it at runtime, installs the anthropic uv extra, maps inference API types to Hermes api_mode in config generation, and allows POST /v1/messages in the managed_inference network policy. Tests added for Docker, config, and policy.

Changes

Anthropic Messages API Integration

| Layer / File(s) | Summary |
| --- | --- |
| **Docker build configuration and dependencies**: `agents/hermes/Dockerfile`, `agents/hermes/Dockerfile.base`, `test/sandbox-provisioning.test.ts` | Adds `ARG NEMOCLAW_INFERENCE_API` and propagates it via `ENV`; adds `anthropic` to `HERMES_UV_EXTRAS` default; tests verify Dockerfile/base values are present. |
| **Config generation and API mode routing**: `agents/hermes/config/hermes-config.ts`, `test/generate-hermes-config.test.ts` | Adds local `hermesApiMode()` helper and refactors `buildHermesConfig` to build `modelConfig` and conditionally set `model.api_mode` from `settings.inferenceApi`; tests assert `anthropic_messages` and `codex_responses` mappings and error on unsupported values. |
| **Network policy and validation**: `agents/hermes/policy-additions.yaml`, `test/validate-blueprint.test.ts` | Adds `POST /v1/messages` and `POST /v1/responses` allow rules to `managed_inference` policy and test fixtures/assertions validating the ordered rules and endpoint fields for the Hermes sandbox policy. |
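
A minimal sketch of the config-generation row above. The PR text names the resulting `api_mode` values (`anthropic_messages`, `codex_responses`) but not the accepted input spellings, so the `inferenceApi` strings used here (`anthropic-messages`, `openai-responses`, `openai-completions`), the `ModelConfig` shape, and the `buildModelConfig` name are all assumptions. The bodies are illustrative and are not copied from `hermes-config.ts`.

```typescript
type HermesApiMode = "anthropic_messages" | "codex_responses";

// Hypothetical input spellings; only the api_mode outputs come from the PR text.
function hermesApiMode(inferenceApi: string | undefined): HermesApiMode | undefined {
  switch (inferenceApi) {
    case undefined:
    case "":
    case "openai-completions": // assumed name for the default chat-completions route
      return undefined; // leave api_mode unset
    case "anthropic-messages":
      return "anthropic_messages";
    case "openai-responses":
      return "codex_responses";
    default:
      // Mirrors the "error on unsupported values" behaviour the tests assert.
      throw new Error(`Unsupported inference API: ${inferenceApi}`);
  }
}

// Hypothetical model section; `name` is a placeholder field for the example.
interface ModelConfig {
  name: string;
  api_mode?: HermesApiMode;
}

// Builds the model section and sets api_mode only when a mapping exists.
function buildModelConfig(name: string, inferenceApi?: string): ModelConfig {
  const modelConfig: ModelConfig = { name };
  const apiMode = hermesApiMode(inferenceApi);
  if (apiMode !== undefined) {
    modelConfig.api_mode = apiMode;
  }
  return modelConfig;
}
```

Leaving `api_mode` out for the default route, rather than writing a placeholder value, keeps the generated config unchanged for existing OpenAI-style deployments.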

Sequence Diagram(s)

sequenceDiagram
  participant User as User/Client
  participant Hermes as Hermes runtime
  participant ConfigGen as config generator
  participant InferenceGW as inference.local (gateway)

  User->>Hermes: send message
  Hermes->>ConfigGen: read model config (NEMOCLAW_INFERENCE_API / api_mode)
  Hermes->>InferenceGW: POST /v1/messages or POST /v1/chat/completions (per api_mode)
  InferenceGW->>Hermes: model response
  Hermes->>User: deliver response
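
The third step of the diagram picks a request path from `api_mode`. A rough sketch follows, using a hypothetical `inferencePath` helper. The diagram names only the `/v1/messages` and `/v1/chat/completions` paths, so pairing `codex_responses` with `/v1/responses` is an assumption based on the allow rules listed in this PR.

```typescript
// api_mode values named in this PR; undefined means OpenAI-style chat completions.
type ApiMode = "anthropic_messages" | "codex_responses" | undefined;

// Hypothetical helper: picks the gateway path Hermes would POST to.
function inferencePath(apiMode: ApiMode): string {
  switch (apiMode) {
    case "anthropic_messages":
      return "/v1/messages";
    case "codex_responses":
      return "/v1/responses"; // assumption: Responses API path
    default:
      return "/v1/chat/completions";
  }
}

// Builds the full URL against the managed gateway host shown in the diagram.
function inferenceUrl(apiMode: ApiMode): string {
  return `https://inference.local${inferencePath(apiMode)}`;
}
```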

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

nemoclaw inference set breaks Hermes sandboxes for Anthropic providers (Hermes sync drops the anthropic-messages wire mode) #4746 — The change to set model.api_mode from an inference API flag and propagate NEMOCLAW_INFERENCE_API appears to address the issue of Hermes omitting api_mode for Anthropic wire formats.

Possibly related PRs

NVIDIA/NemoClaw#4718 — Modifies the same Hermes config path; both PRs update model config handling and related defaults.

Suggested labels

Provider: Anthropic

Suggested reviewers

cv
cjagwani

Poem

🐰 I hop with joy, configs aligned,
Build args threaded, rules defined,
Anthropic paths no longer barred,
Messages travel fast and hard,
A tiny rabbit's cheer — no 403 mind!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically describes the primary change: enabling Anthropic messages routing through managed inference policy by fixing the /v1/messages policy path. |
| Linked Issues check | ✅ Passed | All coding requirements from `#4230` are met: policy allows /v1/messages, NEMOCLAW_INFERENCE_API is passed to Hermes config, api_mode mapping supports Anthropic and OpenAI Responses, anthropic extra is installed, and regression tests validate the full path. |
| Out of Scope Changes check | ✅ Passed | All changes directly support `#4230` objectives: policy fixes, config mapping, provisioning, and comprehensive regression tests are all in-scope for the Anthropic messages routing fix. |

github-actions · 2026-05-28T04:42:41Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

github-actions · 2026-05-28T04:42:42Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

github-actions · 2026-05-28T04:47:58Z

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

PR review advisor unavailable: The automated advisor could not complete: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt
- Recommendation: Re-run the PR Review Advisor or perform a manual review.
- Evidence: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt

🌱 Nice ideas

None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

## Summary

- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR labels, and shared `$$nemoclaw` command placeholders.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint, docs-to-skills verification, gitleaks, commitlint, and skills YAML tests.

## Source Summary

- #3679, #4437, #4681, #4766, #4772, #4775, #4786 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`, `docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27 compatibility, runtime path pinning, plugin registry recovery, live gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`, `docs/inference/inference-options.mdx`, `docs/inference/use-local-inference.mdx`, `docs/inference/switch-inference-providers.mdx`: Document the release inference changes covering Local NIM waits, Hermes Anthropic routing, Nemotron 3 Ultra, the current Ollama starter fallback, and Spark managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`, `docs/security/credential-storage.mdx`, `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/troubleshooting.mdx`: Capture permission healing, gateway-stored credential reuse, cross-sandbox messaging credential conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`: Regenerate the user skill references from the updated source docs.
- Follow-up maintenance -> `.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`, `.coderabbit.yaml`: Add release-prep area labels for docs and skills PRs, and teach docs review guidance that `$$nemoclaw` is the correct shared command placeholder for examples that work across agent aliases.

Note: the `documentation` label was not present in the repository, so this PR is labeled with `v0.0.59` only.

## Summary by CodeRabbit

* **Documentation**
  * Updated default model for local Ollama inference setup to qwen3.5:9b
  * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
  * Clarified credential storage and reuse behavior for post-deployment (day-two) operations
  * Added v0.0.59 release notes covering OpenClaw compatibility, inference options, Hermes messaging sync, and troubleshooting
  * Clarified CLI selection guidance and updated OpenClaw version example in status output
  * Revised release-prep instructions and docs review guidance for CLI alias usage

## Summary

- Resolve the managed inference API family during `nemoclaw inference set` / `nemohermes inference set` before patching in-sandbox config.
- Set Hermes `model.api_mode` for Anthropic Messages and OpenAI Responses routes, and clear stale `api_mode` when switching back to OpenAI-style chat completions.
- Preserve the Bedrock Runtime adapter exception: same-provider compatible-Anthropic routes that were resolved as OpenAI-compatible stay on `/v1/chat/completions`.
- Add hermetic Anthropic Messages switch coverage for both Hermes and OpenClaw: the E2E scripts can register a compatible Anthropic mock provider, verify `/v1/messages` through `inference.local`, then exercise the agent path after the switch.

## Why

#4809 reported a `403 connection not allowed by policy` while the agent was calling `https://inference.local`, so the right fix is not to open direct sandbox egress to the upstream inference host. #4402 fixed fresh Hermes onboarding by allowing managed `/v1/messages` and baking `api_mode: anthropic_messages`. This PR covers the remaining runtime-switch path so both Hermes and OpenClaw keep using OpenShell-managed inference correctly after `inference set`.

## References

- Refs #4809
- Related #4230
- Builds on #4402

## Validation

- `npx vitest run src/lib/actions/inference-set.test.ts`
- `npx vitest run src/lib/actions/inference-set.test.ts src/lib/inference/config.test.ts test/generate-hermes-config.test.ts test/generate-openclaw-config.test.ts` (initial combined run hit two existing 5s per-test timeouts in `test/generate-openclaw-config.test.ts`; rerun below passed with a larger timeout)
- `npx vitest run test/generate-openclaw-config.test.ts --testTimeout 20000`
- `npx vitest run test/validate-e2e-coverage.test.ts`
- `shellcheck test/e2e/test-hermes-inference-switch.sh test/e2e/test-openclaw-inference-switch.sh test/e2e/lib/anthropic-switch-provider.sh test/e2e/lib/inference-switch-retry.sh`
- `bash -n test/e2e/test-hermes-inference-switch.sh test/e2e/test-openclaw-inference-switch.sh test/e2e/lib/anthropic-switch-provider.sh test/e2e/lib/inference-switch-retry.sh`
- `npx biome check src/lib/actions/inference-set.ts src/lib/actions/inference-set.test.ts`
- `npm run build:cli`
- `npm run validate:configs`
- `git diff --check`
- PR checks green on head `e21952d57e8ef23caa266d6862e7367ec3bd3814`, including commit-lint and DCO.
- Targeted E2E run `27014755537` passed both new agent-path proofs:
  - `hermes-anthropic-inference-switch-e2e / run`
  - `openclaw-anthropic-inference-switch-e2e / run`

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added support for switching between OpenAI and Anthropic inference API modes in sandbox configurations.
* **Tests**
  * Introduced nightly E2E test jobs for validating Anthropic inference switching across agents.
  * Expanded test coverage for inference API configuration validation and provider switching scenarios.
  * Added mock Anthropic provider support for local E2E testing.
* **Chores**
  * Updated CI/CD workflow to include new inference-switch E2E test jobs and orchestration.

---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
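
The set-or-clear behaviour for `api_mode` described in the summary above can be sketched as a pure function over the model section of a sandbox config. The `ModelSection` shape and the `applyApiMode` name are hypothetical, and the real `inference set` code path, including the Bedrock Runtime exception, is not reproduced here.

```typescript
// Hypothetical shape for the model section of a sandbox config.
interface ModelSection {
  api_mode?: string;
  [key: string]: unknown;
}

// Returns a copy with api_mode set for Anthropic/Responses routes, or removed
// when the target is OpenAI-style chat completions (apiMode === undefined).
function applyApiMode(model: ModelSection, apiMode: string | undefined): ModelSection {
  const next: ModelSection = { ...model };
  if (apiMode === undefined) {
    delete next.api_mode; // clear any stale value left by a previous switch
  } else {
    next.api_mode = apiMode;
  }
  return next;
}
```

Returning a copy instead of mutating the input makes it easy to compare the config before and after a switch in a test.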

fix(hermes): route Anthropic messages through managed inference

97aeeda

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

wscurran added the `area: inference` (Inference routing, serving, model selection, or outputs), `bug-fix` (PR fixes a bug or regression), and `feature` (PR adds or expands user-visible functionality) labels and removed the `enhancement: inference` label Jun 3, 2026

cv added the `v0.0.58` (Release target) label Jun 3, 2026

cv self-assigned this Jun 3, 2026

cv added the `v0.0.59` (Release target) label and removed the `v0.0.58` (Release target) label Jun 4, 2026

merge: main into hermes anthropic messages policy

a934490

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-requested a review June 4, 2026 19:30

fix(hermes): address inference routing review feedback

a0d7dd1

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv approved these changes Jun 4, 2026

cv merged commit c5ce66c into main Jun 4, 2026
31 checks passed

cv deleted the fix/4230_hermes_anthropic_messages_policy branch June 4, 2026 19:59

miyoungc mentioned this pull request Jun 5, 2026

docs: refresh 0.0.59 release notes #4790

Merged

ericksoa mentioned this pull request Jun 5, 2026

fix(inference): sync anthropic runtime routes #4847

Merged

wscurran removed the `bug` (Something fails against expected or documented behavior) and `feature` (PR adds or expands user-visible functionality) labels Jun 8, 2026

coderabbitai Bot mentioned this pull request Jun 9, 2026

fix(hermes): surface upstream provider and clarify onboard menu #5010

Merged

12 tasks

cv merged 3 commits into main from fix/4230_hermes_anthropic_messages_policy
