fix(hermes): route Anthropic messages through managed inference #4402
Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
## Walkthrough

Adds Anthropic Messages support to Hermes: accepts …

Changes: Anthropic Messages API Integration

### Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User/Client
    participant Hermes as Hermes runtime
    participant ConfigGen as config generator
    participant InferenceGW as inference.local (gateway)
    User->>Hermes: send message
    Hermes->>ConfigGen: read model config (NEMOCLAW_INFERENCE_API / api_mode)
    Hermes->>InferenceGW: POST /v1/messages or POST /v1/chat/completions (per api_mode)
    InferenceGW->>Hermes: model response
    Hermes->>User: deliver response
```
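As a rough illustration of the routing step in the diagram, the TypeScript sketch below picks the gateway URL from a Hermes model config. It assumes that `api_mode: anthropic_messages` selects `/v1/messages` and that an absent `api_mode` means OpenAI-style chat completions. `HermesModelConfig` and `gatewayEndpointFor` are hypothetical names, not Hermes' actual code.

```typescript
// Hypothetical sketch: choose the managed-gateway endpoint for a Hermes model config.
// Only "anthropic_messages" is an api_mode value named in this PR; any other
// mode (for example an OpenAI Responses route) is rejected rather than guessed.
const GATEWAY_BASE = "https://inference.local";

interface HermesModelConfig {
  api_mode?: string;
}

function gatewayEndpointFor(model: HermesModelConfig): string {
  if (model.api_mode === undefined) {
    // No api_mode: OpenAI-style chat completions.
    return `${GATEWAY_BASE}/v1/chat/completions`;
  }
  if (model.api_mode === "anthropic_messages") {
    // Anthropic Messages API route on the managed gateway.
    return `${GATEWAY_BASE}/v1/messages`;
  }
  throw new Error(`api_mode not covered by this sketch: ${model.api_mode}`);
}

console.log(gatewayEndpointFor({ api_mode: "anthropic_messages" })); // https://inference.local/v1/messages
console.log(gatewayEndpointFor({})); // https://inference.local/v1/chat/completions
```

Throwing on unknown modes keeps the sketch from silently sending a request in one API format to the other format's path.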
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
## Summary
- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the
current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR
labels, and shared `$$nemoclaw` command placeholders.
## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside the sandbox after a sandbox-only `tsx` IPC permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint,
docs-to-skills verification, gitleaks, commitlint, and skills YAML
tests.
## Source Summary
- #3679, #4437, #4681, #4766, #4772, #4775, #4786 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`,
`docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27
compatibility, runtime path pinning, plugin registry recovery, live
gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`,
`docs/inference/inference-options.mdx`,
`docs/inference/use-local-inference.mdx`,
`docs/inference/switch-inference-providers.mdx`: Document the release
inference changes covering Local NIM waits, Hermes Anthropic routing,
Nemotron 3 Ultra, the current Ollama starter fallback, and Spark
managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`,
`docs/security/credential-storage.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`: Capture permission healing,
gateway-stored credential reuse, cross-sandbox messaging credential
conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`:
Regenerate the user skill references from the updated source docs.
- Follow-up maintenance ->
`.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`,
`.coderabbit.yaml`: Add release-prep area labels for docs and skills
PRs, and teach docs review guidance that `$$nemoclaw` is the correct
shared command placeholder for examples that work across agent aliases.
Note: the `documentation` label was not present in the repository, so
this PR is labeled with `v0.0.59` only.
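The `$$nemoclaw` placeholder lets one command example serve every agent alias. A minimal sketch of how a docs tool might expand it, assuming the placeholder is substituted literally and that `nemoclaw` and `nemohermes` are the aliases to render; `expandCommandPlaceholder` and `AGENT_ALIASES` are hypothetical and are not part of `scripts/docs-to-skills.py`.

```typescript
// Hypothetical helper: the alias list and function name are illustrative only.
const AGENT_ALIASES = ["nemoclaw", "nemohermes"] as const;
type AgentAlias = (typeof AGENT_ALIASES)[number];

const PLACEHOLDER = "$$nemoclaw";

// split/join replaces every occurrence without building a RegExp,
// where each "$" in the placeholder would otherwise need escaping.
function expandCommandPlaceholder(example: string, alias: AgentAlias): string {
  return example.split(PLACEHOLDER).join(alias);
}

for (const alias of AGENT_ALIASES) {
  console.log(expandCommandPlaceholder("$$nemoclaw inference set", alias));
}
// nemoclaw inference set
// nemohermes inference set
```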
## Summary by CodeRabbit

* **Documentation**
  * Updated default model for local Ollama inference setup to qwen3.5:9b
  * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
  * Clarified credential storage and reuse behavior for post-deployment (day-two) operations
  * Added v0.0.59 release notes covering OpenClaw compatibility, inference options, Hermes messaging sync, and troubleshooting
  * Clarified CLI selection guidance and updated OpenClaw version example in status output
  * Revised release-prep instructions and docs review guidance for CLI alias usage
## Summary

- Resolve the managed inference API family during `nemoclaw inference set` / `nemohermes inference set` before patching in-sandbox config.
- Set Hermes `model.api_mode` for Anthropic Messages and OpenAI Responses routes, and clear stale `api_mode` when switching back to OpenAI-style chat completions.
- Preserve the Bedrock Runtime adapter exception: same-provider compatible-Anthropic routes that were resolved as OpenAI-compatible stay on `/v1/chat/completions`.
- Add hermetic Anthropic Messages switch coverage for both Hermes and OpenClaw: the E2E scripts can register a compatible Anthropic mock provider, verify `/v1/messages` through `inference.local`, then exercise the agent path after the switch.

## Why

Issue #4809 reported a `403 connection not allowed by policy` while the agent was calling `https://inference.local`. Because the failing call targeted the managed endpoint, the right fix is to make the managed route work, not to open direct sandbox egress to the upstream inference host. #4402 fixed fresh Hermes onboarding by allowing managed `/v1/messages` and baking `api_mode: anthropic_messages`. This PR covers the remaining runtime-switch path so both Hermes and OpenClaw keep using OpenShell-managed inference correctly after `inference set`.
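The `api_mode` patching described in the summary can be sketched as a pure function over an already-resolved API family. This is a minimal illustration, not the code in `src/lib/actions/inference-set.ts`: `ResolvedApiFamily`, `HermesModelSection`, and `applyApiFamily` are hypothetical names, and the OpenAI Responses case is left out because this PR does not name its `api_mode` string. Because the sketch trusts the resolver's answer rather than the provider name, a compatible-Anthropic route that was resolved as OpenAI-compatible (the Bedrock Runtime adapter case) stays on chat completions.

```typescript
// Hypothetical sketch of patching Hermes' model section after `inference set`.
// Responses routes are omitted: their api_mode value is not given in the PR.
type ResolvedApiFamily = "anthropic-messages" | "openai-chat-completions";

interface HermesModelSection {
  provider: string;
  api_mode?: string;
}

function applyApiFamily(model: HermesModelSection, family: ResolvedApiFamily): HermesModelSection {
  // Work on a copy so the caller's config object is left untouched.
  const next: HermesModelSection = { ...model };
  if (family === "anthropic-messages") {
    next.api_mode = "anthropic_messages";
  } else {
    // Chat completions is the default route, so drop any stale api_mode.
    delete next.api_mode;
  }
  return next;
}

// "compatible-anthropic" is an illustrative provider name. Switching back
// to chat completions clears the api_mode left over from a previous switch.
const before: HermesModelSection = { provider: "compatible-anthropic", api_mode: "anthropic_messages" };
console.log(applyApiFamily(before, "openai-chat-completions"));
```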
## References

- Refs #4809
- Related #4230
- Builds on #4402

## Validation

- `npx vitest run src/lib/actions/inference-set.test.ts`
- `npx vitest run src/lib/actions/inference-set.test.ts src/lib/inference/config.test.ts test/generate-hermes-config.test.ts test/generate-openclaw-config.test.ts` (the initial combined run hit two existing 5s per-test timeouts in `test/generate-openclaw-config.test.ts`; the rerun below passed with a larger timeout)
- `npx vitest run test/generate-openclaw-config.test.ts --testTimeout 20000`
- `npx vitest run test/validate-e2e-coverage.test.ts`
- `shellcheck test/e2e/test-hermes-inference-switch.sh test/e2e/test-openclaw-inference-switch.sh test/e2e/lib/anthropic-switch-provider.sh test/e2e/lib/inference-switch-retry.sh`
- `bash -n test/e2e/test-hermes-inference-switch.sh test/e2e/test-openclaw-inference-switch.sh test/e2e/lib/anthropic-switch-provider.sh test/e2e/lib/inference-switch-retry.sh`
- `npx biome check src/lib/actions/inference-set.ts src/lib/actions/inference-set.test.ts`
- `npm run build:cli`
- `npm run validate:configs`
- `git diff --check`
- PR checks green on head `e21952d57e8ef23caa266d6862e7367ec3bd3814`, including commit-lint and DCO.
- Targeted E2E run `27014755537` passed both new agent-path proofs:
  - `hermes-anthropic-inference-switch-e2e / run`
  - `openclaw-anthropic-inference-switch-e2e / run`

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added support for switching between OpenAI and Anthropic inference API modes in sandbox configurations.
* **Tests**
  * Introduced nightly E2E test jobs for validating Anthropic inference switching across agents.
  * Expanded test coverage for inference API configuration validation and provider switching scenarios.
  * Added mock Anthropic provider support for local E2E testing.
* **Chores**
  * Updated CI/CD workflow to include new inference-switch E2E test jobs and orchestration.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
## Summary

- … `/v1/messages` for Anthropic Messages API providers
- … `NEMOCLAW_INFERENCE_API` into Hermes sandbox config generation and map Anthropic/OpenAI Responses API modes for Hermes
- … `anthropic` extra in the base image and add regression tests for the policy/config/provisioning path

Fixes #4230
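One way to picture the policy side of this fix: the sandbox only lets agents reach `inference.local` on known inference routes, and the PR text indicates that an Anthropic-style `POST /v1/messages` was refused with a policy 403 until that managed path was allowed. The sketch below models this as a host/method/path allowlist. It is a hypothetical illustration; the `RouteRule` shape and `isManagedRouteAllowed` are not OpenShell's actual policy schema or API.

```typescript
// Hypothetical allowlist model; not OpenShell's real policy format.
interface RouteRule {
  host: string;
  method: string;
  path: string;
}

const MANAGED_INFERENCE_ROUTES: readonly RouteRule[] = [
  { host: "inference.local", method: "POST", path: "/v1/chat/completions" },
  // Anthropic Messages API providers need this route on the managed gateway.
  { host: "inference.local", method: "POST", path: "/v1/messages" },
];

function isManagedRouteAllowed(host: string, method: string, path: string): boolean {
  return MANAGED_INFERENCE_ROUTES.some(
    (rule) => rule.host === host && rule.method === method.toUpperCase() && rule.path === path,
  );
}

// Managed route: allowed. Direct upstream egress: still denied.
console.log(isManagedRouteAllowed("inference.local", "POST", "/v1/messages")); // true
console.log(isManagedRouteAllowed("api.anthropic.com", "POST", "/v1/messages")); // false
```

Keeping the new rule on the managed host, rather than adding the upstream provider host, mirrors the approach of routing through managed inference instead of opening direct egress.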
## Validation

- `npm run build:cli`
- `./node_modules/.bin/vitest run src/lib/onboard/dockerfile-patch.test.ts test/generate-hermes-config.test.ts test/sandbox-provisioning.test.ts test/validate-blueprint.test.ts test/validate-config-schemas.test.ts`
- `npm run validate:configs`
- `git diff --check origin/main`

## E2E evidence

- … `403 connection not allowed by policy` for Anthropic-compatible routing.
- … `api_mode: anthropic_messages`, imported `anthropic==0.87.0`, and the fake Anthropic server observed `POST /v1/messages` from Hermes without the policy 403.
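The E2E evidence relies on a fake Anthropic server recording that Hermes sent `POST /v1/messages`. A minimal in-memory sketch of that idea, assuming a simple method/path/body handler interface: `FakeAnthropicProvider` is hypothetical and is not the PR's actual test fixture. The success body follows the general shape of an Anthropic Messages API response, and unknown routes get an Anthropic-style `not_found_error`.

```typescript
// Hypothetical in-memory fake provider; not the PR's real E2E fixture.
interface ObservedRequest {
  method: string;
  path: string;
}

interface FakeResponse {
  status: number;
  json: Record<string, unknown>;
}

class FakeAnthropicProvider {
  // Every request is recorded so a test can assert on what the agent sent.
  readonly observed: ObservedRequest[] = [];

  handle(method: string, path: string, body: { model: string }): FakeResponse {
    this.observed.push({ method, path });
    if (method !== "POST" || path !== "/v1/messages") {
      return {
        status: 404,
        json: { type: "error", error: { type: "not_found_error", message: `no route for ${method} ${path}` } },
      };
    }
    // Minimal Messages-style reply with a fixed text block.
    return {
      status: 200,
      json: {
        id: "msg_fake_1",
        type: "message",
        role: "assistant",
        model: body.model,
        content: [{ type: "text", text: "pong" }],
        stop_reason: "end_turn",
        stop_sequence: null,
        usage: { input_tokens: 1, output_tokens: 1 },
      },
    };
  }

  sawMessagesCall(): boolean {
    return this.observed.some((r) => r.method === "POST" && r.path === "/v1/messages");
  }
}

const fake = new FakeAnthropicProvider();
fake.handle("POST", "/v1/messages", { model: "test-model" });
console.log(fake.sawMessagesCall()); // true
```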