You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue is the source of truth for the model performance and capability audit across the models and agent surfaces NemoClaw supports.
The goal is not just to confirm that each model can answer a one-shot prompt. The goal is to verify that each supported model works well as an agent model in the NemoClaw/OpenShell environment: tool calls, shell execution, multi-turn tool-result continuation, sub-agent delegation where applicable, and the provider-specific response shapes that our agents consume.
PR #3046 fixed a concrete Kimi K2.6/OpenClaw incompatibility where moonshotai/kimi-k2.6 could emit a combined shell command such as hostname; date; uptime as one exec call. OpenClaw needs separate tool-call boundaries for persistence, replay, and tool-result correlation. The individual Kimi issue is closed as fixed in #2620.
That fix exposed the broader product requirement: every model exposed through onboarding should be validated as an agent model, not merely as a chat model. Some models need model-aware or provider-aware affordances to work correctly in shell-agent loops. Those affordances must be discovered, documented, tested, and either captured in the model-specific setup registry proposed by #3120 or classified as provider-class transport policy.
Initial audit artifact:
model-affordance-audit.md generated from main at f5b8144d577ccd680875291d33eaabb656509d5a
Agent surfaces in scope
Audit the model behavior against the agent surfaces NemoClaw currently supports:
OpenClaw primary main agent through the default NemoClaw sandbox path
OpenClaw CLI prompt path, including shell/tool execution trajectories
OpenClaw browser/gateway path when it changes request/response behavior from the CLI path
OpenClaw sub-agent delegation through sessions_spawn / agents.list
NemoHermes / Hermes sandbox path and OpenAI-compatible API surface
Task-specific auxiliary models documented by NemoClaw examples, such as the Omni vision sub-agent pattern, when credentials and runnable test coverage are available
Messaging integrations are not separate model-capability targets unless the message channel changes model routing or response handling. The core model audit should run at the agent/runtime boundary first.
Supported model inventory to audit
NVIDIA Endpoints
nvidia/nemotron-3-super-120b-a12b
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning
z-ai/glm-5.1
minimaxai/minimax-m2.7
moonshotai/kimi-k2.6
openai/gpt-oss-120b
deepseek-ai/deepseek-v4-pro
OpenAI
gpt-5.4
gpt-5.4-mini
gpt-5.4-nano
gpt-5.4-pro-2026-03-05
Anthropic
claude-sonnet-4-6
claude-haiku-4-5
claude-opus-4-6
Gemini
gemini-3.1-pro-preview
gemini-3.1-flash-lite-preview
gemini-3-flash-preview
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
Local and experimental providers
Local Ollama default path, including qwen2.5:7b
Local Ollama default path, including nemotron-3-nano:30b when hardware permits
Local Ollama arbitrary installed model path, gated by declared tools capability
Local vLLM managed DGX Spark/Station profile: Qwen/Qwen3.6-27B-FP8
Local vLLM managed Linux NVIDIA GPU profile: nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8
Local NVIDIA NIM experimental path
Other OpenAI-compatible endpoint path
Other Anthropic-compatible endpoint path
Audit results
Completed rows:
OpenClaw / Anthropic / claude-sonnet-4-6 — pass. Validated on 2026-05-07 UTC on maind98dd8c97d1ddddfd7b6d82962934493dd6e139f with local sandbox anth-sonnet-openclaw-audit-0507, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider key anthropic, primary model anthropic/claude-sonnet-4-6, and API anthropic-messages via https://inference.local. Workflow: ANTHROPIC_API_KEY=<redacted> NEMOCLAW_PROVIDER=anthropic NEMOCLAW_MODEL=claude-sonnet-4-6 ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name anth-sonnet-openclaw-audit-0507 --agent openclaw --fresh --recreate-sandbox, then openshell sandbox exec -n anth-sonnet-openclaw-audit-0507 --timeout 900 -- /usr/local/bin/nemoclaw-start openclaw agent --agent main --json --thinking off --session-id ... -m <standard and multi-turn prompts>. Evidence: /sandbox/.openclaw/agents/main/sessions/anth-sonnet-openclaw-oneshot-1778118974.trajectory.jsonl, .jsonl, /sandbox/.openclaw/agents/main/sessions/anth-sonnet-openclaw-multiturn-1778119436.trajectory.jsonl, and .jsonl. One-shot recorded finalStatus: success, timedOut: false, no prompt error, three structured exec calls (hostname, date, uptime), correlated Anthropic tool_use IDs to toolResult entries, and a final assistant summary. Multi-turn reused the same OpenClaw session: turn 1 returned HOSTNAME=anth-sonnet-openclaw-audit-0507; turn 2 ran echo "seen:anth-sonnet-openclaw-audit-0507" without re-running hostname and summarized. Latency: 43.899s model duration one-shot, 38.971s turn 1, 47.677s turn 2. Required affordance: none; registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.
Hermes / Anthropic / claude-sonnet-4-6 — blocked. Validated on 2026-05-07 UTC on maind98dd8c97d1ddddfd7b6d82962934493dd6e139f with sandbox anth-sonnet-hermes-audit-0507, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), and Hermes API server 127.0.0.1:18642. Workflow: ANTHROPIC_API_KEY=<redacted> NEMOCLAW_PROVIDER=anthropic NEMOCLAW_MODEL=claude-sonnet-4-6 ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name anth-sonnet-hermes-audit-0507 --agent hermes --fresh --recreate-sandbox, then Hermes' own API POST http://127.0.0.1:18642/v1/chat/completions with model: hermes-agent and the standard shell-loop prompt. Evidence: /sandbox/.hermes/config.yaml, /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json, /sandbox/.hermes/sessions/request_dump_api-9816e26b83c423bc_20260507_023806_692889.json, /sandbox/.hermes/logs/agent.log, and /sandbox/.hermes/logs/errors.log. NemoClaw generated model.provider: custom, model.base_url: "https://inference.local", and no api_mode; Hermes therefore called https://inference.local/chat/completions. The API returned HTTP 200 in about 2s with assistant text Error code: 403 - {'error': 'connection not allowed by policy'}. Tool-call count: 0; no final model summary; multi-turn not attempted because one-shot fails before tool use. Required affordance: Hermes provider-config/transport behavior for Anthropic Messages, not model-specific setup. Registry decision: refactor: add agent-scoped model setup registry #3121 v1 cannot express this runtime API-mode/provider transport fix cleanly; no manifest.
OpenClaw / Anthropic / claude-haiku-4-5 — pass. Validated on 2026-05-07 UTC on maind98dd8c97d1ddddfd7b6d82962934493dd6e139f with sandbox anth-haiku-openclaw-audit-0507, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider key anthropic, primary model anthropic/claude-haiku-4-5, and API anthropic-messages via https://inference.local. Workflow: ANTHROPIC_API_KEY=<redacted> NEMOCLAW_PROVIDER=anthropic NEMOCLAW_MODEL=claude-haiku-4-5 ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name anth-haiku-openclaw-audit-0507 --agent openclaw --fresh --recreate-sandbox, then nemoclaw-start openclaw agent --agent main --json --thinking off --session-id ... -m <standard and multi-turn prompts>. Evidence: /sandbox/.openclaw/agents/main/sessions/anth-haiku-openclaw-oneshot-1778120014.trajectory.jsonl, .jsonl, /sandbox/.openclaw/agents/main/sessions/anth-haiku-openclaw-multiturn-1778120085.trajectory.jsonl, and .jsonl. One-shot recorded three structured exec calls and a final assistant summary. Multi-turn turn 1 returned HOSTNAME=anth-haiku-openclaw-audit-0507; turn 2 ran echo "seen:anth-haiku-openclaw-audit-0507", did not re-run hostname, and summarized. Latency: 39.982s model duration one-shot, 43.531s turn 1, 39.409s turn 2. Tool/result correlation used native Anthropic tool_use IDs mapped to OpenClaw toolResult entries; no prompt error or timeout observed. Required affordance: none; registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.
Hermes / Anthropic / claude-haiku-4-5 — blocked. Validated on 2026-05-07 UTC on maind98dd8c97d1ddddfd7b6d82962934493dd6e139f with sandbox anth-haiku-hermes-audit-0507, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), and Hermes API server 127.0.0.1:18642. Workflow: same Hermes onboarding/API path as Sonnet with NEMOCLAW_MODEL=claude-haiku-4-5. Evidence: /sandbox/.hermes/config.yaml, /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json, /sandbox/.hermes/sessions/request_dump_api-9816e26b83c423bc_20260507_024058_994265.json, /sandbox/.hermes/logs/agent.log, and /sandbox/.hermes/logs/errors.log. Generated config was model.provider: custom, model.base_url: "https://inference.local", no api_mode; request dump showed upstream URL https://inference.local/chat/completions. Hermes API returned HTTP 200 in about 1s with assistant text Error code: 403 - {'error': 'connection not allowed by policy'}. Tool-call count: 0; no final model summary; multi-turn not attempted. Required affordance: Hermes Anthropic Messages provider-config/transport behavior, not model-specific setup. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.
OpenClaw / Anthropic / claude-opus-4-6 — pass. Validated on 2026-05-07 UTC on maind98dd8c97d1ddddfd7b6d82962934493dd6e139f with sandbox anth-opus-openclaw-audit-0507, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider key anthropic, primary model anthropic/claude-opus-4-6, and API anthropic-messages via https://inference.local. Workflow: same OpenClaw onboarding/agent path as Sonnet with NEMOCLAW_MODEL=claude-opus-4-6; a transient OpenShell tls handshake eof during the first create was cleared by restarting the intended nemoclaw gateway and resuming onboarding. Evidence: /sandbox/.openclaw/agents/main/sessions/5795ee4f-ec16-4c6c-9c12-dcf0c0988096.trajectory.jsonl, .jsonl, /sandbox/.openclaw/agents/main/sessions/d00fc1b7-1f89-416e-bbbf-daafd363db77.trajectory.jsonl, and .jsonl; session keys were anth-opus-openclaw-oneshot-1778121149 and anth-opus-openclaw-multiturn-1778121149. One-shot recorded three structured exec calls and a final assistant summary; multi-turn turn 1 returned HOSTNAME=anth-opus-openclaw-audit-0507, and turn 2 ran echo "seen:anth-opus-openclaw-audit-0507" without re-running hostname. Tool-result correlation was correct (toolu_01QnTcTFxoYqgJUc6ZNunTMf -> toolResult, then toolu_01RcMUVDog12AhCd6BVkvJN3 -> toolResult). Latency: 36.108s model duration one-shot, 9.253s turn 1, 6.322s turn 2. Required affordance: none; registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.
Hermes / Anthropic / claude-opus-4-6 — blocked. Validated on 2026-05-07 UTC on maind98dd8c97d1ddddfd7b6d82962934493dd6e139f with sandbox anth-opus-hermes-audit-0507, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), and Hermes API server 127.0.0.1:18642. Workflow: same Hermes onboarding/API path as Sonnet with NEMOCLAW_MODEL=claude-opus-4-6. Evidence: /sandbox/.hermes/config.yaml, /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json, /sandbox/.hermes/sessions/request_dump_api-9816e26b83c423bc_20260507_024350_714854.json, /sandbox/.hermes/logs/agent.log, and /sandbox/.hermes/logs/errors.log. Generated config was model.provider: custom, model.base_url: "https://inference.local", no api_mode; request dump showed upstream URL https://inference.local/chat/completions. Hermes API returned HTTP 200 in about 2s with assistant text Error code: 403 - {'error': 'connection not allowed by policy'}. Tool-call count: 0; no final model summary; multi-turn not attempted. Required affordance: Hermes Anthropic Messages provider-config/transport behavior, not model-specific setup. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.
Extended-thinking docs matter for future Claude 4 agent work: tool use with thinking requires preserving returned thinking blocks, and Sonnet 4.6 / Opus 4.6 have interleaved-thinking behavior under Anthropic's current docs. This audit ran OpenClaw with --thinking off, so no new thinking-state preservation affordance was required for these pass rows. See https://platform.claude.com/docs/en/build-with-claude/extended-thinking.
Static NemoClaw inspection matched the runtime results: src/lib/onboard-providers.ts maps Anthropic sandbox config to provider key anthropic, primary model anthropic/<model>, base URL https://inference.local, and inferenceApi: anthropic-messages; OpenClaw consumes that route correctly. agents/hermes/generate-config.ts / agents/hermes/config/hermes-config.ts currently emit Hermes provider: custom plus base_url: https://inference.local with no Anthropic api_mode, so Hermes cannot express the native Anthropic Messages route today. This is provider-adapter/config-path follow-up work, not a per-model registry entry.
OpenClaw / NVIDIA Endpoints / minimaxai/minimax-m2.7 — pass. Validated on 2026-05-07 UTC on mainfa99a37065664f2a4c2af16a0bfc3bb4fac2d605 with local sandbox minimax-openclaw-audit-0507, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider/model inference/minimaxai/minimax-m2.7, and API openai-completions via https://inference.local/v1. Workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=minimaxai/minimax-m2.7 NEMOCLAW_PREFERRED_API=openai-completions ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name minimax-openclaw-audit-0507 --agent openclaw --fresh --recreate-sandbox, then openshell sandbox exec -n minimax-openclaw-audit-0507 --timeout 900 -- /usr/local/bin/nemoclaw-start openclaw agent --agent main --json --thinking off --session-id ... -m <standard and multi-turn prompts>. Evidence: /sandbox/.openclaw/agents/main/sessions/minimax-openclaw-oneshot-1778116787.trajectory.jsonl, /sandbox/.openclaw/agents/main/sessions/minimax-openclaw-oneshot-1778116787.jsonl, /sandbox/.openclaw/agents/main/sessions/minimax-openclaw-multiturn-1778116939.trajectory.jsonl, and /sandbox/.openclaw/agents/main/sessions/minimax-openclaw-multiturn-1778116939.jsonl. One-shot recorded finalStatus: success, timedOut: false, no prompt error, three structured exec tool calls (hostname, date, uptime), and a final assistant summary. Multi-turn reused the same OpenClaw session: turn 1 ran one exec call for hostname and returned HOSTNAME=minimax-openclaw-audit-0507; turn 2 ran one new exec call echo "seen:minimax-openclaw-audit-0507" > /tmp/seen_hostname.txt, did not re-run hostname, wrote the expected value, and summarized successfully. Latency was inside timeout: 122.074s one-shot, 81.676s turn 1, and 49.817s turn 2. Tool-call shape was structured OpenAI-compatible tool calls, not raw tool text; MiniMax thinking was present as OpenClaw thinking blocks with thinkingSignature: reasoning_content, and final assistant text was non-empty after tool results. Operational note: the CLI printed a gateway websocket 1006 close and used OpenClaw's embedded runner, but the model/provider run completed successfully and persisted the normal trajectory. Required affordance: none beyond the generic OpenClaw --thinking off/thinkingDefault: off path already used for these sandbox smoke runs; no MiniMax-specific request mutation, response parser, shell rewriter, or plugin is justified. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest because there is no concrete MiniMax-specific setup behavior to express; if a future MiniMax issue required request-body mutation such as explicit reasoning_split, registry v1 would not express that class cleanly. External docs checked: NVIDIA MiniMax M2.7 model card, NVIDIA MiniMax M2.7 infer reference, MiniMax Tool Use & Interleaved Thinking guide, MiniMax OpenAI-compatible chat docs, and MiniMax-M2 GitHub README.
Hermes / NVIDIA Endpoints / minimaxai/minimax-m2.7 — pass. Validated on 2026-05-07 UTC on mainfa99a37065664f2a4c2af16a0bfc3bb4fac2d605 with local sandbox minimax-hermes-audit-0507, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), Hermes config provider: custom, base_url: https://inference.local/v1, and model minimaxai/minimax-m2.7. Workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=minimaxai/minimax-m2.7 NEMOCLAW_PREFERRED_API=openai-completions ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name minimax-hermes-audit-0507 --agent hermes --fresh --recreate-sandbox, then Hermes' own OpenAI-compatible API inside the sandbox: POST http://127.0.0.1:18642/v1/chat/completions with model: hermes-agent for the standard shell-loop prompt and POST http://127.0.0.1:18642/v1/responses with previous_response_id for server-side multi-turn continuation. Evidence: /sandbox/.hermes/config.yaml, /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json, /sandbox/.hermes/sessions/session_efda2305-58cd-487a-aecc-aba1b0a646b3.json, /sandbox/.hermes/logs/agent.log, and /sandbox/.hermes/logs/errors.log. Chat-completions one-shot returned HTTP 200 in 29.814s, recorded three structured terminal function calls (hostname, date, uptime) with successful tool results, and returned a final assistant summary. Responses multi-turn returned HTTP 200 in 13.774s for turn 1 and 17.686s for turn 2; turn 1 stored one structured terminalhostname call and HOSTNAME=minimax-hermes-audit-0507, while turn 2 chained from resp_a88cc8e84c644fb2a918a3b037d0, made one new terminal call echo "seen:minimax-hermes-audit-0507", did not make a new hostname call in the persisted session, and summarized successfully. Tool-call shape was structured OpenAI-compatible function calling, not raw tool text; MiniMax reasoning was stored in Hermes reasoning_content fields and did not break tool-result continuation. agent.log contained non-blocking context-length autodetect warnings that defaulted the model to 128,000 tokens; errors.log contained only startup warnings about no API-server key/user allowlist. Required affordance: none; no Hermes MiniMax manifest, runtime shim, request mutation, generic parser, or shell rewrite is justified. Registry decision: refactor: add agent-scoped model setup registry #3121 v1 could express declarative Hermes compat if a concrete behavior existed, but this audit found none to record.
OpenClaw / NVIDIA Endpoints / z-ai/glm-5.1 — pass. Validated on 2026-05-07 UTC on main09b66c68384e16e828917b8d7afdbc61893cd4a4 with local sandbox glm-openclaw-audit-0507, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider/model inference/z-ai/glm-5.1, and API openai-completions via https://inference.local/v1. Workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=z-ai/glm-5.1 NEMOCLAW_PREFERRED_API=openai-completions ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name glm-openclaw-audit-0507 --agent openclaw --fresh --recreate-sandbox, then nemoclaw-start openclaw agent --agent main --json --thinking off --session-id ... -m <standard and multi-turn prompts>. Evidence: /sandbox/.openclaw/agents/main/sessions/glm-openclaw-oneshot-1778114960.trajectory.jsonl, /sandbox/.openclaw/agents/main/sessions/glm-openclaw-oneshot-1778114960.jsonl, /sandbox/.openclaw/agents/main/sessions/glm-openclaw-multiturn-1778115140.trajectory.jsonl, and /sandbox/.openclaw/agents/main/sessions/glm-openclaw-multiturn-1778115140.jsonl. One-shot recorded finalStatus: success, timedOut: false, no prompt error, three structured exec tool calls (hostname, date, uptime), and a final assistant summary; no raw tool-call text was persisted as assistant prose. Multi-turn reused the same OpenClaw session: turn 1 ran one exec call for hostname and returned HOSTNAME=glm-openclaw-audit-0507; turn 2 ran one exec call echo 'seen:glm-openclaw-audit-0507', did not re-run hostname, and summarized successfully. Latency was high but inside timeout: about 69s one-shot, about 147s turn 1, and about 104s turn 2. Required affordance: none beyond generic OpenClaw --thinking off/thinkingDefault: off behavior already used for sandbox smoke paths; no GLM-specific request mutation, plugin, shell rewriter, or manifest is justified. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest for GLM because there is no concrete GLM-specific behavior to express. External docs checked: NVIDIA GLM-5.1 model card, NVIDIA GLM-5.1 infer reference, Z.ai GLM-5.1 overview, and Z.ai thinking mode/tool-result guidance.
Hermes / NVIDIA Endpoints / z-ai/glm-5.1 — pass. Validated on 2026-05-07 UTC on main09b66c68384e16e828917b8d7afdbc61893cd4a4 with local sandbox glm-hermes-audit-0507, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), Hermes config provider: custom, base_url: https://inference.local/v1, and model z-ai/glm-5.1. Workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=z-ai/glm-5.1 NEMOCLAW_PREFERRED_API=openai-completions ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name glm-hermes-audit-0507 --agent hermes --fresh --recreate-sandbox, then Hermes own OpenAI-compatible API inside the sandbox, POST http://127.0.0.1:18642/v1/chat/completions with model: hermes-agent for the standard shell-loop prompt and POST http://127.0.0.1:18642/v1/responses with previous_response_id for server-side multi-turn continuation. Evidence: /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json, /sandbox/.hermes/sessions/session_0441870a-3401-4475-a959-0928220483d5.json, /sandbox/.hermes/logs/agent.log, and /sandbox/.hermes/logs/errors.log. Chat-completions one-shot returned HTTP 200 in 88.693816s, recorded three structured terminal function calls (hostname, date, uptime) with successful tool results, and returned a final assistant summary. Responses multi-turn returned HTTP 200 in 82.152558s for turn 1 and 65.737139s for turn 2; turn 1 stored a structured terminalhostname call and function-call output, and turn 2 chained from resp_f44117b7f73e4ac788e3429d2be4, made one new terminal call echo 'seen:glm-hermes-audit-0507', did not make a new hostname call, and summarized successfully. errors.log contained only startup warnings about no API-server key/user allowlist; no model prompt/runtime errors were observed. Required affordance: none; no Hermes GLM manifest, runtime shim, request mutation, generic parser, or shell rewrite is justified. Registry decision: refactor: add agent-scoped model setup registry #3121 v1 could express declarative Hermes compat if a concrete behavior existed, but this audit found none to record.
Hermes / NVIDIA Endpoints / moonshotai/kimi-k2.6 — pass. Validated on PR refactor: add agent-scoped model setup registry #3121 head be8c398bdaba7e1b9d86501515f5ec1ece6a4f3f using a rebuilt local Hermes sandbox (hermes-kimi-audit-0506) and Hermes own OpenAI-compatible API on 127.0.0.1:18642, not a direct inference.local curl. The acceptance prompt produced separate terminal tool calls for hostname, date, and uptime, then a final response. No Hermes Kimi manifest or runtime shim is justified from this evidence. PR evidence: refactor: add agent-scoped model setup registry #3121 (comment)
OpenClaw / NVIDIA Endpoints / deepseek-ai/deepseek-v4-pro — pass-with-affordance. Validated on PR refactor: add agent-scoped model setup registry #3121 head be8c398bdaba7e1b9d86501515f5ec1ece6a4f3f (merged into main by 97ae39d4a16472eabb81d0c2e82e36eb6a62d6e9) with local OpenClaw sandbox deepseek-openclaw-audit-0506, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider/model inference/deepseek-ai/deepseek-v4-pro, and API openai-completions via https://inference.local/v1. Workflow: node bin/nemoclaw.js onboard ... --agent openclaw, then nemoclaw-start openclaw agent --agent main --json --thinking off --session-id deepseek-openclaw-audit-1778090935 -m <standard shell prompt>. Evidence: /sandbox/.openclaw/agents/main/sessions/f7d14bbc-0312-4f5a-b1be-ca17e20a0612.trajectory.jsonl recorded finalStatus: success, timedOut: false, toolMetas for hostname, date, and uptime, and a final assistant summary. Duration: 99,016ms. Required affordance remains the existing OpenClaw startup preload request mutation that injects chat_template_kwargs.thinking = false for exact model deepseek-ai/deepseek-v4-pro; refactor: add agent-scoped model setup registry #3121 registry v1 cannot express request mutation, so no DeepSeek manifest was added.
Hermes / NVIDIA Endpoints / deepseek-ai/deepseek-v4-pro — pass. Validated on main97ae39d4a16472eabb81d0c2e82e36eb6a62d6e9 with local Hermes sandbox deepseek-hermes-audit-0506, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), config provider: custom, base_url: https://inference.local/v1, and model deepseek-ai/deepseek-v4-pro. Workflow: node bin/nemohermes.js onboard ... --agent hermes, then Hermes own API POST http://127.0.0.1:18642/v1/chat/completions with model: hermes-agent and the standard shell prompt. Evidence: /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json recorded three separate terminal tool calls (hostname, date, uptime) with successful tool results and final assistant summary; /sandbox/.hermes/logs/agent.log recorded main provider custom (deepseek-ai/deepseek-v4-pro). API returned 200, finish_reason: stop, usage 48,431 tokens. No Hermes DeepSeek affordance is justified.
OpenClaw / NVIDIA Endpoints / nvidia/nemotron-3-super-120b-a12b - pass-with-affordance. Validated on 2026-05-06 after the local OpenShell DiskPressure condition cleared, on main3477ab7da13c51749eedef1662aa4e998ae0feb2 with local sandbox nemotron-super-openclaw-audit2-0506, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider/model inference/nvidia/nemotron-3-super-120b-a12b, and API openai-completions via https://inference.local/v1. Workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=nvidia/nemotron-3-super-120b-a12b ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name nemotron-super-openclaw-audit2-0506 --agent openclaw --fresh --recreate-sandbox, then nemoclaw-start openclaw agent --agent main --json --thinking off --session-id nemotron-super-openclaw-audit2-1778103869 -m <standard shell prompt>. Evidence: /sandbox/.openclaw/agents/main/sessions/aa8473de-504f-4fe3-aaf5-554dd13042a4.trajectory.jsonl recorded finalStatus: success; the session recorded three separate exec tool calls for hostname, date, and uptime, followed by a final assistant summary. Duration: 44,400ms. Required affordance remains the existing OpenClaw startup preload request mutation that injects chat_template_kwargs.force_nonempty_content = true for Nemotron chat-completions requests; refactor: add agent-scoped model setup registry #3121 registry v1 cannot express request mutation, so no Nemotron manifest should be added yet.
Hermes / NVIDIA Endpoints / nvidia/nemotron-3-super-120b-a12b - pass. Validated on 2026-05-06 on main3477ab7da13c51749eedef1662aa4e998ae0feb2 with local Hermes sandbox nemotron-super-hermes-audit2-0506, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), provider custom, base URL https://inference.local/v1, and model nvidia/nemotron-3-super-120b-a12b. Workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=nvidia/nemotron-3-super-120b-a12b ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name nemotron-super-hermes-audit2-0506 --agent hermes --fresh --recreate-sandbox, then Hermes own OpenAI-compatible API inside the sandbox, POST http://127.0.0.1:18642/v1/chat/completions with model: hermes-agent and the standard shell prompt. Evidence: /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json recorded three actual terminal tool calls for hostname, date, and uptime; the API returned HTTP 200 in 34.680551s with a final assistant summary. No Hermes Nemotron affordance is justified by this row.
OpenClaw / NVIDIA Endpoints / nvidia/nemotron-3-nano-omni-30b-a3b-reasoning - blocked by observed OpenClaw agent behavior, not by local infrastructure. Validated on 2026-05-06 on main3477ab7da13c51749eedef1662aa4e998ae0feb2 with sandbox nemotron-omni-openclaw-audit2-0506, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider/model inference/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning, and API openai-completions. Onboard workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=nvidia/nemotron-3-nano-omni-30b-a3b-reasoning ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name nemotron-omni-openclaw-audit2-0506 --agent openclaw --fresh --recreate-sandbox. First standard-prompt run, session nemotron-omni-openclaw-audit2-1778104704, evidence /sandbox/.openclaw/agents/main/sessions/573b888d-639f-4440-956e-8f0788d176d5.trajectory.jsonl, made malformed or unsupported tool attempts around exec host selection, read /etc/hostname, and stopped without completing date or uptime. Clean retry, session nemotron-omni-openclaw-retry-1778104885, evidence /sandbox/.openclaw/agents/main/sessions/8a4cdc8a-0765-457f-a5ac-2432be5d4820.trajectory.jsonl, made three separate successful exec calls for hostname, date, and uptime with toolSummary.failures: 0 and duration 31,821ms, but the final assistant response was NO_REPLY/thinking-only instead of the requested summary. The existing force_nonempty_content request mutation is still relevant but insufficient to pass the full OpenClaw shell-loop acceptance scenario for this Omni reasoning model.
Hermes / NVIDIA Endpoints / nvidia/nemotron-3-nano-omni-30b-a3b-reasoning - pass. Validated on 2026-05-06 on main3477ab7da13c51749eedef1662aa4e998ae0feb2 with local Hermes sandbox nemotron-omni-hermes-audit2-0506, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), provider custom, base URL https://inference.local/v1, and model nvidia/nemotron-3-nano-omni-30b-a3b-reasoning. Workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=nvidia/nemotron-3-nano-omni-30b-a3b-reasoning ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name nemotron-omni-hermes-audit2-0506 --agent hermes --fresh --recreate-sandbox, then Hermes own OpenAI-compatible API inside the sandbox, POST http://127.0.0.1:18642/v1/chat/completions with model: hermes-agent and the standard shell prompt. Evidence: /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json recorded three actual terminal tool calls for hostname, date, and uptime; /sandbox/.hermes/logs/agent.log recorded main provider custom (nvidia/nemotron-3-nano-omni-30b-a3b-reasoning). The API returned HTTP 200 in 28.026483s with a final assistant summary. No Hermes Nemotron affordance is justified by this row.
OpenClaw / NVIDIA Endpoints / openai/gpt-oss-120b - degraded. Validated on 2026-05-06 on current mainca1d6b84a5c938611be412239718f1e46963d8d0 after refactor: add agent-scoped model setup registry #3121 was already merged, with local sandbox gpt-oss-openclaw-audit-0506, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider route nvidia-prod, model inference/openai/gpt-oss-120b, and API openai-completions via https://inference.local/v1 (NVIDIA Endpoints route to https://integrate.api.nvidia.com/v1). Onboard workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=openai/gpt-oss-120b ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name gpt-oss-openclaw-audit-0506 --agent openclaw --fresh --recreate-sandbox. One-shot workflow: openshell sandbox exec -n gpt-oss-openclaw-audit-0506 --timeout 900 -- /usr/local/bin/nemoclaw-start openclaw agent --agent main --json --thinking off --session-id gpt-oss-openclaw-oneshot-1778106366 -m <standard shell prompt>. Evidence: /sandbox/.openclaw/agents/main/sessions/8cc780fa-23ac-41dd-80c3-146393e39e00.trajectory.jsonl recorded finalStatus: success, timedOut: false, no promptError, finishReason: stop, and final assistant text. Tool behavior: structured OpenAI-compatible tool calls, not raw Harmony/tool text; thinking content was stored as thinking metadata, not assistant prose. Tool count: 4 exec attempts in one-shot (hostname with security: allowlist denied, then successful hostname, date, uptime with security: full), so the target commands completed but with one extra denied retry; duration 31,922ms. Multi-turn workflow used session gpt-oss-openclaw-multiturn-1778106429; evidence /sandbox/.openclaw/agents/main/sessions/7fb1d1b4-967c-45e9-b3bd-3e05a5989292.trajectory.jsonl recorded turn 1 hostname -> HOSTNAME=gpt-oss-openclaw-audit-0506 in 4,767ms, then turn 2 did not re-run hostname, made one shell call echo "seen:gpt-oss-openclaw-audit-0506" > seen.txt, and finalized successfully in 4,104ms. Required affordance: none. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest; there is no model-specific setup effect to express, and the only observed limitation is an OpenClaw tool-argument/security retry rather than request mutation, response normalization, or Harmony parsing.
Hermes / NVIDIA Endpoints / openai/gpt-oss-120b - pass. Validated on 2026-05-06 on current mainca1d6b84a5c938611be412239718f1e46963d8d0 with local sandbox gpt-oss-hermes-audit-0506, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), config provider: custom, base_url: https://inference.local/v1, and model openai/gpt-oss-120b. Onboard workflow: NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=openai/gpt-oss-120b ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name gpt-oss-hermes-audit-0506 --agent hermes --fresh --recreate-sandbox; config evidence: /sandbox/.hermes/config.yaml. API workflow: Hermes own OpenAI-compatible API, POST http://127.0.0.1:18642/v1/chat/completions with model: hermes-agent, not a direct inference.local curl. One-shot evidence: /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json, /sandbox/.hermes/logs/agent.log, /sandbox/.hermes/logs/errors.log, and /tmp/gateway.log; API returned HTTP 200 in 7.222s, finish_reason: stop, three structured terminal tool calls (hostname, date, uptime), successful tool results, and a final assistant summary. Multi-turn evidence: session api-92e4a54a694502a8 in /sandbox/.hermes/sessions/session_api-92e4a54a694502a8.json plus /sandbox/.hermes/state.db; turn 1 returned HOSTNAME=gpt-oss-hermes-audit-0506 in 2.196s, and turn 2 returned HTTP 200 in 4.011s with one structured terminal call echo "seen:gpt-oss-hermes-audit-0506" and no second hostname call. State DB recorded message_count: 8, tool_call_count: 2, model openai/gpt-oss-120b, source api_server; final response summarized the seen: output. Raw Harmony markers (<|...|>) were absent from persisted message/tool fields; reasoning content was stored separately in reasoning/reasoning_content. Latency/operability note: the first Hermes sandbox start was interrupted by local OpenShell gateway TLS/ephemeral-storage recovery and a local-image reimport, but the agent/API run passed after the sandbox reached Ready; this was local infrastructure, not model behavior. Required affordance: none. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest; no Hermes-specific setup, parser, request mutation, or response normalization is justified by this evidence.
Additional GPT-OSS setup audit evidence:
2026-05-06 source/docs audit on mainca1d6b84a5c938611be412239718f1e46963d8d0: openai/gpt-oss-120b is a curated NVIDIA Endpoints model in src/lib/inference-config.ts. The NVIDIA Endpoints provider path resolves to chat completions (openai-completions) through https://inference.local/v1 inside the sandbox, while the gateway route targets NVIDIA Endpoints. src/lib/onboard-inference-probes.ts uses the generic chat-completions probe for this model; scripts/nemoclaw-start.sh only preloads the current Nemotron/DeepSeek request mutations; agents/hermes/start.sh and agents/hermes/generate-config.ts add no GPT-OSS-specific handling; and nemoclaw-blueprint/model-specific-setup/** contains no GPT-OSS manifest.
NemoClaw runtime evidence did not show raw Harmony text from NVIDIA Endpoints for openai/gpt-oss-120b. Both OpenClaw and Hermes received structured OpenAI-compatible tool calls with separate reasoning fields and final assistant answers after tool results. Therefore, a generic Harmony parser, shell command rewriter, or refactor: add agent-scoped model setup registry #3121 v1 registry manifest should not be added for this model/provider based on this audit. If a future provider returns raw Harmony/tool text, that would be response normalization or serving-template/parser policy, not a v1 manifest effect.
OpenClaw / Gemini / gemini-3.1-pro-preview - blocked. Re-run on 2026-05-06/2026-05-07 UTC on current mainf586cc59131ec396cfcaab3b915ad76f001210ca after refactor: add agent-scoped model setup registry #3121 was merged, with OpenShell 0.0.36 and OpenClaw 2026.4.24 (cbcfdf6). Provider path: NemoClaw provider gemini, OpenShell provider gemini-api, Google OpenAI-compatible base URL https://generativelanguage.googleapis.com/v1beta/openai/, sandbox route https://inference.local/v1, model ref inference/gemini-3.1-pro-preview, API openai-completions, supportsStore: false, and OpenClaw config thinkingDefault: off. Onboard workflow: NEMOCLAW_PROVIDER=gemini NEMOCLAW_MODEL=gemini-3.1-pro-preview ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name gemini31-pro-openclaw-audit-0506b --agent openclaw --fresh --recreate-sandbox. Standard one-shot workflow: nemoclaw-start openclaw agent --agent main --json --session-id gemini31-openclaw-oneshot-1778111108 -m <standard shell prompt>. Evidence captured before a later gateway restart removed that sandbox: /sandbox/.openclaw/agents/main/sessions/16c5d6c7-5096-4ab4-80c9-494d73d74c42.jsonl and .trajectory.jsonl. Result: the model emitted structured OpenAI-compatible exec tool calls for hostname, date, and uptime and all three tool results completed, but the continuation/final-answer request after tool results returned 400 status code (no body), leaving no final assistant summary. The persisted OpenClaw session contained no thought_signature or extra_content fields. Multi-turn turn 1 in session gemini31-openclaw-multiturn-1778111208 similarly made one hostname tool call and then failed with 400 status code (no body), so turn 2 could not run. Follow-up retry on 2026-05-07 in sandbox gemini31-pro-openclaw-audit-0506c first saw stale sandbox DNS/proxy causing 503 "inference service unavailable"; after ./bin/nemoclaw.js internal dns setup-proxy nemoclaw gemini31-pro-openclaw-audit-0506c rewired DNS to 10.200.0.1:53 -> 10.42.0.17, the route was reachable again. Retry session gemini31-openclaw-retry-dnsfix-1778114081, log /tmp/gemini31-openclaw-retry-dnsfix-1778114081.log, evidence /sandbox/.openclaw/agents/main/sessions/gemini31-openclaw-retry-dnsfix-1778114081.jsonl and .trajectory.jsonl, emitted three structured exec tool calls (hostname, date, uptime) and completed all tool results, then reproduced 400 status code (no body) with an empty final assistant message; duration 32,162ms; no thought_signature or extra_content persisted. Final classification remains blocked by the observed OpenClaw Gemini 3.1 tool-result continuation/state-preservation failure, not by provider availability. Required affordance/fix class: OpenClaw/provider adapter response-history preservation for tool_calls[].extra_content.google.thought_signature or equivalent Gemini 3 function-call state handling. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest; this is response/history preservation or adapter behavior, not declarative setup.
Hermes / Gemini / gemini-3.1-pro-preview - pass. Validated on 2026-05-07 UTC on mainf586cc59131ec396cfcaab3b915ad76f001210ca with sandbox gemini31-pro-hermes-audit-0506b, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), config provider: custom, base_url: https://inference.local/v1, model gemini-3.1-pro-preview, and API server forwarded at http://127.0.0.1:8642/v1. Config evidence: /sandbox/.hermes/config.yaml; logs: /sandbox/.hermes/logs/agent.log and /sandbox/.hermes/logs/errors.log. API workflow used Hermes' own OpenAI-compatible API, not direct inference.local: POST http://127.0.0.1:8642/v1/chat/completions with model: hermes-agent. One-shot prompt returned HTTP 200 in 22.920s, finish_reason: stop, and session header X-Hermes-Session-Id: api-9816e26b83c423bc; evidence /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json recorded three structured terminal tool calls (hostname, date, uptime) plus final summary. That session persisted tool_calls[].extra_content.google.thought_signature on the first Gemini 3.1 tool call, proving Hermes preserved the Google-specific state that OpenClaw dropped. Multi-turn used the same API with explicit message history because this sandbox had no API server key configured and therefore rejects X-Hermes-Session-Id continuation; turn 1 returned HOSTNAME=gemini31-pro-hermes-audit-0506b in 7.933s, and turn 2 returned HTTP 200 in 12.486s with the same derived session header api-92e4a54a694502a8. Evidence /sandbox/.hermes/sessions/session_api-92e4a54a694502a8.json recorded one terminal call echo 'seen:gemini31-pro-hermes-audit-0506b' > seen.txt && cat seen.txt, no second hostname call, a successful tool result, and a final summary. Required affordance: none for Hermes. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.
OpenClaw / Gemini / gemini-2.5-pro - pass. Validated on 2026-05-07 UTC on mainf586cc59131ec396cfcaab3b915ad76f001210ca with live sandbox gemini25-pro-openclaw-audit-0506c, OpenShell 0.0.36, OpenClaw 2026.4.24 (cbcfdf6), provider route gemini-api, model inference/gemini-2.5-pro, API openai-completions, and thinkingDefault: off. Onboard workflow: NEMOCLAW_PROVIDER=gemini NEMOCLAW_MODEL=gemini-2.5-pro ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name gemini25-pro-openclaw-audit-0506c --agent openclaw --fresh --recreate-sandbox. One-shot workflow: nemoclaw-start openclaw agent --agent main --local --json --session-id gemini25-openclaw-oneshot-c-1778113536 -m <standard shell prompt>. Evidence: /sandbox/.openclaw/agents/main/sessions/gemini25-openclaw-oneshot-c-1778113536.jsonl and .trajectory.jsonl recorded finalStatus: success, timedOut: false, no prompt error, three separate structured exec tool calls (hostname, date, uptime), toolMetas for those commands, and a final summary; duration 26.930s. Multi-turn workflow used session gemini25-openclaw-multiturn-c-1778113583; evidence /sandbox/.openclaw/agents/main/sessions/gemini25-openclaw-multiturn-c-1778113583.jsonl and .trajectory.jsonl recorded turn 1 hostname -> HOSTNAME=gemini25-pro-openclaw-audit-0506c in 25.913s, then turn 2 made one exec call echo 'seen:gemini25-pro-openclaw-audit-0506c' > hostname.txt without re-running hostname, and returned a final summary in 28.515s. No raw function-call text, thought_signature, or extra_content fields were persisted for Gemini 2.5 in OpenClaw. Required affordance: none. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.
Hermes / Gemini / gemini-2.5-pro - pass. Validated on 2026-05-07 UTC on mainf586cc59131ec396cfcaab3b915ad76f001210ca with sandbox gemini25-pro-hermes-audit-0506b, OpenShell 0.0.36, Hermes Agent v0.11.0 (2026.4.23), config provider: custom, base_url: https://inference.local/v1, model gemini-2.5-pro, and API server forwarded at http://127.0.0.1:8642/v1. Config evidence: /sandbox/.hermes/config.yaml; logs: /sandbox/.hermes/logs/agent.log and /sandbox/.hermes/logs/errors.log. API workflow used Hermes' own OpenAI-compatible API, POST http://127.0.0.1:8642/v1/chat/completions with model: hermes-agent. One-shot returned HTTP 200 in 12.366s, finish_reason: stop, and session header api-9816e26b83c423bc; evidence /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json recorded three structured terminal tool calls (hostname, date, uptime) and a final summary. Multi-turn used explicit message history; retry evidence /sandbox/.hermes/sessions/session_api-92e4a54a694502a8.json recorded turn 1 HOSTNAME=gemini25-pro-hermes-audit-0506b, then turn 2 made one terminal call echo 'seen:gemini25-pro-hermes-audit-0506b' > hostname.log without re-running hostname, and returned a final summary. Turn latencies on the passing retry were 5.985s and 8.550s. No thought_signature or extra_content fields were observed for Gemini 2.5 Hermes sessions. Operational note: an earlier 2.5 Hermes multi-turn attempt returned HTTP 200 but wrote a typoed seen:gemini2s... command while the final text claimed the correct hostname; the immediate retry passed, and this was a model/tool-argument accuracy hiccup rather than a continuation or thought-signature failure. Required affordance: none. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.
Additional Gemini setup audit evidence:
2026-05-06/2026-05-07 source/docs audit on mainf586cc59131ec396cfcaab3b915ad76f001210ca: Gemini curated onboarding models live in src/lib/model-prompts.ts, including gemini-3.1-pro-preview and gemini-2.5-pro. src/lib/onboard-providers.ts wires Google Gemini as OpenAI-compatible provider gemini-api with GEMINI_API_KEY and https://generativelanguage.googleapis.com/v1beta/openai/. src/lib/onboard-providers.ts maps Gemini sandbox config to provider key inference, primary model inference/<model>, https://inference.local/v1, openai-completions, and inferenceCompat.supportsStore = false. src/lib/validation.ts skips the Responses API for gemini-api, and src/lib/onboard-inference-probes.ts sends Bearer auth to the OpenAI-compatible endpoint. scripts/nemoclaw-start.sh, scripts/generate-openclaw-config.py, agents/hermes/generate-config.ts, and agents/hermes/start.sh currently add no Gemini-specific thought-signature handling. No Gemini manifests exist under nemoclaw-blueprint/model-specific-setup/**.
Credential/runtime state during this audit: the first pass had no Gemini key, the second pass reached Google validation but was quota-blocked, and this re-run used a funded personal Gemini key that passed onboarding validation. A later OpenClaw 3.1 recreate initially saw 503 "inference service unavailable" from https://inference.local/v1/chat/completions; this was cleared by refreshing the sandbox DNS/proxy with ./bin/nemoclaw.js internal dns setup-proxy, after which the route was reachable and the same post-tool 400 status code (no body) reproduced. Gemini 2.5 remained available during the final OpenClaw and Hermes runs.
External source notes: Google documents Gemini OpenAI compatibility with base URL https://generativelanguage.googleapis.com/v1beta/openai/, Bearer GEMINI_API_KEY, and /chat/completions function calling. Google thought-signature docs state that thinking models in the Gemini 3 and 2.5 series may return thought signatures, that signatures should be passed back exactly in conversation history, and that Gemini 3 models require thought signatures during function calling or a 4xx validation error can result. For OpenAI-compatible chat completions, Google represents signatures under tool_calls[].extra_content.google.thought_signature; Gemini 3 requires the first function call signature to be returned, while Gemini 2.5 signature return is documented as optional for function calls. Docs inspected: https://ai.google.dev/gemini-api/docs/openai, https://ai.google.dev/gemini-api/docs/function-calling, https://ai.google.dev/gemini-api/docs/thought-signatures, and https://ai.google.dev/gemini-api/docs/thinking.
Agent-surface conclusion: Gemini 3.1 Pro is the concrete thought-signature risk. Hermes preserved extra_content.google.thought_signature and passed; OpenClaw did not persist thought_signature/extra_content and failed the post-tool continuation with Google-route 400 status code (no body); a later stale-DNS/proxy 503 recreate was cleared and the same post-tool 400 reproduced. Gemini 2.5 Pro passed both OpenClaw and Hermes without a model-specific affordance. If OpenClaw Gemini 3.1 is fixed later, the fix should be scoped to Google/Gemini OpenAI-compatible tool-call state preservation or agent adapter behavior. refactor: add agent-scoped model setup registry #3121 registry v1 cannot express that class cleanly, so no manifest should be added based on this audit.
Additional Nemotron setup audit evidence:
2026-05-06 architecture audit on main3477ab7da13c51749eedef1662aa4e998ae0feb2: current OpenClaw behavior remains a runtime request mutation in scripts/nemoclaw-start.sh, not a refactor: add agent-scoped model setup registry #3121 v1 manifest. The preload wraps Node HTTP(S) POST /v1/chat/completions calls and injects chat_template_kwargs.force_nonempty_content = true for model IDs matching /nemotron/i. Registry v1 can match exact route metadata and apply config/plugin effects, but it cannot express request-body mutations, so no Nemotron registry manifest should be added yet. Future registry support should model request mutations as an explicit OpenClaw-owned effect and should prefer exact supported IDs or provider-class policy once the provider boundary is proven.
Current onboarding-supported Nemotron IDs affected by the OpenClaw preload are the NVIDIA Endpoints dropdown models nvidia/nemotron-3-super-120b-a12b and nvidia/nemotron-3-nano-omni-30b-a3b-reasoning, the managed local vLLM Linux profile nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8, and local/custom routes whose selected model ID contains nemotron, including the Ollama default nemotron-3-nano:30b when OpenClaw sends chat-completions for that model. The broad regex is not ideal long-term, but narrowing it now would drop documented compatible-endpoint/NIM/vLLM Nemotron routes and the historical nvidia/llama-3.3-nemotron-super-49b-v1 failure family without a replacement request-mutation registry capability.
Direct NVIDIA endpoint probes during this audit show the affordance is still relevant for at least one supported model shape: nvidia/nemotron-3-nano-omni-30b-a3b-reasoning returned HTTP 200 with content: null and reasoning-only output without force_nonempty_content, then returned non-null content with chat_template_kwargs.force_nonempty_content = true. nvidia/nemotron-3-super-120b-a12b returned HTTP 200 with non-empty content in this simple raw probe, but historical issues openclaw agent returns empty content when model makes tool calls instead of text responses #1193 and Nemotron-3-Super:120b and nemoclaw stalls due to OpenClaw's interpretation of end-of-turn #2051 plus the OpenClaw tool-bearing request shape still justify retaining the mutation until a full agent run proves it obsolete.
Hermes does not load this Node preload and currently has no Nemotron-specific manifest or runtime shim. Based on code architecture, Hermes passes through its custom chat-completions provider path without this affordance; no Hermes-specific Nemotron handling is justified until Hermes runtime evidence shows a failure.
Local Ollama was not runtime-tested here: this host only had qwen2.5:0.5b and nemotron-mini:latest installed, not nemotron-3-nano:30b, and no NVIDIA GPU was available for local vLLM. Code inspection shows OpenClaw would currently send the extra chat_template_kwargs field for Ollama model IDs containing nemotron; future exact manifests/request-mutation metadata should exclude Ollama unless a local Ollama run proves it both accepts and needs the field.
Additional DeepSeek follow-up evidence:
2026-05-06 multi-turn continuation attempt — blocked by NVIDIA Endpoints rate limiting, not by observed agent/session request-shape behavior. OpenClaw sandbox deepseek-openclaw-audit-0506 on main97ae39d4a16472eabb81d0c2e82e36eb6a62d6e9 completed turn 1 in persistent session deepseek-openclaw-multiturn-1778091696 with one exec tool call (hostname) and HOSTNAME=deepseek-openclaw-audit-0506; turn 2 and retry both failed before any model output/tool call with provider 429 status code (no body). Evidence: /sandbox/.openclaw/agents/main/sessions/21f437f3-5c42-4b49-b2d1-08d3def4b6b2.trajectory.jsonl, /tmp/gateway.log, /tmp/nemoclaw-start.log. Hermes sandbox deepseek-hermes-audit-0506 retried the same multi-turn shape through Hermes own API with conversation deepseek-hermes-multiturn-1778092277; turn 1 failed before terminal tool use with HTTP 429: Too Many Requests after 3 retries. Evidence: /sandbox/.hermes/sessions/request_dump_api-57283730231debee_20260506_183130_753804.json, /sandbox/.hermes/logs/agent.log, /sandbox/.hermes/logs/errors.log. Re-run needed after endpoint quota resets to prove multi-turn continuation.
2026-05-06 19:19-19:26 UTC retry — OpenClaw multi-turn continuation pass; Hermes multi-turn still blocked by NVIDIA Endpoints 429. Readiness first cleared at 2026-05-06T19:19:49Z with raw POST https://inference.local/v1/chat/completions returning HTTP 200, nvcf-status: fulfilled, and content OK. OpenClaw sandbox deepseek-openclaw-audit-0506 then completed persistent session deepseek-openclaw-multiturn-pass-1778095200: turn 1 made one exec tool call for hostname and returned HOSTNAME=deepseek-openclaw-audit-0506; turn 2 reused that hostname without re-running hostname, made one exec tool call printf 'seen:deepseek-openclaw-audit-0506\n', and finalized successfully. Evidence: /sandbox/.openclaw/agents/main/sessions/550bbc05-d91d-4c54-a127-25a61f9c24e3.jsonl and .trajectory.jsonl (finalStatus: success, toolMetas contains the printf command). Hermes sandbox deepseek-hermes-audit-0506 turn 1 through Hermes own API returned HOSTNAME=deepseek-hermes-audit-0506; turn 2 and a retry failed before final continuation with HTTP 429: Too Many Requests, and raw one-token readiness was also back to HTTP 429 at 2026-05-06T19:26:05Z. Evidence: /sandbox/.hermes/sessions/session_api-755da3cf323317c1.json, /sandbox/.hermes/sessions/request_dump_api-f14936ec1fec0f20_20260506_192147_850678.json, /sandbox/.hermes/logs/agent.log, /sandbox/.hermes/logs/errors.log.
Repo state: current main at 3351fbdd4eb7d9b80ec471545083956327da2b10; PR refactor: add agent-scoped model setup registry #3121 is merged, so current main was used. Checkout was clean on main...origin/main. Identity/signing audit before any commit path: global user.name Aaron Erickson, user.email aerickson@nvidia.com, SSH signing key configured, commit.gpgsign true. No repo code changes were made.
Common workflows: OpenClaw used ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --agent openclaw --fresh --recreate-sandbox with NEMOCLAW_PROVIDER=openai, then openshell sandbox exec ... /usr/local/bin/nemoclaw-start openclaw agent --agent main --json --thinking off --session-id <id> -m <prompt>. Hermes used ./bin/nemohermes.js onboard ... --agent hermes, then Hermes' own local API at http://127.0.0.1:18642: /v1/chat/completions with model: hermes-agent for the one-shot prompt, and /v1/responses + previous_response_id for same-conversation continuation.
OpenClaw selected NemoClaw/OpenShell route openai-api -> sandbox provider openai/<model> via https://inference.local/v1, with generated openclaw.jsonapi: openai-responses for all four models. Hermes generated model.provider: custom, model.base_url: https://inference.local/v1, and model.default: <model>; Hermes did not receive an OpenAI-specific provider/API-mode config from NemoClaw.
Infra notes: OpenShell gateway had intermittent tls handshake eof and later k3s disk-pressure/image-pull recovery during sandbox creation; those were resolved by gateway restart and pruning stale generated sandbox images. They are not counted as model failures. OpenClaw mini had one denied allowlist attempt before retrying hostname with full exec in the multi-turn first turn; it still completed correctly and is not a model-specific blocker.
Sandbox oa54-openclaw-audit-0507; one-shot /sandbox/.openclaw/agents/main/sessions/92728641-7ca3-4898-bae5-6620d5e2c1eb.trajectory.jsonl; multi /sandbox/.openclaw/agents/main/sessions/0a61dbbb-217f-4e23-8376-328e964fa07c.trajectory.jsonl
Structured exec calls, not raw tool text. One-shot issued 3 shell calls (hostname, date, uptime) and produced final summary. Observed one-shot duration about 33.1s from CLI output.
Turn 1 ran hostname and replied HOSTNAME=oa54-openclaw-audit-0507; turn 2 did not re-run hostname, ran a shell command for seen:oa54-openclaw-audit-0507, then summarized.
No OpenAI-model affordance needed. No registry manifest.
Hermes
OpenAI / gpt-5.4
Hermes local /v1/chat/completions and /v1/responses; upstream config custom -> https://inference.local/v1, model gpt-5.4
pass
Sandbox oa54-hermes-audit-0507; one-shot /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json; multi /sandbox/.hermes/sessions/session_44c94f31-940a-4887-919d-40f01d0328ad.json; logs /sandbox/.hermes/logs/agent.log, /sandbox/.hermes/logs/errors.log
Structured Hermes terminal calls. One-shot issued 3 terminal calls (hostname, date, uptime) and returned a final assistant summary. Latency about 12.1s.
/v1/responses continuation issued 2 terminal calls total: hostname, then printf ... seen:oa54-hermes-audit-0507; no hostname re-run in turn 2; final summary present.
No model-specific affordance. Hermes chat session header continuation requires API_SERVER_KEY, but Responses continuation works. No registry manifest.
Sandbox oa54mini-openclaw-audit-0507; one-shot /sandbox/.openclaw/agents/main/sessions/7e0d64e2-89df-41c4-8100-c884f95c159f.trajectory.jsonl; multi /sandbox/.openclaw/agents/main/sessions/46776da6-ee31-4513-9091-bc6dd9d6ebe0.trajectory.jsonl
Structured exec calls. One-shot issued 3 shell calls and summarized hostname/date/uptime. Runtime about 6.3s.
Turn 1 first tried restricted hostname and got allowlist denial, retried full hostname, then turn 2 ran printf 'seen:%s\n' 'oa54mini-openclaw-audit-0507' without re-running hostname; final summary present.
No model-specific affordance; allowlist retry is OpenClaw policy behavior. No registry manifest.
Hermes
OpenAI / gpt-5.4-mini
Hermes local /v1/chat/completions and /v1/responses; upstream config custom -> https://inference.local/v1, model gpt-5.4-mini
pass
Sandbox oa54mini-hermes-audit-0507; one-shot /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json; multi /sandbox/.hermes/sessions/session_122350b7-f803-4776-9a68-947ee6d78231.json; logs /sandbox/.hermes/logs/agent.log, /sandbox/.hermes/logs/errors.log
One-shot made 3 terminal calls plus one skill_view prelude, then final summary. Latency about 9.1s.
Multi-turn made hostname, then wrote seen:oa54mini-hermes-audit-0507 to /tmp/seen_hostname.txt without re-running hostname; final summary present.
No model-specific affordance. No registry manifest.
Sandbox oa54nano-openclaw-audit-0507; one-shot /sandbox/.openclaw/agents/main/sessions/303e744e-7934-4a13-99cd-cc59dab18478.trajectory.jsonl; multi /sandbox/.openclaw/agents/main/sessions/3b89aa65-e644-4c15-bc49-ec1dc0a2adf3.trajectory.jsonl
Structured exec calls. One-shot issued 3 shell calls and summarized hostname/date/uptime. Runtime about 6.9s.
Turn 1 ran hostname; turn 2 ran echo seen:oa54nano-openclaw-audit-0507 without re-running hostname; final summary present.
No model-specific affordance. No registry manifest.
Hermes
OpenAI / gpt-5.4-nano
Hermes local /v1/chat/completions and /v1/responses; upstream config custom -> https://inference.local/v1, model gpt-5.4-nano
pass
Sandbox oa54nano-hermes-audit-0507; one-shot /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json; multi /sandbox/.hermes/sessions/session_1769f4fb-5ef0-4bed-ba6b-47d57c6d3366.json; logs /sandbox/.hermes/logs/agent.log, /sandbox/.hermes/logs/errors.log
Structured terminal calls. One-shot issued 3 terminal calls and summarized hostname/date/uptime. Latency about 8.3s.
Multi-turn made 2 terminal calls total: hostname, then printf 'seen:%s\n' 'oa54nano-hermes-audit-0507'; no hostname re-run in turn 2; final summary present.
No model-specific affordance. No registry manifest.
Sandbox oa54pro-openclaw-audit-0507; one-shot /sandbox/.openclaw/agents/main/sessions/73b6a9aa-80b4-4991-91fa-7ce77c56e880.trajectory.jsonl; multi turn-1 /sandbox/.openclaw/agents/main/sessions/fd0ac825-0b24-4280-8245-b3c2e6fe45ad.trajectory.jsonl
The model emitted a structured exec hostname tool call, not raw text, but after the tool result OpenAI returned 404 Item with id ... not found. Items are not persisted when store is set to false. One-shot stopped after 1 tool call; no date, no uptime, no final assistant summary. CLI returned nonzero; latency about 44.2s.
Turn 1 of continuation failed the same way after one structured hostname call, so turn 2 was not meaningful.
Needs OpenAI Responses statefulness/stateless-reasoning affordance in the OpenClaw OpenAI transport/agent adapter: e.g. store:true when carrying response item ids, or encrypted reasoning items when using store:false. This is provider transport/request-response behavior, not a JSON setup manifest. #3121 registry v1 cannot express it.
Hermes
OpenAI / gpt-5.4-pro-2026-03-05
Hermes local API accepted requests, but upstream custom path called https://inference.local/v1/chat/completions with model gpt-5.4-pro-2026-03-05
blocked
Sandbox oa54pro-hermes-audit-0507; one-shot /sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json; multi /sandbox/.hermes/sessions/session_fbb5c5f1-3ee5-4929-85b1-07ca0bd0f953.json; request dumps under /sandbox/.hermes/sessions/request_dump_*042156*.json, *042204*.json, *042215*.json; logs /sandbox/.hermes/logs/agent.log, /sandbox/.hermes/logs/errors.log
No tool calls. Hermes returned an assistant error string after upstream retry exhaustion: HTTP 404: This is not a chat model and thus not supported in the v1/chat/completions endpoint. One-shot local API status was 200 but semantic result is blocked. Latency about 9.8s.
/v1/responses continuation also failed before any tool call because the Hermes agent's upstream model call still used /v1/chat/completions; both turns returned the same 404 error text.
Needs Hermes provider/API-mode selection for OpenAI Responses when the model is Responses-only. This mirrors the Anthropic Hermes config-path class (custom against https://inference.local) but the failure mode differs: OpenAI pro returns model/endpoint 404 rather than provider-policy 403. This is provider/agent config work, not a per-model setup manifest. #3121 registry v1 cannot express it.
Overall OpenAI verdict: gpt-5.4, gpt-5.4-mini, and gpt-5.4-nano are valid curated agent models for both OpenClaw and Hermes with no model-specific registry affordance. gpt-5.4-pro-2026-03-05 is blocked on both surfaces for different OpenAI Responses integration reasons: OpenClaw reaches Responses and tool calls but mishandles reasoning/state continuation with store:false; Hermes routes through a custom chat-completions upstream path that the pro snapshot rejects. No code change was made in this audit; if fixed later, the work should be scoped to provider transport/API mode or agent adapter behavior, not registry v1 manifests.
OpenClaw: pass-with-affordance; exact-model runtime request mutation in scripts/nemoclaw-start.sh injects chat_template_kwargs.thinking = false for /v1/chat/completions.
Hermes: pass; custom chat-completions path works with no DeepSeek-specific Hermes manifest or runtime shim.
Registry decision: do not add a manifest under refactor: add agent-scoped model setup registry #3121 v1. The live behavior is request mutation plus onboarding validation policy, while registry v1 only expresses agent config/plugin effects. Move the OpenClaw mutation later only after a declarative, OpenClaw-owned request-mutation effect exists; keep onboarding timeout/streaming policy in src/lib/onboard-inference-probes.ts or adjacent validation-policy metadata.
Nemotron-family models
Current OpenClaw runtime preload injects chat_template_kwargs.force_nonempty_content = true for model IDs matching nemotron.
Registry decision for refactor: add agent-scoped model setup registry #3121 v1: do not add a manifest yet. The behavior is a request-body mutation, while v1 manifests only express route matching plus config/plugin effects. Move later only after request-mutation support exists.
Runtime agent validation after the gateway recovered: OpenClaw Super is pass-with-affordance, Hermes Super is pass, Hermes Omni is pass, and OpenClaw Omni is blocked by model/runtime response behavior (NO_REPLY/thinking-only final after tool results), not by infrastructure.
Reasoning/tool-capable OSS model behind an OpenAI-compatible route.
Gemini Pro models through Google's OpenAI-compatible endpoint
Re-run with funded Gemini key on 2026-05-06/2026-05-07 UTC. Gemini 3.1 Pro: OpenClaw blocked by tool-result continuation/state handling (400 status code (no body) after structured tool calls; stale-DNS/proxy 503 cleared on retry), Hermes pass with extra_content.google.thought_signature preserved. Gemini 2.5 Pro: OpenClaw pass, Hermes pass.
z-ai/glm-5.1
Audited on 2026-05-07 UTC for OpenClaw and Hermes through NVIDIA Endpoints; no GLM-specific setup behavior justified.
minimaxai/minimax-m2.7
Discovery required before adding any setup behavior.
Provider-class policy, not necessarily model-specific setup
Other OpenAI-compatible endpoints defaulting to /v1/chat/completions
Local vLLM forcing /v1/chat/completions
Local NIM forcing /v1/chat/completions
Local Ollama forcing /v1/chat/completions
Local Ollama tools capability gate
Anthropic-compatible provider adapter behavior
Required audit scenarios
Each model/provider/agent combination should be classified with evidence from a repeatable scenario set.
Baseline chat
Simple deterministic response works.
Provider validation succeeds or fails with a clear actionable reason.
No provider credentials leak into sandbox-visible files, logs, or prompts.
Shell tool loop
Use a standard prompt such as:
Run hostname, then run date, then run uptime. Use separate shell tool calls for each command, and after the tool results, summarize what you saw.
Required checks:
Structured tool calls are emitted.
Tool calls are persisted in the trajectory with non-empty metadata.
The expected tool-call count is visible.
Tool-call IDs/names correlate with tool results.
No raw function-call text is persisted as assistant prose.
No combined hostname; date; uptime command appears unless that is the explicit expected behavior for the test.
No promptError.
No empty assistant stop.
No reasoning-only stop.
A final assistant response appears after all tool results.
Multi-turn continuation
The model can use a tool result from turn 1 to decide a dependent tool call in turn 2.
Reasoning/thinking state, if present, does not break the next tool turn.
The model does not ask the user to continue after a complete tool result.
Sub-agent delegation
Primary agent can decide to delegate.
sessions_spawn request is structured correctly.
Sub-agent receives the intended task, model config, and workspace path.
Sub-agent can use tools if the role requires tools.
Primary agent can consume the sub-agent result and continue.
Hermes path
Hermes sandbox starts with the selected provider/model.
Hermes OpenAI-compatible API returns the expected response shape.
Tool/capability expectations are explicit for Hermes, even if Hermes does not exercise the same OpenClaw tool stack.
Failures are separated from OpenClaw-only request-shape issues.
Performance and operability
Track at least:
Validation time
Time to first token or first streamed event when available
Total scenario duration
Retry behavior
Timeout budget used
Whether the model needs streaming to be reliable
Whether the model needs a model-specific request mutation
Whether the model needs provider-specific API path forcing
Whether cold-start behavior differs from warm behavior
Result states
Every row in the audit matrix should end in one of these states:
pass: works without model-specific changes
pass-with-affordance: works with a documented model/provider affordance
degraded: usable but has documented limitations or performance concerns
Purpose
This issue is the source of truth for the model performance and capability audit across the models and agent surfaces NemoClaw supports.
The goal is not just to confirm that each model can answer a one-shot prompt. The goal is to verify that each supported model works well as an agent model in the NemoClaw/OpenShell environment: tool calls, shell execution, multi-turn tool-result continuation, sub-agent delegation where applicable, and the provider-specific response shapes that our agents consume.
This is related to, but separate from, #3120:
Background
PR #3046 fixed a concrete Kimi K2.6/OpenClaw incompatibility where
moonshotai/kimi-k2.6could emit a combined shell command such ashostname; date; uptimeas oneexeccall. OpenClaw needs separate tool-call boundaries for persistence, replay, and tool-result correlation. The individual Kimi issue is closed as fixed in #2620.That fix exposed the broader product requirement: every model exposed through onboarding should be validated as an agent model, not merely as a chat model. Some models need model-aware or provider-aware affordances to work correctly in shell-agent loops. Those affordances must be discovered, documented, tested, and either captured in the model-specific setup registry proposed by #3120 or classified as provider-class transport policy.
Initial audit artifact:
model-affordance-audit.mdgenerated frommainatf5b8144d577ccd680875291d33eaabb656509d5aAgent surfaces in scope
Audit the model behavior against the agent surfaces NemoClaw currently supports:
mainagent through the default NemoClaw sandbox pathsessions_spawn/agents.listMessaging integrations are not separate model-capability targets unless the message channel changes model routing or response handling. The core model audit should run at the agent/runtime boundary first.
Supported model inventory to audit
NVIDIA Endpoints
nvidia/nemotron-3-super-120b-a12bnvidia/nemotron-3-nano-omni-30b-a3b-reasoningz-ai/glm-5.1minimaxai/minimax-m2.7moonshotai/kimi-k2.6openai/gpt-oss-120bdeepseek-ai/deepseek-v4-proOpenAI
gpt-5.4gpt-5.4-minigpt-5.4-nanogpt-5.4-pro-2026-03-05Anthropic
claude-sonnet-4-6claude-haiku-4-5claude-opus-4-6Gemini
gemini-3.1-pro-previewgemini-3.1-flash-lite-previewgemini-3-flash-previewgemini-2.5-progemini-2.5-flashgemini-2.5-flash-liteLocal and experimental providers
qwen2.5:7bnemotron-3-nano:30bwhen hardware permitstoolscapabilityQwen/Qwen3.6-27B-FP8nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8Audit results
Completed rows:
claude-sonnet-4-6—pass. Validated on 2026-05-07 UTC onmaind98dd8c97d1ddddfd7b6d82962934493dd6e139fwith local sandboxanth-sonnet-openclaw-audit-0507, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider keyanthropic, primary modelanthropic/claude-sonnet-4-6, and APIanthropic-messagesviahttps://inference.local. Workflow:ANTHROPIC_API_KEY=<redacted> NEMOCLAW_PROVIDER=anthropic NEMOCLAW_MODEL=claude-sonnet-4-6 ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name anth-sonnet-openclaw-audit-0507 --agent openclaw --fresh --recreate-sandbox, thenopenshell sandbox exec -n anth-sonnet-openclaw-audit-0507 --timeout 900 -- /usr/local/bin/nemoclaw-start openclaw agent --agent main --json --thinking off --session-id ... -m <standard and multi-turn prompts>. Evidence:/sandbox/.openclaw/agents/main/sessions/anth-sonnet-openclaw-oneshot-1778118974.trajectory.jsonl,.jsonl,/sandbox/.openclaw/agents/main/sessions/anth-sonnet-openclaw-multiturn-1778119436.trajectory.jsonl, and.jsonl. One-shot recordedfinalStatus: success,timedOut: false, no prompt error, three structuredexeccalls (hostname,date,uptime), correlated Anthropictool_useIDs totoolResultentries, and a final assistant summary. Multi-turn reused the same OpenClaw session: turn 1 returnedHOSTNAME=anth-sonnet-openclaw-audit-0507; turn 2 ranecho "seen:anth-sonnet-openclaw-audit-0507"without re-runninghostnameand summarized. Latency:43.899smodel duration one-shot,38.971sturn 1,47.677sturn 2. Required affordance: none; registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.claude-sonnet-4-6—blocked. Validated on 2026-05-07 UTC onmaind98dd8c97d1ddddfd7b6d82962934493dd6e139fwith sandboxanth-sonnet-hermes-audit-0507, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), and Hermes API server127.0.0.1:18642. Workflow:ANTHROPIC_API_KEY=<redacted> NEMOCLAW_PROVIDER=anthropic NEMOCLAW_MODEL=claude-sonnet-4-6 ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name anth-sonnet-hermes-audit-0507 --agent hermes --fresh --recreate-sandbox, then Hermes' own APIPOST http://127.0.0.1:18642/v1/chat/completionswithmodel: hermes-agentand the standard shell-loop prompt. Evidence:/sandbox/.hermes/config.yaml,/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json,/sandbox/.hermes/sessions/request_dump_api-9816e26b83c423bc_20260507_023806_692889.json,/sandbox/.hermes/logs/agent.log, and/sandbox/.hermes/logs/errors.log. NemoClaw generatedmodel.provider: custom,model.base_url: "https://inference.local", and noapi_mode; Hermes therefore calledhttps://inference.local/chat/completions. The API returned HTTP 200 in about2swith assistant textError code: 403 - {'error': 'connection not allowed by policy'}. Tool-call count:0; no final model summary; multi-turn not attempted because one-shot fails before tool use. Required affordance: Hermes provider-config/transport behavior for Anthropic Messages, not model-specific setup. Registry decision: refactor: add agent-scoped model setup registry #3121 v1 cannot express this runtime API-mode/provider transport fix cleanly; no manifest.claude-haiku-4-5—pass. Validated on 2026-05-07 UTC onmaind98dd8c97d1ddddfd7b6d82962934493dd6e139fwith sandboxanth-haiku-openclaw-audit-0507, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider keyanthropic, primary modelanthropic/claude-haiku-4-5, and APIanthropic-messagesviahttps://inference.local. Workflow:ANTHROPIC_API_KEY=<redacted> NEMOCLAW_PROVIDER=anthropic NEMOCLAW_MODEL=claude-haiku-4-5 ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name anth-haiku-openclaw-audit-0507 --agent openclaw --fresh --recreate-sandbox, thennemoclaw-start openclaw agent --agent main --json --thinking off --session-id ... -m <standard and multi-turn prompts>. Evidence:/sandbox/.openclaw/agents/main/sessions/anth-haiku-openclaw-oneshot-1778120014.trajectory.jsonl,.jsonl,/sandbox/.openclaw/agents/main/sessions/anth-haiku-openclaw-multiturn-1778120085.trajectory.jsonl, and.jsonl. One-shot recorded three structuredexeccalls and a final assistant summary. Multi-turn turn 1 returnedHOSTNAME=anth-haiku-openclaw-audit-0507; turn 2 ranecho "seen:anth-haiku-openclaw-audit-0507", did not re-runhostname, and summarized. Latency:39.982smodel duration one-shot,43.531sturn 1,39.409sturn 2. Tool/result correlation used native Anthropictool_useIDs mapped to OpenClawtoolResultentries; no prompt error or timeout observed. Required affordance: none; registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.claude-haiku-4-5—blocked. Validated on 2026-05-07 UTC onmaind98dd8c97d1ddddfd7b6d82962934493dd6e139fwith sandboxanth-haiku-hermes-audit-0507, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), and Hermes API server127.0.0.1:18642. Workflow: same Hermes onboarding/API path as Sonnet withNEMOCLAW_MODEL=claude-haiku-4-5. Evidence:/sandbox/.hermes/config.yaml,/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json,/sandbox/.hermes/sessions/request_dump_api-9816e26b83c423bc_20260507_024058_994265.json,/sandbox/.hermes/logs/agent.log, and/sandbox/.hermes/logs/errors.log. Generated config wasmodel.provider: custom,model.base_url: "https://inference.local", noapi_mode; request dump showed upstream URLhttps://inference.local/chat/completions. Hermes API returned HTTP 200 in about1swith assistant textError code: 403 - {'error': 'connection not allowed by policy'}. Tool-call count:0; no final model summary; multi-turn not attempted. Required affordance: Hermes Anthropic Messages provider-config/transport behavior, not model-specific setup. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.claude-opus-4-6—pass. Validated on 2026-05-07 UTC onmaind98dd8c97d1ddddfd7b6d82962934493dd6e139fwith sandboxanth-opus-openclaw-audit-0507, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider keyanthropic, primary modelanthropic/claude-opus-4-6, and APIanthropic-messagesviahttps://inference.local. Workflow: same OpenClaw onboarding/agent path as Sonnet withNEMOCLAW_MODEL=claude-opus-4-6; a transient OpenShelltls handshake eofduring the first create was cleared by restarting the intendednemoclawgateway and resuming onboarding. Evidence:/sandbox/.openclaw/agents/main/sessions/5795ee4f-ec16-4c6c-9c12-dcf0c0988096.trajectory.jsonl,.jsonl,/sandbox/.openclaw/agents/main/sessions/d00fc1b7-1f89-416e-bbbf-daafd363db77.trajectory.jsonl, and.jsonl; session keys wereanth-opus-openclaw-oneshot-1778121149andanth-opus-openclaw-multiturn-1778121149. One-shot recorded three structuredexeccalls and a final assistant summary; multi-turn turn 1 returnedHOSTNAME=anth-opus-openclaw-audit-0507, and turn 2 ranecho "seen:anth-opus-openclaw-audit-0507"without re-runninghostname. Tool-result correlation was correct (toolu_01QnTcTFxoYqgJUc6ZNunTMf->toolResult, thentoolu_01RcMUVDog12AhCd6BVkvJN3->toolResult). Latency:36.108smodel duration one-shot,9.253sturn 1,6.322sturn 2. Required affordance: none; registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.claude-opus-4-6—blocked. Validated on 2026-05-07 UTC onmaind98dd8c97d1ddddfd7b6d82962934493dd6e139fwith sandboxanth-opus-hermes-audit-0507, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), and Hermes API server127.0.0.1:18642. Workflow: same Hermes onboarding/API path as Sonnet withNEMOCLAW_MODEL=claude-opus-4-6. Evidence:/sandbox/.hermes/config.yaml,/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json,/sandbox/.hermes/sessions/request_dump_api-9816e26b83c423bc_20260507_024350_714854.json,/sandbox/.hermes/logs/agent.log, and/sandbox/.hermes/logs/errors.log. Generated config wasmodel.provider: custom,model.base_url: "https://inference.local", noapi_mode; request dump showed upstream URLhttps://inference.local/chat/completions. Hermes API returned HTTP 200 in about2swith assistant textError code: 403 - {'error': 'connection not allowed by policy'}. Tool-call count:0; no final model summary; multi-turn not attempted. Required affordance: Hermes Anthropic Messages provider-config/transport behavior, not model-specific setup. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.Additional Anthropic setup audit evidence:
Anthropic direct API preflight succeeded for all three curated IDs with the supplied temporary key; the agent-surface failures above are therefore NemoClaw/Hermes routing behavior, not unavailable models. Anthropic's tool-use docs require clients to parse
tool_useblocks and returntool_resultblocks whosetool_use_idmatches the original tool-useid, with tool-result blocks immediately following the assistant tool-use message. See https://platform.claude.com/docs/en/agents-and-tools/tool-use/handle-tool-calls and https://platform.claude.com/docs/en/agents-and-tools/tool-use/define-tools.Extended-thinking docs matter for future Claude 4 agent work: tool use with thinking requires preserving returned thinking blocks, and Sonnet 4.6 / Opus 4.6 have interleaved-thinking behavior under Anthropic's current docs. This audit ran OpenClaw with
--thinking off, so no new thinking-state preservation affordance was required for these pass rows. See https://platform.claude.com/docs/en/build-with-claude/extended-thinking.Static NemoClaw inspection matched the runtime results:
src/lib/onboard-providers.tsmaps Anthropic sandbox config to provider keyanthropic, primary modelanthropic/<model>, base URLhttps://inference.local, andinferenceApi: anthropic-messages; OpenClaw consumes that route correctly.agents/hermes/generate-config.ts/agents/hermes/config/hermes-config.tscurrently emit Hermesprovider: customplusbase_url: https://inference.localwith no Anthropicapi_mode, so Hermes cannot express the native Anthropic Messages route today. This is provider-adapter/config-path follow-up work, not a per-model registry entry.OpenClaw / NVIDIA Endpoints /
minimaxai/minimax-m2.7—pass. Validated on 2026-05-07 UTC onmainfa99a37065664f2a4c2af16a0bfc3bb4fac2d605with local sandboxminimax-openclaw-audit-0507, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider/modelinference/minimaxai/minimax-m2.7, and APIopenai-completionsviahttps://inference.local/v1. Workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=minimaxai/minimax-m2.7 NEMOCLAW_PREFERRED_API=openai-completions ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name minimax-openclaw-audit-0507 --agent openclaw --fresh --recreate-sandbox, thenopenshell sandbox exec -n minimax-openclaw-audit-0507 --timeout 900 -- /usr/local/bin/nemoclaw-start openclaw agent --agent main --json --thinking off --session-id ... -m <standard and multi-turn prompts>. Evidence:/sandbox/.openclaw/agents/main/sessions/minimax-openclaw-oneshot-1778116787.trajectory.jsonl,/sandbox/.openclaw/agents/main/sessions/minimax-openclaw-oneshot-1778116787.jsonl,/sandbox/.openclaw/agents/main/sessions/minimax-openclaw-multiturn-1778116939.trajectory.jsonl, and/sandbox/.openclaw/agents/main/sessions/minimax-openclaw-multiturn-1778116939.jsonl. One-shot recordedfinalStatus: success,timedOut: false, no prompt error, three structuredexectool calls (hostname,date,uptime), and a final assistant summary. Multi-turn reused the same OpenClaw session: turn 1 ran oneexeccall forhostnameand returnedHOSTNAME=minimax-openclaw-audit-0507; turn 2 ran one newexeccallecho "seen:minimax-openclaw-audit-0507" > /tmp/seen_hostname.txt, did not re-runhostname, wrote the expected value, and summarized successfully. Latency was inside timeout:122.074sone-shot,81.676sturn 1, and49.817sturn 2. Tool-call shape was structured OpenAI-compatible tool calls, not raw tool text; MiniMax thinking was present as OpenClawthinkingblocks withthinkingSignature: reasoning_content, and final assistant text was non-empty after tool results. Operational note: the CLI printed a gateway websocket1006close and used OpenClaw's embedded runner, but the model/provider run completed successfully and persisted the normal trajectory. Required affordance: none beyond the generic OpenClaw--thinking off/thinkingDefault: offpath already used for these sandbox smoke runs; no MiniMax-specific request mutation, response parser, shell rewriter, or plugin is justified. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest because there is no concrete MiniMax-specific setup behavior to express; if a future MiniMax issue required request-body mutation such as explicitreasoning_split, registry v1 would not express that class cleanly. External docs checked: NVIDIA MiniMax M2.7 model card, NVIDIA MiniMax M2.7 infer reference, MiniMax Tool Use & Interleaved Thinking guide, MiniMax OpenAI-compatible chat docs, and MiniMax-M2 GitHub README.Hermes / NVIDIA Endpoints /
minimaxai/minimax-m2.7—pass. Validated on 2026-05-07 UTC onmainfa99a37065664f2a4c2af16a0bfc3bb4fac2d605with local sandboxminimax-hermes-audit-0507, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), Hermes configprovider: custom,base_url: https://inference.local/v1, and modelminimaxai/minimax-m2.7. Workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=minimaxai/minimax-m2.7 NEMOCLAW_PREFERRED_API=openai-completions ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name minimax-hermes-audit-0507 --agent hermes --fresh --recreate-sandbox, then Hermes' own OpenAI-compatible API inside the sandbox:POST http://127.0.0.1:18642/v1/chat/completionswithmodel: hermes-agentfor the standard shell-loop prompt andPOST http://127.0.0.1:18642/v1/responseswithprevious_response_idfor server-side multi-turn continuation. Evidence:/sandbox/.hermes/config.yaml,/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json,/sandbox/.hermes/sessions/session_efda2305-58cd-487a-aecc-aba1b0a646b3.json,/sandbox/.hermes/logs/agent.log, and/sandbox/.hermes/logs/errors.log. Chat-completions one-shot returnedHTTP 200in29.814s, recorded three structuredterminalfunction calls (hostname,date,uptime) with successful tool results, and returned a final assistant summary. Responses multi-turn returnedHTTP 200in13.774sfor turn 1 and17.686sfor turn 2; turn 1 stored one structuredterminalhostnamecall andHOSTNAME=minimax-hermes-audit-0507, while turn 2 chained fromresp_a88cc8e84c644fb2a918a3b037d0, made one newterminalcallecho "seen:minimax-hermes-audit-0507", did not make a newhostnamecall in the persisted session, and summarized successfully. Tool-call shape was structured OpenAI-compatible function calling, not raw tool text; MiniMax reasoning was stored in Hermesreasoning_contentfields and did not break tool-result continuation.agent.logcontained non-blocking context-length autodetect warnings that defaulted the model to 128,000 tokens;errors.logcontained only startup warnings about no API-server key/user allowlist. Required affordance: none; no Hermes MiniMax manifest, runtime shim, request mutation, generic parser, or shell rewrite is justified. Registry decision: refactor: add agent-scoped model setup registry #3121 v1 could express declarative Hermes compat if a concrete behavior existed, but this audit found none to record.OpenClaw / NVIDIA Endpoints /
z-ai/glm-5.1—pass. Validated on 2026-05-07 UTC onmain09b66c68384e16e828917b8d7afdbc61893cd4a4with local sandboxglm-openclaw-audit-0507, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider/modelinference/z-ai/glm-5.1, and APIopenai-completionsviahttps://inference.local/v1. Workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=z-ai/glm-5.1 NEMOCLAW_PREFERRED_API=openai-completions ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name glm-openclaw-audit-0507 --agent openclaw --fresh --recreate-sandbox, thennemoclaw-start openclaw agent --agent main --json --thinking off --session-id ... -m <standard and multi-turn prompts>. Evidence:/sandbox/.openclaw/agents/main/sessions/glm-openclaw-oneshot-1778114960.trajectory.jsonl,/sandbox/.openclaw/agents/main/sessions/glm-openclaw-oneshot-1778114960.jsonl,/sandbox/.openclaw/agents/main/sessions/glm-openclaw-multiturn-1778115140.trajectory.jsonl, and/sandbox/.openclaw/agents/main/sessions/glm-openclaw-multiturn-1778115140.jsonl. One-shot recordedfinalStatus: success,timedOut: false, no prompt error, three structuredexectool calls (hostname,date,uptime), and a final assistant summary; no raw tool-call text was persisted as assistant prose. Multi-turn reused the same OpenClaw session: turn 1 ran oneexeccall forhostnameand returnedHOSTNAME=glm-openclaw-audit-0507; turn 2 ran oneexeccallecho 'seen:glm-openclaw-audit-0507', did not re-runhostname, and summarized successfully. Latency was high but inside timeout: about 69s one-shot, about 147s turn 1, and about 104s turn 2. Required affordance: none beyond generic OpenClaw--thinking off/thinkingDefault: offbehavior already used for sandbox smoke paths; no GLM-specific request mutation, plugin, shell rewriter, or manifest is justified. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest for GLM because there is no concrete GLM-specific behavior to express. External docs checked: NVIDIA GLM-5.1 model card, NVIDIA GLM-5.1 infer reference, Z.ai GLM-5.1 overview, and Z.ai thinking mode/tool-result guidance.Hermes / NVIDIA Endpoints /
z-ai/glm-5.1—pass. Validated on 2026-05-07 UTC onmain09b66c68384e16e828917b8d7afdbc61893cd4a4with local sandboxglm-hermes-audit-0507, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), Hermes configprovider: custom,base_url: https://inference.local/v1, and modelz-ai/glm-5.1. Workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=z-ai/glm-5.1 NEMOCLAW_PREFERRED_API=openai-completions ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name glm-hermes-audit-0507 --agent hermes --fresh --recreate-sandbox, then Hermes own OpenAI-compatible API inside the sandbox,POST http://127.0.0.1:18642/v1/chat/completionswithmodel: hermes-agentfor the standard shell-loop prompt andPOST http://127.0.0.1:18642/v1/responseswithprevious_response_idfor server-side multi-turn continuation. Evidence:/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json,/sandbox/.hermes/sessions/session_0441870a-3401-4475-a959-0928220483d5.json,/sandbox/.hermes/logs/agent.log, and/sandbox/.hermes/logs/errors.log. Chat-completions one-shot returnedHTTP 200in88.693816s, recorded three structuredterminalfunction calls (hostname,date,uptime) with successful tool results, and returned a final assistant summary. Responses multi-turn returnedHTTP 200in82.152558sfor turn 1 and65.737139sfor turn 2; turn 1 stored a structuredterminalhostnamecall and function-call output, and turn 2 chained fromresp_f44117b7f73e4ac788e3429d2be4, made one newterminalcallecho 'seen:glm-hermes-audit-0507', did not make a newhostnamecall, and summarized successfully.errors.logcontained only startup warnings about no API-server key/user allowlist; no model prompt/runtime errors were observed. Required affordance: none; no Hermes GLM manifest, runtime shim, request mutation, generic parser, or shell rewrite is justified. Registry decision: refactor: add agent-scoped model setup registry #3121 v1 could express declarative Hermes compat if a concrete behavior existed, but this audit found none to record.OpenClaw / NVIDIA Endpoints /
moonshotai/kimi-k2.6—pass-with-affordance. Fixed by fix: support reasoning models in the OpenClaw harness #3046; PR refactor: add agent-scoped model setup registry #3121 moves the activation into the agent-scoped model-specific setup registry.Hermes / NVIDIA Endpoints /
moonshotai/kimi-k2.6—pass. Validated on PR refactor: add agent-scoped model setup registry #3121 headbe8c398bdaba7e1b9d86501515f5ec1ece6a4f3fusing a rebuilt local Hermes sandbox (hermes-kimi-audit-0506) and Hermes own OpenAI-compatible API on127.0.0.1:18642, not a directinference.localcurl. The acceptance prompt produced separate terminal tool calls forhostname,date, anduptime, then a final response. No Hermes Kimi manifest or runtime shim is justified from this evidence. PR evidence: refactor: add agent-scoped model setup registry #3121 (comment)OpenClaw / NVIDIA Endpoints /
deepseek-ai/deepseek-v4-pro—pass-with-affordance. Validated on PR refactor: add agent-scoped model setup registry #3121 headbe8c398bdaba7e1b9d86501515f5ec1ece6a4f3f(merged intomainby97ae39d4a16472eabb81d0c2e82e36eb6a62d6e9) with local OpenClaw sandboxdeepseek-openclaw-audit-0506, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider/modelinference/deepseek-ai/deepseek-v4-pro, and APIopenai-completionsviahttps://inference.local/v1. Workflow:node bin/nemoclaw.js onboard ... --agent openclaw, thennemoclaw-start openclaw agent --agent main --json --thinking off --session-id deepseek-openclaw-audit-1778090935 -m <standard shell prompt>. Evidence:/sandbox/.openclaw/agents/main/sessions/f7d14bbc-0312-4f5a-b1be-ca17e20a0612.trajectory.jsonlrecordedfinalStatus: success,timedOut: false,toolMetasforhostname,date, anduptime, and a final assistant summary. Duration:99,016ms. Required affordance remains the existing OpenClaw startup preload request mutation that injectschat_template_kwargs.thinking = falsefor exact modeldeepseek-ai/deepseek-v4-pro; refactor: add agent-scoped model setup registry #3121 registry v1 cannot express request mutation, so no DeepSeek manifest was added.Hermes / NVIDIA Endpoints /
deepseek-ai/deepseek-v4-pro—pass. Validated onmain97ae39d4a16472eabb81d0c2e82e36eb6a62d6e9with local Hermes sandboxdeepseek-hermes-audit-0506, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), configprovider: custom,base_url: https://inference.local/v1, and modeldeepseek-ai/deepseek-v4-pro. Workflow:node bin/nemohermes.js onboard ... --agent hermes, then Hermes own APIPOST http://127.0.0.1:18642/v1/chat/completionswithmodel: hermes-agentand the standard shell prompt. Evidence:/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.jsonrecorded three separateterminaltool calls (hostname,date,uptime) with successful tool results and final assistant summary;/sandbox/.hermes/logs/agent.logrecorded main providercustom (deepseek-ai/deepseek-v4-pro). API returned200,finish_reason: stop, usage48,431tokens. No Hermes DeepSeek affordance is justified.OpenClaw / NVIDIA Endpoints /
nvidia/nemotron-3-super-120b-a12b-pass-with-affordance. Validated on 2026-05-06 after the local OpenShell DiskPressure condition cleared, onmain3477ab7da13c51749eedef1662aa4e998ae0feb2with local sandboxnemotron-super-openclaw-audit2-0506, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider/modelinference/nvidia/nemotron-3-super-120b-a12b, and APIopenai-completionsviahttps://inference.local/v1. Workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=nvidia/nemotron-3-super-120b-a12b ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name nemotron-super-openclaw-audit2-0506 --agent openclaw --fresh --recreate-sandbox, thennemoclaw-start openclaw agent --agent main --json --thinking off --session-id nemotron-super-openclaw-audit2-1778103869 -m <standard shell prompt>. Evidence:/sandbox/.openclaw/agents/main/sessions/aa8473de-504f-4fe3-aaf5-554dd13042a4.trajectory.jsonlrecordedfinalStatus: success; the session recorded three separateexectool calls forhostname,date, anduptime, followed by a final assistant summary. Duration:44,400ms. Required affordance remains the existing OpenClaw startup preload request mutation that injectschat_template_kwargs.force_nonempty_content = truefor Nemotron chat-completions requests; refactor: add agent-scoped model setup registry #3121 registry v1 cannot express request mutation, so no Nemotron manifest should be added yet.Hermes / NVIDIA Endpoints /
nvidia/nemotron-3-super-120b-a12b-pass. Validated on 2026-05-06 onmain3477ab7da13c51749eedef1662aa4e998ae0feb2with local Hermes sandboxnemotron-super-hermes-audit2-0506, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), providercustom, base URLhttps://inference.local/v1, and modelnvidia/nemotron-3-super-120b-a12b. Workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=nvidia/nemotron-3-super-120b-a12b ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name nemotron-super-hermes-audit2-0506 --agent hermes --fresh --recreate-sandbox, then Hermes own OpenAI-compatible API inside the sandbox,POST http://127.0.0.1:18642/v1/chat/completionswithmodel: hermes-agentand the standard shell prompt. Evidence:/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.jsonrecorded three actualterminaltool calls forhostname,date, anduptime; the API returnedHTTP 200in34.680551swith a final assistant summary. No Hermes Nemotron affordance is justified by this row.OpenClaw / NVIDIA Endpoints /
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-blockedby observed OpenClaw agent behavior, not by local infrastructure. Validated on 2026-05-06 onmain3477ab7da13c51749eedef1662aa4e998ae0feb2with sandboxnemotron-omni-openclaw-audit2-0506, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider/modelinference/nvidia/nemotron-3-nano-omni-30b-a3b-reasoning, and APIopenai-completions. Onboard workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=nvidia/nemotron-3-nano-omni-30b-a3b-reasoning ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name nemotron-omni-openclaw-audit2-0506 --agent openclaw --fresh --recreate-sandbox. First standard-prompt run, sessionnemotron-omni-openclaw-audit2-1778104704, evidence/sandbox/.openclaw/agents/main/sessions/573b888d-639f-4440-956e-8f0788d176d5.trajectory.jsonl, made malformed or unsupported tool attempts aroundexechost selection, read/etc/hostname, and stopped without completingdateoruptime. Clean retry, sessionnemotron-omni-openclaw-retry-1778104885, evidence/sandbox/.openclaw/agents/main/sessions/8a4cdc8a-0765-457f-a5ac-2432be5d4820.trajectory.jsonl, made three separate successfulexeccalls forhostname,date, anduptimewithtoolSummary.failures: 0and duration31,821ms, but the final assistant response wasNO_REPLY/thinking-only instead of the requested summary. The existingforce_nonempty_contentrequest mutation is still relevant but insufficient to pass the full OpenClaw shell-loop acceptance scenario for this Omni reasoning model.Hermes / NVIDIA Endpoints /
nvidia/nemotron-3-nano-omni-30b-a3b-reasoning-pass. Validated on 2026-05-06 onmain3477ab7da13c51749eedef1662aa4e998ae0feb2with local Hermes sandboxnemotron-omni-hermes-audit2-0506, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), providercustom, base URLhttps://inference.local/v1, and modelnvidia/nemotron-3-nano-omni-30b-a3b-reasoning. Workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=nvidia/nemotron-3-nano-omni-30b-a3b-reasoning ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name nemotron-omni-hermes-audit2-0506 --agent hermes --fresh --recreate-sandbox, then Hermes own OpenAI-compatible API inside the sandbox,POST http://127.0.0.1:18642/v1/chat/completionswithmodel: hermes-agentand the standard shell prompt. Evidence:/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.jsonrecorded three actualterminaltool calls forhostname,date, anduptime;/sandbox/.hermes/logs/agent.logrecorded main providercustom (nvidia/nemotron-3-nano-omni-30b-a3b-reasoning). The API returnedHTTP 200in28.026483swith a final assistant summary. No Hermes Nemotron affordance is justified by this row.OpenClaw / NVIDIA Endpoints /
openai/gpt-oss-120b-degraded. Validated on 2026-05-06 on currentmainca1d6b84a5c938611be412239718f1e46963d8d0after refactor: add agent-scoped model setup registry #3121 was already merged, with local sandboxgpt-oss-openclaw-audit-0506, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider routenvidia-prod, modelinference/openai/gpt-oss-120b, and APIopenai-completionsviahttps://inference.local/v1(NVIDIA Endpoints route tohttps://integrate.api.nvidia.com/v1). Onboard workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=openai/gpt-oss-120b ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name gpt-oss-openclaw-audit-0506 --agent openclaw --fresh --recreate-sandbox. One-shot workflow:openshell sandbox exec -n gpt-oss-openclaw-audit-0506 --timeout 900 -- /usr/local/bin/nemoclaw-start openclaw agent --agent main --json --thinking off --session-id gpt-oss-openclaw-oneshot-1778106366 -m <standard shell prompt>. Evidence:/sandbox/.openclaw/agents/main/sessions/8cc780fa-23ac-41dd-80c3-146393e39e00.trajectory.jsonlrecordedfinalStatus: success,timedOut: false, nopromptError,finishReason: stop, and final assistant text. Tool behavior: structured OpenAI-compatible tool calls, not raw Harmony/tool text;thinkingcontent was stored as thinking metadata, not assistant prose. Tool count: 4execattempts in one-shot (hostnamewithsecurity: allowlistdenied, then successfulhostname,date,uptimewithsecurity: full), so the target commands completed but with one extra denied retry; duration31,922ms. Multi-turn workflow used sessiongpt-oss-openclaw-multiturn-1778106429; evidence/sandbox/.openclaw/agents/main/sessions/7fb1d1b4-967c-45e9-b3bd-3e05a5989292.trajectory.jsonlrecorded turn 1hostname->HOSTNAME=gpt-oss-openclaw-audit-0506in4,767ms, then turn 2 did not re-runhostname, made one shell callecho "seen:gpt-oss-openclaw-audit-0506" > seen.txt, and finalized successfully in4,104ms. Required affordance: none. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest; there is no model-specific setup effect to express, and the only observed limitation is an OpenClaw tool-argument/security retry rather than request mutation, response normalization, or Harmony parsing.Hermes / NVIDIA Endpoints /
openai/gpt-oss-120b-pass. Validated on 2026-05-06 on currentmainca1d6b84a5c938611be412239718f1e46963d8d0with local sandboxgpt-oss-hermes-audit-0506, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), configprovider: custom,base_url: https://inference.local/v1, and modelopenai/gpt-oss-120b. Onboard workflow:NEMOCLAW_PROVIDER=build NEMOCLAW_MODEL=openai/gpt-oss-120b ./bin/nemohermes.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name gpt-oss-hermes-audit-0506 --agent hermes --fresh --recreate-sandbox; config evidence:/sandbox/.hermes/config.yaml. API workflow: Hermes own OpenAI-compatible API,POST http://127.0.0.1:18642/v1/chat/completionswithmodel: hermes-agent, not a directinference.localcurl. One-shot evidence:/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json,/sandbox/.hermes/logs/agent.log,/sandbox/.hermes/logs/errors.log, and/tmp/gateway.log; API returnedHTTP 200in7.222s,finish_reason: stop, three structuredterminaltool calls (hostname,date,uptime), successful tool results, and a final assistant summary. Multi-turn evidence: sessionapi-92e4a54a694502a8in/sandbox/.hermes/sessions/session_api-92e4a54a694502a8.jsonplus/sandbox/.hermes/state.db; turn 1 returnedHOSTNAME=gpt-oss-hermes-audit-0506in2.196s, and turn 2 returnedHTTP 200in4.011swith one structuredterminalcallecho "seen:gpt-oss-hermes-audit-0506"and no secondhostnamecall. State DB recordedmessage_count: 8,tool_call_count: 2, modelopenai/gpt-oss-120b, sourceapi_server; final response summarized theseen:output. Raw Harmony markers (<|...|>) were absent from persisted message/tool fields; reasoning content was stored separately inreasoning/reasoning_content. Latency/operability note: the first Hermes sandbox start was interrupted by local OpenShell gateway TLS/ephemeral-storage recovery and a local-image reimport, but the agent/API run passed after the sandbox reachedReady; this was local infrastructure, not model behavior. Required affordance: none. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest; no Hermes-specific setup, parser, request mutation, or response normalization is justified by this evidence.Additional GPT-OSS setup audit evidence:
2026-05-06 source/docs audit on
mainca1d6b84a5c938611be412239718f1e46963d8d0:openai/gpt-oss-120bis a curated NVIDIA Endpoints model insrc/lib/inference-config.ts. The NVIDIA Endpoints provider path resolves to chat completions (openai-completions) throughhttps://inference.local/v1inside the sandbox, while the gateway route targets NVIDIA Endpoints.src/lib/onboard-inference-probes.tsuses the generic chat-completions probe for this model;scripts/nemoclaw-start.shonly preloads the current Nemotron/DeepSeek request mutations;agents/hermes/start.shandagents/hermes/generate-config.tsadd no GPT-OSS-specific handling; andnemoclaw-blueprint/model-specific-setup/**contains no GPT-OSS manifest.External source notes: OpenAI's GPT-OSS/Harmony documentation describes Harmony as the chat/reasoning/tool-call format for raw GPT-OSS serving, and the OpenAI vLLM cookbook documents GPT-OSS serving with
--tool-call-parser openaiand--reasoning-parser openai_gptossfor OpenAI-compatible Chat Completions. NVIDIA NIM reasoning-model documentation similarly treats reasoning/parser/template behavior as serving-stack configuration. Relevant docs inspected: https://platform.openai.com/docs/models/gpt-oss, https://cookbook.openai.com/articles/openai-harmony/, https://cookbook.openai.com/articles/gpt-oss/run-vllm/, and https://docs.nvidia.com/nim/large-language-models/1.15.0/reasoning-model.html.NemoClaw runtime evidence did not show raw Harmony text from NVIDIA Endpoints for
openai/gpt-oss-120b. Both OpenClaw and Hermes received structured OpenAI-compatible tool calls with separate reasoning fields and final assistant answers after tool results. Therefore, a generic Harmony parser, shell command rewriter, or refactor: add agent-scoped model setup registry #3121 v1 registry manifest should not be added for this model/provider based on this audit. If a future provider returns raw Harmony/tool text, that would be response normalization or serving-template/parser policy, not a v1 manifest effect.OpenClaw / Gemini /
gemini-3.1-pro-preview-blocked. Re-run on 2026-05-06/2026-05-07 UTC on currentmainf586cc59131ec396cfcaab3b915ad76f001210caafter refactor: add agent-scoped model setup registry #3121 was merged, with OpenShell0.0.36and OpenClaw2026.4.24(cbcfdf6). Provider path: NemoClaw providergemini, OpenShell providergemini-api, Google OpenAI-compatible base URLhttps://generativelanguage.googleapis.com/v1beta/openai/, sandbox routehttps://inference.local/v1, model refinference/gemini-3.1-pro-preview, APIopenai-completions,supportsStore: false, and OpenClaw configthinkingDefault: off. Onboard workflow:NEMOCLAW_PROVIDER=gemini NEMOCLAW_MODEL=gemini-3.1-pro-preview ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name gemini31-pro-openclaw-audit-0506b --agent openclaw --fresh --recreate-sandbox. Standard one-shot workflow:nemoclaw-start openclaw agent --agent main --json --session-id gemini31-openclaw-oneshot-1778111108 -m <standard shell prompt>. Evidence captured before a later gateway restart removed that sandbox:/sandbox/.openclaw/agents/main/sessions/16c5d6c7-5096-4ab4-80c9-494d73d74c42.jsonland.trajectory.jsonl. Result: the model emitted structured OpenAI-compatibleexectool calls forhostname,date, anduptimeand all three tool results completed, but the continuation/final-answer request after tool results returned400 status code (no body), leaving no final assistant summary. The persisted OpenClaw session contained nothought_signatureorextra_contentfields. Multi-turn turn 1 in sessiongemini31-openclaw-multiturn-1778111208similarly made onehostnametool call and then failed with400 status code (no body), so turn 2 could not run. Follow-up retry on 2026-05-07 in sandboxgemini31-pro-openclaw-audit-0506cfirst saw stale sandbox DNS/proxy causing503 "inference service unavailable"; after./bin/nemoclaw.js internal dns setup-proxy nemoclaw gemini31-pro-openclaw-audit-0506crewired DNS to10.200.0.1:53 -> 10.42.0.17, the route was reachable again. Retry sessiongemini31-openclaw-retry-dnsfix-1778114081, log/tmp/gemini31-openclaw-retry-dnsfix-1778114081.log, evidence/sandbox/.openclaw/agents/main/sessions/gemini31-openclaw-retry-dnsfix-1778114081.jsonland.trajectory.jsonl, emitted three structuredexectool calls (hostname,date,uptime) and completed all tool results, then reproduced400 status code (no body)with an empty final assistant message; duration32,162ms; nothought_signatureorextra_contentpersisted. Final classification remains blocked by the observed OpenClaw Gemini 3.1 tool-result continuation/state-preservation failure, not by provider availability. Required affordance/fix class: OpenClaw/provider adapter response-history preservation fortool_calls[].extra_content.google.thought_signatureor equivalent Gemini 3 function-call state handling. Registry decision: do not add a refactor: add agent-scoped model setup registry #3121 v1 manifest; this is response/history preservation or adapter behavior, not declarative setup.Hermes / Gemini /
gemini-3.1-pro-preview-pass. Validated on 2026-05-07 UTC onmainf586cc59131ec396cfcaab3b915ad76f001210cawith sandboxgemini31-pro-hermes-audit-0506b, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), configprovider: custom,base_url: https://inference.local/v1, modelgemini-3.1-pro-preview, and API server forwarded athttp://127.0.0.1:8642/v1. Config evidence:/sandbox/.hermes/config.yaml; logs:/sandbox/.hermes/logs/agent.logand/sandbox/.hermes/logs/errors.log. API workflow used Hermes' own OpenAI-compatible API, not directinference.local:POST http://127.0.0.1:8642/v1/chat/completionswithmodel: hermes-agent. One-shot prompt returnedHTTP 200in22.920s,finish_reason: stop, and session headerX-Hermes-Session-Id: api-9816e26b83c423bc; evidence/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.jsonrecorded three structuredterminaltool calls (hostname,date,uptime) plus final summary. That session persistedtool_calls[].extra_content.google.thought_signatureon the first Gemini 3.1 tool call, proving Hermes preserved the Google-specific state that OpenClaw dropped. Multi-turn used the same API with explicit message history because this sandbox had no API server key configured and therefore rejectsX-Hermes-Session-Idcontinuation; turn 1 returnedHOSTNAME=gemini31-pro-hermes-audit-0506bin7.933s, and turn 2 returnedHTTP 200in12.486swith the same derived session headerapi-92e4a54a694502a8. Evidence/sandbox/.hermes/sessions/session_api-92e4a54a694502a8.jsonrecorded oneterminalcallecho 'seen:gemini31-pro-hermes-audit-0506b' > seen.txt && cat seen.txt, no secondhostnamecall, a successful tool result, and a final summary. Required affordance: none for Hermes. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.OpenClaw / Gemini /
gemini-2.5-pro-pass. Validated on 2026-05-07 UTC onmainf586cc59131ec396cfcaab3b915ad76f001210cawith live sandboxgemini25-pro-openclaw-audit-0506c, OpenShell0.0.36, OpenClaw2026.4.24(cbcfdf6), provider routegemini-api, modelinference/gemini-2.5-pro, APIopenai-completions, andthinkingDefault: off. Onboard workflow:NEMOCLAW_PROVIDER=gemini NEMOCLAW_MODEL=gemini-2.5-pro ./bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --name gemini25-pro-openclaw-audit-0506c --agent openclaw --fresh --recreate-sandbox. One-shot workflow:nemoclaw-start openclaw agent --agent main --local --json --session-id gemini25-openclaw-oneshot-c-1778113536 -m <standard shell prompt>. Evidence:/sandbox/.openclaw/agents/main/sessions/gemini25-openclaw-oneshot-c-1778113536.jsonland.trajectory.jsonlrecordedfinalStatus: success,timedOut: false, no prompt error, three separate structuredexectool calls (hostname,date,uptime),toolMetasfor those commands, and a final summary; duration26.930s. Multi-turn workflow used sessiongemini25-openclaw-multiturn-c-1778113583; evidence/sandbox/.openclaw/agents/main/sessions/gemini25-openclaw-multiturn-c-1778113583.jsonland.trajectory.jsonlrecorded turn 1hostname->HOSTNAME=gemini25-pro-openclaw-audit-0506cin25.913s, then turn 2 made oneexeccallecho 'seen:gemini25-pro-openclaw-audit-0506c' > hostname.txtwithout re-runninghostname, and returned a final summary in28.515s. No raw function-call text,thought_signature, orextra_contentfields were persisted for Gemini 2.5 in OpenClaw. Required affordance: none. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.Hermes / Gemini /
gemini-2.5-pro-pass. Validated on 2026-05-07 UTC onmainf586cc59131ec396cfcaab3b915ad76f001210cawith sandboxgemini25-pro-hermes-audit-0506b, OpenShell0.0.36, Hermes Agentv0.11.0 (2026.4.23), configprovider: custom,base_url: https://inference.local/v1, modelgemini-2.5-pro, and API server forwarded athttp://127.0.0.1:8642/v1. Config evidence:/sandbox/.hermes/config.yaml; logs:/sandbox/.hermes/logs/agent.logand/sandbox/.hermes/logs/errors.log. API workflow used Hermes' own OpenAI-compatible API,POST http://127.0.0.1:8642/v1/chat/completionswithmodel: hermes-agent. One-shot returnedHTTP 200in12.366s,finish_reason: stop, and session headerapi-9816e26b83c423bc; evidence/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.jsonrecorded three structuredterminaltool calls (hostname,date,uptime) and a final summary. Multi-turn used explicit message history; retry evidence/sandbox/.hermes/sessions/session_api-92e4a54a694502a8.jsonrecorded turn 1HOSTNAME=gemini25-pro-hermes-audit-0506b, then turn 2 made oneterminalcallecho 'seen:gemini25-pro-hermes-audit-0506b' > hostname.logwithout re-runninghostname, and returned a final summary. Turn latencies on the passing retry were5.985sand8.550s. Nothought_signatureorextra_contentfields were observed for Gemini 2.5 Hermes sessions. Operational note: an earlier 2.5 Hermes multi-turn attempt returnedHTTP 200but wrote a typoedseen:gemini2s...command while the final text claimed the correct hostname; the immediate retry passed, and this was a model/tool-argument accuracy hiccup rather than a continuation or thought-signature failure. Required affordance: none. Registry decision: no refactor: add agent-scoped model setup registry #3121 v1 manifest.Additional Gemini setup audit evidence:
mainf586cc59131ec396cfcaab3b915ad76f001210ca: Gemini curated onboarding models live insrc/lib/model-prompts.ts, includinggemini-3.1-pro-previewandgemini-2.5-pro.src/lib/onboard-providers.tswires Google Gemini as OpenAI-compatible providergemini-apiwithGEMINI_API_KEYandhttps://generativelanguage.googleapis.com/v1beta/openai/.src/lib/onboard-providers.tsmaps Gemini sandbox config to provider keyinference, primary modelinference/<model>,https://inference.local/v1,openai-completions, andinferenceCompat.supportsStore = false.src/lib/validation.tsskips the Responses API forgemini-api, andsrc/lib/onboard-inference-probes.tssends Bearer auth to the OpenAI-compatible endpoint.scripts/nemoclaw-start.sh,scripts/generate-openclaw-config.py,agents/hermes/generate-config.ts, andagents/hermes/start.shcurrently add no Gemini-specific thought-signature handling. No Gemini manifests exist undernemoclaw-blueprint/model-specific-setup/**.503 "inference service unavailable"fromhttps://inference.local/v1/chat/completions; this was cleared by refreshing the sandbox DNS/proxy with./bin/nemoclaw.js internal dns setup-proxy, after which the route was reachable and the same post-tool400 status code (no body)reproduced. Gemini 2.5 remained available during the final OpenClaw and Hermes runs.https://generativelanguage.googleapis.com/v1beta/openai/, BearerGEMINI_API_KEY, and/chat/completionsfunction calling. Google thought-signature docs state that thinking models in the Gemini 3 and 2.5 series may return thought signatures, that signatures should be passed back exactly in conversation history, and that Gemini 3 models require thought signatures during function calling or a 4xx validation error can result. For OpenAI-compatible chat completions, Google represents signatures undertool_calls[].extra_content.google.thought_signature; Gemini 3 requires the first function call signature to be returned, while Gemini 2.5 signature return is documented as optional for function calls. Docs inspected: https://ai.google.dev/gemini-api/docs/openai, https://ai.google.dev/gemini-api/docs/function-calling, https://ai.google.dev/gemini-api/docs/thought-signatures, and https://ai.google.dev/gemini-api/docs/thinking.extra_content.google.thought_signatureand passed; OpenClaw did not persistthought_signature/extra_contentand failed the post-tool continuation with Google-route400 status code (no body); a later stale-DNS/proxy503recreate was cleared and the same post-tool400reproduced. Gemini 2.5 Pro passed both OpenClaw and Hermes without a model-specific affordance. If OpenClaw Gemini 3.1 is fixed later, the fix should be scoped to Google/Gemini OpenAI-compatible tool-call state preservation or agent adapter behavior. refactor: add agent-scoped model setup registry #3121 registry v1 cannot express that class cleanly, so no manifest should be added based on this audit.Additional Nemotron setup audit evidence:
main3477ab7da13c51749eedef1662aa4e998ae0feb2: current OpenClaw behavior remains a runtime request mutation inscripts/nemoclaw-start.sh, not a refactor: add agent-scoped model setup registry #3121 v1 manifest. The preload wraps Node HTTP(S)POST /v1/chat/completionscalls and injectschat_template_kwargs.force_nonempty_content = truefor model IDs matching/nemotron/i. Registry v1 can match exact route metadata and apply config/plugin effects, but it cannot express request-body mutations, so no Nemotron registry manifest should be added yet. Future registry support should model request mutations as an explicit OpenClaw-owned effect and should prefer exact supported IDs or provider-class policy once the provider boundary is proven.nvidia/nemotron-3-super-120b-a12bandnvidia/nemotron-3-nano-omni-30b-a3b-reasoning, the managed local vLLM Linux profilenvidia/NVIDIA-Nemotron-3-Nano-4B-FP8, and local/custom routes whose selected model ID containsnemotron, including the Ollama defaultnemotron-3-nano:30bwhen OpenClaw sends chat-completions for that model. The broad regex is not ideal long-term, but narrowing it now would drop documented compatible-endpoint/NIM/vLLM Nemotron routes and the historicalnvidia/llama-3.3-nemotron-super-49b-v1failure family without a replacement request-mutation registry capability.nvidia/nemotron-3-nano-omni-30b-a3b-reasoningreturned HTTP 200 withcontent: nulland reasoning-only output withoutforce_nonempty_content, then returned non-nullcontentwithchat_template_kwargs.force_nonempty_content = true.nvidia/nemotron-3-super-120b-a12breturned HTTP 200 with non-empty content in this simple raw probe, but historical issues openclaw agent returns empty content when model makes tool calls instead of text responses #1193 and Nemotron-3-Super:120b and nemoclaw stalls due to OpenClaw's interpretation of end-of-turn #2051 plus the OpenClaw tool-bearing request shape still justify retaining the mutation until a full agent run proves it obsolete.qwen2.5:0.5bandnemotron-mini:latestinstalled, notnemotron-3-nano:30b, and no NVIDIA GPU was available for local vLLM. Code inspection shows OpenClaw would currently send the extrachat_template_kwargsfield for Ollama model IDs containingnemotron; future exact manifests/request-mutation metadata should exclude Ollama unless a local Ollama run proves it both accepts and needs the field.Additional DeepSeek follow-up evidence:
2026-05-06 multi-turn continuation attempt —
blockedby NVIDIA Endpoints rate limiting, not by observed agent/session request-shape behavior. OpenClaw sandboxdeepseek-openclaw-audit-0506onmain97ae39d4a16472eabb81d0c2e82e36eb6a62d6e9completed turn 1 in persistent sessiondeepseek-openclaw-multiturn-1778091696with oneexectool call (hostname) andHOSTNAME=deepseek-openclaw-audit-0506; turn 2 and retry both failed before any model output/tool call with provider429 status code (no body). Evidence:/sandbox/.openclaw/agents/main/sessions/21f437f3-5c42-4b49-b2d1-08d3def4b6b2.trajectory.jsonl,/tmp/gateway.log,/tmp/nemoclaw-start.log. Hermes sandboxdeepseek-hermes-audit-0506retried the same multi-turn shape through Hermes own API with conversationdeepseek-hermes-multiturn-1778092277; turn 1 failed before terminal tool use withHTTP 429: Too Many Requestsafter 3 retries. Evidence:/sandbox/.hermes/sessions/request_dump_api-57283730231debee_20260506_183130_753804.json,/sandbox/.hermes/logs/agent.log,/sandbox/.hermes/logs/errors.log. Re-run needed after endpoint quota resets to prove multi-turn continuation.2026-05-06 19:19-19:26 UTC retry — OpenClaw multi-turn continuation
pass; Hermes multi-turn stillblockedby NVIDIA Endpoints 429. Readiness first cleared at2026-05-06T19:19:49Zwith rawPOST https://inference.local/v1/chat/completionsreturningHTTP 200,nvcf-status: fulfilled, and contentOK. OpenClaw sandboxdeepseek-openclaw-audit-0506then completed persistent sessiondeepseek-openclaw-multiturn-pass-1778095200: turn 1 made oneexectool call forhostnameand returnedHOSTNAME=deepseek-openclaw-audit-0506; turn 2 reused that hostname without re-runninghostname, made oneexectool callprintf 'seen:deepseek-openclaw-audit-0506\n', and finalized successfully. Evidence:/sandbox/.openclaw/agents/main/sessions/550bbc05-d91d-4c54-a127-25a61f9c24e3.jsonland.trajectory.jsonl(finalStatus: success,toolMetascontains theprintfcommand). Hermes sandboxdeepseek-hermes-audit-0506turn 1 through Hermes own API returnedHOSTNAME=deepseek-hermes-audit-0506; turn 2 and a retry failed before final continuation withHTTP 429: Too Many Requests, and raw one-token readiness was also back toHTTP 429at2026-05-06T19:26:05Z. Evidence:/sandbox/.hermes/sessions/session_api-755da3cf323317c1.json,/sandbox/.hermes/sessions/request_dump_api-f14936ec1fec0f20_20260506_192147_850678.json,/sandbox/.hermes/logs/agent.log,/sandbox/.hermes/logs/errors.log.Additional OpenAI runtime audit evidence (refreshed 2026-05-07 UTC):
mainat3351fbdd4eb7d9b80ec471545083956327da2b10; PR refactor: add agent-scoped model setup registry #3121 is merged, so currentmainwas used. Checkout was clean onmain...origin/main. Identity/signing audit before any commit path: globaluser.name Aaron Erickson,user.email aerickson@nvidia.com, SSH signing key configured,commit.gpgsign true. No repo code changes were made.v0.0.35-40-g3351fbdd; OpenShell0.0.36; OpenClaw2026.4.24 (cbcfdf6); Hermes Agentv0.11.0 (2026.4.23)with OpenAI SDK2.24.0.v1/responsesandv1/chat/completionswith function calling; OpenAI documentsprevious_response_idwithstore: true, and stateless reasoning requires encrypted reasoning items; direct live probe confirmedgpt-5.4-pro-2026-03-05works on/v1/responsesbut returns 404 on/v1/chat/completions../bin/nemoclaw.js onboard --non-interactive --yes --yes-i-accept-third-party-software --no-gpu --agent openclaw --fresh --recreate-sandboxwithNEMOCLAW_PROVIDER=openai, thenopenshell sandbox exec ... /usr/local/bin/nemoclaw-start openclaw agent --agent main --json --thinking off --session-id <id> -m <prompt>. Hermes used./bin/nemohermes.js onboard ... --agent hermes, then Hermes' own local API athttp://127.0.0.1:18642:/v1/chat/completionswithmodel: hermes-agentfor the one-shot prompt, and/v1/responses+previous_response_idfor same-conversation continuation.openai-api-> sandbox provideropenai/<model>viahttps://inference.local/v1, with generatedopenclaw.jsonapi: openai-responsesfor all four models. Hermes generatedmodel.provider: custom,model.base_url: https://inference.local/v1, andmodel.default: <model>; Hermes did not receive an OpenAI-specific provider/API-mode config from NemoClaw.tls handshake eofand later k3s disk-pressure/image-pull recovery during sandbox creation; those were resolved by gateway restart and pruning stale generated sandbox images. They are not counted as model failures. OpenClaw mini had one denied allowlist attempt before retryinghostnamewith full exec in the multi-turn first turn; it still completed correctly and is not a model-specific blocker.gpt-5.4openai-api; sandboxopenai/gpt-5.4;openai-responsespassoa54-openclaw-audit-0507; one-shot/sandbox/.openclaw/agents/main/sessions/92728641-7ca3-4898-bae5-6620d5e2c1eb.trajectory.jsonl; multi/sandbox/.openclaw/agents/main/sessions/0a61dbbb-217f-4e23-8376-328e964fa07c.trajectory.jsonlexeccalls, not raw tool text. One-shot issued 3 shell calls (hostname,date,uptime) and produced final summary. Observed one-shot duration about 33.1s from CLI output.hostnameand repliedHOSTNAME=oa54-openclaw-audit-0507; turn 2 did not re-runhostname, ran a shell command forseen:oa54-openclaw-audit-0507, then summarized.gpt-5.4/v1/chat/completionsand/v1/responses; upstream configcustom->https://inference.local/v1, modelgpt-5.4passoa54-hermes-audit-0507; one-shot/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json; multi/sandbox/.hermes/sessions/session_44c94f31-940a-4887-919d-40f01d0328ad.json; logs/sandbox/.hermes/logs/agent.log,/sandbox/.hermes/logs/errors.logterminalcalls. One-shot issued 3 terminal calls (hostname,date,uptime) and returned a final assistant summary. Latency about 12.1s./v1/responsescontinuation issued 2 terminal calls total:hostname, thenprintf ... seen:oa54-hermes-audit-0507; no hostname re-run in turn 2; final summary present.API_SERVER_KEY, but Responses continuation works. No registry manifest.gpt-5.4-miniopenai-api; sandboxopenai/gpt-5.4-mini;openai-responsespassoa54mini-openclaw-audit-0507; one-shot/sandbox/.openclaw/agents/main/sessions/7e0d64e2-89df-41c4-8100-c884f95c159f.trajectory.jsonl; multi/sandbox/.openclaw/agents/main/sessions/46776da6-ee31-4513-9091-bc6dd9d6ebe0.trajectory.jsonlexeccalls. One-shot issued 3 shell calls and summarized hostname/date/uptime. Runtime about 6.3s.hostnameand got allowlist denial, retried fullhostname, then turn 2 ranprintf 'seen:%s\n' 'oa54mini-openclaw-audit-0507'without re-running hostname; final summary present.gpt-5.4-mini/v1/chat/completionsand/v1/responses; upstream configcustom->https://inference.local/v1, modelgpt-5.4-minipassoa54mini-hermes-audit-0507; one-shot/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json; multi/sandbox/.hermes/sessions/session_122350b7-f803-4776-9a68-947ee6d78231.json; logs/sandbox/.hermes/logs/agent.log,/sandbox/.hermes/logs/errors.logskill_viewprelude, then final summary. Latency about 9.1s.hostname, then wroteseen:oa54mini-hermes-audit-0507to/tmp/seen_hostname.txtwithout re-running hostname; final summary present.gpt-5.4-nanoopenai-api; sandboxopenai/gpt-5.4-nano;openai-responsespassoa54nano-openclaw-audit-0507; one-shot/sandbox/.openclaw/agents/main/sessions/303e744e-7934-4a13-99cd-cc59dab18478.trajectory.jsonl; multi/sandbox/.openclaw/agents/main/sessions/3b89aa65-e644-4c15-bc49-ec1dc0a2adf3.trajectory.jsonlexeccalls. One-shot issued 3 shell calls and summarized hostname/date/uptime. Runtime about 6.9s.hostname; turn 2 ranecho seen:oa54nano-openclaw-audit-0507without re-running hostname; final summary present.gpt-5.4-nano/v1/chat/completionsand/v1/responses; upstream configcustom->https://inference.local/v1, modelgpt-5.4-nanopassoa54nano-hermes-audit-0507; one-shot/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json; multi/sandbox/.hermes/sessions/session_1769f4fb-5ef0-4bed-ba6b-47d57c6d3366.json; logs/sandbox/.hermes/logs/agent.log,/sandbox/.hermes/logs/errors.logterminalcalls. One-shot issued 3 terminal calls and summarized hostname/date/uptime. Latency about 8.3s.hostname, thenprintf 'seen:%s\n' 'oa54nano-hermes-audit-0507'; no hostname re-run in turn 2; final summary present.gpt-5.4-pro-2026-03-05openai-api; sandboxopenai/gpt-5.4-pro-2026-03-05;openai-responsesblockedoa54pro-openclaw-audit-0507; one-shot/sandbox/.openclaw/agents/main/sessions/73b6a9aa-80b4-4991-91fa-7ce77c56e880.trajectory.jsonl; multi turn-1/sandbox/.openclaw/agents/main/sessions/fd0ac825-0b24-4280-8245-b3c2e6fe45ad.trajectory.jsonlexec hostnametool call, not raw text, but after the tool result OpenAI returned404 Item with id ... not found. Items are not persisted when store is set to false. One-shot stopped after 1 tool call; nodate, nouptime, no final assistant summary. CLI returned nonzero; latency about 44.2s.hostnamecall, so turn 2 was not meaningful.store:truewhen carrying response item ids, or encrypted reasoning items when usingstore:false. This is provider transport/request-response behavior, not a JSON setup manifest. #3121 registry v1 cannot express it.gpt-5.4-pro-2026-03-05custompath calledhttps://inference.local/v1/chat/completionswith modelgpt-5.4-pro-2026-03-05blockedoa54pro-hermes-audit-0507; one-shot/sandbox/.hermes/sessions/session_api-9816e26b83c423bc.json; multi/sandbox/.hermes/sessions/session_fbb5c5f1-3ee5-4929-85b1-07ca0bd0f953.json; request dumps under/sandbox/.hermes/sessions/request_dump_*042156*.json,*042204*.json,*042215*.json; logs/sandbox/.hermes/logs/agent.log,/sandbox/.hermes/logs/errors.logHTTP 404: This is not a chat model and thus not supported in the v1/chat/completions endpoint. One-shot local API status was 200 but semantic result is blocked. Latency about 9.8s./v1/responsescontinuation also failed before any tool call because the Hermes agent's upstream model call still used/v1/chat/completions; both turns returned the same 404 error text.customagainsthttps://inference.local) but the failure mode differs: OpenAI pro returns model/endpoint 404 rather than provider-policy 403. This is provider/agent config work, not a per-model setup manifest. #3121 registry v1 cannot express it.Overall OpenAI verdict:
gpt-5.4,gpt-5.4-mini, andgpt-5.4-nanoare valid curated agent models for both OpenClaw and Hermes with no model-specific registry affordance.gpt-5.4-pro-2026-03-05is blocked on both surfaces for different OpenAI Responses integration reasons: OpenClaw reaches Responses and tool calls but mishandles reasoning/state continuation withstore:false; Hermes routes through a custom chat-completions upstream path that the pro snapshot rejects. No code change was made in this audit; if fixed later, the work should be scoped to provider transport/API mode or agent adapter behavior, not registry v1 manifests.Initial risk classification
Already has model-aware behavior in code
moonshotai/kimi-k2.6pass-with-affordance; fixed by fix: support reasoning models in the OpenClaw harness #3046 and tracked for registry refactor in refactor: add agent-scoped model setup registry #3121.pass; validated on PR refactor: add agent-scoped model setup registry #3121 headbe8c398bdaba7e1b9d86501515f5ec1ece6a4f3fwith no Hermes-specific affordance needed.deepseek-ai/deepseek-v4-propass-with-affordance; exact-model runtime request mutation inscripts/nemoclaw-start.shinjectschat_template_kwargs.thinking = falsefor/v1/chat/completions.pass; custom chat-completions path works with no DeepSeek-specific Hermes manifest or runtime shim.src/lib/onboard-inference-probes.tsor adjacent validation-policy metadata.chat_template_kwargs.force_nonempty_content = truefor model IDs matchingnemotron.pass-with-affordance, Hermes Super ispass, Hermes Omni ispass, and OpenClaw Omni isblockedby model/runtime response behavior (NO_REPLY/thinking-only final after tool results), not by infrastructure.High-priority discovery targets
Qwen/Qwen3.6-27B-FP8through managed local vLLMopenai/gpt-oss-120bthrough NVIDIA Endpointsblockedby tool-result continuation/state handling (400 status code (no body)after structured tool calls; stale-DNS/proxy503cleared on retry), Hermespasswithextra_content.google.thought_signaturepreserved. Gemini 2.5 Pro: OpenClawpass, Hermespass.z-ai/glm-5.1minimaxai/minimax-m2.7Provider-class policy, not necessarily model-specific setup
/v1/chat/completions/v1/chat/completions/v1/chat/completions/v1/chat/completionstoolscapability gateRequired audit scenarios
Each model/provider/agent combination should be classified with evidence from a repeatable scenario set.
Baseline chat
Shell tool loop
Use a standard prompt such as:
Required checks:
hostname; date; uptimecommand appears unless that is the explicit expected behavior for the test.promptError.Multi-turn continuation
Sub-agent delegation
sessions_spawnrequest is structured correctly.Hermes path
Performance and operability
Track at least:
Result states
Every row in the audit matrix should end in one of these states:
pass: works without model-specific changespass-with-affordance: works with a documented model/provider affordancedegraded: usable but has documented limitations or performance concernsblocked: cannot complete required scenarios; follow-up issue requiredunsupported: not a supported target for this agent surfacenot-yet-run: still pendingEvidence requirements
Every completed row should include:
Acceptance criteria for this tracker
Non-goals
Related work