fix(inference): apply model-specific compat during inference set for z-ai/glm-5.1 by hunglp6d · Pull Request #3778 · NVIDIA/NemoClaw

hunglp6d · 2026-05-19T01:34:38Z

Summary

nemoclaw inference set clears model compat flags (delete firstExistingModel.compat)
but never consults the nemoclaw-blueprint/model-specific-setup/ registry to re-apply
them for the target model. When switching to z-ai/glm-5.1, the agent turn hangs
because OpenClaw sends max_completion_tokens — a parameter the NVIDIA-proxied GLM
endpoint does not support — instead of max_tokens.
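The clear-then-reapply flow described above can be sketched roughly as follows. This is only an illustration: `ModelEntry`, `CompatLookup`, and `switchModel` are hypothetical names, and the real openclaw.json schema is not shown in this PR.

```typescript
// Hypothetical shapes; the real openclaw.json schema is not shown in this PR.
type Compat = Record<string, unknown>;

interface ModelEntry {
  id: string;
  compat?: Compat;
}

// Stand-in for a registry lookup: returns compat flags for a model, if any are registered.
type CompatLookup = (modelId: string) => Compat | undefined;

function switchModel(
  entry: ModelEntry,
  targetModel: string,
  lookup: CompatLookup,
): ModelEntry {
  const next: ModelEntry = { ...entry, id: targetModel };

  // Behaviour described in the summary: flags from the previous model are dropped.
  delete next.compat;

  // The step this PR adds: re-apply flags registered for the target model.
  const compat = lookup(targetModel);
  if (compat !== undefined) {
    next.compat = { ...compat };
  }
  return next;
}
```

Without the second step, a switch to z-ai/glm-5.1 would leave the entry with no compat block at all, which is the state the summary describes.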

Changes

- New: nemoclaw-blueprint/model-specific-setup/openclaw/glm-5.1-managed-inference.json
  declares maxTokensField: "max_tokens" and requiresStringContent: true for
  z-ai/glm-5.1 through managed inference.local, matching the pattern established
  by the existing kimi-k2.6-managed-inference.json manifest.
- New: src/lib/inference/model-specific-setup.ts reads model-specific setup
  manifests from the blueprint registry on disk and returns matching OpenClaw compat
  effects. Mirrors the Python registry logic in generate-openclaw-config.py.
- Modified: src/lib/actions/inference-set.ts now calls loadOpenClawModelCompat()
  after patching openclaw.json to apply model-specific compat from matching
  manifests. This ensures runtime switches receive the same compat flags that
  generate-openclaw-config.py applies at build time.
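A minimal sketch of the matching and merging step a loader like loadOpenClawModelCompat() might perform once manifests have been parsed. The `SetupManifest` fields (`agent`, `model`, `compat`) and the `resolveCompat` name are assumptions for illustration; the actual manifest schema and the disk lookup are not shown in this PR, so file I/O is left out.

```typescript
// Hypothetical manifest shape; the real schema is not shown in this PR.
interface SetupManifest {
  agent: string; // e.g. "openclaw"
  model: string; // e.g. "z-ai/glm-5.1"
  compat: Record<string, unknown>;
}

// Merge compat flags from every manifest that matches the agent and model.
// Later manifests override earlier ones on key conflicts.
function resolveCompat(
  manifests: SetupManifest[],
  agent: string,
  model: string,
): Record<string, unknown> | undefined {
  const merged: Record<string, unknown> = {};
  let matched = false;

  for (const manifest of manifests) {
    if (manifest.agent === agent && manifest.model === model) {
      Object.assign(merged, manifest.compat);
      matched = true;
    }
  }

  // undefined signals "no model-specific compat", so callers can leave compat unset.
  return matched ? merged : undefined;
}
```

Returning `undefined` rather than an empty object keeps the "no manifest" case distinguishable from "manifest with no flags", which matters if the caller deletes the compat block first.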

Root cause

The openclaw-inference-switch-e2e nightly job switches a running OpenClaw sandbox
from nvidia/nemotron-3-super-120b-a12b to nvidia-prod / z-ai/glm-5.1. The
inference.local PONG test passes (direct curl with max_tokens), but the
openclaw agent turn times out after 120 s with empty output (exit 124) because
OpenClaw's transport layer sends max_completion_tokens by default for models
without a maxTokensField compat override.
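The field selection described in this root cause can be illustrated with a small sketch. This is not OpenClaw's transport code; `CompatFlags` and `buildTokenLimit` are hypothetical names, and the default shown is only the one this PR describes.

```typescript
type MaxTokensField = "max_tokens" | "max_completion_tokens";

// Hypothetical subset of the compat flags discussed in this PR.
interface CompatFlags {
  maxTokensField?: MaxTokensField;
}

// Pick the request field used for the token limit.
// Without an override, the default described above is max_completion_tokens.
function buildTokenLimit(
  compat: CompatFlags | undefined,
  limit: number,
): Record<string, number> {
  const field: MaxTokensField = compat?.maxTokensField ?? "max_completion_tokens";
  return { [field]: limit };
}

// No compat override: the request carries max_completion_tokens.
const withoutCompat = buildTokenLimit(undefined, 100);

// With the GLM-5.1 manifest applied: the request carries max_tokens.
const withCompat = buildTokenLimit({ maxTokensField: "max_tokens" }, 100);
```

This matches the evidence pattern: the direct curl test sends max_tokens and passes, while the agent turn runs without an override and uses the other field.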

Evidence

| Signal | Result |
| --- | --- |
| `check_sandbox_inference` (PONG via curl with `max_tokens: 100`) | ✅ PASS |
| `check_openclaw_agent_turn` (openclaw agent through gateway) | ❌ exit 124, empty reply |
| Kimi K2.6 manifest with identical compat flags | exists and passing |
| `inference-set.ts` reads model-specific registry | ❌ missing |

Validation

A focused custom-e2e.yaml validation workflow was prepared but could not be pushed
because the available token lacks the workflow scope required to create workflow
files. The fix can be validated by re-running openclaw-inference-switch-e2e on this
branch:

gh workflow run nightly-e2e.yaml --repo NVIDIA/NemoClaw \
  --ref fix/nightly-e2e-glm-compat-inference-set-5a03166 \
  -f jobs=openclaw-inference-switch-e2e

Original failing run: 26068303685 on 5a03166

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Signed-off-by: Hung Le <hple@nvidia.com>

Fixes #3779

…z-ai/glm-5.1

The `nemoclaw inference set` command clears model compat flags but never consults the blueprint model-specific-setup registry to re-apply them. When switching to z-ai/glm-5.1 (which requires maxTokensField=max_tokens and requiresStringContent=true), the agent turn hangs because OpenClaw sends max_completion_tokens, a parameter the NVIDIA-proxied GLM endpoint does not support.

Add a glm-5.1-managed-inference manifest and teach inference-set.ts to read matching manifests at switch time so runtime switches receive the same compat flags that generate-openclaw-config.py applies at build time.

Fixes openclaw-inference-switch-e2e nightly failure (run 26068303685).

Signed-off-by: Hung Le <hple@nvidia.com>

copy-pr-bot · 2026-05-19T01:34:42Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-19T01:34:47Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a2aeebb3-c39e-4212-9fbe-0733520dd365

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions · 2026-05-19T01:36:28Z

E2E Advisor Recommendation

Required E2E: openclaw-inference-switch-e2e
Optional E2E: kimi-inference-compat-e2e, messaging-compatible-endpoint-e2e

Dispatch hint: openclaw-inference-switch-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

openclaw-inference-switch-e2e (~45 min): Directly exercises nemoclaw inference set against a running OpenClaw sandbox, verifies OpenShell route, openclaw.json patching, config hash, and live requests. The existing script defaults NEMOCLAW_SWITCH_MODEL to z-ai/glm-5.1, matching this PR’s new GLM-5.1 compat manifest.

Optional E2E

kimi-inference-compat-e2e (~45 min): Useful regression check for the existing OpenClaw model-specific setup registry path because the new runtime loader reads the same manifest directory that already contains Kimi K2.6 compatibility assets.
messaging-compatible-endpoint-e2e (~45 min): Adjacent confidence for managed inference.local OpenAI-compatible provider shape and OpenClaw agent requests through the gateway provider route, though it does not specifically validate GLM-5.1 model compat.

New E2E recommendations

openclaw-model-compat (high): The existing OpenClaw inference switch E2E uses GLM-5.1 by default and performs live request validation, but it does not explicitly assert that openclaw.json contains the merged compat flags from the model-specific setup manifest after nemoclaw inference set.
- Suggested test: Extend test/e2e/test-openclaw-inference-switch.sh to assert the switched provider model has compat.requiresStringContent=true and compat.maxTokensField="max_tokens" for z-ai/glm-5.1.
model-specific-setup-runtime-loader (medium): The new loadOpenClawModelCompat() filesystem lookup has several deployment-sensitive search paths and merge semantics that are not covered by an existing E2E name.
- Suggested test: Add a lightweight runtime-loader E2E or expand the OpenClaw inference switch E2E to prove model-specific setup manifests are available both from source checkout and installed/compiled CLI locations.
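The first suggested assertion could look roughly like the check below, applied to the switched model entry after it has been parsed out of openclaw.json. The `SwitchedModel` shape and the `missingCompat` helper are hypothetical; where the entry lives inside openclaw.json is not shown in this PR, and the real E2E is a shell script.

```typescript
// Hypothetical view of the switched model entry from openclaw.json.
interface SwitchedModel {
  id: string;
  compat?: {
    maxTokensField?: string;
    requiresStringContent?: boolean;
  };
}

// Return a list of problems; an empty list means the expected GLM-5.1 flags are present.
function missingCompat(model: SwitchedModel): string[] {
  const problems: string[] = [];
  if (model.compat?.maxTokensField !== "max_tokens") {
    problems.push('compat.maxTokensField should be "max_tokens"');
  }
  if (model.compat?.requiresStringContent !== true) {
    problems.push("compat.requiresStringContent should be true");
  }
  return problems;
}
```

Reporting all problems at once, rather than failing on the first, would make a nightly log easier to read when both flags are missing.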

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: openclaw-inference-switch-e2e

hunglp6d · 2026-05-25T08:23:29Z

Closing — openclaw-inference-switch-e2e is green on the latest nightly. Treating as flaky.

hunglp6d mentioned this pull request May 19, 2026

nightly-e2e: openclaw-inference-switch-e2e fails — z-ai/glm-5.1 missing model compat in inference set #3779

Closed

2 tasks

wscurran mentioned this pull request May 19, 2026

Architecture: organize model-specific sandbox setup #3120

Closed

wscurran added ci-failure Auto-created by nemoclaw-diagnosis skill configuration integration: openclaw OpenClaw integration behavior and removed ci-failure Auto-created by nemoclaw-diagnosis skill labels May 19, 2026

wscurran mentioned this pull request May 19, 2026

Model performance / capability audit across supported agents #3123

Open

hunglp6d closed this May 25, 2026

wscurran added area: inference Inference routing, serving, model selection, or outputs bug-fix PR fixes a bug or regression feature PR adds or expands user-visible functionality and removed fix feature PR adds or expands user-visible functionality labels Jun 3, 2026

hunglp6d wants to merge 1 commit into main from fix/nightly-e2e-glm-compat-inference-set-5a03166
