fix(inference): apply model-specific compat during inference set for z-ai/glm-5.1 #3778

Closed
hunglp6d wants to merge 1 commit into main from fix/nightly-e2e-glm-compat-inference-set-5a03166

hunglp6d (Contributor) commented May 19, 2026

Summary

The nemoclaw inference set command clears model compat flags (delete firstExistingModel.compat)
but never consults the nemoclaw-blueprint/model-specific-setup/ registry to re-apply
them for the target model. When switching to z-ai/glm-5.1, the agent turn hangs
because OpenClaw sends max_completion_tokens, a parameter the NVIDIA-proxied GLM
endpoint does not support, instead of max_tokens.
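
A minimal sketch of the intended switch behavior: clear the previous model's compat, then re-apply whatever the registry declares for the target model. The OpenClawModelEntry and OpenClawCompat shapes, the switchModel function, and the in-memory registry are hypothetical stand-ins for openclaw.json and the manifest directory, not the PR's actual code.

```typescript
// Hypothetical shapes; the real openclaw.json schema may differ.
interface OpenClawCompat {
  maxTokensField?: "max_tokens" | "max_completion_tokens";
  requiresStringContent?: boolean;
}

interface OpenClawModelEntry {
  id: string;
  compat?: OpenClawCompat;
}

// Hypothetical lookup standing in for the model-specific-setup registry.
type CompatLookup = (modelId: string) => OpenClawCompat | undefined;

// Clear the previous model's compat (what inference set already does), then
// re-apply the target model's compat (the step that was missing).
function switchModel(
  entry: OpenClawModelEntry,
  targetModelId: string,
  lookupCompat: CompatLookup,
): OpenClawModelEntry {
  const next: OpenClawModelEntry = { ...entry, id: targetModelId };
  delete next.compat; // stale flags from the old model must not carry over
  const compat = lookupCompat(targetModelId);
  if (compat !== undefined && Object.keys(compat).length > 0) {
    next.compat = { ...compat };
  }
  return next;
}

// Usage with an illustrative in-memory registry.
const registry: Record<string, OpenClawCompat> = {
  "z-ai/glm-5.1": { maxTokensField: "max_tokens", requiresStringContent: true },
};
const lookup: CompatLookup = (id) => registry[id];

const before: OpenClawModelEntry = { id: "nvidia/nemotron-3-super-120b-a12b" };
const afterSwitch = switchModel(before, "z-ai/glm-5.1", lookup);
const switchedBack = switchModel(afterSwitch, "nvidia/nemotron-3-super-120b-a12b", lookup);
```

Returning a new entry rather than mutating the input keeps the clear-then-apply order explicit and makes switching back to a model with no manifest leave compat unset.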

Changes

  • New: nemoclaw-blueprint/model-specific-setup/openclaw/glm-5.1-managed-inference.json
    — declares maxTokensField: "max_tokens" and requiresStringContent: true for
    z-ai/glm-5.1 through managed inference.local, matching the pattern established
    by the existing kimi-k2.6-managed-inference.json manifest.

  • New: src/lib/inference/model-specific-setup.ts — reads model-specific setup
    manifests from the blueprint registry on disk and returns matching OpenClaw compat
    effects. Mirrors the Python registry logic in generate-openclaw-config.py.

  • Modified: src/lib/actions/inference-set.ts — after patching openclaw.json,
    calls loadOpenClawModelCompat() to apply model-specific compat from matching
    manifests. This ensures runtime switches receive the same compat flags that
    generate-openclaw-config.py applies at build time.
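
A minimal sketch of a manifest loader in the spirit of the new module, assuming a hypothetical manifest shape ({ modelId, compat }) and a single directory argument. The real manifests (for example kimi-k2.6-managed-inference.json) may use different keys, and loadOpenClawModelCompat() reportedly checks several search paths; loadCompatForModel here is an illustrative name, not the PR's API.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical manifest shape, for illustration only.
interface CompatManifest {
  modelId: string;
  compat: Record<string, unknown>;
}

function isCompatManifest(value: unknown): value is CompatManifest {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.modelId === "string" && typeof v.compat === "object" && v.compat !== null;
}

// Read every *.json file in dir, in sorted order so the merge is deterministic,
// and shallow-merge the compat of each manifest whose modelId matches.
function loadCompatForModel(dir: string, modelId: string): Record<string, unknown> {
  const merged: Record<string, unknown> = {};
  if (!fs.existsSync(dir)) return merged; // no registry on disk: apply nothing
  const files = fs.readdirSync(dir).filter((name) => name.endsWith(".json")).sort();
  for (const name of files) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(fs.readFileSync(path.join(dir, name), "utf8"));
    } catch {
      continue; // skip malformed manifests instead of failing the whole switch
    }
    if (isCompatManifest(parsed) && parsed.modelId === modelId) {
      Object.assign(merged, parsed.compat);
    }
  }
  return merged;
}

// Usage: write two illustrative manifests to a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "model-setup-"));
fs.writeFileSync(
  path.join(dir, "glm-5.1.json"),
  JSON.stringify({ modelId: "z-ai/glm-5.1", compat: { maxTokensField: "max_tokens", requiresStringContent: true } }),
);
fs.writeFileSync(
  path.join(dir, "other.json"),
  JSON.stringify({ modelId: "example/other-model", compat: { maxTokensField: "max_completion_tokens" } }),
);
const glmCompat = loadCompatForModel(dir, "z-ai/glm-5.1");
const unknownCompat = loadCompatForModel(dir, "example/not-registered");
fs.rmSync(dir, { recursive: true, force: true });
```

Skipping unreadable manifests is a design choice for this sketch; a stricter loader could surface the parse error instead.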

Root cause

The openclaw-inference-switch-e2e nightly job switches a running OpenClaw sandbox
from nvidia/nemotron-3-super-120b-a12b to nvidia-prod / z-ai/glm-5.1. The
inference.local PONG test passes (direct curl with max_tokens), but the
openclaw agent turn times out after 120 s with empty output (exit 124) because
OpenClaw's transport layer sends max_completion_tokens by default for models
without a maxTokensField compat override.
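
To make the failure mode concrete, here is a sketch of how the two compat flags could shape an OpenAI-style chat completions request body. This is not OpenClaw's transport code; the Compat, ChatMessage, and buildChatBody names are hypothetical, and the "\n" separator used when flattening content parts is an assumption.

```typescript
// Hypothetical compat shape carrying the two flags named in this PR.
interface Compat {
  maxTokensField?: "max_tokens" | "max_completion_tokens";
  requiresStringContent?: boolean;
}

interface TextPart {
  type: "text";
  text: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | TextPart[];
}

// Without an override the token limit goes into max_completion_tokens,
// matching the default behavior described above.
function buildChatBody(
  model: string,
  messages: ChatMessage[],
  maxTokens: number,
  compat: Compat = {},
): Record<string, unknown> {
  const field = compat.maxTokensField ?? "max_completion_tokens";
  const outMessages = compat.requiresStringContent
    ? messages.map((m) => ({
        ...m,
        // Flatten text parts into one string; the "\n" separator is an assumption.
        content: typeof m.content === "string" ? m.content : m.content.map((p) => p.text).join("\n"),
      }))
    : messages;
  return { model, messages: outMessages, [field]: maxTokens };
}

const msgs: ChatMessage[] = [{ role: "user", content: [{ type: "text", text: "Reply with PONG" }] }];
const defaultBody = buildChatBody("z-ai/glm-5.1", msgs, 100);
const glmBody = buildChatBody("z-ai/glm-5.1", msgs, 100, {
  maxTokensField: "max_tokens",
  requiresStringContent: true,
});
```

defaultBody carries max_completion_tokens and array content, the shape the GLM endpoint reportedly rejects; glmBody carries max_tokens and plain string content, like the curl PONG check.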

Evidence

| Signal | Result |
|---|---|
| check_sandbox_inference (PONG via curl with max_tokens: 100) | ✅ PASS |
| check_openclaw_agent_turn (openclaw agent through gateway) | ❌ exit 124, empty reply |
| Kimi K2.6 manifest with identical compat flags | exists and passing |
| inference-set.ts reads model-specific registry | ❌ missing |

Validation

A focused custom-e2e.yaml validation workflow was prepared but could not be pushed
because the available token lacks the workflow scope required to create workflow
files. The fix can be validated by re-running openclaw-inference-switch-e2e on this
branch:

gh workflow run nightly-e2e.yaml --repo NVIDIA/NemoClaw \
  --ref fix/nightly-e2e-glm-compat-inference-set-5a03166 \
  -f jobs=openclaw-inference-switch-e2e

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Signed-off-by: Hung Le <hple@nvidia.com>

Fixes #3779

…z-ai/glm-5.1

The `nemoclaw inference set` command clears model compat flags but never
consults the blueprint model-specific-setup registry to re-apply them.
When switching to z-ai/glm-5.1 (which requires maxTokensField=max_tokens
and requiresStringContent=true), the agent turn hangs because OpenClaw
sends max_completion_tokens — a parameter the NVIDIA-proxied GLM endpoint
does not support.

Add a glm-5.1-managed-inference manifest and teach inference-set.ts to
read matching manifests at switch time so runtime switches receive the
same compat flags that generate-openclaw-config.py applies at build time.

Fixes openclaw-inference-switch-e2e nightly failure (run 26068303685).

Signed-off-by: Hung Le <hple@nvidia.com>

copy-pr-bot Bot commented May 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai Bot (Contributor) commented May 19, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a2aeebb3-c39e-4212-9fbe-0733520dd365

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions (Contributor) commented

E2E Advisor Recommendation

Required E2E: openclaw-inference-switch-e2e
Optional E2E: kimi-inference-compat-e2e, messaging-compatible-endpoint-e2e

Dispatch hint: openclaw-inference-switch-e2e

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • openclaw-inference-switch-e2e (~45 min): Directly exercises nemoclaw inference set against a running OpenClaw sandbox, verifies OpenShell route, openclaw.json patching, config hash, and live requests. The existing script defaults NEMOCLAW_SWITCH_MODEL to z-ai/glm-5.1, matching this PR’s new GLM-5.1 compat manifest.

Optional E2E

  • kimi-inference-compat-e2e (~45 min): Useful regression check for the existing OpenClaw model-specific setup registry path because the new runtime loader reads the same manifest directory that already contains Kimi K2.6 compatibility assets.
  • messaging-compatible-endpoint-e2e (~45 min): Adjacent confidence for managed inference.local OpenAI-compatible provider shape and OpenClaw agent requests through the gateway provider route, though it does not specifically validate GLM-5.1 model compat.

New E2E recommendations

  • openclaw-model-compat (high): The existing OpenClaw inference switch E2E uses GLM-5.1 by default and performs live request validation, but it does not explicitly assert that openclaw.json contains the merged compat flags from the model-specific setup manifest after nemoclaw inference set.
    • Suggested test: Extend test/e2e/test-openclaw-inference-switch.sh to assert the switched provider model has compat.requiresStringContent=true and compat.maxTokensField="max_tokens" for z-ai/glm-5.1.
  • model-specific-setup-runtime-loader (medium): The new loadOpenClawModelCompat() filesystem lookup has several deployment-sensitive search paths and merge semantics that are not covered by an existing E2E name.
    • Suggested test: Add a lightweight runtime-loader E2E or expand the OpenClaw inference switch E2E to prove model-specific setup manifests are available both from source checkout and installed/compiled CLI locations.
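
A minimal sketch of the first suggested assertion. The suggestion targets test/e2e/test-openclaw-inference-switch.sh; this version is in TypeScript only to match the other sketches and would need to be compiled and fed the contents of openclaw.json (how the file is read out of the sandbox is not shown). Because the exact openclaw.json layout is not given here, the hypothetical helpers findModelEntries and checkSwitchedCompat search the parsed config for any object whose id equals the model id and compare its compat fields.

```typescript
// JSON value type used to walk a parsed config without assuming its layout.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Depth-first search for every object whose "id" equals modelId.
function findModelEntries(node: Json, modelId: string, out: JsonObject[] = []): JsonObject[] {
  if (Array.isArray(node)) {
    for (const child of node) findModelEntries(child, modelId, out);
  } else if (node !== null && typeof node === "object") {
    if (node.id === modelId) out.push(node);
    for (const child of Object.values(node)) findModelEntries(child, modelId, out);
  }
  return out;
}

// Return human-readable problems; an empty array means every matching entry
// carries the expected compat values.
function checkSwitchedCompat(configText: string, modelId: string, expected: JsonObject): string[] {
  const entries = findModelEntries(JSON.parse(configText) as Json, modelId);
  if (entries.length === 0) return [`no model entry with id ${modelId}`];
  const problems: string[] = [];
  for (const entry of entries) {
    const compat = entry.compat;
    if (compat === null || typeof compat !== "object" || Array.isArray(compat)) {
      problems.push(`${modelId}: compat missing`);
      continue;
    }
    for (const [key, value] of Object.entries(expected)) {
      if (compat[key] !== value) {
        problems.push(`${modelId}: compat.${key} is ${JSON.stringify(compat[key])}, expected ${JSON.stringify(value)}`);
      }
    }
  }
  return problems;
}

// Illustrative config; this nesting is invented, which is why the search is layout-agnostic.
const expectedGlm = { maxTokensField: "max_tokens", requiresStringContent: true };
const sampleConfig = JSON.stringify({
  models: {
    providers: {
      inference: {
        models: [{ id: "z-ai/glm-5.1", compat: { maxTokensField: "max_tokens", requiresStringContent: true } }],
      },
    },
  },
});
const problems = checkSwitchedCompat(sampleConfig, "z-ai/glm-5.1", expectedGlm);
```

An E2E step would fail when problems is non-empty and print each entry, which points directly at the missing or wrong compat key.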

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: openclaw-inference-switch-e2e

wscurran added the ci-failure (Auto-created by nemoclaw-diagnosis skill), configuration, and integration: openclaw (OpenClaw integration behavior) labels and removed the ci-failure label May 19, 2026
hunglp6d (Contributor, Author) commented

Closing — openclaw-inference-switch-e2e is green on the latest nightly. Treating as flaky.

hunglp6d closed this May 25, 2026
wscurran added the area: inference (Inference routing, serving, model selection, or outputs), bug-fix (PR fixes a bug or regression), and feature (PR adds or expands user-visible functionality) labels and removed the fix and feature labels Jun 3, 2026

Labels

  • area: inference (Inference routing, serving, model selection, or outputs)
  • bug-fix (PR fixes a bug or regression)
  • integration: openclaw (OpenClaw integration behavior)

Development

Successfully merging this pull request may close these issues.

nightly-e2e: openclaw-inference-switch-e2e fails — z-ai/glm-5.1 missing model compat in inference set