fix(inference): apply model-specific compat during inference set for z-ai/glm-5.1 #3778
hunglp6d wants to merge 1 commit into
…z-ai/glm-5.1

The `nemoclaw inference set` command clears model compat flags but never consults the blueprint model-specific-setup registry to re-apply them. When switching to z-ai/glm-5.1 (which requires maxTokensField=max_tokens and requiresStringContent=true), the agent turn hangs because OpenClaw sends max_completion_tokens, a parameter the NVIDIA-proxied GLM endpoint does not support.

Add a glm-5.1-managed-inference manifest and teach inference-set.ts to read matching manifests at switch time so runtime switches receive the same compat flags that generate-openclaw-config.py applies at build time.

Fixes openclaw-inference-switch-e2e nightly failure (run 26068303685).

Signed-off-by: Hung Le <hple@nvidia.com>
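To make the hang described above concrete, here is a minimal sketch of how a per-model compat override could change the request body. It is not OpenClaw's transport code: `buildRequestBody`, `ModelCompat`, and `ChatMessage` are hypothetical names, and only the two flag names (`maxTokensField`, `requiresStringContent`) and the `max_completion_tokens` default come from this PR. The flattening rule used for `requiresStringContent` is also an assumption.

```typescript
// Hypothetical compat shape; only the two flag names are taken from this PR.
interface ModelCompat {
  maxTokensField?: "max_tokens" | "max_completion_tokens";
  requiresStringContent?: boolean;
}

type ContentPart = { type: "text"; text: string };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | ContentPart[];
}

// Assumed default, per the PR description: without an override the
// token limit is sent as max_completion_tokens.
const DEFAULT_MAX_TOKENS_FIELD = "max_completion_tokens";

function buildRequestBody(
  model: string,
  messages: ChatMessage[],
  maxTokens: number,
  compat: ModelCompat = {},
): Record<string, unknown> {
  const field = compat.maxTokensField ?? DEFAULT_MAX_TOKENS_FIELD;

  const outMessages = messages.map((m) => {
    // Assumption: requiresStringContent means structured text parts
    // are joined into one plain string before sending.
    if (compat.requiresStringContent && Array.isArray(m.content)) {
      return { role: m.role, content: m.content.map((p) => p.text).join("") };
    }
    return { role: m.role, content: m.content };
  });

  // The token limit is written under whichever field name compat selects.
  return { model, messages: outMessages, [field]: maxTokens };
}
```

With an empty compat object the body carries `max_completion_tokens`; passing `{ maxTokensField: "max_tokens", requiresStringContent: true }` switches the field name and flattens the message content, which is the behaviour the commit message says z-ai/glm-5.1 requires.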
Closing: `openclaw-inference-switch-e2e` is green on the latest nightly. Treating as flaky.
Summary

`nemoclaw inference set` clears model compat flags (`delete firstExistingModel.compat`) but never consults the `nemoclaw-blueprint/model-specific-setup/` registry to re-apply them for the target model. When switching to `z-ai/glm-5.1`, the agent turn hangs because OpenClaw sends `max_completion_tokens`, a parameter the NVIDIA-proxied GLM endpoint does not support, instead of `max_tokens`.

Changes
New: `nemoclaw-blueprint/model-specific-setup/openclaw/glm-5.1-managed-inference.json` declares `maxTokensField: "max_tokens"` and `requiresStringContent: true` for `z-ai/glm-5.1` through managed `inference.local`, matching the pattern established by the existing `kimi-k2.6-managed-inference.json` manifest.

New: `src/lib/inference/model-specific-setup.ts` reads model-specific setup manifests from the blueprint registry on disk and returns matching OpenClaw compat effects. Mirrors the Python registry logic in `generate-openclaw-config.py`.

Modified: `src/lib/actions/inference-set.ts` now calls `loadOpenClawModelCompat()` after patching `openclaw.json` to apply model-specific compat from matching manifests. This ensures runtime switches receive the same compat flags that `generate-openclaw-config.py` applies at build time.

Root cause
The `openclaw-inference-switch-e2e` nightly job switches a running OpenClaw sandbox from `nvidia/nemotron-3-super-120b-a12b` to `nvidia-prod / z-ai/glm-5.1`. The inference.local PONG test passes (direct curl with `max_tokens`), but the `openclaw agent` turn times out after 120 s with empty output (exit 124) because OpenClaw's transport layer sends `max_completion_tokens` by default for models without a `maxTokensField` compat override.

Evidence
`check_sandbox_inference` (PONG via curl with `max_tokens: 100`)
`check_openclaw_agent_turn` (openclaw agent through gateway)
`inference-set.ts` reads model-specific registry

Validation
A focused `custom-e2e.yaml` validation workflow was prepared but could not be pushed because the available token lacks the `workflow` scope required to create workflow files. The fix can be validated by re-running `openclaw-inference-switch-e2e` on this branch:
5a03166

Type of Change
Signed-off-by: Hung Le <hple@nvidia.com>
Fixes #3779
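
For readers unfamiliar with the registry lookup described under Changes, the following is a rough sketch of the clear-then-re-apply flow at switch time. It is an illustration under stated assumptions, not the contents of `model-specific-setup.ts`: the manifest shape (`SetupManifest`, with `match` and `compat` keys) and the helpers `resolveCompat` and `applyCompat` are hypothetical, and the real manifests are read from disk rather than passed in as objects. Only the flag names and values for `z-ai/glm-5.1` come from this PR.

```typescript
type CompatFlags = Record<string, string | boolean>;

// Hypothetical manifest shape; the real schema under
// nemoclaw-blueprint/model-specific-setup/ may differ.
interface SetupManifest {
  id: string;
  match: { model: string; route?: string };
  compat: CompatFlags;
}

// Flag values for z-ai/glm-5.1 as stated in this PR; the surrounding keys are assumed.
const GLM_MANIFEST: SetupManifest = {
  id: "glm-5.1-managed-inference",
  match: { model: "z-ai/glm-5.1", route: "inference.local" },
  compat: { maxTokensField: "max_tokens", requiresStringContent: true },
};

// Collect compat flags from every manifest that matches the target model
// (and route, when the manifest restricts it). Later manifests win on conflicts.
function resolveCompat(
  manifests: SetupManifest[],
  model: string,
  route: string,
): CompatFlags {
  const merged: CompatFlags = {};
  for (const m of manifests) {
    if (m.match.model !== model) continue;
    if (m.match.route !== undefined && m.match.route !== route) continue;
    Object.assign(merged, m.compat);
  }
  return merged;
}

// Clear stale flags from the previous model, then re-apply whatever the
// registry says the new model needs.
function applyCompat(
  modelEntry: { id: string; compat?: CompatFlags },
  compat: CompatFlags,
): void {
  delete modelEntry.compat;
  if (Object.keys(compat).length > 0) {
    modelEntry.compat = { ...compat };
  }
}
```

Under these assumptions, switching to `z-ai/glm-5.1` leaves the model entry with both flags set, while switching to a model with no matching manifest leaves it with no `compat` key at all, which is the cleared state the current command always produces.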