nightly-e2e: openclaw-inference-switch-e2e fails — z-ai/glm-5.1 missing model compat in inference set #3779

@hunglp6d

Description

Problem Statement

The openclaw-inference-switch-e2e nightly job fails because nemoclaw inference set does not apply model-specific compatibility flags when switching to z-ai/glm-5.1. The inference.local PONG check passes (a direct curl request with max_tokens: 100), but the OpenClaw agent turn times out after 120 seconds with empty output (exit code 124).

The root cause is that patchOpenClawInferenceConfig() in inference-set.ts clears any existing model compat (delete firstExistingModel.compat) and never consults the nemoclaw-blueprint/model-specific-setup/ registry. For nvidia-prod models, getSandboxInferenceConfig() returns inferenceCompat: null, so z-ai/glm-5.1 ends up without the maxTokensField: "max_tokens" flag it needs. OpenClaw then sends max_completion_tokens, a parameter the NVIDIA-proxied GLM endpoint does not support, and the agent turn hangs until the 120-second timeout kills it (exit 124).
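A minimal sketch of the failure mode, assuming OpenClaw picks the token-limit field name from the model's compat block and falls back to max_completion_tokens when maxTokensField is absent. The ModelCompat/ModelEntry shapes and the buildChatBody/patchModelBuggy helpers are hypothetical illustrations, not the actual OpenClaw or NemoClaw code.

```typescript
// Hypothetical shapes: the real OpenClaw config schema may differ.
interface ModelCompat {
  maxTokensField?: "max_tokens" | "max_completion_tokens";
  requiresStringContent?: boolean;
}

interface ModelEntry {
  id: string;
  compat?: ModelCompat;
}

// Hypothetical request builder: uses the compat-declared field name and
// otherwise falls back to max_completion_tokens (assumed default).
function buildChatBody(model: ModelEntry, maxTokens: number): Record<string, unknown> {
  const field = model.compat?.maxTokensField ?? "max_completion_tokens";
  return {
    model: model.id,
    messages: [{ role: "user", content: "Reply with PONG" }],
    [field]: maxTokens,
  };
}

// Mirrors the behavior described above: the existing compat is deleted and,
// because inferenceCompat is null for nvidia-prod models, nothing replaces it.
function patchModelBuggy(
  model: ModelEntry,
  newId: string,
  inferenceCompat: ModelCompat | null,
): ModelEntry {
  const patched: ModelEntry = { ...model, id: newId };
  delete patched.compat;
  if (inferenceCompat) {
    patched.compat = inferenceCompat;
  }
  return patched;
}

const before: ModelEntry = { id: "some/previous-model", compat: { maxTokensField: "max_tokens" } };
const after = patchModelBuggy(before, "z-ai/glm-5.1", null);
console.log(Object.keys(buildChatBody(after, 1024))); // [ 'model', 'messages', 'max_completion_tokens' ]
```

The direct curl PONG check sets max_tokens itself, which is why it passes while the agent turn, built from the stripped config, does not.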

Proposed Design

Add a glm-5.1-managed-inference.json manifest under nemoclaw-blueprint/model-specific-setup/openclaw/ declaring maxTokensField: "max_tokens" and requiresStringContent: true for z-ai/glm-5.1. Teach inference-set.ts to read matching model-specific manifests at switch time via a new loadOpenClawModelCompat() function in src/lib/inference/model-specific-setup.ts, ensuring runtime switches receive the same compat flags that generate-openclaw-config.py applies at build time.
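A minimal sketch of the switch-time lookup, assuming each manifest lists the model IDs it applies to alongside an OpenClaw compat block. The manifest shape (models, openclawCompat), the matching rule, and the patchModel helper are assumptions for illustration; per this issue, the real loadOpenClawModelCompat() reads manifests from nemoclaw-blueprint/model-specific-setup/openclaw/, and its schema may differ. This version takes already-parsed manifests so the merge logic can be shown without touching the filesystem.

```typescript
// Hypothetical manifest shape; the real schema in model-specific-setup/ may differ.
interface ModelCompat {
  maxTokensField?: "max_tokens" | "max_completion_tokens";
  requiresStringContent?: boolean;
}

interface ModelSpecificManifest {
  name: string;
  models: string[]; // model IDs this manifest applies to (assumed field)
  openclawCompat: ModelCompat;
}

interface ModelEntry {
  id: string;
  compat?: ModelCompat;
}

// Example manifest mirroring the proposed glm-5.1-managed-inference.json.
const glm51Manifest: ModelSpecificManifest = {
  name: "glm-5.1-managed-inference",
  models: ["z-ai/glm-5.1"],
  openclawCompat: { maxTokensField: "max_tokens", requiresStringContent: true },
};

// Sketch of loadOpenClawModelCompat(): merge the compat blocks of every
// manifest that lists the model, or return null when none match.
function loadOpenClawModelCompat(
  manifests: ModelSpecificManifest[],
  modelId: string,
): ModelCompat | null {
  const matches = manifests.filter((m) => m.models.includes(modelId));
  if (matches.length === 0) return null;
  return matches.reduce<ModelCompat>((acc, m) => ({ ...acc, ...m.openclawCompat }), {});
}

// Sketch of the patched switch: provider-level compat first, then the
// model-specific manifest on top, instead of leaving compat empty.
function patchModel(
  model: ModelEntry,
  newId: string,
  inferenceCompat: ModelCompat | null,
  manifests: ModelSpecificManifest[],
): ModelEntry {
  const compat: ModelCompat = {
    ...(inferenceCompat ?? {}),
    ...(loadOpenClawModelCompat(manifests, newId) ?? {}),
  };
  const patched: ModelEntry = { ...model, id: newId };
  delete patched.compat; // drop stale compat from the previous model
  if (Object.keys(compat).length > 0) patched.compat = compat;
  return patched;
}

const switched = patchModel({ id: "some/previous-model" }, "z-ai/glm-5.1", null, [glm51Manifest]);
console.log(switched.compat); // { maxTokensField: 'max_tokens', requiresStringContent: true }
```

Letting the manifest override provider-level compat is one reasonable precedence rule; the fix PR may order them differently.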

See fix PR: #3778

Alternatives Considered

  • Hardcode glm-5.1 compat in getSandboxInferenceConfig() — rejected because the model-specific setup registry was designed specifically to keep compat declarations out of code conditionals.
  • Build-time-only fix (manifest without inference-set.ts change) — insufficient because the test switches inference at runtime; the manifest would only help during initial sandbox build.

Category

config_error

Reproduction Steps

  1. Re-run openclaw-inference-switch-e2e on commit 5a03166 via:
    gh workflow run nightly-e2e.yaml --repo NVIDIA/NemoClaw --ref main \
      -f jobs=openclaw-inference-switch-e2e
    
  2. Observe Phase 4 "Live requests after switch" — the check_openclaw_agent_turn assertion fails with exit 124 (timeout), empty reply.

Environment

  • OS: Ubuntu 24.04.4 LTS (GitHub-hosted runner ubuntu-latest, image 20260513.135.3)
  • Node.js: v22.22.3 (installed via nvm during test)
  • Docker: GitHub-hosted runner default
  • NemoClaw: commit 5a031660ffc9d945c9e25d8cfe26409e40b18af3 (main)
  • OpenClaw: 2026.4.24
  • OpenShell: 0.0.39
  • Other: Nightly run ID 26068303685

Debug Output

=== Phase 4: Live requests after switch ===
  PASS: Sandbox inference.local returned PONG with z-ai/glm-5.1
  FAIL: OpenClaw agent turn failed after switch (exit 124); reply='', raw=''

=== Phase 5: Cleanup ===
  Shared NemoClaw gateway preserved. Re-run 'openshell gateway remove nemoclaw' to remove it,
  or pass '--cleanup-gateway' / set NEMOCLAW_CLEANUP_GATEWAY=1 next time. (#2166)
  ✓ Sandbox 'e2e-openclaw-inference-switch' destroyed
  PASS: Sandbox e2e-openclaw-inference-switch removed

========================================
  OpenClaw inference switch E2E Results:
    Passed:  14
    Failed:  1
    Skipped: 1
    Total:   16
========================================

  1 test(s) failed.
Process completed with exit code 1.

Logs

N/A

Checklist

  • I have confirmed this issue is reproducible (linked workflow run 26068303685 demonstrates the failure)
  • I have searched for existing issues to avoid duplicates

Suggested Labels (apply manually after triage): nightly-e2e, auto-diagnosed, ci-failure, VRDC

Metadata

Assignees

No one assigned

    Labels

    area: inference (Inference routing, serving, model selection, or outputs)
    integration: openclaw (OpenClaw integration behavior)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
