Cannot use thinking models like Qwen 3.6 27B - Nemoclaw validation request token length too small. #3341

@csanadpoda

Description

NemoClaw is incompatible with Qwen3.6 because the request token limit used during inference validation is too small.
Using:

  • NemoClaw/OpenClaw 0.38
  • vLLM 0.20.2
  • Qwen/Qwen3.6-27B
  • OpenAI-compatible endpoint

Onboarding fails during inference validation with:

OPENCLAW_CONFIG_OK inference.local response did not contain choices[0].message.content
Qwen3.6 runs in thinking mode by default and returns:

{
  "id": "chatcmpl-9d8a2524b5f62ead",
  "object": "chat.completion",
  "created": 1778497346,
  "model": "Qwen/Qwen3.6-27B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": "Here's a thinking process:\n\n1. **Analyze User Input:**\n - User says: \"Reply with exactly: PONG\"\n "
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 16,
    "total_tokens": 48,
    "completion_tokens": 32,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}

NemoClaw assumes:

choices[0].message.content != null
and rejects the response even though the endpoint is functioning correctly.
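
To illustrate the distinction the validation could make, here is a minimal sketch of a more tolerant check. It is not NemoClaw's actual code: the function name `classify_smoke_response` and the three result labels are hypothetical. The `reasoning` field name is taken from the response above; `reasoning_content` is included only as an assumed alternative spelling that some server versions may use.

```python
from typing import Any, Dict


def classify_smoke_response(resp: Dict[str, Any]) -> str:
    """Classify a chat completion as 'ok', 'truncated_while_thinking', or 'invalid'.

    Hypothetical helper for illustration; not part of NemoClaw or OpenClaw.
    """
    choices = resp.get("choices") or []
    if not choices:
        return "invalid"

    choice = choices[0]
    message = choice.get("message") or {}

    # Normal case: the model produced a final answer.
    content = message.get("content")
    if isinstance(content, str) and content.strip():
        return "ok"

    # Thinking-mode case: reasoning text exists, but generation hit the
    # token cap before any final answer was written.
    reasoning = message.get("reasoning") or message.get("reasoning_content")
    if reasoning and choice.get("finish_reason") == "length":
        return "truncated_while_thinking"

    return "invalid"
```

With a check like this, the response shown above would be reported as a token budget problem rather than as a broken endpoint.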

This is a compatibility issue between:

  • NemoClaw/OpenClaw response validation
  • Qwen3.6 thinking-mode behavior (thinking can only be switched off per request, not at setup time)
  • vLLM OpenAI-compatible output format
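
For reference, switching thinking off at request time might look like the sketch below. It assumes Qwen3.6 honors the same `enable_thinking` chat-template switch documented for Qwen3, passed through vLLM's `chat_template_kwargs` request field; that has not been verified for this model. The helper name `build_ping_payload` is hypothetical, and the prompt text is the one quoted in the reasoning output above.

```python
import json
from typing import Any, Dict


def build_ping_payload(model: str, max_tokens: int = 32) -> Dict[str, Any]:
    """Build a /v1/chat/completions request body with thinking disabled.

    Hypothetical helper for illustration only.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": "Reply with exactly: PONG"}],
        "max_tokens": max_tokens,
        # Assumption: the model's chat template reads `enable_thinking`,
        # as documented for Qwen3; unverified for Qwen3.6.
        "chat_template_kwargs": {"enable_thinking": False},
    }


if __name__ == "__main__":
    # Print the body that would be POSTed to http://host.docker.internal:8000/v1/chat/completions
    print(json.dumps(build_ping_payload("Qwen/Qwen3.6-27B"), indent=2))
```

If the switch is honored, the answer should arrive in `choices[0].message.content` well within a small token budget.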

Reproduction Steps

  1. Run in one terminal:
docker run --rm -it --gpus all \
  -e HF_TOKEN=$HF_TOKEN \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai \
    Qwen/Qwen3.6-27B \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 262144 \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --reasoning-parser qwen3 \
    --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'

  2. Run in another terminal on the same machine: curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash
  3. Choose 3) Other OpenAI-compatible endpoint, then enter http://host.docker.internal:8000/v1, dummy and Qwen/Qwen3.6-27B, and continue setup as required.
  4. At step 7 of the NemoClaw setup, the following message appears:
  [7/8] Setting up OpenClaw inside sandbox
  ──────────────────────────────────────────────────
  ✓ OpenClaw gateway launched inside sandbox
  Verifying compatible endpoint through the messaging sandbox...
  ⚠ Gateway provider 'compatible-endpoint' did not report the selected endpoint URL.
    Continuing to the sandbox-side inference.local smoke check.
  Compatible endpoint sandbox smoke check failed.
  Telegram provider startup is not the root cause; inference.local failed.
  OPENCLAW_CONFIG_OK inference.local response did not contain choices[0].message.content: {"id": "chatcmpl-ac1e7493d920f2b2", "object": "chat.completion", "created": 1778494319, "model": "Qwen/Qwen3.6-27B", "choices": [{"index": 0, "message": {"role": "assistant", "content": null, "refusal": null, "annotations": null, "audio": null, "function_call": null, "tool_calls": [], "reasoning": "Thinking Process:\n\n1. **Analyze the Request:** The user wants a reply with the exact string \"PONG\".\n2. **Constraint"}, "logprobs": null, "finish_reason": "length", "stop_reason": null, "token_ids": null}], "service_tier": null, "system_fingerprint": null, "usage": {"prompt_tokens": 16, "total_tokens": 48, "completion_tokens": 32, "prompt_tokens_details": null}, "prompt_logprobs": null, "prompt_token_ids": null, "kv_transfer_params": null}

The endpoint otherwise works perfectly. It never populates choices[0].message.content because the model is still thinking when generation stops with "finish_reason": "length" (the usage block shows completion_tokens: 32).
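
One possible way for a smoke check to cope with this is to retry with a larger token budget whenever the response was cut off by length. The sketch below is an illustration only, not NemoClaw's implementation: `probe_with_growing_budget`, the `send` callback, and the budget values are all hypothetical. The starting value of 32 matches the `completion_tokens` seen above.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def probe_with_growing_budget(
    send: Callable[[int], Dict[str, Any]],
    budgets: Sequence[int] = (32, 256, 2048),
) -> Optional[str]:
    """Call `send(max_tokens)` with increasing budgets until content appears.

    `send` is a caller-supplied function that performs the chat completion
    request and returns the parsed JSON response.
    """
    for max_tokens in budgets:
        resp = send(max_tokens)
        choice = (resp.get("choices") or [{}])[0]
        content = (choice.get("message") or {}).get("content")

        if isinstance(content, str) and content.strip():
            return content

        # Only a length cutoff is worth retrying with more tokens.
        if choice.get("finish_reason") != "length":
            return None

    return None
```

Passing the request function in as a parameter keeps the retry logic independent of the HTTP client, so it can be exercised with a stub.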

Environment

  • Linux spark-fce0 6.17.0-1014-nvidia # 14-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 17 19:01:40 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux
  • v22.22.2
  • Docker version 29.2.1, build a5c7197
  • nemoclaw v0.0.38

Debug Output

Logs

Checklist

  • I confirmed this bug is reproducible
  • I searched existing issues and this is not a duplicate

Metadata

Labels

area: integrations (Third-party service integration behavior)