Cannot use thinking models like Qwen 3.6 27B - NemoClaw validation request token limit too small.

### Description

NemoClaw is incompatible with Qwen3.6 because the token limit on its validation request is too small for a thinking model to reach its final answer.
Using:

* NemoClaw/OpenClaw 0.38
* vLLM 0.20.2
* Qwen/Qwen3.6-27B
* OpenAI-compatible endpoint

Onboarding fails during inference validation with:

`OPENCLAW_CONFIG_OK inference.local response did not contain choices[0].message.content`

Qwen3.6 runs in thinking mode by default and returns:

```
{
  "id": "chatcmpl-9d8a2524b5f62ead",
  "object": "chat.completion",
  "created": 1778497346,
  "model": "Qwen/Qwen3.6-27B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": "Here's a thinking process:\n\n1. **Analyze User Input:**\n - User says: \"Reply with exactly: PONG\"\n "
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 16,
    "total_tokens": 48,
    "completion_tokens": 32,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}

```
NemoClaw assumes:

`choices[0].message.content != null`

and rejects the response even though the endpoint is functioning correctly.
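
For illustration, a more tolerant check could tell "the endpoint answered but was cut off while thinking" apart from "the endpoint is broken". This is only a sketch, not NemoClaw's actual validation code: the function name `classify_smoke_response` and the three result labels are hypothetical. The `reasoning` field name comes from the response above; `reasoning_content` is included on the assumption that some server versions use that name instead.

```python
from typing import Any


def classify_smoke_response(response: dict[str, Any]) -> str:
    """Classify a chat-completion response from a smoke check.

    Returns "ok", "truncated_reasoning", or "invalid".
    (Hypothetical helper, not part of NemoClaw.)
    """
    choices = response.get("choices") or []
    if not choices:
        return "invalid"

    first = choices[0]
    message = first.get("message") or {}

    # Normal case: the model produced a final answer.
    content = message.get("content")
    if isinstance(content, str) and content.strip():
        return "ok"

    # Thinking models served with a reasoning parser put their thinking text
    # in a separate field. If that field is populated and generation stopped
    # at the token limit, the endpoint works; the budget was just too small.
    reasoning = message.get("reasoning") or message.get("reasoning_content")
    if reasoning and first.get("finish_reason") == "length":
        return "truncated_reasoning"

    return "invalid"
```

With the response shown above (`content: null`, a non-empty `reasoning`, `finish_reason: "length"`), this returns `"truncated_reasoning"` rather than treating the endpoint as failed.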

This is a compatibility issue between:

* NemoClaw/OpenClaw response validation
* Qwen3.6 thinking-mode behavior (can only be switched off at request time, not setup time)
* vLLM OpenAI-compatible output format
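
Since thinking can only be switched off per request, one possible fix is for the validation request itself to disable it. The sketch below shows what such a request might look like. Assumptions: the prompt text and the 32-token limit are inferred from the response above (the reasoning quotes the prompt, and `completion_tokens` is 32); `chat_template_kwargs` with `enable_thinking: false` is the vLLM request field used for earlier Qwen3 models and is assumed to work the same way for Qwen3.6; `build_smoke_request` and `send` are hypothetical names.

```python
import json
import urllib.request


def build_smoke_request(model: str, disable_thinking: bool = True,
                        max_tokens: int = 32) -> dict:
    """Build a chat-completion payload resembling the validation request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Reply with exactly: PONG"}],
        "max_tokens": max_tokens,  # assumed limit, inferred from the usage block
    }
    if disable_thinking:
        # Assumption: the Qwen3.6 chat template honors enable_thinking the
        # same way earlier Qwen3 templates do when served through vLLM.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


def send(base_url: str, payload: dict, api_key: str = "dummy") -> dict:
    """POST the payload to an OpenAI-compatible /chat/completions endpoint."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send("http://localhost:8000/v1", build_smoke_request("Qwen/Qwen3.6-27B"))` against the vLLM server from the reproduction steps should show whether `content` is populated once thinking is disabled.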

### Reproduction Steps

1. Run in one terminal:
```
docker run --rm -it --gpus all \
  -e HF_TOKEN=$HF_TOKEN \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  vllm/vllm-openai \
    Qwen/Qwen3.6-27B \
    --host 0.0.0.0 \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 262144 \
    --trust-remote-code \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --reasoning-parser qwen3 \
    --speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'

```
2. Run in another terminal on the same machine: `curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash`
3. Choose `3) Other OpenAI-compatible endpoint`, then enter `http://host.docker.internal:8000/v1`, `dummy` and `Qwen/Qwen3.6-27B` when prompted, and continue setup as required.
4. At step 7 of the NemoClaw setup, onboarding fails with this message:
```
  [7/8] Setting up OpenClaw inside sandbox
  ──────────────────────────────────────────────────
  ✓ OpenClaw gateway launched inside sandbox
  Verifying compatible endpoint through the messaging sandbox...
  ⚠ Gateway provider 'compatible-endpoint' did not report the selected endpoint URL.
    Continuing to the sandbox-side inference.local smoke check.
  Compatible endpoint sandbox smoke check failed.
  Telegram provider startup is not the root cause; inference.local failed.
  OPENCLAW_CONFIG_OK inference.local response did not contain choices[0].message.content: {"id": "chatcmpl-ac1e7493d920f2b2", "object": "chat.completion", "created": 1778494319, "model": "Qwen/Qwen3.6-27B", "choices": [{"index": 0, "message": {"role": "assistant", "content": null, "refusal": null, "annotations": null, "audio": null, "function_call": null, "tool_calls": [], "reasoning": "Thinking Process:\n\n1. **Analyze the Request:** The user wants a reply with the exact string \"PONG\".\n2. **Constraint"}, "logprobs": null, "finish_reason": "length", "stop_reason": null, "token_ids": null}], "service_tier": null, "system_fingerprint": null, "usage": {"prompt_tokens": 16, "total_tokens": 48, "completion_tokens": 32, "prompt_tokens_details": null}, "prompt_logprobs": null, "prompt_token_ids": null, "kv_transfer_params": null}
```
The endpoint otherwise works perfectly. `choices[0].message.content` is never populated because the validation request appears to cap generation at 32 tokens (`"completion_tokens": 32`), so the model is still thinking when generation stops with `"finish_reason": "length"`.
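
Another option that does not depend on model-specific switches is to retry the smoke check with a larger token budget whenever the response was cut off. The following is a minimal sketch under stated assumptions: `probe_with_budgets` is a hypothetical helper, `send` is any caller-supplied function that performs the request with the given `max_tokens` and returns the parsed JSON, and the budget values are illustrative, not tuned.

```python
from typing import Any, Callable, Optional


def probe_with_budgets(
    send: Callable[[int], dict[str, Any]],
    budgets: tuple[int, ...] = (32, 512, 4096),
) -> Optional[str]:
    """Return the first non-empty content, retrying with a larger max_tokens
    while the model is cut off before producing a final answer."""
    for max_tokens in budgets:
        response = send(max_tokens)
        choice = response["choices"][0]
        content = choice["message"].get("content")
        if content:
            return content
        if choice.get("finish_reason") != "length":
            # Empty content for a reason other than truncation: a larger
            # budget will not help, so stop retrying.
            return None
    # Every budget was exhausted while the model was still thinking.
    return None
```

A non-thinking model still answers on the first, cheap attempt, so the extra latency is only paid by models that need more tokens to finish reasoning.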

### Environment

- Linux spark-fce0 6.17.0-1014-nvidia # 14-Ubuntu SMP PREEMPT_DYNAMIC Tue Mar 17 19:01:40 UTC 2026 aarch64 aarch64 aarch64 GNU/Linux
- Node.js v22.22.2
- Docker version 29.2.1, build a5c7197
- nemoclaw v0.0.38

### Debug Output

```shell

```

### Logs

```shell

```

### Checklist

- [x] I confirmed this bug is reproducible
- [x] I searched existing issues and this is not a duplicate

Cannot use thinking models like Qwen 3.6 27B - Nemoclaw validation request token length too small. #3341
