Your current environment
The output of python collect_env.py
Collecting environment information...
==============================
System Info
==============================
OS : * (Final) (x86_64)
GCC version : (GCC) 9.4.0
Clang version : 18.1.8 (Red Hat 18.1.8-1.module+el8.10.0+703+ec7b33ba)
CMake version : version 4.1.0
Libc version : glibc-2.28
==============================
PyTorch Info
==============================
PyTorch version : 2.8.0+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8
ROCM used to build PyTorch : N/A
==============================
Python Environment
==============================
Python version : 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:27) [GCC 11.2.0] (64-bit runtime)
Python platform : Linux-5.4.119-19.0009.56-x86_64-with-glibc2.28
==============================
CUDA / GPU Info
==============================
Is CUDA available : True
CUDA runtime version : Could not collect
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA H20
GPU 1: NVIDIA H20
GPU 2: NVIDIA H20
GPU 3: NVIDIA H20
GPU 4: NVIDIA H20
GPU 5: NVIDIA H20
GPU 6: NVIDIA H20
GPU 7: NVIDIA H20
Nvidia driver version : 570.158.01
cuDNN version : Probably one of the following:
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn.so.9
HIP runtime version : N/A
MIOpen runtime version : N/A
Is XNNPACK available : True
==============================
CPU Info
==============================
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 384
On-line CPU(s) list: 0-383
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9K84 96-Core Processor
Stepping: 1
CPU MHz: 3687.441
CPU max MHz: 2600.0000
CPU min MHz: 1500.0000
BogoMIPS: 5200.42
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 32768K
NUMA node0 CPU(s): 0-95,192-287
NUMA node1 CPU(s): 96-191,288-383
==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] torch==2.8.0
[pip3] transformers==4.57.0.dev0
[pip3] triton==3.4.0
==============================
vLLM Info
==============================
ROCM Version : Could not collect
vLLM Version : 0.10.2 (also tested 0.11.0)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled
==============================
Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY
🐛 Describe the bug
When running inference with the Qwen3-Next-80B-A3B-Instruct model on the vLLM V1 engine, a TypeError is raised during token generation:
TypeError: argument 'id': StreamInput must be either an integer or a list of integers
Critical constraint: the Qwen3-Next model requires the V1 engine (vLLM asserts with AssertionError: Qwen3Next requires VLLM_USE_V1), so the V0 engine cannot be used as a workaround.
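For reference, a minimal reproduction sketch (the model name is from this report; tensor_parallel_size is an assumption based on the 8x H20 GPUs listed above, not a verified configuration):

# Hedged repro sketch: any generate() call against this model on the V1
# engine reaches the detokenizer path that fails below.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct",
    tensor_parallel_size=8,  # assumption: shard the 80B model across the 8 H20 GPUs above
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)  # not reached; the TypeError is raised during detokenization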
Error Location: /vllm/v1/engine/detokenizer.py, Line 237
Full Stack Trace:
Traceback (most recent call last):
File "/vllm/v1/engine/output_processor.py", line 420, in process_outputs
stop_string = req_state.detokenizer.update(
File "/vllm/v1/engine/detokenizer.py", line 119, in update
self.output_text += self.decode_next(new_token_id)
File "/vllm/v1/engine/detokenizer.py", line 219, in decode_next
token = self._protected_step(next_token_id)
File "/vllm/v1/engine/detokenizer.py", line 237, in _protected_step
token = self.stream.step(self.tokenizer, next_token_id)
TypeError: argument 'id': StreamInput must be either an integer or a list of integers
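The names in the traceback suggest that stream is a tokenizers.decoders.DecodeStream, so the failing call can be exercised outside of vLLM. A sketch under that assumption (not a confirmed reduction):

# Isolation sketch: drive DecodeStream.step() directly, assuming vLLM's
# detokenizer wraps tokenizers.decoders.DecodeStream as the names suggest.
from tokenizers import Tokenizer
from tokenizers.decoders import DecodeStream

tok = Tokenizer.from_pretrained("Qwen/Qwen3-Next-80B-A3B-Instruct")
stream = DecodeStream(skip_special_tokens=True)
for token_id in tok.encode("Hello world").ids:
    piece = stream.step(tok, token_id)  # the same call that raises in _protected_step
    if piece is not None:
        print(piece, end="")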
Investigation & Debug Findings
Debug Output: logging was added to check the actual runtime type of next_token_id:
print(f"Type: {type(next_token_id)}, isinstance(int): {isinstance(next_token_id, int)}")
# Output: Type: <class 'int'>, isinstance(int): True
Puzzling finding: the value is already a native Python int, yet stream.step() still rejects it with the same TypeError.
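One further check worth adding (hypothetical diagnostics, not part of the original logging): isinstance() also accepts int subclasses, so logging the exact class would rule out a subclass masquerading as a plain int:

# Hypothetical extra diagnostics: isinstance(x, int) is True for any int
# subclass as well, so record the concrete type and its MRO to rule that out.
print(f"exact type: {type(next_token_id)!r}")
print(f"type(...) is int: {type(next_token_id) is int}")
print(f"MRO: {type(next_token_id).__mro__}")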
Attempted Fixes (All Failed):
- Type conversion with .item():

  if hasattr(next_token_id, 'item'):
      next_token_id = int(next_token_id.item())

- Explicit int() conversion:

  next_token_id = int(next_token_id)

- Using operator.index():

  import operator
  next_token_id = operator.index(next_token_id)

All attempts failed with the same error.
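Since every Python-side coercion fails, the error likely originates in the compiled tokenizers extension rather than in Python-level typing, which points at a possible mismatch between the installed tokenizers wheel and what this vLLM build expects. A quick check (reporting only; no known-good version pin is implied):

# Sketch: record the exact wheels in play, since the TypeError is raised by
# the compiled tokenizers binding and not by Python code.
import tokenizers
import transformers
import vllm

print("vllm        :", vllm.__version__)
print("tokenizers  :", tokenizers.__version__)
print("transformers:", transformers.__version__)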
Related Issues
Possibly related to: