Misc. bug: Tensor parallelism causes loops

### Name and Version

❯ llama-cli --version
ggml_cuda_init: found 2 CUDA devices (Total VRAM: 64215 MiB):
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes, VRAM: 32106 MiB
  Device 1: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes, VRAM: 32109 MiB
version: 8740 (e34f04215)
built with GNU 13.3.0 for Linux x86_64



### Operating systems

Linux

### Which llama.cpp modules do you know to be affected?

llama-cli

### Command line

```shell
llama-cli -hf unsloth/Qwen3-Coder-Next-GGUF:MXFP4_MOE --split-mode tensor --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.01 --repeat-penalty 1.0
```

### Problem description & steps to reproduce

I am testing the new tensor parallelism (`--split-mode tensor`) with my two RTX 5090s.

>llama-cli -hf unsloth/Qwen3-Coder-Next-GGUF:MXFP4_MOE --split-mode tensor --temp 1.0 --top-p 0.95 --top-k 40 --min-p 0.01 --repeat-penalty 1.0

Simple prompt:  `Write me a python snake game`

This causes the output to degenerate into a repetition loop. Without `--split-mode tensor`, the same prompt completes without problems.

Example:
```
food.penup()  # to prevent showing lines when turtle not using tracers
food.shapesize(0.5)  # size of the snake (0.5 is big) and by the snake (0.5 is big) and by the snake (0.5 is big) and by the snake (0.5 is big) and by the snake (0.5 is big) and by the snake (0.5 is big) and by the snake (0.5 is big) and by the snake (0.5 is big) and by the snake (0.5 is big) and by the snake (0.5 is big!0.5 is big) and by the snake (0.5 is big!0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0.5 is big?0
```
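To make "looping" easier to quantify when comparing runs, here is a minimal sketch of a check that counts how often the most frequent 4-word sequence appears in a captured output. The function name `max_ngram_repeats`, the window size of 4 words, and the file name `output.txt` are assumptions for illustration; none of this is part of llama.cpp.

```shell
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): read text on stdin and print
# the highest number of times any sequence of 4 consecutive words occurs.
max_ngram_repeats() {
    # Put one word per line, then count every 4-word window in awk.
    tr -s '[:space:]' '\n' | awk -v n=4 '
        NF { w[++c] = $0 }                 # store non-empty words in order
        END {
            max = 0
            for (i = 1; i + n - 1 <= c; i++) {
                key = w[i]
                for (j = 1; j < n; j++) key = key " " w[i + j]
                if (++seen[key] > max) max = seen[key]
            }
            print max                      # 0 if fewer than n words were read
        }'
}
```

Assuming the generated text was saved to `output.txt`, it could be checked with `max_ngram_repeats < output.txt`. Ordinary code does repeat short sequences, so only a count far above what the non-tensor run produces would point to a loop like the one above.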

I am running CUDA 13.2 and compiled llama.cpp myself with the following script:

```shell
#!/bin/sh

rm -rf build
git pull

cmake -B build -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DGGML_CUDA=ON -DGGML_CUDA_FA=ON -DLLAMA_CURL=ON -DGGML_RPC=ON -DLLAMA_BUILD_BORINGSSL=ON -DLLAMA_BUILD_LIBRESSL=ON #-DGGML_VULKAN=1
cmake --build build --config Release -j $(nproc)
```

### First Bad Commit

Possibly one of the commits in https://github.com/ggml-org/llama.cpp/pull/19378

### Relevant log output

No logs.

Misc. bug: Tensor parallelism causes loops #21703
