Misc. bug: Router mode WebUI – model selection does not update correctly when unloading + loading a new model mid-chat (#21626)

### Name and Version

```shell
llama-server --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 12287 MiB):
  Device 0: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes, VRAM: 12287 MiB
load_backend: loaded RPC backend from g:\Llamacpp\bin\ggml-rpc.dll
load_backend: loaded CPU backend from g:\Llamacpp\bin\ggml-cpu-haswell.dll
version: 8709 (85d482e6b)
built with Clang 21.1.8 for Windows AMD64
```

### Operating systems

Windows

### Which llama.cpp modules do you know to be affected?

Other (Please specify in the next section)

### Command line

```shell

```

### Problem description & steps to reproduce

### Description

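In **llama-server router mode** (using the built-in WebUI), switching models mid-conversation by unloading the current model and loading a new one does **not** work as expected: the chat stays bound to the old model, so the next message reloads it instead of going to the new one.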

The UI is highly confusing in this scenario, especially on low-VRAM setups like an RTX 3060 12 GB where you usually cannot keep two large models loaded at the same time.

### Steps to Reproduce

1. Start `llama-server` in router mode (e.g. with a low `--models-max` value or limited VRAM).
2. In the WebUI, load and chat with Model A.
3. Unload Model A.
4. Select and start loading Model B.
5. While Model B is loading (or right after it finishes), type a new message and send it.

**What actually happens:**

- During loading of Model B, the model selector **correctly shows Model B** (sometimes with a "loading..." indicator).
- As soon as Model B finishes loading, the selector **silently jumps back** to the previously unloaded Model A.
- Any message sent during or right after loading is **automatically routed to Model A** instead of the newly loaded Model B. This causes Model A to be reloaded.
- Result: **two large models end up loaded simultaneously**, which is exactly what unloading Model A was meant to avoid, and responses become extremely slow.
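
The sequence above can be replayed with a toy model. This is a minimal sketch, not the llama.cpp implementation: `ToyRouter`, its `handle_chat` method, and the model names are hypothetical. The only behavior it assumes is what this report describes, namely that a request naming a model that is not loaded causes that model to be loaded again.

```python
class ToyRouter:
    """Hypothetical stand-in for the router: tracks which models are resident."""

    def __init__(self):
        self.loaded = set()

    def load(self, model):
        self.loaded.add(model)

    def unload(self, model):
        self.loaded.discard(model)

    def handle_chat(self, model):
        # Assumption taken from the report: a request for a model that is
        # not resident triggers a load of that model.
        if model not in self.loaded:
            self.load(model)
        return model


router = ToyRouter()
router.load("model-a")      # step 2: chat with Model A
router.unload("model-a")    # step 3: unload Model A
router.load("model-b")      # step 4: Model B finishes loading

# Observed: the selector has reverted, so the WebUI addresses Model A.
answered_by = router.handle_chat("model-a")
resident = sorted(router.loaded)
print(answered_by, resident)
```

In this toy model the message is answered by `model-a` and both `model-a` and `model-b` end up resident, which matches the slowdown described above.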

**Expected behavior:**

- Once Model B is selected and starts loading (or finishes loading), it should become the active model for the current chat.
- The selector should stay on Model B after loading completes.
- Any new message should be sent to the **currently selected model** in the UI (Model B), not silently redirected to the old one.
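
The expected behavior can be stated as a rule on the chat's selection state: a load-finished event must never change which model the chat is bound to. The sketch below is only an illustration of that rule under my own assumptions; `ChatSelection`, `on_user_select`, `on_load_finished`, and `target_for_next_message` are hypothetical names, not identifiers from the WebUI source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatSelection:
    """Hypothetical per-chat selection state."""
    selected: Optional[str] = None

    def on_user_select(self, model: str) -> None:
        # Only an explicit user action changes the selection.
        self.selected = model

    def on_load_finished(self, model: str) -> None:
        # Deliberately leaves the selection untouched: a finished load is
        # not a reason to fall back to a previously used model.
        pass

    def target_for_next_message(self) -> Optional[str]:
        # Messages go to whatever the selector currently shows.
        return self.selected


chat = ChatSelection()
chat.on_user_select("model-a")
chat.on_user_select("model-b")     # user picks Model B and starts loading it
chat.on_load_finished("model-b")   # load completes
target = chat.target_for_next_message()
print(target)
```

With this rule the next message targets `model-b`, both while it is loading and after the load completes.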

The workaround is to manually re-select Model B in the dropdown after it has fully loaded, but nothing in the UI indicates that this is necessary. Even knowing about this behavior, I still get caught by it repeatedly because it is highly **counter-intuitive**.

During the loading phase the UI visually suggests that Model B is already active, so users naturally assume the next message will go to Model B.

### Motivation / Use Cases

This workflow is very common on hardware with limited VRAM (e.g. an RTX 3060 12 GB):

- Quickly testing different models in the same conversation
- Switching to a smarter (but slower) model when the current one is struggling
- Switching to a faster/smaller model when speed matters more than quality

Having to manually re-select the model after every load defeats the convenience of the Load/Unload buttons in router mode.

### First Bad Commit

_No response_

### Relevant log output



