Eval bug: Memory leak on RPC CUDA backend #21265

### Name and Version

CUDA version 8.6, build b8571

### Operating systems

Linux

### GGML backends

RPC

### Hardware

RTX 3060

### Models

OpenAI ChatGpt OSS

### Problem description & steps to reproduce

When model layers are split across a local backend and an RPC backend, the RPC backend leaks memory and periodically writes the log message `ggml_backend_cuda_graph_compute: CUDA graph warmup complete`. The local backends show neither this message nor any leak. It is not known whether the message is related to the leak, but it is a visible difference. To reproduce, run the same task repeatedly without restarting the backend: the warmup messages appear, and NVIDIA tools show memory usage growing by a few megabytes each time. Nothing similar happens on the local backend.

Observation: when the local backend is stopped (aborted), the local devices show no memory in use, but the remote device still shows some memory occupied. The occupied amount appears to correspond closely to the leaked memory.

Remote command line: `./rpc-server -c -p port -H address`
Local parameters: `--tensor-split 37,28 --device CUDA0,RPC0`
Environment variable: `LLAMA_ARG_RPC=host:port`
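
For anyone who wants to put numbers on the growth described above, here is a minimal sketch of a sampler to run on the remote host while the task is repeated. It assumes `nvidia-smi` is on the `PATH` and reports `memory.used` as a plain integer in MiB. The function names and the 10-second interval are illustrative and not part of llama.cpp.

```python
import subprocess
import time


def parse_memory_used(csv_text):
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    output into a list of integers (MiB), one entry per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def read_memory_used():
    """Query nvidia-smi once and return used memory per GPU in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used(result.stdout)


def growth_per_sample(samples):
    """Return the difference between each pair of consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


if __name__ == "__main__":
    # Assumption: GPU index 0 is the device served by rpc-server.
    gpu_index = 0
    interval_seconds = 10  # illustrative; pick roughly one task run per sample
    samples = []
    try:
        while True:
            samples.append(read_memory_used()[gpu_index])
            print(f"used={samples[-1]} MiB deltas={growth_per_sample(samples)}")
            time.sleep(interval_seconds)
    except KeyboardInterrupt:
        pass
```

Start it after `rpc-server` is up, then run the same task several times from the local side. Consistently positive deltas that survive stopping the local process would match the behavior reported here.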

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>


```console

```
</details>
