Eval bug: auto fit estimation does not account for mmproj GPU memory, causing OOM with multimodal models #19980

@fantblue


Problem

When --fit is on (the default), llama_params_fit estimates GPU memory usage to decide how many layers to
offload. However, it only accounts for:

  • LLM main model weights
  • KV cache
  • Compute buffers

When a multimodal projector (mmproj) is present, either auto-detected via -hf or manually specified
via -mm/--mmproj, it is loaded onto the GPU after the fit estimation runs. Its allocations consume the
free memory margin and can cause an OOM (ggml_cuda_pool_alloc / CUDA out of memory).
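To make the gap concrete, here is a minimal Python sketch of the accounting described above. It is an illustrative model, not llama.cpp code: the function names and every size are hypothetical, chosen only to show how an estimate that omits mmproj can leave less free memory than the projector needs.

```python
MIB = 1024 * 1024

def fit_headroom(free_vram, weights, kv_cache, compute):
    """Bytes left free once the three items the fit accounts for are loaded."""
    return free_vram - (weights + kv_cache + compute)

def mmproj_fits(headroom, mmproj_bytes):
    """True if a projector loaded after the fit still fits in the headroom."""
    return mmproj_bytes <= headroom

# Made-up numbers: a GPU with 24 GiB free where the fit lands exactly on
# the default 1024 MiB margin.
free_vram = 24 * 1024 * MIB
margin = 1024 * MIB
headroom = fit_headroom(
    free_vram,
    weights=18 * 1024 * MIB,
    kv_cache=4 * 1024 * MIB,
    compute=1024 * MIB,
)

print(headroom >= margin)                   # the fit considers this acceptable
print(mmproj_fits(headroom, 1400 * MIB))    # but a 1400 MiB projector does not fit
```

In this model the fit is satisfied because the margin is respected, yet any projector larger than the margin fails to allocate afterwards.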

Steps to reproduce

# With any multimodal model that auto-detects mmproj via -hf,
# on a GPU where fit needs to reduce n_gpu_layers or n_ctx:
llama-server -hf Qwen/Qwen3.5-35B-A3B-gguf -c 65536

# Or with manual mmproj:
llama-server -m model.gguf -mm mmproj.gguf -c 65536

The OOM is intermittent: it depends on GPU memory fragmentation and on how close the fit result is to the
VRAM limit.

Workaround

Increase the margin manually:

llama-server -hf Qwen/Qwen3.5-35B-A3B-gguf -c 65536 --fit-margin 2048

Or disable mmproj GPU offload:

llama-server -hf Qwen/Qwen3.5-35B-A3B-gguf -c 65536 --no-mmproj-offload
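For the first workaround, one rough way to pick a margin is to add the size of the mmproj file to the default margin. The sketch below is an assumption, not something llama.cpp does: the helper name is hypothetical, and the file size only approximates the projector's weights, so its compute buffers would need extra room on top.

```python
MIB = 1024 * 1024

def suggested_margin_mib(mmproj_bytes, base_margin_mib=1024):
    """Default margin plus the mmproj size, rounded up to whole MiB.

    Hypothetical helper: mmproj_bytes is the size of the mmproj GGUF file,
    used here as a rough proxy for its GPU weight footprint.
    """
    mmproj_mib = (mmproj_bytes + MIB - 1) // MIB  # integer ceiling division
    return base_margin_mib + mmproj_mib

# Made-up example: a projector file of 900 MiB plus one byte.
print(suggested_margin_mib(900 * MIB + 1))
```

The byte count could come from os.path.getsize() on the local mmproj file; the result is the number of MiB to pass as the margin.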

Possible root cause

The initialization order is:

  1. common_init_result constructor runs (common/common.cpp:1046)
  2. Inside it, llama_params_fit() estimates GPU memory and decides n_gpu_layers
    (common/common.cpp:1053)
  3. LLM model is loaded to GPU based on the fitted parameters (common/common.cpp:1061)
  4. Constructor returns
  5. Later, mmproj is loaded to GPU via mtmd_init_from_file():
    • In server: server-context.cpp:693
    • In mtmd-cli: mtmd-cli.cpp:120, init_vision_context()

llama_params_fit_impl (src/llama.cpp:159) has no knowledge of mmproj: it doesn't take any
mmproj-related parameters. Its margins parameter (default 1024 MiB per device) is the only headroom
left to absorb the mmproj allocation, and mmproj can easily exceed it.
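One possible direction is to reserve memory for mmproj before the layer count is chosen. The toy model below sketches that idea in Python under stated assumptions: it is not the algorithm used by llama_params_fit, the function and its reserved parameter are hypothetical, and it treats every layer as the same size with made-up numbers.

```python
MIB = 1024 * 1024

def fit_gpu_layers(free_vram, fixed_cost, per_layer, n_layers, margin, reserved=0):
    """Largest layer count whose estimate keeps `margin` bytes free.

    fixed_cost: KV cache plus compute buffers (toy assumption: constant).
    reserved:   extra bytes set aside before fitting, e.g. for mmproj.
    """
    budget = free_vram - margin - reserved - fixed_cost
    if budget <= 0:
        return 0
    return min(n_layers, budget // per_layer)

free_vram = 16 * 1024 * MIB
fixed_cost = 5 * 1024 * MIB
per_layer = 400 * MIB
margin = 1024 * MIB
mmproj = 1400 * MIB

# Current order: fit first, load mmproj later.
layers_now = fit_gpu_layers(free_vram, fixed_cost, per_layer, 40, margin)
left_now = free_vram - fixed_cost - layers_now * per_layer

# Hypothetical order: reserve mmproj before fitting.
layers_fixed = fit_gpu_layers(free_vram, fixed_cost, per_layer, 40, margin, reserved=mmproj)
left_fixed = free_vram - fixed_cost - layers_fixed * per_layer

print(layers_now, left_now >= mmproj)      # more layers, projector does not fit
print(layers_fixed, left_fixed >= mmproj)  # fewer layers, projector fits
```

In this model, reserving the projector's memory up front costs a few offloaded layers but leaves enough free memory for the later mmproj load.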
