feat: auto-discover LoRA adapters from models directory #14

@marksverdhei

Summary

Add auto-discovery of LoRA adapter GGUF files alongside models, so that adapters placed in the --models-dir directory (or configured in --models-preset INI) are automatically detected and made available to the frontend and API without requiring explicit --lora CLI flags.

Related: #13 (frontend LoRA toggler depends on adapters being discoverable)

Can we deduce adapter↔model relationships from the files?

What metadata is available

LoRA adapter GGUF files contain:

| Field | Value | Source |
|---|---|---|
| general.type | "adapter" | convert_lora_to_gguf.py:375 |
| general.architecture | e.g. "llama", "qwen2" | convert_lora_to_gguf.py (inherited from base) |
| adapter.type | "lora" | convert_lora_to_gguf.py:376 |
| adapter.lora.alpha | float | convert_lora_to_gguf.py:380 |
| Tensor names + shapes | e.g. blk.0.attn_q.lora_a | From training |

Model GGUF files contain:

| Field | Value |
|---|---|
| general.architecture | e.g. "llama", "qwen2" |
| general.name | e.g. "Llama 3.1 8B Instruct" |
| Tensor names + shapes | Full model tensors |
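
To give a sense of how cheap a metadata-only read can be, here is a minimal sketch of a GGUF key/value reader in plain Python. It is not the gguf-py API and not code from the repository: the function name `read_gguf_metadata` and its `wanted` parameter are hypothetical, and it assumes a little-endian GGUF file of version 2 or later (64-bit counts and string lengths). It stops as soon as all requested keys are seen and never touches tensor data.

```python
import struct

# GGUF scalar value types mapped to struct codes (little-endian assumed).
_SCALARS = {0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i",
            6: "f", 7: "?", 10: "Q", 11: "q", 12: "d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one little-endian scalar described by a struct format code."""
    size = struct.calcsize("<" + fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF file")
    return struct.unpack("<" + fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 length followed by UTF-8 bytes."""
    length = _read(f, "Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF file")
    return data.decode("utf-8")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "I")
        count = _read(f, "Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(path, wanted=("general.type", "general.architecture")):
    """Return the requested metadata keys that are present in the file."""
    found = {}
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version = _read(f, "I")
        _tensor_count = _read(f, "Q")
        kv_count = _read(f, "Q")
        for _ in range(kv_count):
            key = _read_string(f)
            value = _read_value(f, _read(f, "I"))
            if key in wanted:
                found[key] = value
                if len(found) == len(wanted):
                    break  # everything we need has been read
    return found
```

A model file that lacks general.type simply yields a dict without that key, so callers can treat "missing" as "not an adapter".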

What's matchable

| Match criterion | Reliability | Requires tensor load? |
|---|---|---|
| general.architecture must match | Necessary but not sufficient: all Llama models share "llama" | No (metadata only) |
| Tensor name compatibility | Strong signal: adapter tensor names must exist in model | Yes (header scan) |
| Tensor shape compatibility | Definitive: dimension mismatch = incompatible | Yes (header scan) |
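
The name and shape criteria can be sketched as a pure function over tensor headers. This is an illustration only, loosely modeled on the kind of check the runtime performs and not a copy of it. The function name `check_adapter_fits` is hypothetical, and it assumes that stripping a `.lora_a` / `.lora_b` suffix from an adapter tensor name yields the base tensor name, with shapes given as `(ne0, ne1)` tuples where `lora_a` is `(n_in, rank)`, `lora_b` is `(rank, n_out)` and the base tensor is `(n_in, n_out)`.

```python
def check_adapter_fits(adapter_tensors, model_tensors):
    """Return a list of problems; an empty list means the adapter fits.

    Both arguments map tensor name -> (ne0, ne1) shape tuple.
    """
    # Collect the A and B halves of each LoRA pair under the base tensor name.
    pairs = {}
    for name, shape in adapter_tensors.items():
        for suffix in (".lora_a", ".lora_b"):
            if name.endswith(suffix):
                base = name[: -len(suffix)]
                pairs.setdefault(base, {})[suffix[-1]] = shape

    problems = []
    for base, halves in sorted(pairs.items()):
        if "a" not in halves or "b" not in halves:
            problems.append(f"{base}: lora_a/lora_b pair is incomplete")
            continue
        if base not in model_tensors:
            problems.append(f"{base}: no such tensor in model")
            continue
        a, b, w = halves["a"], halves["b"], model_tensors[base]
        if a[1] != b[0]:
            problems.append(f"{base}: lora_a and lora_b disagree on rank")
        if w[0] != a[0] or w[1] != b[1]:
            problems.append(f"{base}: shape mismatch (maybe wrong base model?)")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to surface all incompatibilities for a candidate model in a UI.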

What's NOT in the metadata

  • No base model identifier — the conversion script reads base_model_name_or_path from adapter_config.json but does not embed it in the GGUF (convert_lora_to_gguf.py:335-345)
  • No base model hash/UUID — the GGUF spec defines general.base_model.{id}.uuid and general.base_model.{id}.name fields, and the writer has add_base_model_*() methods (gguf_writer.py:608-636), but neither convert_hf_to_gguf.py nor convert_lora_to_gguf.py actually writes them
  • No model size/layer count stored explicitly in adapter metadata

Conclusion

Architecture-level matching is feasible from metadata alone (read general.architecture from both files — cheap, no tensor loading). This narrows candidates but can't distinguish e.g. Llama-7B from Llama-70B.

Exact matching requires loading tensor headers from both files to compare names and shapes. This is what llama-adapter.cpp:330-368 already does at runtime — it throws "maybe wrong base model?" on mismatch.

Best practical approach for auto-discovery:

  1. Scan directory, identify adapters by general.type = "adapter" (metadata read only)
  2. Group adapters by general.architecture — show only adapters matching the loaded model's architecture
  3. Optionally validate tensor compatibility on first load attempt (the runtime already does this and gives clear errors)
  4. Long-term: contribute upstream to have convert_lora_to_gguf.py write general.base_model.0.name — the writer infrastructure already exists
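
Steps 1 and 2 above can be sketched as follows. This is a rough outline in Python, not the proposed C++ change: `discover`, `adapters_for` and the injected `read_metadata` callable are hypothetical names, and the reader is assumed to return a dict of GGUF metadata keys for a given path.

```python
from collections import defaultdict
from pathlib import Path


def discover(models_dir, read_metadata):
    """Split *.gguf files into models and adapters grouped by architecture.

    read_metadata(path) is assumed to return a dict that may contain
    "general.type" and "general.architecture".
    """
    models = []
    adapters_by_arch = defaultdict(list)
    for path in sorted(Path(models_dir).glob("*.gguf")):
        try:
            meta = read_metadata(path)
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skipped in this sketch
        if meta.get("general.type") == "adapter":
            adapters_by_arch[meta.get("general.architecture")].append(path)
        else:
            models.append(path)
    return models, dict(adapters_by_arch)


def adapters_for(model_arch, adapters_by_arch):
    """Adapters whose architecture matches the loaded model's architecture."""
    return adapters_by_arch.get(model_arch, [])
```

Passing the metadata reader in as a parameter keeps the scan logic independent of how the GGUF header is actually parsed, and makes it easy to test with a stub.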

Implementation Plan

Phase 1: Discovery in preset.cpp

Modify load_from_models_dir() (common/preset.cpp:382-445) to also scan for LoRA adapters:

  • Currently it scans for .gguf files and treats them all as models
  • Add GGUF metadata read for general.type — if "adapter", categorize as LoRA adapter instead of model
  • Also read general.architecture from adapter files for matching
  • Store discovered adapters in a separate list/map alongside models
  • May need a lightweight GGUF metadata reader (the full model loader is too heavy for scanning)

Key consideration: load_from_models_dir() currently does zero GGUF parsing — it only looks at filenames. Adding metadata reads means opening each file, which has performance implications for large directories. A naming convention fallback (e.g. files in a loras/ subdirectory, or *-lora-*.gguf pattern) could supplement or replace metadata scanning.
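
A filename-only fallback along these lines might look like the sketch below. The function name `looks_like_lora` is hypothetical, and the two conventions it checks (a `loras/` directory component or a `*-lora-*.gguf` filename) are just the examples mentioned above, not an agreed convention.

```python
from fnmatch import fnmatch
from pathlib import PurePath


def looks_like_lora(path):
    """Guess from the path alone whether a file is a LoRA adapter."""
    p = PurePath(path)
    # Any parent directory named "loras" marks the file as an adapter.
    in_loras_dir = "loras" in p.parts[:-1]
    # Otherwise fall back to the filename pattern, case-insensitively.
    return in_loras_dir or fnmatch(p.name.lower(), "*-lora-*.gguf")
```

Because this never opens the file, it costs nothing per entry, at the price of misclassifying files that do not follow the convention.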

Phase 2: INI preset support

Add lora and lora-scaled as valid keys in the INI preset parser (common/preset.cpp:310-360):

```ini
[my-model]
model = /path/to/model.gguf
lora = /path/to/adapter1.gguf,/path/to/adapter2.gguf
```

This is straightforward since the INI parser already maps keys to CLI argument names, and --lora is already a valid CLI arg.
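
The key-to-flag mapping idea can be illustrated with Python's `configparser`. This only mirrors the concept and is not how common/preset.cpp is implemented: `preset_to_args` is a hypothetical helper, and it passes the comma-separated `lora` value through unchanged, exactly as written in the preset.

```python
import configparser


def preset_to_args(ini_text, section):
    """Turn one INI section into a flat list of CLI-style arguments."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    args = []
    for key, value in parser[section].items():
        # Each INI key is treated as the long name of a CLI flag.
        args += [f"--{key}", value]
    return args


preset = """
[my-model]
model = /path/to/model.gguf
lora = /path/to/adapter1.gguf,/path/to/adapter2.gguf
"""
print(preset_to_args(preset, "my-model"))
```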

Phase 3: Router integration

Optional: --lora-dir flag

Add a dedicated --lora-dir PATH argument (parallel to --models-dir) for cases where adapters are stored separately from models. It would be added in common/arg.cpp, around line 3030, next to the existing router CLI args.

File Reference

| Component | File | Key Lines |
|---|---|---|
| Model directory scanner | common/preset.cpp | 382-445 |
| INI preset parser | common/preset.cpp | 310-360 |
| LoRA adapter loading & validation | src/llama-adapter.cpp | 165-239 (metadata), 330-368 (tensor validation) |
| LoRA arch keys | src/llama-arch.cpp | 136-137, 318-322 |
| GGUF adapter constants | gguf-py/gguf/constants.py | 76-85, 280-285 |
| GGUF base_model writer methods (unused) | gguf-py/gguf/gguf_writer.py | 608-636 |
| LoRA conversion (doesn't write base_model) | convert_lora_to_gguf.py | 335-345, 374-403 |
| Router model management | tools/server/server-models.cpp | 242-375 |
| LoRA CLI args | common/arg.cpp | 2473-2496 |
| Router CLI args | common/arg.cpp | 3004-3030 |
