Summary
Add auto-discovery of LoRA adapter GGUF files alongside models, so that adapters placed in the --models-dir directory (or configured in --models-preset INI) are automatically detected and made available to the frontend and API without requiring explicit --lora CLI flags.
Related: #13 (frontend LoRA toggler depends on adapters being discoverable)
Can we deduce adapter↔model relationships from the files?
What metadata is available
LoRA adapter GGUF files contain:
| Field | Value | Source |
|---|---|---|
| general.type | "adapter" | convert_lora_to_gguf.py:375 |
| general.architecture | e.g. "llama", "qwen2" | convert_lora_to_gguf.py (inherited from base) |
| adapter.type | "lora" | convert_lora_to_gguf.py:376 |
| adapter.lora.alpha | float | convert_lora_to_gguf.py:380 |
| Tensor names + shapes | e.g. blk.0.attn_q.lora_a | From training |
Model GGUF files contain:
| Field | Value |
|---|---|
| general.architecture | e.g. "llama", "qwen2" |
| general.name | e.g. "Llama 3.1 8B Instruct" |
| Tensor names + shapes | Full model tensors |
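To illustrate how cheap a metadata-only read can be, here is a minimal sketch of a GGUF key/value reader that stops before the tensor info section. It is written in Python for brevity; an implementation in preset.cpp would be C++. It assumes a little-endian GGUF file of version 2 or later (64-bit counts), and the function name read_gguf_metadata is hypothetical.

```python
import struct
from typing import Any, BinaryIO, Dict

GGUF_MAGIC = b"GGUF"

# GGUF value type id -> struct format for fixed-size scalars (little-endian).
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_TYPE_STRING = 8
_TYPE_ARRAY = 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size value described by a struct format string."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, value_type: int) -> Any:
    if value_type in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[value_type])
    if value_type == _TYPE_STRING:
        return _read_string(f)
    if value_type == _TYPE_ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")


def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Return the key/value metadata; tensor info and data are never read."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts and is not handled here")
    _read(f, "<Q")  # tensor count, unused for metadata-only scanning
    kv_count = _read(f, "<Q")
    metadata: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        value_type = _read(f, "<I")
        metadata[key] = _read_value(f, value_type)
    return metadata
```

Usage would look like `with open(path, "rb") as f: meta = read_gguf_metadata(f)`, followed by a check of `meta.get("general.type")`. Note that this sketch parses every key, including large tokenizer arrays in model files; a production scanner would probably stop as soon as the keys it needs have been seen.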
What's matchable
| Match criterion | Reliability | Requires tensor load? |
|---|---|---|
| general.architecture must match | Necessary but not sufficient: all Llama models share "llama" | No (metadata only) |
| Tensor name compatibility | Strong signal: adapter tensor names must exist in model | Yes (header scan) |
| Tensor shape compatibility | Definitive: dimension mismatch = incompatible | Yes (header scan) |
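The name and shape criteria can be sketched as a pure function over tensor headers. This is an illustration, not a copy of the runtime check in llama-adapter.cpp. It assumes shapes are given in ggml dimension order (innermost first), that lora_a has shape (n_in, rank) and lora_b has shape (rank, n_out) for a base tensor of shape (n_in, n_out), and that the base tensor name is the adapter tensor name with the .lora_a / .lora_b suffix removed. The function name check_lora_compat is hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]
LORA_SUFFIXES = (".lora_a", ".lora_b")


def check_lora_compat(adapter: Dict[str, Shape], model: Dict[str, Shape]) -> List[str]:
    """Return a list of problems; an empty list means the headers are compatible."""
    problems: List[str] = []
    # Base tensor names targeted by the adapter (assumed naming convention).
    bases = sorted({n.rsplit(".", 1)[0] for n in adapter if n.endswith(LORA_SUFFIXES)})
    for base in bases:
        a = adapter.get(base + ".lora_a")
        b = adapter.get(base + ".lora_b")
        w = model.get(base)
        if a is None or b is None:
            problems.append(f"{base}: incomplete lora_a/lora_b pair")
        elif w is None:
            # Name criterion: every adapted tensor must exist in the model.
            problems.append(f"{base}: tensor not found in model")
        elif min(len(a), len(b), len(w)) < 2:
            problems.append(f"{base}: expected 2-D tensors")
        elif a[0] != w[0] or b[1] != w[1]:
            # Shape criterion: input/output dimensions must match the base tensor.
            problems.append(f"{base}: shape mismatch (maybe wrong base model?)")
        elif a[1] != b[0]:
            problems.append(f"{base}: lora_a and lora_b disagree on rank")
    return problems
```

Because only names and shapes are compared, this needs the tensor info section of both files but none of the tensor data.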
What's NOT in the metadata
- No base model identifier: the conversion script reads base_model_name_or_path from adapter_config.json but does not embed it in the GGUF (convert_lora_to_gguf.py:335-345)
- No base model hash/UUID: the GGUF spec defines general.base_model.{id}.uuid and general.base_model.{id}.name fields, and the writer has add_base_model_*() methods (gguf_writer.py:608-636), but neither convert_hf_to_gguf.py nor convert_lora_to_gguf.py actually writes them
- No model size/layer count stored explicitly in adapter metadata
Conclusion
Architecture-level matching is feasible from metadata alone (read general.architecture from both files — cheap, no tensor loading). This narrows candidates but can't distinguish e.g. Llama-7B from Llama-70B.
Exact matching requires loading tensor headers from both files to compare names and shapes. This is what llama-adapter.cpp:330-368 already does at runtime — it throws "maybe wrong base model?" on mismatch.
Best practical approach for auto-discovery:
- Scan the directory and identify adapters by general.type = "adapter" (metadata read only)
- Group adapters by general.architecture and show only adapters matching the loaded model's architecture
- Optionally validate tensor compatibility on first load attempt (the runtime already does this and gives clear errors)
- Long-term: contribute upstream to have convert_lora_to_gguf.py write general.base_model.0.name; the writer infrastructure already exists
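The first two steps above can be sketched as follows. The metadata reader is passed in as a callable so that the sketch stays independent of how GGUF parsing is done. The names discover_ggufs and adapters_for_architecture are hypothetical, and the non-recursive scan is an assumption about how the models directory is laid out.

```python
from collections import defaultdict
from pathlib import Path
from typing import Any, Callable, Dict, List, Mapping, Tuple

MetadataReader = Callable[[Path], Mapping[str, Any]]


def discover_ggufs(
    models_dir: Path, read_metadata: MetadataReader
) -> Tuple[List[Path], Dict[str, List[Path]]]:
    """Split *.gguf files into models and adapters grouped by architecture."""
    models: List[Path] = []
    adapters: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(models_dir.glob("*.gguf")):
        try:
            meta = read_metadata(path)
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skip it rather than fail the scan
        if meta.get("general.type") == "adapter":
            arch = meta.get("general.architecture", "unknown")
            adapters[arch].append(path)
        else:
            models.append(path)
    return models, dict(adapters)


def adapters_for_architecture(
    model_arch: str, adapters: Mapping[str, List[Path]]
) -> List[Path]:
    """Adapters worth offering for a loaded model; exact fit is checked at load time."""
    return list(adapters.get(model_arch, []))
```

Architecture grouping only narrows the candidates, so an adapter listed here can still be rejected by the runtime tensor validation when it is first applied.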
Implementation Plan
Phase 1: Discovery in preset.cpp
Modify load_from_models_dir() (common/preset.cpp:382-445) to also scan for LoRA adapters:
- Currently it scans for .gguf files and treats them all as models
- Add a GGUF metadata read for general.type: if "adapter", categorize the file as a LoRA adapter instead of a model
- Also read general.architecture from adapter files for matching
- Store discovered adapters in a separate list/map alongside models
- May need a lightweight GGUF metadata reader (the full model loader is too heavy for scanning)
Key consideration: load_from_models_dir() currently does zero GGUF parsing — it only looks at filenames. Adding metadata reads means opening each file, which has performance implications for large directories. A naming convention fallback (e.g. files in a loras/ subdirectory, or *-lora-*.gguf pattern) could supplement or replace metadata scanning.
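A sketch of the naming-convention fallback, usable either as a cheap pre-filter before opening files or as a replacement for metadata reads. The two conventions are the ones suggested above (a loras/ subdirectory, or a *-lora-*.gguf filename); the function name looks_like_lora is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePath


def looks_like_lora(path: str) -> bool:
    """Guess from the path alone whether a GGUF file is a LoRA adapter."""
    p = PurePath(path)
    if p.suffix.lower() != ".gguf":
        return False
    # Convention 1: the file lives somewhere under a directory named "loras".
    in_loras_dir = "loras" in p.parts[:-1]
    # Convention 2: the filename contains "-lora-" (compared case-insensitively).
    name_matches = fnmatch(p.name.lower(), "*-lora-*.gguf")
    return in_loras_dir or name_matches
```

A filename heuristic can misclassify files in both directions, so it seems safest to treat a positive match as a hint that is confirmed by general.type when the file is actually opened.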
Phase 2: INI preset support
Add lora and lora-scaled as valid keys in the INI preset parser (common/preset.cpp:310-360):
```ini
[my-model]
model = /path/to/model.gguf
lora = /path/to/adapter1.gguf,/path/to/adapter2.gguf
```
This is straightforward since the INI parser already maps keys to CLI argument names, and --lora is already a valid CLI arg.
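As an illustration of the key-to-argument mapping, here is a sketch that turns a preset section into an argument list. It is not the parser in common/preset.cpp: the allow-list, the function name preset_to_args, and the choice to pass the comma-separated lora value through unchanged are all assumptions.

```python
import configparser
from typing import Dict, List

# Hypothetical allow-list; the real parser maps keys to known CLI argument names.
ALLOWED_KEYS = {"model", "lora", "lora-scaled"}


def preset_to_args(ini_text: str) -> Dict[str, List[str]]:
    """Map each INI section to a CLI-style argument list, e.g. key -> --key value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    result: Dict[str, List[str]] = {}
    for section in parser.sections():
        args: List[str] = []
        for key, value in parser.items(section):
            if key not in ALLOWED_KEYS:
                raise ValueError(f"unsupported preset key: {key}")
            args += [f"--{key}", value]
        result[section] = args
    return result


def split_lora_paths(value: str) -> List[str]:
    """Split a comma-separated lora value into individual adapter paths."""
    return [part.strip() for part in value.split(",") if part.strip()]
```

With the preset shown above, the my-model section would map to --model followed by the model path and --lora followed by the unsplit adapter list.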
Phase 3: Router integration
- […] (server-models.cpp:561)
- Extend GET /v1/models to include available adapters per model
Optional: --lora-dir flag
Add a dedicated --lora-dir PATH argument (parallel to --models-dir) for cases where adapters are stored separately from models. It would be added in common/arg.cpp, around line 3030.
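A small sketch of how the two flags could interact. The fallback to --models-dir when --lora-dir is not given is an assumption, chosen so that the default behaviour stays "adapters live next to models"; the helper name adapter_search_dir is hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--models-dir", metavar="PATH", default=None)
    # Proposed flag: a separate directory that holds only LoRA adapters.
    parser.add_argument("--lora-dir", metavar="PATH", default=None)
    return parser


def adapter_search_dir(argv: List[str]) -> Optional[str]:
    """Directory to scan for adapters: --lora-dir if set, else --models-dir."""
    args = build_parser().parse_args(argv)
    return args.lora_dir or args.models_dir
```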
File Reference
| Component | File | Key Lines |
|---|---|---|
| Model directory scanner | common/preset.cpp | 382-445 |
| INI preset parser | common/preset.cpp | 310-360 |
| LoRA adapter loading & validation | src/llama-adapter.cpp | 165-239 (metadata), 330-368 (tensor validation) |
| LoRA arch keys | src/llama-arch.cpp | 136-137, 318-322 |
| GGUF adapter constants | gguf-py/gguf/constants.py | 76-85, 280-285 |
| GGUF base_model writer methods (unused) | gguf-py/gguf/gguf_writer.py | 608-636 |
| LoRA conversion (doesn't write base_model) | convert_lora_to_gguf.py | 335-345, 374-403 |
| Router model management | tools/server/server-models.cpp | 242-375 |
| LoRA CLI args | common/arg.cpp | 2473-2496 |
| Router CLI args | common/arg.cpp | 3004-3030 |