Summary
Add a LoRA adapter toggler/selection list in the webui that allows users to view, enable/disable, and adjust the scale of loaded LoRA adapters for the currently selected model.
This feature should only appear when LoRA adapters are available (loaded via CLI or, once the backend supports it, present in the auto-discovery directory or configured in the models preset INI). It must not disrupt or degrade any other frontend feature when no adapters are present.
Backend Context
Existing LoRA API (fully implemented)
The server already exposes two endpoints for runtime LoRA management, plus a per-request override field:
- GET /lora-adapters → returns the list of loaded adapters with id, path, scale, task_name, prompt_prefix, and optional alora_invocation_string
  - Handler: tools/server/server-context.cpp:3929-3955
- POST /lora-adapters → sets global adapter scales at runtime (array of {id, scale})
  - Handler: tools/server/server-context.cpp:3957-3989
- Per-request lora field in /completion and /chat/completions: overrides global scales for individual requests
  - Parsed at: tools/server/server-task.cpp:328-335
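As a rough illustration of how the webui could talk to these endpoints, the sketch below wraps both calls in small typed helpers. The field types, the helper names (getLoraAdapters, setLoraScales, toScalePayload), and the injected FetchLike type are assumptions made for this example and are not taken from the codebase; only the endpoint paths and field names come from the description above.

```typescript
// Assumed shape of one entry returned by GET /lora-adapters.
// Field names follow the list above; the TypeScript types are a guess.
interface LoraAdapter {
  id: number;
  path: string;
  scale: number;
  task_name: string;
  prompt_prefix: string;
  alora_invocation_string?: string;
}

// Entry accepted by POST /lora-adapters and by the per-request lora field.
interface LoraScale {
  id: number;
  scale: number;
}

// Minimal fetch signature so the helpers can be exercised without a browser.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: read the adapters currently loaded by the server.
async function getLoraAdapters(fetchFn: FetchLike, baseUrl: string): Promise<LoraAdapter[]> {
  const res = await fetchFn(`${baseUrl}/lora-adapters`);
  if (!res.ok) throw new Error(`GET /lora-adapters failed: ${res.status}`);
  return (await res.json()) as LoraAdapter[];
}

// Hypothetical helper: set global scales for the given adapters.
async function setLoraScales(fetchFn: FetchLike, baseUrl: string, scales: LoraScale[]): Promise<void> {
  const res = await fetchFn(`${baseUrl}/lora-adapters`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(scales),
  });
  if (!res.ok) throw new Error(`POST /lora-adapters failed: ${res.status}`);
}

// Pure helper: reduce full adapter entries to the {id, scale} pairs the POST body expects.
function toScalePayload(adapters: LoraAdapter[]): LoraScale[] {
  return adapters.map(({ id, scale }) => ({ id, scale }));
}
```

In the browser the global fetch can be passed as fetchFn; injecting it keeps the helpers easy to stub in unit tests.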
How LoRA adapters are loaded
- CLI: --lora FNAME or --lora-scaled FNAME:SCALE,... (common/arg.cpp:2473-2496)
- --lora-init-without-apply loads adapters into memory without applying them (scale=0), enabling later activation via the POST endpoint (common/arg.cpp:3110-3115)
- Data structure: common_adapter_lora_info in common/common.h:42-50 holds path, scale, task_name, prompt_prefix, and the native llama_adapter_lora *ptr
Auto-discovery / Router mode
- Router mode activates when no model is specified on CLI (tools/server/server.cpp:128)
- Models are auto-discovered from the --models-dir directory by scanning for .gguf files (common/preset.cpp:382-445)
- Models can also be configured via a --models-preset INI file (common/preset.cpp:310-360)
- LoRA adapters are NOT currently auto-discovered; they must be specified via CLI args
- LoRA GGUF files are distinguishable from model files by their general.type = "adapter" metadata (src/llama-adapter.cpp:204)
- The INI preset system (common/preset.cpp) does not currently support lora or lora-scaled keys
Key backend task types
- SERVER_TASK_TYPE_GET_LORA and SERVER_TASK_TYPE_SET_LORA in tools/server/server-task.h:27-28
- Result structures: server_task_result_get_lora (line 543) and server_task_result_apply_lora (line 554)
Frontend Context
Current state — no LoRA UI exists
- The API type definition already includes a lora field in ApiLlamaCppServerProps.default_generation_settings.params (tools/server/webui/src/lib/types/api.d.ts:175,343), but it is not used anywhere in the UI
- SettingsChatServiceOptions (tools/server/webui/src/lib/types/settings.d.ts:19-69) has no LoRA field
- No component exists for LoRA management
- The custom?: string field in settings could theoretically pass LoRA params, but there is no UI for it
Model selector architecture (reference for LoRA UI patterns)
- Model store: tools/server/webui/src/lib/stores/models.svelte.ts (manages model list, selection, load/unload)
- Model service: tools/server/webui/src/lib/services/models.service.ts (API calls to /v1/models)
- Model selector components: tools/server/webui/src/lib/components/app/models/
  - ModelsSelector.svelte: dropdown with grouped display (loaded, favourites, by org)
  - ModelsSelectorSheet.svelte: mobile/sheet variant
  - ModelsSelectorOption.svelte: individual model item
- Chat parameter passing: tools/server/webui/src/lib/stores/chat.svelte.ts:734-745 builds completion options (no LoRA field currently)
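If the per-request route is chosen, the completion options built in chat.svelte.ts would need to carry a lora array. The sketch below shows one way to attach it so that requests stay byte-for-byte unchanged when no adapters are in play. CompletionOptions and withLora are hypothetical names used for illustration and do not mirror the actual options type in the store.

```typescript
interface LoraScale {
  id: number;
  scale: number;
}

// Hypothetical stand-in for the options object assembled in chat.svelte.ts.
type CompletionOptions = { [key: string]: unknown; lora?: LoraScale[] };

// Attach per-request LoRA scales only when there is something to send,
// so requests are left untouched when no adapters are configured.
function withLora(options: CompletionOptions, scales: LoraScale[]): CompletionOptions {
  if (scales.length === 0) return options;
  return { ...options, lora: scales.map(({ id, scale }) => ({ id, scale })) };
}
```

Returning the original object when the list is empty keeps the no-adapter path identical to today's behaviour, which matches the requirement that other frontend features are not affected.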
Implementation Considerations
Frontend requirements
- Conditional rendering: LoRA UI should only appear when GET /lora-adapters returns a non-empty list
- Adapter list with toggles: Show each adapter with name (derived from path), current scale, and enable/disable toggle
- Scale slider: Allow adjusting scale per adapter (0.0 to 1.0+ range)
- Global vs per-request: Decide whether to use POST /lora-adapters (global) or pass the lora field per-request. Per-request is more flexible, but requests with different LoRA configurations are not batched together, which can reduce throughput
- Placement: Could be a collapsible section near the model selector, or a settings panel entry
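The first two requirements reduce to a few pure helpers that a Svelte component could call. The following is a minimal sketch under stated assumptions: AdapterView, adapterDisplayName, shouldShowLoraUi, and toggleAdapter are hypothetical names, and treating scale 0 as "disabled" with a default re-enable scale of 1.0 is a design assumption rather than existing behaviour.

```typescript
// Subset of the adapter fields the UI needs.
interface AdapterView {
  id: number;
  path: string;
  scale: number;
}

// Derive a display name from the path: file name without directory or .gguf suffix.
function adapterDisplayName(path: string): string {
  const file = path.split(/[\\/]/).pop() ?? path;
  return file.replace(/\.gguf$/i, '');
}

// The LoRA section is rendered only for a non-empty adapter list.
function shouldShowLoraUi(adapters: AdapterView[] | null | undefined): boolean {
  return Array.isArray(adapters) && adapters.length > 0;
}

// Toggle an adapter. Disabling sets scale to 0 and remembers the previous value
// in lastScales; enabling restores it, or falls back to defaultScale.
function toggleAdapter(
  adapter: AdapterView,
  lastScales: Record<number, number>,
  defaultScale = 1.0
): AdapterView {
  if (adapter.scale !== 0) {
    lastScales[adapter.id] = adapter.scale;
    return { ...adapter, scale: 0 };
  }
  const restored = lastScales[adapter.id];
  return { ...adapter, scale: restored !== undefined ? restored : defaultScale };
}
```

Remembering the last non-zero scale means a user who tuned an adapter to, say, 0.7 gets that value back after switching it off and on, instead of being reset to the default.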
Backend gaps to consider
- No LoRA auto-discovery: load_from_models_dir() in common/preset.cpp does not scan for or categorize LoRA files. A --lora-dir flag or LoRA support in the INI preset would be needed for auto-discovery
- GGUF metadata: LoRA files have general.type = "adapter", which can be used to distinguish them during directory scanning
- Router mode: In router mode, each model spawns a child process. LoRA adapters would need to be passed to the child process args (via preset/INI)
Testing
Existing test infrastructure can be reused:
- Test LoRA adapter: https://huggingface.co/ggml-org/stories15M_MOE/resolve/main/moe_shakespeare15M.gguf (small Shakespeare-style LoRA for the stories15M_MOE model)
- Base model: stories15M_MOE preset already in test utils
- Existing tests: tools/server/tests/unit/test_lora.py covers scale toggling, per-request config, and parallel slots
- LoRA conversion tools: convert_lora_to_gguf.py and tools/export-lora/ for creating test adapters
File Reference
| Area | File | Key Lines |
| --- | --- | --- |
| LoRA API routes | tools/server/server.cpp | 200-201 |
| LoRA GET/POST handlers | tools/server/server-context.cpp | 3929-3989 |
| LoRA slot application | tools/server/server-context.cpp | 1064-1143 |
| LoRA utilities | tools/server/server-common.cpp | 91-155 |
| LoRA data structures | common/common.h | 42-50 |
| LoRA CLI args | common/arg.cpp | 2473-2496, 3110-3115 |
| Model auto-discovery | common/preset.cpp | 382-445 |
| INI preset parsing | common/preset.cpp | 310-360 |
| Router model management | tools/server/server-models.cpp | 242-375 |
| Frontend API types (lora field) | tools/server/webui/src/lib/types/api.d.ts | 175, 343 |
| Frontend settings types | tools/server/webui/src/lib/types/settings.d.ts | 19-69 |
| Frontend models store | tools/server/webui/src/lib/stores/models.svelte.ts | - |
| Frontend chat params | tools/server/webui/src/lib/stores/chat.svelte.ts | 734-745 |
| Test: LoRA server tests | tools/server/tests/unit/test_lora.py | - |