This guide explains how to serve models from custom directories (locations outside the HuggingFace cache) using vLLM-CLI. Custom directories are essential when working with:
- Fine-tuned models
- Merged models
- Models downloaded outside of HuggingFace
- LoRA adapters
- Custom model formats
Prerequisites:

- vLLM-CLI installed:

  ```bash
  pip install vllm-cli
  ```

- HF-MODEL-TOOL installed (for model discovery):

  ```bash
  pip install hf-model-tool
  ```

- Models in supported formats:
  - Must have a `config.json` file
  - Model weights in `.safetensors`, `.bin`, or `.pt` format
  - Tokenizer files, if applicable
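The format requirements above can be checked before adding a directory. This is an illustrative sketch, not part of vLLM-CLI; the function name and messages are made up for this example:

```python
from pathlib import Path

# Weight formats listed in the prerequisites above
WEIGHT_EXTENSIONS = {".safetensors", ".bin", ".pt"}

def check_model_dir(path):
    """Return a list of problems; an empty list means the layout looks servable."""
    root = Path(path)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    weights = [f for f in root.iterdir()
               if f.is_file() and f.suffix in WEIGHT_EXTENSIONS]
    if not weights:
        problems.append("no model weights (.safetensors, .bin, or .pt)")
    return problems
```

Running this against a model directory before adding it can save a round trip through the interactive menus.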
1. Launch vLLM-CLI:

   ```bash
   vllm-cli
   ```

2. Navigate to Settings from the main menu
3. Select Model Directories
4. Select Add Model Directory
5. Enter the path to your custom model directory:

   ```
   Path: /home/user/my-models
   ```

6. Choose the directory type:
   - Auto-detect (recommended) - let the tool determine the type
   - Custom - for fine-tuned or merged models
   - LoRA - for directories containing LoRA adapters
7. The CLI automatically scans the directory and generates a `models_manifest.json` file.
8. Review the manifest and edit it if needed.

You can add multiple directories by selecting Add Model Directory again, and remove one by selecting Remove Model Directory.
```bash
# Add directory via command line
hf-model-tool -path /home/user/my-models
```

When you add a custom directory, a `models_manifest.json` file is automatically generated in the directory root. This file:
- Contains metadata for all detected models
- Is the primary source for model information
- Can be edited to customize model names and publishers
```json
{
  "version": "1.0",
  "generated": "2025-08-17T19:21:18.530824",
  "directory": "your-model-directory-path",
  "models": [
    {
      "path": "qwen3-4b-sft",
      "name": "Qwen3 4B SFT",
      "publisher": "Qwen",
      "type": "custom_model",
      "notes": "Fine-tuned on xxx dataset"
    }
  ]
}
```

Important: Always review the auto-generated manifest to ensure the model information is accurate.
1. Open `models_manifest.json` in your model directory
2. Edit the following fields as needed:
   - `name`: Display name shown in vLLM-CLI
   - `publisher`: Organization or author
   - `type`: Model type (`model`, `custom_model`, `lora_adapter`)
   - `notes`: Optional description
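Because the manifest is plain JSON, these fields can also be edited programmatically. A minimal sketch, assuming the manifest layout shown earlier; the helper name is hypothetical and not part of vLLM-CLI:

```python
import json
from pathlib import Path

def set_display_name(manifest_path, model_path, name, publisher=None):
    """Rewrite the 'name' (and optionally 'publisher') of one manifest entry."""
    p = Path(manifest_path)
    manifest = json.loads(p.read_text())
    for entry in manifest["models"]:
        if entry["path"] == model_path:
            entry["name"] = name
            if publisher is not None:
                entry["publisher"] = publisher
    p.write_text(json.dumps(manifest, indent=2))
```

For example, `set_display_name("models_manifest.json", "qwen3-4b-sft", "Qwen3 4B SFT", publisher="Qwen")` updates the entry shown in the sample manifest above.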
1. In vLLM-CLI, select one of the Serving options from the main menu
2. Your custom models appear in the model selection list under the publisher defined in the manifest:

   ```
   [?] Select Qwen Model (2 available):
    > Qwen3 4B SFT (1.32 GB)   # The model from your custom directory
      Qwen3-32B (61.04 GB)
      ← Back
   ```

3. Select your custom model
4. Choose a serving profile or configure manually
5. The model is served using the full path to the custom directory
```
/home/user/my-model/
├── models_manifest.json     # Auto-generated manifest
├── config.json              # Model configuration
├── model.safetensors        # Model weights
├── tokenizer.json           # Tokenizer
└── tokenizer_config.json    # Tokenizer config
```

Usage: Add `/home/user/my-model` as a custom directory
```
/home/user/models/
├── models_manifest.json     # Manifest for all models
├── qwen3-4b-sft/
│   ├── config.json
│   └── model.safetensors
├── qwen3-7b-sft/
│   ├── config.json
│   └── model.safetensors
└── gemma-custom/
    ├── config.json
    └── model.safetensors
```
```
/home/user/lora-adapters/
├── models_manifest.json
├── finance-lora/
│   ├── adapter_config.json
│   └── adapter_model.safetensors
└── medical-lora/
    ├── adapter_config.json
    └── adapter_model.safetensors
```

Usage: Add as a custom directory; LoRA adapters will be auto-detected
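The auto-detection relies on the adapter layout shown above: each adapter subdirectory carries an `adapter_config.json`. A sketch of that detection logic, for illustration only (vLLM-CLI's actual implementation may differ):

```python
from pathlib import Path

def find_lora_adapters(root):
    """List subdirectories that contain an adapter_config.json (LoRA layout)."""
    root = Path(root)
    return sorted(d.name for d in root.iterdir()
                  if d.is_dir() and (d / "adapter_config.json").is_file())
```

Pointing this at the example tree above would return `["finance-lora", "medical-lora"]`.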
```bash
# From the command line
vllm-cli serve /home/user/models/qwen3-4b-sft --profile standard
```
1. Check that the directory was added successfully:
   - Go to Model Manager → View Model Directories
   - Verify your directory is listed
2. Verify the model structure:
   - Ensure `config.json` exists
   - Check for model weight files
   - Confirm directory permissions
3. Review the manifest:
   - Check that `models_manifest.json` was generated
   - Verify the model entry exists in the manifest
4. Clear the cache:
   - Go to Model Management → Clear Cache
5. Check model compatibility:
   - Verify the model architecture is supported by vLLM
   - Ensure CUDA/GPU requirements are met
6. Review error logs:
   - Check vLLM-CLI logs in the monitoring view
   - Look for specific error messages
7. Validate the model files:

   ```python
   # Test that the model loads with transformers
   from transformers import AutoModel

   model = AutoModel.from_pretrained("/path/to/model")
   ```

8. Regenerate the manifest: remove the directory from vLLM-CLI and add it again