
Add import_models command to discover local MLX models.#6

Merged
simonw merged 4 commits into simonw:main from ivanfioravanti:main
Feb 17, 2025

Conversation

@ivanfioravanti
Contributor

Very basic import from the local Hugging Face cache folder for all models containing mlx-community.

Closes #5

llm_mlx.py Outdated

@mlx.command()
def import_models():
    cache_dir = Path(os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface")))
Owner

Using HF_HOME is neat.

Contributor Author

I use it when loading model files from an external disk.
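For reference, a minimal sketch of what this kind of cache scan might look like (the function name is hypothetical; it assumes the hub's standard models--{org}--{repo} directory naming):

```python
import os
from pathlib import Path


def find_mlx_models(cache_dir=None):
    # Honour HF_HOME, falling back to the default cache location
    cache_dir = Path(
        cache_dir or os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
    )
    models = []
    # Hub cache entries are directories named models--{org}--{repo}
    for entry in sorted((cache_dir / "hub").glob("models--mlx-community--*")):
        if entry.is_dir():
            models.append(entry.name.removeprefix("models--").replace("--", "/"))
    return models
```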

@simonw
Owner

simonw commented Feb 16, 2025

I tested this like so:

mv "$(uv run llm mlx models-file)" /tmp/llm-mlx.json
uv run llm mlx models                               

This confirmed I now have no models.

Then with this PR:

uv run llm mlx import-models

Imported 82 models from Hugging Face cache

That sounds like too many!

uv run llm models -q mlx

Outputs:

MlxModel: mlx-community/whisper-small.en-mlx/snapshots/52a88bf6e98b114a210c21bb83e22d6e1505cb73
MlxModel: mlx-community/Llama-3.3-70B-Instruct-4bit
MlxModel: mlx-community/SmolLM-135M-Instruct-4bit/snapshots
MlxModel: mlx-community/whisper-small.en-mlx/blobs
MlxModel: mlx-community/whisper-tiny
MlxModel: mlx-community/nanoLLaVA-1.5-8bit/refs
MlxModel: mlx-community/whisper-large-v3-turbo
MlxModel: mlx-community/whisper-small.en-mlx/snapshots
MlxModel: mlx-community/Meta-Llama-3-8B-Instruct-4bit
MlxModel: mlx-community/DeepSeek-R1-Distill-Llama-8B/refs
MlxModel: mlx-community/Llama-3.3-70B-Instruct-4bit/refs
MlxModel: mlx-community/Llama-3.2-3B-Instruct-bf16
MlxModel: mlx-community/whisper-tiny/snapshots/54773fb11b9b7640b1a2ce4f8b55b2ce44239589
MlxModel: mlx-community/Mistral-7B-Instruct-v0.3-4bit/snapshots/a4b8f870474b0eb527f466a03fbc187830d271f5
MlxModel: mlx-community/whisper-large-v3-turbo/refs
MlxModel: mlx-community/Llama-3.2-3B-Instruct-4bit/snapshots/7a82cca14319d695658275cf5e3b98c012bb2f87
MlxModel: mlx-community/llava-phi-3-mini-4bit
MlxModel: mlx-community/distil-whisper-large-v3
MlxModel: mlx-community/whisper-small.en-mlx/refs
MlxModel: mlx-community/Mistral-7B-Instruct-v0.3-4bit/refs
MlxModel: mlx-community/Qwen2.5-Coder-32B-Instruct-8bit
MlxModel: mlx-community/idefics2-8b-4bit
MlxModel: mlx-community/OpenELM-3B-instruct-8bit
MlxModel: mlx-community/DeepSeek-R1-Distill-Llama-8B/snapshots
MlxModel: mlx-community/DeepSeek-R1-Distill-Llama-8B/blobs
MlxModel: mlx-community/whisper-tiny/blobs
MlxModel: mlx-community/SmolLM-135M-Instruct-4bit/blobs
MlxModel: mlx-community/Qwen2.5-0.5B-Instruct-4bit/snapshots
MlxModel: mlx-community/OpenELM-270M-Instruct
MlxModel: mlx-community/Llama-3.3-70B-Instruct-4bit/blobs
MlxModel: mlx-community/distil-whisper-large-v3/blobs
MlxModel: mlx-community/Qwen2.5-0.5B-Instruct-4bit
MlxModel: mlx-community/nanoLLaVA-1.5-8bit/blobs
MlxModel: mlx-community/pixtral-12b-8bit
MlxModel: mlx-community/SmolLM-135M-Instruct-4bit/snapshots/642e06afe3fab57fd6cc518637c471af0a569e1e
MlxModel: mlx-community/Mistral-7B-Instruct-v0.1-4bit-mlx
MlxModel: mlx-community/SmolVLM-Instruct-bf16
MlxModel: mlx-community/Qwen2.5-0.5B-Instruct-4bit/refs
MlxModel: mlx-community/whisper-large-v3-turbo/snapshots
MlxModel: mlx-community/Llama-3.2-3B-Instruct-4bit (aliases: ml, l32)
MlxModel: mlx-community/distil-whisper-large-v3/refs
MlxModel: mlx-community/Mistral-7B-Instruct-v0.3-4bit/blobs
MlxModel: mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit/blobs
MlxModel: mlx-community/OpenELM-270M-Instruct/blobs
MlxModel: mlx-community/Llama-3.2-3B-Instruct-4bit/refs
MlxModel: mlx-community/Qwen2.5-0.5B-Instruct-4bit/blobs
MlxModel: mlx-community/OpenELM-270M-Instruct/snapshots/7cb5ebd2e82067793db75003630ed2442a16a29d
MlxModel: mlx-community/SmolLM-135M-Instruct-4bit/refs
MlxModel: mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit/refs
MlxModel: mlx-community/DeepSeek-R1-Distill-Llama-8B/snapshots/7caba1ee941e9f0100394ebd5fe5a193a51304fb
MlxModel: mlx-community/Llama-3.2-3B-Instruct-4bit/snapshots
MlxModel: mlx-community/QVQ-72B-Preview-4bit
MlxModel: mlx-community/llava-1.5-7b-4bit
MlxModel: mlx-community/whisper-tiny/refs
MlxModel: mlx-community/whisper-small.en-mlx
MlxModel: mlx-community/Llama-3.2-3B-Instruct-4bit/blobs
MlxModel: mlx-community/whisper-large-v3-turbo/snapshots/beea265c324f07ba1e347f3c8a97aec454056a86
MlxModel: mlx-community/whisper-large-v2-mlx
MlxModel: mlx-community/nanoLLaVA-1.5-8bit/snapshots/547f734ef05c24b2fa73618a77f6e7fd76bf0f4d
MlxModel: mlx-community/distil-whisper-large-v3/snapshots/e1c3c155644be59f8b477c0186719442f7e3fbb0
MlxModel: mlx-community/Mistral-7B-Instruct-v0.3-4bit
MlxModel: mlx-community/whisper-large-v1-mlx
MlxModel: mlx-community/Qwen2.5-VL-7B-Instruct-8bit
MlxModel: mlx-community/Llama-3.3-70B-Instruct-4bit/snapshots/de2dfaf56839b7d0e834157d2401dee02726874d
MlxModel: mlx-community/Llama-3.3-70B-Instruct-4bit/snapshots
MlxModel: mlx-community/whisper-large-v3-turbo/blobs
MlxModel: mlx-community/Mistral-7B-Instruct-v0.3-4bit/snapshots
MlxModel: mlx-community/OpenELM-270M-Instruct/snapshots
MlxModel: mlx-community/SmolLM-135M-Instruct-4bit
MlxModel: mlx-community/distil-whisper-large-v3/snapshots
MlxModel: mlx-community/Qwen2.5-0.5B-Instruct-4bit/snapshots/a5339a4131f135d0fdc6a5c8b5bbed2753bbe0f3
MlxModel: mlx-community/OpenELM-270M-Instruct/refs
MlxModel: mlx-community/DeepSeek-R1-Distill-Llama-8B
MlxModel: mlx-community/Llama-3.2-11B-Vision-Instruct-4bit
MlxModel: mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit
MlxModel: mlx-community/nanoLLaVA-1.5-8bit/snapshots
MlxModel: mlx-community/whisper-tiny/snapshots
MlxModel: mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit/snapshots/f429cf7184400c416f1d61c4d9dd3f47912fccba
MlxModel: mlx-community/nanoLLaVA-1.5-8bit
MlxModel: mlx-community/phi-4-4bit
MlxModel: mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit/snapshots
MlxModel: mlx-community/Meta-Llama-3-8B-4bit

Some of those end with things like /snapshots which looks like a bug.

It's also picked up some non-LLM models like mlx-community/whisper-tiny.
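Those /snapshots, /blobs and /refs suffixes are the hub cache's internal layout leaking through: each models--… directory contains blobs/, refs/ and snapshots/ subdirectories, and the scan appears to be recursing into them. A sketch of resolving a cache entry to its current snapshot instead (function name hypothetical):

```python
from pathlib import Path
from typing import Optional


def current_snapshot(model_dir: Path) -> Optional[Path]:
    # refs/main records the commit hash of the currently checked-out snapshot
    ref = model_dir / "refs" / "main"
    if not ref.exists():
        return None
    commit = ref.read_text().strip()
    snapshot = model_dir / "snapshots" / commit
    return snapshot if snapshot.is_dir() else None
```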

@simonw
Owner

simonw commented Feb 16, 2025

I wonder if there's a good way to detect which of those models are compatible with LLM?

mlx-community/pixtral-12b-8bit and mlx-community/Llama-3.2-11B-Vision-Instruct-4bit are particularly tricky as those are vision models, which likely need https://github.com/Blaizzy/mlx-vlm

@ivanfioravanti
Contributor Author

I don't like hardcoding model_types in code, but I don't see any other possibility. What do you think?

@simonw
Owner

simonw commented Feb 16, 2025

Another option could be to make this command interactive - so it shows you a list of models but asks you to confirm before adding them, maybe even lets you key up and key down to toggle the ones you want.

@ivanfioravanti
Contributor Author

@simonw implemented, similar to huggingface-cli delete-cache. I excluded whisper and some VL models to reduce the list.
Models can be imported, or removed if already imported.

It seems OK now. WDYT?

Comment on lines +94 to +103
if model_type in [
    "whisper",
    "llava",
    "paligemma",
    "qwen2_vl",
    "qwen2_5_vl",
    "florence2",
    "florence",
]:
    continue
Owner

This is very smart.

Contributor Author

Thanks! I had 100+ models, so I had to cut the list down.
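The model_type being checked here presumably comes from each snapshot's config.json, which transformers-style repos include; a sketch of reading it (function name hypothetical):

```python
import json
from pathlib import Path
from typing import Optional


def read_model_type(snapshot_dir: Path) -> Optional[str]:
    # transformers-style repos carry a "model_type" field in config.json
    config_path = snapshot_dir / "config.json"
    if not config_path.exists():
        return None
    with open(config_path) as f:
        return json.load(f).get("model_type")
```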

window_size = os.get_terminal_size().lines - 5

while True:
    print("\033[H\033[J", end="")
Owner

Normally I'd suggest we use a cross-platform library like Rich here, but since this plugin only works on macOS there's no need for us to add another dependency.
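For reference, a minimal sketch of the raw-key handling such a picker needs on macOS, using only the stdlib termios and tty modules (illustrative, not the PR's actual implementation):

```python
import sys
import termios
import tty

# ANSI sequences: home the cursor and clear the screen (as in the PR's
# print call), plus the arrow-key escape sequences the picker recognises
CLEAR_SCREEN = "\033[H\033[J"
ARROW_KEYS = {"\x1b[A": "up", "\x1b[B": "down"}


def read_key() -> str:
    # Switch the terminal to raw mode for a single keypress (Unix/macOS only)
    fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)
        ch = sys.stdin.read(1)
        if ch == "\x1b":  # arrow keys arrive as a three-byte escape sequence
            ch += sys.stdin.read(2)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)
    return ARROW_KEYS.get(ch, ch)
```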

@simonw
Owner

simonw commented Feb 17, 2025

% llm mlx import-models
Available models (↑/↓ to navigate, SPACE to select, ENTER to confirm, Ctrl+C to quit):
  ○ (llama) mlx-community/DeepSeek-R1-Distill-Llama-8B (already imported)
  ○ (llama) mlx-community/Llama-3.2-3B-Instruct-4bit (already imported)
  ○ (llama) mlx-community/Llama-3.3-70B-Instruct-4bit (already imported)
  ○ (llama) mlx-community/SmolLM-135M-Instruct-4bit (already imported)
> ○ (llava-qwen2) mlx-community/nanoLLaVA-1.5-8bit (already imported)
  ○ (mistral) mlx-community/Mistral-7B-Instruct-v0.3-4bit (already imported)
  ○ (mistral) mlx-community/Mistral-Small-24B-Instruct-2501-4bit (already imported)
  ○ (openelm) mlx-community/OpenELM-270M-Instruct (already imported)
  ○ (qwen2) mlx-community/DeepSeek-R1-Distill-Qwen-32B-4bit (already imported)
  ○ (qwen2) mlx-community/Qwen2.5-0.5B-Instruct-4bit (already imported)

Since ALL of those are already-imported the tool should probably quit without asking me to make a decision.

@simonw
Owner

simonw commented Feb 17, 2025

llm -m mlx-community/OpenELM-270M-Instruct hi

Error: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating

I wonder if it's possible to detect that a model doesn't have chat templates setup and avoid suggesting it?
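One plausible check, assuming the template lives in tokenizer_config.json as transformers usually stores it (newer repos may ship the template in a separate file instead, so this is only a sketch):

```python
import json
from pathlib import Path


def has_chat_template(snapshot_dir: Path) -> bool:
    # transformers commonly stores the template in tokenizer_config.json
    config_path = snapshot_dir / "tokenizer_config.json"
    if not config_path.exists():
        return False
    with open(config_path) as f:
        return bool(json.load(f).get("chat_template"))
```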

@simonw
Owner

simonw commented Feb 17, 2025

Tests failed in Python 3.9:

cat llm_mlx.py | llm -m o3-mini 'rewrite this file to not use case/match so it works in Python 3.9'  

Response: https://gist.github.com/simonw/16deb5b5cc66973ff04b2a2f3eaf00d9
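The mechanical fix is replacing each match/case (a 3.10+ syntax feature, so it fails to parse on 3.9) with an equivalent if/elif chain; a hypothetical example of the pattern:

```python
# Python 3.10+ version, which 3.9 cannot parse:
#
#     match key:
#         case "up":
#             delta = -1
#         case "down":
#             delta = 1
#         case _:
#             delta = 0


def key_to_delta(key: str) -> int:
    # Equivalent if/elif chain that parses on Python 3.9
    if key == "up":
        return -1
    elif key == "down":
        return 1
    else:
        return 0
```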

@simonw
Owner

simonw commented Feb 17, 2025

I think we can land this. Things like "don't show options if everything is installed already" are purely nice-to-haves.

@simonw simonw merged commit 862c8fc into simonw:main Feb 17, 2025
5 checks passed
simonw added a commit that referenced this pull request Feb 17, 2025
@ivanfioravanti
Contributor Author

> I think we can land this. Things like "don't show options if everything is installed already" are purely nice-to-haves.

But you can use the tool to remove already-imported models too. Maybe it would have been better to call it manage-models.

@ivanfioravanti
Contributor Author

> llm -m mlx-community/OpenELM-270M-Instruct hi
>
> Error: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating
>
> I wonder if it's possible to detect that a model doesn't have chat templates setup and avoid suggesting it?

I bet it is. I will check this over the weekend.


Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

Local MLX models already downloaded are not visible
