███╗ ███╗██╗ ██╗ ██╗ ██████╗ ██╗ ██╗██╗
████╗ ████║██║ ╚██╗██╔╝ ██╔════╝ ██║ ██║██║
██╔████╔██║██║ ╚███╔╝█████╗██║ ███╗██║ ██║██║
██║╚██╔╝██║██║ ██╔██╗╚════╝██║ ██║██║ ██║██║
██║ ╚═╝ ██║███████╗██╔╝ ██╗ ╚██████╔╝╚██████╔╝██║
╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝
The Swiss Army Knife of Apple Silicon AI - A lightweight Inference Server for Apple's MLX engine with a GUI.
TLDR - OpenRouter-style v1 API interface for MLX with Ollama-like model management, featuring auto-queuing, on-demand model loading, and multi-user serving capabilities via a single Mac app.
MLX-Transcribe is a native macOS transcription app that transforms your voice into text instantly at your cursor position. Built with Apple Silicon optimization and seamless MLX-GUI integration, it delivers privacy-first, lightning-fast transcription.
Key Features:
- ⚡ Instant Transcription - Press `Control + ~` and speak - text appears instantly at your cursor
- 🤖 MLX-GUI Integration - Seamless local AI processing with automatic Parakeet model management
- 🛡️ Privacy First - Your voice never leaves your machine with local processing
- 🎯 Universal Cursor - Works in any app - editors, browsers, chat apps, terminals
- 🚀 Native Menu Bar - Lightweight, always-accessible macOS integration
Perfect companion to MLX-GUI for developers, writers, and anyone who values speed and privacy in voice-to-text workflows.
Added harmony chat template support for models like gpt-oss. Updated to the latest mlx-lm, mlx-vlm, mlx-audio, mlx-whisper, mlx_embeddings, mlx_embedding_models, parakeet-mlx, transformers, and tokenizers.
Basic MLX-GUI features such as automatic model loading, model management, and the API are now starting to appear in llama.cpp and other projects. Remember, we had them first in MLX-GUI!
| Package | Old Version | New Version |
|---|---|---|
| mlx-lm | 0.25.1 | 0.28.4 |
| mlx (core) | 0.28.0 | 0.30.0 |
| mlx-metal | 0.28.0 | 0.30.0 |
| mlx-vlm | 0.1.0 | 0.3.9 |
| mlx-audio | 0.1.0 | 0.2.6 |
| mlx-whisper | 0.4.0 | 0.4.3 |
| mlx_embeddings | 0.0.3 | 0.0.5 |
| mlx_embedding_models | 0.0.3 | 0.0.11 |
| parakeet-mlx | 0.3.5 | 0.4.1 |
| transformers | 4.53.1 | 4.57.3 |
| tokenizers | 0.21.4 | 0.22.1 |
v1.2.4
From Whisper to Embeddings in One API - 23 embedding models, 99 languages, complete audio/vision/text pipeline. Production-ready, not promises.
- 🎙️ Complete Whisper Ecosystem - All variants (Tiny to Large v3) with automatic fallback - never fails!
- 🌍 99+ Languages - Auto-detection with no configuration needed
- ⏱️ Word-Level Timestamps - Perfect for subtitles, content indexing, and meeting analysis
- 📼 Universal Format Support - WAV, MP3, MP4, M4A, FLAC, OGG, WebM - any audio format works
- 🎯 Parakeet TDT - Lightning-fast transcription for real-time applications
- 🎨 Beautiful Audio UI - Drag-and-drop interface with 11 languages and 5 output formats
- 🌟 23+ Models, 13 Families - E5, ModernBERT, Arctic, GTE, BGE, MiniLM, Qwen3, SentenceT5, Jina AI, and more!
- 🔧 Triple Library Support - Seamlessly integrates mlx_embedding_models, mlx_embeddings, AND sentence-transformers
- 🧪 Battle-Tested - 553 lines of embedding tests + 338 lines of audio tests ensure reliability
- 📏 Any Dimension - From efficient 384-dim to powerful 4096-dim embeddings
- 🎯 Smart Architecture Detection - Automatically optimizes extraction for each model type
- 🔢 L2-Normalized Vectors - Production-ready for similarity search and RAG applications
- ✨ 24B Parameter Model - Full support for Mistral-Small-3.2-24B-Instruct
- 🎨 Vision-Text Capability - Advanced multimodal processing via MLX-VLM
- 🧪 Test Suite Integration - Comprehensive testing ensuring reliable performance
- 🔧 Smart Classification - Automatic detection and proper model type assignment
- 🧪 900+ Lines of Tests - Comprehensive test coverage for production reliability
- 🔍 New Discovery Endpoint - `/v1/discover/stt` for easy speech-to-text model discovery
- 🎯 Never-Fail Architecture - Smart Whisper fallback ensures audio transcription always works
- 📊 Enhanced Memory Management - Optimized loading for large embedding and audio models
- 🔄 Intelligent Queue System - Handles diverse result types (lists, arrays, dicts) seamlessly
- ⚡ Performance Optimization - Faster model switching and concurrent processing
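Because the embeddings are described as L2-normalized, similarity search can be reduced to a plain dot product. The following is a minimal sketch of that idea, not code from MLX-GUI; the function names are hypothetical and the vectors are toy values rather than real model output.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity reduces to a dot product."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query, documents):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, doc) for doc in documents]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
```

In a RAG pipeline, `query` and `documents` would be vectors returned by the `/v1/embeddings` endpoint; if they are already normalized, `l2_normalize` is not needed.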
v1.2.3 - Real-Time Model Status & Model Support (July 19 2025)
Key Features:
- 🚀 Real-time status monitoring with download progress
- 🧪 Built-in API test console with response analytics
- 🎨 15+ new verified models including SmolLM3, Kimi-K2, Gemma-3n
- 🧠 Trillion-parameter model support
- 🔧 Enhanced model type classification
v1.2.0-v1.2.2 - Memory Management & Vision Compatibility
Key Features:
- 🧠 Revolutionary auto-unload system with LRU eviction
- 🖼️ Complete CyberAI image compatibility
- 🔄 Three-layer memory protection
- 📸 Enhanced VLM stability for vision models
- 🛠️ Raw base64 image support
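For reference, the vision request shown in the API examples of this README passes the image as a base64 data URL inside an `image_url` content part. A small sketch of building that message in Python follows; the helper names are hypothetical, and only the message shape is taken from the curl example.

```python
import base64


def image_to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a data URL (data:<mime>;base64,<payload>)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def vision_message(prompt, image_bytes, mime_type="image/jpeg"):
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": image_to_data_url(image_bytes, mime_type)},
            },
        ],
    }
```

The resulting dictionary can be placed in the `messages` array of a `/v1/chat/completions` request.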
Download: Latest Release
- ✅ Why MLX? Llama.cpp and Ollama are great, but they are slower than MLX. MLX is a native framework optimized for Apple Silicon. Plus, it's free and open source, and this project gives it a nice GUI.
- ⚡️ I wanted to turn my Mac mini and Mac Studio into more useful multi-user inference servers that I don't have to manage.
- 🏗️ I just want to build AI things without managing inference servers or paying for expensive services, while keeping sovereignty over my data.
- 🧠 MLX Engine Integration - Native Apple Silicon acceleration via MLX
- 🎙️ Advanced Audio Intelligence - Complete Whisper & Parakeet support with multi-format processing
- 🔢 Production Embeddings - Multi-architecture support (BGE, MiniLM, Qwen3, Arctic, E5)
- 🖼️ Vision Models - Image understanding with Gemma-3n, Qwen2-VL, Mistral Small (enhanced stability)
- 🤖 Large Language Models - Full support for instruction-tuned and reasoning models
- 🔄 Intelligent Memory Management - Advanced auto-unload system with LRU eviction
- 🛡️ Three-Layer Memory Protection - Proactive cleanup, concurrent limits, emergency recovery
- ⚡ OpenAI Compatibility - Drop-in replacement for OpenAI API endpoints
- 🌐 REST API Server - Complete API for model management and inference
- 📊 Real-Time Monitoring - System status, memory usage, and model performance
- 🎨 Beautiful Admin Interface - Modern web GUI for model management
- 🔍 HuggingFace Integration - Discover and install MLX-compatible models
- 🍎 macOS System Tray - Native menu bar integration
- 📱 Standalone App - Packaged macOS app bundle (no Python required)
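To illustrate what LRU eviction means for the auto-unload feature, here is a toy sketch. It is not MLX-GUI's implementation: the class name is hypothetical, and it assumes a fixed limit on the number of loaded models, whereas a real server would more likely decide based on memory usage.

```python
from collections import OrderedDict


class LRUModelCache:
    """Keeps at most `max_models` entries, evicting the least recently used."""

    def __init__(self, max_models):
        if max_models < 1:
            raise ValueError("max_models must be at least 1")
        self.max_models = max_models
        self._models = OrderedDict()

    def get(self, name, loader):
        """Return (model, evicted_names), loading the model if it is not cached."""
        evicted = []
        if name in self._models:
            self._models.move_to_end(name)  # mark as most recently used
        else:
            while len(self._models) >= self.max_models:
                old_name, _ = self._models.popitem(last=False)  # drop the oldest
                evicted.append(old_name)
            self._models[name] = loader(name)
        return self._models[name], evicted

    def loaded(self):
        """Names of cached models, from least to most recently used."""
        return list(self._models)
```

Each request touches the model it needs, so models that have not been used recently are the first to be unloaded when room is required.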
Text Generation
- `qwen3-8b-6bit` - Qwen3 8B quantized model
- `deepseek-r1-0528-qwen3-8b-mlx-8bit` - DeepSeek R1 reasoning model
- `smollm3-3b-4bit` / `smollm3-3b-bf16` - SmolLM3 multilingual models
- `gemma-3-27b-it-qat-4bit` - Google Gemma 3 27B instruction-tuned
- `mistral-small-3-2-24b-instruct-2506-mlx-4bit` - Mistral Small 24B with vision
- `devstral-small-2507-mlx-4bit` - Devstral coding model
Vision Models
- `gemma-3n-e4b-it` / `gemma-3n-e4b-it-mlx-8bit` - Gemma 3n vision models
- `mistral-small-3-2-24b-instruct-2506-mlx-4bit` - Multimodal capabilities
Audio Transcription
- `whisper-large-v3-turbo` - OpenAI Whisper Turbo for fast transcription
- `parakeet-tdt-0-6b-v2` - Ultra-fast Parakeet speech-to-text
Text Embeddings
- `qwen3-embedding-4b-4bit-dwq` - Qwen3 embeddings (2560 dimensions)
- `bge-small-en-v1-5-bf16` - BGE embeddings (384 dimensions)
- `all-minilm-l6-v2-4bit` / `all-minilm-l6-v2-bf16` - MiniLM embeddings
- `snowflake-arctic-embed-l-v2-0-4bit` - Arctic embeddings (1024 dimensions)
- macOS (Apple Silicon M1/M2/M3/M4 required)
- Python 3.11+ (for development)
- 8GB+ RAM (16GB+ recommended for larger models)
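If you are installing from source, a quick pre-flight check of the first two requirements might look like the sketch below. The function names are hypothetical and this is not part of MLX-GUI. Note that a Python interpreter running under Rosetta typically reports `x86_64`, so it would fail the check even on Apple Silicon hardware.

```python
import platform
import sys


def is_apple_silicon(system=None, machine=None):
    """True when running natively on macOS with an arm64 CPU."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


def python_is_supported(version_info=None, minimum=(3, 11)):
    """True when the interpreter is at least the minimum (major, minor) version."""
    version_info = sys.version_info if version_info is None else version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Apple Silicon:", is_apple_silicon())
    print("Python 3.11+:", python_is_supported())
```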
- Download the latest `.app` from Releases
- Drag to `/Applications`
- Launch - no Python installation required!
- From the menu bar, click the MLX icon to open the admin interface.
- Discover and install models from HuggingFace.
- Connect your AI app to the API endpoint.
📝 Models may take a few minutes to load. They are gigabytes in size and will download at your internet speed.
# Install MLX-GUI
pip install mlx-gui
# Launch with system tray
mlx-gui tray
# Or launch server only
mlx-gui start --port 8000

# Clone the repository
git clone https://github.com/RamboRogers/mlx-gui.git
cd mlx-gui
# Install dependencies (10-100x faster than pip)
uv sync --extra app
# Launch with system tray
uv run mlx-gui tray
# Or launch server only
uv run mlx-gui start --port 8000

💡 Why uv? uv is 10-100x faster than pip and provides better dependency resolution.
An API Endpoint for Jan or any other AI app
Simply configure the API endpoint in the app settings to point to your MLX-GUI server. This works with any AI app that supports the OpenAI API. Enter anything for the API key.
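For apps or scripts you write yourself, the same configuration comes down to a base URL and a placeholder key. The sketch below uses only the Python standard library. The helper names are hypothetical, the default port 8000 comes from the examples in this README, and sending the key as a `Bearer` header follows the usual OpenAI client convention, which is an assumption about what the server accepts.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, max_tokens=100, api_key="anything"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # any value, per the note above
        },
        method="POST",
    )


def send(request, timeout=120):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("http://localhost:8000", "qwen3-8b-6bit", "Hello!"))` would return the parsed response.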
Launch the app and look for MLX in your menu bar:
- Open Admin Interface - Web GUI for model management
- System Status - Real-time monitoring
- Unload All Models - Free up memory
- Network Settings - Configure binding options
Navigate to http://localhost:8000/admin for:
- 🔍 Discover Tab - Browse and install MLX models from HuggingFace
- 🧠 Models Tab - Manage installed models (load/unload/remove)
- 📊 Monitor Tab - System statistics and performance
- ⚙️ Settings Tab - Configure server and model options
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-8b-6bit",
"messages": [{"role": "user", "content": "Hello!"}],
"max_tokens": 100
}'

curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2-vl-2b-instruct",
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": "What do you see in this image?"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
]
}],
"max_tokens": 200
}'

curl -X POST http://localhost:8000/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F "file=@audio.wav" \
-F "model=parakeet-tdt-0-6b-v2"

curl -X POST http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": ["Hello world", "How are you?"],
"model": "qwen3-embedding-0-6b-4bit"
}'

# Install text model
curl -X POST http://localhost:8000/v1/models/install \
-H "Content-Type: application/json" \
-d '{
"model_id": "mlx-community/Qwen2.5-7B-Instruct-4bit",
"name": "qwen-7b-4bit"
}'
# Install audio model
curl -X POST http://localhost:8000/v1/models/install \
-H "Content-Type: application/json" \
-d '{
"model_id": "mlx-community/parakeet-tdt-0.6b-v2",
"name": "parakeet-tdt-0-6b-v2"
}'
# Install vision model
curl -X POST http://localhost:8000/v1/models/install \
-H "Content-Type: application/json" \
-d '{
"model_id": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
"name": "qwen2-vl-2b-instruct"
}'
# Install embedding model
curl -X POST http://localhost:8000/v1/models/install \
-H "Content-Type: application/json" \
-d '{
"model_id": "mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ",
"name": "qwen3-embedding-0-6b-4bit"
}'

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   System Tray   │    │  Web Admin GUI   │    │    REST API     │
│     (macOS)     │◄──►│ (localhost:8000) │◄──►│     (/v1/*)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
                       ┌─────────────────┐              │
                       │  Model Manager  │◄─────────────┘
                       │ (Queue/Memory)  │
                       └─────────────────┘
                                │
                       ┌─────────────────┐
                       │   MLX Engine    │
                       │ (Apple Silicon) │
                       └─────────────────┘
Full API documentation is available at /v1/docs when the server is running, or see API.md for complete endpoint reference.
- `POST /v1/chat/completions` - OpenAI-compatible chat (text + images + Mistral Small)
- `POST /v1/embeddings` - NEW: Multi-architecture embeddings (BGE, MiniLM, Qwen3, Arctic)
- `POST /v1/audio/transcriptions` - NEW: Enhanced audio transcription (Whisper Turbo, Parakeet)

- `GET /v1/models` - List installed models
- `POST /v1/models/install` - Install from HuggingFace
- `POST /v1/models/{name}/load` - Load model into memory
- `GET /v1/discover/models` - Search HuggingFace for MLX models
- `GET /v1/discover/embeddings` - NEW: Search for embedding models
- `GET /v1/discover/stt` - NEW: Search for audio transcription models

- `GET /v1/system/status` - System and memory status
- `GET /v1/manager/status` - Detailed model manager status
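As a rough illustration of how the model management endpoints fit together, the sketch below prepares an install request followed by a load request. The `model_id` and `name` fields come from the install examples in this README. The helper names are hypothetical, and sending the load request without a body is an assumption, since its request format is not shown here (see `/v1/docs` for the authoritative reference).

```python
import json
import urllib.parse
import urllib.request


def install_request(base_url, model_id, name):
    """Prepare POST /v1/models/install for a HuggingFace model ID."""
    body = json.dumps({"model_id": model_id, "name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models/install",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_request(base_url, name):
    """Prepare POST /v1/models/{name}/load (assumed to need no request body)."""
    quoted = urllib.parse.quote(name, safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/models/{quoted}/load",
        method="POST",
    )
```

Each prepared request can be sent with `urllib.request.urlopen(request)` once the server is running.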
git clone https://github.com/RamboRogers/mlx-gui.git
cd mlx-gui
# Install in development mode with audio and vision support
uv sync --extra dev --extra audio --extra vision

# Start development server in one terminal
uv run mlx-gui start --log-level debug --reload

In another terminal ...
# Run all tests
uv run pytest

OR
# Run quick smoke tests
uv run pytest tests/test_audio.py::test_audio_transcription -q
uv run pytest tests/test_embeddings_endpoint.py -q

Notes:
- The embeddings test includes a base64 variant that is currently skipped unless the server is configured to return base64-encoded vectors. Default output is floats.
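A client that wants to handle both output forms could decode them as sketched below. This assumes the base64 variant follows the common OpenAI convention of packed little-endian float32 values, which is not confirmed by this README; the function name is hypothetical.

```python
import base64
import struct


def decode_embedding(value):
    """Return a list of floats from either a float list or a base64 string."""
    if isinstance(value, str):
        raw = base64.b64decode(value)
        if len(raw) % 4:
            raise ValueError("base64 payload is not a whole number of float32 values")
        # Assumed layout: consecutive little-endian 32-bit floats.
        return list(struct.unpack(f"<{len(raw) // 4}f", raw))
    return [float(x) for x in value]
```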
# Build macOS app bundle (script runs `uv sync` automatically)
uv run ./scripts/build_app.sh
# Result: dist/MLX-GUI.app

Notes:
- The build script performs `uv sync --frozen --extra app --extra audio --extra vision` by default for reproducibility.
- To skip syncing (e.g., if you already ran `uv sync`): set `SKIP_UV_SYNC=1` before the command.
- The build prefers the Homebrew arm64 binaries at `/opt/homebrew/bin/{ffmpeg,ffprobe}` and bundles matching `libav*`, `libsw*`, and `libpostproc*` dylibs inside the app for runtime.
- For development/tests, ensure the Homebrew tools are used:
  - Set environment variables:
    - `FFMPEG_BINARY=/opt/homebrew/bin/ffmpeg`
    - `FFMPEG_PROBE=/opt/homebrew/bin/ffprobe`
    - `PATH=/opt/homebrew/bin:$PATH`
- We intentionally avoid bundling PyAV's vendored `__dot__dylibs` to prevent symbol conflicts.
- The `ffmpeg_binaries/` directory under the repo root is a build artifact staging area used by the app bundle process and should not be committed to source control.
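If you launch tests from a Python script rather than a shell, the same environment can be assembled programmatically. This is a sketch only: the helper name is hypothetical, while the variable names and the `/opt/homebrew/bin` prefix are the ones listed above.

```python
import os


def homebrew_ffmpeg_env(base_env=None, prefix="/opt/homebrew/bin"):
    """Return a copy of the environment that points FFmpeg lookups at Homebrew."""
    env = dict(os.environ if base_env is None else base_env)
    env["FFMPEG_BINARY"] = f"{prefix}/ffmpeg"
    env["FFMPEG_PROBE"] = f"{prefix}/ffprobe"
    # Put the Homebrew directory first on PATH without duplicating it.
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != prefix]
    env["PATH"] = os.pathsep.join([prefix, *entries])
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, for example when invoking `uv run pytest`.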
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MLX-GUI is licensed under the GNU General Public License v3.0 (GPLv3).
Free Software
- Apple MLX Team - For the incredible MLX framework
- MLX-LM - MLX language model implementations
- HuggingFace - For the model hub and transformers library









