GitHub - RamboRogers/mlx-gui: MLX-GUI, an MLX Inference Server for Apple Silicon

███╗   ███╗██╗     ██╗  ██╗      ██████╗ ██╗   ██╗██╗
████╗ ████║██║     ╚██╗██╔╝     ██╔════╝ ██║   ██║██║
██╔████╔██║██║      ╚███╔╝█████╗██║  ███╗██║   ██║██║
██║╚██╔╝██║██║      ██╔██╗╚════╝██║   ██║██║   ██║██║
██║ ╚═╝ ██║███████╗██╔╝ ██╗     ╚██████╔╝╚██████╔╝██║
╚═╝     ╚═╝╚══════╝╚═╝  ╚═╝      ╚═════╝  ╚═════╝ ╚═╝

The Swiss Army Knife of Apple Silicon AI - A lightweight Inference Server for Apple's MLX engine with a GUI.

TL;DR - An OpenRouter-style v1 API for MLX with Ollama-like model management, featuring auto-queuing, on-demand model loading, and multi-user serving, all from a single Mac app.

🚀 Apps using MLX-GUI

🎤 MLX-Transcribe - Lightning-Fast Voice Transcription

MLX-Transcribe is a native macOS transcription app that transforms your voice into text instantly at your cursor position. Built with Apple Silicon optimization and seamless MLX-GUI integration, it delivers privacy-first, lightning-fast transcription.

Key Features:

⚡ Instant Transcription - Press Control + ~ and speak - text appears instantly at your cursor
🤖 MLX-GUI Integration - Seamless local AI processing with automatic Parakeet model management
🛡️ Privacy First - Your voice never leaves your machine with local processing
🎯 Universal Cursor - Works in any app - editors, browsers, chat apps, terminals
🚀 Native Menu Bar - Lightweight, always-accessible macOS integration

Perfect companion to MLX-GUI for developers, writers, and anyone who values speed and privacy in voice-to-text workflows.

📦 Latest Release

🎉 v1.3.0 - Optimizations, Core Updates, and Bug Fixes (December 14 2025)

Added Harmony chat template support for models like gpt-oss. Updated mlx-lm, mlx-vlm, mlx-audio, mlx-whisper, mlx_embeddings, mlx_embedding_models, parakeet-mlx, transformers, and tokenizers to their latest versions.

Basic features of MLX-GUI, such as automatic model loading, model management, and the API, are starting to appear in projects like llama.cpp. Remember, MLX-GUI had them first!

Package	Old Version	New Version
mlx-lm	0.25.1	0.28.4
mlx (core)	0.28.0	0.30.0
mlx-metal	0.28.0	0.30.0
mlx-vlm	0.1.0	0.3.9
mlx-audio	0.1.0	0.2.6
mlx-whisper	0.4.0	0.4.3
mlx_embeddings	0.0.3	0.0.5
mlx_embedding_models	0.0.3	0.0.11
parakeet-mlx	0.3.5	0.4.1
transformers	4.53.1	4.57.3
tokenizers	0.21.4	0.22.1

📚 Previous Releases


🎉 v1.2.4 - Universal AI Ecosystem (July 22 2025)

From Whisper to Embeddings in One API - 23 embedding models, 99 languages, complete audio/vision/text pipeline. Production-ready, not promises.

🚀 NEW: Advanced Audio Intelligence

🎙️ Complete Whisper Ecosystem - All variants (Tiny to Large v3) with automatic fallback - never fails!
🌍 99+ Languages - Auto-detection with no configuration needed
⏱️ Word-Level Timestamps - Perfect for subtitles, content indexing, and meeting analysis
📼 Universal Format Support - WAV, MP3, MP4, M4A, FLAC, OGG, WebM - any audio format works
🎯 Parakeet TDT - Lightning-fast transcription for real-time applications
🎨 Beautiful Audio UI - Drag-and-drop interface with 11 languages and 5 output formats

🧠 NEW: Complete Embedding Ecosystem

🌟 23+ Models, 13 Families - E5, ModernBERT, Arctic, GTE, BGE, MiniLM, Qwen3, SentenceT5, Jina AI, and more!
🔧 Triple Library Support - Seamlessly integrates mlx_embedding_models, mlx_embeddings, AND sentence-transformers
🧪 Battle-Tested - 553 lines of embedding tests + 338 lines of audio tests ensure reliability
📏 Any Dimension - From efficient 384-dim to powerful 4096-dim embeddings
🎯 Smart Architecture Detection - Automatically optimizes extraction for each model type
🔢 L2-Normalized Vectors - Production-ready for similarity search and RAG applications

🤖 NEW: Mistral Small Integration

✨ 24B Parameter Model - Full support for Mistral-Small-3.2-24B-Instruct
🎨 Vision-Text Capability - Advanced multimodal processing via MLX-VLM
🧪 Test Suite Integration - Comprehensive testing ensuring reliable performance
🔧 Smart Classification - Automatic detection and proper model type assignment

🛠️ Technical Excellence

🧪 900+ Lines of Tests - Comprehensive test coverage for production reliability
🔍 New Discovery Endpoint - /v1/discover/stt for easy speech-to-text model discovery
🎯 Never-Fail Architecture - Smart Whisper fallback ensures audio transcription always works
📊 Enhanced Memory Management - Optimized loading for large embedding and audio models
🔄 Intelligent Queue System - Handles diverse result types (lists, arrays, dicts) seamlessly
⚡ Performance Optimization - Faster model switching and concurrent processing

v1.2.3 - Real-Time Model Status & Model Support (July 19 2025)

Key Features:

🚀 Real-time status monitoring with download progress
🧪 Built-in API test console with response analytics
🎨 15+ new verified models including SmolLM3, Kimi-K2, Gemma-3n
🧠 Trillion-parameter model support
🔧 Enhanced model type classification

v1.2.0-v1.2.2 - Memory Management & Vision Compatibility

Key Features:

🧠 Revolutionary auto-unload system with LRU eviction
🖼️ Complete CyberAI image compatibility
🔄 Three-layer memory protection
📸 Enhanced VLM stability for vision models
🛠️ Raw base64 image support

Download: Latest Release

Why?

✅ Why MLX? llama.cpp and Ollama are great, but on Apple Silicon they are slower than MLX, a framework built natively for Apple Silicon. Plus, it's free and open source, and this project adds a nice GUI.
⚡️ I wanted to turn my Mac mini and Mac Studio into useful multi-user inference servers without having to manage them.
🏗️ I just want to build AI things without managing inference servers or paying for expensive services, while keeping sovereignty over my data.


🚀 Features

🎯 Universal AI Capabilities

🧠 MLX Engine Integration - Native Apple Silicon acceleration via MLX
🎙️ Advanced Audio Intelligence - Complete Whisper & Parakeet support with multi-format processing
🔢 Production Embeddings - Multi-architecture support (BGE, MiniLM, Qwen3, Arctic, E5)
🖼️ Vision Models - Image understanding with Gemma-3n, Qwen2-VL, Mistral Small (enhanced stability)
🤖 Large Language Models - Full support for instruction-tuned and reasoning models

🛠️ Enterprise-Grade Infrastructure

🔄 Intelligent Memory Management - Advanced auto-unload system with LRU eviction
🛡️ Three-Layer Memory Protection - Proactive cleanup, concurrent limits, emergency recovery
⚡ OpenAI Compatibility - Drop-in replacement for OpenAI API endpoints
🌐 REST API Server - Complete API for model management and inference
📊 Real-Time Monitoring - System status, memory usage, and model performance

🎨 User Experience

🎨 Beautiful Admin Interface - Modern web GUI for model management
🔍 HuggingFace Integration - Discover and install MLX-compatible models
🍎 macOS System Tray - Native menu bar integration
📱 Standalone App - Packaged macOS app bundle (no Python required)

🤖 Tested Models

Text Generation

qwen3-8b-6bit - Qwen3 8B quantized model
deepseek-r1-0528-qwen3-8b-mlx-8bit - DeepSeek R1 reasoning model
smollm3-3b-4bit / smollm3-3b-bf16 - SmolLM3 multilingual models
gemma-3-27b-it-qat-4bit - Google Gemma 3 27B instruction-tuned
mistral-small-3-2-24b-instruct-2506-mlx-4bit - Mistral Small 24B with vision
devstral-small-2507-mlx-4bit - Devstral coding model

Vision Models

gemma-3n-e4b-it / gemma-3n-e4b-it-mlx-8bit - Gemma 3n vision models
mistral-small-3-2-24b-instruct-2506-mlx-4bit - Multimodal capabilities

Audio Transcription

whisper-large-v3-turbo - OpenAI Whisper Turbo for fast transcription
parakeet-tdt-0-6b-v2 - Ultra-fast Parakeet speech-to-text

Text Embeddings

qwen3-embedding-4b-4bit-dwq - Qwen3 embeddings (2560 dimensions)
bge-small-en-v1-5-bf16 - BGE embeddings (384 dimensions)
all-minilm-l6-v2-4bit / all-minilm-l6-v2-bf16 - MiniLM embeddings
snowflake-arctic-embed-l-v2-0-4bit - Arctic embeddings (1024 dimensions)

📋 Requirements

macOS (Apple Silicon M1/M2/M3/M4 required)
Python 3.11+ (for development)
8GB+ RAM (16GB+ recommended for larger models)

🏃‍♂️ Quick Start

Option 1: Download Standalone App (Recommended for Mac Users)

1. Download the latest .app from Releases.
2. Drag it to /Applications.
3. Launch it - no Python installation required!
4. From the menu bar, click the MLX icon to open the admin interface.
5. Discover and install models from HuggingFace.
6. Connect your AI app to the API endpoint.

📝 The first load of a model can take a few minutes: models are gigabytes in size and download at the speed of your internet connection.

Option 2: Install from PyPI

# Install MLX-GUI
pip install mlx-gui

# Launch with system tray
mlx-gui tray

# Or launch server only
mlx-gui start --port 8000

Option 3: Install from Source (with uv - Recommended)

# Clone the repository
git clone https://github.com/RamboRogers/mlx-gui.git
cd mlx-gui

# Install dependencies (10-100x faster than pip)
uv sync --extra app

# Launch with system tray
uv run mlx-gui tray

# Or launch server only
uv run mlx-gui start --port 8000

💡 Why uv? uv is 10-100x faster than pip and provides better dependency resolution.

🎮 Usage

An API Endpoint for Jan or any other AI app

Simply point the API endpoint (base URL) in your app's settings at your MLX-GUI server, for example http://localhost:8000/v1. This works with any AI app that supports the OpenAI API. Any value works for the API key.

System Tray (Recommended)

Launch the app and look for MLX in your menu bar:

Open Admin Interface - Web GUI for model management
System Status - Real-time monitoring
Unload All Models - Free up memory
Network Settings - Configure binding options

Web Admin Interface

Navigate to http://localhost:8000/admin for:

🔍 Discover Tab - Browse and install MLX models from HuggingFace
🧠 Models Tab - Manage installed models (load/unload/remove)
📊 Monitor Tab - System statistics and performance
⚙️ Settings Tab - Configure server and model options

API Usage

OpenAI-Compatible Chat

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-8b-6bit",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
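For scripts, the same chat request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server is running on localhost:8000 and returns an OpenAI-style response (choices[0].message.content); the helper names are hypothetical, not part of MLX-GUI.

```python
import json
import urllib.request


def build_chat_request(model, messages, max_tokens=100,
                       base_url="http://localhost:8000/v1"):
    """Build a POST request equivalent to the curl example above."""
    payload = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # OpenAI clients always send a key; per the Usage section any value works.
            "Authorization": "Bearer anything",
        },
        method="POST",
    )


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style chat completion (assumed shape)."""
    return response_json["choices"][0]["message"]["content"]


def chat(model, messages, **kwargs):
    """Send the request and return the first reply (needs a running server)."""
    req = build_chat_request(model, messages, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return extract_reply(json.load(resp))
```

With the server running: print(chat("qwen3-8b-6bit", [{"role": "user", "content": "Hello!"}])).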

Vision Models with Images

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2-vl-2b-instruct",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What do you see in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQ..."}}
      ]
    }],
    "max_tokens": 200
  }'
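To send a local image instead of the truncated base64 string above, encode the file's bytes as a data URL. A minimal sketch with hypothetical helper names; it guesses the MIME type from the file extension and falls back to image/jpeg. The resulting message goes into the "messages" list of the chat request.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL like the one in the curl example."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def vision_message(prompt, image_bytes, mime_type="image/jpeg"):
    """Build one user message that pairs a text prompt with an inline image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": image_data_url(image_bytes, mime_type)}},
        ],
    }


def vision_message_from_file(prompt, path):
    """Read an image from disk and guess its MIME type from the extension."""
    mime_type, _ = mimetypes.guess_type(str(path))
    return vision_message(prompt, Path(path).read_bytes(), mime_type or "image/jpeg")
```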

Audio Transcription

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  -F "model=parakeet-tdt-0-6b-v2"
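Without curl, the same upload needs a multipart/form-data body. A standard-library sketch, assuming the server is on localhost:8000; the field names file and model come from the curl example, while the helper name is hypothetical. The boundary has to appear in the Content-Type header, which curl's -F normally adds for you.

```python
import uuid
import urllib.request


def build_transcription_request(audio_bytes, filename, model,
                                base_url="http://localhost:8000/v1"):
    """Encode a multipart/form-data body equivalent to curl's -F options."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Part 1: file=@audio.wav (filename is not escaped; keep it simple)
        f"--{boundary}\r\n".encode("utf-8"),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode("utf-8"),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        b"\r\n",
        # Part 2: model=parakeet-tdt-0-6b-v2
        f"--{boundary}\r\n".encode("utf-8"),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode("utf-8"),
        b"\r\n",
        # Closing delimiter
        f"--{boundary}--\r\n".encode("utf-8"),
    ])
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Usage: pass Path("audio.wav").read_bytes(), "audio.wav" and "parakeet-tdt-0-6b-v2", then send the request with urllib.request.urlopen. The exact response fields are not documented in this README.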

Text Embeddings

curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world", "How are you?"],
    "model": "qwen3-embedding-0-6b-4bit"
  }'
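The release notes describe the returned vectors as L2-normalized, so cosine similarity reduces to a plain dot product. The sketch below computes full cosine similarity anyway, which also works for vectors that are not normalized. It assumes the response follows the OpenAI embeddings shape (a data list whose items carry index and embedding); the helper names are hypothetical.

```python
import math


def embeddings_from_response(response_json):
    """Return vectors ordered by their "index" field (OpenAI-style response assumed)."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity; for L2-normalized vectors this equals the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```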

Install Models

# Install text model
curl -X POST http://localhost:8000/v1/models/install \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "mlx-community/Qwen2.5-7B-Instruct-4bit",
    "name": "qwen-7b-4bit"
  }'

# Install audio model
curl -X POST http://localhost:8000/v1/models/install \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "mlx-community/parakeet-tdt-0.6b-v2",
    "name": "parakeet-tdt-0-6b-v2"
  }'

# Install vision model
curl -X POST http://localhost:8000/v1/models/install \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "mlx-community/Qwen2-VL-2B-Instruct-4bit",
    "name": "qwen2-vl-2b-instruct"
  }'

# Install embedding model
curl -X POST http://localhost:8000/v1/models/install \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ",
    "name": "qwen3-embedding-0-6b-4bit"
  }'
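The four install calls above can also be scripted. A minimal sketch with a hypothetical helper name; the model list is copied from the examples. Whether the install endpoint waits for the download to finish is not documented here, so no timeout is set.

```python
import json
import urllib.request

# HuggingFace repo id -> local name, copied from the install examples above.
MODELS_TO_INSTALL = {
    "mlx-community/Qwen2.5-7B-Instruct-4bit": "qwen-7b-4bit",
    "mlx-community/parakeet-tdt-0.6b-v2": "parakeet-tdt-0-6b-v2",
    "mlx-community/Qwen2-VL-2B-Instruct-4bit": "qwen2-vl-2b-instruct",
    "mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ": "qwen3-embedding-0-6b-4bit",
}


def build_install_request(model_id, name, base_url="http://localhost:8000/v1"):
    """Build the POST /v1/models/install request for one model."""
    payload = {"model_id": model_id, "name": name}
    return urllib.request.Request(
        f"{base_url}/models/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def install_all(models=MODELS_TO_INSTALL):
    """Send one install request per model (requires a running server)."""
    for model_id, name in models.items():
        with urllib.request.urlopen(build_install_request(model_id, name)) as resp:
            print(name, resp.status)
```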

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  System Tray    │    │   Web Admin GUI  │    │   REST API      │
│  (macOS)        │◄──►│  (localhost:8000)│◄──►│  (/v1/*)        │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                         │
                       ┌─────────────────┐              │
                       │ Model Manager   │◄─────────────┘
                       │ (Queue/Memory)  │
                       └─────────────────┘
                                │
                       ┌─────────────────┐
                       │  MLX Engine     │
                       │ (Apple Silicon) │
                       └─────────────────┘
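The Model Manager's auto-unload with LRU eviction can be pictured roughly like the sketch below: when a new model would not fit in the memory budget, the least recently used models are unloaded first. This is a hypothetical illustration of the policy, not MLX-GUI's implementation; the class name, the budget in GB and the model sizes are assumptions.

```python
from collections import OrderedDict


class LRUModelCache:
    """Hypothetical LRU auto-unload policy with a memory budget in GB."""

    def __init__(self, budget_gb):
        self.budget_gb = budget_gb
        self._loaded = OrderedDict()  # name -> size in GB, least recently used first

    def used_gb(self):
        return sum(self._loaded.values())

    def touch(self, name):
        """Mark a loaded model as most recently used (e.g. after serving a request)."""
        self._loaded.move_to_end(name)

    def load(self, name, size_gb):
        """Load a model, unloading least recently used ones until it fits.

        Returns the names of the models that were evicted.
        """
        if size_gb > self.budget_gb:
            raise MemoryError(f"{name} ({size_gb} GB) exceeds the {self.budget_gb} GB budget")
        if name in self._loaded:
            self.touch(name)
            return []
        evicted = []
        while self.used_gb() + size_gb > self.budget_gb:
            victim, _ = self._loaded.popitem(last=False)  # oldest entry = LRU
            evicted.append(victim)
        self._loaded[name] = size_gb
        return evicted

    def loaded(self):
        return list(self._loaded)
```

For example, with a 16 GB budget, loading a 7 GB and a 2 GB model, touching the first, then loading an 8 GB model evicts only the 2 GB one.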

📚 API Documentation

Full API documentation is available at /v1/docs when the server is running; see API.md for the complete endpoint reference.

Key Endpoints

🎯 Core AI Services

POST /v1/chat/completions - OpenAI-compatible chat (text + images + Mistral Small)
POST /v1/embeddings - NEW: Multi-architecture embeddings (BGE, MiniLM, Qwen3, Arctic)
POST /v1/audio/transcriptions - NEW: Enhanced audio transcription (Whisper Turbo, Parakeet)

🛠️ Model Management

GET /v1/models - List installed models
POST /v1/models/install - Install from HuggingFace
POST /v1/models/{name}/load - Load model into memory
GET /v1/discover/models - Search HuggingFace for MLX models
GET /v1/discover/embeddings - NEW: Search for embedding models
GET /v1/discover/stt - NEW: Search for audio transcription models

📊 System Operations

GET /v1/system/status - System and memory status
GET /v1/manager/status - Detailed model manager status
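A small lookup table keeps these endpoints in one place for scripts. This is a sketch with a hypothetical helper name; query parameters for the discover endpoints and request bodies are not listed here (see /v1/docs or API.md).

```python
from urllib.parse import quote

SERVER_URL = "http://localhost:8000"

# (method, path template) for the key endpoints listed above.
ENDPOINTS = {
    "list_models": ("GET", "/v1/models"),
    "install_model": ("POST", "/v1/models/install"),
    "load_model": ("POST", "/v1/models/{name}/load"),
    "discover_models": ("GET", "/v1/discover/models"),
    "discover_embeddings": ("GET", "/v1/discover/embeddings"),
    "discover_stt": ("GET", "/v1/discover/stt"),
    "system_status": ("GET", "/v1/system/status"),
    "manager_status": ("GET", "/v1/manager/status"),
}


def endpoint_url(key, base_url=SERVER_URL, **path_params):
    """Return (method, url), percent-encoding any {placeholders} in the path."""
    method, template = ENDPOINTS[key]
    encoded = {k: quote(str(v), safe="") for k, v in path_params.items()}
    return method, base_url + template.format(**encoded)
```

The pair can be passed straight to urllib.request.Request(url, method=method) and sent with urlopen.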

🛠️ Development

Setup Development Environment

git clone https://github.com/RamboRogers/mlx-gui.git
cd mlx-gui

# Install in development mode with audio and vision support
uv sync --extra dev --extra audio --extra vision

# Start development server in one terminal
uv run mlx-gui start --log-level debug --reload

In another terminal:

# Run all tests
uv run pytest

OR

# Run quick smoke tests
uv run pytest tests/test_audio.py::test_audio_transcription -q
uv run pytest tests/test_embeddings_endpoint.py -q

Notes:

The embeddings test includes a base64 variant that is currently skipped unless the server is configured to return base64-encoded vectors. Default output is floats.
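If the server is configured to return base64-encoded vectors, clients need to decode them. A sketch, assuming the OpenAI convention for encoding_format="base64" (packed little-endian float32 values); MLX-GUI's exact encoding is not stated here, and the helper names are hypothetical.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string of little-endian float32 values into a list of floats."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes (float32)")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def encode_base64_embedding(vector):
    """Inverse of decode_base64_embedding, handy for round-trip tests."""
    return base64.b64encode(struct.pack(f"<{len(vector)}f", *vector)).decode("ascii")
```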

Build Standalone App

# Build macOS app bundle (script runs `uv sync` automatically)
uv run ./scripts/build_app.sh

# Result: dist/MLX-GUI.app

Notes:

The build script performs uv sync --frozen --extra app --extra audio --extra vision by default for reproducibility.
To skip syncing (for example, if you already ran uv sync), set SKIP_UV_SYNC=1 before the command: SKIP_UV_SYNC=1 uv run ./scripts/build_app.sh

FFmpeg on Apple Silicon (arm64)

The build prefers the Homebrew arm64 binaries at /opt/homebrew/bin/{ffmpeg,ffprobe} and bundles matching libav*, libsw*, and libpostproc* dylibs inside the app for runtime.
For development/tests, ensure the Homebrew tools are used:
- Set environment variables:
  - FFMPEG_BINARY=/opt/homebrew/bin/ffmpeg
  - FFMPEG_PROBE=/opt/homebrew/bin/ffprobe
  - PATH=/opt/homebrew/bin:$PATH
We intentionally avoid bundling PyAV's vendored .dylibs to prevent symbol conflicts.
The ffmpeg_binaries/ directory under the repo root is a build artifact staging area used by the app bundle process and should not be committed to source control.
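To check which ffmpeg a development shell will pick up, a helper like the one below can be used. It is a hypothetical check that mirrors the settings above (FFMPEG_BINARY first, then the Homebrew arm64 path, then PATH), not the logic the build script uses.

```python
import os
import shutil

HOMEBREW_FFMPEG = "/opt/homebrew/bin/ffmpeg"


def resolve_ffmpeg(env=None):
    """Pick the ffmpeg binary to use, preferring the Homebrew arm64 build.

    Order: FFMPEG_BINARY if set, then the Homebrew path if it exists,
    then the first ffmpeg on PATH (None if nothing is found).
    """
    env = os.environ if env is None else env
    explicit = env.get("FFMPEG_BINARY")
    if explicit:
        return explicit
    if os.path.exists(HOMEBREW_FFMPEG):
        return HOMEBREW_FFMPEG
    return shutil.which("ffmpeg", path=env.get("PATH"))
```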

🤝 Contributing

1. Fork the repository.
2. Create a feature branch (git checkout -b feature/amazing-feature).
3. Commit your changes (git commit -m 'Add amazing feature').
4. Push to the branch (git push origin feature/amazing-feature).
5. Open a Pull Request.

⚖️ License

MLX-GUI is licensed under the GNU General Public License v3.0 (GPLv3).