Releases: SharpAI/SwiftLM
SwiftLM b197
SwiftLM b197-7f62ac9
fix(deps): use remote URL dependencies for mlx-swift and mlx-swift-lm
Changelog
- fix(deps): use remote URL dependencies for mlx-swift and mlx-swift-lm (7f62ac9)
Download
Quick Start
Please refer to the Getting Started section in the README for full installation and usage instructions.
Note:
mlx.metallib is bundled in this archive. Keep it in the same directory as the SwiftLM binary — Metal GPU compute will fail if it is missing.
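The note above can be made concrete with a short launch sketch. The archive name below is an assumption based on the naming pattern of earlier releases on this page; the snippet simulates an extracted directory and shows the adjacency check you should perform before launching.

```shell
# Sketch of the post-extract layout check. The real extraction step would be:
#   tar -xzf SwiftLM-b197-macos-arm64.tar.gz   # archive name assumed from earlier releases
# Here we simulate an extracted directory so the check itself is runnable anywhere:
dir=$(mktemp -d)
touch "$dir/SwiftLM" "$dir/mlx.metallib"
chmod +x "$dir/SwiftLM"

# Refuse to launch unless mlx.metallib sits next to the SwiftLM binary.
if [ -f "$dir/mlx.metallib" ]; then
  echo "metallib OK — safe to run ./SwiftLM"
else
  echo "mlx.metallib missing — Metal GPU compute would fail" >&2
fi
```

In a real install, run the binary from the extracted directory so the relative lookup of mlx.metallib succeeds.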
SwiftLM b196
SwiftLM b196-da535ea
chore: isolate HomeSec benchmark output to local tmp directory
Changelog
- chore: isolate HomeSec benchmark output to local tmp directory (da535ea)
- fix(benchmark): remove trailing /v1 from gateway URL for HomeSec option (c508b04)
- chore: sync and lock mlx-swift-lm to latest main (73429e1)
- feat: integrate HomeSec Benchmark as option 3 (LLM only) (5da1648)
- feat: add Delete ALL Models option mapping to huggingface hub cache (c75b50a)
- feat: add 8-bit model variants and model maintenance option to benchmark menu (36e7d49)
- chore: bump mlx-swift-lm for Gemma 4 mixed-precision shape fix (bbb137a)
- chore(release): finalize 0.2.9 release candidate with Gemma 4 8-bit verification and updated benchmark scripts (4a73a5f)
Download
Quick Start
Please refer to the Getting Started section in the README for full installation and usage instructions.
Note:
mlx.metallib is bundled in this archive. Keep it in the same directory as the SwiftLM binary — Metal GPU compute will fail if it is missing.
SwiftLM b188
SwiftLM b188-cf0319f
ci: add C/C++/Metal extensions and lock file to release trigger paths
Changelog
- ci: add C/C++/Metal extensions and lock file to release trigger paths (cf0319f)
- Rename SwiftLM Chat to SwiftBuddy in README (Resolves #13) (5ad7716)
- Make SwiftLM macOS screencast GIF clickable to high-res YouTube video (8d67925)
- Stack iOS app demo beneath macOS demo in README header (826f905)
- Promote iOS app section higher in README below Features (e336438)
- Fix ugly README layout by moving mobile GIF to iOS section and prioritizing wide Mac demo (bf9d1bd)
- Add macOS inference demo GIF to README (8b7d407)
- Remove redundant GPU metallib warning from README (7170997)
- Update release CI to use build.sh and package mlx.metallib instead of default.metallib (99e1679)
- Move Quick Start (Getting Started) setup instructions to top of README (734b938)
- Update README to replace disjointed scripts with unified run_benchmark.sh documentation (8eec8e1)
- Refactor run_benchmark.sh to apply model picker to both benchmark suites (5f27226)
- Consolidate both benchmark suites into run_benchmark.sh interactive menu (e1a50cf)
- Add killall SwiftLM to end of bash test loop (6af47c3)
- Restructure benchmarks section with Test 1 and Test 2 headers (5d73e0e)
- Update README with correct binary path and clarify sliding window test (ddc8a75)
- Update README for new build workflow and change Qwen2.5 to 3.5 in benchmark menu (e978096)
- Add rich ANSI console visualization after benchmark completes (c199866)
- Remove huggingface_hub dependency — SwiftLM downloads models natively via HubApi (9f643a8)
- Build mlx.metallib from source via cmake instead of tracking pre-built binary (391cb43)
- Fix build.sh: use tracked pre-built metallib instead of dynamic find (eab1e47)
- Rename SwiftLM Chat to SwiftBuddy in README (553f637)
- Force-add the version-matched default.metallib binary so it is available upon clone (fab10ac)
- Add interactive benchmark launch script with menu (af60626)
- Support automatic HuggingFace downloading to ./models via profile_runner.py (54f7121)
- Update build.sh to dynamically find default.metallib (44a0baa)
- Fix Liquid syntax errors, add build.sh, create tmp directory in profile_runner (045120e)
Download
Quick Start
tar -xzf SwiftLM-b188-macos-arm64.tar.gz
# mlx.metallib is included — run from the extracted directory
./SwiftLM --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413
Note:
mlx.metallib is bundled in this archive. Keep it in the same directory as the SwiftLM binary — Metal GPU compute will fail if it is missing.
SwiftLM b160
SwiftLM b160-1233435
feat: complete extreme context profiling & fix prompt cache for TurboQuant
- Fix: Prevent prompt cache from decoding TurboQuant compressed polar buffers back to fp16, saving ~19GB GPU allocation at 100K context.
- Feat: Add GPU allocation tracking via ioreg to capture true memory demand including swap memory.
- Docs: Update README with benchmark summary and multi-device profiling results structure.
- Add run-benchmark workflow skill.
- Add total memory and active GPU memory monitoring in MemoryUtils.
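The ioreg-based GPU allocation tracking described above can be approximated by shelling out to ioreg and scraping an allocation counter from the accelerator entry. A minimal sketch follows; the statistics key name ("In use system memory") is an assumption, since PerformanceStatistics keys vary by macOS version and GPU family, and this is an illustration rather than the project's actual implementation.

```python
import re
import subprocess


def parse_gpu_bytes(ioreg_text: str, key: str = "In use system memory") -> int:
    """Extract a byte count for `key` from ioreg-style output.

    The key name is an assumption; real PerformanceStatistics keys vary
    by macOS version and GPU family. Returns 0 if the key is absent.
    """
    m = re.search(rf'"{re.escape(key)}"\s*=\s*(\d+)', ioreg_text)
    return int(m.group(1)) if m else 0


def sample_gpu_bytes() -> int:
    """Poll ioreg for accelerator statistics (macOS only)."""
    out = subprocess.run(
        ["ioreg", "-r", "-d", "1", "-c", "IOAccelerator"],
        capture_output=True, text=True,
    ).stdout
    return parse_gpu_bytes(out)
```

Polling a counter like this at prefill boundaries is one way to capture demand that exceeds Metal's reported working set, including swap-backed allocations.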
Changelog
- feat: complete extreme context profiling & fix prompt cache for TurboQuant (1233435)
- feat: extend profile_runner.py parameterization to test extreme contexts - Add --contexts flag to seamlessly loop through scale factors - Refactor script to output extended markdown matrix encompassing context depths - Enables sequential TTFT scaling tests up to 100k prompts (170501a)
- feat: persist Aegis-AI Physical Model Profiler and backend physical memory logger - Injects C++ 'mach_task_basic_info' logging to parse real Apple Silicon wire memory limit - Extracts 'OS_RAM' string output at prefill boundaries - Integrates interactive --model parameter into profiling script matrix for ease-of-use. (2cd373f)
Download
Quick Start
tar -xzf SwiftLM-b160-macos-arm64.tar.gz
# default.metallib is included — run from the extracted directory
./SwiftLM --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413
Note:
default.metallib is bundled in this archive. Keep it in the same directory as the SwiftLM binary — Metal GPU compute will fail if it is missing.
SwiftLM b156
SwiftLM b156-39ecadd
chore: move debugging scripts to dedicated folder
Changelog
- chore: move debugging scripts to dedicated folder (39ecadd)
- test: update harness runner loopback bindings and benchmark report (e593137)
- fix: stabilize Gemma4 MoE inference — dynamic attention mask slicing (32ce0e2)
Download
Quick Start
tar -xzf SwiftLM-b156-macos-arm64.tar.gz
# default.metallib is included — run from the extracted directory
./SwiftLM --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413
Note:
default.metallib is bundled in this archive. Keep it in the same directory as the SwiftLM binary — Metal GPU compute will fail if it is missing.
SwiftLM b153
SwiftLM b153-c59f6a1
Merge pull request #11 from SharpAI/feature/gemma-4-inference-ahan-moment
Feature/gemma 4 inference ahan moment
Changelog
- Bump submodule. (e2ca7cc)
- Bump submodule to enable Gemma 4 SSD streaming (dee5c00)
- Bump mlx-swift-lm to fix MediaProcessing Swift 6 Strict Concurrency error (f59d527)
- Support OpenAI's developer role (3d53a89)
- Update InferenceEngine to load TransformersTokenizer from HubDownloader and update submodule reference (6e94301)
- Map ChatCompletionRequest tool_calls natively into Chat.Message to retain contextual history (63a5cf8)
- Update mlx-swift-lm submodule: Gemma4 tool parser + weight mapping (2955afa)
- Update mlx-swift-lm submodule: RotatingKVCache mask fix (ee39ad2)
- Fix JSON mode system prompt injection template exception (0639d5a)
- Add Gemma4 native tool call parser (64d8a3c)
- Fix Gemma 4 sliding window rotating KV cache regression and weight mapping (03025a4)
- Fix SwiftLM inference server cache alignment, sliding window sigtrap, and prompt cache save race condition (a2b70dc)
- feat: Sync submodule — TurboKV 512-dim virtual head splitting (67562be)
- fix: Prevent crash on full prompt cache hit (100% match) (eac5ab3)
- feat: Stabilize Gemma-4 backend inference and sync submodules (32dd183)
Download
Quick Start
tar -xzf SwiftLM-b153-macos-arm64.tar.gz
# default.metallib is included — run from the extracted directory
./SwiftLM --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413
Note:
default.metallib is bundled in this archive. Keep it in the same directory as the SwiftLM binary — Metal GPU compute will fail if it is missing.
SwiftLM b137
SwiftLM b137-5f51468
Export MLXInferenceCore in Package.swift
Changelog
- Export MLXInferenceCore in Package.swift (5f51468)
- chore: Update Gemma 4 benchmark metrics and add comprehensive testing suite (5a05548)
- feat: rename SwiftLMChat → SwiftBuddy, add design doc (d770bce)
- docs: remove duplicate GIF embed, keep single intro line for iOS 13 Pro 6GB (049bce7)
- docs: fix iOS demo GIF path to existing docs/demo.gif (f821ce0)
- docs: add iPhone 13 Pro 6GB live demo GIF intro to iOS section (cf19434)
- feat(chat): unified iOS + macOS premium UI overhaul (66fe453)
Download
Quick Start
tar -xzf SwiftLM-b137-macos-arm64.tar.gz
# default.metallib is included — run from the extracted directory
./SwiftLM --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413
Note:
default.metallib is bundled in this archive. Keep it in the same directory as the SwiftLM binary — Metal GPU compute will fail if it is missing.
SwiftLM b129
SwiftLM b129-afb677c
fix(ci): compile default.metallib from .metal sources instead of searching for binary
The .metal shader sources are tracked in git but default.metallib is
gitignored (*.metallib rule). Previous approach searched for a pre-built
binary that CI never has. Now compiles fresh from the 39 tracked .metal
source files using xcrun metal + metallib — guaranteed version-matched
to the Swift binary by construction since it uses the same source files.
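The two-step compile described above (xcrun metal producing .air object files, then xcrun metallib linking them) can be sketched as a small command builder. This is an illustration of the toolchain invocation, not the release workflow's actual script; source paths are hypothetical, and the function only constructs the commands rather than running them.

```python
from pathlib import Path


def metallib_build_commands(src_dir: str, out: str = "default.metallib"):
    """Build the xcrun command lines that compile tracked .metal sources
    into a single metallib — version-matched to the binary by construction,
    since the same sources are used. Returns the commands without running them."""
    sources = sorted(Path(src_dir).glob("**/*.metal"))
    # Step 1: compile each .metal source to an intermediate .air file.
    compile_cmds = [
        ["xcrun", "-sdk", "macosx", "metal", "-c", str(s),
         "-o", str(s.with_suffix(".air"))]
        for s in sources
    ]
    # Step 2: link all .air files into one metallib.
    airs = [str(s.with_suffix(".air")) for s in sources]
    link_cmd = ["xcrun", "-sdk", "macosx", "metallib", *airs, "-o", out]
    return compile_cmds, link_cmd
```

On a macOS CI runner the returned commands could be executed with subprocess.run; elsewhere the builder is still useful for inspecting what would be run.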
Changelog
- fix(ci): compile default.metallib from .metal sources instead of searching for binary (afb677c)
- docs: warn against Python mlx-metal metallib version mismatch (33e1511)
- fix(release): correct metallib source — it ships in mlx-swift submodule, not built by swift build (e6556fc)
- fix(release): bundle default.metallib in release tarball (2d6b174)
- docs: add flash-moe reference to README and introduce benchmark test script (11e7078)
- chore: bump mlx-swift-lm submodule (iOS I/O fix, ExpertStreaming, Mistral4) (1922374)
- docs: add iOS demo GIF, iOS build instructions, and contributor Team ID note (c22abd0)
- feat(ios): iOS-first TabView UI + stable inference lifecycle (d31ad49)
- feat(swiftlmchat): HuggingFace live search + font/color fixes (ffe7b23)
- fix(mlx-swift): remove non-existent cuda.cpp from Package.swift exclude list (6a7449a)
- fix(swiftlmchat): full xcodebuild macOS compilation (4a46560)
- fix(inference): Swift 6 Sendable + deprecated API cleanup on main (8d08728)
- feat: iOS expert streaming via mmap page-cache for MoE models (541da29)
- feat(swiftlmchat): proactive iOS lifecycle — unload on background, reload on foreground (d454c0c)
- feat(swiftlmchat): platform-aware model management for iOS + macOS (dc4069f)
- fix(swiftlmchat): wire MLXInferenceCore sources + SPM packages into Xcode project (32b7483)
- feat: iOS expert streaming via mmap page-cache for MoE models (8671bc9)
- feat(swiftlmchat): proactive iOS lifecycle — unload on background, reload on foreground (7def72a)
- feat(swiftlmchat): platform-aware model management for iOS + macOS (c8bd3ba)
- fix(swiftlmchat): wire MLXInferenceCore sources + SPM packages into Xcode project (35e6172)
Download
Quick Start
tar -xzf SwiftLM-b129-macos-arm64.tar.gz
# default.metallib is included — run from the extracted directory
./SwiftLM --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413
Note:
default.metallib is bundled in this archive. Keep it in the same directory as the SwiftLM binary — Metal GPU compute will fail if it is missing.
SwiftLM b107
SwiftLM b107-bf9e87e
Merge pull request #5 from SharpAI/feature/api-parity-roadmap
Feature/api parity roadmap
Changelog
- feat(prompt-cache): token-by-token prefix match (llama-server style) (01df003)
- docs(turbo-kv): add implementation status, hot-window design rationale, commit references (6ede853)
- fix(turbo-kv): drop token count from log, show ratio+MB saved (layer-agnostic) (3df7430)
- feat(turbo-kv): add mlx_turbo_kv_record C atomic + 10s log hook (dc6af72)
- feat(turbo-kv): stage 2 — activate compressed KV attention pipeline (28a00f9)
- feat(turbo-kv): add turbo_decode_k/v — batch dequantize for SDPA attention (e141627)
- feat: add SwiftLM Chat multiplatform app (iOS + macOS) (957f763)
- feat(turbo-kv): support head_dim=256 via two 128-dim sub-groups (48fd996)
- fix(server): null content in tool_calls log; bump mlx-swift-lm for turbo-kv fixes (4143d3b)
- chore(deps): bump mlx-swift-lm for SSD background telemetry fixes (c323412)
- fix(server): support top-level enable_thinking parameter (e444985)
- fix(server): support Qwen tags in state tracker (3f4cc09)
- docs: clarify TurboQuant hybrid architecture in README (6c6d62c)
- docs(engine): add TurboQuant C++ architecture notes (a83fa7d)
- feat(metrics): expose SSD Flash-Stream stats to /metrics endpoint (26d2319)
- fix(ssd-stream): route metrics to stderr, throttle to 10s, fix MB/s calc (60d538b)
- fix(ci): Fix stale PCH module cache error after mlx-server→SwiftLM rename (ce626c1)
- fix(metrics): rename Prometheus metrics from mlx_server_ to swiftlm_ prefix (60cc3e3)
- docs: add MIGRATION_NOTE.md for Aegis-AI mlx-server → SwiftLM rename (1f7087b)
- refactor: rename project from mlx-server to SwiftLM (480e349)
- feat(turboquant): wire --turbo-kv flag into server and KVCache (2ca5b02)
- feat(turboquant): implement turbo_encode_k/v CPU encode path (8286492)
- fix(mlx-c): stub out turbo_encode to fix CI build (70ac5e8)
- fix: Correct buffer range removal in ThinkingStateTracker (use ..<upperBound) (7dd655f)
- feat: Add thinking/reasoning support (ThinkingStateTracker + prefill heartbeat) (dfa9fba)
- test: Add TurboQuant unit tests to CI/CD pipeline (a042879)
- docs: remove llama.cpp VLM comparison table from README (511a59b)
- feat: Implement real TurboQuant KV cache compression (ported from llama-cpp-turboquant) (2bb7017)
- docs: Add TurboQuant KV cache algorithm description to README (ab35fd2)
- fix(server): Remove debug prompt_debug print from slot_launch log (7c20227)
- fix(ssd): Restore 5 GB/s throughput + correct output via tensor_name cache key fix (95edfdc)
- fix(ssd): Key expert offset cache by tensor_name not E — gate/up/down proj no longer collide (b814c88)
- feat(ssd): Wire mlx_fast_pread_into for high-throughput SSD weight streaming (1c1ded9)
- feat(ssd): Add mlx_fast_pread_into for direct NVMe reads into evaluated MLX buffers (3ea2afb)
- feat(ssd): Restore SSD stream metrics around prefault() call (2f655a1)
- fix(mlx-server): Restore correct output by using prefault+slice instead of raw SSD blob (5849c5a)
- fix(mlx-server): Restore SSD streaming throughput and mem-limit enforcement via C++ memory-aware loader (231c62c)
- fix(ssd): Rewrite streamed_gather_mm primitive to load directly into allocator::malloc enforcing memory limits (65e5497)
- feat: llama-server style logging + SSE CRLF fix (82bcc4b)
- feat: real-time token streaming to stdout + fflush (da21efe)
- feat: per-request chat_template_kwargs.enable_thinking support (8d3e15f)
- fix(build): capture streamExperts as local let before escaping health route closure (95a126d)
- fix(metrics): correct gpu_layers, strategy, and estimated_tok_s for SSD streaming mode (4dc61a6)
- fix: replace SWAP-ASSISTED warning with SSD STREAMING label when streamExperts active (40d65d0)
- feat: add thinking and ssd_stream to Config log line for observability (54a619f)
- feat: log full JSON response body matching llama-server log_server_r format (4c5e54b)
- feat: add llama-server style generation logging + API response format docs (d3da36e)
- fix(memory): use physical RAM budget for SSD streaming instead of Metal's capped working set size (df7d154)
- docs: add AEGIS_INTEGRATION.md with complete Aegis-AI sidecar setup guide (b19abdb)
- feat(mlx-swift): implement 1-second interval aggregated SSD read metrics for cleaner console output (b0b3b9b)
- build(ci): add .gitmodules mapping to fix mlx-swift-lm cloning failure in GitHub Actions (3c879db)
- docs: remove Aegis-AI integration block temporarily to prepare for new hero section (f27da83)
- docs: remove outdated Metal compile lockup warning as MoE streamed inference primitives resolve the delay (0c3288b)
- docs: add M5 to requirements and highlight pre-built binary usage (2de99c9)
- fix(Server): update mlx-swift-lm submodule to receive Evaluate.swift iteration mapping bugfix (5d2ea4b)
- fix: correct finish_reason=length and tool_calls test robustness (d40e9e4)
- test(e2e): extend test suite from 21 to 31 tests (7f3911e)
- ci: re-trigger workflow after mlx-swift-lm submodule push (e6a421a)
- fix(ci): add mlx-swift-lm git submodule and checkout submodules in CI - Create .gitmodules pointing to SharpAI/mlx-swift-lm.git - Add submodules: recursive to both build.yml and e2e-test.yml checkout steps - Remove mlx-swift-lm from .gitignore so git tracks the submodule pointer (1aaa13b)
- fix: enforce SIGKILL in e2e tests and expand HF timeout (9f70969)
- docs: update quick start and curl snippet to demonstrate 122B model deduplication JSON query (349fc53)
- docs: remove vLLM completely from matrix (5b31a29)
- docs: fix test hardware to M5 Pro 64GB (8b1d723)
- docs: revert incorrect Flash-MoE designation for mlx-server and restore vLLM column (b2ccae9)
- docs: remove vLLM column and correctly attribute Flash-MoE features (310b940)
- fix(ci): relocate standalone test scripts to scripts/ to prevent implicit SPM test target failure (c6078ad)
- docs: fix hardware specs and document 4-bit JSON quantization caveat (8082313)
- test: structure test scripts into tests directory and ignore artifacts (e0da633)
- docs: add Flash-MoE and vLLM to comparison table (91852f8)
- docs: recreate README with mlx-server comparisons and architecture details (ed2d2b0)
- feat(mlx-swift): expose MLXFast.streamedGatherMM and update c-api signature (9131ff7)
- feat(server): auto-wire safetensors resolution and stream environment path mapping (e29155c)
- feat(mlx-c): expose SSD streamed_gather_mm primitive to c-api (b5e6ade)
- feat(mlx): integrate core C++ unified memory SSD streaming primitives (86bcee8)
- feat: localize MLX frameworks to write C++ turboquant and ssd streamer scaffolds (eced528)
- fix(ux): add autonomous task-driven progress bar and restore MB counts (7d23eba)
- feat(ux): add robust caching bandwidth speedometer (441cb2b)
- fix(ux): progress bar fraction handling for non-byte counts (3472cba)
- feat: add download speed and progress bar UX (5255cd6)
- feat(moe): Expose --stream-experts flag to enable SSD inference streaming for large MoE models (7210980)
- feat: add auto-calibration 'Wisdom' system (3aae8e2)
- chore: update fork to f8f315b (20 model architectures with LayerPartitionable) (8dd340e)
- fix: restore Package.resolved and add CI retry for HF downloads (ebef9b0)
- feat: wire GPU/CPU layer partitioning to --gpu-layers flag (837ced0)
- feat: add memory-aware model partitioning framework (9c89cfc)
- feat: GPU yield — prevent Metal from starving macOS WindowServer (fd4a5e3)
- fix: CI — install mlx.metallib from Python mlx package (30c06d6)
- feat: Prompt caching — reduce TTFT by reusing system prompt KV state (20e1ce8)
- feat: API key authentication (--api-key flag) (9fe2175)
- feat: Phase 3 — Memory limit, /metrics, enhanced /health, graceful shutdown, stats (e4ebecb)
- feat: Phase 2 — JSON mode, VLM vision support, multipart content, extra sampling params (6589cbe)
- feat: Phase 1 API parity with mlx-lm (337ec6d)
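The "token-by-token prefix match (llama-server style)" prompt-cache entry in the changelog above can be illustrated with a short sketch: the server reuses cached KV state for the longest token prefix shared with the new prompt and only prefills the tail. The function below is an illustration of the idea, not the project's actual implementation.

```python
def reusable_prefix_len(cached_tokens, new_tokens):
    """Length of the longest shared prefix between a cached prompt and a
    new one. Tokens up to this point can reuse cached KV state, so only
    the remaining tail needs to be prefilled, reducing TTFT."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n


# Example: a shared system prompt (first 4 tokens) followed by a new user turn.
cached = [1, 2, 3, 4, 50, 51]
new = [1, 2, 3, 4, 60, 61]
assert reusable_prefix_len(cached, new) == 4  # only the last 2 tokens are prefilled
```

A full-hit edge case (100% match, fixed in a later entry) corresponds to the prefix length equaling the new prompt's length, which the caller must handle without attempting a zero-token prefill.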
Download
Quick Start
tar -xzf SwiftLM-b107-macos-arm64.tar.gz
./SwiftLM --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413
Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.
mlx-server b21
mlx-server b21-2d382ef
Merge pull request #3 from SharpAI/feature/api-parity-roadmap
Feature/api parity roadmap
Changelog
- fix: CI — install mlx.metallib from Python mlx package (4086ce9)
- feat: Prompt caching — reduce TTFT by reusing system prompt KV state (6b34c97)
- feat: API key authentication (--api-key flag) (75e927d)
- feat: Phase 3 — Memory limit, /metrics, enhanced /health, graceful shutdown, stats (433d90e)
- feat: Phase 2 — JSON mode, VLM vision support, multipart content, extra sampling params (bfc980a)
- feat: Phase 1 API parity with mlx-lm (519bfda)
Download
Quick Start
tar -xzf mlx-server-b21-macos-arm64.tar.gz
./mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413
Note: Requires mlx.metallib next to the binary for GPU compute. See README for setup.