v1.0.6 — Apache 2.0

AI server running 100% locally.
Persistent memory across conversations.
Zero data in the cloud.

Minimum viable product for the real world. Open to community feedback. 🚀

macOS
Download the DMG
v1.0.6 · Apple Silicon
Linux
Download AppImage
v1.0.6 · ARM64 · Ubuntu 24.04+ · ~1.1 GB
$ chmod +x nexe-app_*.AppImage && ./nexe-app_*.AppImage

Six pillars

Local & Private

Runs entirely on your computer. No conversations, data or documents ever leave your device. Privacy is enforced by the architecture: everything runs locally and nothing is sent to the cloud.

RAG Memory

Remembers information across sessions using 768-dimensional embeddings stored in Qdrant. Indexes Markdown, PDF and plain-text documents. Toggle individual collections on or off from the sidebar.
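As a rough illustration of how embedding-based recall can work, the sketch below ranks stored snippets by similarity to a query vector and skips collections that are toggled off. It is not NEXE's code and does not call Qdrant. It assumes cosine similarity as the metric, and the `top_k` and `one_hot` helpers, the tuple layout and the sample facts are all made up for the example. The toy vectors stand in for a real embedding model.

```python
import math

DIM = 768  # embedding size named in the text above


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, memory, k=3, enabled=None):
    """Return the k best (score, text) pairs from the enabled collections.

    memory: list of (collection, text, vector) tuples (hypothetical layout).
    enabled: set of collection names to search, or None to search all.
    """
    scored = [
        (cosine(query, vec), text)
        for collection, text, vec in memory
        if enabled is None or collection in enabled
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def one_hot(i):
    """Toy stand-in for an embedding model: a unit vector on axis i."""
    v = [0.0] * DIM
    v[i] = 1.0
    return v


# Made-up sample memory spread over two collections.
memory = [
    ("notes", "The server listens on port 9119", one_hot(0)),
    ("notes", "Backups run on Sundays", one_hot(1)),
    ("docs", "The UI lives under /ui", one_hot(2)),
]

# A query vector that leans mostly toward axis 0 and slightly toward axis 2.
query = [0.0] * DIM
query[0], query[2] = 0.9, 0.1

results = top_k(query, memory, k=2)
print(results[0][1])
```

Passing `enabled={"docs"}` restricts the search to that collection, which mirrors the idea of switching collections on and off.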

Multi-backend

Native MLX for Apple Silicon, universal llama.cpp, or an Ollama bridge. Switch model and backend through one unified API, without rewriting anything.

Modular

Each backend is an independent plugin. Add new features without touching the core. Architecture designed to grow and experiment.
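One way to picture backends as interchangeable plugins behind a single interface is a small registry. This is only a sketch under assumptions: the `Backend` protocol, `EchoBackend`, the `BACKENDS` registry and `chat` are hypothetical names, and the factories are placeholders rather than real MLX, llama.cpp or Ollama bindings.

```python
from typing import Callable, Dict, Protocol


class Backend(Protocol):
    """Hypothetical contract every engine would satisfy."""

    def generate(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in engine so the sketch runs without any model installed."""

    def __init__(self, label: str) -> None:
        self.label = label

    def generate(self, prompt: str) -> str:
        return f"[{self.label}] {prompt}"


# Registry mapping a backend name to a factory. The names mirror the
# engines mentioned in the text; the factories are placeholders.
BACKENDS: Dict[str, Callable[[], Backend]] = {
    "mlx": lambda: EchoBackend("mlx"),
    "llama.cpp": lambda: EchoBackend("llama.cpp"),
    "ollama": lambda: EchoBackend("ollama"),
}


def chat(backend_name: str, prompt: str) -> str:
    """Caller code stays the same whichever backend is selected."""
    try:
        backend = BACKENDS[backend_name]()
    except KeyError:
        raise ValueError(f"unknown backend: {backend_name}") from None
    return backend.generate(prompt)


print(chat("mlx", "hello"))
print(chat("ollama", "hello"))
```

In this layout, adding an engine means adding one registry entry, and `chat` never changes.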

Automatic Memory

The server auto-saves relevant information from conversations with trilingual intent detection, intelligent deduplication and automatic pruning. Delete facts with MEM_DELETE and see each save as a collapsible blue block.
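To make deduplication and pruning concrete, here is a deliberately simple sketch. It swaps in exact matching after text normalization and oldest-first pruning, which is far cruder than whatever the server actually does. `MemoryStore`, `normalize`, the `max_facts` cap and the sample facts are all hypothetical, and the `delete` method only illustrates the idea of removing a fact, not how MEM_DELETE is implemented.

```python
import re


def normalize(fact):
    """Lowercase, drop punctuation and collapse spaces for comparison."""
    cleaned = re.sub(r"[^\w\s]", "", fact.lower())
    return " ".join(cleaned.split())


class MemoryStore:
    """Toy fact store: exact-match dedup after normalization, FIFO pruning."""

    def __init__(self, max_facts=3):
        self.max_facts = max_facts
        self._facts = {}  # normalized key -> original text, insertion-ordered

    def save(self, fact):
        """Store a fact; return False if an equivalent one already exists."""
        key = normalize(fact)
        if not key or key in self._facts:
            return False
        self._facts[key] = fact
        # Prune the oldest entries once the cap is exceeded.
        while len(self._facts) > self.max_facts:
            oldest = next(iter(self._facts))
            del self._facts[oldest]
        return True

    def delete(self, fact):
        """Remove a fact if present; return whether anything was removed."""
        return self._facts.pop(normalize(fact), None) is not None

    def facts(self):
        """Return the stored facts, oldest first."""
        return list(self._facts.values())


store = MemoryStore(max_facts=2)
store.save("My name is Ada.")
store.save("my name is  ada")      # duplicate after normalization, skipped
store.save("I prefer dark mode")
store.save("I work in Barcelona")  # exceeds the cap, oldest fact is pruned
print(store.facts())
```

A production system would more plausibly compare embeddings to catch paraphrases and prune by relevance or age rather than insertion order.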

Multilingual

Full i18n system in CA/ES/EN for the interface, system prompts, RAG labels and error messages. Switch language without restarting.
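Switching language at runtime usually comes down to looking strings up at call time instead of baking them in at startup. The sketch below shows that idea with a fallback to a default language. The `MESSAGES` catalog, its keys, the `Translator` class and the English fallback are assumptions for illustration, not the project's actual i18n layout.

```python
# Hypothetical message catalog keyed by language code, then message key.
MESSAGES = {
    "en": {"greeting": "Hello", "error.not_found": "Not found"},
    "es": {"greeting": "Hola", "error.not_found": "No encontrado"},
    "ca": {"greeting": "Hola"},  # deliberately incomplete to show fallback
}

DEFAULT_LANG = "en"  # assumed fallback language


class Translator:
    """Resolves message keys at call time, so language changes apply at once."""

    def __init__(self, lang=DEFAULT_LANG):
        self.lang = lang

    def set_language(self, lang):
        if lang not in MESSAGES:
            raise ValueError(f"unsupported language: {lang}")
        self.lang = lang

    def t(self, key):
        """Look up key in the active language, then the default, then echo it."""
        for lang in (self.lang, DEFAULT_LANG):
            text = MESSAGES[lang].get(key)
            if text is not None:
                return text
        return key


tr = Translator("es")
print(tr.t("error.not_found"))
tr.set_language("ca")           # no restart: the next lookup uses Catalan
print(tr.t("greeting"))
print(tr.t("error.not_found"))  # missing in "ca", falls back to English
```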

Four commands

01 — Clone the repository
$ git clone https://github.com/jgoy-labs/server-nexe
$ cd server-nexe
02 — Guided installation
# Detects hardware, picks backend and model
$ ./setup.sh
03 — Start the server
$ ./nexe go
# → http://localhost:9119
# → http://localhost:9119/ui
04 — Chat with memory
$ ./nexe chat --rag
# Store information:
$ ./nexe memory store "..."

Choose your engine

RECOMMENDED · MAC

MLX

Native for Apple Silicon. Maximum performance on M1, M2, M3 or newer chips. Runs on the Apple Silicon GPU with unified memory. Best option if you have a modern Mac.

Apple Silicon · GPU accelerated · mlx-community
UNIVERSAL

llama.cpp

Compatible with GGUF-format models. Works on Mac (Metal GPU), Linux and Windows. Lightweight and flexible, with a very active community.

GGUF · Metal GPU · Cross-platform
BRIDGE

Ollama

If you already have Ollama installed, NEXE can use it directly as a backend. Reuse all the models you already have downloaded.

Ollama API · Reuse models · Easy integration

Explore the project

Start now

Download it. Break it. Experiment.

NEXE is your local assistant. Ask it how it works, how to create plugins or how to extend it. It remembers context. Always local.