server.nexe — Self-Hosted Local AI Server with RAG Memory

v1.0.6 — Apache 2.0

AI server running 100% locally.
Persistent memory across conversations.
Zero data in the cloud.

Minimum viable product for the real world. Open to community feedback. 🚀

macOS

Download the DMG

v1.0.6 · Apple Silicon

Linux

Download AppImage

v1.0.6 · ARM64 · Ubuntu 24.04+ · ~1.1 GB

chmod +x nexe-app_*.AppImage && ./nexe-app_*.AppImage

GitHub Releases

MLX

llama.cpp

Ollama

RAG

Qdrant

Total Privacy

FastAPI

Plugins

768-dim embeddings

OpenAI compatible

Apple Silicon

Dual-key auth

MLX

llama.cpp

Ollama

RAG

Qdrant

Total Privacy

FastAPI

Plugins

768-dim embeddings

OpenAI compatible

Apple Silicon

Dual-key auth

Why NEXE

Six pillars

Local & Private

Runs entirely on your computer. No conversations, no data, no documents ever leave your device. Absolute privacy guaranteed by architecture.

RAG Memory

Remembers information across sessions with 768-dimensional embeddings in Qdrant. Indexes MD, PDF and TXT documents. Toggle individual collections on/off from the sidebar.

Multi-backend

Native MLX for Apple Silicon, universal llama.cpp, or Ollama bridge. Switch model and backend without rewriting anything. Unified API.

Modular

Each backend is an independent plugin. Add new features without touching the core. Architecture designed to grow and experiment.

Automatic Memory

The server auto-saves relevant information from conversations with trilingual intent detection, intelligent deduplication and automatic pruning. Delete facts with MEM_DELETE and see each save as a collapsible blue block.

Multilingual

Full i18n system in CA/ES/EN for the interface, system prompts, RAG labels and error messages. Switch language without restarting.

Let's start

Four commands

01 — Clone the repository

$ git clone https://github.com/jgoy-labs/server-nexe
$ cd server-nexe

02 — Guided installation

# Detects hardware, picks backend and model
$ ./setup.sh

03 — Start the server

$ ./nexe go
# → http://localhost:9119
# → http://localhost:9119/ui

04 — Chat with memory

$ ./nexe chat --rag
# Store information:
$ ./nexe memory store "..."

Available backends

Choose your engine

RECOMMENDED · MAC

MLX

Native for Apple Silicon. Maximum performance on your M1/M2/M3. Uses the Neural Engine GPU at 100%. Best option if you have a modern Mac.

Apple Silicon GPU accelerated mlx-community

UNIVERSAL

llama.cpp

Compatible with all GGUF formats. Works on Mac (Metal GPU), Linux and Windows. Lightweight, flexible and very active community.

GGUF Metal GPU Cross-platform

BRIDGE

Ollama

If you already have Ollama installed, NEXE can use it directly as a backend. Reuse all the models you already have downloaded.

Ollama API Reuse models Easy integration

Documentation

Explore the project

What is NEXE Philosophy, use cases and roadmap. → Installation Complete step-by-step guide to get started. → Architecture Modular architecture in three layers: Core → Plugins → Services. → REST API Complete reference. OpenAI /v1/chat/completions compatible. → RAG System How persistent memory works with Qdrant and embeddings. → Modular Modular plugin system and how to create new ones. → MEM_SAVE Automatic memory: intent detection, deduplication and intelligent pruning. →

Start now

Download it. Break it. Experiment.

NEXE is your local assistant. Ask it how it works, how to create plugins or how to extend it. It remembers context. Always local.

Download the DMG Installation guide