GitHub - LVT382009/noxem

Features · Quick Start · Architecture · Config · Benchmarks · Contributing

Features

🧠 Brain 1 — Semantic Engine


Hybrid Search	Vector KNN + FTS5 keyword, merged via Reciprocal Rank Fusion
Auto-Categorization	Tags: preference, project, profile, goal, entity, event, fact...
Smart Dedup	Cosine >0.92 → merge automatically
Conflict Resolution	Entity-attribute matching → older superseded
Contextual Enrichment	Context prefix before indexing — ~49% better retrieval
Weibull Decay	Profiles never decay, requests expire in 3 days
Spaced Repetition	Recalled memories stay relevant longer
Adaptive Search	Classifies query intent, weights vector vs keyword
MMR Diversity	No near-identical results in search
Provenance Graph	Full lineage tracking through supersession history
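
The hybrid search row above names Reciprocal Rank Fusion (RRF) as the merge step. The sketch below shows the general technique, not Noxem's actual implementation: the function name `rrf_merge`, the constant `k=60` (a commonly used default), and the memory IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) to an ID's score, where rank
    starts at 1. IDs that rank well in several lists rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical result lists from the vector and keyword searches.
vector_hits = ["m3", "m1", "m7"]
keyword_hits = ["m1", "m9", "m3"]
merged = rrf_merge([vector_hits, keyword_hits])
print(merged)
```

Because RRF uses only ranks, it needs no normalization between cosine similarities and FTS5 scores, which is the usual reason to pick it for hybrid search.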

🚀 Brain 2 — Reasoning Engine


Cloud or Local	QwenProxy (cloud) or any local OpenAI-compatible LLM
Ollama / LM Studio / llama.cpp	Drop-in — just enter your base URL + model name
Drift Detection	Warns when conversation goes off-goal
Context Recovery	Preserves critical info across compaction
Session Extraction	Stores key memories when session ends
Background Research	Detects topics → DDG search → extract facts → store
Multi-Query Expansion	Generates alternate phrasings for vague searches
Consolidation	Clusters low-importance → single high-importance summary
Category Auto-Correction	Catches and fixes misclassified memories
Search Feedback Loop	Boost importance for memories that influenced responses
Bi-Temporal Tracking	`valid_from` / `valid_until` timestamps
Research Hints	Compact summaries injected — no fact dump

Tip

Run hermes-noxem to start. Choose Brain 1 only (fast, low RAM) or Brain 1 + Brain 2. Brain 2 supports two providers:

QwenProxy — free cloud Qwen 3.6 Plus (requires Qwen account, auto-login)
Local model — any OpenAI-compatible endpoint (Ollama, LM Studio, llama.cpp, etc.)

In local mode the LLM runs fully offline, with no account needed. DDG (DuckDuckGo) search handles research when no cloud LLM is available; the search itself still needs internet access.

Quick Start

Requirements: Node.js 22+, Python 3.10+, Hermes Agent v2026+

Linux / WSL

git clone https://github.com/LVT382009/noxem.git
cd noxem
bash install.sh

macOS

xcode-select --install
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install node
git clone https://github.com/LVT382009/noxem.git
cd noxem
bash install.sh

Note

First run downloads Brain 1 (~300 MB).

How to Use

hermes-noxem                  # Launch with interactive selection
hermes-noxem --qwenproxy      # Launch with cloud Brain 2 (no prompt)
hermes-noxem --local          # Launch with local Brain 2 (no prompt)
hermes-noxem --no-brain2      # Launch memory-only (no prompt)
hermes-noxem --resume <id>    # Continue a session with Noxem

Mode	Enabled	Best for
Brain 1 only	Semantic search, dedup, categorization, FTS5	Low RAM, quick lookups
Brain 1 + QwenProxy	Everything + cloud Qwen 3.6 advisor + research	Full sessions, web research
Brain 1 + Local	Everything + local LLM advisor + DDG research	Fully offline, privacy-first

When you select Brain 2, you'll be asked to choose a provider:

Brain 2 — Provider Selection

 [1] Qwen 3.6 Plus — free cloud via QwenProxy (requires Qwen account)
 [2] Local model — any OpenAI-compatible LLM (Ollama, LM Studio, llama.cpp...)
 [3] Skip Brain 2

If you choose Local model, you'll be prompted for:

Setting	Description	Example
Base URL	Your LLM server's OpenAI-compatible endpoint	`http://localhost:11434/v1` (Ollama)
Model name	The model identifier your server expects	`gemma4:e4b`, `qwen3:8b`, `llama3.1`
API key	Optional — not needed for Ollama or llama.cpp	Leave empty to skip

Supported Local Providers

Provider	Default Base URL	Notes
Ollama	`http://localhost:11434/v1`	Auto-detects installed models
LM Studio	`http://localhost:1234/v1`	Start local server first
llama.cpp	`http://127.0.0.1:8080/v1`	Use `--jinja` flag for reasoning support
Any OpenAI-compatible	Your URL	Must support `/v1/chat/completions`
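
As a rough illustration of how the base URLs above relate to the `/v1/chat/completions` requirement, the sketch below joins a base URL with the chat path. The dictionary keys and the helper name `chat_completions_url` are hypothetical; only the URLs come from the table.

```python
# Default base URLs from the table above; the keys are illustrative labels.
DEFAULT_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
    "llamacpp": "http://127.0.0.1:8080/v1",
}


def chat_completions_url(base_url):
    """Return the chat completions endpoint for an OpenAI-compatible base URL."""
    # Tolerate a trailing slash so both ".../v1" and ".../v1/" work.
    return base_url.rstrip("/") + "/chat/completions"


for name, base in DEFAULT_BASE_URLS.items():
    print(f"{name}: {chat_completions_url(base)}")
```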

Note

When using a local model, web research falls back to DuckDuckGo search instead of QwenProxy's built-in search. No cloud LLM is involved, although the DuckDuckGo queries themselves still need internet access.

Configuring via `hermes memory setup`

Settings can also be configured through the Hermes setup wizard:

hermes memory setup

This saves your configuration to ~/.hermes/noxem.json, which the launcher reads on startup. Available settings:

Key	Description	Default
`memory_server`	Memory server URL	`http://127.0.0.1:3001`
`brain2_provider`	`qwenproxy` or `local`	`qwenproxy`
`llm_url`	LLM API endpoint	`http://127.0.0.1:8000/v1/chat/completions`
`llm_model`	Model name for LLM calls	`qwen3.6-plus-no-thinking`
`llm_api_key`	API key (optional)	(empty)
`embedding_enabled`	Enable Brain-1 vector search	`true`
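
To inspect the saved settings from a script, something like the following may help. It is a minimal sketch that assumes `noxem.json` is a flat JSON object using the keys in the table above; the file layout is not confirmed by this README, and `load_noxem_config` is a hypothetical helper, not part of Noxem.

```python
import json
from pathlib import Path

# Defaults copied from the settings table above.
DEFAULTS = {
    "memory_server": "http://127.0.0.1:3001",
    "brain2_provider": "qwenproxy",
    "llm_url": "http://127.0.0.1:8000/v1/chat/completions",
    "llm_model": "qwen3.6-plus-no-thinking",
    "llm_api_key": "",
    "embedding_enabled": True,
}


def load_noxem_config(path=None):
    """Merge the saved settings (if any) over the documented defaults."""
    if path is None:
        path = Path.home() / ".hermes" / "noxem.json"
    config = dict(DEFAULTS)
    try:
        with open(path, encoding="utf-8") as fh:
            config.update(json.load(fh))  # assumes a flat JSON object
    except FileNotFoundError:
        pass  # no saved file yet: keep the defaults
    return config


config = load_noxem_config()
print(config["brain2_provider"], config["llm_model"])
```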

Using Brain 2 as an OpenAI API

Brain 2 exposes a full OpenAI-compatible API on port 8000. Use it with any tool:

# Base URL
http://127.0.0.1:8000/v1

# Available models (varies by provider)
# QwenProxy mode: qwen3.6-plus, qwen3.6-plus-no-thinking
# Local mode: whatever your local server provides

# Example
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.6-plus-no-thinking","messages":[{"role":"user","content":"Hello"}]}'

Both streaming and non-streaming are supported — it works as a drop-in OpenAI base URL.
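
The same call as the curl example can be made from Python with only the standard library. This is a sketch: `build_chat_request` and `send_chat_request` are hypothetical helper names, and the response parsing assumes the standard OpenAI chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt, model="qwen3.6-plus-no-thinking",
                       base_url="http://127.0.0.1:8000/v1", api_key=""):
    """Build a non-streaming POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:  # optional, as with local providers
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Hello")
print(request.get_method(), request.full_url)
```

With Brain 2 running, `send_chat_request(request)` performs the actual call; building the request separately makes it easy to inspect before anything is sent.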

hermes noxem status              # Server health + memory stats
hermes noxem search <query>      # Search stored memories
hermes noxem run                 # Run maintenance manually
hermes noxem config              # Show current configuration

Architecture

Hermes Agent
|
v
Noxem Plugin (Python) --HTTP--> Noxem Server (Node.js :3001)
|                               |
|              +----------------+----------------+
|              |                                 |
|     Semantic Engine                   LLM Adapter (:8000)
|     ---------------                   ----------------
|     Vector KNN                        Provider: qwenproxy | local
|     Dedup/categorize                  QwenProxy: SSE <-> JSON bridge
|     Importance score                  Local: direct passthrough
|              |                        Model name normalization (QwenProxy)
|              |                        API key forwarding
|              |                                 |
|              +----------------+----------------+
|                               |
|              Brain 2 Provider (chosen at launch)
|              ----------------
|              qwenproxy: QwenProxy Server (:3000) -> chat.qwen.ai (cloud)
|              local:    Ollama / LM Studio / llama.cpp / any OpenAI API
|                               |
|              Qwen3.6-plus (cloud)  OR  Local model (offline)
|                               |
|              SQLite DB
|              (FTS5 + Vectors)
|
+-- Tools: memory_search . memory_store .
           memory_supersede . memory_lineage .
           memory_contradiction_check . memory_feedback

Memory Lifecycle

Store --> Enrich --> Categorize --> Extract Entity --> Score Importance
|          |           |               |                  |
v          v           v               v                  v
SQLite   Context    Auto-tag       Entity+attr       0.1 - 1.0
+ FTS5   prefix     (12 types)     pairs             (type-based)
|
v
+-------------------------------------------------------------+
| Background Maintenance (every 5 min)                        |
| Dedup --> Contradict --> Consolidate --> Clean/Auto-correct |
+-------------------------------------------------------------+
|
v
Search --> Hybrid (KNN + FTS5) --> RRF merge --> MMR rerank --> Score
|
v
Feedback: recalled memories get importance boost (+0.03)
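
The "MMR rerank" step in the search path above refers to Maximal Marginal Relevance, which trades relevance against similarity to results already chosen. The sketch below shows the textbook greedy form, not Noxem's code: the function names, the `lam=0.7` weighting, and the toy vectors are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_rerank(candidates, relevance, vectors, limit=5, lam=0.7):
    """Greedily pick results that are relevant but not redundant.

    score = lam * relevance - (1 - lam) * max similarity to picked results
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < limit:
        def mmr_score(candidate):
            redundancy = max(
                (cosine(vectors[candidate], vectors[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[candidate] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "b" is a near-duplicate of "a"; "c" is less relevant but distinct.
vectors = {"a": [1.0, 0.0], "b": [1.0, 0.01], "c": [0.0, 1.0]}
relevance = {"a": 0.9, "b": 0.85, "c": 0.6}
top_two = mmr_rerank(["a", "b", "c"], relevance, vectors, limit=2)
print(top_two)
```

In this toy run the near-duplicate "b" is passed over in favor of "c", which is the behavior the "no near-identical results" feature describes.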

Commands

hermes-noxem                  # Launch with interactive selection
hermes-noxem --qwenproxy      # Launch with cloud Brain 2
hermes-noxem --local          # Launch with local Brain 2
hermes-noxem --no-brain2      # Launch memory-only

hermes noxem status           # Server health + memory stats
hermes noxem search <query>   # Search stored memories
hermes noxem run              # Run maintenance manually
hermes noxem config           # Show current configuration

Available Tools

Tool	Description
`memory_search`	Search with method: hybrid, vector, or keyword
`memory_store`	Store a fact with auto-categorization
`memory_supersede`	Mark old memory as superseded by newer one
`memory_lineage`	Trace provenance chain through supersession history
`memory_contradiction_check`	Find contradicting memories (same entity+attribute)
`memory_feedback`	Report which memory IDs influenced your response

Configuration

Variable	Default	Description
`MEMORY_PORT`	`3001`	Server port
`MEMORY_DB_DIR`	`./data`	Database directory
`DUP_THRESHOLD`	`0.92`	Deduplication sensitivity
`CONTRADICT_THRESHOLD`	`0.80`	Contradiction detection threshold
`ENABLE_MAINTENANCE`	`true`	Auto-cleanup every 5 minutes
`ENABLE_RESEARCH`	`true`	Background web research pipeline
`RESEARCH_MIN_INTERVAL`	`30000`	Min ms between research per session
`MEMORY_DECAY_HALF_LIFE`	`30`	Default recency decay (days)
`MEMORY_MAX_TOKENS`	`2000`	Token budget for context injection
`RATE_LIMIT_MAX`	`120`	Max requests per minute per IP
`AUTO_PURGE_DAYS`	`365`	Days before low-importance memories are purged
`BRAIN2_PROVIDER`	`qwenproxy`	Brain 2 mode: `qwenproxy` or `local`
`LOCAL_LLM_URL`	(empty)	Local LLM base URL (e.g. `http://localhost:11434/v1`)
`LLM_MODEL`	`qwen3.6-plus-no-thinking`	Model for Brain 2 calls
`LLM_API_KEY`	(empty)	API key for local LLM (optional)
`LLM_TIMEOUT`	`120000`	Brain 2 request timeout (ms)
`QWENPROXY_PORT`	`3000`	QwenProxy server port (cloud mode only)
`QWENPROXY_URL`	`http://127.0.0.1:3000`	QwenProxy upstream URL (cloud mode only)
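
For intuition about `MEMORY_DECAY_HALF_LIFE`, the sketch below computes a Weibull-style recency weight that equals 0.5 at the half-life. It is an illustration under stated assumptions: the exact formula Noxem uses is not given in this README, and the `shape` parameter and the `None` convention for "never decays" are hypothetical.

```python
def weibull_recency(age_days, half_life_days=30.0, shape=1.0):
    """Recency weight in (0, 1] from a Weibull survival curve.

    weight = 0.5 ** ((age / half_life) ** shape)
    shape = 1.0 gives plain exponential half-life decay; shape > 1 keeps
    memories fresh for longer and then drops them off more sharply.
    """
    if half_life_days is None:
        return 1.0  # assumed convention for categories that never decay
    age = max(age_days, 0.0)  # guard against clock skew producing negative ages
    return 0.5 ** ((age / half_life_days) ** shape)


for age in (0, 15, 30, 60):
    print(age, round(weibull_recency(age), 3))
```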

Full env variable list

Variable	Default	Description
`ENABLE_EMBEDDING`	`true`	Enable Brain 1 semantic engine
`ENABLE_ADVISOR`	`true`	Enable Brain 2 advisor
`EMBEDDING_MODEL`	`default`	Brain 1 engine identifier
`EMBEDDING_DTYPE`	`q8`	Engine precision (fp32/q8/q4)
`EMBEDDING_DIM`	`256`	Brain 1 vector dimension
`EMBEDDING_LOAD_RETRIES`	`2`	Brain 1 engine retry count
`EMBEDDING_LOAD_TIMEOUT`	`300000`	Brain 1 engine load timeout (ms)
`EMBEDDING_CLEAR_CACHE_ON_RETRY`	`false`	Clear engine cache on retry
`LLM_URL` / `GEMMA_URL`	`http://127.0.0.1:8000/v1/chat/completions`	LLM API endpoint (adapter proxies to Brain 2)
`LLM_PORT` / `GEMMA4_PORT`	`8000`	Adapter listening port
`MEMORY_MAX_RESULTS`	`5`	Default search result limit
`MEMORY_API_KEY`	(empty)	Bearer token for API auth
`CORS_ORIGIN`	`http://localhost:*`	CORS allowed origins
`LOG_LEVEL`	`info`	Log verbosity (`silent` to suppress)
`HF_FETCH_TIMEOUT`	`180000`	Component download timeout (ms)
`HF_FETCH_RETRIES`	`3`	Retry count for failed component downloads
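
The variables above follow the usual pattern of "environment value if set, otherwise the documented default". The Noxem server is Node.js, so the Python below is only a sketch of that pattern; the helper names and the accepted boolean spellings are assumptions, not the server's actual parsing rules.

```python
import os


def env_int(name, default, env=os.environ):
    """Read an integer variable, falling back to the default if unset or empty."""
    raw = env.get(name, "")
    return int(raw) if raw.strip() else default


def env_bool(name, default, env=os.environ):
    """Read a boolean variable; the accepted spellings here are an assumption."""
    raw = env.get(name, "")
    if not raw.strip():
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


# Defaults taken from the tables above.
settings = {
    "MEMORY_PORT": env_int("MEMORY_PORT", 3001),
    "LLM_TIMEOUT": env_int("LLM_TIMEOUT", 120000),
    "ENABLE_MAINTENANCE": env_bool("ENABLE_MAINTENANCE", True),
    "ENABLE_RESEARCH": env_bool("ENABLE_RESEARCH", True),
}
print(settings)
```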

Benchmarks

Tested on WSL2 Ubuntu, Node.js 22. Run your own: cd server && bash benchmark.sh

Operation	Latency	Notes
Store (single)	~23 ms	Auto-categorization + entity extraction + FTS5
Store (batch 50)	~0.6 ms each	Bulk insert, single transaction
Search (hybrid)	~25 ms	Vector KNN + FTS5 via RRF
Search (FTS)	~26 ms	Full-text with Weibull scoring
Sync turn	~20 ms	Store user + assistant messages
Maintenance cycle	~18 ms	Dedup + contradiction + consolidation + archive

Note

With Brain 1 enabled, hybrid search adds ~5-10 ms for vector KNN lookup. Brain 1 loads in the background without blocking server startup.

Contributing

1. Fork the repo
2. Create a feature branch (`git checkout -b feature/my-feature`)
3. Commit your changes (`git commit -m 'Add my feature'`)
4. Push to the branch (`git push origin feature/my-feature`)
5. Open a Pull Request

