suhaasteja/SageMem

SageMem

A GPU-memory-hierarchy-inspired memory layer for multi-agent LLM systems.

Four memory tiers (L1/L2/L3/DRAM), a MESI-style coherence protocol for shared beliefs, and capacity-based permission scoping — applied to semantic memory at the orchestration layer.

SageMem architecture


The problem

Multi-agent LLM systems have a memory problem that closely resembles the one CPUs solved decades ago: many parallel compute units share state, yet there is no principled protocol for who sees which updates, and when.

Existing frameworks (LangGraph, Mem0, Letta) handle concurrent writes with last-write-wins or ad-hoc reducers, and stale reads go undetected. To our knowledge, no production multi-agent memory system implements a cache coherence model.

The approach

The GPU memory hierarchy was designed from day one for many parallel processors sharing state with bounded latency and explicit coherence. SageMem applies that model to semantic memory.

| Tier | Analog | Storage | Scope | Latency |
|------|--------|---------|-------|---------|
| L1 | SM registers + L1 | In-memory LRU | Per-agent | ~0.000 ms |
| L2 | Shared L2 per GPC | Redis | Per-cluster | ~0.2 ms |
| L3 | Global shared memory | Postgres JSONB | Global | ~0.5 ms |
| DRAM | HBM / VRAM | Postgres + pgvector | Global/semantic | ~0.4 ms |

MESI coherence: Each L1 cache entry carries a state (Modified / Exclusive / Shared / Invalid). When an agent writes a shared key, it broadcasts an InvalidateMessage over Redis pub/sub. All agents with that key cached receive the message and mark their entry Invalid. The next read re-fetches the fresh value.
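The invalidate-on-write step described above can be sketched without Redis. This is a minimal, single-process illustration and not SageMem's implementation: `ToyBus`, `ToyCoherentCache`, and the shared `backing` dict are hypothetical names, the bus delivers messages synchronously, and only the write-invalidate and re-fetch transitions are modeled (the Exclusive state is declared but never entered).

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"   # declared for completeness; unused in this sketch
    SHARED = "S"
    INVALID = "I"


@dataclass
class Entry:
    value: object
    state: State


class ToyBus:
    """In-process stand-in for a pub/sub channel (hypothetical)."""

    def __init__(self):
        self.caches = []

    def broadcast_invalidate(self, key, sender_id):
        # Deliver the invalidation to every cache except the writer's.
        for cache in self.caches:
            if cache.agent_id != sender_id:
                cache.on_invalidate(key)


class ToyCoherentCache:
    def __init__(self, agent_id, bus, backing):
        self.agent_id = agent_id
        self.bus = bus
        self.backing = backing  # shared dict standing in for L2/L3
        self.entries = {}
        bus.caches.append(self)

    def write(self, key, value):
        # Invalidate remote copies, write through, keep the local copy.
        self.bus.broadcast_invalidate(key, self.agent_id)
        self.backing[key] = value
        self.entries[key] = Entry(value, State.MODIFIED)

    def on_invalidate(self, key):
        if key in self.entries:
            self.entries[key].state = State.INVALID

    def read(self, key):
        entry = self.entries.get(key)
        if entry is not None and entry.state is not State.INVALID:
            return entry.value
        # Miss or Invalid entry: re-fetch from the shared backing store.
        value = self.backing.get(key)
        self.entries[key] = Entry(value, State.SHARED)
        return value


backing = {}
bus = ToyBus()
a = ToyCoherentCache("agent-a", bus, backing)
b = ToyCoherentCache("agent-b", bus, backing)

b.write("belief:x", "old value")
a.write("belief:x", "new value")
stale_state = b.entries["belief:x"].state  # State.INVALID
fresh = b.read("belief:x")                 # "new value"
```

A real pub/sub channel delivers invalidations asynchronously, so a reader may briefly observe the old value until the message arrives; the synchronous bus here hides that window.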

Capacity as capability: Each agent declares which tiers it may access. ScopedHierarchy enforces this as a hard boundary at the API level: an agent restricted to L1+L2 cannot read or write L3 data through its scoped handle.


Installation

Requires Python 3.12+, Redis, and Postgres with pgvector.

# Install dependencies
brew install redis postgresql pgvector
brew services start redis
brew services start postgresql
createdb sagemem_test
psql sagemem_test -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Install the package
uv sync --extra dev

Usage

Basic hierarchy

import asyncio
from sagemem.tiers.l1 import L1Tier
from sagemem.tiers.l2 import L2Tier
from sagemem.tiers.l3 import L3Tier
from sagemem.hierarchy import MemoryHierarchy

async def main():
    l1 = L1Tier(capacity=128)
    l2 = L2Tier(url="redis://localhost:6379", namespace="myapp")
    l3 = L3Tier(dsn="postgresql://localhost/mydb")

    await l2.connect()
    await l3.connect()

    h = MemoryHierarchy(tiers=[l1, l2, l3])

    # Write to L3; read falls through L1→L2→L3, promotes on hit
    await h.set("fact:pi", 3.14159, tier_index=2)
    value = await h.get("fact:pi")   # L1 miss → L2 miss → L3 hit → promotes to L1+L2
    value = await h.get("fact:pi")   # L1 hit

asyncio.run(main())
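The fall-through and promote-on-hit behavior noted in the comments above can be pictured with a small stand-alone sketch. It assumes nothing about SageMem's internals: `ToyLRUTier` and `read_through` are hypothetical names, every tier is an in-memory dict, and a stored `None` is indistinguishable from a miss.

```python
import asyncio
from collections import OrderedDict


class ToyLRUTier:
    """Dict-backed tier with LRU eviction (hypothetical stand-in)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self._data = OrderedDict()

    async def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    async def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


async def read_through(tiers, key):
    """Return (value, name of the tier that served the read)."""
    for index, tier in enumerate(tiers):
        value = await tier.get(key)
        if value is not None:
            # Promote into every faster tier above the one that hit.
            for faster in tiers[:index]:
                await faster.set(key, value)
            return value, tier.name
    return None, None


async def demo():
    tiers = [ToyLRUTier("L1", 2), ToyLRUTier("L2", 8), ToyLRUTier("L3", 64)]
    await tiers[2].set("fact:pi", 3.14159)
    first = await read_through(tiers, "fact:pi")
    second = await read_through(tiers, "fact:pi")
    return first, second


first, second = asyncio.run(demo())
print(first)   # (3.14159, 'L3')
print(second)  # (3.14159, 'L1')
```

The first read is served by the slowest tier and copied upward, so the second read never leaves the fastest tier.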

Coherence protocol

import asyncio
from sagemem.coherence.bus import CoherenceBus
from sagemem.tiers.l1_coherent import CoherentL1Tier

async def main():
    bus = CoherenceBus(url="redis://localhost:6379")
    await bus.connect()

    l1_a = CoherentL1Tier(agent_id="agent-a", bus=bus)
    l1_b = CoherentL1Tier(agent_id="agent-b", bus=bus)
    await l1_a.start()  # subscribes to invalidation messages
    await l1_b.start()

    # Agent B caches a value
    await l1_b.set("belief:x", "old value")

    # Agent A writes, which broadcasts an InvalidateMessage
    await l1_a.set("belief:x", "new value")
    # Once the message is delivered, l1_b's entry for "belief:x" is Invalid
    # Next read by agent B re-fetches from L2/L3

asyncio.run(main())

Capacity scoping

from sagemem.scope import AgentCapability, ScopedHierarchy, ScopeViolationError

# `hierarchy` is the MemoryHierarchy built in "Basic hierarchy" above;
# the awaits below must run inside an async function.
cap = AgentCapability(
    tiers={0, 1},       # L1 + L2 only
    l3_read=False,
    l3_write=False,
)
scoped = ScopedHierarchy(hierarchy, cap, agent_id="restricted-agent")

await scoped.get("l3_key")    # returns None: L3 is not in scope
await scoped.set("k", "v", tier_index=2)  # raises ScopeViolationError
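To make the "hard boundary" idea concrete, a scoping wrapper of this kind might look roughly like the following. It is a synchronous toy under stated assumptions, not the ScopedHierarchy source: `ToyCapability`, `ToyScopedStore`, and `ToyScopeViolation` are hypothetical names, and each tier is a plain dict.

```python
from dataclasses import dataclass, field


class ToyScopeViolation(Exception):
    """Raised when an agent writes to a tier outside its capability."""


@dataclass(frozen=True)
class ToyCapability:
    # Indices of the tiers this agent may touch (0 = fastest).
    tiers: frozenset = field(default_factory=frozenset)


class ToyScopedStore:
    def __init__(self, tier_stores, capability):
        self._tiers = tier_stores  # list of dicts, index 0 = fastest
        self._cap = capability

    def get(self, key):
        # Only consult tiers the agent is allowed to see.
        for index, store in enumerate(self._tiers):
            if index in self._cap.tiers and key in store:
                return store[key]
        return None

    def set(self, key, value, tier_index):
        if tier_index not in self._cap.tiers:
            raise ToyScopeViolation(f"tier {tier_index} is out of scope")
        self._tiers[tier_index][key] = value


stores = [{}, {}, {"l3_key": "secret"}]
scoped_toy = ToyScopedStore(stores, ToyCapability(frozenset({0, 1})))

hidden = scoped_toy.get("l3_key")  # None: tier 2 is never consulted
try:
    scoped_toy.set("k", "v", tier_index=2)
    blocked = False
except ToyScopeViolation:
    blocked = True
```

Reads outside the scope return nothing while writes raise, mirroring the asymmetry in the usage example above: a restricted agent cannot tell a hidden key from a missing one.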

Running tests

# Unit tests (no services needed)
uv run pytest tests/unit/ -v

# Integration tests (requires Redis + Postgres)
uv run pytest tests/integration/ -v

# Hypothesis property tests (MESI coherence invariants)
uv run pytest tests/property/ -v

# All tests
uv run pytest tests/ -q

Benchmarks

# Tier latency
uv run python benchmarks/tier_latency.py

# Full workload comparison vs flat-memory baseline
uv run python -m benchmarks.run_all

Live demo

uv run python -m demo.server.main
# Open http://localhost:8000

The demo shows two agents writing conflicting beliefs, the MESI invalidation firing over Redis pub/sub, and both agents converging to the same value in real time.


Documentation


Academic context

This project implements and extends the framework proposed in:

Yu et al., "Multi-Agent Memory from a Computer Architecture Perspective", arXiv:2603.10062, March 2026

The paper proposes a three-tier hierarchy and identifies two protocol gaps: cache sharing across agents, and structured memory access control. SageMem is an implementation of that framework with two concrete additions:

  1. A MESI-inspired coherence protocol with Redis pub/sub invalidation
  2. Capacity as a capability/permission boundary enforced at the API level

What this is not

  • Not a production-grade distributed system (no multi-region, no HA)
  • Not an inference engine modification (we do not touch KV cache tensors)
  • Not LangGraph, Mem0, or Letta — those are what this is designed to replace

Evaluation

locomo_eval/ contains a reproducible benchmark pipeline against the LoCoMo dataset (1,540 QA pairs across 10 multi-session conversations), using the Mem0/Memobase evaluation methodology.

cd locomo_eval
python scripts/download_dataset.py
python scripts/run_full_eval.py --adapter full_context --runs 3
python scripts/run_full_eval.py --adapter naive_rag --runs 3
python scripts/run_full_eval.py --adapter my_system --runs 3

Requires OPENAI_API_KEY in locomo_eval/.env. See locomo_eval/EVAL_PLAN.md for full methodology.
