Teach your AI to remember, reflect, and grow.
Memory, Fast and Slow — for AI agents.
memfas is a local memory toolkit for AI agents. Point it at a directory, give it some files, and your agent gets persistent memory that survives context windows, session restarts, and compaction.
It works through simple CLI commands and a Python API — easy enough for an agent harness to call, and simple enough for an AI agent to use directly as a tool. No servers, no cloud dependencies. Just a local SQLite database and your memory files.
You: "Let's continue the project."
Agent without memory: "I apologize, but I don't have context about what project..."
Agent with memfas: "Right — the acme frontend, due Q2. Alice flagged the auth issue yesterday."
memfas also ships with skills — instruction files that teach agents how humans manage their knowledge. How to journal. When to write things down. How to review and consolidate. We humanize agents by giving them the same practices that make humans effective.
An agent's memory isn't one thing. Like human memory, it varies along two dimensions: how persistent it is (short-term vs long-term) and how fundamental it is (core identity vs episodic experiences).
```
            short-term ──────────────────── long-term

fundamental ┌──────────┐ ┌──────────┐ ┌──────────┐
    │  core │ GOAL.md  │ │ ROLE.md  │ │ SOUL.md  │
    │       └──────────┘ └──────────┘ └──────────┘
    │
    │ learned ┌─────────────┐ ┌────────────────────┐
    │         │ Experiences │ │ Patterns & Skills  │
    │         └─────────────┘ └────────────────────┘
    │
    │         ┌────────────────────────────────────────┐
    │         │             Project Memory             │
    │         └────────────────────────────────────────┘
    │
    │ episodic ┌─────────────┐ ┌────────────────────────┐
    ↓          │   Working   │ │   Past Interactions    │
transient      │   Memory    │ │                        │
               └─────────────┘ └────────────────────────┘
```
memfas supports all of these:
- Core memory — files like `GOAL.md`, `ROLE.md`, `SOUL.md` that are loaded every session. These define who the agent is, what it's working on, and how it behaves. Humans author these; the agent reads them as its identity.
- Learned knowledge — experiences and skills accumulated over time. The agent writes these through use — patterns it notices, solutions it discovers, preferences it learns.
- Project memory — workspace-specific context. Notes, decisions, and state for the current project. Shared across agents working in the same space.
- Episodic memory — working memory is the current context window (the most transient); past interactions are stored and searchable. Context engineering manages the bridge between the two.
Two forces shape this memory:
```
┌─────────────────────┐                       ┌───────────────────────┐
│  human (harness)    │ ──injected────→       │                       │
│  agent memory       │   wisdom              │        Agent's        │
│  memory skills      │                       │    Explicit Memory    │
└─────────────────────┘                       │                       │
                                              │   (all the boxes      │
┌─────────────────────┐                       │       above)          │
│  agent              │ ──self-maintained──→  │                       │
│  memory management  │   experiences         │                       │
│  skills             │   and skills          │                       │
└─────────────────────┘                       └───────────────────────┘
```
Injected wisdom: The human (or harness) sets up the framework — role definitions, goals, skills, organizational context. This is "taught" knowledge.
Self-maintained memory: The agent writes its own learned experiences, project notes, and accumulated skills. This is "earned" knowledge.
memfas provides the infrastructure for both.
```shell
pip install agent-memfas
```

```shell
cd ~/my-agent && memfas init
```

`memfas init` scaffolds a memory directory, indexes initial files, and generates a config with sensible defaults (FTS5 search, zero dependencies):
```
my-agent/
├── memfas.yaml           # Configuration (search backend, sources, etc.)
├── memory/
│   ├── GOAL.md           # Current tasks and objectives
│   ├── ROLE.md           # Agent identity and relationships
│   ├── SOUL.md           # Values, communication style
│   ├── experiences/      # Learned experiences
│   ├── patterns/         # Accumulated know-how
│   ├── people/           # Knowledge about people
│   └── project/          # Workspace-specific context
└── skills/
    └── memory-active.md  # Memory management skill
```
```shell
# Add triggers for fast recall (Type 1)
memfas remember alice --hint "Project manager, prefers async communication"
memfas remember acme --hint "Client project, due Q2, React frontend"

# Store new knowledge — writes to file AND indexes automatically (Type 2)
echo "Alice confirmed: deadline is March 15" | memfas store --tag decisions

# Recall — combines triggers + search
memfas recall "What did Alice say about the deadline?"
```

Core memory files (GOAL, ROLE, SOUL) are indexed on init and always loaded. When you `memfas store` new knowledge, it's written to a file and indexed in one step — no manual `memfas index` needed. Use `memfas index` only when you add files to the memory directory by hand.
Your agent can now call memfas recall before each turn and get relevant context from memory. That's the simplest integration — and it already works.
memfas is designed for both the agent harness (the program running your agent) and the agent itself (the AI). You can use either mode alone, or combine them.
The harness decides when to read and write, like conventional memory management. Before each turn, the harness calls memfas recall and injects context into the prompt. After important events, it updates memory files and re-indexes.
```python
from agent_memfas import Memory

mem = Memory("./memfas.yaml")

def before_turn(user_message):
    context = mem.recall(user_message)
    return f"{context}\n\nUser: {user_message}"

def after_turn(important_info):
    # Write to memory files, re-index
    mem.index_file("./memory/latest.md")
```

The agent doesn't even know memfas exists — it just gets better context.
The agent manages its own memory. You provide memfas as a tool (via CLI, MCP, or function calling) along with a skill file that teaches the agent when and how to use it.
memfas ships with ready-to-use skill files you can include directly in your agent's system prompt or core memory. The skill itself becomes part of the agent's core memory — it learns how to remember.
Excerpt from the memory skill (see skills/memory-active.md for the full version):
## Memory Management
You have persistent memory via the `memfas` tool. Use it actively — don't wait to be asked.
### Reading memory
Before responding to any question about prior work, context, or decisions:
```shell
memfas recall "<relevant keywords>"
```
### Writing memory
When you learn something worth remembering for future sessions, write it immediately.
Write for your future self AND for other agents who share this workspace:
```shell
# Store a piece of knowledge (saves to file + indexes for search)
echo "Alice prefers async. Confirmed in standup 2025-01-15." | memfas store --tag "people"

# Add a fast-recall trigger for frequently accessed topics
memfas remember alice --hint "PM, prefers async, timezone EST"
```
### What to write
- Decisions and their rationale ("We chose Postgres because...")
- People and their preferences ("Alice prefers async")
- Project state changes ("Auth module completed, moved to billing")
- Patterns you notice ("This codebase uses repository pattern for DB access")
- Corrections ("Earlier I thought X, actually Y — see [link]")
### What NOT to write
- Transient task details (use project memory / GOAL.md for that)
- Things already in the codebase (code is its own memory)
- Personal opinions about the user (role context belongs in ROLE.md)
### When to write
Write often. Write after every meaningful interaction. The cost of forgetting is
higher than the cost of storing something you won't need.

Providing memfas as a tool: For agents that support tool use / function calling, memfas operations can be exposed as tool schemas. Example for Claude:
```json
{
  "name": "memfas_recall",
  "description": "Recall relevant memories for a given context. Combines keyword triggers (fast) and search (slow). Call this before answering questions about prior work, decisions, or context.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "What to remember — a question, topic, or keywords"
      }
    },
    "required": ["query"]
  }
}
```

```json
{
  "name": "memfas_store",
  "description": "Store a piece of knowledge to persistent memory. Write often — after decisions, discoveries, corrections, and meaningful interactions. Write for all agents sharing this workspace, not just yourself.",
  "input_schema": {
    "type": "object",
    "properties": {
      "content": {
        "type": "string",
        "description": "The knowledge to store — be specific and include context"
      },
      "tag": {
        "type": "string",
        "description": "Category tag (e.g., 'people', 'decisions', 'patterns', 'project')"
      }
    },
    "required": ["content"]
  }
}
```

```json
{
  "name": "memfas_remember",
  "description": "Add a keyword trigger for instant recall. Use for topics that come up repeatedly.",
  "input_schema": {
    "type": "object",
    "properties": {
      "keyword": {
        "type": "string",
        "description": "The keyword or phrase to trigger on"
      },
      "hint": {
        "type": "string",
        "description": "Short description returned on trigger match"
      }
    },
    "required": ["keyword", "hint"]
  }
}
```

The same schemas work with OpenAI function calling (rename `input_schema` to `parameters`). See `skills/tool-schemas/` for ready-to-use definitions.
The harness handles routine recall and indexing automatically. The agent has direct access for deeper operations — storing knowledge, searching specific topics, adding triggers. This gives you the reliability of harness-driven memory with the intelligence of agent-driven curation.
```python
class AgentHarness:
    def __init__(self):
        self.mem = Memory("./memfas.yaml")

    def before_turn(self, user_message):
        # Harness: automatic context injection
        return self.mem.recall(user_message)

    def agent_tools(self):
        # Agent: direct access for active memory management
        return ["memfas recall", "memfas store", "memfas remember",
                "memfas search", "memfas triggers"]
```

You don't always control the full agent environment. Just providing `memfas` as a CLI tool alongside the harness already gives effective memory management.
Multiple retrieval strategies, all backed by local SQLite. Start simple, add capability as you need it.
| Strategy | How it works | When to use |
|---|---|---|
| Keyword triggers | Fast scan — checks all triggers against the query | Things the agent asks about often |
| Full-text search | BM25 ranking via SQLite FTS5 | Finding specific content (zero deps) |
| Embedding search | Semantic similarity via local models | Finding conceptually related content |
| External sources | Query pre-indexed databases | Journal entries, knowledge bases |
How triggers work: When you call memfas recall, it loads all keyword triggers and scans the query for substring matches. With a typical trigger set (10–100 keywords), this completes in microseconds. It's not a hash lookup — it's a linear scan — but for the expected scale it's effectively instant compared to the search backends that follow.
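The scan described above can be sketched in a few lines. This is an illustration of the idea, not the memfas source; the function name and data shapes are assumptions:

```python
# Illustrative sketch of the trigger scan: every stored keyword is checked
# against the query as a case-insensitive substring. The cost is linear in
# the number of triggers, which is effectively instant at the 10-100 scale.
def fire_triggers(query: str, triggers: dict[str, str]) -> list[str]:
    q = query.lower()
    return [hint for keyword, hint in triggers.items() if keyword.lower() in q]

triggers = {
    "alice": "PM, prefers async, timezone EST",
    "acme": "Client project, due Q2, React frontend",
}
hints = fire_triggers("What did Alice say about the acme deadline?", triggers)
# Both triggers match here, so both hints are returned.
```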
All strategies combine naturally. A single memfas recall fires matching triggers and runs a search, returning a merged, scored result.
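One way to picture the merge, as a sketch (the scoring scheme is an assumption; memfas's actual merge logic may differ):

```python
# Hypothetical merge of trigger hits and search hits into one ranked list.
# Trigger hits are assigned full relevance; search hits keep their own score.
def merge_results(trigger_hints: list[str],
                  search_hits: list[tuple[str, float]]) -> list[tuple[str, float]]:
    combined = [(hint, 1.0) for hint in trigger_hints] + search_hits
    return sorted(combined, key=lambda item: item[1], reverse=True)

ranked = merge_results(
    ["PM, prefers async, timezone EST"],
    [("Alice confirmed: deadline is March 15", 0.8)],
)
# Trigger hits sort first because of their assigned top score.
```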
```shell
pip install agent-memfas              # Triggers + FTS5 (zero dependencies)
pip install agent-memfas[embeddings]  # + semantic search (FastEmbed / Ollama)
```

Two complementary problems, one umbrella:
Per-turn curation — which memories should I bring into this turn?
Before each LLM call, the curator scores all stored memories against the current query and conversation topic, then selects the most relevant ones within a token budget. It's additive — it decides what to pull in from the memory store.
- Topic detection tracks conversation shifts across turns
- Multi-factor scoring: semantic similarity + recency + access patterns
- Token budget controls how much memory to inject (5 levels, from minimal to full)
- 84% token reduction in practice (50K baseline → 8K curated)
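A minimal sketch of budgeted selection as described above. The weights, field names, and token estimate are assumptions for illustration, not the ContextCurator internals:

```python
from dataclasses import dataclass

@dataclass
class StoredMemory:
    text: str
    relevance: float  # semantic similarity to the current query, 0..1
    recency: float    # decays with age, 0..1

def curate(memories: list[StoredMemory], budget_tokens: int) -> list[StoredMemory]:
    # Score each memory, then greedily pack the best ones under the budget.
    def score(m: StoredMemory) -> float:
        return 0.7 * m.relevance + 0.3 * m.recency  # assumed weights
    selected, used = [], 0
    for m in sorted(memories, key=score, reverse=True):
        cost = len(m.text.split())  # crude whitespace token estimate
        if used + cost <= budget_tokens:
            selected.append(m)
            used += cost
    return selected
```

Raising the budget admits lower-scoring memories; the real curator layers access-pattern signals and the five budget levels on top of this basic idea.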
Session compaction — the conversation is too long, what can I compress?
During long sessions, the context window fills up. The context manager monitors usage and triggers pre-emptive compaction at 50% capacity, rather than waiting until 90%, when recovery is hard. It's subtractive — it decides what to take out or compress within the active conversation.
- Three-way classification: KEEP (high relevance) / SUMMARIZE (medium, compress via LLM) / DROP (low, archive to cold storage)
- Cold storage — dropped context recoverable for 30 days via search
- Pluggable summarization (MiniMax API or custom backends)
- Agent-agnostic — plugs into any agent loop via callbacks
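The three-way split can be sketched as a simple threshold rule. The thresholds and signature here are assumptions, not the ContextManager's actual logic:

```python
# Hedged sketch: classify each conversation chunk by a relevance score.
def plan_compaction(chunks: list[tuple[str, float]],
                    keep_above: float = 0.7,
                    drop_below: float = 0.3) -> dict[str, list[str]]:
    plan = {"KEEP": [], "SUMMARIZE": [], "DROP": []}
    for text, relevance in chunks:
        if relevance >= keep_above:
            plan["KEEP"].append(text)       # stays verbatim in context
        elif relevance < drop_below:
            plan["DROP"].append(text)       # archived to cold storage
        else:
            plan["SUMMARIZE"].append(text)  # compressed via LLM
    return plan
```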
Together, curation keeps the right memories flowing in, and compaction keeps the conversation itself manageable. One adds signal, the other removes noise.
```python
# Per-turn curation
from agent_memfas.curation import ContextCurator

curator = ContextCurator("./memfas.yaml")
result = curator.get_context(query="project status?", session_id="main")

# Session compaction
from agent_memfas.context import ContextManager

ctx = ContextManager(config_path="./memfas.yaml", ...)
status = ctx.before_response(max_tokens=100000)  # auto-compacts if needed
```

```shell
pip install agent-memfas[curation]  # Per-turn curation
pip install agent-memfas[context]   # Session compaction
```

Memory without maintenance is just accumulation. Journaling, reflection, and compaction close the loop — the agent doesn't just store experiences, it learns from them and keeps its memory healthy for its future self and for other agents that share the workspace.
The key idea: when an agent journals, it's not writing for itself right now — it's writing for a future version of itself that will wake up fresh with no context, or for a peer agent that needs to pick up where it left off. A journaling skill (skills/journaling.md) teaches the agent this practice.
Like all memfas capabilities, memory maintenance works both ways: the harness can trigger mem.journal() on a schedule, or the agent can run memfas journal itself when it decides it's time to reflect.
On a schedule (cron, or triggered by the harness), the agent can:
- Journal — distill recent interactions into lasting memories, written for the benefit of any future agent. "Today I helped debug a race condition in the auth service. The root cause was a missing mutex on the session store — check the session module first for similar bugs."
- Reflect — review accumulated memories and extract patterns. "I've seen three auth bugs this month — all related to session handling. This is likely a systemic issue worth flagging."
- Consolidate — merge related memories, update stale ones, promote frequent patterns to fast triggers. Strengthen what matters, let the rest fade.
- Summarize — compress verbose memories into concise, high-signal entries that are cheaper to load and easier to scan.
This is what separates an agent that wakes up fresh every session from one that genuinely grows. Each cycle builds understanding that persists — not just for this agent, but for any agent working in the same space.
```
Experience → Store → Journal → Reflect → Consolidate
    ↑                                         │
    └─────────────── next session ────────────┘
```
Today, memory maintenance runs within a single agent's memory store. In the future, cross-agent memory (shared stores, federated search) is a natural extension — the journaling and reflection patterns already write for a general audience, not just for self.
Future skills can teach agents other human practices — goal review, prioritization, team knowledge sharing. The system is extensible: any practice you can describe in a document, an agent can learn.
The name memfas comes from Daniel Kahneman's Thinking, Fast and Slow.
Humans have two cognitive systems. System 1 is fast, automatic, effortless — you just know your coworker's name. System 2 is slow, deliberate, effortful — you have to search for what was decided in last month's meeting. And then there's something deeper — the overnight processing that consolidates the day's experiences into lasting knowledge.
memfas gives AI agents the same capabilities:
- Fast recall — keyword triggers fire on pattern match, like muscle memory
- Slow search — ranked retrieval finds relevant content, like thinking hard
- Maintenance — periodic journaling, reflection, and consolidation, like learning from reviewing your day
This isn't about making AI human. It's about giving it the same tools for memory that make humans effective — instant recall for the familiar, deep search for the specific, and maintenance to turn experience into lasting knowledge.
```
                        memfas
                          │
          ┌───────────────┼────────────────┐
          │               │                │
       Memory          Context           Memory
       Store         Engineering       Maintenance
          │               │             (planned)
       triggers        curation         journaling
       FTS5            compaction       reflection
       embeddings      cold storage     consolidation
       external
          │               │                │
          └───────────────┼────────────────┘
                          │
              CLI · Python API · Skills
                          │
                  ┌───────┴───────┐
                  │               │
           Harness calls     Agent calls
            (passive)         (active)
```
Install what you need:
```shell
pip install agent-memfas              # Memory store (triggers + FTS5, zero deps)
pip install agent-memfas[embeddings]  # + semantic search (FastEmbed + sqlite-vec)
pip install agent-memfas[curation]    # + per-turn context curation
pip install agent-memfas[context]     # + session compaction & cold storage
pip install agent-memfas[all]         # Everything
```

| Command | Description |
|---|---|
| `memfas init` | Scaffold memory directory with templates (GOAL, ROLE, SOUL, skills) |
| `memfas recall <context>` | Combined recall — triggers + search |
| `memfas search <query>` | Search indexed content only |
| `memfas store [--tag <tag>]` | Store knowledge — saves to file + indexes (stdin or `--content`) |
| `memfas remember <kw> --hint <h>` | Add a keyword trigger for fast recall |
| `memfas forget <keyword>` | Remove a trigger |
| `memfas triggers` | List all triggers |
| `memfas index <paths...>` | Index files or directories |
| `memfas reindex -b <backend>` | Re-index with a different backend |
| `memfas suggest` | Auto-suggest triggers from indexed content |
| `memfas stats` | Show memory statistics |
| `memfas curate <query>` | Get curated context (with curation module) |
See the Skills and Playbook for practical tips on trigger setup, workflow integration, and best practices.
MIT