wangtian24/agent-memfas

memfas

Teach your AI to remember, reflect, and grow.

Memory, Fast and Slow — for AI agents.

PyPI version Python 3.10+ License: MIT


What is memfas?

memfas is a local memory toolkit for AI agents. Point it at a directory, give it some files, and your agent gets persistent memory that survives context windows, session restarts, and compaction.

It works through simple CLI commands and a Python API — easy enough for an agent harness to call, and simple enough for an AI agent to use directly as a tool. No servers, no cloud dependencies. Just a local SQLite database and your memory files.

You: "Let's continue the project."
Agent without memory: "I apologize, but I don't have context about what project..."
Agent with memfas:    "Right — the acme frontend, due Q2. Alice flagged the auth issue yesterday."

memfas also ships with skills — instruction files that teach agents how humans manage their knowledge. How to journal. When to write things down. How to review and consolidate. We humanize agents by giving them the same practices that make humans effective.


The memory model

An agent's memory isn't one thing. Like human memory, it varies along two dimensions: how persistent it is (short-term vs long-term) and how fundamental it is (core identity vs episodic experiences).

                      short-term ──────────────────── long-term

 fundamental          ┌──────────┐ ┌──────────┐ ┌──────────┐
      │   core        │ GOAL.md  │ │ ROLE.md  │ │ SOUL.md  │
      │               └──────────┘ └──────────┘ └──────────┘
      │
      │   learned     ┌─────────────┐ ┌────────────────────┐
      │               │ Experiences │ │ Patterns & Skills  │
      │               └─────────────┘ └────────────────────┘
      │
      │               ┌────────────────────────────────────────┐
      │               │             Project Memory             │
      │               └────────────────────────────────────────┘
      │
      │   episodic    ┌─────────────┐ ┌────────────────────────┐
      ↓               │  Working    │ │  Past Interactions     │
 transient            │  Memory     │ │                        │
                      └─────────────┘ └────────────────────────┘

memfas supports all of these:

  • Core memory — files like GOAL.md, ROLE.md, SOUL.md that are loaded every session. These define who the agent is, what it's working on, and how it behaves. Humans author these; the agent reads them as its identity.
  • Learned knowledge — experiences and skills accumulated over time. The agent writes these through use — patterns it notices, solutions it discovers, preferences it learns.
  • Project memory — workspace-specific context. Notes, decisions, and state for the current project. Shared across agents working in the same space.
  • Episodic memory — working memory is the current context window (the most transient); past interactions are stored and searchable. Context engineering manages the bridge between the two.

Two forces shape this memory:

┌─────────────────────┐                    ┌───────────────────────┐
│ human (harness)     │ ──injected────→    │                       │
│ agent memory        │    wisdom          │   Agent's             │
│ memory skills       │                    │   Explicit Memory     │
└─────────────────────┘                    │                       │
                                           │   (all the boxes      │
┌─────────────────────┐                    │    above)             │
│ agent               │ ──self-maintained──→                       │
│ memory management   │    experiences     │                       │
│ skills              │    and skills      │                       │
└─────────────────────┘                    └───────────────────────┘

Injected wisdom: The human (or harness) sets up the framework — role definitions, goals, skills, organizational context. This is "taught" knowledge.

Self-maintained memory: The agent writes its own learned experiences, project notes, and accumulated skills. This is "earned" knowledge.

memfas provides the infrastructure for both.


Get started in 30 seconds

pip install agent-memfas
cd ~/my-agent && memfas init

memfas init scaffolds a memory directory, indexes initial files, and generates a config with sensible defaults (FTS5 search, zero dependencies):

my-agent/
├── memfas.yaml              # Configuration (search backend, sources, etc.)
├── memory/
│   ├── GOAL.md              # Current tasks and objectives
│   ├── ROLE.md              # Agent identity and relationships
│   ├── SOUL.md              # Values, communication style
│   ├── experiences/         # Learned experiences
│   ├── patterns/            # Accumulated know-how
│   ├── people/              # Knowledge about people
│   └── project/             # Workspace-specific context
└── skills/
    └── memory-active.md     # Memory management skill

# Add triggers for fast recall (Type 1)
memfas remember alice --hint "Project manager, prefers async communication"
memfas remember acme --hint "Client project, due Q2, React frontend"

# Store new knowledge — writes to file AND indexes automatically (Type 2)
echo "Alice confirmed: deadline is March 15" | memfas store --tag decisions

# Recall — combines triggers + search
memfas recall "What did Alice say about the deadline?"

Core memory files (GOAL, ROLE, SOUL) are indexed on init and always loaded. When you memfas store new knowledge, it's written to a file and indexed in one step — no manual memfas index needed. Use memfas index only when you add files to the memory directory by hand.

Your agent can now call memfas recall before each turn and get relevant context from memory. That's the simplest integration — and it already works.


Two ways to integrate

memfas is designed for both the agent harness (the program running your agent) and the agent itself (the AI). You can use either mode alone, or combine them.

Passive mode — the harness drives memory

The harness decides when to read and write, like conventional memory management. Before each turn, the harness calls memfas recall and injects context into the prompt. After important events, it updates memory files and re-indexes.

from agent_memfas import Memory

mem = Memory("./memfas.yaml")

def before_turn(user_message):
    context = mem.recall(user_message)
    return f"{context}\n\nUser: {user_message}"

def after_turn(important_info):
    # Append new knowledge to a memory file, then re-index it
    with open("./memory/latest.md", "a") as f:
        f.write(f"\n{important_info}\n")
    mem.index_file("./memory/latest.md")

The agent doesn't even know memfas exists — it just gets better context.

Active mode — the agent drives memory

The agent manages its own memory. You provide memfas as a tool (via CLI, MCP, or function calling) along with a skill file that teaches the agent when and how to use it.

memfas ships with ready-to-use skill files you can include directly in your agent's system prompt or core memory. The skill itself becomes part of the agent's core memory — it learns how to remember.

Excerpt from the memory skill (see skills/memory-active.md for the full version):

## Memory Management

You have persistent memory via the `memfas` tool. Use it actively — don't wait to be asked.

### Reading memory
Before responding to any question about prior work, context, or decisions:
  memfas recall "<relevant keywords>"

### Writing memory
When you learn something worth remembering for future sessions, write it immediately.
Write for your future self AND for other agents who share this workspace:

  # Store a piece of knowledge (saves to file + indexes for search)
  echo "Alice prefers async. Confirmed in standup 2025-01-15." | memfas store --tag "people"

  # Add a fast-recall trigger for frequently accessed topics
  memfas remember alice --hint "PM, prefers async, timezone EST"

### What to write
- Decisions and their rationale ("We chose Postgres because...")
- People and their preferences ("Alice prefers async")
- Project state changes ("Auth module completed, moved to billing")
- Patterns you notice ("This codebase uses repository pattern for DB access")
- Corrections ("Earlier I thought X, actually Y — see [link]")

### What NOT to write
- Transient task details (use project memory / GOAL.md for that)
- Things already in the codebase (code is its own memory)
- Personal opinions about the user (role context belongs in ROLE.md)

### When to write
Write often. Write after every meaningful interaction. The cost of forgetting is
higher than the cost of storing something you won't need.

Providing memfas as a tool: For agents that support tool use / function calling, memfas operations can be exposed as tool schemas. Example for Claude:

{
  "name": "memfas_recall",
  "description": "Recall relevant memories for a given context. Combines keyword triggers (fast) and search (slow). Call this before answering questions about prior work, decisions, or context.",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "What to remember — a question, topic, or keywords"
      }
    },
    "required": ["query"]
  }
}
{
  "name": "memfas_store",
  "description": "Store a piece of knowledge to persistent memory. Write often — after decisions, discoveries, corrections, and meaningful interactions. Write for all agents sharing this workspace, not just yourself.",
  "input_schema": {
    "type": "object",
    "properties": {
      "content": {
        "type": "string",
        "description": "The knowledge to store — be specific and include context"
      },
      "tag": {
        "type": "string",
        "description": "Category tag (e.g., 'people', 'decisions', 'patterns', 'project')"
      }
    },
    "required": ["content"]
  }
}
{
  "name": "memfas_remember",
  "description": "Add a keyword trigger for instant recall. Use for topics that come up repeatedly.",
  "input_schema": {
    "type": "object",
    "properties": {
      "keyword": {
        "type": "string",
        "description": "The keyword or phrase to trigger on"
      },
      "hint": {
        "type": "string",
        "description": "Short description returned on trigger match"
      }
    },
    "required": ["keyword", "hint"]
  }
}

The same schemas work with OpenAI function calling (rename input_schema to parameters). See skills/tool-schemas/ for ready-to-use definitions.
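
The rename is mechanical. A small helper (a sketch, not part of memfas) converts one format to the other, including the `function` wrapper OpenAI's chat API expects:

```python
def to_openai_tool(tool: dict) -> dict:
    """Convert a Claude-style tool schema to OpenAI function-calling format."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["input_schema"],  # the only field that is renamed
        },
    }
```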

Hybrid mode (recommended)

The harness handles routine recall and indexing automatically. The agent has direct access for deeper operations — storing knowledge, searching specific topics, adding triggers. This gives you the reliability of harness-driven memory with the intelligence of agent-driven curation.

class AgentHarness:
    def __init__(self):
        self.mem = Memory("./memfas.yaml")

    def before_turn(self, user_message):
        # Harness: automatic context injection
        return self.mem.recall(user_message)

    def agent_tools(self):
        # Agent: direct access for active memory management
        return ["memfas recall", "memfas store", "memfas remember",
                "memfas search", "memfas triggers"]

You don't always control the full agent environment. Even then, exposing memfas to the agent as a plain CLI tool already enables effective memory management.


What's inside

Memory Store — the foundation

Multiple retrieval strategies, all backed by local SQLite. Start simple, add capability as you need it.

| Strategy | How it works | When to use |
| --- | --- | --- |
| Keyword triggers | Fast scan — checks all triggers against the query | Things the agent asks about often |
| Full-text search | BM25 ranking via SQLite FTS5 | Finding specific content (zero deps) |
| Embedding search | Semantic similarity via local models | Finding conceptually related content |
| External sources | Query pre-indexed databases | Journal entries, knowledge bases |

How triggers work: When you call memfas recall, it loads all keyword triggers and scans the query for substring matches. With a typical trigger set (10–100 keywords), this completes in microseconds. It's not a hash lookup — it's a linear scan — but for the expected scale it's effectively instant compared to the search backends that follow.

All strategies combine naturally. A single memfas recall fires matching triggers and runs a search, returning a merged, scored result.
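
In outline, one recall call behaves like this sketch (simplified; memfas's real merging and scoring are richer, and `search_fn` here stands in for whichever search backend is configured):

```python
def recall(query: str, triggers: dict[str, str], search_fn, k: int = 5) -> dict:
    """Combine fast trigger matching with slow ranked search."""
    q = query.lower()
    # Fast path: linear substring scan over all keyword triggers (microseconds)
    trigger_hits = [(kw, hint) for kw, hint in triggers.items() if kw in q]
    # Slow path: ranked search over indexed content (FTS5 or embeddings in memfas)
    search_hits = search_fn(query)[:k]
    return {"triggers": trigger_hits, "search": search_hits}
```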

pip install agent-memfas                 # Triggers + FTS5 (zero dependencies)
pip install agent-memfas[embeddings]     # + semantic search (FastEmbed / Ollama)

Context Engineering — managing what the agent sees

Two complementary problems, one umbrella:

Per-turn curation: which memories should I bring into this turn?

Before each LLM call, the curator scores all stored memories against the current query and conversation topic, then selects the most relevant ones within a token budget. It's additive — it decides what to pull in from the memory store.

  • Topic detection tracks conversation shifts across turns
  • Multi-factor scoring: semantic similarity + recency + access patterns
  • Token budget controls how much memory to inject (5 levels, from minimal to full)
  • 84% token reduction in practice (50K baseline → 8K curated)

Session compaction: the conversation is too long; what can I compress?

During long sessions, the context window fills up. The context manager monitors usage and triggers pre-emptive compaction at 50% capacity, rather than waiting until 90%, when recovery is much harder. It's subtractive — it decides what to take out or compress within the active conversation.

  • Three-way classification: KEEP (high relevance) / SUMMARIZE (medium, compress via LLM) / DROP (low, archive to cold storage)
  • Cold storage — dropped context recoverable for 30 days via search
  • Pluggable summarization (MiniMax API or custom backends)
  • Agent-agnostic — plugs into any agent loop via callbacks
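
The classification step can be pictured as a simple threshold function (the thresholds here are illustrative assumptions):

```python
def classify(relevance: float, keep_at: float = 0.7, drop_below: float = 0.3) -> str:
    """Three-way fate for a conversation chunk during compaction."""
    if relevance >= keep_at:
        return "KEEP"       # high relevance: stays verbatim in context
    if relevance >= drop_below:
        return "SUMMARIZE"  # medium relevance: compressed via LLM
    return "DROP"           # low relevance: archived to cold storage
```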

Together, curation keeps the right memories flowing in, and compaction keeps the conversation itself manageable. One adds signal, the other removes noise.

# Per-turn curation
from agent_memfas.curation import ContextCurator
curator = ContextCurator("./memfas.yaml")
result = curator.get_context(query="project status?", session_id="main")

# Session compaction
from agent_memfas.context import ContextManager
ctx = ContextManager(config_path="./memfas.yaml", ...)
status = ctx.before_response(max_tokens=100000)  # auto-compacts if needed

pip install agent-memfas[curation]       # Per-turn curation
pip install agent-memfas[context]        # Session compaction

Memory Maintenance (coming soon)

Memory without maintenance is just accumulation. Journaling, reflection, and compaction close the loop — the agent doesn't just store experiences, it learns from them and keeps its memory healthy for its future self and for other agents that share the workspace.

The key idea: when an agent journals, it's not writing for itself right now — it's writing for a future version of itself that will wake up fresh with no context, or for a peer agent that needs to pick up where it left off. A journaling skill (skills/journaling.md) teaches the agent this practice.

Like all memfas capabilities, memory maintenance works both ways: the harness can trigger mem.journal() on a schedule, or the agent can run memfas journal itself when it decides it's time to reflect.

On a schedule (cron, or triggered by the harness), the agent can:

  • Journal — distill recent interactions into lasting memories, written for the benefit of any future agent. "Today I helped debug a race condition in the auth service. The root cause was a missing mutex on the session store — check the session module first for similar bugs."
  • Reflect — review accumulated memories and extract patterns. "I've seen three auth bugs this month — all related to session handling. This is likely a systemic issue worth flagging."
  • Consolidate — merge related memories, update stale ones, promote frequent patterns to fast triggers. Strengthen what matters, let the rest fade.
  • Summarize — compress verbose memories into concise, high-signal entries that are cheaper to load and easier to scan.
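
Harness-side scheduling can stay as light as a cron entry (illustrative paths; the maintenance commands ship with the planned module):

```
# Run the agent's nightly journaling at 03:00 (illustrative)
0 3 * * * cd ~/my-agent && memfas journal
```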

This is what separates an agent that wakes up fresh every session from one that genuinely grows. Each cycle builds understanding that persists — not just for this agent, but for any agent working in the same space.

   Experience → Store → Journal → Reflect → Consolidate
       ↑                                         │
       └─────────── next session ────────────────┘

Today, memory maintenance runs within a single agent's memory store. In the future, cross-agent memory (shared stores, federated search) is a natural extension — the journaling and reflection patterns already write for a general audience, not just for self.

Future skills can teach agents other human practices — goal review, prioritization, team knowledge sharing. The system is extensible: any practice you can describe in a document, an agent can learn.


Why "Fast and Slow"?

The name memfas comes from Daniel Kahneman's Thinking, Fast and Slow.

Humans have two cognitive systems. System 1 is fast, automatic, effortless — you just know your coworker's name. System 2 is slow, deliberate, effortful — you have to search for what was decided in last month's meeting. And then there's something deeper — the overnight processing that consolidates the day's experiences into lasting knowledge.

memfas gives AI agents the same capabilities:

  • Fast recall — keyword triggers fire on pattern match, like muscle memory
  • Slow search — ranked retrieval finds relevant content, like thinking hard
  • Maintenance — periodic journaling, reflection, and consolidation, like learning from reviewing your day

This isn't about making AI human. It's about giving it the same tools for memory that make humans effective — instant recall for the familiar, deep search for the specific, and maintenance to turn experience into lasting knowledge.


Architecture

                  memfas
                    │
    ┌───────────────┼────────────────┐
    │               │                │
 Memory          Context          Memory
 Store          Engineering      Maintenance
    │               │             (planned)
 triggers        curation       journaling
 FTS5            compaction     reflection
 embeddings      cold storage   consolidation
 external
    │               │                │
    └───────────────┼────────────────┘
                    │
          CLI · Python API · Skills
                    │
            ┌───────┴───────┐
            │               │
       Harness calls   Agent calls
        (passive)       (active)

Installation

Install what you need:

pip install agent-memfas                 # Memory store (triggers + FTS5, zero deps)
pip install agent-memfas[embeddings]     # + semantic search (FastEmbed + sqlite-vec)
pip install agent-memfas[curation]       # + per-turn context curation
pip install agent-memfas[context]        # + session compaction & cold storage
pip install agent-memfas[all]            # Everything

CLI Reference

| Command | Description |
| --- | --- |
| `memfas init` | Scaffold memory directory with templates (GOAL, ROLE, SOUL, skills) |
| `memfas recall <context>` | Combined recall — triggers + search |
| `memfas search <query>` | Search indexed content only |
| `memfas store [--tag <tag>]` | Store knowledge — saves to file + indexes (stdin or `--content`) |
| `memfas remember <kw> --hint <h>` | Add a keyword trigger for fast recall |
| `memfas forget <keyword>` | Remove a trigger |
| `memfas triggers` | List all triggers |
| `memfas index <paths...>` | Index files or directories |
| `memfas reindex -b <backend>` | Re-index with a different backend |
| `memfas suggest` | Auto-suggest triggers from indexed content |
| `memfas stats` | Show memory statistics |
| `memfas curate <query>` | Get curated context (with curation module) |

See the Skills and Playbook for practical tips on trigger setup, workflow integration, and best practices.


License

MIT
