Feature: Memory Extraction (Atomic Facts from Text) + Forgetting (Decay, Pruning, Targeted Purge) #678

@teknium1

Overview

Two complementary capabilities for memory lifecycle management:

  1. Memory Extraction — Decompose text blobs (task outputs, context summaries, long agent responses) into atomic, self-contained facts that each enter the memory pipeline independently. This powers automatic knowledge capture from context compression and task completion.

  2. Forgetting — Targeted purging by scope, age, or categories, plus automatic importance decay over time. This is what keeps memory useful — without forgetting, memory becomes an ever-growing pile of stale information.

Inspired by CrewAI's extract_memories() and forget() (MIT licensed).

Parent tracking issue: #509
Depends on: Memory storage migration (SQLite with scope/importance fields)
Integrates with: #480 (Context Condensation), #499 (Context Compaction)


Part 1: Memory Extraction

New extract Action

memory(action="extract", content="<long text blob>")

Uses the auxiliary LLM to decompose raw text into discrete, self-contained memory statements:

Input:

After reviewing the infrastructure options, the team recommends PostgreSQL
for the user database due to its JSONB support. Estimated cost is $2,400/month
on RDS. The compliance team flagged that all user data must stay in EU regions.
DevOps prefers managed services over self-hosted.

Output:

{
  "extracted": [
    "Team recommends PostgreSQL for user database due to JSONB support",
    "Estimated database cost is $2,400/month on RDS",
    "Compliance requires all user data to remain in EU regions",
    "DevOps prefers managed services over self-hosted"
  ]
}

Each fact can then be stored via memory(action="add", ...) — entering the full cognitive encoding pipeline if enabled.
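A minimal sketch of what the extract step might look like, including the graceful fallback listed in the acceptance criteria. The names `extract_facts`, `EXTRACTION_PROMPT`, and the `call_llm` callable are hypothetical stand-ins for whatever the auxiliary LLM client exposes; the prompt wording is also an assumption.

```python
import json
from typing import Callable, List

# Hypothetical prompt; the real one would live under agent/prompts/.
EXTRACTION_PROMPT = (
    "Decompose the following text into atomic, self-contained facts. "
    'Respond with JSON of the form {"extracted": ["fact 1", "fact 2"]}.\n\n'
)


def extract_facts(content: str, call_llm: Callable[[str], str]) -> List[str]:
    """Return atomic facts from `content`, or [content] if extraction fails.

    `call_llm` is an assumed interface: it takes a prompt string and
    returns the model's raw text response.
    """
    try:
        payload = json.loads(call_llm(EXTRACTION_PROMPT + content))
        facts = payload["extracted"]
        if not isinstance(facts, list):
            raise ValueError("'extracted' must be a list")
        # Keep only non-empty strings, trimmed of surrounding whitespace.
        facts = [f.strip() for f in facts if isinstance(f, str) and f.strip()]
    except Exception:
        # Graceful degradation: keep the full text as a single fact.
        return [content]
    # An empty result is treated like a failure so nothing is silently lost.
    return facts or [content]
```

Passing the LLM call in as a parameter keeps the function testable without network access; a test can supply a stub that returns canned JSON or raises.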

Context Compression Integration

During context compaction (when the conversation gets too long and middle messages are summarized), run extract on the summary and auto-store the extracted facts. This ensures knowledge transitions from ephemeral context to persistent storage.

Integration point: agent/context_compressor.py — after generating a summary, call extract + add for each fact.
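The hook could be as small as the sketch below. `store_summary_facts` is a hypothetical helper, and `extract` and `add` are assumed callables wrapping the two memory actions; skipping duplicates within one summary is an extra assumption, not something the proposal specifies.

```python
from typing import Callable, Iterable, List


def store_summary_facts(
    summary: str,
    extract: Callable[[str], Iterable[str]],
    add: Callable[[str], None],
) -> List[str]:
    """Extract facts from a compaction summary and store each one.

    Returns the facts that were actually stored, in order.
    """
    stored: List[str] = []
    seen = set()
    for fact in extract(summary):
        text = fact.strip()
        key = text.lower()
        if not key or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        add(text)
        stored.append(text)
    return stored
```

In `agent/context_compressor.py` this would be called once, right after the summary is generated.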


Part 2: Forgetting

New forget Action

memory(action="forget", scope="/project/old", older_than="30d")

Parameters:

  • scope — Purge memories under this scope path
  • older_than — Purge memories older than this duration (e.g. "30d", "6m")
  • categories — Purge memories with these categories

Performs a soft delete (sets forgotten=1), so data can be recovered if needed.
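A sketch of how the soft delete might map onto SQLite. Everything about the schema here is an assumption (table name `memories`, columns `scope`, `category`, `created_at` as Unix seconds), as are the helper names. The sketch also assumes that filters combine with AND, that a scope matches itself and everything nested under it, and that the "m" suffix means a 30-day month; the proposal does not pin any of these down.

```python
import re
import sqlite3
import time
from typing import Optional, Sequence

# Assumed schema; the real one comes from the storage migration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    scope TEXT NOT NULL,
    category TEXT,
    created_at REAL NOT NULL,      -- Unix timestamp in seconds
    forgotten INTEGER NOT NULL DEFAULT 0
)
"""

# Assumption: "m" is a 30-day month and "y" a 365-day year.
_UNIT_DAYS = {"d": 1, "w": 7, "m": 30, "y": 365}


def parse_duration_days(text: str) -> int:
    """Convert a duration such as "30d" or "6m" into a number of days."""
    match = re.fullmatch(r"(\d+)([dwmy])", text.strip().lower())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_DAYS[match.group(2)]


def forget(
    conn: sqlite3.Connection,
    scope: Optional[str] = None,
    older_than: Optional[str] = None,
    categories: Optional[Sequence[str]] = None,
    now: Optional[float] = None,
) -> int:
    """Soft-delete matching memories and return how many rows were marked."""
    if scope is None and older_than is None and not categories:
        # Refuse an unfiltered purge rather than forgetting everything.
        raise ValueError("at least one filter is required")
    now = time.time() if now is None else now
    clauses, params = ["forgotten = 0"], []
    if scope is not None:
        root = scope.rstrip("/")
        prefix = root + "/"
        # Prefix comparison via substr avoids LIKE wildcard escaping.
        clauses.append("(scope = ? OR substr(scope, 1, ?) = ?)")
        params += [root, len(prefix), prefix]
    if older_than is not None:
        clauses.append("created_at < ?")
        params.append(now - parse_duration_days(older_than) * 86400)
    if categories:
        marks = ", ".join("?" for _ in categories)
        clauses.append(f"category IN ({marks})")
        params += list(categories)
    sql = "UPDATE memories SET forgotten = 1 WHERE " + " AND ".join(clauses)
    with conn:  # commits on success, rolls back on error
        return conn.execute(sql, params).rowcount
```

Because rows are only flagged, recovery is a matter of setting `forgotten` back to 0; read paths would need to filter on `forgotten = 0`.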

Automatic Importance Decay

Periodic maintenance (configurable, default: daily) that decays importance of stale memories:

new_importance = importance * 0.5^(days_since_access / half_life_days)

  • Default half-life: 30 days
  • Accessing a memory updates its last_accessed_at, which resets the decay clock
  • High-importance memories (user identity, core preferences) can be marked as decay-exempt
  • Memories below the importance threshold (prune_threshold, default: 0.05) after 30+ days (prune_after_days) are auto-pruned
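The decay formula above can be sketched directly. The function names are hypothetical; the defaults mirror the ones stated in this issue.

```python
import math


def decayed_importance(
    importance: float,
    days_since_access: float,
    half_life_days: float = 30.0,
) -> float:
    """Apply exponential half-life decay to an importance score."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    days = max(days_since_access, 0.0)  # clock skew should never boost a score
    return importance * 0.5 ** (days / half_life_days)


def days_until_below(
    importance: float,
    threshold: float = 0.05,
    half_life_days: float = 30.0,
) -> float:
    """Days without access before `importance` decays to `threshold`."""
    if importance <= threshold:
        return 0.0
    return half_life_days * math.log2(importance / threshold)


# One half-life halves the score: 0.8 -> 0.4
print(decayed_importance(0.8, 30))
# Two half-lives quarter it: 1.0 -> 0.25
print(decayed_importance(1.0, 60))
```

With the defaults, a memory that starts at importance 1.0 needs roughly 130 days without access to fall to the 0.05 threshold, so the 30-day pruning window is a lower bound and only low-importance memories are pruned that early. If the maintenance job overwrites the stored importance on every run, it would need to decay by the time elapsed since the previous run rather than by days_since_access, otherwise the decay compounds; the formula leaves that choice open.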

Configuration

memory:
  decay:
    enabled: true
    half_life_days: 30
    prune_threshold: 0.05
    prune_after_days: 30
    exempt_scopes: ["/user"]  # Never decay user profile memories
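One way these settings might be consumed, assuming the `memory.decay` mapping has already been parsed into a plain dict. `DecayConfig` and its methods are hypothetical. The sketch assumes that exemption covers a scope and everything nested under it, and that `prune_after_days` is measured from the last access; the proposal does not say whether it means days since access or since creation.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class DecayConfig:
    enabled: bool = True
    half_life_days: float = 30.0
    prune_threshold: float = 0.05
    prune_after_days: float = 30.0
    exempt_scopes: Tuple[str, ...] = ("/user",)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "DecayConfig":
        """Build a config from the parsed `memory.decay` mapping."""
        known = {f.name for f in fields(cls)}
        values = {k: v for k, v in raw.items() if k in known}
        if "exempt_scopes" in values:
            values["exempt_scopes"] = tuple(values["exempt_scopes"])
        return cls(**values)

    def is_exempt(self, scope: str) -> bool:
        """True if `scope` equals or is nested under an exempt scope."""
        for root in self.exempt_scopes:
            root = root.rstrip("/")
            if scope == root or scope.startswith(root + "/"):
                return True
        return False

    def should_prune(
        self, scope: str, importance: float, days_since_access: float
    ) -> bool:
        """Decide whether a memory qualifies for auto-pruning."""
        if not self.enabled or self.is_exempt(scope):
            return False
        return (
            importance < self.prune_threshold
            and days_since_access >= self.prune_after_days
        )
```

Matching on a path boundary means `/user/preferences` is exempt while `/users` is not.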

Files to Change

  • tools/memory_tool.py — Add extract and forget actions
  • agent/context_compressor.py — Integration point for auto-extraction during compaction
  • agent/prompts/ — Extraction system prompt (decompose text into atomic facts)
  • tests/tools/test_memory_tool.py — Tests for extract and forget

Acceptance Criteria

  • extract action decomposes text into atomic facts via LLM
  • Extraction gracefully degrades (returns full text as single fact on LLM failure)
  • forget action soft-deletes memories matching scope/age/category filters
  • Importance decay reduces stale memory importance over time
  • Decay-exempt scopes are respected
  • Auto-pruning removes very low importance memories after configurable days
  • Context compression integration extracts and stores facts from summaries
