Kitolochi/memorypack

memorypack

Compress a folder of markdown files into a small, structured summary that fits in an LLM's context window — without losing the ability to look up specifics.

The Problem

You have a knowledge base: maybe 40 markdown files about your goals, health, career, projects, whatever. You want an AI to use all of it when answering questions. But dumping 80 raw text chunks into every prompt wastes tokens, and most of those chunks aren't relevant to the question.

What This Does

memorypack reads your markdown files and produces a three-tier summary:

Tier 0 — Overview       (~200 tokens)   One paragraph covering everything.
Tier 1 — Topic summaries (~100 tokens each)   One paragraph per cluster of related content.
Tier 2 — Key facts      (bullet points)   Specific details extracted from each cluster.

The original files stay untouched. The summaries are an additional layer — you can still search the raw text when you need a specific detail.

How It Works

There are 6 steps. Here's what each one does in plain English:

Step 1: Chunk

Split every markdown file into small pieces (~512 tokens each). Break at headings and paragraph boundaries so each piece is about one thing.
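
A minimal sketch of this step, assuming a whitespace word count is a good enough stand-in for tokens (the real tokenizer may differ). The names `approx_tokens` and `chunk_markdown` are illustrative, not memorypack's API.

```python
import re


def approx_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())


def chunk_markdown(text: str, chunk_size: int = 512) -> list[str]:
    """Split at headings, then pack whole paragraphs up to chunk_size tokens."""
    # Zero-width split: a new section starts at every markdown heading line.
    sections = re.split(r"(?m)^(?=#{1,6}\s)", text)
    chunks = []
    for section in sections:
        current, count = [], 0
        for para in re.split(r"\n\s*\n", section.strip()):
            para = para.strip()
            if not para:
                continue
            n = approx_tokens(para)
            # Flush the running chunk before it would exceed the budget.
            if current and count + n > chunk_size:
                chunks.append("\n\n".join(current))
                current, count = [], 0
            current.append(para)
            count += n
        if current:
            chunks.append("\n\n".join(current))
    return chunks
```

A single paragraph longer than `chunk_size` is kept whole here, and `#` lines inside code fences would be treated as headings; a production chunker would need to handle both.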

Step 2: Embed

Turn each chunk into a 384-number fingerprint (a vector) using a small local model (all-MiniLM-L6-v2). Chunks about similar topics get similar fingerprints.

Step 3: Deduplicate

Compare every pair of chunks by their fingerprints. If two chunks are ≥92% similar, they're saying the same thing — keep the longer one, drop the other.
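
The pairwise comparison can be sketched with a small union-find, which the architecture diagram mentions for this step. This is an illustration under assumptions: vectors are plain Python lists, and `deduplicate` is a hypothetical name rather than the project's function.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def deduplicate(chunks, vectors, threshold=0.92):
    """Group chunks whose cosine similarity is >= threshold; keep the longest per group."""
    parent = list(range(len(chunks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # O(n^2) comparison of every pair, as described above.
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    # For each group, remember the index of its longest chunk.
    best = {}
    for i, chunk in enumerate(chunks):
        root = find(i)
        if root not in best or len(chunk) > len(chunks[best[root]]):
            best[root] = i
    return [chunks[i] for i in sorted(best.values())]
```

Union-find makes duplicates transitive: if A matches B and B matches C, all three collapse into one group even when A and C fall just under the threshold.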

Step 4: Cluster

Group the remaining chunks by similarity using k-means clustering. The algorithm automatically picks how many groups to make (2–10) by testing different values and keeping the one where groups are most internally coherent (silhouette score).

Each cluster gets a label pulled from the markdown headings of its chunks.
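
To make the selection criterion concrete, here is a dependency-free sketch of the silhouette score and of picking the best candidate. It assumes the k-means runs themselves happen elsewhere (for example with scikit-learn) and that each run yields one label list; `pick_best_labeling` is a hypothetical helper.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def pick_best_labeling(points, candidate_labelings):
    # Keep the labeling (one per tested k) whose clusters are most coherent.
    return max(candidate_labelings, key=lambda labels: silhouette_score(points, labels))
```

Scores near 1 mean tight, well-separated groups; scores near 0 or below mean points sit closer to a neighboring cluster than to their own.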

Step 5: Summarize

Send each cluster to a summarizer to get a ~100 token paragraph. Then summarize all the summaries into one ~200 token overview.

Step 6: Extract Facts

Scan each cluster for sentences that contain specific, useful information — things with proper nouns, numbers, or actionable language ("always", "must", "prefers"). Filter out vague transitions ("However, this...", "It should be noted..."). Keep up to 10 facts per cluster.
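
The rules above might look roughly like the following. It is a sketch, not the project's extractor: the sentence splitter is a simple regex instead of nltk, the proper-noun test is a crude capitalization check, and the word lists contain only the examples quoted above.

```python
import re

# Only the examples named in the text; a real list would be longer.
VAGUE_OPENERS = ("however, this", "it should be noted")
ACTION_WORDS = re.compile(r"\b(always|must|prefers)\b", re.IGNORECASE)
HAS_NUMBER = re.compile(r"\d")
# Crude proper-noun test: a capitalized word that does not start the sentence.
PROPER_NOUN = re.compile(r"(?<=\s)[A-Z][a-z]+")


def extract_facts(text, max_facts=10):
    """Return up to max_facts sentences that carry specific information."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    facts = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence or sentence.lower().startswith(VAGUE_OPENERS):
            continue  # drop vague transitions
        if (
            HAS_NUMBER.search(sentence)
            or ACTION_WORDS.search(sentence)
            or PROPER_NOUN.search(sentence)
        ):
            facts.append(sentence)
        if len(facts) == max_facts:
            break
    return facts
```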

Output

A single markdown file (or multi-file format) with this structure:

# Knowledge Base: My Life
> Compressed by memorypack | 40 files | 8,820 → 1,400 tokens (6.3:1)

## Overview
One paragraph covering all major themes...

## Topics
### Health & Fitness
Summary paragraph about health-related content...

### Career & Projects
Summary paragraph about career-related content...

## Facts
### Health & Fitness
- Specific fact 1
- Specific fact 2

### Career & Projects
- Specific fact 1
- Specific fact 2
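
For illustration, the single-file layout above could be rendered like this. The function name and argument shapes are assumptions; only the output format comes from the example.

```python
def render_knowledge_base(topic, n_files, tokens_in, tokens_out, overview, clusters):
    """clusters is a list of (label, summary, facts) tuples."""
    ratio = tokens_in / tokens_out
    lines = [
        f"# Knowledge Base: {topic}",
        f"> Compressed by memorypack | {n_files} files | "
        f"{tokens_in:,} → {tokens_out:,} tokens ({ratio:.1f}:1)",
        "",
        "## Overview",
        overview,
        "",
        "## Topics",
    ]
    # Tier 1: one summary paragraph per cluster.
    for label, summary, _ in clusters:
        lines += [f"### {label}", summary, ""]
    # Tier 2: bullet-point facts per cluster.
    lines.append("## Facts")
    for label, _, facts in clusters:
        lines.append(f"### {label}")
        lines += [f"- {fact}" for fact in facts]
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

The ratio in the header is simply input tokens divided by output tokens, so 8,820 and 1,400 give 6.3:1.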

Usage (Python CLI)

pip install -e .

compress — Build a knowledge base

memorypack compress ~/path/to/markdown/files -o compressed/

Options:

-o, --output DIR             Output directory (default: "compressed")
--format [single|multi]      One file or separate files
--device [cpu|cuda|mps]      GPU acceleration
--compression-target FLOAT   Target compression ratio (default: 6.0)
--chunk-size INT             Target tokens per chunk (default: 512)
--topic STR                  Name for the output header

prune — Shrink an existing knowledge base

Remove low-value clusters, merge near-duplicates, or enforce a token budget on a previously compressed knowledge_base.md.

# Enforce a 1500-token budget
memorypack prune compressed/knowledge_base.md --max-tokens 1500

# Drop clusters scoring below 0.3 importance, write to a new file
memorypack prune compressed/knowledge_base.md --min-importance 0.3 -o pruned.md

# Preview what would be removed without writing
memorypack prune compressed/knowledge_base.md --max-tokens 1000 --dry-run

# Skip near-duplicate merging
memorypack prune compressed/knowledge_base.md --max-tokens 1200 --no-merge

Options:

--max-tokens INT             Token budget (0 = no limit)
--min-importance FLOAT       Drop clusters below this score [0-1]
--similarity-threshold FLOAT Cosine threshold for duplicate detection (default: 0.80)
--no-merge                   Disable near-duplicate merging
--dry-run                    Print plan without writing
--device [cpu|cuda|mps]      GPU acceleration
-o, --output PATH            Output file (default: overwrite input)

watch — Auto-recompress on file changes

Poll a directory for .md changes and re-run compression automatically. Optionally auto-prune if output exceeds a token budget.

# Watch with 30-second interval, auto-prune at 2000 tokens
memorypack watch ~/knowledge/ --interval 30 --token-budget 2000 -o compressed/

# Watch with defaults (60s interval, no budget)
memorypack watch ~/knowledge/

Options:

--interval INT               Polling interval in seconds (default: 60)
--token-budget INT           Auto-prune threshold (0 = no limit)
--device [cpu|cuda|mps]      GPU acceleration
--topic STR                  Name for the output header
-o, --output DIR             Output directory (default: "compressed")

How It Fits Into a Larger System

memorypack produces static summaries. In a real app, you'd combine it with hybrid search — both semantic (vector) and full-text (BM25) — to get the best of both worlds:

You ask: "How's my health progress?"

Four things happen:

1. COMPRESSED SUMMARIES (memorypack output)
   → Overview paragraph (always loaded, ~200 tokens)
   → Health cluster summary + facts (~150 tokens)

2. HYBRID SEARCH (Vector + BM25)
   → Vector: embed your question, find semantically similar chunks
   → BM25: rank chunks by term frequency × inverse document frequency
   → Merge both result lists via Reciprocal Rank Fusion (RRF, k=60)
   → Skip chunks that overlap with the summaries above

3. CLAUDE CODE SESSION INDEX
   → 968+ JSONL session files parsed and chunked
   → Past decisions, conversations, and context are searchable
   → Sessions indexed alongside memory files in both vector and BM25

4. SCORED MEMORIES (separate system)
   → Short extracted facts from past conversations
   → Matched by keyword/topic relevance

All four get stuffed into the LLM prompt.
The summaries cover the broad picture cheaply.
Hybrid search fills in specifics — BM25 catches exact terms that
  vector search misses, vector search catches meaning that BM25 misses.
Session history surfaces past decisions and context.
Nothing is lost — the originals are always one search away.

Why Hybrid Search?

Each search method has blind spots:

| Method | Good at | Misses |
| --- | --- | --- |
| Vector (cosine) | Semantic similarity, paraphrases, conceptual matches | Exact terms, rare proper nouns, specific numbers |
| BM25 (TF-IDF) | Exact keyword matches, rare terms, specific names | Synonyms, paraphrases, conceptual queries |
| Hybrid (RRF) | Both | Documents appearing in both lists get boosted scores |

RRF merging is simple and effective: score(doc) = Σ 1/(60 + rank_i(doc)). A document ranked #1 in both lists scores higher than one ranked #1 in only one.
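
The formula translates almost directly into code. The sketch below assumes each result list is an ordered list of document IDs, best first; `rrf_merge` is an illustrative name.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["a", "b", "c"]
bm25_hits = ["a", "d", "b"]
merged = rrf_merge([vector_hits, bm25_hits])
```

Here `a` is ranked first in both lists and scores 2/61, while `b` appears in both lists at lower ranks and still beats `d` and `c`, which each appear in only one.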

Adaptive RAG Budget

When summaries match the question well (high cosine similarity between question and cluster centroids), fewer raw chunks are needed:

| Summary match quality | Raw chunks fetched |
| --- | --- |
| Strong (sim ≥ 0.6) | 5 chunks |
| Medium (sim ≥ 0.45) | 8 chunks |
| Weak (sim < 0.45) | 12-15 chunks |

This saves 500-1,500 tokens per query without losing coverage.
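
The thresholds can be expressed as a small lookup. The weak tier is given as a 12-15 range, so this sketch takes it as a parameter; the default of 15 and the name `raw_chunk_budget` are assumptions.

```python
def raw_chunk_budget(similarity: float, weak_budget: int = 15) -> int:
    """Map the best question-to-centroid cosine similarity to a raw chunk count."""
    if similarity >= 0.6:
        return 5  # strong summary match: few raw chunks needed
    if similarity >= 0.45:
        return 8  # medium match
    return weak_budget  # weak match: the source gives 12-15 chunks
```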

Architecture

Compression Pipeline (memorypack)

~/.claude/memory/
├── domains/
│   ├── health/
│   │   ├── profile.md
│   │   ├── goals.md
│   │   └── current_state.md
│   ├── career/
│   └── financial/
└── ...
        │
        ▼
┌──────────────┐
│   CHUNKER    │  Split by headings, ~512 tokens each
└──────┬───────┘
       ▼
┌──────────────┐
│  EMBEDDER    │  all-MiniLM-L6-v2, 384-dim vectors
└──────┬───────┘
       ▼
┌──────────────┐
│   DEDUP      │  Union-find, cosine ≥ 0.92 → keep longest
└──────┬───────┘
       ▼
┌──────────────┐
│  CLUSTER     │  K-means, auto-select k via silhouette
└──────┬───────┘
       ▼
┌──────────────┐
│ SUMMARIZE    │  ~100 tokens per cluster + ~200 token overview
└──────┬───────┘
       ▼
┌──────────────┐
│   FACTS      │  Rule-based extraction, max 10 per cluster
└──────┬───────┘
       ▼
  compressed-knowledge.json
  (or knowledge_base.md)

Search Pipeline (runtime integration)

~/.claude/memory/      ─── chunker ────┐
(293 markdown files)                    │
                                        ├── Chunk[] ──┬── LanceDB (vector, 384-dim)
~/.claude/projects/    ─── session     │              │
(968+ JSONL sessions)      parser ─────┘              └── MiniSearch (BM25, TF-IDF)
                                                             │
                                   search(query) ────────────┤
                                   │  vector results ────────┤
                                   │  BM25 results ──────────┤
                                   │  RRF merge (k=60) ──────┘
                                   └── SearchResult[] (unified interface)

Key Parameters

| Parameter | Default | What it does |
| --- | --- | --- |
| chunk_size | 512 | Target tokens per chunk |
| dedup_threshold | 0.92 | How similar two chunks must be to count as duplicates |
| min_clusters | 2 | Minimum number of topic groups |
| max_clusters | 10-20 | Maximum number of topic groups |
| summary_max_tokens | 150 | Length of each cluster summary |
| overview_max_tokens | 250 | Length of the overall overview |

Dependencies

  • sentence-transformers — local embeddings (no API calls)
  • transformers + torch — BART summarization (local) or swap for an API-based summarizer
  • scikit-learn — clustering algorithms
  • nltk — sentence tokenization
  • click + rich — CLI interface

About

Compress markdown knowledge bases into context-efficient format for LLMs
