agent-compress

Zero-dependency auto memory compression for long-running AI agent sessions.

Python 3.9+ · MIT License · Zero dependencies

Inspired by AgentScope's auto memory compression mechanism.
When an agent's conversation history grows too large, agent-compress automatically summarises the older messages into a structured schema, keeps only recent context, and injects the summary as a system message — enabling infinite-context-like behaviour with any LLM.


Why?

Every LLM has a finite context window. Long agent sessions hit limits fast.
Common solutions: truncate (lose context) or sliding window (lose coherence).
agent-compress does better: summarise the past, keep the present.

Approach         Tokens used   Context quality
No compression   100%          ✅ Full
Truncation       20–30%        ❌ Lost context
Sliding window   20–30%        ⚠️ Fragmented
agent-compress   10–20%        ✅ Structured summary + recent

Install

pip install agent-compress

Or from source:

git clone https://github.com/darshjme/agent-compress
cd agent-compress
pip install -e .

No external dependencies. Pure Python 3.9+.


Quick Start

from agent_compress import MemoryCompressor, CompressionConfig

# 1. Configure
config = CompressionConfig(
    trigger_threshold=50_000,   # compress when estimated tokens exceed this
    keep_recent=3,              # keep last N messages uncompressed
    summary_max_chars=2_000,    # max summary length
    compression_marker="__compressed__",
)

compressor = MemoryCompressor(config)

# 2. Add messages (OpenAI-style dicts, plain strings, or any object)
for msg in conversation_history:
    compressor.add(msg)

# 3. Compress when needed
if compressor.needs_compression():
    summary = compressor.compress()
    print(f"Compressed {summary.compressed_count} messages")
    print(f"Saved ~{summary.token_estimate} tokens")
    print(summary.task_overview)

# 4. Get context to send to your LLM
context = compressor.get_context()
# → [{"role": "system", "content": "[COMPRESSED MEMORY SUMMARY]..."}, ...recent_msgs]

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=context,
)

API Reference

CompressionConfig

Parameter                 Default            Description
trigger_threshold         50_000             Token estimate that triggers compression
keep_recent               3                  Number of recent messages to keep uncompressed
summary_max_chars         2_000              Max chars in the generated summary
compression_marker        "__compressed__"   Key added to compressed message dicts
chars_per_token           4.0                Char-to-token ratio for estimation
min_messages_to_compress  4                  Minimum messages before compression runs
summarizer                None               Optional custom summarizer callable
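
Put together, these parameters imply a simple trigger rule. A minimal standalone sketch of that rule (hypothetical helper mirroring the documented defaults, not the library's actual implementation; how min_messages_to_compress interacts with keep_recent is an assumption here):

```python
def needs_compression(messages, *, trigger_threshold=50_000,
                      keep_recent=3, min_messages_to_compress=4,
                      chars_per_token=4.0):
    """Sketch of the documented trigger: fire when the estimated token
    count exceeds the threshold and enough messages exist to leave
    keep_recent uncompressed."""
    total_chars = sum(len(str(m)) for m in messages)  # crude: str() of each message
    est_tokens = total_chars / chars_per_token
    return (est_tokens > trigger_threshold
            and len(messages) >= min_messages_to_compress + keep_recent)
```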

MemoryCompressor

Method / Property       Description
.add(message)           Add a message to the buffer
.add_many(messages)     Add multiple messages at once
.needs_compression()    Returns True if compression is needed
.compress()             Compresses old messages, returns CompressedSummary
.get_context()          Returns [system_summary_msg, ...recent_msgs]
.total_tokens           Estimated token count of uncompressed messages
.compression_history    List of CompressionRecord audit entries
.latest_summary         The most recent CompressedSummary
.stats()                Dict snapshot of compressor state
.reset()                Clear all state

CompressedSummary

summary.task_overview        # what was being done
summary.current_state        # what has been completed
summary.discoveries          # key findings, errors, decisions
summary.next_steps           # outstanding work
summary.context_to_preserve  # constraints, preferences, settings
summary.compressed_count     # number of messages compressed
summary.token_estimate       # tokens saved
summary.to_text()            # render as system message string
summary.to_dict()            # serialise to dict
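
The exact output of to_text() is not documented here beyond the "[COMPRESSED MEMORY SUMMARY]" header shown in the Quick Start, so the following is only a hypothetical rendering (render_summary is illustrative, not the library's API):

```python
def render_summary(task_overview, current_state, discoveries,
                   next_steps, context_to_preserve):
    """Hypothetical to_text()-style rendering: a headed block that skips
    empty sections, prefixed with the marker shown in the Quick Start."""
    sections = [
        ("Task overview", task_overview),
        ("Current state", current_state),
        ("Discoveries", discoveries),
        ("Next steps", next_steps),
        ("Context to preserve", context_to_preserve),
    ]
    lines = ["[COMPRESSED MEMORY SUMMARY]"]
    for title, body in sections:
        if body:  # omit empty sections entirely
            lines.append(f"{title}: {body}")
    return "\n".join(lines)
```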

Custom (LLM-based) Summarizer

The default summarizer is fully rule-based (no LLM required). For higher-quality summaries, plug in your own:

from openai import OpenAI

from agent_compress import CompressedSummary

client = OpenAI()

def llm_summarizer(messages, max_chars=2000):
    prompt = f"Summarise these {len(messages)} messages into a structured summary..."
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt + str(messages)}],
        max_tokens=500,
    )
    text = response.choices[0].message.content
    return CompressedSummary(
        task_overview=text,
        compressed_count=len(messages),
    )

config = CompressionConfig(summarizer=llm_summarizer)
compressor = MemoryCompressor(config)

Compression Ratio Example

Before compression: 8 messages, ~4,800 chars, ~1,200 tokens
After compression:  1 system summary + 3 recent messages
                    Summary: ~400 chars (~100 tokens)
                    Recent: ~1,800 chars (~450 tokens)
Compression ratio:  ~54 % token reduction
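
The ~54% figure follows directly from the numbers above:

```python
before_tokens = 1_200   # 8 messages, ~4,800 chars at ~4 chars/token
summary_tokens = 100    # ~400-char summary
recent_tokens = 450     # 3 recent messages, ~1,800 chars
after_tokens = summary_tokens + recent_tokens

reduction = 1 - after_tokens / before_tokens
print(f"~{reduction:.0%} token reduction")  # → ~54% token reduction
```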

Token Estimation

agent-compress uses character-based estimation: tokens ≈ chars / chars_per_token.
The default ratio of 4.0 approximates GPT-4 tokenisation for typical English prose.
Adjust via CompressionConfig(chars_per_token=3.5) for code-heavy sessions.
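
A sketch of that estimate (whether the library rounds up is an assumption):

```python
import math

def estimate_tokens(text, chars_per_token=4.0):
    """Character-based token estimate: tokens ≈ chars / chars_per_token."""
    return math.ceil(len(text) / chars_per_token)

print(estimate_tokens("Hello, agent memory!"))          # 20 chars → 5
print(estimate_tokens("def f(x):\n    return x", 3.5))  # 22 chars at 3.5 → 7
```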


Compatibility

Works with any message format:

  • OpenAI {"role": "...", "content": "..."} dicts
  • Anthropic-style messages
  • Plain strings
  • Custom objects with .content or .text attributes
  • LangChain BaseMessage objects
  • AutoGen / CrewAI message objects
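
Supporting all of these formats typically reduces to one extraction step; a hypothetical sketch of such a helper (extract_content is illustrative, not the library's API):

```python
def extract_content(message):
    """Pull text out of any supported message shape: dicts with a
    'content' key, objects with .content or .text, else str()."""
    if isinstance(message, dict):
        return str(message.get("content", message))
    for attr in ("content", "text"):
        value = getattr(message, attr, None)
        if isinstance(value, str):
            return value
    return str(message)  # last resort for arbitrary objects
```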

Development

git clone https://github.com/darshjme/agent-compress
cd agent-compress
pip install -e ".[dev]"
pytest tests/ -v

License

MIT © Darshankumar Joshi
