
Collective Compress

The new standard of context compression.
Trim long prompts before inference: keep what matters for the query and drop ~65% of the tokens at a 35% budget.


Rust 1.85+ · MIT license

by Desenyon

Every oversized prompt burns GPU cycles, grid power, and cooling water on tokens that never needed to run. Collective Compress learns which parts of the context to keep for the current question under a fixed token budget, using a ~5K-parameter policy that runs on the CPU. No PyTorch, no GPU: eviction takes under a millisecond and happens before your LLM call.

At a glance

| | Collective Compress | Truncation / FIFO |
|---|---|---|
| KV savings @ 35% budget | ~65% | ~65% |
| Oracle recall | 100% | ~25% |
| Policy size | ~5K params | rule-based |
| Runtime | Rust / CPU | CPU |
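The savings row follows from the budget alone: any policy that keeps 35% of the tokens evicts the other 65%, so the two columns differ in which tokens survive (the recall row), not in how many. A minimal Rust sketch of that arithmetic, assuming savings is measured as the share of original tokens evicted; `kept_tokens` and `savings_pct` are hypothetical helpers, not the crate's API:

```rust
/// Hypothetical helper: tokens kept under a budget ratio, rounded to the
/// nearest whole token. Ratios outside [0, 1] are clamped.
fn kept_tokens(original: usize, budget_ratio: f64) -> usize {
    (original as f64 * budget_ratio.clamp(0.0, 1.0)).round() as usize
}

/// Hypothetical helper: percentage of original tokens evicted, i.e. KV-cache
/// entries the LLM never has to compute.
fn savings_pct(original: usize, kept: usize) -> f64 {
    if original == 0 {
        return 0.0;
    }
    let evicted = original - kept.min(original);
    100.0 * evicted as f64 / original as f64
}

fn main() {
    let original = 10_000;
    let kept = kept_tokens(original, 0.35);
    // Prints: kept 3500 of 10000 tokens, 65.0% KV saved
    println!("kept {kept} of {original} tokens, {:.1}% KV saved", savings_pct(original, kept));
}
```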

Quick start

Website (Vercel)

The marketing site and in-browser demo live in web/. Deploy to Vercel with the root vercel.json, or run locally:

```bash
cd web && npm install && npm run dev
```

Then open http://localhost:3000 to see the WebGL shader hero, the live compression demo, and the API dashboard.

API server

```bash
export CC_ADMIN_TOKEN=your-secret-here
export CC_KEY_STORE_FILE=.data/api_keys.json
cargo run -p collective-compress-api
```
| Endpoint | Description |
|---|---|
| `GET /docs` | Interactive OpenAPI (Swagger UI) |
| `POST /v1/compress` | Compress with a `cc_live_…` API key |
| `POST /v1/compress/batch` | Batch compress (up to 32 per call) |
| `POST /v1/compress/compare` | Compare all eviction policies |
| `POST /v1/admin/keys` | Create API keys (requires the admin token) |
```bash
curl -X POST http://localhost:8080/v1/compress \
  -H "X-API-Key: cc_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"context":"long text…","query":"What matters?","budget_ratio":0.35}'
```
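For callers without curl, the same request can be issued from Rust with only the standard library. This is a sketch, not an official client: it hand-builds the JSON body and HTTP/1.1 request from the curl call above, assumes the server speaks plain HTTP on `127.0.0.1:8080`, and prints the raw response (headers included) instead of parsing it. `json_escape`, `build_request`, and `send` are hypothetical names.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a raw HTTP/1.1 POST to /v1/compress. `budget_ratio` should be a
/// finite number (NaN or infinity would produce invalid JSON).
fn build_request(host: &str, api_key: &str, context: &str, query: &str, budget_ratio: f64) -> String {
    let body = format!(
        r#"{{"context":"{}","query":"{}","budget_ratio":{}}}"#,
        json_escape(context),
        json_escape(query),
        budget_ratio
    );
    format!(
        "POST /v1/compress HTTP/1.1\r\nHost: {host}\r\nX-API-Key: {api_key}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Send the request and read until the server closes the connection.
fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    let req = build_request("localhost:8080", "cc_live_YOUR_KEY", "long text…", "What matters?", 0.35);
    match send("127.0.0.1:8080", &req) {
        Ok(resp) => println!("{resp}"),
        Err(e) => eprintln!("request failed (is the API server running?): {e}"),
    }
}
```

In a real client, crates such as `reqwest` and `serde_json` would handle escaping, connection management, and response parsing; the hand-rolled version only keeps the example dependency-free.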

CLI

```bash
cargo build --release
echo "def foo(): return 42" | ./target/release/collective-compress -q "foo function"
./target/release/collective-compress demo   # demo: a case where middle truncation fails
```

Library

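```rust
use collective_compress::compress_context;

// Assumes the crate's error type converts into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Arguments: context, query, budget ratio (keep ~35% of tokens), and an
    // optional fourth argument left as `None`.
    let result = compress_context(
        "long agent context…",
        "What does fetch return when the row is missing?",
        0.35,
        None,
    )?;
    println!("{:.1}% KV saved", result.kv_savings_pct());
    Ok(())
}
```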

How it works

```text
Long context + question
        ↓
  tokenize + 9-dim feature extraction (CPU)
        ↓
  ~5K-param MLP scores each token
        ↓
  eviction policy keeps budget + sinks + recent tail
        ↓
  compressed text → your LLM
```
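The scoring and eviction stages of the pipeline can be sketched as a score vector plus a budgeted selection. This is an illustration under stated assumptions, not the crate's eviction code: `select_tokens` is a hypothetical name, the sink and tail sizes are chosen by the caller, and sinks and tail are assumed to count against the budget while never being evicted, even if they alone exceed it.

```rust
/// Hypothetical sketch of a budgeted eviction policy.
///
/// Keeps the first `n_sink` tokens (attention sinks) and the last `n_recent`
/// tokens (recent tail) unconditionally, then spends the remaining budget on
/// the highest-scoring tokens in between. Returns kept indices in original order.
fn select_tokens(scores: &[f32], budget_ratio: f64, n_sink: usize, n_recent: usize) -> Vec<usize> {
    let n = scores.len();
    let budget = (n as f64 * budget_ratio.clamp(0.0, 1.0)).round() as usize;
    let mut keep = vec![false; n];

    // Sinks and tail always survive, even if together they exceed the budget.
    keep[..n_sink.min(n)].fill(true);
    keep[n.saturating_sub(n_recent)..].fill(true);
    let mut kept = keep.iter().filter(|&&k| k).count();

    // Rank the remaining middle tokens by score, highest first.
    // `sort_by` is stable, so ties keep the earlier token first.
    let mut middle: Vec<usize> = (0..n).filter(|&i| !keep[i]).collect();
    middle.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));

    for i in middle {
        if kept >= budget {
            break;
        }
        keep[i] = true;
        kept += 1;
    }

    (0..n).filter(|&i| keep[i]).collect()
}

fn main() {
    // Ten tokens with made-up scores; keep 50% with 1 sink and a 1-token tail.
    let scores: [f32; 10] = [0.1, 0.2, 0.9, 0.1, 0.8, 0.1, 0.3, 0.2, 0.1, 0.5];
    let kept = select_tokens(&scores, 0.5, 1, 1);
    println!("{kept:?}"); // [0, 2, 4, 6, 9]
}
```

Returning indices in their original order matters: the kept tokens are re-joined into the compressed text, so reordering them would scramble the prompt.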

Weights ship as assets/checkpoints/model.json. The browser demo loads the same file from web/public/model.json.
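The README fixes only two numbers for the scorer: 9 input features per token and roughly 5K parameters. One layout that lands near that size is 9 → 64 → 64 → 1, which has 4,865 parameters. The sketch below assumes that layout, ReLU hidden activations, and a sigmoid output; it fills weights with a placeholder constant instead of reading `model.json`, whose format is not documented here. `Dense` and `Scorer` are hypothetical names.

```rust
/// A dense layer computing `W · x + b`, with W stored row-major (out_dim × in_dim).
struct Dense {
    in_dim: usize,
    out_dim: usize,
    weights: Vec<f32>,
    bias: Vec<f32>,
}

impl Dense {
    fn new(in_dim: usize, out_dim: usize) -> Self {
        // Placeholder weights; a real scorer would load trained values.
        Dense { in_dim, out_dim, weights: vec![0.01; in_dim * out_dim], bias: vec![0.0; out_dim] }
    }

    fn params(&self) -> usize {
        self.weights.len() + self.bias.len()
    }

    fn forward(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.in_dim);
        (0..self.out_dim)
            .map(|o| {
                let row = &self.weights[o * self.in_dim..(o + 1) * self.in_dim];
                row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>() + self.bias[o]
            })
            .collect()
    }
}

fn relu(v: Vec<f32>) -> Vec<f32> {
    v.into_iter().map(|x| x.max(0.0)).collect()
}

/// Hypothetical 9 → 64 → 64 → 1 scorer returning a keep-probability per token.
struct Scorer {
    layers: [Dense; 3],
}

impl Scorer {
    fn new() -> Self {
        Scorer { layers: [Dense::new(9, 64), Dense::new(64, 64), Dense::new(64, 1)] }
    }

    fn param_count(&self) -> usize {
        self.layers.iter().map(Dense::params).sum()
    }

    fn score(&self, features: &[f32; 9]) -> f32 {
        let h1 = relu(self.layers[0].forward(features));
        let h2 = relu(self.layers[1].forward(&h1));
        let logit = self.layers[2].forward(&h2)[0];
        1.0 / (1.0 + (-logit).exp()) // sigmoid
    }
}

fn main() {
    let scorer = Scorer::new();
    println!("parameters: {}", scorer.param_count()); // 4865
    println!("score: {:.3}", scorer.score(&[1.0; 9]));
}
```

The count breaks down as (9·64 + 64) + (64·64 + 64) + (64 + 1) = 640 + 4160 + 65 = 4865. A model this small evaluates in a few thousand multiply-adds per token, which is why CPU-only scoring is practical.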

Project layout

| Path | Description |
|---|---|
| `crates/collective-compress` | Core compression engine |
| `crates/collective-compress-api` | Axum HTTP API + OpenAPI |
| `crates/collective-compress-cli` | Command-line tool |
| `web/` | Next.js site for Vercel (shader hero, demo, dashboard) |
| `assets/checkpoints/` | Trained policy weights (`model.json`) |

Configuration

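| Variable | Default | Description |
|---|---|---|
| `CC_ADMIN_TOKEN` | | Admin secret for key management |
| `CC_KEY_STORE_FILE` | `memory` | JSON file for persisting API keys (the default keeps keys in memory only) |
| `CC_PORT` | `8080` | API listen port |
| `NEXT_PUBLIC_API_URL` | | API URL used by the web dashboard |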
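A minimal sketch of how a server could resolve the three `CC_*` variables above with their documented defaults. It is not the API crate's actual configuration code: `Config` and `from_lookup` are hypothetical, and treating the literal value `memory` as "no file, keep keys in memory" is an assumption based on the default column. `NEXT_PUBLIC_API_URL` is read by the Next.js dashboard rather than the Rust server, so it is left out.

```rust
/// Server settings resolved from the CC_* environment variables.
#[derive(Debug)]
struct Config {
    admin_token: Option<String>,
    key_store_file: Option<String>, // None = in-memory key store
    port: u16,
}

impl Config {
    /// Resolve settings through `lookup`, so tests can inject values
    /// instead of reading the real process environment.
    fn from_lookup<F: Fn(&str) -> Option<String>>(lookup: F) -> Result<Self, String> {
        let port = match lookup("CC_PORT") {
            Some(raw) => raw.parse::<u16>().map_err(|e| format!("invalid CC_PORT {raw:?}: {e}"))?,
            None => 8080,
        };
        // Assumption: the documented default `memory` means "no file".
        let key_store_file = lookup("CC_KEY_STORE_FILE").filter(|v| v != "memory");
        Ok(Config { admin_token: lookup("CC_ADMIN_TOKEN"), key_store_file, port })
    }

    fn from_env() -> Result<Self, String> {
        Self::from_lookup(|name| std::env::var(name).ok())
    }
}

fn main() {
    match Config::from_env() {
        Ok(cfg) => println!("{cfg:?}"),
        Err(e) => eprintln!("config error: {e}"),
    }
}
```

Failing fast on a malformed `CC_PORT` surfaces typos at startup instead of silently falling back to 8080.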

No third-party credentials are bundled. Configure your own secrets via environment variables.

Development

```bash
cargo test                    # 15 Rust tests
cd web && npm run build       # Next.js production build
docker build -t collective-compress .
```

Author

Desenyon — Collective Compress

License

MIT — see LICENSE.
