The new standard of context compression.
Trim long prompts before inference — keep what matters, cut ~65% of wasted tokens.
Quick start · API · Website · Architecture
by Desenyon
Every oversized prompt burns GPU cycles, grid power, and cooling water on tokens that never needed to run. Collective Compress learns which lines to keep for the current question — under a fixed token budget — using a ~5K-parameter CPU policy. No PyTorch. No GPU. Sub-millisecond eviction before your LLM call.
| Collective Compress | Truncation / FIFO | |
|---|---|---|
| KV savings @ 35% budget | ~65% | ~65% |
| Oracle recall | 100% | ~25% |
| Policy size | ~5K params | rule-based |
| Runtime | Rust / CPU | CPU |
The marketing site and in-browser demo live in web/. Deploy to Vercel with the root vercel.json, or run locally:
cd web && npm install && npm run devOpen http://localhost:3000 — WebGL shader hero, live compression demo, API dashboard.
export CC_ADMIN_TOKEN=your-secret-here
export CC_KEY_STORE_FILE=.data/api_keys.json
cargo run -p collective-compress-api| Endpoint | Description |
|---|---|
GET /docs |
Interactive OpenAPI (Swagger UI) |
POST /v1/compress |
Compress with cc_live_… API key |
POST /v1/compress/batch |
Batch compress (up to 32) |
POST /v1/compress/compare |
Compare all eviction policies |
POST /v1/admin/keys |
Create keys (admin token) |
curl -X POST http://localhost:8080/v1/compress \
-H "X-API-Key: cc_live_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"context":"long text…","query":"What matters?","budget_ratio":0.35}'cargo build --release
echo "def foo(): return 42" | ./target/release/collective-compress -q "foo function"
./target/release/collective-compress demo # middle-truncation failure caseuse collective_compress::compress_context;
let result = compress_context(
"long agent context…",
"What does fetch return when the row is missing?",
0.35,
None,
)?;
println!("{:.1}% KV saved", result.kv_savings_pct());Long context + question
↓
tokenize + 9-dim feature extraction (CPU)
↓
~5K-param MLP scores each token
↓
eviction policy keeps budget + sinks + recent tail
↓
compressed text → your LLM
Weights ship as assets/checkpoints/model.json. The browser demo loads the same file from web/public/model.json.
| Path | Description |
|---|---|
crates/collective-compress |
Core compression engine |
crates/collective-compress-api |
Axum HTTP API + OpenAPI |
crates/collective-compress-cli |
Command-line tool |
web/ |
Next.js site for Vercel (shader hero, demo, dashboard) |
assets/checkpoints/ |
Trained policy weights (model.json) |
| Variable | Default | Description |
|---|---|---|
CC_ADMIN_TOKEN |
— | Admin secret for key management |
CC_KEY_STORE_FILE |
memory | Persist API keys to JSON |
CC_PORT |
8080 |
API listen port |
NEXT_PUBLIC_API_URL |
— | API URL for the web dashboard |
No third-party credentials are bundled. Configure your own secrets via environment variables.
cargo test # 15 Rust tests
cd web && npm run build # Next.js production build
docker build -t collective-compress .Desenyon — Collective Compress
MIT — see LICENSE.