Inspiration
AI is invisible infrastructure, and invisible infrastructure has invisible costs. Most developers have never once thought about the water their API calls consume or the carbon they emit. We did some digging and found that a single GPT-4o query uses roughly 10 times more water than a Google search, and that AI inference is one of the fastest growing sources of untracked carbon emissions on the planet. What made it worse was realizing the waste isn't even necessary: most of it comes from sloppy API usage. Bloated prompts, full conversation history re-sent on every turn, powerful models used for trivial tasks. Nobody built the layer that fixes this. So we did.
What it does
CarbonProxy is a carbon-aware middleware layer between any app and any LLM.
Every request passes through three optimization stages:
1) Prompt Optimizer: Compresses user input to its semantic core while preserving intent, constraints, and technical details.
2) Memory Layer: Stores compact summaries of prior exchanges and injects only relevant context via vector similarity, replacing repeated long histories with short targeted context.
3) Model Router: Scores request complexity and routes to the smallest capable model, so trivial tasks don’t consume premium-model resources.
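The three stages above can be sketched as a simple pipeline. This is a minimal illustration, not our actual implementation; the function names and the placeholder heuristics inside them are assumptions for the sake of example.

```python
# Hypothetical sketch of the three-stage pipeline; all names and heuristics
# here are illustrative stand-ins for the real components.

def optimize_prompt(prompt: str) -> str:
    """Stage 1: compress the prompt toward its semantic core.

    Real compression preserves intent, constraints, and technical details;
    collapsing whitespace is only a stand-in here."""
    return " ".join(prompt.split())

def inject_memory(prompt: str, summaries: list[str]) -> str:
    """Stage 2: prepend only the most relevant stored summaries."""
    context = "\n".join(summaries[:2])  # top-k relevant summaries
    return f"{context}\n{prompt}" if context else prompt

def route_model(prompt: str) -> str:
    """Stage 3: send short, simple requests to a smaller model tier."""
    return "small-model" if len(prompt.split()) < 50 else "large-model"

def handle(prompt: str, summaries: list[str]) -> tuple[str, str]:
    """Run a request through all three stages; returns (model, final prompt)."""
    compact = optimize_prompt(prompt)
    contextual = inject_memory(compact, summaries)
    return route_model(contextual), contextual
```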
The result is a live dashboard showing estimated tokens saved, CO₂ avoided, and water impact reduction by session and over time. Integration is lightweight and drop-in for existing LLM workflows.
How we built it

We split into three tracks with clean contracts and integrated fast:
1) Backend + Optimization Track: FastAPI service with prompt compression and model routing. Complexity scoring combines token count, keyword signals, and structure to map requests to the right model tier.
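As a rough sketch of what that scoring can look like, the snippet below combines the three signals named above. The keyword list, weights, and tier thresholds are assumptions for illustration, not our production values.

```python
# Illustrative complexity scorer; keywords, weights, and tier cutoffs are
# assumptions, not the values used in the real router.
HARD_KEYWORDS = {"prove", "refactor", "architecture", "optimize", "debug"}

def complexity_score(prompt: str) -> float:
    """Combine token count, keyword signals, and structure into one score."""
    tokens = prompt.split()
    token_signal = min(len(tokens) / 200, 1.0)       # longer prompts score higher
    keyword_hits = sum(t.lower().strip(".,") in HARD_KEYWORDS for t in tokens)
    keyword_signal = min(keyword_hits / 3, 1.0)      # "hard task" vocabulary
    structure_signal = min(prompt.count("\n") / 10, 1.0)  # multi-part requests
    return token_signal + keyword_signal + structure_signal

def pick_tier(prompt: str) -> str:
    """Map the score to the smallest capable model tier."""
    score = complexity_score(prompt)
    if score < 0.3:
        return "small"
    if score < 1.0:
        return "medium"
    return "large"
```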

2) Memory + Retrieval Track: SQLite-backed memory store with embedding-based retrieval. Each turn is summarized asynchronously, and relevant summaries are injected on the next turn using cosine similarity (NumPy only, no external vector DB).
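The NumPy-only retrieval step can be sketched as below. In practice the vectors come from an embedding model; here they are stand-in arrays, and the `k` and `threshold` defaults are illustrative.

```python
import numpy as np

# Sketch of NumPy-only cosine-similarity retrieval; embeddings would come from
# a real embedding model, and k/threshold defaults are illustrative.

def top_k_summaries(query_vec, summary_vecs, summaries, k=2, threshold=0.3):
    """Return up to k stored summaries whose cosine similarity to the query
    clears the threshold, most similar first."""
    q = query_vec / np.linalg.norm(query_vec)
    m = summary_vecs / np.linalg.norm(summary_vecs, axis=1, keepdims=True)
    sims = m @ q                                  # cosine similarity per summary
    order = np.argsort(sims)[::-1][:k]            # best k indices
    return [summaries[i] for i in order if sims[i] >= threshold]
```

The threshold is what keeps irrelevant history out: a summary that is stored but off-topic simply never gets injected.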
3) Frontend + Observability Track: Real-time dashboard visualizing token reduction, estimated CO₂ impact, and memory growth over time.
We connected everything through two explicit contracts:
1) POST /memory/inject — called before the model call
2) POST /memory/save — called after the model call (async)
This let each stream develop and validate independently before final integration.
Challenges we ran into
1) Aggressive compression can remove critical constraints: We added semantic safety checks (action verbs, constraint phrases, complexity targets, schema field preservation) so “shorter” never means “wrong.”
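One of those checks, constraint-phrase preservation, can be sketched as a simple guard: if the original prompt contains constraint language that the compressed version dropped, reject the compression. The pattern list is an illustrative assumption.

```python
import re

# Illustrative safety check: reject a compressed prompt that drops constraint
# language present in the original. The pattern list is an assumption.
CONSTRAINT_PATTERNS = [r"\bmust\b", r"\bat least\b", r"\bno more than\b", r"\bexactly\b"]

def compression_is_safe(original: str, compressed: str) -> bool:
    """Return False if any constraint phrase in the original is missing
    from the compressed prompt."""
    for pat in CONSTRAINT_PATTERNS:
        if re.search(pat, original, re.I) and not re.search(pat, compressed, re.I):
            return False  # a constraint phrase was lost in compression
    return True
```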
2) Memory relevance vs. noise: Naive history injection can reintroduce token bloat. We solved this with summary chunks + similarity thresholds, so only contextually relevant facts are injected.
3) Making the carbon numbers defensible: We didn't want to make up figures that a technically literate judge could immediately dismiss. We derived everything from Google's August 2025 environmental report, published energy benchmarks, and the US grid intensity average from the IEA and built a clear methodology we could explain in 30 seconds if challenged.
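The methodology reduces to simple arithmetic: tokens saved → energy saved → CO₂ avoided. The constants below are illustrative placeholders, not the sourced figures from the reports we used.

```python
# Back-of-envelope CO2 estimate from tokens saved. Both constants are
# illustrative placeholders, not the team's sourced benchmark figures.
ENERGY_PER_TOKEN_WH = 0.0003     # assumed Wh of inference energy per token
GRID_INTENSITY_G_PER_KWH = 400   # assumed US grid average, gCO2 per kWh

def co2_saved_grams(tokens_saved: int) -> float:
    """tokens saved -> kWh saved -> grams of CO2 avoided."""
    kwh = tokens_saved * ENERGY_PER_TOKEN_WH / 1000
    return kwh * GRID_INTENSITY_G_PER_KWH
```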
Accomplishments that we're proud of
The memory layer compression ratio. By exchange five in a typical session, a naive system is sending over 2,400 tokens of history on every request. CarbonProxy sends 340 (a whopping 86% reduction) while the model still has all the context it needs, just compressed and targeted. That number surprised us when we first measured it.
We're also proud of the three-layer architecture composing as cleanly as it did. Each layer was built independently and integrated in the final two hours without significant rework. That's a sign the contracts were designed right from the start.
And the water angle. Most AI sustainability tools talk about carbon; nobody talks about water. The fact that data centers consume millions of liters of water annually for cooling, and that every wasteful API call contributes to that, landed differently with everyone we showed it to. The eco logo wasn't just aesthetic. It was the right metaphor.
What we learned
That the biggest environmental cost in AI applications isn't the model, it's the usage pattern. A team of 10 developers making 1,000 daily API calls with no optimization is dramatically more wasteful than a team using a smaller model intelligently. The model is not the bottleneck. The behavior around the model is. We also learned that SQLite is criminally underrated for hackathon infrastructure. Zero setup, zero server, Python stdlib, inspectable with a GUI tool, fast enough for everything we needed. The reflex to reach for Postgres or a hosted database added nothing here.
And we learned that sustainability arguments land hardest when they're expressed in human terms. Saying "we saved 4,820 tokens" means nothing. Saying "we saved enough energy to charge your phone 12 times, in one five-minute coding session" lands in the room.
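That conversion is one line of arithmetic. As a sketch (both constants are illustrative assumptions, not our sourced figures):

```python
# Sketch of translating tokens saved into a human-scale equivalent.
# Both constants are illustrative assumptions.
WH_PER_TOKEN = 0.0003     # assumed Wh of inference energy per token
PHONE_CHARGE_WH = 15.0    # rough energy to fully charge a phone once

def phone_charges_equivalent(tokens_saved: int) -> float:
    """Express tokens saved as equivalent full phone charges."""
    return tokens_saved * WH_PER_TOKEN / PHONE_CHARGE_WH
```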
What's next for CarbonProxy
We're turning this into an easy-to-install package (SDK) so any developer can add it to their app with just three lines of code. We also plan to add a "Carbon Budget," letting companies set a limit on how much CO₂ their AI can emit each month.
