bizzarethought/green-pipe

Green-Pipe: Carbon-Aware AI Gateway


Reduce your AI carbon footprint with smart model routing, advanced token pruning, and real-time carbon tracking.


Quick Start

1. Configure API Keys

cd web-app
cp config.example.js config.js

Edit config.js and add your API keys (see the API Keys section below).
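The expected shape of config.js is roughly the following. The field names here are illustrative assumptions; the config.example.js template in the repo is authoritative:

```javascript
// config.js — hypothetical shape; copy config.example.js and fill in your keys.
// Field names are assumptions; check the template in the repo for the real ones.
const CONFIG = {
  GROQ_API_KEY: "gsk_...",         // Groq key for LLM inference
  ELECTRICITY_MAPS_API_KEY: "...", // Electricity Maps key for grid intensity
};
```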

2. Start the App

python -m http.server 8080

3. Open in Browser

Go to: http://localhost:8080


How Routing Works

Green-Pipe analyzes your prompt and routes it to the most carbon-efficient model:

| Routed to EFFICIENT | Routed to POWERFUL |
| ------------------- | ------------------ |
| Simple Q&A          | Code generation    |
| Definitions         | Detailed analysis  |
| Math calculations   | Creative writing   |
| Short factual       | Long complex       |

Efficient Model: Llama 3.1 8B (~0.01g CO₂/query)
Powerful Model: Llama 3.3 70B (~0.12g CO₂/query)

Green Boost: When your local grid is clean (high renewable or low carbon), the app automatically unlocks the powerful model for any query—no CO₂ guilt!

High Carbon Warning: If the grid is dirty and you request a complex task, you’ll be prompted to wait, schedule for later, or proceed anyway.
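The routing decision described above can be sketched as a simple heuristic. This is an illustrative sketch, not the actual logic in arbitrator.js; the keyword list, length threshold, and function names are assumptions:

```javascript
// Illustrative routing sketch: classify a prompt as "efficient" or "powerful".
// Keyword hints and the 50-word threshold are assumptions, not the repo's rules.
const POWERFUL_HINTS = /\b(write|generate|analyze|code|essay|story)\b/i;

function routeQuery(prompt, gridIsClean = false) {
  // Green Boost: a clean grid unlocks the powerful model for any query.
  if (gridIsClean) return "powerful";
  // Long or generative prompts escalate to the powerful model...
  if (prompt.split(/\s+/).length > 50 || POWERFUL_HINTS.test(prompt)) {
    return "powerful";
  }
  // ...short factual Q&A, definitions, and math stay on the efficient model.
  return "efficient";
}
```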


Future Appliance: Local Model First, Cloud Expansion

The Green-Pipe architecture is designed for extensibility and future appliance scenarios:

Local Model First (Edge AI)

  • Default Routing: The system can be configured to always attempt to run queries on a local model (e.g., via Ollama or other on-device LLMs) for maximum privacy, speed, and zero network carbon cost.
  • Fallback Logic: If the local model cannot handle the query (due to complexity, resource limits, or lack of capability), the system can automatically escalate to a more powerful local or cloud model.
  • Benefits:
    • Keeps data private and on-premises
    • Reduces latency and network energy use
    • Enables offline or low-connectivity operation
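The local-first fallback described above can be sketched as a try/escalate wrapper. The function shape and error handling here are assumptions for illustration; the repo's ollamaClient.js abstraction may differ:

```javascript
// Local-first execution sketch: try the local model, escalate to cloud on failure.
// localModel / cloudModel are caller-supplied async functions (assumed interface).
async function runLocalFirst(prompt, localModel, cloudModel) {
  try {
    const answer = await localModel(prompt); // e.g. an on-device Ollama call
    if (answer != null) return { answer, source: "local" };
  } catch (err) {
    // Local model unavailable or out of resources: fall through to cloud.
  }
  return { answer: await cloudModel(prompt), source: "cloud" };
}
```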

Cloud-Based Routing (Expansion)

  • Seamless Escalation: For queries that exceed local model capabilities, Green-Pipe can route requests to cloud-based LLMs (e.g., Groq, OpenAI, or other providers), factoring in real-time grid carbon intensity and user preferences.
  • Smart Selection: The router can be expanded to select among multiple cloud providers based on carbon intensity, cost, or performance.
  • Hybrid Workflows: Future versions may support hybrid execution, where part of a workflow runs locally and part in the cloud, optimizing for both carbon and capability.

This flexible architecture allows organizations to start with local, low-carbon inference and expand to cloud-based AI as needed, always with carbon awareness and user control.
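Smart selection among cloud providers could be as simple as a weighted score over carbon intensity and cost. The provider fields and weights below are illustrative assumptions, not an implemented feature:

```javascript
// Provider-selection sketch: pick the provider with the lowest weighted score
// of grid carbon intensity and price. Field names and weights are assumptions.
function pickProvider(providers, wCarbon = 0.7, wCost = 0.3) {
  return providers.reduce((best, p) => {
    const score = wCarbon * p.gridIntensity + wCost * p.costPerMTokens;
    return !best || score < best.score ? { ...p, score } : best;
  }, null);
}
```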


Token Pruning

Green-Pipe uses both rule-based and AI-powered (Transformers.js) pruning to reduce token count before sending your prompt:

  • Removes greetings, closings, and filler words
  • Converts embedded JSON to a compact TOON format
  • Optionally prunes semantically unimportant words (if AI pruning is enabled)

Animated visualizations show you exactly what was pruned and how many tokens were saved.
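The rule-based part of the pruning pipeline can be sketched as a few regex passes. The word lists below are illustrative, not the app's actual rules, and the token estimate is a rough word count:

```javascript
// Rule-based pruning sketch: strip greetings, closings, and filler words.
// Word lists are assumptions; the app's real rules may differ.
const GREETING = /^(hi|hello|hey)[,!.\s]+/i;
const CLOSING = /[\s,]*(thanks|thank you|cheers)[.!\s]*$/i;
const FILLER = /\b(please|kindly|basically|actually|really)\b/gi;

function prunePrompt(prompt) {
  const pruned = prompt
    .replace(GREETING, "")
    .replace(CLOSING, "")
    .replace(FILLER, "")
    .replace(/\s{2,}/g, " ")
    .trim();
  // Rough token estimate: ~1 token per whitespace-separated word.
  const tokensSaved = prompt.split(/\s+/).length - pruned.split(/\s+/).length;
  return { pruned, tokensSaved };
}
```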


Real-Time Carbon Tracking

  • Live Grid Intensity: Powered by Electricity Maps API, updates every 5 minutes
  • Per-Query CO₂: Calculated using actual grid data and model energy specs
  • Green Boost: Automatically enables powerful model when grid is clean
  • CO₂ Forecast & Scheduling: If the grid is dirty, you can schedule your query for the next green window
  • Session Stats: Tracks tokens saved, CO₂ saved, and energy saved in real time

CO₂ & Energy Calculations

Green-Pipe provides transparent, real-time calculations for every query and session:

1. CO₂ Emissions per Query

$$\text{CO}_2\ \text{(g)} = \text{Energy (kWh)} \times \text{Grid Intensity (gCO}_2\text{/kWh)}$$

Where:

  • Energy (kWh) is estimated from the number of tokens, model specs, and datacenter PUE.
  • Grid Intensity is live from Electricity Maps.
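Putting the energy estimate and the formula above together, a minimal sketch of the per-query calculation (the energy-per-token and PUE values are illustrative placeholders, not the app's model specs):

```javascript
// CO2-per-query sketch implementing the formula above.
// whPerToken and pue are placeholder inputs, not Green-Pipe's actual specs.
function co2PerQuery(tokens, whPerToken, pue, gridIntensity) {
  const energyKWh = (tokens * whPerToken * pue) / 1000; // Wh -> kWh
  return energyKWh * gridIntensity;                     // grams of CO2
}
```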

2. Token Pruning Savings

Removes unnecessary tokens before sending to the LLM, then calculates:

$$\text{Energy Saved (Wh)} = \text{Tokens Pruned} \times \text{EnergyPerToken} \times \text{PUE}$$

$$\text{CO}_2\ \text{Saved (g)} = \frac{\text{Energy Saved (Wh)}}{1000} \times \text{Grid Intensity}$$
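The two pruning-savings formulas translate directly into code (input values in the test are illustrative):

```javascript
// Token-pruning savings sketch, matching the two formulas above.
function pruningSavings(tokensPruned, whPerToken, pue, gridIntensity) {
  const energySavedWh = tokensPruned * whPerToken * pue;
  const co2SavedG = (energySavedWh / 1000) * gridIntensity;
  return { energySavedWh, co2SavedG };
}
```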

3. Model Routing Savings

Shows the difference in CO₂ between efficient and powerful models for the same query:

$$\text{CO}_2\ \text{Saved} = \text{CO}_2^{\text{powerful}} - \text{CO}_2^{\text{efficient}}$$

4. Session Statistics

Tracks and displays:

  • Total tokens processed
  • Tokens pruned
  • Energy saved
  • CO₂ saved (routing + pruning)

5. Forecast & Scheduling

Calculates potential CO₂ savings by waiting for a lower-carbon grid window, using live or forecasted grid intensity.
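The scheduling calculation can be sketched as a scan over a forecast for the greenest upcoming slot. The forecast shape (hourly entries with intensity values) is an assumption about the Electricity Maps data, not the app's exact structure:

```javascript
// Scheduling sketch: given an hourly intensity forecast (current hour first),
// find the greenest slot and the CO2 saved by waiting for it.
// The forecast entry shape is an assumption for illustration.
function bestGreenWindow(forecast, queryEnergyKWh) {
  const now = forecast[0];
  const best = forecast.reduce((a, b) => (b.gCO2PerKWh < a.gCO2PerKWh ? b : a));
  return {
    waitUntilHour: best.hour,
    co2SavedG: queryEnergyKWh * (now.gCO2PerKWh - best.gCO2PerKWh),
  };
}
```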

All calculations are shown in the UI, with breakdowns for each query and session.



Project Structure

web-app/
├── index.html          # Main UI
├── app.js              # Core application logic
├── config.js           # API keys (gitignored)
├── config.example.js   # API keys template
└── llmClients/
    └── ollamaClient.js # LLM client abstraction
extension/
├── arbitrator.js       # Model routing logic
├── background.js       # Service worker
├── content.js          # Content script
└── manifest.json       # Chrome extension manifest

API Keys

This project uses two free APIs:

  1. Groq API — Fast LLM inference
    Free: 30 requests/minute
    Get key

  2. Electricity Maps API — Real-time carbon intensity
    Free: 100 requests/day
    Get key


Features

  • Smart model routing based on query complexity and live grid data
  • Green Boost: unlocks powerful model when grid is clean
  • Token pruning with animated and AI-powered visualization
  • Live carbon intensity and CO₂ forecast
  • Per-query and session CO₂/energy stats
  • Query scheduling for optimal carbon savings
  • Real-time CO₂ savings display

License

MIT License
