Shadow-Infra

Shadow-Infra mirrors a sample of real traffic (1% by default) to temporary "shadow" pods when a GitHub PR is opened, compares the production and shadow responses with a multi-step LangGraph pipeline, and presents the results in a Drift Report UI so you can spot regressions before merging.

Architecture

GitHub PR opened
    → GitHub Webhook → pr-watcher (FastAPI :8000)
    → Parses docker-compose.yaml from the PR branch
    → Spins up a shadow container via docker-compose
    → Registers the deployment in Supabase

Incoming HTTP traffic
    → traffic-splitter (Go tee proxy :8080)
    → 100% → Production upstream (blocking, returned to client, latency recorded)
    → 1%   → Shadow pod (non-blocking goroutine, latency recorded)
    → Both responses + latencies stored in Supabase via comparison-agent

comparison-agent (FastAPI :8001)
    → POST /compare receives (prod, shadow) response pair + latencies
    → Runs LangGraph analysis pipeline:
         structural_check  (rule-based, no LLM)
              ├─ obvious Critical/Warning → format_verdict  (fast path, no LLM call)
              └─ ambiguous               → extract_diffs → semantic_analysis → format_verdict
    → Stores verdict + diff in Supabase

frontend (React + Vite :5173)
    → Lists active PRs and their shadow status
    → Click a PR → Drift Report with side-by-side diff + verdict badges

Services

| Service | Language | Port | Description |
|---|---|---|---|
| `traffic-splitter` | Go 1.21 | 8080 | Tee proxy: mirrors 1% of traffic to the shadow pod and records latency |
| `pr-watcher` | Python / FastAPI | 8000 | GitHub webhook handler |
| `comparison-agent` | Python / FastAPI | 8001 | LangGraph analysis pipeline (Claude claude-sonnet-4-6) |
| `frontend` | React + Vite | 5173 | Drift Report UI |

Comparison Agent — LangGraph Pipeline

The comparison agent uses a multi-step LangGraph graph instead of a single-shot LLM call. This avoids wasting tokens on obvious cases and gives the LLM structured data to reason over.

START → structural_check → (fast path) → format_verdict → END
                         → (ambiguous) → extract_diffs → semantic_analysis → format_verdict → END
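
The branch after structural_check can be pictured as a small routing function. The sketch below is a minimal illustration, not the code in agent.py: the state shape and the `structural_flags` key are assumptions made for the example.

```python
from typing import List, TypedDict


class PipelineState(TypedDict, total=False):
    # Hypothetical state shape; the real graph state may use other keys.
    structural_flags: List[str]  # rule hits recorded by structural_check


def route_after_structural_check(state: PipelineState) -> str:
    """Return the name of the node that should run next."""
    # Any rule hit means the case is obvious: skip the LLM entirely.
    if state.get("structural_flags"):
        return "format_verdict"
    # No rule fired, so the pair is ambiguous and goes down the LLM path.
    return "extract_diffs"
```

In LangGraph, a function like this would typically be registered with `add_conditional_edges` on the `structural_check` node, so the graph picks the fast path or the LLM path from its return value.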

Nodes

structural_check — pure Python, no LLM:

Shadow 5xx when prod is 2xx → Critical immediately
Auth failure (401/403) on shadow when prod succeeded → Critical immediately
Shadow latency ≥ 10× prod → Critical; ≥ 3× prod → Warning
Empty shadow body when prod has content → Critical
Any match triggers the fast path, skipping the LLM entirely
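
The rules above could be expressed roughly as follows. This is a sketch under stated assumptions: the function signature, the return shape, and the order in which rules are evaluated are illustrative, and "prod succeeded" is read here as a 2xx status.

```python
from typing import Optional, Tuple


def structural_check(
    prod_status: int,
    shadow_status: int,
    prod_body: str,
    shadow_body: str,
    prod_latency_ms: float,
    shadow_latency_ms: float,
) -> Optional[Tuple[str, str]]:
    """Return (severity, reason) on a rule hit, or None when ambiguous."""
    prod_ok = 200 <= prod_status < 300

    if prod_ok and shadow_status >= 500:
        return ("Critical", "shadow returned 5xx while prod returned 2xx")
    if prod_ok and shadow_status in (401, 403):
        return ("Critical", "shadow auth failure while prod succeeded")
    if prod_body and not shadow_body:
        return ("Critical", "shadow body is empty but prod has content")

    # Latency ratio rules; skipped when prod latency is zero or unknown.
    if prod_latency_ms > 0:
        ratio = shadow_latency_ms / prod_latency_ms
        if ratio >= 10:
            return ("Critical", f"shadow latency is {ratio:.1f}x prod")
        if ratio >= 3:
            return ("Warning", f"shadow latency is {ratio:.1f}x prod")

    # No rule fired: hand the pair to the LLM path.
    return None
```

For example, `structural_check(200, 200, "a", "b", 10, 30)` yields a Warning because the shadow is exactly 3× slower, while a 2× slowdown returns `None` and falls through to extract_diffs.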

extract_diffs — pure Python:

Parses both response bodies as JSON and computes added/removed/changed keys
Detects type changes on shared keys (e.g. string → number)
Falls back to size diff and content-type change detection for non-JSON bodies
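
A minimal sketch of this kind of diffing is shown below. It only compares top-level keys and omits content-type detection; the output field names are assumptions chosen for the example, not the project's actual schema.

```python
import json


def extract_diffs(prod_body: str, shadow_body: str) -> dict:
    """Compute a structured diff between two response bodies."""
    try:
        prod = json.loads(prod_body)
        shadow = json.loads(shadow_body)
    except json.JSONDecodeError:
        prod = shadow = None

    # Non-JSON (or non-object) bodies: fall back to a size comparison.
    if not (isinstance(prod, dict) and isinstance(shadow, dict)):
        return {"json": False, "size_delta": len(shadow_body) - len(prod_body)}

    shared = prod.keys() & shadow.keys()
    return {
        "json": True,
        "added": sorted(shadow.keys() - prod.keys()),
        "removed": sorted(prod.keys() - shadow.keys()),
        "changed": sorted(k for k in shared if prod[k] != shadow[k]),
        # Note: Python distinguishes int and float, so 1 vs 1.0 counts
        # as a type change here even though both are JSON numbers.
        "type_changed": sorted(
            k for k in shared if type(prod[k]) is not type(shadow[k])
        ),
    }
```

Passing a compact dictionary like this to the LLM, rather than two raw bodies, is what gives the model structured data to reason over.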

semantic_analysis — LLM (Claude claude-sonnet-4-6):

Receives structural flags + structured field diffs + truncated raw bodies
Uses with_structured_output(VerdictModel) (Pydantic) — no manual JSON parsing
System prompt is cached via cache_control: ephemeral

format_verdict — pure Python:

If fast path was taken: assembles verdict from rule-based flags
If LLM path was taken: no-op (verdict already in state from semantic_analysis)
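
The two cases can be sketched as below. The state keys (`verdict`, `structural_flags`) and the verdict fields are hypothetical; the severity names follow the Safe/Warning/Critical badges used by the frontend.

```python
SEVERITY_ORDER = {"Safe": 0, "Warning": 1, "Critical": 2}


def format_verdict(state: dict) -> dict:
    """Assemble a verdict from rule flags unless the LLM already set one."""
    # LLM path: semantic_analysis already wrote the verdict, so do nothing.
    if state.get("verdict") is not None:
        return state

    # Fast path: take the most severe flag as the overall severity.
    flags = state.get("structural_flags", [])
    severity = max(
        (flag["severity"] for flag in flags),
        key=SEVERITY_ORDER.__getitem__,
        default="Safe",
    )
    verdict = {
        "severity": severity,
        "reasons": [flag["reason"] for flag in flags],
        "source": "rules",
    }
    return {**state, "verdict": verdict}
```

Keeping this node on both paths means the rest of the system only ever has to read one verdict shape from the state.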

Latency Tracking

response_pairs stores prod_latency_ms and shadow_latency_ms for every captured pair. The traffic-splitter records wall-clock time around both the production proxy call and the shadow request, and sends both values to the comparison agent in the /compare payload.
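
The timing pattern is simple enough to sketch. The real traffic-splitter is written in Go; the Python below only illustrates the idea of timing each call and attaching both values to the payload. Apart from `prod_latency_ms` and `shadow_latency_ms`, the payload field names are assumptions.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[[], Any]) -> Tuple[Any, int]:
    """Run fn and return (result, elapsed whole milliseconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = round((time.perf_counter() - start) * 1000)
    return result, elapsed_ms


def build_compare_payload(
    prod_body: str, shadow_body: str, prod_latency_ms: int, shadow_latency_ms: int
) -> dict:
    """Bundle a response pair with its latencies for POST /compare."""
    return {
        "prod_body": prod_body,      # hypothetical field name
        "shadow_body": shadow_body,  # hypothetical field name
        "prod_latency_ms": prod_latency_ms,
        "shadow_latency_ms": shadow_latency_ms,
    }
```

Rounding to whole milliseconds matches the `integer` type of the two latency columns in `response_pairs`.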

Quick Start

Prerequisites

Docker + Docker Compose
A Supabase project
A GitHub Personal Access Token (or GitHub App)
An Anthropic API key

1. Clone and configure

git clone https://github.com/itsgeorgema/shadow-infra
cd shadow-infra
cp .env.example .env
# Edit .env with your real credentials

2. Start all services

docker compose up --build

Open http://localhost:5173 for the Drift Report UI.

Action Items (manual steps required)

The following steps cannot be automated and must be completed by you:

1. Create a Supabase project and run the schema

Go to supabase.com and create a new project.
Open the SQL Editor in your project dashboard.
Paste and run the contents of supabase/schema.sql.
Copy your Project URL, anon key, and service_role key into .env.

If you have an existing database from a prior version of this project, run the migration at the bottom of supabase/schema.sql to add the latency columns:

ALTER TABLE response_pairs ADD COLUMN IF NOT EXISTS prod_latency_ms integer;
ALTER TABLE response_pairs ADD COLUMN IF NOT EXISTS shadow_latency_ms integer;

2. Configure a GitHub webhook

Option A — GitHub App (recommended for production):

Create a GitHub App at https://github.com/settings/apps/new.
Enable Pull request events.
Set the webhook URL to https://<your-public-url>/webhook.
Generate a webhook secret and set it as GITHUB_WEBHOOK_SECRET in .env.
Install the app on your target repository.
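Generate an installation access token and set it as GITHUB_TOKEN in .env. Installation access tokens expire after one hour, so a long-running deployment needs to regenerate the token periodically.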

Option B — Repository webhook (simplest for testing):

Go to your repo → Settings → Webhooks → Add webhook.
Set Payload URL to https://<your-public-url>/webhook.
Set Content type to application/json.
Enter a secret and copy it to GITHUB_WEBHOOK_SECRET in .env.
Select Pull request events only.
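
With either option, GitHub signs each delivery with the webhook secret and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The check that pr-watcher performs with GITHUB_WEBHOOK_SECRET can be sketched as follows; the function name is illustrative and the actual handler in main.py may be structured differently.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(
    secret: str, payload: bytes, signature_header: Optional[str]
) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )
```

The digest must be computed over the raw request bytes, not over re-serialized JSON, because any change in whitespace or key order changes the signature.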

3. Fill in all .env values

Open .env (copied from .env.example) and populate every variable:

SUPABASE_URL          — from Supabase project settings
SUPABASE_ANON_KEY     — from Supabase project settings → API
SUPABASE_SERVICE_KEY  — from Supabase project settings → API (service_role)
GITHUB_WEBHOOK_SECRET — secret you chose when creating the webhook
GITHUB_TOKEN          — GitHub PAT with repo read access (or App installation token)
ANTHROPIC_API_KEY     — from console.anthropic.com
PROD_URL              — URL of your production service (e.g. http://prod-app:8080)
SHADOW_SAMPLE_RATE    — fraction of requests to mirror (default 0.01 = 1%)
VITE_SUPABASE_URL     — same as SUPABASE_URL (exposed to browser)
VITE_SUPABASE_ANON_KEY — same as SUPABASE_ANON_KEY (exposed to browser)
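
A quick pre-flight check can catch an incomplete .env before the services start. The helper names below are hypothetical and not part of the repository; the variable list and the 0.01 default are taken from the list above.

```python
from typing import List, Mapping

# Every variable listed above except SHADOW_SAMPLE_RATE, which has a default.
REQUIRED_VARS = (
    "SUPABASE_URL", "SUPABASE_ANON_KEY", "SUPABASE_SERVICE_KEY",
    "GITHUB_WEBHOOK_SECRET", "GITHUB_TOKEN", "ANTHROPIC_API_KEY",
    "PROD_URL", "VITE_SUPABASE_URL", "VITE_SUPABASE_ANON_KEY",
)


def missing_env_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def parse_sample_rate(env: Mapping[str, str]) -> float:
    """Read SHADOW_SAMPLE_RATE as a fraction in [0, 1], defaulting to 0.01."""
    rate = float(env.get("SHADOW_SAMPLE_RATE", "0.01"))
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"SHADOW_SAMPLE_RATE must be between 0 and 1, got {rate}")
    return rate
```

Both helpers accept any mapping, so they can be called with `os.environ` at startup or with a plain dict in tests.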

4. Run `go mod tidy` in traffic-splitter/

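The Go service uses only the standard library, so it has no third-party modules and go.sum is expected to be empty or absent. To confirm that go.mod is consistent before building, run:

cd traffic-splitter
go mod tidy

If third-party dependencies are added later, the same command downloads them and records their checksums in go.sum.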

5. Expose pr-watcher publicly for GitHub webhooks (local dev)

GitHub cannot reach localhost. Use ngrok or a similar tunnel:

ngrok http 8000
# Copy the https://xxxx.ngrok.io URL → set as webhook URL on GitHub

6. Deploy to production

Option A — Kubernetes (recommended)

See the Kubernetes deployment section below.

Option B — Docker (single host)

Suggested platforms:

Railway — simplest, import repo and deploy each service
Render — Docker support, free tier available
Fly.io — fly launch in each service directory

For the traffic-splitter, configure your DNS / load balancer to route production traffic through it on port 8080.

Development

Run a single service locally

# Go traffic splitter
cd traffic-splitter
go run .

# Python services
cd pr-watcher
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

cd comparison-agent
pip install -r requirements.txt
uvicorn main:app --reload --port 8001

# Frontend
cd frontend
npm install
npm run dev

Environment variable reference

| Variable | Used by | Description |
|---|---|---|
| `PROD_URL` | traffic-splitter | URL of the production upstream |
| `SHADOW_URL` | traffic-splitter | URL of the active shadow pod (patched automatically by pr-watcher in K8s) |
| `SHADOW_SAMPLE_RATE` | traffic-splitter | Fraction of requests to mirror (0–1) |
| `COMPARISON_API_URL` | traffic-splitter | URL of comparison-agent |
| `DEPLOYMENT_ID` | traffic-splitter | Supabase deployment ID for this PR (patched automatically by pr-watcher in K8s) |
| `SUPABASE_URL` | all services | Supabase project REST URL |
| `SUPABASE_ANON_KEY` | traffic-splitter, frontend | Public anon key |
| `SUPABASE_SERVICE_KEY` | pr-watcher, comparison-agent | Service-role key (full access) |
| `GITHUB_WEBHOOK_SECRET` | pr-watcher | HMAC secret for signature verification |
| `GITHUB_TOKEN` | pr-watcher | GitHub token for GitHub API calls |
| `ANTHROPIC_API_KEY` | comparison-agent | Anthropic API key for Claude |
| `VITE_SUPABASE_URL` | frontend | Supabase URL (Vite public env var) |
| `VITE_SUPABASE_ANON_KEY` | frontend | Supabase anon key (Vite public env var) |
| `SHADOW_NAMESPACE` | pr-watcher | K8s namespace for shadow Deployments (default: `shadow-infra`) |
| `TRAFFIC_SPLITTER_DEPLOYMENT` | pr-watcher | K8s Deployment name to patch on PR open/close (default: `traffic-splitter`) |

Kubernetes Deployment

All manifests are in k8s/. The setup is self-contained within the shadow-infra namespace.

How it works in K8s

When a PR is opened, pr-watcher:

Creates a K8s Deployment + ClusterIP Service named shadow-pr{N} in the shadow-infra namespace
Patches the traffic-splitter Deployment's SHADOW_URL and DEPLOYMENT_ID env vars, triggering a rolling restart
Traffic-splitter begins mirroring 1% of requests to http://shadow-pr{N}:{port}

When the PR is closed, pr-watcher deletes the shadow Deployment/Service and clears the traffic-splitter's env vars (re-entering passthrough mode).
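
The env-var patch described above can be pictured as a small strategic-merge patch body. The sketch below only builds the dictionary; the container name `traffic-splitter` is an assumption, and the real logic lives in shadow_manager.py and may differ.

```python
def shadow_service_url(pr_number: int, port: int) -> str:
    """In-cluster URL of the shadow Service for a given PR."""
    return f"http://shadow-pr{pr_number}:{port}"


def splitter_env_patch(
    shadow_url: str, deployment_id: str, container: str = "traffic-splitter"
) -> dict:
    """Build a patch that sets SHADOW_URL and DEPLOYMENT_ID on one container."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Containers are matched by name when merging.
                            "name": container,
                            "env": [
                                {"name": "SHADOW_URL", "value": shadow_url},
                                {"name": "DEPLOYMENT_ID", "value": deployment_id},
                            ],
                        }
                    ]
                }
            }
        }
    }
```

With the official Kubernetes Python client, a body like this could be passed to `AppsV1Api.patch_namespaced_deployment`. Because the patch changes the pod template, Kubernetes rolls out new traffic-splitter pods, which is the rolling restart mentioned above.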

Prerequisites

Kubernetes cluster (EKS, GKE, AKS, or local minikube/kind)
kubectl and kustomize (or kubectl ≥ 1.14 which bundles kustomize)
A container registry to push images to (e.g. GHCR, ECR, Docker Hub)

1. Build and push images

REGISTRY=ghcr.io/your-org   # replace with your registry

docker build -t $REGISTRY/traffic-splitter:latest ./traffic-splitter
docker build -t $REGISTRY/pr-watcher:latest        ./pr-watcher
docker build -t $REGISTRY/comparison-agent:latest  ./comparison-agent
docker build -t $REGISTRY/frontend:latest          ./frontend

docker push $REGISTRY/traffic-splitter:latest
docker push $REGISTRY/pr-watcher:latest
docker push $REGISTRY/comparison-agent:latest
docker push $REGISTRY/frontend:latest

2. Set image names in manifests

Replace <your-registry>/... in each manifest with your actual image paths. The command below uses GNU sed syntax; on macOS, use sed -i '' instead of sed -i:

sed -i "s|<your-registry>|$REGISTRY|g" k8s/*.yaml

3. Create the secret

Fill in k8s/secret.yaml (all values are base64-encoded):

echo -n "https://your-project.supabase.co" | base64   # SUPABASE_URL
echo -n "your-anon-key"                    | base64   # SUPABASE_ANON_KEY
# ... etc.
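
As an alternative to encoding each value by hand, the same base64 strings can be produced in one pass. This is a convenience sketch, not a script shipped with the repository; the helper name is hypothetical and the values shown are placeholders.

```python
import base64
from typing import Dict


def encode_secret_values(values: Dict[str, str]) -> Dict[str, str]:
    """Base64-encode each value the way a Kubernetes Secret's data field expects."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }


if __name__ == "__main__":
    placeholders = {
        "SUPABASE_URL": "https://your-project.supabase.co",
        "SUPABASE_ANON_KEY": "your-anon-key",
    }
    for key, encoded in encode_secret_values(placeholders).items():
        print(f"{key}: {encoded}")
```

Like `echo -n`, this encodes the value without a trailing newline, which matters because a stray newline would become part of the secret.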

Then apply it:

kubectl apply -f k8s/secret.yaml

4. Apply everything else

kubectl apply -k k8s/

This applies (in order): namespace → comparison-agent → traffic-splitter → pr-watcher (with ServiceAccount + RBAC) → frontend.

5. Verify

kubectl get pods -n shadow-infra
kubectl get svc  -n shadow-infra

All four pods should reach Running state. The comparison-agent readiness probe must pass before traffic-splitter starts.

6. Point GitHub webhooks at pr-watcher

kubectl get svc pr-watcher -n shadow-infra
# Copy the EXTERNAL-IP and set it as the GitHub webhook URL: http://<EXTERNAL-IP>/webhook

7. Route production traffic through traffic-splitter

kubectl get svc traffic-splitter -n shadow-infra
# Configure your load balancer / Ingress to forward traffic to <EXTERNAL-IP>:8080

RBAC summary

pr-watcher runs under a dedicated ServiceAccount with a namespace-scoped Role granting:

apps/deployments: get, create, update, patch, replace, delete
core/services: get, create, delete

This is the minimum required to manage shadow Deployments and patch the traffic-splitter.

Project Structure

shadow-infra/
├── supabase/schema.sql          — DB tables: shadow_deployments, response_pairs, verdicts
├── traffic-splitter/            — Go tee proxy
│   ├── main.go
│   ├── splitter/proxy.go        — httputil.ReverseProxy + shadow goroutine + latency timing
│   ├── splitter/config.go       — env-based configuration
│   └── store/supabase.go        — HTTP client for Supabase REST API
├── pr-watcher/                  — GitHub webhook handler
│   ├── main.py                  — FastAPI app, webhook verification, lifecycle
│   ├── manifest_parser.py       — Fetch + parse docker-compose.yaml from GitHub
│   └── shadow_manager.py        — K8s Deployment/Service create/delete + traffic-splitter patch
├── comparison-agent/            — LangGraph analysis pipeline
│   ├── main.py                  — POST /compare endpoint
│   └── agent.py                 — LangGraph graph: structural_check → extract_diffs → semantic_analysis
└── frontend/                    — React + Vite + Tailwind
    └── src/
        ├── api.ts               — Supabase JS client + query functions
        ├── types.ts             — TypeScript interfaces
        ├── App.tsx              — Router
        └── components/
            ├── PRList.tsx       — Table of active deployments
            ├── DriftReport.tsx  — Per-PR diff + verdict view
            ├── ResponseDiff.tsx — react-diff-viewer-continued wrapper
            └── VerdictBadge.tsx — Safe/Warning/Critical badge
