A "Low-Level SRE" system that detects runaway processes at the kernel level and autonomously applies mitigations using a LangGraph AI agent.

    ┌───────────────────────────────────────────────────────┐
    │             Surgery Console (React :8501)             │
    │  [CPU Leak] [FD Leak]          │  Smoothed CPU graph  │
    │  Live processes                │  Agent monologue     │
    └────────────────────┬───────────┴──────────────────────┘
                         │ HTTP/SSE
              ┌──────────▼──────────┐
              │  Go Daemon (:8080)  │
              │  eBPF (Linux) OR    │
              │  ps-poll (macOS)    │
              └──┬──────────────┬───┘
        kernel   │              │ REST /mitigate
        events   │    ┌─────────▼──────────┐
                 │    │  LangGraph Agent   │
                 │    │  (Python)          │
                 │    │  IDLE→ANALYZING    │
                 │    │      →PLANNING     │
                 │    │      →EXECUTING    │
                 │    │      →VERIFYING    │
                 │    │      →RESOLVED     │
                 │    └─────────┬──────────┘
                 │              │
                 │    ┌─────────▼──────────┐
                 └───►│  Redis             │
                      │  - monologue log   │
                      │  - incident store  │
                      │  - metrics cache   │
                      └────────────────────┘

| Component | Technology |
|---|---|
| Kernel Interface | Go + cilium/ebpf (Linux eBPF tracepoints) |
| Mock Mode | Go ps-polling — works on macOS, no root needed |
| AI Orchestration | LangGraph + GPT-5.4 (OpenAI) |
| Data Store | Redis (metrics, monologue, incident history) |
| Dashboard | React + Vite + Node API |
| Container | Docker Compose |

    brew install go redis
    # Start Redis
    brew services start redis

    cd daemon
    go mod tidy
    go run ./cmd/daemon --mock --cpu-threshold=80
    # Daemon API now available at http://localhost:8080

    # Create .env
    cp .env.example .env
    # Edit .env and set OPENAI_API_KEY=sk-...
    make dev-agent
    # Automatically creates agent/.venv, installs requirements.txt, and runs main.py

To activate the venv in your own shell (e.g. for debugging):

    source agent/.venv/bin/activate

    cd dashboard
    npm install
    npm run dev
    # Open http://localhost:8501

Open the dashboard, click CPU Leak, then watch:
- The CPU graph separates standard process load from leak/suspect load
- The agent monologue appears (`ANALYZING → EXECUTING → VERIFYING → RESOLVED`)
- CPU returns to baseline

Or from CLI:

    make test-cpu-leak

    # Debian/Ubuntu
    sudo apt-get install clang llvm libbpf-dev linux-headers-$(uname -r) bpftool cpulimit
    # Generate vmlinux.h from your running kernel
    make bpf-compile

    # Daemon must run as root for eBPF
    sudo go run ./daemon/cmd/daemon --cpu-threshold=80 --fd-threshold=200

The Minikube deployment runs the complete stack (Redis, daemon, agent, and dashboard) in a local Kubernetes cluster. The daemon runs in real eBPF mode (Minikube's VM is Linux).

    brew install minikube kubectl

    cp .env.example .env
    # Set OPENAI_API_KEY in .env
    make k8s-up
    # Builds all images into Minikube, applies k8s/ manifests, and prints service URLs

    make k8s-status   # show pod status
    make k8s-rebuild  # rebuild images and rollout restart all deployments
    make k8s-down     # delete namespace and stop Minikube

- Images are built directly into Minikube's Docker daemon (`imagePullPolicy: Never`), so no registry is needed.
- The daemon pod runs privileged with `hostPID: true` for eBPF tracepoint access.
- The OpenAI API key is read from `.env` and stored as a Kubernetes Secret (`openai-secret`) in the `neural-ebpf` namespace.
- Manifests live in `k8s/`; each service has its own file plus a shared `configmap.yaml`.

The daemon has two monitoring modes:
Mock mode (--mock): Polls ps aux every second. Fires alerts when a PID sustains >80% CPU for 3+ consecutive seconds. Works anywhere with no privileges.
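The "sustained for 3+ consecutive seconds" rule can be sketched as a per-PID streak counter. This is an illustration in Python, not the daemon's Go code in `mock.go`; the class name `SustainedCpuDetector` and the choice to alert only once per streak are assumptions.

```python
from collections import defaultdict


class SustainedCpuDetector:
    """Flags a PID once its CPU stays above a threshold for N consecutive samples."""

    def __init__(self, threshold=80.0, required_samples=3):
        self.threshold = threshold
        self.required_samples = required_samples
        self._streaks = defaultdict(int)  # pid -> consecutive samples over threshold

    def observe(self, samples):
        """Take one poll result ({pid: cpu_percent}) and return the PIDs that should alert."""
        alerts = []
        for pid, cpu in samples.items():
            if cpu > self.threshold:
                self._streaks[pid] += 1
                # Assumption: alert exactly once, when the streak first reaches the limit.
                if self._streaks[pid] == self.required_samples:
                    alerts.append(pid)
            else:
                self._streaks[pid] = 0
        # Drop PIDs that no longer appear in the poll so stale streaks do not pile up.
        for pid in list(self._streaks):
            if pid not in samples:
                del self._streaks[pid]
        return alerts
```

With one poll per second, `required_samples=3` corresponds to the three-second window described above; a single sample below the threshold resets the streak.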
eBPF mode (Linux, root): Attaches tracepoints:

- `sched/sched_switch`: tracks per-PID CPU time in 100ms windows
- `syscalls/sys_enter_openat`: counts file opens per PID per second

BPF maps accumulate stats; when a threshold is crossed, a `perf_event_output` fires and the Go userspace reader emits a structured `KernelEvent` to all SSE subscribers.
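On the consuming side, an SSE subscriber has to split the stream into events and decode each payload. The sketch below parses the standard `data:` framing from an iterable of text lines. It assumes each event carries a JSON body; the field names used in the example payloads are hypothetical, since the `KernelEvent` schema is not shown here.

```python
import json


def iter_sse_events(lines):
    """Yield one decoded JSON payload per SSE event from an iterable of text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[5:]
            # The SSE format strips a single leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


# Example input; "pid" and "kind" are made-up field names.
sample_stream = [
    ": keep-alive\n",
    'data: {"pid": 4242, "kind": "cpu"}\n',
    "\n",
]
events = list(iter_sse_events(sample_stream))
```

The same generator works over any line iterator, for example the streaming line iterator of an HTTP client response.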
Mitigation actions (applied via normal OS APIs, not eBPF):
| Action | Mechanism | Privilege |
|---|---|---|
| `throttle_cpu` | `cpulimit` or cgroup v2 `cpu.max` | root for cgroup |
| `suspend` | `SIGSTOP` | own process or root |
| `resume` | `SIGCONT` | own process or root |
| `set_rlimit_fd` | `prlimit(2)` | root (Linux) |
| `kill` | `SIGKILL` | own process or root |
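As an illustration of how these actions map onto ordinary OS calls, here is a minimal Python sketch of the signal-based and rlimit rows. It is not the daemon's `mitigate.go`; the function name `apply_mitigation` and the default `fd_limit` are assumptions, and `throttle_cpu` is left out because it depends on `cpulimit` or a cgroup being available.

```python
import os
import signal

# Signal-based actions from the table above.
SIGNAL_ACTIONS = {
    "suspend": signal.SIGSTOP,
    "resume": signal.SIGCONT,
    "kill": signal.SIGKILL,
}


def apply_mitigation(pid, action, fd_limit=256):
    """Apply one mitigation to pid and return a short description of what was done."""
    if action in SIGNAL_ACTIONS:
        sig = SIGNAL_ACTIONS[action]
        os.kill(pid, sig)  # raises PermissionError if we may not signal the process
        return f"sent {sig.name} to {pid}"
    if action == "set_rlimit_fd":
        import resource  # resource.prlimit is only available on Linux

        resource.prlimit(pid, resource.RLIMIT_NOFILE, (fd_limit, fd_limit))
        return f"set RLIMIT_NOFILE={fd_limit} on {pid}"
    raise ValueError(f"unsupported action: {action}")
```

Signalling a process owned by the same user needs no extra privilege, which matches the "own process or root" column.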
The state machine follows this flow:

    IDLE ──► ANALYZE (LLM + get_processes tool)
                │
                ▼
          PLAN + EXECUTE (LLM + throttle_cpu/suspend/kill tools)
                │
                ▼
             VERIFY (LLM + get_processes, checks if CPU normalized)
                │
           ┌────┴──────────────┐
           │ resolved?         │ no, attempts < 3?
           ▼                   ▼
       RESOLVED         PLAN + EXECUTE (escalate)
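The retry loop in this flow can be sketched in plain Python without LangGraph. Everything named here is illustrative: the fixed `ESCALATION` order, the `Incident` dataclass, and the `UNRESOLVED` end state are assumptions, since the real agent lets the LLM pick a tool at each step and the flow above does not name the outcome after three failed attempts.

```python
from dataclasses import dataclass, field

MAX_ATTEMPTS = 3
# Hypothetical escalation order; the real agent lets the LLM choose the tool.
ESCALATION = ["throttle_cpu", "suspend", "kill"]


@dataclass
class Incident:
    pid: int
    state: str = "IDLE"
    attempts: int = 0
    history: list = field(default_factory=list)


def run_incident(incident, execute, verify):
    """Drive one incident through ANALYZE, PLAN + EXECUTE and VERIFY until resolved."""
    incident.state = "ANALYZING"
    while incident.attempts < MAX_ATTEMPTS:
        incident.state = "EXECUTING"
        action = ESCALATION[incident.attempts]  # escalate on every retry
        execute(incident.pid, action)
        incident.history.append(action)
        incident.attempts += 1

        incident.state = "VERIFYING"
        if verify(incident.pid):
            incident.state = "RESOLVED"
            return incident
    incident.state = "UNRESOLVED"  # assumption: outcome once attempts are exhausted
    return incident
```

Passing `execute` and `verify` as callables keeps the loop testable without a running daemon.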
The LLM (GPT-5.4) writes an "internal monologue" entry at each step. These entries are streamed to Redis and displayed live in the dashboard.

    monologue         LIST (lpush)    agent monologue entries (JSON), capped at 500
    incidents         LIST (lpush)    resolved incident IDs, capped at 100
    incident:<ms>     HASH            data=JSON, timestamp=float
    metrics:<pid>     LIST            {ts, cpu, mem} metrics, capped at 300, TTL 1h
    monologue_stream  PubSub channel  real-time monologue streaming
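A capped list in Redis is typically an `LPUSH` followed by an `LTRIM`. The sketch below shows how writes against the keys above might look with a redis-py style client passed in as `r`. The function names are hypothetical and this is not the contents of `redis_store.py`; storing the bare millisecond value as the incident ID is also an assumption.

```python
import json
import time


def record_monologue(r, entry):
    """Push a monologue entry, keep the newest 500, and publish it live."""
    payload = json.dumps(entry)
    r.lpush("monologue", payload)
    r.ltrim("monologue", 0, 499)
    r.publish("monologue_stream", payload)


def record_metric(r, pid, cpu, mem):
    """Append a sample to metrics:<pid>, keep the newest 300, expire after 1 hour."""
    key = f"metrics:{pid}"
    r.lpush(key, json.dumps({"ts": time.time(), "cpu": cpu, "mem": mem}))
    r.ltrim(key, 0, 299)
    r.expire(key, 3600)


def record_incident(r, incident):
    """Store a resolved incident as incident:<ms> and index its ID in incidents."""
    now = time.time()
    incident_id = str(int(now * 1000))
    r.hset(f"incident:{incident_id}", mapping={"data": json.dumps(incident), "timestamp": now})
    r.lpush("incidents", incident_id)
    r.ltrim("incidents", 0, 99)
    return incident_id
```

In real use `r` would be something like `redis.Redis(decode_responses=True)`; because new items are pushed to the head, index 0 is always the most recent entry.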
neural-ebpf/
├── daemon/
│ ├── cmd/daemon/main.go # Entrypoint
│ ├── internal/
│ │ ├── monitor/
│ │ │ ├── monitor.go # Core monitor, pub/sub
│ │ │ ├── mock.go # ps-polling mode
│ │ │ ├── ebpf.go # eBPF loader (Linux)
│ │ │ ├── ebpf_stub.go # Stub for non-Linux
│ │ │ ├── mitigate.go # Mitigation actions
│ │ │ ├── types.go # Event/request types
│ │ │ └── util.go # /proc helpers
│ │ └── api/server.go # HTTP + SSE server
│ ├── bpf/
│ │ ├── cpu_monitor.bpf.c # sched_switch tracepoint
│ │ ├── fd_monitor.bpf.c # openat tracepoint
│ │ └── Makefile # clang build
│ ├── Dockerfile
│ └── go.mod
├── agent/
│ ├── main.py # SSE listener + thread pool
│ ├── agent.py # LangGraph state machine
│ ├── tools.py # Daemon API tools
│ ├── redis_store.py # Persistence layer
│ ├── requirements.txt
│ └── Dockerfile
├── dashboard/
│ ├── src/ # React Surgery Console
│ ├── server.js # Node API for daemon/Redis/scripts
│ ├── package.json
│ └── Dockerfile
├── k8s/
│ ├── 00-namespace.yaml # neural-ebpf namespace
│ ├── configmap.yaml # shared env (DAEMON_URL, REDIS_*)
│ ├── redis.yaml # Redis Deployment + ClusterIP Service
│ ├── daemon.yaml # Daemon Deployment + NodePort :30080
│ ├── agent.yaml # Agent Deployment
│ └── dashboard.yaml # Dashboard Deployment + NodePort :30501
├── scripts/
│ ├── cpu_leak.py # CPU anomaly simulator
│ └── fd_leak.py # FD anomaly simulator
├── Makefile
└── .env.example
- `go mod tidy`: Run `cd daemon && go mod tidy` to generate `go.sum` and download dependencies (`cilium/ebpf`, `gorilla/mux`).
- Set `OPENAI_API_KEY`: Copy `.env.example` to `.env` and fill in your OpenAI API key.
- eBPF compilation (Linux only): On a Linux machine/VM, run `sudo apt-get install clang llvm libbpf-dev linux-headers-$(uname -r) bpftool`, then `cd daemon/bpf && make all`. This generates `cpu_monitor.bpf.o` and `fd_monitor.bpf.o`.
- `vmlinux.h`: The eBPF C programs include `vmlinux.h` (BTF type definitions). Generate it on your target Linux machine: `bpftool btf dump file /sys/kernel/btf/vmlinux format c > daemon/bpf/vmlinux.h`
- Install `cpulimit`: For CPU throttling to work on macOS: `brew install cpulimit`. On Linux: `apt-get install cpulimit`.
- Redis (local dev only): Run `brew services start redis` (macOS) when running services locally via `make dev-*`. Not needed for `make k8s-up`, where Redis runs as a pod.
- Local dependencies: Run `make install-tools` to create `agent/.venv`, install `agent/requirements.txt` into it, and run `npm install` in `dashboard/`. The venv is also created automatically on the first `make dev-agent` run.
- Test the demo: For local mode, run `make dev-daemon`, `make dev-agent`, and `make dev-dashboard` in three terminal tabs, then open `http://localhost:8501`. For Minikube, run `make k8s-up` and use the printed dashboard URL.