An autonomous infrastructure agent. It detects incidents, figures out what broke, fixes it, and writes a post-mortem — then uses that post-mortem to handle the next incident better.
Your app is running on cloud servers. One of them crashes at 3 AM. Normally, an engineer gets paged, wakes up, SSHes in, pokes around the logs, restarts the service, maybe scales it up, then spends 30 minutes writing an incident report.
Ghost Operator replaces that entire workflow. It's constantly scanning — checking your Render services, searching the web for outage reports, monitoring status pages and community forums. When it picks up a signal that something is wrong, it kicks off a pipeline:
- It figures out what broke — which service, what error codes showed up, how bad it is.
- It checks its memory: "Have I seen this before? What fixed it last time?"
- It takes action — restarts the service, scales it up, whatever the situation calls for.
- It writes a post-mortem and saves it, so next time it has more context.
The key thing: it gets better over time. Every incident becomes a node in a knowledge graph. Every post-mortem becomes searchable memory. By incident #50, the agent has 49 earlier incidents and their fixes to consult, context it did not have at incident #1.
┌────────────────────────────────────────────────────────────────┐
│                         GHOST OPERATOR                         │
│                                                                │
│  ┌──────────┐   ┌──────────┐   ┌────────────┐   ┌───────────┐  │
│  │  DETECT  │──>│ ANALYZE  │──>│ KNOWLEDGE  │──>│    ACT    │  │
│  │          │   │          │   │            │   │           │  │
│  │  Yutori  │   │  Entity  │   │  Neo4j     │   │  Render   │  │
│  │  Tavily  │   │ Severity │   │  Senso     │   │  Restart  │  │
│  │  Render  │   │ Classify │   │            │   │  Scale    │  │
│  └──────────┘   └──────────┘   └────────────┘   └───────────┘  │
│        │                                              │        │
│        └─────────────── 60-second cycle ──────────────┘        │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │     REST API · SSE Stream · Dashboard · Health Check     │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
Every 60 seconds (or on-demand via Trigger Scan):
- Detect — Query Render health API, search Tavily for outage reports, poll Yutori scouts
- Analyze — Extract service names + error codes, classify severity, infer root cause
- Remediate — Check Neo4j/Senso for past similar incidents, then act via Render API (restart, scale, resume)
- Report — Generate post-mortem, store in Neo4j graph + Senso memory, broadcast to dashboard
If no signals are found, the cycle exits early.
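
The cycle above can be sketched as a single function that runs the four stages in order and returns early when detection finds nothing. This is a minimal sketch, not the code in src/index.ts: the `Signal`, `Analysis`, `Action`, and `Stages` shapes and the `runCycle` name are assumptions made for illustration, and the stages are injected so the example runs without any real API clients.

```typescript
// Hypothetical shapes; the project's real types live in src/types/index.ts.
interface Signal { source: "render" | "tavily" | "yutori"; message: string; }
interface Analysis {
  severity: "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
  services: string[];
  rootCause: string;
}
interface Action { kind: "restart" | "scale" | "resume"; service: string; success: boolean; }

// Each stage is passed in, so the orchestration logic can be exercised with stubs.
interface Stages {
  detect: () => Promise<Signal[]>;
  analyze: (signals: Signal[]) => Promise<Analysis>;
  remediate: (analysis: Analysis) => Promise<Action[]>;
  report: (analysis: Analysis, actions: Action[]) => Promise<void>;
}

type CycleResult =
  | { status: "idle" }
  | { status: "handled"; analysis: Analysis; actions: Action[] };

async function runCycle(stages: Stages): Promise<CycleResult> {
  const signals = await stages.detect();

  // No signals: exit early and skip analysis, remediation, and reporting.
  if (signals.length === 0) return { status: "idle" };

  const analysis = await stages.analyze(signals);
  const actions = await stages.remediate(analysis);
  await stages.report(analysis, actions);
  return { status: "handled", analysis, actions };
}
```

Since the stack lists node-cron, a schedule would presumably call something like `runCycle` once a minute, and Trigger Scan would call the same function on demand. How index.ts actually wires this up is not shown here.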
| Tool | What it does here |
|---|---|
| Tavily | Search API that scans the web for live outage reports, status page changes, and incident-related news |
| Yutori | Deploys persistent monitoring agents ("scouts") that watch Reddit, Hacker News, and service status pages around the clock for early warning signals |
| Neo4j | Graph database that links every incident to its affected services, errors, root causes, and past fixes — so the agent can look up what worked before |
| Render | Cloud platform that both hosts Ghost Operator and is the remediation target — the agent restarts, scales, or resumes services through its API |
| Senso.ai | Stores every post-mortem as searchable memory, so the agent can recall past incidents and their resolutions when handling new ones |
                                ┌──────────┐
              AFFECTS──────────>│ Service  │
              │                 └──────────┘
┌──────────┐──┤
│ Incident │  │HAS_ERROR───────>┌──────────┐
└──────────┘  │                 │  Error   │
              │                 └──────────┘
              │CAUSED_BY───────>┌───────────┐
              │                 │ RootCause │
              │                 └───────────┘
              │REMEDIATED_BY───>┌─────────────┐
              │                 │ Remediation │
              │                 └─────────────┘
              │HAS_POSTMORTEM──>┌────────────┐
              └                 │ PostMortem │
                                └────────────┘
The remediator queries this graph before acting. Past incidents with similar services/errors surface what worked before.
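
One way such a lookup could be expressed is a Cypher query that finds remediated incidents sharing a service or an error with the current one. The sketch below only builds the query text and parameters. The labels and relationship types come from the graph model above; the property names (`name`, `code`, `action`, `id`, `createdAt`) and the `buildSimilarIncidentQuery` helper are assumptions, since the actual schema in src/services/neo4j.ts is not shown.

```typescript
interface CypherQuery {
  text: string;
  params: { services: string[]; errors: string[]; limit: number };
}

// Assumed properties: Service.name, Error.code, Remediation.action,
// Incident.id, Incident.createdAt. Adjust to the real schema.
function buildSimilarIncidentQuery(
  services: string[],
  errors: string[],
  limit: number = 5
): CypherQuery {
  const text = [
    "MATCH (i:Incident)-[:REMEDIATED_BY]->(r:Remediation)",
    "OPTIONAL MATCH (i)-[:AFFECTS]->(s:Service)",
    "OPTIONAL MATCH (i)-[:HAS_ERROR]->(e:Error)",
    "WITH i, r, collect(DISTINCT s.name) AS svc, collect(DISTINCT e.code) AS err",
    "WHERE any(x IN svc WHERE x IN $services) OR any(x IN err WHERE x IN $errors)",
    "RETURN i.id AS incident, r.action AS action, svc, err",
    "ORDER BY i.createdAt DESC",
    // The JavaScript driver sends plain numbers as floats, so coerce for LIMIT.
    "LIMIT toInteger($limit)",
  ].join("\n");

  return { text, params: { services, errors, limit } };
}
```

With the official Neo4j driver, the result could be passed to `session.run(query.text, query.params)`. Keeping values in parameters rather than interpolating them into the string avoids injection and lets Neo4j reuse the query plan.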
Here's what it looks like when Ghost Operator handles a real incident end-to-end:
Scenario: A Render-hosted API service crashed due to memory exhaustion (OOM). Ghost Operator detected the suspended service, identified the root cause, restarted it, scaled to 2 instances for resilience, and wrote a post-mortem — all within 15 seconds, with zero human intervention.
Incident Timeline
─────────────────
10:34:22 [DETECT] Render health check: ghost-operator-api is suspended
10:34:23 [DETECT] Tavily: "Render API degraded performance" reported
10:34:25 [ANALYZE] Severity: CRITICAL | Services: render | Error: OOM
10:34:25 [ANALYZE] Root cause: Memory exhaustion
10:34:28 [REMEDIATE] Restarted ghost-operator-api → Success
10:34:31 [REMEDIATE] Scaled ghost-operator-api to 2 instances → Success
10:34:35 [REPORT] Post-mortem generated and stored in Neo4j + Senso
10:34:35 [SYSTEM] Cycle complete — service recovered
The post-mortem from this incident is now stored in the knowledge graph. Next time a similar OOM event occurs, the remediator will find this record and already know that restart + scale worked.
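
To make "already know that restart + scale worked" concrete, here is a minimal sketch of how past outcomes might be turned into a plan: pick the action sequence that succeeded most often for the same root cause, and fall back to a plain restart when memory has nothing relevant. The `PastRemediation` shape, the `planFromHistory` name, and the fallback are all assumptions; the real remediator may weigh history differently.

```typescript
type ActionKind = "restart" | "scale" | "resume";

// Hypothetical record of one earlier remediation attempt.
interface PastRemediation {
  rootCause: string;
  actions: ActionKind[];
  success: boolean;
}

// Assumed fallback when no matching successful incident exists.
const DEFAULT_PLAN: ActionKind[] = ["restart"];

function planFromHistory(rootCause: string, history: PastRemediation[]): ActionKind[] {
  // Count successes per action sequence for this root cause.
  const wins: { [plan: string]: number } = {};
  for (const past of history) {
    if (!past.success || past.rootCause !== rootCause || past.actions.length === 0) continue;
    const key = past.actions.join("+");
    wins[key] = (wins[key] || 0) + 1;
  }

  // Choose the sequence with the most recorded successes.
  let best = "";
  let bestCount = 0;
  for (const key of Object.keys(wins)) {
    const count = wins[key] || 0;
    if (count > bestCount) {
      best = key;
      bestCount = count;
    }
  }

  return bestCount > 0 ? (best.split("+") as ActionKind[]) : DEFAULT_PLAN.slice();
}
```

Given the incident above as the only history entry, a later "Memory exhaustion" event would yield `["restart", "scale"]`, while an unseen root cause would fall back to `["restart"]`.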
Live at /dashboard. Real-time health stats, incident feed (color-coded by severity), agent activity log, and knowledge graph counts — all updated via SSE.
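
For readers unfamiliar with SSE, each update is a small text frame written to a long-lived HTTP response. The sketch below shows the wire format and a tiny fan-out hub. The `DashboardEvent` shape, the event names, and the `SseHub` class are illustrative assumptions, not the project's actual implementation.

```typescript
// Hypothetical event shape; the real event names and payloads are not documented here.
interface DashboardEvent { type: string; payload: unknown; }

// One text/event-stream frame: an "event:" line, a "data:" line, then a blank line.
// JSON.stringify escapes newlines, so the payload always fits on one data line.
function formatSseEvent(event: DashboardEvent): string {
  const data = JSON.stringify(event.payload ?? null);
  return `event: ${event.type}\ndata: ${data}\n\n`;
}

type Writer = (chunk: string) => void;

// Keeps track of connected clients and sends each event to all of them.
class SseHub {
  private clients = new Set<Writer>();

  // Registers a client and returns a function that removes it again.
  add(writer: Writer): () => void {
    this.clients.add(writer);
    return () => {
      this.clients.delete(writer);
    };
  }

  broadcast(event: DashboardEvent): void {
    const frame = formatSseEvent(event);
    this.clients.forEach((write) => write(frame));
  }
}
```

In an Express handler for /events, the writer would typically wrap `res.write` after setting `Content-Type: text/event-stream`, and the returned function would be called when the request closes.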
Trigger Scan — Fires the full pipeline immediately instead of waiting for the next 60s cycle. Hits POST /api/trigger.
| Endpoint | Method | What |
|---|---|---|
| /health | GET | Health check |
| /dashboard | GET | Dashboard UI |
| /events | GET | SSE stream |
| /api/incidents | GET | All incidents |
| /api/incidents/:id | GET | Single incident |
| /api/activity | GET | Last 100 log entries |
| /api/graph/stats | GET | Node counts by type |
| /api/graph | GET | Full graph (nodes + edges) |
| /api/trigger | POST | Run detection cycle now |
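
As a usage illustration, a scan can be triggered from a script by sending a POST to /api/trigger. The sketch below is an assumption-laden example: the `triggerScan` helper and `FetchLike` type are hypothetical, and because the response body is not documented here, only the HTTP status is returned. The fetch function is passed in so the example does not depend on a particular runtime.

```typescript
// Minimal subset of the fetch API that this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string }
) => Promise<{ ok: boolean; status: number }>;

async function triggerScan(baseUrl: string, fetchImpl: FetchLike): Promise<number> {
  // Strip trailing slashes so the path is not doubled up.
  const url = `${baseUrl.replace(/\/+$/, "")}/api/trigger`;
  const res = await fetchImpl(url, { method: "POST" });
  if (!res.ok) {
    throw new Error(`Trigger failed with HTTP ${res.status}`);
  }
  return res.status;
}
```

On Node 18 or later, the built-in `fetch` can be passed directly, for example `triggerScan("http://localhost:3000", fetch)` against a local dev server.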
git clone https://github.com/arhrid/ghost_operator.git
cd ghost_operator
npm install
cp .env.example .env # fill in your API keys
For Neo4j, create a free sandbox at sandbox.neo4j.com and drop the Bolt URL + credentials into .env. No schema setup needed: nodes are created on first run.
npm run dev # development
npm run build && npm start # production
Dashboard at http://localhost:3000/dashboard.
A render.yaml Blueprint is included, so Render can pick up the build and start commands automatically.
- Fork or push this repo to your own GitHub account
- Go to Render Dashboard → New → Blueprint
- Connect your GitHub repo. Render will detect render.yaml
- In the Render dashboard, set these environment variables for the service:
  | Variable | Where to get it |
  |---|---|
  | TAVILY_API_KEY | tavily.com (free tier available) |
  | YUTORI_API_KEY | yutori.com |
  | NEO4J_URI | sandbox.neo4j.com (use the Bolt URL) |
  | NEO4J_USER | Defaults to neo4j on Sandbox |
  | NEO4J_PASSWORD | From your Neo4j Sandbox instance |
  | RENDER_API_KEY | Render dashboard → Account Settings → API Keys |
  | SENSO_API_KEY | senso.ai |
  | SENSO_ORGANIZATION_ID | From your Senso dashboard |
- Click Apply. Render will run npm install && npm run build, then start the server
- Your dashboard will be live at https://<your-service-name>.onrender.com/dashboard
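
A missing variable is easier to diagnose at startup than in the middle of an incident. The sketch below shows one way the variables listed above could be validated. It is not the contents of src/config.ts: the `loadConfig` name is hypothetical, and treating every variable except NEO4J_USER as required is an assumption.

```typescript
// Variable names are taken from the deployment steps above.
// Assumption: all of them are required except NEO4J_USER.
const REQUIRED = [
  "TAVILY_API_KEY",
  "YUTORI_API_KEY",
  "NEO4J_URI",
  "NEO4J_PASSWORD",
  "RENDER_API_KEY",
  "SENSO_API_KEY",
  "SENSO_ORGANIZATION_ID",
] as const;

type RequiredKey = (typeof REQUIRED)[number];
type Env = { [key: string]: string | undefined };
type Config = Record<RequiredKey, string> & { NEO4J_USER: string };

function loadConfig(env: Env): Config {
  // Report every missing variable at once instead of failing on the first.
  const missing = REQUIRED.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // NEO4J_USER falls back to "neo4j", the default user on Neo4j Sandbox.
  const config = { NEO4J_USER: env["NEO4J_USER"] || "neo4j" } as Config;
  for (const key of REQUIRED) {
    config[key] = env[key] as string;
  }
  return config;
}
```

In the app this would be called once at boot with `process.env`, so a misconfigured deploy fails immediately with a message naming the missing keys.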
src/
├── index.ts              # Orchestrator, Express server, cron, SSE
├── config.ts             # Env var loading
├── types/index.ts        # Shared types
├── agents/
│   ├── detector.ts       # Tavily + Yutori + Render health
│   ├── analyzer.ts       # Entity extraction, severity, root cause
│   ├── remediator.ts     # Past-incident lookup + Render actions
│   └── reporter.ts       # Post-mortem generation + storage
├── services/
│   ├── tavily.ts         # Tavily search client
│   ├── yutori.ts         # Yutori scouting client
│   ├── neo4j.ts          # Neo4j graph client
│   ├── render.ts         # Render API client
│   └── senso.ts          # Senso memory client
└── dashboard/
    └── index.html        # Single-page dashboard
TypeScript, Node.js, Express, Neo4j Driver, Axios, node-cron.
Acknowledgements: Built with Claude Code & Antigravity
