Inspiration
Every on-call engineer knows the 3 AM page for an outage they already fixed last week. We wanted an agent that heals infrastructure autonomously and gets smarter every time it does.
What it does
Ghost Operator runs a 60-second loop that detects outages, analyzes root causes, remediates by restarting or scaling services, validates the fix, and then writes a post-mortem that is stored in a Neo4j knowledge graph. Before every action it consults past incidents, so it escalates when simple fixes have historically failed. A real-time dashboard and an interactive 3D/2D knowledge graph let users watch every decision and explore the relationships between incidents, services, and remediations.
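The loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `LoopAgents` interface, the `Outage` and `CycleOutcome` types, and the `runCycle` function are all hypothetical names chosen for the example.

```typescript
// Hypothetical shapes; the project's real agent interfaces are not shown in this write-up.
type Outage = { service: string; error: string };
type Fix = "restart" | "scale" | "escalate";

interface LoopAgents {
  detect(): Promise<Outage | null>;
  analyze(outage: Outage): Promise<string>; // resolves to a root-cause label
  remediate(outage: Outage, cause: string): Promise<Fix>;
  validate(outage: Outage): Promise<boolean>;
  report(outage: Outage, cause: string, fix: Fix, healed: boolean): Promise<void>;
}

type CycleOutcome =
  | { status: "healthy" }
  | { status: "handled"; fix: Fix; healed: boolean };

// One pass of the loop: detect -> analyze -> remediate -> validate -> report.
async function runCycle(agents: LoopAgents): Promise<CycleOutcome> {
  const outage = await agents.detect();
  if (outage === null) return { status: "healthy" };

  const cause = await agents.analyze(outage);
  const fix = await agents.remediate(outage, cause);
  const healed = await agents.validate(outage);
  // The post-mortem is written whether or not the fix worked, so failures are remembered too.
  await agents.report(outage, cause, fix, healed);
  return { status: "handled", fix, healed };
}
```

A scheduler would then invoke `runCycle` once per interval (60 seconds in Ghost Operator's case), for example from a cron job or a timer.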
How we built it
TypeScript/Node.js with four specialized agents (Detector, Analyzer, Remediator, Reporter) orchestrated by a cron pipeline, backed by Neo4j for the knowledge graph, Tavily and Yutori for outage detection, Senso for post-mortem memory, and Render as both host and remediation target. The dashboard uses Server-Sent Events for live updates and 3d-force-graph with Three.js for the interactive visualization.
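As a rough illustration of the live-update path, the sketch below serializes one pipeline update into a Server-Sent Events frame. The `StageUpdate` shape and the `toSseFrame` helper are assumptions made for this example; only the wire format (an `event:` line, a `data:` line, and a terminating blank line) comes from the SSE standard.

```typescript
// Hypothetical payload for one dashboard update.
interface StageUpdate {
  stage: "detect" | "analyze" | "remediate" | "validate" | "report";
  service: string;
  detail: string;
}

// Serialize one update as an SSE frame: "event:" line, "data:" line, blank-line terminator.
function toSseFrame(eventName: string, payload: StageUpdate): string {
  // JSON.stringify without an indent argument emits no raw newlines,
  // so the whole payload fits on a single "data:" line.
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

On the server, frames like this would be written to a response whose `Content-Type` is `text/event-stream`; in the browser, an `EventSource` listener registered for the same event name would receive the parsed `data` string.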
Challenges we ran into
Our 3D knowledge graph rendered completely empty because spreading Neo4j node properties silently overwrote the element IDs, which broke all 135 edges without raising a single error. Keeping the pipeline running when any of the five external services fails required defensive error handling at every integration point.
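The sketch below reproduces the kind of bug described above, assuming the stored nodes carried their own `id` property (the write-up does not say which property collided). `DriverNode` is a stand-in for a Neo4j driver node, and `toVizNodeBuggy`, `toVizNode`, and `countResolvableLinks` are hypothetical helpers written for this example.

```typescript
// Minimal stand-in for a Neo4j driver node: an element ID plus user-defined properties.
interface DriverNode {
  elementId: string;
  properties: Record<string, unknown>;
}

type VizNode = Record<string, unknown>;
interface VizLink { source: string; target: string }

// Buggy: the spread runs last, so a stored `id` property replaces the element ID.
function toVizNodeBuggy(node: DriverNode): VizNode {
  return { id: node.elementId, ...node.properties };
}

// Fixed: spread first, then assign the element ID so it always wins.
function toVizNode(node: DriverNode): VizNode {
  return { ...node.properties, id: node.elementId };
}

// Links reference element IDs, so they only resolve if node IDs survived the mapping.
function countResolvableLinks(nodes: VizNode[], links: VizLink[]): number {
  const ids = new Set(nodes.map((n) => n["id"]));
  return links.filter((l) => ids.has(l.source) && ids.has(l.target)).length;
}
```

With the buggy mapping, every link points at an ID that no node carries, so nothing resolves and no exception is thrown. Putting the spread first is a one-line fix that makes the override order explicit.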
Accomplishments that we're proud of
The feedback loop works: the remediator queries Neo4j history, Senso post-mortems, and Tavily guidance before every decision, so its responses improve as incident history accumulates. The simulation system demos a full incident lifecycle, from detection through post-mortem, in under 30 seconds.
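One way such a history-informed decision could work is sketched below. It is an illustration under stated assumptions, not the project's actual logic: the `PastAttempt` record, the `chooseRemedy` function, and the `minFailures` threshold are all hypothetical.

```typescript
// Hypothetical record of one past remediation attempt, as it might be read back from the graph.
interface PastAttempt {
  service: string;
  errorPattern: string;
  remedy: "restart" | "scale";
  succeeded: boolean;
}

type Decision = "restart" | "scale" | "escalate";

// Try remedies from least to most invasive, skipping any that has repeatedly failed
// for the same service and error pattern. minFailures is an illustrative threshold.
function chooseRemedy(
  history: PastAttempt[],
  service: string,
  errorPattern: string,
  minFailures = 2,
): Decision {
  const relevant = history.filter(
    (a) => a.service === service && a.errorPattern === errorPattern,
  );
  for (const remedy of ["restart", "scale"] as const) {
    const attempts = relevant.filter((a) => a.remedy === remedy);
    const failures = attempts.filter((a) => !a.succeeded).length;
    const everWorked = attempts.some((a) => a.succeeded);
    // Keep a remedy if it has worked before or has not failed often enough to rule it out.
    if (everWorked || failures < minFailures) return remedy;
  }
  return "escalate";
}
```

With an empty history the function falls back to a restart; once both restarts and scaling have failed often enough for the same pattern, it escalates instead of retrying.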
What we learned
Knowledge graphs naturally encode the institutional knowledge of incident response (the relationships between services, errors, and remediations) in ways flat databases cannot. Real-time visualization isn't just a demo feature; it's how you build trust that an autonomous agent is making sound decisions.
What's next for Ghost Operator
Multi-tenant support for teams to connect their own cloud providers, plus an LLM-powered analyzer replacing pattern matching with true root cause reasoning. Slack and PagerDuty integrations so Ghost Operator escalates to humans when it recognizes it's out of its depth.
Deployment: https://ghost-operator.onrender.com/dashboard/
Built With
- html
- neo4j
- render
- senso.ai
- tavily
- typescript
- yutori