Ghost Operator

An autonomous infrastructure agent. It detects incidents, figures out what broke, fixes it, and writes a post-mortem — then uses that post-mortem to handle the next incident better.

In Simple Terms

Your app is running on cloud servers. One of them crashes at 3 AM. Normally, an engineer gets paged, wakes up, SSHes in, pokes around the logs, restarts the service, maybe scales it up, and then spends 30 minutes writing an incident report.

Ghost Operator replaces that entire workflow. It's constantly scanning — checking your Render services, searching the web for outage reports, monitoring status pages and community forums. When it picks up a signal that something is wrong, it kicks off a pipeline:

It figures out what broke — which service, what error codes showed up, how bad it is.
It checks its memory: "Have I seen this before? What fixed it last time?"
It takes action — restarts the service, scales it up, whatever the situation calls for.
It writes a post-mortem and saves it, so next time it has more context.

The key thing: it gets better over time. Every incident becomes a node in a knowledge graph. Every post-mortem becomes searchable memory. By incident #50, the agent has 49 past incidents and their post-mortems to consult before it acts; at incident #1 it had none.

Architecture

┌────────────────────────────────────────────────────────────────┐
│                         GHOST OPERATOR                         │
│                                                                │
│  ┌──────────┐   ┌──────────┐   ┌────────────┐   ┌───────────┐  │
│  │  DETECT  │──>│ ANALYZE  │──>│ KNOWLEDGE  │──>│    ACT    │  │
│  │          │   │          │   │            │   │           │  │
│  │ Yutori   │   │ Entity   │   │ Neo4j      │   │ Render    │  │
│  │ Tavily   │   │ Severity │   │ Senso      │   │ Restart   │  │
│  │ Render   │   │ Classify │   │            │   │ Scale     │  │
│  └──────────┘   └──────────┘   └────────────┘   └───────────┘  │
│        │                                              │        │
│        └─────────────── 60-second cycle ──────────────┘        │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  REST API  ·  SSE Stream  ·  Dashboard  ·  Health Check  │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘

Pipeline

Every 60 seconds (or on-demand via Trigger Scan):

Detect — Query Render health API, search Tavily for outage reports, poll Yutori scouts
Analyze — Extract service names + error codes, classify severity, infer root cause
Remediate — Check Neo4j/Senso for past similar incidents, then act via Render API (restart, scale, resume)
Report — Generate post-mortem, store in Neo4j graph + Senso memory, broadcast to dashboard

If no signals are found, the cycle exits early.
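The cycle above can be sketched as a single async function with the four stages injected, which makes the early exit easy to exercise without any API keys. This is a minimal sketch, not the code in src/index.ts: the `Signal` and `Analysis` shapes, the `Stages` interface, and the overlap guard are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real ones live in src/types/index.ts and may differ.
interface Signal {
  source: "render" | "tavily" | "yutori";
  text: string;
}

interface Analysis {
  severity: "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
  services: string[];
  errors: string[];
}

// Each pipeline stage is injected so the orchestration can be tested in isolation.
interface Stages {
  detect(): Promise<Signal[]>;
  analyze(signals: Signal[]): Promise<Analysis>;
  remediate(analysis: Analysis): Promise<string[]>; // names of actions taken
  report(analysis: Analysis, actions: string[]): Promise<void>;
}

type CycleResult =
  | { status: "no-signals" }
  | { status: "handled"; analysis: Analysis; actions: string[] };

// One pass of Detect -> Analyze -> Remediate -> Report.
// Returns immediately when detection finds nothing, so later stages never run.
async function runCycle(stages: Stages): Promise<CycleResult> {
  const signals = await stages.detect();
  if (signals.length === 0) {
    return { status: "no-signals" };
  }
  const analysis = await stages.analyze(signals);
  const actions = await stages.remediate(analysis);
  await stages.report(analysis, actions);
  return { status: "handled", analysis, actions };
}

// Assumed guard: a scheduled tick and a manual trigger must not run cycles concurrently.
let running = false;

async function runGuarded(stages: Stages): Promise<CycleResult | "skipped"> {
  if (running) return "skipped";
  running = true;
  try {
    return await runCycle(stages);
  } finally {
    running = false;
  }
}
```

To run it on the 60-second cadence, `setInterval(() => void runGuarded(stages), 60_000)` is enough; node-cron's `cron.schedule("* * * * *", ...)` also fires once a minute. The guard exists because a Trigger Scan can arrive while a scheduled cycle is still in flight.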

Service Integrations

Tool	What it does here
Tavily	Search API that scans the web for live outage reports, status page changes, and incident-related news
Yutori	Deploys persistent monitoring agents ("scouts") that watch Reddit, Hacker News, and service status pages around the clock for early warning signals
Neo4j	Graph database that links every incident to its affected services, errors, root causes, and past fixes — so the agent can look up what worked before
Render	Cloud platform that both hosts Ghost Operator and is the remediation target — the agent restarts, scales, or resumes services through its API
Senso.ai	Stores every post-mortem as searchable memory, so the agent can recall past incidents and their resolutions when handling new ones
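On the Render side, remediation comes down to a few REST calls. Render's public API exposes per-service `restart`, `resume`, and `scale` endpoints under `https://api.render.com/v1/services/{serviceId}/`, authenticated with a Bearer API key; verify the paths and the `numInstances` payload against Render's API reference before relying on them. The helper below is hypothetical (it is not the contents of src/services/render.ts) and only builds requests, which keeps it testable without a live key.

```typescript
// The three remediation actions named in the pipeline description.
type RenderAction =
  | { kind: "restart"; serviceId: string }
  | { kind: "resume"; serviceId: string }
  | { kind: "scale"; serviceId: string; numInstances: number };

interface RenderRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

const RENDER_API = "https://api.render.com/v1";

// Builds, but does not send, the HTTP request for one remediation action.
function buildRenderRequest(action: RenderAction, apiKey: string): RenderRequest {
  const base = `${RENDER_API}/services/${encodeURIComponent(action.serviceId)}`;
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
    Accept: "application/json",
  };

  if (action.kind === "scale") {
    // Reject nonsense counts before they reach the API.
    if (!Number.isInteger(action.numInstances) || action.numInstances < 1) {
      throw new Error(`invalid instance count: ${action.numInstances}`);
    }
    return {
      method: "POST",
      url: `${base}/scale`,
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify({ numInstances: action.numInstances }),
    };
  }

  // restart and resume carry no request body.
  return { method: "POST", url: `${base}/${action.kind}`, headers };
}
```

Sending one is a plain `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` on Node 18 or later.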

Knowledge Graph

                                 ┌──────────┐
               AFFECTS──────────>│ Service  │
               │                 └──────────┘
 ┌──────────┐──┤
 │ Incident │  │HAS_ERROR───────>┌──────────┐
 └──────────┘  │                 │  Error   │
               │                 └──────────┘
               │CAUSED_BY───────>┌───────────┐
               │                 │ RootCause │
               │                 └───────────┘
               │REMEDIATED_BY───>┌─────────────┐
               │                 │ Remediation │
               │                 └─────────────┘
               │HAS_POSTMORTEM──>┌────────────┐
               └                 │ PostMortem │
                                 └────────────┘
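One way to persist an incident with exactly these labels and relationship types is a single Cypher statement built from MERGE and FOREACH. This is a sketch under assumptions: the node property names (`id`, `name`, `code`, `description`, `action`, `incidentId`, `text`) are hypothetical, and the actual queries in src/services/neo4j.ts may differ. MERGE is the important choice: a Service or Error node is shared by every incident that touches it, which is what gives later lookups something to traverse.

```typescript
// Hypothetical incident record; property names are illustrative only.
interface IncidentRecord {
  id: string;
  services: string[];
  errors: string[];
  rootCause: string;
  remediations: string[];
  postMortem: string;
}

interface CypherStatement {
  query: string;
  params: Record<string, unknown>;
}

// Labels and relationship types follow the knowledge-graph diagram.
// FOREACH (rather than UNWIND) keeps the statement working when a list is empty.
function buildIncidentStatement(rec: IncidentRecord): CypherStatement {
  const query = `
MERGE (i:Incident {id: $id})
FOREACH (svc IN $services |
  MERGE (s:Service {name: svc})
  MERGE (i)-[:AFFECTS]->(s))
FOREACH (code IN $errors |
  MERGE (e:Error {code: code})
  MERGE (i)-[:HAS_ERROR]->(e))
MERGE (rc:RootCause {description: $rootCause})
MERGE (i)-[:CAUSED_BY]->(rc)
FOREACH (act IN $remediations |
  MERGE (r:Remediation {action: act})
  MERGE (i)-[:REMEDIATED_BY]->(r))
MERGE (pm:PostMortem {incidentId: $id})
SET pm.text = $postMortem
MERGE (i)-[:HAS_POSTMORTEM]->(pm)`;

  // Values travel as parameters, never by string interpolation into the query.
  return {
    query,
    params: {
      id: rec.id,
      services: rec.services,
      errors: rec.errors,
      rootCause: rec.rootCause,
      remediations: rec.remediations,
      postMortem: rec.postMortem,
    },
  };
}
```

With the official JavaScript driver, the statement can be executed as `session.run(stmt.query, stmt.params)`. UNWIND over an empty list yields zero rows and would silently skip every clause after it, which is why FOREACH is used for the list-valued relationships.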

The remediator queries this graph before acting. Past incidents with similar services/errors surface what worked before.
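The README doesn't say how "similar" is scored, so the sketch below uses Jaccard overlap on service names and error codes as a stand-in. The `PastIncident` shape, the `k` and `minScore` defaults, and the restriction to resolved incidents are all assumptions. It operates on plain arrays, roughly what the results of a graph query might look like once loaded.

```typescript
// Hypothetical view of a past incident as returned by a graph lookup.
interface PastIncident {
  id: string;
  services: string[];
  errors: string[];
  remediations: string[];
  resolved: boolean;
}

interface Match {
  incident: PastIncident;
  score: number;
}

// Prefixing keeps a service called "render" distinct from an error code "render".
function tags(services: string[], errors: string[]): Set<string> {
  const t = new Set<string>();
  services.forEach((s) => t.add(`service:${s}`));
  errors.forEach((e) => t.add(`error:${e}`));
  return t;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, from 0 (nothing shared) to 1 (identical).
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((x) => {
    if (b.has(x)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Rank resolved past incidents by overlap; keep the top k at or above minScore.
function findSimilar(
  current: { services: string[]; errors: string[] },
  past: PastIncident[],
  k = 3,
  minScore = 0.3,
): Match[] {
  const cur = tags(current.services, current.errors);
  return past
    .filter((p) => p.resolved)
    .map((p) => ({ incident: p, score: jaccard(cur, tags(p.services, p.errors)) }))
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Candidate actions, most similar incident first, without duplicates.
function suggestRemediations(matches: Match[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  matches.forEach((m) =>
    m.incident.remediations.forEach((r) => {
      if (!seen.has(r)) {
        seen.add(r);
        out.push(r);
      }
    }),
  );
  return out;
}
```

With one stored, resolved record for ghost-operator-api with error OOM and remediations `["restart", "scale:2"]`, a new incident with the same service and error scores 1 and gets both actions back in that order.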

Example Incident

Here's what it looks like when Ghost Operator handles a real incident end-to-end:

Scenario: A Render-hosted API service crashed due to memory exhaustion (OOM). Ghost Operator detected the suspended service, identified the root cause, restarted it, scaled it to 2 instances for resilience, and wrote a post-mortem, all within 15 seconds and with zero human intervention.

Incident Timeline
─────────────────
10:34:22  [DETECT]     Render health check: ghost-operator-api is suspended
10:34:23  [DETECT]     Tavily: "Render API degraded performance" reported
10:34:25  [ANALYZE]    Severity: CRITICAL | Services: render | Error: OOM
10:34:25  [ANALYZE]    Root cause: Memory exhaustion
10:34:28  [REMEDIATE]  Restarted ghost-operator-api → Success
10:34:31  [REMEDIATE]  Scaled ghost-operator-api to 2 instances → Success
10:34:35  [REPORT]     Post-mortem generated and stored in Neo4j + Senso
10:34:35  [SYSTEM]     Cycle complete — service recovered

The post-mortem from this incident is now stored in the knowledge graph. Next time a similar OOM event occurs, the remediator will find this record and already know that restart + scale worked.
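The ANALYZE lines in the timeline (severity, services, error code, root cause) can be approximated with simple text rules. The sketch below is not the logic in src/agents/analyzer.ts: the keyword patterns, the severity ladder, the `knownServices` list, and the root-cause table are hypothetical, and only the OOM to "Memory exhaustion" mapping comes from the example above.

```typescript
type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

interface SignalAnalysis {
  severity: Severity;
  services: string[];
  errors: string[];
  rootCause: string;
}

// Hypothetical error-code -> root-cause table; only OOM is taken from the example incident.
const ROOT_CAUSES: Record<string, string> = {
  OOM: "Memory exhaustion",
};

function analyzeSignals(signals: string[], knownServices: string[]): SignalAnalysis {
  const text = signals.join("\n");
  const lower = text.toLowerCase();

  // Services: every known name that appears anywhere in the signal text.
  const services = knownServices.filter((name) => lower.indexOf(name.toLowerCase()) !== -1);

  // Errors: OOM keywords plus any 5xx-looking status code, de-duplicated.
  const errors: string[] = [];
  if (/\b(oom|out of memory)\b/i.test(text)) errors.push("OOM");
  const httpCodes: string[] = text.match(/\b5\d\d\b/g) || [];
  httpCodes.forEach((code) => {
    const tag = `HTTP_${code}`;
    if (errors.indexOf(tag) === -1) errors.push(tag);
  });

  // Severity only escalates: each later rule overrides the earlier, lower ones.
  let severity: Severity = "LOW";
  if (/\bdegraded\b/i.test(text)) severity = "MEDIUM";
  if (httpCodes.length > 0) severity = "HIGH";
  if (errors.indexOf("OOM") !== -1 || /\b(suspended|crashed|down)\b/i.test(text)) severity = "CRITICAL";

  // Root cause: the first error code with a known explanation.
  const explained = errors.filter((e) => e in ROOT_CAUSES);
  const rootCause = explained.length > 0 ? ROOT_CAUSES[explained[0]] : "Unknown";

  return { severity, services, errors, rootCause };
}
```

Keyword rules like these misfire easily (any three-digit number starting with 5 counts as a 5xx here), so treat the sketch as a baseline to replace rather than a classifier to trust.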

Dashboard

Live at /dashboard. Real-time health stats, incident feed (color-coded by severity), agent activity log, and knowledge graph counts — all updated via SSE.

Trigger Scan — Fires the full pipeline immediately instead of waiting for the next 60s cycle. Hits POST /api/trigger.
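Live updates to the dashboard rely on Server-Sent Events, a plain-text `text/event-stream` format in which each event is a few `event:`/`data:` lines followed by a blank line. A minimal broadcast hub might look like this; `SseHub`, `SseClient`, and the event names are hypothetical, and the `/events` handler in src/index.ts may be structured differently.

```typescript
// Anything with a write(chunk) method, e.g. a Node http.ServerResponse or an Express Response.
interface SseClient {
  write(chunk: string): unknown;
}

// Serialises one event. JSON.stringify without indentation never emits a raw newline,
// so a single "data:" line is enough; the trailing blank line ends the event.
function formatSseEvent(event: string, payload: unknown): string {
  if (/[\r\n]/.test(event)) {
    throw new Error("event name must be a single line");
  }
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

class SseHub {
  private clients = new Set<SseClient>();

  // Registers a client and returns a function that unregisters it.
  add(client: SseClient): () => void {
    this.clients.add(client);
    return () => {
      this.clients.delete(client);
    };
  }

  // Writes one frame to every connected client; returns how many received it.
  broadcast(event: string, payload: unknown): number {
    const frame = formatSseEvent(event, payload);
    this.clients.forEach((c) => c.write(frame));
    return this.clients.size;
  }
}
```

In an Express handler for `/events`, set `Content-Type: text/event-stream` and `Cache-Control: no-cache`, call `hub.add(res)`, and pass the returned function to `req.on("close", ...)` so closed tabs are dropped. Because every frame carries a named `event:` field, the browser has to subscribe with `source.addEventListener("incident", ...)`; `onmessage` only fires for unnamed events.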

API

Endpoint	Method	Description
`/health`	GET	Health check
`/dashboard`	GET	Dashboard UI
`/events`	GET	SSE stream
`/api/incidents`	GET	All incidents
`/api/incidents/:id`	GET	Single incident
`/api/activity`	GET	Last 100 log entries
`/api/graph/stats`	GET	Node counts by type
`/api/graph`	GET	Full graph (nodes + edges)
`/api/trigger`	POST	Run detection cycle now
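`/api/activity` returns the last 100 log entries, which suggests a fixed-size buffer rather than an ever-growing array. A ring buffer is one way to get that; the `ActivityEntry` shape and the `ActivityLog` class below are assumptions for illustration.

```typescript
// Hypothetical log entry; stage names mirror the tags in the example timeline.
interface ActivityEntry {
  timestamp: string; // ISO 8601
  stage: "DETECT" | "ANALYZE" | "REMEDIATE" | "REPORT" | "SYSTEM";
  message: string;
}

// Fixed-capacity log: once full, each push overwrites the oldest entry.
class ActivityLog {
  private readonly capacity: number;
  private buf: ActivityEntry[] = [];
  private next = 0; // slot the next push writes to once the buffer is full

  constructor(capacity = 100) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new Error("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  push(entry: ActivityEntry): void {
    if (this.buf.length < this.capacity) {
      this.buf.push(entry);
    } else {
      this.buf[this.next] = entry;
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Oldest first, the order a dashboard feed would render.
  entries(): ActivityEntry[] {
    if (this.buf.length < this.capacity) return this.buf.slice();
    return this.buf.slice(this.next).concat(this.buf.slice(0, this.next));
  }
}
```

A route could then be as small as `app.get("/api/activity", (_req, res) => res.json(log.entries()))`, with memory use bounded no matter how long the agent runs.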

Setup

git clone https://github.com/arhrid/ghost_operator.git
cd ghost_operator
npm install
cp .env.example .env   # fill in your API keys

For Neo4j, create a free sandbox at sandbox.neo4j.com and drop the Bolt URL + credentials into .env. No schema setup needed — nodes are created on first run.

npm run dev             # development
npm run build && npm start  # production

Dashboard at http://localhost:3000/dashboard.

Deploy to Render

A render.yaml Blueprint is included, so Render can pick up the build and start commands automatically.

Fork or push this repo to your own GitHub account
Go to Render Dashboard → New → Blueprint
Connect your GitHub repo — Render will detect render.yaml

In the Render dashboard, set these environment variables for the service:

Variable	Where to get it
`TAVILY_API_KEY`	tavily.com — free tier available
`YUTORI_API_KEY`	yutori.com
`NEO4J_URI`	sandbox.neo4j.com — use the Bolt URL
`NEO4J_USER`	Defaults to `neo4j` on Sandbox
`NEO4J_PASSWORD`	From your Neo4j Sandbox instance
`RENDER_API_KEY`	Render dashboard → Account Settings → API Keys
`SENSO_API_KEY`	senso.ai
`SENSO_ORGANIZATION_ID`	From your Senso dashboard

Click Apply — Render will run npm install && npm run build, then start the server
Your dashboard will be live at https://<your-service-name>.onrender.com/dashboard
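The variables in the table above map naturally onto a typed config object. This sketch of what a loader like src/config.ts could do is hypothetical: treating every variable except `NEO4J_USER` as required, falling back to `neo4j` for the user, and checking the URI scheme are all assumptions. It takes the environment as a parameter so it can be tested without touching `process.env`.

```typescript
type Env = Record<string, string | undefined>;

interface Config {
  tavilyApiKey: string;
  yutoriApiKey: string;
  renderApiKey: string;
  neo4j: { uri: string; user: string; password: string };
  senso: { apiKey: string; organizationId: string };
}

// Variable names from the table; the required/optional split is an assumption.
const REQUIRED_VARS = [
  "TAVILY_API_KEY",
  "YUTORI_API_KEY",
  "NEO4J_URI",
  "NEO4J_PASSWORD",
  "RENDER_API_KEY",
  "SENSO_API_KEY",
  "SENSO_ORGANIZATION_ID",
];

// URI schemes accepted by the official Neo4j drivers: bolt or neo4j, optionally +s or +ssc.
const NEO4J_URI_PATTERN = /^(bolt|neo4j)(\+s|\+ssc)?:\/\//;

function loadConfig(env: Env): Config {
  const get = (name: string): string => (env[name] || "").trim();

  // Report every missing variable at once instead of failing on the first.
  const missing = REQUIRED_VARS.filter((name) => get(name) === "");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  const uri = get("NEO4J_URI");
  if (!NEO4J_URI_PATTERN.test(uri)) {
    throw new Error(`NEO4J_URI should be a Bolt/Neo4j URL, got: ${uri}`);
  }

  return {
    tavilyApiKey: get("TAVILY_API_KEY"),
    yutoriApiKey: get("YUTORI_API_KEY"),
    renderApiKey: get("RENDER_API_KEY"),
    // Sandbox instances use "neo4j" as the username, so fall back to it when unset.
    neo4j: { uri, user: get("NEO4J_USER") || "neo4j", password: get("NEO4J_PASSWORD") },
    senso: { apiKey: get("SENSO_API_KEY"), organizationId: get("SENSO_ORGANIZATION_ID") },
  };
}
```

At startup, `loadConfig(process.env)` fails fast with the full list of missing keys, which is easier to act on after a fresh Render deploy than a failure on the first API call.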

Project Structure

src/
├── index.ts              # Orchestrator, Express server, cron, SSE
├── config.ts             # Env var loading
├── types/index.ts        # Shared types
├── agents/
│   ├── detector.ts       # Tavily + Yutori + Render health
│   ├── analyzer.ts       # Entity extraction, severity, root cause
│   ├── remediator.ts     # Past-incident lookup + Render actions
│   └── reporter.ts       # Post-mortem generation + storage
├── services/
│   ├── tavily.ts         # Tavily search client
│   ├── yutori.ts         # Yutori scouting client
│   ├── neo4j.ts          # Neo4j graph client
│   ├── render.ts         # Render API client
│   └── senso.ts          # Senso memory client
└── dashboard/
    └── index.html        # Single-page dashboard

Stack

TypeScript, Node.js, Express, Neo4j Driver, Axios, node-cron.

Acknowledgements: Built with Claude Code & Antigravity
