
🧠 GraphBrain

A true graph database backend for GBrain — Neo4j-powered, one-click provisioning, drop-in compatible.

GBrain is an incredible knowledge graph. But under the hood, it stores links as rows in a Postgres table and does graph traversal with recursive CTEs. GraphBrain replaces that with real index-free adjacency — every link is a native Neo4j relationship, and traversal is O(1) per hop. One POST provisions a brain. You get a unique URL and API key. All of GBrain's operations work exactly the same.


Why This Exists

GBrain (by Garry Tan) is the best personal knowledge graph out there. Its engine interface is clean and well-designed. But it runs on Postgres + pgvector — a relational database with a graph-shaped API on top. That works fine at small scale, but:

  • Traversal uses recursive CTEs (O(log n) per hop instead of O(1))
  • "How am I connected to X?" requires painful multi-joins
  • Community detection / PageRank are impractical in SQL
  • Batch link creation takes individual INSERTs (slow at scale)
  • Full-graph visualization requires loading everything into memory

GraphBrain solves all of these by implementing GBrain's exact engine interface on Neo4j — the industry-standard native graph database. Same API, real graph performance.

| Operation | Postgres (GBrain) | Neo4j (GraphBrain) |
|---|---|---|
| Traversal (depth 5) | Recursive CTE, ~50ms+ | Index-free adjacency, ~1ms |
| Shortest path between two nodes | Multi-join quagmire | shortestPath() — one call |
| Batch link creation | Sequential INSERTs | Sub-second (UNWIND) |
| Community detection | N/A | Native Louvain algorithm |
| PageRank | Painful SQL | gds.pageRank() built-in |
| Full-graph visualization | Load all into memory | Stream from native graph |
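
The shortest-path row above is a single Cypher call. As an illustration (this is a hypothetical query shape, not the repo's actual code), the question "how am I connected to X?" reduces to `shortestPath()` with a variable-length relationship pattern:

```typescript
// Illustrative only: builds the Cypher that answers "how am I connected to X?"
// shortestPath() is a built-in Cypher function; the same question in Postgres
// needs a recursive CTE with explicit cycle handling.
function shortestPathQuery(maxHops: number): string {
  return [
    "MATCH (a:Page {slug: $from}), (b:Page {slug: $to}),",
    `  p = shortestPath((a)-[:LINKS_TO*..${maxHops}]-(b))`,
    "RETURN [n IN nodes(p) | n.slug] AS path",
  ].join("\n");
}

const cypher = shortestPathQuery(10);
console.log(cypher);
```

The `$from` / `$to` parameters are bound at execution time, so the query text is cacheable by Neo4j's planner.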

Current Status — v0.1.0

What works:

  • ✅ Full GBrain engine interface (pages, links, traversal, search, stats, timeline, tags)
  • ✅ Brain provisioning — POST /v1/brains creates an isolated Neo4j database
  • ✅ Per-brain API keys with auth on all endpoints
  • ✅ GBrain adapter — drop-in engine that talks to GraphBrain REST API (25/25 integration tests pass)
  • ✅ Native Neo4j graph traversal (BFS up to depth 10, cycle prevention)
  • ✅ Batch link creation via Cypher UNWIND
  • ✅ Full-text search via Neo4j fulltext indexes
  • ✅ Database-level isolation (each brain = separate Neo4j database)
  • ✅ Single-DB mode for AuraDB free tier (property-level isolation via brain_id)
  • ✅ Cloudflare Tunnel deployment (public HTTPS, no port forwarding)
  • ✅ One-click server setup script (curl | bash)
  • ✅ Custom domain support (live at graphbrain.belweave.ai)

Live instance: https://graphbrain.belweave.ai — running on a home server via Cloudflare Tunnel.

What's next (see Roadmap below):

  • 🔲 Persistent API key storage (currently in-memory)
  • 🔲 Migration tool — export from Postgres GBrain, import to GraphBrain Neo4j
  • 🔲 Rate limiting
  • 🔲 Horizontal scaling (Neo4j read replicas, multiple GraphBrain instances)
  • 🔲 Web dashboard for brain management

Architecture

┌──────────┐     ┌─────────────────┐     ┌──────────┐
│  GBrain  │────▶│  GraphBrain API │────▶│  Neo4j   │
│  (CLI)   │     │  (Hono / Bun)   │     │  (5.x)   │
└──────────┘     └─────────────────┘     └──────────┘
                       │
            ┌──────────┴────────────┐
            │  Multi-DB isolation   │
            │  (production)         │
            │                       │
            │  POST /v1/brains      │
            │  → CREATE DATABASE    │
            │    brain_abc123       │
            │  → CREATE DATABASE    │
            │    brain_def456       │
            │                       │
            │  Each brain is a      │
            │  separate Neo4j       │
            │  database. No shared  │
            │  namespace. No query  │
            │  filtering. Real      │
            │  walled-off isolation.│
            └───────────────────────┘

Data Model

(:Page {slug, title, type, content, frontmatter})
    -[:LINKS_TO {type: "knows|messaged|works_at|invested_in|...", context, created_at}]->
(:Page)
    -[:HAS_TIMELINE]->
(:TimelineEntry {date, summary, detail, source})

Every GBrain concept maps cleanly: pages are nodes, links are typed relationships, timeline entries are connected nodes. No impedance mismatch.
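
In TypeScript terms, the model above could be typed like this (field names come from the data model shown; the interface names themselves are illustrative, not exported by the repo):

```typescript
// Shapes mirroring the data model above. Names like PageNode are illustrative.
interface PageNode {
  slug: string;
  title: string;
  type: string;                         // e.g. "person", "company"
  content?: string;
  frontmatter?: Record<string, unknown>;
}

interface LinksTo {
  type: string;                         // "knows" | "messaged" | "works_at" | ...
  context?: string;
  created_at?: string;
}

interface TimelineEntry {
  date: string;
  summary: string;
  detail?: string;
  source?: string;
}

const alice: PageNode = { slug: "alice", title: "Alice Chen", type: "person" };
const knows: LinksTo = { type: "knows", context: "Met at a conference" };
console.log(alice.slug, knows.type);
```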


Quick Start

Prerequisites

  • Bun (runs the API server)
  • Docker (runs Neo4j locally)

1. Clone and Install

git clone https://github.com/pkyanam/graphbrain
cd graphbrain
bun install

2. Start Neo4j

docker run --name graphbrain-neo4j -d \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/graphbrain-dev \
  neo4j:5-community

Wait ~30 seconds for Neo4j to boot (docker logs graphbrain-neo4j).

3. Start GraphBrain

bun run dev
# → http://localhost:3000

4. Create Your First Brain

curl -X POST http://localhost:3000/v1/brains \
  -H "Content-Type: application/json" \
  -d '{"name": "my-brain"}'

Response:

{
  "brain_id": "brain_a1b2c3d4",
  "name": "my-brain",
  "url": "http://localhost:3000/v1/brain_a1b2c3d4",
  "api_key": "sk_...",
  "endpoints": {
    "pages":     "http://localhost:3000/v1/brain_a1b2c3d4/pages",
    "links":     "http://localhost:3000/v1/brain_a1b2c3d4/links",
    "traverse":  "http://localhost:3000/v1/brain_a1b2c3d4/traverse",
    "graph":     "http://localhost:3000/v1/brain_a1b2c3d4/graph",
    "stats":     "http://localhost:3000/v1/brain_a1b2c3d4/stats",
    "search":    "http://localhost:3000/v1/brain_a1b2c3d4/search"
  }
}

5. Use It

# Save these for convenience
KEY="sk_..."
BRAIN="http://localhost:3000/v1/brain_a1b2c3d4"

# Create pages
curl -X PUT "$BRAIN/pages/alice" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $KEY" \
  -d '{"title":"Alice Chen","type":"person","content":"Software engineer and open source contributor."}'

curl -X PUT "$BRAIN/pages/bob" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $KEY" \
  -d '{"title":"Bob Smith","type":"person","content":"Designer and co-founder."}'

# Create a typed link — this is a real Neo4j relationship
curl -X POST "$BRAIN/links" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $KEY" \
  -d '{"from_slug":"alice","to_slug":"bob","link_type":"knows","context":"Met at a conference"}'

# Traverse the graph — native Neo4j BFS, not SQL CTEs
curl -X POST "$BRAIN/traverse" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $KEY" \
  -d '{"start_slug":"alice","depth":3,"direction":"out"}'

# Full-text search
curl "$BRAIN/search?q=engineer" -H "X-API-Key: $KEY"
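
The traverse call above runs as a native BFS inside Neo4j. Its semantics — bounded depth, each node visited at most once — are roughly what this in-memory sketch does (illustrative only, not the server's implementation):

```typescript
// In-memory sketch of the traversal semantics: BFS from a start slug,
// bounded depth, visited-set cycle prevention.
type Edge = { from: string; to: string };

function traverse(edges: Edge[], start: string, depth: number): string[] {
  const out = new Map<string, string[]>();
  for (const { from, to } of edges) {
    if (!out.has(from)) out.set(from, []);
    out.get(from)!.push(to);
  }
  const visited = new Set<string>([start]); // cycle prevention
  let frontier = [start];
  const reached: string[] = [];
  for (let hop = 0; hop < depth && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const slug of frontier) {
      for (const neighbor of out.get(slug) ?? []) {
        if (visited.has(neighbor)) continue;
        visited.add(neighbor);
        reached.push(neighbor);
        next.push(neighbor);
      }
    }
    frontier = next;
  }
  return reached;
}

// alice -> bob -> carol -> alice forms a cycle; traversal still terminates.
const edges: Edge[] = [
  { from: "alice", to: "bob" },
  { from: "bob", to: "carol" },
  { from: "carol", to: "alice" },
];
console.log(traverse(edges, "alice", 3)); // ["bob", "carol"]
```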

Run the Smoke Test

chmod +x scripts/smoke-test.sh
./scripts/smoke-test.sh

GBrain Integration

GraphBrain implements the exact same engine interface as GBrain's src/core/engine.ts. To use it as GBrain's backend, drop in the adapter:

import { GraphBrainEngine } from "graphbrain/src/adapter";

const engine = new GraphBrainEngine({
  url: "https://your-graphbrain.example.com/v1/brain_abc123",
  apiKey: "sk_..."
});

// All existing GBrain code works unchanged:
await engine.putPage(brainId, { slug: "alice", title: "Alice Chen", type: "person" });
await engine.addLink(brainId, { from_slug: "alice", to_slug: "bob", link_type: "knows" });
await engine.traverseGraph(brainId, "alice", 3, "out");

The adapter handles all HTTP communication — your GBrain CLI, MCP server, and cron jobs don't change. Full integration test suite: 25/25 passing.


API Reference

All brain endpoints require authentication via X-API-Key: sk_... header (or Authorization: Bearer sk_...).
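
A minimal client-side helper for attaching the key — hypothetical, not part of the repo, covering both accepted header forms:

```typescript
// Builds request headers for a brain endpoint. Illustrative helper only;
// it mirrors the two auth forms described above (X-API-Key or Bearer).
function authHeaders(apiKey: string, bearer = false): Record<string, string> {
  return bearer
    ? { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` }
    : { "Content-Type": "application/json", "X-API-Key": apiKey };
}

// Usage: fetch(`${BRAIN}/stats`, { headers: authHeaders("sk_...") })
console.log(authHeaders("sk_test")["X-API-Key"]); // sk_test
```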

Provisioning

| Method | Path | Description |
|---|---|---|
| POST | /v1/brains | Create a new brain |
| GET | /v1/brains | List all brains |
| DELETE | /v1/brains/:brainId | Delete a brain |

Pages

| Method | Path | Description |
|---|---|---|
| PUT | /v1/:brainId/pages/:slug | Create or update a page |
| GET | /v1/:brainId/pages/:slug | Get a page by slug |
| DELETE | /v1/:brainId/pages/:slug | Delete a page |
| GET | /v1/:brainId/pages | List pages (?type=person&limit=50&offset=0) |

Put page:

{
  "title": "Alice Chen",
  "type": "person",
  "content": "Software engineer and open source contributor.",
  "frontmatter": { "tags": ["engineering"] }
}

Links (Graph Edges)

| Method | Path | Description |
|---|---|---|
| POST | /v1/:brainId/links | Create a typed edge |
| POST | /v1/:brainId/links/batch | Batch create edges (UNWIND — sub-second for 1K+) |
| DELETE | /v1/:brainId/links | Remove an edge |
| GET | /v1/:brainId/links/:slug | Get outgoing edges from a page |
| GET | /v1/:brainId/backlinks/:slug | Get incoming edges to a page |

Create link:

{
  "from_slug": "alice",
  "to_slug": "bob",
  "link_type": "knows",
  "context": "Met at ReactConf 2025"
}

Batch create:

{
  "links": [
    { "from_slug": "alice", "to_slug": "bob", "link_type": "knows" },
    { "from_slug": "alice", "to_slug": "carol", "link_type": "works_at" },
    { "from_slug": "bob", "to_slug": "carol", "link_type": "knows" }
  ]
}
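
Server-side, a batch like this typically becomes a single UNWIND statement: one round trip instead of one INSERT per link. A sketch of what that Cypher could look like (illustrative, not the repo's exact query):

```typescript
// One Cypher statement creates every edge in the batch. The $links parameter
// carries the whole array; UNWIND expands it into one row per link.
type LinkInput = { from_slug: string; to_slug: string; link_type: string };

function batchLinkQuery(): string {
  return [
    "UNWIND $links AS link",
    "MATCH (a:Page {slug: link.from_slug}), (b:Page {slug: link.to_slug})",
    "CREATE (a)-[:LINKS_TO {type: link.link_type}]->(b)",
  ].join("\n");
}

const links: LinkInput[] = [
  { from_slug: "alice", to_slug: "bob", link_type: "knows" },
  { from_slug: "alice", to_slug: "carol", link_type: "works_at" },
];
// session.run(batchLinkQuery(), { links })  // single round trip for all edges
console.log(batchLinkQuery().startsWith("UNWIND")); // true
```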

Graph Operations

| Method | Path | Description |
|---|---|---|
| POST | /v1/:brainId/traverse | BFS graph traversal |
| GET | /v1/:brainId/graph | Full graph for visualization |
| GET | /v1/:brainId/orphans | Pages with no inbound links |

Traverse:

{
  "start_slug": "alice",
  "depth": 3,
  "direction": "out",
  "link_type": "knows"
}

Timeline

| Method | Path | Description |
|---|---|---|
| POST | /v1/:brainId/timeline | Add a timeline entry |
| POST | /v1/:brainId/timeline/batch | Batch add timeline entries |
| GET | /v1/:brainId/timeline/:slug | Get timeline for a page |

Search & Stats

| Method | Path | Description |
|---|---|---|
| GET | /v1/:brainId/search?q=query&limit=20 | Full-text search across pages |
| GET | /v1/:brainId/stats | Brain statistics |

Stats response:

{
  "page_count": 47,
  "link_count": 68,
  "brain_score": 44,
  "pages_by_type": { "person": 18, "company": 16, "vc_firm": 5 },
  "most_connected": [{ "slug": "alice", "title": "Alice Chen", "link_count": 12 }]
}

Deployment

Option 1: One-Click Server Setup

For any Ubuntu/Debian server with Docker:

curl -fsSL https://raw.githubusercontent.com/pkyanam/graphbrain/main/scripts/setup-server.sh | bash

This installs Bun, Docker, Neo4j, GraphBrain, and creates a Cloudflare Tunnel — you get a public https://*.trycloudflare.com URL in ~5 minutes.

Option 2: Docker Compose (Local Dev)

docker compose up -d
# GraphBrain: http://localhost:3000
# Neo4j Browser: http://localhost:7474 (neo4j / graphbrain-dev)

Option 3: Railway + Neo4j

  1. Deploy from GitHub — Railway auto-detects Bun
  2. Add Neo4j as a service (or connect external)
  3. Set NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD, PUBLIC_URL
  4. Multi-DB isolation is active by default

Option 4: AuraDB Free Tier (Single-DB Mode)

Set NEO4J_SINGLE_DB=true to work with AuraDB's free tier (limited to 1 database). Brains are isolated via a brain_id property on every node. Not for production.
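
In single-DB mode, isolation moves from the database boundary into the queries themselves: every node carries brain_id and every query filters on it. A sketch of the difference (query shapes are illustrative):

```typescript
// Multi-DB mode: isolation comes from running against a dedicated database,
// so the query carries no tenant filter. Single-DB mode: every node is
// stamped with brain_id and every query must filter on it.
function pageQuery(singleDb: boolean): string {
  return singleDb
    ? "MATCH (p:Page {slug: $slug, brain_id: $brainId}) RETURN p"
    : "MATCH (p:Page {slug: $slug}) RETURN p"; // runs in the brain's own DB

}

console.log(pageQuery(true).includes("brain_id"));  // true
console.log(pageQuery(false).includes("brain_id")); // false
```

This is why the README calls property-level isolation weaker: one missed filter leaks data across brains, whereas a separate database cannot.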


Roadmap

v0.2 — Production Readiness

  • Persistent API key storage — SQLite or Postgres backing for brain keys (currently in-memory, lost on restart)
  • Rate limiting — per-brain, per-endpoint request caps
  • Migration tooling — export from Postgres GBrain, import into GraphBrain Neo4j
  • Health endpoint — /health expanded with Neo4j connectivity, uptime, and throughput

v0.3 — Web Dashboard

  • Browser-based brain management — create/delete brains, view stats, explore graph
  • Graph visualization — interactive D3/Three.js force-directed graph of your brain
  • API key management — rotate, revoke, view usage per key
  • Custom domains — done: live at graphbrain.belweave.ai via Cloudflare Tunnel

v0.4 — Scale

  • Horizontal scaling — multiple GraphBrain API instances behind a load balancer
  • Neo4j read replicas — route read queries to replicas, writes to primary
  • Causal clustering — Neo4j Enterprise for HA and multi-region
  • Usage metering — per-brain page/link/traversal counts

v0.5 — Ecosystem

  • GBrain CLI plugin — gbrain engine use graphbrain without manual config
  • MCP server support — expose GraphBrain via Model Context Protocol for AI agents
  • LangChain / LlamaIndex integration — use your brain as a knowledge graph RAG backend
  • Community templates — pre-built brain schemas (CRM, investor CRM, research graph)

FAQ

Does this replace GBrain? No. GraphBrain is a backend for GBrain. GBrain's CLI, MCP server, sync, and enrichment pipeline remain unchanged. GraphBrain replaces the storage layer for graph operations.

Do I need to migrate my data? Yes — you'd export pages and links from GBrain's Postgres database and import them into GraphBrain. A migration script is planned (v0.2).

Can I use both Postgres and Neo4j? Yes. The hybrid approach is recommended for large brains: Postgres for content/search/embeddings, GraphBrain (Neo4j) for links/traversal/graph algorithms.

What about embeddings / pgvector? pgvector is excellent for vector search and GBrain should keep using it. GraphBrain handles the graph layer, not embeddings. A hybrid deployment pairs Postgres (embeddings + full-text) with Neo4j (graph traversal + algorithms).

Can I use a custom domain? Yes. This instance runs at https://graphbrain.belweave.ai on a home server behind a Cloudflare Tunnel. To set up your own:

  1. Add your domain to Cloudflare DNS (free tier)
  2. Create a named tunnel in the Cloudflare Zero Trust dashboard
  3. Install cloudflared as a systemd service with the tunnel token:
    sudo cloudflared service install <token>
  4. Add a public hostname in the dashboard pointing to localhost:3000

Zero port forwarding, auto-renewing SSL, survives reboots.

Is this production-ready? v0.1.0 — functional and tested at small scale. Needs persistent key storage, rate limiting, and migration tooling before production workloads. See the roadmap above.


License

MIT
