Cerebrum

The context runtime for enterprise agents, by Concatenate.

Cerebrum compiles scattered company knowledge — repos, Slack decisions, specs, policies, ownership — into typed, permissioned, source-backed Context Packets that coding agents consume before they act.

Not a chatbot. Not enterprise search. A compile step for agent context.

Why

Companies already have the knowledge agents need — it's just scattered across GitHub, Slack, Google Drive, tickets, and local repos. Humans infer context from the mess. Agents can't. Without a runtime context layer, agents guess, use stale facts, miss hidden decisions, leak secrets, and produce unauditable work.

Concatenate is the compile step that fixes this:

const packet = await concatenate.compile({
  actor, // who is asking
  agent, // which agent is acting
  task: "Implement usage-based billing for the API",
  intent: "coding_task",
  mode: "hybrid", // raw code stays local by default
  budget: { max_tokens: 12000 },
});

The result is a Context Packet: facts, decisions, policies, workflows, repo context, allowed and blocked tools, required approvals, citations, and freshness and confidence reports. Every important claim is source-backed, every packet is audit-logged, and every inclusion is explainable in the Context Inspector.
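To make the budget and explainability ideas concrete, here is a minimal sketch of how an assembler could fill a packet under `max_tokens`. Everything in it is hypothetical: the `Candidate` and `PacketItem` shapes, the greedy highest-score-first policy, and the rule that uncited candidates are skipped are assumptions for illustration, not the compiler in packages/context.

```typescript
// Hypothetical retrieval candidate; not the real packages/context types.
interface Candidate {
  id: string;
  score: number; // retrieval relevance, higher is better
  tokens: number; // estimated token cost of including this item
  citations: string[]; // source ids backing the claim
}

// Every decision carries a human-readable reason, in the spirit of an inspector view.
interface PacketItem {
  id: string;
  reason: string;
}

interface AssemblyResult {
  included: PacketItem[];
  excluded: PacketItem[];
  usedTokens: number;
}

// Greedy assembly: take the highest-scoring cited candidates that still fit the budget.
export function assemblePacket(candidates: Candidate[], maxTokens: number): AssemblyResult {
  const included: PacketItem[] = [];
  const excluded: PacketItem[] = [];
  let usedTokens = 0;

  // Sort a copy so the caller's array is left untouched.
  const ranked = [...candidates].sort((a, b) => b.score - a.score);

  for (const c of ranked) {
    if (c.citations.length === 0) {
      excluded.push({ id: c.id, reason: "no citation (evidence-first rule)" });
    } else if (usedTokens + c.tokens > maxTokens) {
      excluded.push({ id: c.id, reason: `over budget (${usedTokens + c.tokens} > ${maxTokens})` });
    } else {
      usedTokens += c.tokens;
      included.push({ id: c.id, reason: `score ${c.score.toFixed(2)}, cited by ${c.citations.join(", ")}` });
    }
  }
  return { included, excluded, usedTokens };
}
```

Because the loop keeps scanning after an over-budget item, smaller lower-ranked items can still use the leftover budget.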

Core principles

Principle	How it shows up
Evidence first	Every important fact carries citations back to its source
Permissioned by construction	Membership scoping on every query; access checks before retrieval and before packet assembly
No silent memory	Agents propose durable memory; humans approve or reject; everything is audited
Temporal truth	Facts are superseded, never silently overwritten
Local privacy	Raw code stays on the developer's machine by default (hybrid mode)
Explainability	The Context Inspector answers "why did the agent know this?"
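The temporal-truth principle can be sketched as an append-only update: recording a new value closes the current fact's validity window instead of overwriting it. This is a minimal in-memory sketch; the `Fact` shape and the `supersede` and `factAt` helpers are hypothetical, loosely mirroring the valid_from/valid_until supersession described in the data model.

```typescript
// Hypothetical in-memory fact record; validUntil === null means "currently true".
interface Fact {
  id: number;
  subject: string;
  value: string;
  validFrom: Date;
  validUntil: Date | null;
  supersededBy: number | null;
}

// Record a new value for `subject` without erasing history:
// the currently valid fact is closed at `at`, and a new fact starts at `at`.
export function supersede(facts: Fact[], subject: string, value: string, at: Date): Fact[] {
  const nextId = facts.reduce((max, f) => Math.max(max, f.id), 0) + 1;
  const closed = facts.map((f) =>
    f.subject === subject && f.validUntil === null ? { ...f, validUntil: at, supersededBy: nextId } : f,
  );
  return [...closed, { id: nextId, subject, value, validFrom: at, validUntil: null, supersededBy: null }];
}

// Answer "what was true at time t?" by checking each fact's validity window.
export function factAt(facts: Fact[], subject: string, t: Date): Fact | undefined {
  return facts.find(
    (f) =>
      f.subject === subject &&
      f.validFrom.getTime() <= t.getTime() &&
      (f.validUntil === null || t.getTime() < f.validUntil.getTime()),
  );
}
```

Old facts stay queryable, which is what lets an agent (or a reviewer) ask what was believed at the time a decision was made.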

Monorepo layout

concatenate-app/
├── apps/
│   ├── web/        Next.js 16 control plane — auth, workspaces, dashboard,
│   │               connectors, Context Inspector, Memory Review, agents
│   ├── api/        API service (placeholder — grows with the compile API)
│   ├── worker/     Background jobs (placeholder — sync, extraction, embeddings)
│   └── daemon/     Local daemon (placeholder — repo indexing, local MCP server)
├── packages/
│   ├── database/    Drizzle ORM schema, migrations, seeds, domain modules
│   ├── shared/      Cross-package types and utilities
│   ├── context/     Extraction, embeddings, retrieval, and the compiler core
│   ├── connectors/  Sync framework — native (GitHub/Slack/Drive/Linear) and
│   │                generic (Custom API, Custom Webhook) connectors
│   ├── cli/         @concatenate/cli — local CLI (login, status, index)
│   ├── sdk/         @concatenate/sdk client (upcoming)
│   └── mcp/         MCP server + gateway (upcoming)
├── tests/          Unit, integration, and Playwright e2e suites
└── concatenate_master_spec.md   The locked build spec — single source of truth

Data model (implemented)

Thirty-one tables across twenty-three migrations, all workspace-scoped with cascade deletes:

Identity & tenancy — users (WorkOS identity sync), workspaces, workspace_members (six locked roles: owner, admin, developer, reviewer, viewer, agent)
Sources — connectors, source_documents, source_events (dedupe-keyed), citations, source_permissions
Temporal graph — entities (normalized-name deduping), entity_aliases, claims, relationships — all citation-linked, confidence-scored, with valid_from/valid_until supersession. Claims also carry a real decision lifecycle (alternatives considered, rationale, decided-by, status, and an explicit supersession chain distinct from the generic claim status)
Context Packets — context_packets, context_packet_items, context_packet_citations — every packet gets an audit id at save time; compile_traces opens a workspace-scoped trace shell for each compile request before assembly; context_packet_feedback captures real production usage signal (helpful/not helpful), separate from synthetic eval scoring
Memory — memory_proposals with guarded pending-only review transitions, including a durable intent memory type populated by a recurring-intent detector that groups repeated compile patterns
Agents & policy — agents, tools (MVP catalog of 7 coding-agent tools), agent_policies, tool_permissions (six states from allowed to disabled)
Audit — audit_events with a real SHA-256 hash chain (per-workspace, pg_advisory_xact_lock-serialized against concurrent writers, verifiable via verifyAuditChain), written transactionally on workspace creation, policy changes, and memory reviews
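In a per-workspace hash chain, each audit event's hash covers the previous event's hash, so editing or deleting any earlier row breaks every later link. The sketch below shows the general technique with Node's built-in `crypto` module. The `AuditEvent` fields, the JSON serialization, and the `appendEvent`/`verifyChain` names are assumptions, not the actual `verifyAuditChain` implementation or the audit_events schema; the real system also serializes writers with pg_advisory_xact_lock, which an in-memory sketch cannot show.

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit event shape; one array per workspace stands in for one chain.
interface AuditEvent {
  workspaceId: string;
  action: string;
  payload: Record<string, unknown>;
  prevHash: string; // hash of the previous event in this workspace ("" for the first)
  hash: string; // SHA-256 over prevHash plus this event's content
}

// Hash the previous link together with this event's content.
function computeHash(prevHash: string, workspaceId: string, action: string, payload: Record<string, unknown>): string {
  return createHash("sha256").update(prevHash).update(JSON.stringify({ workspaceId, action, payload })).digest("hex");
}

// Append an event, linking it to the current chain head.
export function appendEvent(
  chain: AuditEvent[],
  workspaceId: string,
  action: string,
  payload: Record<string, unknown>,
): AuditEvent[] {
  const prevHash = chain.at(-1)?.hash ?? "";
  const hash = computeHash(prevHash, workspaceId, action, payload);
  return [...chain, { workspaceId, action, payload, prevHash, hash }];
}

// Recompute every link; returns the index of the first broken event, or -1 if intact.
export function verifyChain(chain: AuditEvent[]): number {
  let prevHash = "";
  for (const [i, e] of chain.entries()) {
    const expected = computeHash(prevHash, e.workspaceId, e.action, e.payload);
    if (e.prevHash !== prevHash || e.hash !== expected) return i;
    prevHash = e.hash;
  }
  return -1;
}
```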

Retrieval gives approved decisions and intents a small, capped ranking boost over equal-confidence plain facts — durable organizational memory outranks a plain fact, and a claim an actor personally reviewed ranks higher still for that actor.
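A capped boost can be as simple as adding a bounded bonus to the base retrieval score. The weights, the cap, and the `RankedClaim` fields below are illustrative assumptions, not the values used in packages/context; the sketch only reproduces the ordering described above: an approved decision or intent beats an equal-confidence plain fact, a claim the current actor reviewed ranks higher still for that actor, and the cap keeps a clearly more relevant fact on top.

```typescript
// Hypothetical claim shape for ranking; field names are assumptions.
interface RankedClaim {
  id: string;
  kind: "fact" | "decision" | "intent";
  approved: boolean;
  confidence: number; // 0..1 base score from retrieval
  reviewedBy: string[]; // actor ids who reviewed this claim
}

// Illustrative constants: small bonuses, with a hard cap on the total boost.
const DURABLE_MEMORY_BOOST = 0.05;
const PERSONAL_REVIEW_BOOST = 0.03;
const MAX_BOOST = 0.07;

export function rankScore(claim: RankedClaim, actorId: string): number {
  let boost = 0;
  if (claim.approved && (claim.kind === "decision" || claim.kind === "intent")) {
    boost += DURABLE_MEMORY_BOOST;
  }
  if (claim.reviewedBy.includes(actorId)) {
    boost += PERSONAL_REVIEW_BOOST;
  }
  // The cap keeps a weakly relevant decision from outranking a strongly relevant fact.
  return claim.confidence + Math.min(boost, MAX_BOOST);
}

export function rankClaims(claims: RankedClaim[], actorId: string): RankedClaim[] {
  return [...claims].sort((a, b) => rankScore(b, actorId) - rankScore(a, actorId));
}
```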

Quickstart

Requires Node 24+, pnpm 10+, and Docker.

git clone <this-repo> && cd concatenate-app
pnpm install

# Start Postgres 17 with pgvector 0.8.2 + Redis 7.4
docker compose up -d

# Apply migrations and deterministic seed data
pnpm db:migrate
pnpm db:seed

# Run the control plane
pnpm --filter @concatenate/web dev

Auth configuration

The web app uses WorkOS AuthKit. Create apps/web/.env.local:

WORKOS_CLIENT_ID=...
WORKOS_API_KEY=...
WORKOS_COOKIE_PASSWORD=...   # 32+ characters
NEXT_PUBLIC_WORKOS_REDIRECT_URI=http://localhost:3000/auth/callback
NEXT_PUBLIC_CONCATENATE_API_URL=http://localhost:3000
DATABASE_URL=postgresql://concatenate:concatenate@localhost:5432/concatenate
GITHUB_CLIENT_ID=...
GITHUB_CLIENT_SECRET=...
SLACK_CLIENT_ID=...
SLACK_CLIENT_SECRET=...
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
LINEAR_CLIENT_ID=...
LINEAR_CLIENT_SECRET=...
CONNECTOR_ENCRYPTION_KEY=... # 32 random bytes encoded as base64url

Configure the WorkOS sign-in initiation URL as http://localhost:3000/auth/sign-in.
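CONNECTOR_ENCRYPTION_KEY and WORKOS_COOKIE_PASSWORD should be random values. One way to generate them is Node's built-in `crypto.randomBytes`; this is a convenience sketch rather than a script shipped with the repo, and the `randomSecret` helper name is hypothetical. Reusing 32 bytes for the cookie password is simply a choice that clears the 32-character minimum.

```typescript
import { randomBytes } from "node:crypto";

// Generate `bytes` random bytes and encode them as unpadded base64url.
export function randomSecret(bytes = 32): string {
  return randomBytes(bytes).toString("base64url");
}

// 32 bytes as base64url matches the format the CONNECTOR_ENCRYPTION_KEY entry asks for;
// the resulting 43-character string also satisfies the 32+ character cookie password rule.
console.log(`CONNECTOR_ENCRYPTION_KEY=${randomSecret()}`);
console.log(`WORKOS_COOKIE_PASSWORD=${randomSecret()}`);
```

Run it once (for example with a TypeScript runner such as tsx) and paste the two lines into apps/web/.env.local.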

Checks

pnpm lint               # ESLint (typed, flat config)
pnpm typecheck          # TypeScript 6 strict across all workspaces
pnpm test               # Vitest unit suites
pnpm test:integration   # Vitest against live Postgres
pnpm test:e2e           # Playwright auth-boundary coverage
pnpm build              # All workspaces, including production Next.js build

CI runs the full pyramid on every push: frozen install, format check, lint, strict typecheck, unit, integration (with provisioned Postgres + Redis), migrations, seed, and builds.

Benchmarks

Nine benchmark suites (B1–B9) are planned to measure the improvements shipped in Phases 11–17 against a real baseline; eight are implemented, and B3 is deferred. See docs/benchmarks/benchmark-plan.md for methodology, findings, and integrity guardrails. Run any implemented suite locally against Postgres:

pnpm --filter @concatenate/context bench:retrieval               # B1 retrieval ablation
pnpm --filter @concatenate/context bench:packet-quality          # B2 packet quality
pnpm --filter @concatenate/context bench:latency                 # B4 latency/throughput
pnpm --filter @concatenate/context bench:index-impact            # B5 index impact
pnpm --filter @concatenate/connectors bench:resilience           # B6 connector fault injection
pnpm --filter @concatenate/context bench:dedup-quality           # B7 entity dedup precision/recall
pnpm --filter @concatenate/context bench:permission-enforcement  # B8 permission-leakage check
pnpm --filter @concatenate/context bench:cost-efficiency         # B9 cost/token efficiency

B3 (agentic task success) is explicitly deferred: it needs a real agent loop that consumes model tokens, plus a design decision on the task set and judge, not just another script.
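B7 reports entity dedup precision and recall. The actual methodology is in docs/benchmarks/benchmark-plan.md; purely as an illustration of the metric, one common approach is pairwise scoring, which treats every pair of mentions merged into the same entity as a predicted match and compares those pairs with a hand-labeled gold set. The `Clustering` type and `pairwisePrecisionRecall` function below are hypothetical.

```typescript
// Map from mention id to the entity id it was assigned (predicted) or belongs to (gold).
type Clustering = Map<string, string>;

// All unordered mention pairs that share an entity, encoded as "a|b" with a < b.
function matchedPairs(clustering: Clustering): Set<string> {
  const byEntity = new Map<string, string[]>();
  for (const [mention, entity] of clustering) {
    byEntity.set(entity, [...(byEntity.get(entity) ?? []), mention]);
  }
  const pairs = new Set<string>();
  for (const mentions of byEntity.values()) {
    const sorted = [...mentions].sort();
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        pairs.add(`${sorted[i]}|${sorted[j]}`);
      }
    }
  }
  return pairs;
}

export function pairwisePrecisionRecall(predicted: Clustering, gold: Clustering): { precision: number; recall: number } {
  const p = matchedPairs(predicted);
  const g = matchedPairs(gold);
  let truePositives = 0;
  for (const pair of p) if (g.has(pair)) truePositives++;
  // Convention: with no predicted (or gold) pairs, the metric is 1 (nothing wrong / nothing missed).
  return {
    precision: p.size === 0 ? 1 : truePositives / p.size,
    recall: g.size === 0 ? 1 : truePositives / g.size,
  };
}
```

Low precision means distinct entities were wrongly merged; low recall means duplicates were left unmerged.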

Build progress

The build follows the locked sequential spec in concatenate_master_spec.md — one task at a time, five gates each (product behavior, typed/tested API, migrations, security checks, explainability). For the remaining manual steps to demo the live deployment, see docs/investor-demo-setup.md.

Phase	Scope	Status
0 — Repository foundation	Monorepo, tooling, CI, database foundation	✅ done
1 — Auth, workspace, app shell	WorkOS auth, workspaces + membership scoping, sidebar/topbar shell	✅ done¹
2 — Core data model	Sources, temporal graph, packets, memory, agents, audit	✅ done
3 — Web dashboard & basic UI	Live dashboard, connectors, Context Inspector, Memory Review, agents	✅ done
4 — Connectors MVP	GitHub ✅; Slack and Drive implemented with batched live OAuth review pending	⏳ provider acceptance
5 — Extraction & indexing	Extraction/chunking, pgvector embeddings, entities, cited claims/decisions, temporal relationships	✅ done
6 — Context Compiler MVP	Compile schema/API, retrieval planner, packet assembler, audit integration	✅ done
7 — Local daemon & MCP	CLI, repo indexing, local daemon, local MCP server, agent configs	✅ done
8 — Memory Review MVP	Proposal APIs, review actions, MCP propose_memory	✅ done
9 — Evals MVP	Eval suites, runner, scoring UI	✅ done
10 — Production hardening	Rate limits, retention, redaction, e2e demo, client readiness	✅ done
11 — Decision & Intent Intelligence	Decision lifecycle, durable intent memory + recurring-intent detector, retrieval boost, packet feedback, Memory Review surfaces	✅ done
12 — Connector Expansion	Generic Custom API + Custom Webhook connectors, Linear connector, connector roadmap	✅ done²
13 — UI Design System Upgrade	Design tokens, real fonts/depth/motion, status semantics, evidence components, real Audit/Local/Settings pages, decision explorer, card-based lists	✅ done
14 — Enterprise Engine Hardening	Structured logging, missing indexes, Voyage AI embeddings + background worker, connector ACLs + retrieval-time enforcement, compiler error recovery, connector sync resilience, entity fuzzy-dedup	✅ done³
15 — Organizations & Org-Scoped Sign-In	Organization schema, org-scoped sign-in entry page, demo org, native WorkOS Organization mapping	✅ done⁴
16 — UI Usefulness & Demo-Ability	Workspace Policies page, dashboard next-step guidance, recent Context Packets on the dashboard, Context Packet search	✅ done
17 — Production readiness	Railway worker deployment + prod migrations, `/api/health`, audit hash-chain hardening, DPA/compliance docs, B6–B9 benchmark suites	✅ done⁵
18 — Investor demo readiness	Entry-funnel restyle, WorkOS de-branding (copy + hosted AuthKit page), org-scoped sign-in UI, an auth-middleware bug fix	⏳ one manual step left⁶
19 — Full UI redesign	Command Center dashboard, 4-pane Context Inspector, real Packet Cards, Evidence Graph (React Flow), Memory Review Inbox with a real diff, Tailwind, shadcn/ui, and TanStack Query adopted	✅ done⁷

¹ T0101 auth is complete, including a live WorkOS sign-in, protected session, and logout round trip.

² Linear (T1203) ships as needs_review, following the same precedent as Slack/Drive; live acceptance needs credentials from a real Linear OAuth app.

³ T1401–T1402 and T1405–T1409 are done and merged. T1403 (Voyage AI embeddings) is live in production (VOYAGE_API_KEY set, worker deployed on Railway and polling cleanly). See concatenate_master_spec.md §40 Phase 14 for exact per-task status.

⁴ T1501–T1503 are done and merged. T1504 (native WorkOS Organization mapping) ships as needs_review: it is code-complete, merged, and has a UI entry point ("Sign in as this organization" on the Organizations page), but it still needs one human interactive sign-in against the live deployment to close it out. See docs/investor-demo-setup.md.

⁵ See concatenate_master_spec.md §47 for the full breakdown: Railway apps/worker deployment, /api/health, real audit hash-chaining (previously a placeholder), a DPA draft (docs/client-readiness/dpa.md), and the B6–B9 benchmark suites (B3 is explicitly deferred because it needs a real agent loop that consumes model tokens, plus a task-set/judge design decision).

⁶ Full detail in concatenate_master_spec.md §48. The entry-funnel pages are restyled with the existing design system, the WorkOS branding leak is fixed (both the login copy and the hosted AuthKit sign-in page), org-scoped sign-in has a UI entry point, and a QA pass caught and fixed an auth-middleware bug that broke the 404 page. The one thing left is a human interactive sign-in against the live deployment; see docs/investor-demo-setup.md for the exact steps.

⁷ Full detail in concatenate_master_spec.md §49. This pass adopted the original §9.1 stack (Tailwind v4, shadcn/ui, TanStack Query, React Flow, Monaco) for the first time; before it, the app was 100% server components and hand-rolled CSS. It also removed purple from the UI entirely (verified by grep), added a ⌘K command palette, and fixed two real bugs along the way: a tool-permission status color that silently disagreed between two pages, and an unpaginated audit log hard-capped at 200 events. Unrelated to styling, the same pass fixed all four connector OAuth "Connect" buttons, which were returning 500 in production because the NEXT_PUBLIC_CONCATENATE_API_URL env var was missing. The CLI was found to be genuinely not ready (never published to npm, and the Local tab references init/doctor commands that don't exist); that is tracked as a separate, later initiative and is not fixed here.

Connectors

Rather than hand-writing a bespoke integration for every source, the connector framework (packages/connectors) generalizes twice: a config-driven Custom API connector for polling REST sources, and a config-driven Custom Webhook connector for push-based sources with HMAC-verified inbound delivery. Native connectors are reserved for the highest-value, richest-normalization cases.

Connector	Path to add	Status
GitHub	Native OAuth	✅ done
Slack	Native OAuth	✅ implemented, live OAuth review pending
Google Drive	Native OAuth	✅ implemented, live OAuth review pending
Linear	Native OAuth	✅ done, `needs_review` pending live OAuth credentials
Notion	Native OAuth	Future
Gmail	Native OAuth	Future
Postgres (as a source)	Native, read-only	Future
Jira, Confluence, HubSpot, Salesforce	REST API	Addable today via the generic Custom API connector, no new code
Sentry, Datadog	Outbound webhook	Addable today via the generic Custom Webhook connector, no new code
Custom MCP server	Requires an MCP client	Deferred
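For the Custom Webhook connector's HMAC-verified delivery, the usual pattern is that the sender signs the raw request body with a shared secret and the receiver recomputes the signature before trusting the payload. A minimal sketch with Node's `crypto` follows, assuming a hex-encoded HMAC-SHA256 signature that may carry a `sha256=` prefix (the convention GitHub webhooks use); the connector's actual header name, encoding, and secret storage may differ, and `verifyWebhookSignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true only if `signatureHeader` matches HMAC-SHA256(secret, rawBody).
// Assumes a lowercase hex digest, optionally prefixed with "sha256=".
export function verifyWebhookSignature(
  rawBody: string | Buffer,
  signatureHeader: string | undefined,
  secret: string,
): boolean {
  if (!signatureHeader) return false; // fail closed when the header is missing
  const received = signatureHeader.startsWith("sha256=") ? signatureHeader.slice("sha256=".length) : signatureHeader;
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(received, "utf8");
  const b = Buffer.from(expected, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Verify against the raw body bytes before JSON parsing; re-serializing a parsed payload can change whitespace or key order and break the signature.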

Security posture

All app routes under /app and /api/workspaces require an authenticated session; anonymous requests are redirected or receive 401/404
Every workspace query joins through active membership — cross-workspace reads return nothing
Secrets live in env files that are git-ignored and validated at startup (fail-closed in production)
Sensitive actions (workspace creation, tool permission changes, memory reviews) write audit events transactionally, hash-chained per workspace and verifiable via verifyAuditChain
Raw local code never leaves the machine by default; hybrid mode is the locked default
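"Validated at startup (fail-closed in production)" can be read as: a missing or malformed secret stops the process in production, while development only warns. A minimal sketch, assuming `NODE_ENV` distinguishes the environments and using a hypothetical set of rules drawn from the variables in Auth configuration; `validateEnv` and `assertEnv` are illustrative names, and the app's real validation may look quite different.

```typescript
// Hypothetical rules: variable name -> validator returning an error message or null.
const rules: Record<string, (value: string) => string | null> = {
  WORKOS_API_KEY: (v) => (v.length > 0 ? null : "must be set"),
  WORKOS_COOKIE_PASSWORD: (v) => (v.length >= 32 ? null : "must be at least 32 characters"),
  DATABASE_URL: (v) => (/^postgres(ql)?:\/\//.test(v) ? null : "must be a postgres:// URL"),
  CONNECTOR_ENCRYPTION_KEY: (v) =>
    Buffer.from(v, "base64url").length === 32 ? null : "must decode to 32 bytes (base64url)",
};

// Collect every problem instead of stopping at the first one.
export function validateEnv(env: Record<string, string | undefined>): string[] {
  const errors: string[] = [];
  for (const [name, check] of Object.entries(rules)) {
    const value = env[name];
    const problem = value === undefined ? "is missing" : check(value);
    if (problem) errors.push(`${name} ${problem}`);
  }
  return errors;
}

// Fail closed in production: refuse to start. Elsewhere, just warn.
export function assertEnv(env: Record<string, string | undefined>): void {
  const errors = validateEnv(env);
  if (errors.length === 0) return;
  if (env["NODE_ENV"] === "production") {
    throw new Error(`Invalid environment:\n${errors.join("\n")}`);
  }
  console.warn(`Environment problems (ignored outside production):\n${errors.join("\n")}`);
}
```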

Draft Terms of Service, Privacy Policy, DPA, and security overview documents (all pending legal review) live in docs/client-readiness/; see that directory's README.md for the full onboarding-safety checklist.
