design: type proliferation — 94 types should be ~14 (DRY/MECE unification proposal) #1479

## Problem

A production brain with 186K pages has organically accumulated **94 distinct page types**. Most are duplicates, near-duplicates, or one-off types whose distinctions belong in frontmatter fields on a shared type. This violates DRY (the same concept lives under multiple types) and MECE (types overlap, creating ambiguity about where new content goes).

The type system is the foundation for schema packs, search filtering, extract behavior, enrichment routing, and `expert_routing`. When types are noisy, every downstream feature degrades.

## Production Data

### Cluster 1: Social posts — 7 types, 106K pages

| Type | Count |
|------|-------|
| tweet | 33,409 |
| tweet-bundle | 42,498 |
| media/x-tweet/bundle | 9,246 |
| tweet-stub | 1,625 |
| media/x-account/daily | 19,631 |
| media/x-account/monthly | 67 |
| media/x-account | 15 |

**Should be:** `tweet` with `subtype: single|bundle|stub` + `social-digest` with `period: daily|monthly`. 7 → 2.
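
A minimal sketch of how the seven legacy social types could collapse into the two proposed ones. `SOCIAL_TYPE_MAP` and `retype_social` are illustrative names, not existing gbrain code; only the type names and field values come from the table above. Leaving the bare `media/x-account` type without a `period` is an assumption.

```python
# Hypothetical mapping: legacy social type -> (canonical type, extra frontmatter).
SOCIAL_TYPE_MAP = {
    "tweet": ("tweet", {"subtype": "single"}),
    "tweet-bundle": ("tweet", {"subtype": "bundle"}),
    "media/x-tweet/bundle": ("tweet", {"subtype": "bundle"}),
    "tweet-stub": ("tweet", {"subtype": "stub"}),
    "media/x-account/daily": ("social-digest", {"period": "daily"}),
    "media/x-account/monthly": ("social-digest", {"period": "monthly"}),
    # Assumption: the bare account type carries no period information.
    "media/x-account": ("social-digest", {}),
}


def retype_social(frontmatter: dict) -> dict:
    """Return a copy of the frontmatter with the canonical type and fields applied."""
    canonical, extra = SOCIAL_TYPE_MAP[frontmatter["type"]]
    return {**frontmatter, "type": canonical, **extra}


page = {"type": "media/x-tweet/bundle", "title": "example"}
print(retype_social(page))
```

Both bundle variants land on the same `(tweet, bundle)` pair, which is the DRY point: the path prefix carried no information the `subtype` field does not.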

### Cluster 2: Articles — 5 types, 3.6K pages

| Type | Count |
|------|-------|
| article | 1,418 |
| media/article | 1,518 |
| sources/article | 635 |
| source/article | 40 |
| source | 24 |

**Should be:** ONE type `article`. 5 → 1.

### Cluster 3: Companies — 4 types, 13.5K pages

| Type | Count |
|------|-------|
| company | 5,210 |
| yc-company | 5,721 |
| product | 2,629 |
| organization | 2 |

"Accelerator membership" should be a field (`batch: S23`), not a separate type. **Should be:** `company` with `kind: company|product|org`. 4 → 1.

### Cluster 4: Atoms — 6 types, 18K pages

| Type | Count |
|------|-------|
| atom | 13,634 |
| atom-extraction | 395 |
| content-atom | 8 |
| atom-partner-link | 8 |
| partner-atom-link | 6 |
| lore | 4,088 |

`atom-partner-link` and `partner-atom-link` should be LINKS, not page types. **Should be:** `atom` with `origin: extraction|manual|lore`. 6 → 1.

### Cluster 5: Media/Content — 8 types, 8.7K pages

| Type | Count |
|------|-------|
| media | 7,510 |
| video | 618 |
| youtube-video | 130 |
| writing | 251 |
| essay | 159 |
| blog-post | 27 |
| book | 8 |
| podcast | 1 |

Content format is a frontmatter field, not a type. **Should be:** `media` with `format: video|book|podcast` + `writing` with `format: essay|blog-post`, matching the target taxonomy below. 8 → 2.

### Cluster 6: Analysis — 8 types, 40 pages(!)

| Type | Count |
|------|-------|
| analysis | 9 |
| media/analysis | 2 |
| media-analysis | 1 |
| media/x-account/analysis | 1 |
| research | 8 |
| organization-research | 1 |
| competitive-intel | 1 |
| yc/competitive-intel | 17 |

8 types for 40 pages is the clearest sign of ad-hoc proliferation. **Should be:** `analysis` with `domain` field. 8 → 1.
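
To put the "8 types for 40 pages" ratio in context, the sketch below computes pages per type for each cluster that has a table. The counts are copied from the tables above; the `CLUSTERS` layout and `pages_per_type` helper are illustrative.

```python
# Page counts per legacy type, copied from the cluster tables above.
CLUSTERS = {
    "social": [33409, 42498, 9246, 1625, 19631, 67, 15],
    "articles": [1418, 1518, 635, 40, 24],
    "companies": [5210, 5721, 2629, 2],
    "atoms": [13634, 395, 8, 8, 6, 4088],
    "media": [7510, 618, 130, 251, 159, 27, 8, 1],
    "analysis": [9, 2, 1, 1, 8, 1, 1, 17],
}


def pages_per_type(counts: list[int]) -> float:
    """Average number of pages carried by each type in a cluster."""
    return sum(counts) / len(counts)


# Print clusters from sparsest to densest.
for name, counts in sorted(CLUSTERS.items(), key=lambda kv: pages_per_type(kv[1])):
    print(f"{name:<10} {len(counts)} types {sum(counts):>8,} pages {pages_per_type(counts):>10,.1f} pages/type")
```

Analysis averages 5 pages per type, while social posts average about 15.2K, so the analysis cluster is where type names are doing the least work.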

### Cluster 7: Concept redirects outnumber concepts

- concept: 4,304
- concept-redirect: 5,519

More redirect pages than real pages. Redirects should be an **alias table**, not 5.5K stub pages that inflate orphan counts and waste embedding tokens.
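
A rough sketch of what an alias table could look like in place of stub pages. The page shape (`slug`, `type`, `redirect_to`) and both function names are assumptions for illustration; the real redirect pages may store their target differently.

```python
def build_alias_table(pages: list[dict]) -> dict[str, str]:
    """Collect alias -> target entries from concept-redirect pages."""
    return {
        page["slug"]: page["redirect_to"]
        for page in pages
        if page["type"] == "concept-redirect"
    }


def resolve(slug: str, aliases: dict[str, str]) -> str:
    """Follow alias entries to the canonical slug, guarding against cycles."""
    seen = set()
    while slug in aliases:
        if slug in seen:
            raise ValueError(f"alias cycle at {slug!r}")
        seen.add(slug)
        slug = aliases[slug]
    return slug


pages = [
    {"slug": "concepts/mece", "type": "concept"},
    {
        "slug": "concepts/mutually-exclusive",
        "type": "concept-redirect",
        "redirect_to": "concepts/mece",
    },
]
aliases = build_alias_table(pages)
print(resolve("concepts/mutually-exclusive", aliases))
```

With lookups going through `resolve`, the redirect stubs no longer need to exist as pages, so they stop counting as orphans and stop being embedded.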

### Cluster 8: One-off types — 25+ types with 1-2 pages each

civic, framework, insight, anecdote, principle, memo, rfs-draft, pitch-deck, policy-criticism, production-doc, recording-snippet, registry, reference, schema, video-script, web_page, log, agent-log, content-mining, meta-prompt, queue, eval-test...

**Should be:** tags or subtypes of `note`.

### Cluster 9: Symlinks as pages — 54 pages

`symlink`, `partner-symlink`, `symlink-manifest`: filesystem operations stored as brain pages. **Should be:** link table entries, not pages.

## Why This Matters

1. **Schema packs can't be MECE**: the pack declares its types, but the brain has 94, many of them undeclared, so `schema_review_orphans` can't distinguish intentional types from noise.
2. **Search filtering is ambiguous**: `--type article` misses the 2.2K articles typed as `media/article`, `sources/article`, `source/article`, or `source`.
3. **Enrichment routing is incomplete**: `enrichable_types` can only list a few types; the 80+ types it leaves out mean most pages never get enriched.
4. **Agent confusion**: when ingesting a new article, should it be `article`, `media/article`, `sources/article`, or `source/article`?
5. **Orphan inflation**: the 5.5K `concept-redirect` pages inflate the orphan count without adding knowledge value.
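
The search-filtering gap in item 2 can be reproduced from the Cluster 2 counts. This is a sketch only: `matched` and `ARTICLE_ALIASES` are hypothetical and do not describe how the real `--type` filter is implemented.

```python
# Counts copied from the Cluster 2 table.
ARTICLE_COUNTS = {
    "article": 1418,
    "media/article": 1518,
    "sources/article": 635,
    "source/article": 40,
    "source": 24,
}

# Hypothetical alias declaration: every legacy name resolves to `article`.
ARTICLE_ALIASES = {"article": set(ARTICLE_COUNTS)}


def matched(type_filter: str, counts: dict[str, int], aliases=None) -> int:
    """Count pages a type filter would return, with or without alias expansion."""
    accepted = {type_filter}
    if aliases:
        accepted = aliases.get(type_filter, accepted)
    return sum(n for page_type, n in counts.items() if page_type in accepted)


exact = matched("article", ARTICLE_COUNTS)
expanded = matched("article", ARTICLE_COUNTS, ARTICLE_ALIASES)
print(exact, expanded, expanded - exact)  # 1418 3635 2217
```

An exact-match filter returns 1,418 of 3,635 article pages, which is where the 2.2K figure comes from.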

## Proposed Target Taxonomy

| Type | Covers | Current types merged |
|---|---|---|
| `person` | People | person, partner, partner-profile |
| `company` | Companies, orgs, products | company, yc-company, product, organization |
| `concept` | Ideas | concept (redirects → alias table) |
| `atom` | Knowledge units | atom, atom-extraction, content-atom, lore |
| `tweet` | Social posts | tweet, tweet-bundle, tweet-stub, media/x-tweet/bundle |
| `social-digest` | Social summaries | media/x-account/* |
| `article` | Web content | article, media/article, sources/article, source/* |
| `media` | Rich content | media, video, youtube-video, book, podcast |
| `writing` | Original writing | writing, essay, blog-post |
| `meeting` | Temporal discussions | meeting, call, interview |
| `analysis` | Research + intel | analysis, research, competitive-intel, all variants |
| `event` | Events | event, convention |
| `deal` | Deals | deal |
| `note` | Everything else | note, memo, insight, principle, framework, all one-offs |

**94 types → 14 types.** Distinctions move to frontmatter fields (`subtype`, `kind`, `format`, `origin`, `period`, `domain`).
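
One way to keep the taxonomy honest is to encode it as data and check both halves of MECE mechanically. The sketch below is an assumption about shape, not the schema pack format. Wildcard and "all variants" rows are expanded only to names that appear in the cluster tables, and one-offs beyond those named in the `note` row are left out. Listing names explicitly also keeps `media/x-account/analysis` (Cluster 6) from being swallowed by the `media/x-account/*` row.

```python
from collections import Counter

# Canonical type -> legacy names it absorbs (explicit names only, no wildcards).
TAXONOMY = {
    "person": ["person", "partner", "partner-profile"],
    "company": ["company", "yc-company", "product", "organization"],
    "concept": ["concept"],
    "atom": ["atom", "atom-extraction", "content-atom", "lore"],
    "tweet": ["tweet", "tweet-bundle", "tweet-stub", "media/x-tweet/bundle"],
    "social-digest": ["media/x-account", "media/x-account/daily", "media/x-account/monthly"],
    "article": ["article", "media/article", "sources/article", "source/article", "source"],
    "media": ["media", "video", "youtube-video", "book", "podcast"],
    "writing": ["writing", "essay", "blog-post"],
    "meeting": ["meeting", "call", "interview"],
    "analysis": ["analysis", "media/analysis", "media-analysis", "media/x-account/analysis",
                 "research", "organization-research", "competitive-intel", "yc/competitive-intel"],
    "event": ["event", "convention"],
    "deal": ["deal"],
    "note": ["note", "memo", "insight", "principle", "framework"],
}


def overlapping(taxonomy: dict[str, list[str]]) -> list[str]:
    """Legacy names claimed by more than one canonical type (not mutually exclusive)."""
    counts = Counter(name for names in taxonomy.values() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)


def unmapped(seen_types: list[str], taxonomy: dict[str, list[str]]) -> list[str]:
    """Types present in the brain that no canonical type claims (not exhaustive)."""
    known = {name for names in taxonomy.values() for name in names}
    return sorted(set(seen_types) - known)


print(len(TAXONOMY), overlapping(TAXONOMY))
print(unmapped(["tweet", "concept-redirect", "symlink"], TAXONOMY))
```

`concept-redirect` and `symlink` show up as unmapped on purpose: per the migration path they become alias and link table entries rather than pages of any type.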

## Migration Path

1. Schema pack declares the 14 canonical types with `aliases` covering the old names
2. Migration script retypes pages (e.g., `media/article` → `article` with `source_collection: media`)
3. concept-redirect pages → alias table entries + soft-delete
4. symlink and `atom-partner-link`/`partner-atom-link` pages → proper link table entries + soft-delete
5. One-off types → retype to `note` with original type preserved as `legacy_type` tag
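
Steps 2 and 5 could be sketched as a single idempotent retype function. Everything here is hypothetical: the `ALIASES` subset, the `migrate` name, deriving `source_collection` from the legacy path prefix, and storing `legacy_type` as a frontmatter field rather than a tag string.

```python
# Illustrative subset of the alias map a schema pack might declare.
ALIASES = {
    "media/article": "article",
    "sources/article": "article",
    "source/article": "article",
    "memo": "note",
    "insight": "note",
}
CANONICAL = {"article", "note"}


def migrate(frontmatter: dict) -> dict:
    """Retype one page, preserving where it came from."""
    old = frontmatter["type"]
    if old in CANONICAL:
        # Already canonical: return an unchanged copy so reruns are safe.
        return dict(frontmatter)
    if old not in ALIASES:
        raise KeyError(f"no canonical type declared for {old!r}")
    new = {**frontmatter, "type": ALIASES[old]}
    if "/" in old:
        # Assumption: the path prefix names the source collection.
        new["source_collection"] = old.rsplit("/", 1)[0]
    else:
        # One-off type: keep the original name for later review.
        new["legacy_type"] = old
    return new


print(migrate({"type": "media/article", "title": "t"}))
print(migrate({"type": "memo"}))
```

Raising on an undeclared type, instead of silently defaulting to `note`, keeps the migration from hiding types the schema pack forgot to alias.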

## Impact on Existing Features

- `inferType` path prefix mapping shrinks dramatically
- Schema pack `page_types` goes from 30+ to 14 entries
- `enrichable_types` covers more of the brain naturally
- `extract` type filters work correctly across the whole corpus
- `find_experts` `expert_routing` covers all entity types cleanly
- `schema_review_orphans` becomes meaningful (currently noisy)
- Agent ingestion becomes unambiguous (one type per domain)

## Related

- #1383 (gbrain onboard — migration prompts would drive this)
- #1409 (consolidated design doc)

