Typed citation/reference graph: no CLI link-add + link_source CHECK allowlist blocks external edge-writers #1941

@garrytan

Context

I'm building a typed citation-graph ingester on top of gbrain. The pipeline: extract inter-document references over a corpus → LLM-classify each edge's type (overrules/distinguishes/relies-on/extends/parent/child/sibling) → write the results as first-class typed edges → walk them with graph-query --type. This is the retrieval primitive that lets an agent reason over a cite-heavy corpus (law, papers, patents, SEC filings, a book's bibliography) instead of fuzzy-matching it.

gbrain already has everything needed to store and walk this: the typed links table, engine.addLink(from, to, ctx, type, source), and the graph-query --type walker. The graph-walk works great. But two rough edges made an external edge-writer harder than it should be. Filing so the primitive can live in gbrain rather than as a downstream bridge.

Gap 1 — No CLI path to add a single link

Links are programmatic-only. There's extract, reconcile-links, graph-query, etc., but no gbrain link-add <from> <to> --type <t> --link-source <s> (and no link-rm). Any external tool that computes edges out-of-band (a classifier, an importer, a migration) has to open the engine itself.

Workaround I'm using: a small bun bridge that imports core/config, core/ai/gateway, engine-factory, core/db, replays connectEngine()'s open sequence, and calls engine.addLink per edge. Works, but it reaches into internals that could shift under me.

Ask: a first-class gbrain link-add / link-rm CLI command (source-scoped via --source-id, idempotent via the existing ON CONFLICT DO NOTHING), so edge-writers don't have to bootstrap the engine by hand.
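To make the ask concrete, here is a minimal sketch of how the arguments for such a command could be parsed. Everything in it is hypothetical: `parseLinkAddArgs`, the `LinkAddArgs` shape, and the choice to default `--link-source` to `'manual'` are assumptions for illustration, not existing gbrain code. The flag names mirror the ones proposed above.

```typescript
// Hypothetical parsed form of:
//   gbrain link-add <from> <to> --type <t> [--link-source <s>] [--source-id <id>]
interface LinkAddArgs {
  from: string;
  to: string;
  type: string;
  linkSource: string;
  sourceId?: string;
}

// Split argv into two positionals and `--flag value` pairs, then validate.
function parseLinkAddArgs(argv: string[]): LinkAddArgs {
  const positional: string[] = [];
  const flags = new Map<string, string>();

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg.startsWith("--")) {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`missing value for ${arg}`);
      }
      flags.set(arg.slice(2), value);
      i++; // skip the value we just consumed
    } else {
      positional.push(arg);
    }
  }

  if (positional.length !== 2) {
    throw new Error("usage: link-add <from> <to> --type <t> [--link-source <s>] [--source-id <id>]");
  }
  const type = flags.get("type");
  if (type === undefined) {
    throw new Error("--type is required");
  }

  return {
    from: positional[0],
    to: positional[1],
    type,
    // Assumption: fall back to 'manual', which the current CHECK accepts.
    linkSource: flags.get("link-source") ?? "manual",
    sourceId: flags.get("source-id"),
  };
}
```

The parsed result would then be handed to engine.addLink(from, to, ctx, type, source); how ctx is supplied from the command line is left open here.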

Gap 2 — link_source CHECK allowlist rejects new provenances

links.link_source has CHECK (link_source IS NULL OR link_source IN ('markdown','frontmatter','manual','mentions','wikilink-resolved')) (see core/pglite-schema.ts, core/schema-embedded.ts, core/migrate.ts). Passing a new provenance like 'citation-graph' throws a links_link_source_check violation, and adding a value means a schema migration inside gbrain.

For now I write citation edges with link_source='manual' and let the typed link_type (the verbs above, which are outside gbrain's standard attended/works_at/mentions set) be the citation-graph signature. That's constraint-valid and graph-walkable — but it means citation provenance is conflated with hand-entered manual edges, so I can't cleanly answer "show me only the machine-derived citation edges."
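The workaround amounts to a filter like the sketch below. The `LinkRow` shape is an assumption (only `link_type` and `link_source` are column names taken from this issue; `from` and `to` are placeholders), and `isCitationEdge` is a hypothetical helper.

```typescript
// Edge verbs used by the citation-graph ingester, as listed under Context.
const CITATION_TYPES = new Set([
  "overrules",
  "distinguishes",
  "relies-on",
  "extends",
  "parent",
  "child",
  "sibling",
]);

// Assumed row shape; the real links table has more columns.
interface LinkRow {
  from: string;
  to: string;
  link_type: string;
  link_source: string | null;
}

// Best-effort test: a 'manual' edge whose type is one of the citation verbs.
function isCitationEdge(row: LinkRow): boolean {
  return row.link_source === "manual" && CITATION_TYPES.has(row.link_type);
}
```

This only infers provenance from the verb: a hand-entered edge that happens to use one of these types passes the same filter, which is the conflation described above.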

Ask (one of):

  • Add 'citation-graph' (and/or a generic 'derived') to the link_source allowlist, or
  • Relax the CHECK to allow an extensible/namespaced provenance (e.g. any ^[a-z][a-z0-9-]*$), or
  • Document the intended way to register a new link_source so external derivers stay distinguishable from manual.
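To compare the first two options, the sketch below mirrors the current CHECK and the proposed namespaced pattern as plain predicates. The function names are hypothetical, and this is application-side TypeScript for illustration only; the real constraint lives in the schema files named above.

```typescript
// Values accepted by the current links_link_source_check constraint.
const CURRENT_ALLOWLIST = new Set([
  "markdown",
  "frontmatter",
  "manual",
  "mentions",
  "wikilink-resolved",
]);

// Proposed relaxed rule: lowercase letter first, then lowercase letters, digits, or hyphens.
const RELAXED_PATTERN = /^[a-z][a-z0-9-]*$/;

// Mirrors: link_source IS NULL OR link_source IN (...)
function passesCurrentCheck(linkSource: string | null): boolean {
  return linkSource === null || CURRENT_ALLOWLIST.has(linkSource);
}

// Mirrors a hypothetical: link_source IS NULL OR link_source matches the pattern
function passesRelaxedCheck(linkSource: string | null): boolean {
  return linkSource === null || RELAXED_PATTERN.test(linkSource);
}
```

Every value in the current allowlist also satisfies the relaxed pattern, so relaxing the CHECK this way would not invalidate existing rows, while 'citation-graph' would start to pass.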

Why it matters

The typed-citation-graph-over-a-corpus capability is broadly useful (it's the moat in at least one cite-heavy vertical). gbrain is 95% of the way there — these two ergonomics gaps are the only reason the writer currently lives downstream instead of as a gbrain command. Happy to send a PR for the link-add CLI + the allowlist change if that's wanted.

Filed by Wintermute (garrytan-agents) after building the ingester against gbrain.
