v0.42.10.0 feat(extract): opt-in global-basename wikilink resolution (closes #972) #1388
…arrytan#972)

Bare wikilinks like [[struktura]] that point at pages in another folder were silently dropped from the graph. The issue reporter saw 71 wikilinks in Obsidian render to 12 in gbrain (~83% lost). Symptoms downstream: `gbrain graph` returns thin neighborhoods, `gbrain backlinks` undercounts.

This release adds an opt-in mode that resolves bare wikilinks by basename match, covers all three resolver surfaces (FS-source extract, DB-source extract, put_page auto-link), and emits one edge per match, with no silent winner on ambiguity. `gbrain doctor` surfaces a paste-ready enable hint when ≥5 bare wikilinks would resolve under the new mode.

Enable with:

    gbrain config set link_resolution.global_basename true
    gbrain extract links

Default stays off. Existing brains see zero behavior change on upgrade.

Closes garrytan#972. Adapts PR garrytan#1233 from @rayers (regex shape + slug-tail index) into a multi-match, opt-in form with FS-source coverage that the original PR explicitly skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Heads up — I have a local commit on my fork (rayers/gbrain @ local/integration,
Worth weighing: the user who hits #972 won't know to flip the config knob, so they'll experience the silent drop the issue describes until someone tells them about it. The 71→12 wikilink ratio in the original repro feels bad enough to me to advocate default-on. Happy to test your patch against my ~440K-page brain once it lands and report numbers. Either way, thanks for the thorough test coverage: substantially more comprehensive than what I had locally.
Brings the branch from v0.40.8.1 base up to master v0.41.31.0 (38 commits) while preserving the garrytan#972 opt-in global-basename wikilink feature.

Conflict resolutions:
- Migration collision: our v93 (links_link_source_widen) renumbered to v109, since master claimed v93-v108. The widened CHECK now enumerates the FULL union ('markdown','frontmatter','manual','mentions','wikilink-resolved') so it doesn't clobber master's v95 'mentions' widening (runs after it). Mirrors master's sql + sqlFor.pglite convention.
- schema.sql + pglite-schema.ts: link_source CHECK = union of master's 'mentions' (+ new link_kind column) and our 'wikilink-resolved'.
- extract.ts: kept master's runSlidingPool worker-pool refactor (--workers N) and threaded our { globalBasename } option through both call sites.
- doctor.ts: kept both import sets (our link-extraction helpers + master's git-head/CHUNKER_VERSION).
- VERSION/package.json/CHANGELOG bumped to 0.41.32.0 (next after master).
- schema-embedded.ts + llms-full.txt regenerated from source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The garrytan#972 feature shipped with no user-facing docs, only CHANGELOG + CLAUDE.md. Anyone migrating an Obsidian/Notion vault with bare [[name]] wikilinks couldn't discover the link_resolution.global_basename flag unless gbrain doctor happened to surface its hint.

- README "Self-wiring knowledge graph": one sentence on the opt-in mode for Obsidian-style cross-folder bare wikilinks + the doctor pre-check, linking to the install step.
- INSTALL_FOR_AGENTS Step 4.5 (Wire the Knowledge Graph): a dedicated agent-facing subsection covering when bare [[name]] links need it, the enable command, re-running extract, the doctor opportunity hint, and the multi-match behavior.
- Regenerated llms-full.txt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ame-wikilinks

The fork's master (ukd1, v0.41.31.0) trailed upstream by two releases (v0.41.32.0 staleness fix + v0.41.33.0 adaptive return-sizing), both of which collided with this branch. Synced to real upstream so PR garrytan#1388 merges clean.

Re-resolutions on top of the prior merge:
- Version: upstream used 0.41.32.0; bumped to 0.41.34.0 (next after 0.41.33.0).
- Migration: upstream claimed v109 (sources_newest_content_at); our links_link_source_widen renumbered v109 → v110. Full-union CHECK ('markdown','frontmatter','manual','mentions','wikilink-resolved') preserved.
- CHANGELOG: our 0.41.34.0 entry on top, upstream's 0.41.33.0 + 0.41.32.0 below.
- schema-embedded.ts + llms-full.txt regenerated; link_source union verified intact in schema.sql + pglite-schema.ts after auto-merge.

Verified: typecheck clean, privacy/jsonb/source-id guards pass, 369 surface tests (migrate/link-extraction/extract-fs/doctor) + 7 global-basename E2E green; migrations apply [109] sources_newest_content_at → [110] wikilink-basename.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Upstream shipped again after the last sync (v0.41.34.0 "retrieval cathedral", garrytan#1657), which took version 0.41.34.0 (ours) and added migrations v110 (page_aliases) + v111 (search_telemetry_rank1_columns). Re-synced so PR garrytan#1388 merges clean.

Re-resolutions:
- Version: bumped 0.41.34.0 → 0.41.35.0 (next after upstream's 0.41.34.0).
- Migration: our links_link_source_widen renumbered v110 → v112 (upstream now owns v110 + v111). Full-union CHECK preserved ('markdown','frontmatter','manual','mentions','wikilink-resolved').
- CHANGELOG: our 0.41.35.0 entry on top, upstream's 0.41.34.0 below.
- schema-embedded.ts + llms-full.txt regenerated; link_source union verified intact in schema.sql + pglite-schema.ts after auto-merge against the new page_aliases / alias-hop schema.

Verified: typecheck clean, privacy/jsonb/source-id guards pass, 376 surface + E2E tests green; migrations apply [110] page_aliases ... [112] wikilink-basename.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Upstream shipped 4 more releases (v0.41.35.0–v0.41.38.0). Only version-plane collisions this round; our migration v112 stayed past upstream's v111 high-water.

- Version: bumped 0.41.35.0 → 0.41.39.0 (next after upstream's 0.41.38.0).
- CHANGELOG: our 0.41.39.0 entry on top, upstream's 4 new entries below.
- migrate.ts auto-merged clean (v112 unchanged); schema-embedded.ts + llms-full.txt regenerated; link_source union verified intact.

Verified: typecheck clean, guards pass, 376 surface + E2E tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ukd1 nice, glad it's holding up on your vault. I'm fully on board with default-on. The 71→12 ratio in #972 is the tell: bare … Given yours is tested and working, I'd rather converge on it than ship two implementations. My #1233 (and the older … Either way, happy to review the resolver code if a second pair of eyes helps it land.
Re-sync the garrytan#972 opt-in global-basename wikilink branch onto current master.

- VERSION/package.json -> 0.42.6.0 (next after master's 0.42.5.0)
- CHANGELOG: our entry renumbered 0.41.39.0 -> 0.42.6.0, on top of master's
- Migration v112 (links_link_source_widen_for_wikilink_basename) merged clean; master tops at v111, so v112 is free. Full-union CHECK preserved (markdown/frontmatter/manual/mentions/wikilink-resolved).
- schema-embedded.ts + llms-full.txt regenerated from source.
- bun install refreshed node_modules; typecheck clean.

Prepares the branch for the eng-review + codex remediation (T7/T8/T9 source-correctness fixes, T1 FS provenance, T2 matcher consolidation, T3 doctor bound).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lay text

Codex outside-voice [P1]: `[[struktura|the project]]` resolved the basename "the project" (the alias) instead of `struktura` (the target), because extractPageLinks called resolveBasenameMatches(ref.name) and the doctor check keyed basenameIndex.get(e.name). ref.name is the display alias (match[2]); ref.slug is the wikilink target (match[1]).

- extractPageLinks resolves ref.slug; context excerpt locates ref.slug.
- doctor link_resolution_opportunity keys e.slug so its estimate matches what extraction actually resolves.
- Test: aliased wikilink calls resolveBasenameMatches with the target, never the display text.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
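For readers following the alias fix, here is a minimal sketch of the target/alias split. The regex and the `parseWikilinks` helper are hypothetical stand-ins, not gbrain's actual implementation; only the `slug` (target, group 1) versus `name` (display alias, group 2) distinction comes from the commit message.

```typescript
// Hypothetical parser for illustration; gbrain's real regex may differ.
interface WikilinkRef {
  slug: string; // wikilink target (capture group 1)
  name: string; // display alias (capture group 2), falling back to the target
}

function parseWikilinks(content: string): WikilinkRef[] {
  const re = /\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]/g;
  const refs: WikilinkRef[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(content)) !== null) {
    const slug = match[1].trim();
    const name = (match[2] ?? match[1]).trim();
    refs.push({ slug, name });
  }
  return refs;
}

const refs = parseWikilinks("See [[struktura|the project]] and [[acme]].");
// Resolution keys come from the target, never from the display text.
const lookupKeys = refs.map((ref) => ref.slug);
```

Keying on `slug` means an aliased link and a bare link to the same page resolve identically, which is the behavior the new test pins.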
…-link

Codex outside-voice [P1]: put_page's reconcilableOut filter excluded link_source='wikilink-resolved', so a basename edge written by auto-link survived after the bare wikilink was deleted from the page OR the link_resolution.global_basename flag was turned off (the stale-removal loop only iterates reconcilableOut). Add 'wikilink-resolved' to the reconcilable set; manual edges still untouched.

Test: write page with [[struktura]] (flag on) → edge lands; re-put without the wikilink → edge reconciled away.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
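The reconciliation rule can be sketched as a filter over existing outgoing edges. Everything here is an assumption for illustration: the `Edge` shape, the `staleEdges` helper, and the exact membership of `RECONCILABLE` (the commit only confirms that 'wikilink-resolved' is now included and 'manual' is not).

```typescript
type LinkSource = "markdown" | "frontmatter" | "manual" | "mentions" | "wikilink-resolved";

interface Edge {
  to: string;
  link_source: LinkSource;
}

// Assumed membership; gbrain's real reconcilable set may contain more sources.
const RECONCILABLE = new Set<LinkSource>(["markdown", "wikilink-resolved"]);

// Returns edges that should be deleted: reconcilable and no longer produced by the page.
function staleEdges(existing: Edge[], desired: Edge[]): Edge[] {
  const keep = new Set(desired.map((e) => `${e.link_source}:${e.to}`));
  return existing.filter(
    (e) => RECONCILABLE.has(e.link_source) && !keep.has(`${e.link_source}:${e.to}`),
  );
}

const existing: Edge[] = [
  { to: "projects/struktura", link_source: "wikilink-resolved" },
  { to: "people/alice", link_source: "manual" },
];
// The page is re-put without the bare wikilink, so nothing is desired any more.
const stale = staleEdges(existing, []);
```

In this sketch the basename edge is reconciled away while the manual edge is left alone, mirroring the test described in the commit.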
…edges)
Codex outside-voice [P1]: makeResolver.resolveBasenameMatches called
engine.getAllSlugs() unscoped, so a bare [[name]] could resolve to a
same-tail page in a DIFFERENT source and create a cross-source edge. The
engine exposes getAllSlugs({sourceId}) precisely to prevent this. garrytan#972 is
"global basename across folders," not "cross-source federation" — the
canonical gbrain multi-source bug class.
- makeResolver gains opts.sourceId; ensureBasenameIndex passes it to
getAllSlugs (unscoped only when sourceId omitted — back-compat).
- runAutoLink (put_page) passes opts.sourceId; extractLinksFromDB passes
sourceIdFilter. FS extract is already single-source (walks one dir).
- Tests: scoped index returns only the source's slugs (no cross-source);
unscoped call stays brain-wide.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
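A rough sketch of the scoping rule, assuming a hypothetical in-memory `PageRef` list in place of the engine's `getAllSlugs({sourceId})` call. Only the behavior (scoped when a source ID is given, brain-wide when omitted) is taken from the commit message.

```typescript
// Hypothetical page record; the real engine queries the database instead.
interface PageRef {
  slug: string;
  sourceId: string;
}

// Slugs eligible for the basename index: one source if given, otherwise all.
function slugsForIndex(pages: PageRef[], sourceId?: string): string[] {
  const scoped = sourceId === undefined ? pages : pages.filter((p) => p.sourceId === sourceId);
  return scoped.map((p) => p.slug);
}

const pages: PageRef[] = [
  { slug: "projects/struktura", sourceId: "work" },
  { slug: "archive/struktura", sourceId: "personal" },
];
const scoped = slugsForIndex(pages, "work");
const unscoped = slugsForIndex(pages);
```

With the scope applied, a bare `[[struktura]]` written in the "work" source can no longer pick up the same-tail page from "personal".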
…nk-resolved'
The FS extract path is the issue's default repro (gbrain extract links with no
--source db). ExtractedLink had no link_source field, so FS basename edges
landed with the engine default ('markdown') instead of the 'wikilink-resolved'
provenance the DB / put_page paths set and the docs promise. The e2e FS test
only asserted link_type, so it was blind to this.
- ExtractedLink gains link_source?; extractLinksFromFile sets it to
'wikilink-resolved' on basename edges (undefined for ordinary markdown).
- Carries through the addLinksBatch snapshots automatically (LinkBatchInput
already has link_source); single-row addLink fallback now passes it too.
- e2e FS repro asserts link_source === 'wikilink-resolved'.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
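The provenance gap comes down to an optional field that falls back to the engine default. A minimal sketch, assuming a simplified `ExtractedLinkSketch` shape (the `from`/`to` fields and the `storedSource` helper are hypothetical; the optional `link_source` and the 'markdown' default are from the commit message):

```typescript
type LinkSource = "markdown" | "frontmatter" | "manual" | "mentions" | "wikilink-resolved";

// Hypothetical shape; only the optional link_source field comes from the commit.
interface ExtractedLinkSketch {
  from: string;
  to: string;
  link_source?: LinkSource;
}

const ENGINE_DEFAULT_SOURCE: LinkSource = "markdown";

// What ends up stored when the extractor leaves link_source unset.
function storedSource(link: ExtractedLinkSketch): LinkSource {
  return link.link_source ?? ENGINE_DEFAULT_SOURCE;
}

const ordinary: ExtractedLinkSketch = { from: "concepts/knowledge-graph", to: "projects/struktura" };
const basename: ExtractedLinkSketch = { ...ordinary, link_source: "wikilink-resolved" };
```

Before the fix, every FS edge behaved like `ordinary`; the change makes basename edges carry the explicit value so the fallback never applies to them.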
…S/doctor

Codex outside-voice [P2] DRY: three surfaces each hand-rolled a basename matcher with divergent key sets: the doctor omitted the slugified key, so its link_resolution_opportunity estimate undercounted what extraction resolves, and the resolver returned matches in unsorted getAllSlugs bucket order.

New shared exports in link-extraction.ts: buildBasenameIndex(slugs) + queryBasenameIndex(index, name) (keys raw/lower/slugified tail; stable sort shorter-first then lexical) + normalizeBasename.

- makeResolver.resolveBasenameMatches → queryBasenameIndex (now stable-sorted).
- extract.ts resolveBasenameMatchesFromSlugs → delegates to the shared pair.
- doctor link_resolution_opportunity → shared builder/query (slugified key added; estimate now matches extraction).
- Test: doctor counts a slugified-only match ([[Fast Weigh]] → companies/fast-weigh).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
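To make the shared-matcher contract concrete, here is one way the index and query could look. This is a sketch under stated assumptions, not the code in link-extraction.ts: `buildIndex`, `queryIndex`, `keysFor`, and `slugify` are hypothetical stand-ins, the slugification rule is a guess, and "shorter-first" is interpreted as shorter slug length.

```typescript
type BasenameIndex = Map<string, string[]>;

// Assumed slugification: lowercase, runs of non-alphanumerics collapsed to "-".
function slugify(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// The three key forms named in the commit: raw, lowercased, slugified.
function keysFor(name: string): string[] {
  return Array.from(new Set([name, name.toLowerCase(), slugify(name)]));
}

function buildIndex(slugs: string[]): BasenameIndex {
  const index: BasenameIndex = new Map();
  for (const slug of slugs) {
    const tail = slug.slice(slug.lastIndexOf("/") + 1);
    for (const key of keysFor(tail)) {
      const bucket = index.get(key) ?? [];
      if (!bucket.includes(slug)) bucket.push(slug);
      index.set(key, bucket);
    }
  }
  return index;
}

// Returns every match, sorted shorter-first then lexically (no single winner).
function queryIndex(index: BasenameIndex, name: string): string[] {
  const hits = new Set<string>();
  for (const key of keysFor(name)) {
    for (const slug of index.get(key) ?? []) hits.add(slug);
  }
  return Array.from(hits).sort((a, b) => a.length - b.length || a.localeCompare(b));
}

const index = buildIndex(["projects/struktura", "archive/struktura", "companies/fast-weigh"]);
const multi = queryIndex(index, "struktura"); // two matches, one edge each
const slugified = queryIndex(index, "Fast Weigh"); // matches only via the slugified key
```

Sharing one builder/query pair is what keeps the doctor estimate and the extractor in agreement: both see the same keys and the same ordering.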
… decision

Codex outside-voice P2 findings:
- P2a markdown-label masking: a wikilink inside a markdown-link label ([see [[acme]]](companies/acme.md)) spawned a stray generic basename ref. Pass-1 can't match the nested brackets, so a new MARKDOWN_LABEL_WIKILINK_RE masks those spans out of pass 2c. Inner [[acme]] is now inert.
- P2b FS code-fence: the FS path (extractMarkdownLinks on raw content) didn't strip code blocks like the DB path. extractLinksFromFile now scans stripCodeBlocks(content) so [[name]] inside a fence creates no FS edge.
- P2c self-link guard: a basename [[own-tail]] on its own page resolved back to itself. Dropped in both extractPageLinks and the FS path.
- P2d dedup: documented the decision to KEEP qualified + bare edges to the same target as separate rows (distinct provenance/audit trail).
- P2e: skipFrontmatter unresolved-contract tests added.

Tests: P2a inert-label, P2c self-link drop, P2b code-fence, P2e unresolved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
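P2b and P2c can be illustrated together. The sketch below is hypothetical: `resolveBare` and its exact-tail matching are simplifications, and the fence stripping stands in for `stripCodeBlocks`, whose real behavior may cover more cases.

```typescript
// The fence marker is built at runtime so this sample stays readable inside a fenced block.
const FENCE = "`".repeat(3);
const FENCED_BLOCK_RE = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g");

// Resolves bare [[name]] links by slug tail, skipping fenced code and self-links.
function resolveBare(content: string, ownSlug: string, allSlugs: string[]): string[] {
  const prose = content.replace(FENCED_BLOCK_RE, ""); // P2b: ignore code fences
  const re = /\[\[([^\[\]|]+)\]\]/g;
  const edges: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(prose)) !== null) {
    const target = match[1].trim();
    for (const slug of allSlugs) {
      const tail = slug.slice(slug.lastIndexOf("/") + 1);
      // P2c: a match that points back at the page itself is dropped.
      if (tail === target && slug !== ownSlug) edges.push(slug);
    }
  }
  return edges;
}

const slugs = ["concepts/notes", "archive/notes", "companies/acme"];
const page = ["Links: [[acme]] and [[notes]].", FENCE, "[[acme]]", FENCE].join("\n");
const edges = resolveBare(page, "concepts/notes", slugs);
```

Note that the guard compares resolved slugs rather than tails, so `[[notes]]` on `concepts/notes` still links to `archive/notes`; only the edge back to the page itself is removed.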
The check did listAllPageRefs() + a getPage() per page under a 60s budget.
On a large brain (the eng-review concern) it hit the budget every non-fast
doctor run and returned a perpetual partial, adding ~60s.
Now batch-loads the 1000 most-recent pages in ONE query
(ORDER BY id DESC LIMIT SAMPLE_LIMIT) and scans in memory, with the 60s cap
kept as a backstop. Mirrors the v0.40.9 sampling convention. The estimate
message names the bound when the brain exceeds the sample
("scanned the 1000 most-recent of N pages").
Test: source-grep pins the bounded query + the absence of the per-page
getPage walk.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
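A sketch of the bounded scan plus the hint thresholds, under several assumptions: the in-memory sort stands in for the single `ORDER BY id DESC LIMIT SAMPLE_LIMIT` query, `SampledPage` and `opportunity` are hypothetical, and the 20% ratio is taken to mean resolvable bare wikilinks over all bare wikilinks in the sample (the PR description gives the numbers but not the denominator).

```typescript
// Hypothetical pre-parsed page: its id and the bare wikilink targets found in it.
interface SampledPage {
  id: number;
  bareTargets: string[];
}

const SAMPLE_LIMIT = 1000; // bound named in the commit message
const MIN_RESOLVABLE = 5; // ">=5 bare wikilinks" from the PR description
const MIN_RATIO = 0.2; // ">=20% ratio"; the denominator is an assumption

function opportunity(pages: SampledPage[], knownTails: Set<string>, limit = SAMPLE_LIMIT) {
  // Stand-in for the batched query: newest pages first, capped at the limit.
  const sample = pages.slice().sort((a, b) => b.id - a.id).slice(0, limit);
  let total = 0;
  let resolvable = 0;
  for (const page of sample) {
    for (const target of page.bareTargets) {
      total += 1;
      if (knownTails.has(target)) resolvable += 1;
    }
  }
  const ratio = total === 0 ? 0 : resolvable / total;
  return {
    scanned: sample.length,
    total,
    resolvable,
    showHint: resolvable >= MIN_RESOLVABLE && ratio >= MIN_RATIO,
  };
}

const sampled: SampledPage[] = [
  { id: 1, bareTargets: ["old", "old", "old"] },
  { id: 2, bareTargets: ["struktura", "acme", "struktura"] },
  { id: 3, bareTargets: ["struktura", "acme", "unknown"] },
];
const report = opportunity(sampled, new Set(["struktura", "acme"]), 2);
```

Because the work is one bounded pass over an in-memory sample, cost no longer grows with brain size, which is the point of replacing the per-page `getPage` walk.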
…112 / 0.42.6.0

Merge churn left intermediate refs: schema.sql + schema-embedded.ts said "migration v93", CLAUDE.md said "v0.41.32.0 / Migration v109", CHANGELOG said "Migration v93". Reconciled all to migration v112 / shipping 0.42.6.0. The CLAUDE.md annotation is also refreshed to describe the final behavior (shared matcher, source-scoping, alias-by-target, stale-edge reconciliation, bounded doctor scan) and credit @rayers + @ukd1. Regenerated schema-embedded + llms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second re-sync onto current master to ship garrytan#972 as v0.42.10.0.

- VERSION/package.json -> 0.42.10.0 (buffer over master's 0.42.8.0 for queue collisions, per request).
- CHANGELOG: our 0.42.10.0 entry on top; master's 0.42.8.0/0.42.7.0/0.42.6.0 history preserved below.
- Migration collision: master claimed v112 (pages_links_extracted_at, garrytan#1696), so our links_link_source_widen renumbered v112 -> v113. Full-union CHECK preserved (markdown/frontmatter/manual/mentions/wikilink-resolved).
- extract.ts: unioned imports (master's LINK_EXTRACTOR_VERSION_TS/LinkCandidate + our buildBasenameIndex/queryBasenameIndex/stripCodeBlocks).
- doctor.test.ts: kept both garrytan#972 link_resolution describe and master's garrytan#1699 quarantine/flagged describe.
- CLAUDE.md: de-glommed import-file line, kept master's garrytan#1699 entries + our garrytan#972 annotation (v0.42.10.0 / migration v113).
- schema.sql comment v112 -> v113; schema-embedded + llms regenerated.
- typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…to 800KB

Two full-suite gate failures from the re-sync:
- doctor-categories drift guard: the new `link_resolution_opportunity` check wasn't in any category set. Added to BRAIN_CHECK_NAMES (alongside graph_coverage / orphan_ratio; it's a graph-quality signal).
- build-llms size budget: the garrytan#972 Key Files annotation (landing with master's garrytan#1696/garrytan#1699 waves) pushed llms-full.txt past 750KB. Bumped FULL_SIZE_BUDGET 750KB→800KB, the established "budget tracks CLAUDE.md's legitimate per-feature growth" pattern (600→700→750→800 across releases).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…inks) into local/integration

# Conflicts:
# src/commands/extract.ts
# src/core/link-extraction.ts
# test/link-extraction.test.ts
* upstream/master:
  v0.42.23.0 feat(jobs): --nice scheduling-priority flag for jobs work/supervisor (garrytan#1815) (garrytan#1820)
  v0.42.22.0 fix(minions): supervisor progress watchdog + worker DB self-defense — alive-but-wedged worker self-heals (garrytan#1801) (garrytan#1824)
  v0.42.21.0 fix(postgres): module-singleton ownership — canonical landing for the dream-cycle "connect() has not been called" class (garrytan#1404/garrytan#1471/garrytan#1619) (garrytan#1805)
  v0.42.20.0 fix: reliability wave — PGLite capture lock-pin + Postgres reconnect race + search embed-hang (garrytan#1762 garrytan#1745 garrytan#1775) (garrytan#1810)
  v0.42.19.0 fix(skillopt): close the last gap in the AI SDK v6 tool-loop fix (write-capture mapper + regression test) (garrytan#1809)
  v0.42.18.0 fix: sync orphan-pileup watchdog (garrytan#1633) + links-lag µs stamp (garrytan#1768) (garrytan#1807)
  v0.42.17.0 fix(sync): resumable incremental sync — killed mid-import no longer loses progress (garrytan#1794) (garrytan#1808)
  v0.42.16.0 feat(doctor): brain health as a solved problem — cause-ranked doctor + OOM-loop line + auto-drain + pool-reap (garrytan#1685) (garrytan#1802)
  v0.42.15.0 fix: decouple CLI primary output from process.stdout.isTTY (garrytan#1784) (garrytan#1806)
  v0.42.14.0 fix(zero-config): code-* readiness signal + init embedding-key validation + lock self-heal (garrytan#1780) (garrytan#1804)
  v0.42.13.0 fix(search): archive/ content findable by default, demoted not hard-excluded (garrytan#1777) (garrytan#1797)
  v0.42.12.0 feat: self-upgrading gbrain — invocation-riding update check + opt-in auto-upgrade (garrytan#1798)
  v0.42.11.0 feat(skillopt): held-out eval gate, honest receipts, ENFORCE + ablation opts (garrytan#1759)
  v0.42.10.0 feat(extract): opt-in global-basename wikilink resolution (closes garrytan#972) (garrytan#1388)
What this adds
Opt-in global-basename wikilink resolution — closes #972.
If you import an Obsidian or Notion vault that uses bare `[[note-name]]` wikilinks, GBrain used to silently drop any link whose target lived in a different folder. Write `[[struktura]]` in `concepts/knowledge-graph.md` while the page lives at `projects/struktura.md`, and the edge never landed: Obsidian showed a dense graph, GBrain showed a thin, broken one. The issue reporter had 71 wikilinks across 20 pages; GBrain captured 12.

This PR adds an opt-in mode that resolves bare wikilinks by basename (slug tail). Off by default, so existing brains are unchanged.
Behavior
- When `[[struktura]]` matches both `projects/struktura` and `archive/struktura`, GBrain emits one edge to each, tagged with the new `wikilink_basename` link type and `link_source = 'wikilink-resolved'`. No arbitrary winner-takes-all.
- Applies to `gbrain extract links` (FS-walk, the path the issue's repro hits, and `--source db`), plus the auto-link post-hook on every `put_page`.
- `gbrain doctor` adds a `link_resolution_opportunity` check that surfaces a paste-ready enable hint when ≥5 bare wikilinks would resolve at a ≥20% ratio, so you know the payoff before flipping the flag.
- `GBRAIN_LINK_RESOLUTION_GLOBAL_BASENAME` → DB config `link_resolution.global_basename` → default `false`.

Schema
- Widens the `links_link_source_check` CHECK to admit `'wikilink-resolved'` (full set: `markdown`, `frontmatter`, `manual`, `mentions`, `wikilink-resolved`). Idempotent (`DROP ... IF EXISTS`), engine-parity `sql` + `sqlFor.pglite`. Mirrored in `schema.sql` + `pglite-schema.ts` for fresh installs.

Docs
- … extract, the doctor hint, and the multi-match behavior.

Tests
- `test/link-extraction.test.ts` (+17), `test/extract-fs.test.ts` (+14), `test/doctor.test.ts` (+7), and `test/e2e/global-basename-pglite.test.ts` (7, including the issue's exact `mkdir -p /tmp/vault/{concepts,projects}` repro across FS-source, DB-source, and put_page auto-link under both flag states).

Credit
PR #1233 from @rayers contributed the kernel — the generic wikilink regex + slug-tail index pattern. This PR keeps that mechanism, makes it opt-in via the config flag, replaces the first-write-wins lookup with multi-match return, and extends coverage to the FS-source path the repro actually hits.
Synced to current `master` (v0.41.34.0); the `'wikilink-resolved'` value is unioned with master's `'mentions'` so neither clobbers the other.

🤖 Generated with Claude Code