
v0.42.10.0 feat(extract): opt-in global-basename wikilink resolution (closes #972) #1388

Merged
garrytan merged 17 commits into garrytan:master from ukd1:fix-972-global-basename-wikilinks on Jun 2, 2026

ukd1 commented May 24, 2026

Contributor

What this adds

Opt-in global-basename wikilink resolution (closes #972).

If you import an Obsidian or Notion vault that uses bare [[note-name]] wikilinks, GBrain used to silently drop any link whose target lived in a different folder. Write [[struktura]] in concepts/knowledge-graph.md while the page lives at projects/struktura.md, and the edge never landed — Obsidian showed a dense graph, GBrain showed a thin, broken one. The issue reporter had 71 wikilinks across 20 pages; GBrain captured 12.

This PR adds an opt-in mode that resolves bare wikilinks by basename (slug tail). Off by default — existing brains are unchanged.

gbrain config set link_resolution.global_basename true
gbrain extract links

Behavior

  • Multi-match by design. If [[struktura]] matches both projects/struktura and archive/struktura, GBrain emits one edge to each, tagged with the new wikilink_basename link type and link_source = 'wikilink-resolved'. No arbitrary winner-takes-all.
  • All three resolution surfaces covered: gbrain extract links (FS-walk — the path the issue's repro hits — and --source db), plus the auto-link post-hook on every put_page.
  • Discoverable. gbrain doctor adds a link_resolution_opportunity check that surfaces a paste-ready enable hint when ≥5 bare wikilinks would resolve at a ≥20% ratio, so you know the payoff before flipping the flag.
  • Resolution order for the flag: env GBRAIN_LINK_RESOLUTION_GLOBAL_BASENAME → DB config link_resolution.global_basename → default false.
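The flag's precedence amounts to a three-layer lookup. A minimal TypeScript sketch, not gbrain's code: `resolveGlobalBasenameFlag`, `parseBool`, and the `ConfigLookup` callback are hypothetical names, and accepting `"true"`/`"1"` and `"false"`/`"0"` is an assumption about how values are parsed.

```typescript
// Hypothetical sketch of the flag precedence: env, then DB config, then false.
type ConfigLookup = (key: string) => string | undefined;

// Assumed parsing: "true"/"1" and "false"/"0"; anything else counts as "not set".
function parseBool(raw: string | undefined): boolean | undefined {
  if (raw === undefined) return undefined;
  const v = raw.trim().toLowerCase();
  if (v === "true" || v === "1") return true;
  if (v === "false" || v === "0") return false;
  return undefined;
}

function resolveGlobalBasenameFlag(
  env: Record<string, string | undefined>,
  dbConfig: ConfigLookup,
): boolean {
  const fromEnv = parseBool(env["GBRAIN_LINK_RESOLUTION_GLOBAL_BASENAME"]);
  if (fromEnv !== undefined) return fromEnv;
  const fromDb = parseBool(dbConfig("link_resolution.global_basename"));
  if (fromDb !== undefined) return fromDb;
  return false; // default off: existing brains are unchanged
}
```

An explicit `false` in the environment overrides `true` in the DB config; an unset (or, in this sketch, unrecognized) value falls through to the next layer. In a Node or Bun process the first argument would typically be `process.env`.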

Schema

  • Migration v110 widens the links_link_source_check CHECK to admit 'wikilink-resolved' (full set: markdown, frontmatter, manual, mentions, wikilink-resolved). Idempotent (DROP ... IF EXISTS), engine-parity sql + sqlFor.pglite. Mirrored in schema.sql + pglite-schema.ts for fresh installs.
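Why the CHECK must list the full set: dropping and re-adding the constraint replaces it wholesale, so a migration that appended 'wikilink-resolved' to an older list would silently drop 'mentions'. A hedged TypeScript sketch that generates such a statement, assuming the constraint lives on a `links` table with a `link_source` column (inferred from the constraint name); `widenLinkSourceCheckSql` is a hypothetical helper, not the real migration.

```typescript
// Every allowed value, old and new: the re-added CHECK replaces the previous one entirely.
const LINK_SOURCES = ["markdown", "frontmatter", "manual", "mentions", "wikilink-resolved"] as const;

// Hypothetical helper; table/column names are inferred from "links_link_source_check".
function widenLinkSourceCheckSql(values: readonly string[]): string {
  const list = values.map((v) => `'${v.replace(/'/g, "''")}'`).join(", ");
  return [
    // IF EXISTS keeps the migration idempotent on re-run
    "ALTER TABLE links DROP CONSTRAINT IF EXISTS links_link_source_check;",
    `ALTER TABLE links ADD CONSTRAINT links_link_source_check CHECK (link_source IN (${list}));`,
  ].join("\n");
}
```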

Docs

  • README "Self-wiring knowledge graph" names the Obsidian case + the doctor pre-check.
  • INSTALL_FOR_AGENTS.md → Step 4.5 gets an agent-facing subsection: when to enable, the command, re-running extract, the doctor hint, and the multi-match behavior.

Tests

  • test/link-extraction.test.ts (+17), test/extract-fs.test.ts (+14), test/doctor.test.ts (+7), and test/e2e/global-basename-pglite.test.ts (7, including the issue's exact mkdir -p /tmp/vault/{concepts,projects} repro across FS-source, DB-source, and put_page auto-link under both flag states).

Credit

PR #1233 from @rayers contributed the kernel — the generic wikilink regex + slug-tail index pattern. This PR keeps that mechanism, makes it opt-in via the config flag, replaces the first-write-wins lookup with multi-match return, and extends coverage to the FS-source path the repro actually hits.
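The difference between a first-write-wins lookup and the multi-match return comes down to what the slug-tail index stores per key. An illustrative sketch; `slugTail`, `buildFirstWriteWins`, and `buildMultiMatch` are hypothetical helpers, not code from either PR.

```typescript
// Illustrative only: neither function is gbrain's code.
function slugTail(slug: string): string {
  return slug.slice(slug.lastIndexOf("/") + 1);
}

// Old policy: the first slug indexed under a tail silently wins.
function buildFirstWriteWins(slugs: string[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const slug of slugs) {
    const tail = slugTail(slug);
    if (!index.has(tail)) index.set(tail, slug);
  }
  return index;
}

// New policy: keep every slug that shares the tail, so each match can get its own edge.
function buildMultiMatch(slugs: string[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const slug of slugs) {
    const tail = slugTail(slug);
    const bucket = index.get(tail);
    if (bucket) bucket.push(slug);
    else index.set(tail, [slug]);
  }
  return index;
}
```

For `["projects/struktura", "archive/struktura"]`, the first index maps `struktura` to `projects/struktura` only; the second keeps both, which is what allows one `wikilink_basename` edge per match.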


Synced to current master (v0.41.34.0); the 'wikilink-resolved' value is unioned with master's 'mentions' so neither clobbers the other.

🤖 Generated with Claude Code

…arrytan#972)

Bare wikilinks like [[struktura]] that point at pages in another folder
were silently dropped from the graph. The issue reporter saw 71 wikilinks
in Obsidian render to 12 in gbrain (~83% lost). Symptoms downstream:
`gbrain graph` returns thin neighborhoods, `gbrain backlinks` undercounts.

This release adds an opt-in mode that resolves bare wikilinks by basename
match, covers all three resolver surfaces (FS-source extract, DB-source
extract, put_page auto-link), and emits one edge per match — no silent
winner on ambiguity. `gbrain doctor` surfaces a paste-ready enable hint
when ≥5 bare wikilinks would resolve under the new mode.
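The doctor gate can be sketched as a small predicate combining the two thresholds from the PR description (at least 5 resolvable, at least a 20% ratio). This is an illustration, not gbrain's check: the names are hypothetical, the hint text is invented apart from the enable command, and reading the ratio as resolvable / currently unresolved bare wikilinks is an assumption.

```typescript
// Thresholds from the PR text; the ratio's denominator is an assumption.
const MIN_RESOLVABLE = 5;
const MIN_RATIO = 0.2;

interface BasenameOpportunity {
  bareWikilinks: number; // bare [[name]] refs that do not resolve today
  resolvable: number;    // how many of those would resolve by basename
}

function shouldHintGlobalBasename(o: BasenameOpportunity): boolean {
  if (o.bareWikilinks <= 0) return false;
  return o.resolvable >= MIN_RESOLVABLE && o.resolvable / o.bareWikilinks >= MIN_RATIO;
}

// Hypothetical hint wording; only the command itself comes from the PR.
function enableHint(o: BasenameOpportunity): string | null {
  return shouldHintGlobalBasename(o)
    ? `${o.resolvable} bare wikilinks would resolve. Enable with: gbrain config set link_resolution.global_basename true`
    : null;
}
```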

Enable with:
  gbrain config set link_resolution.global_basename true
  gbrain extract links

Default stays off. Existing brains see zero behavior change on upgrade.

Closes garrytan#972. Adapts PR garrytan#1233 from @rayers (regex shape + slug-tail index)
into a multi-match, opt-in form with FS-source coverage that the original
PR explicitly skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rayers commented May 25, 2026


Heads up — I have a local commit on my fork (rayers/gbrain @ local/integration, 85ba5f1d from 2026-05-09) that addresses the same bug class by removing the DIR_PATTERN gate in WIKILINK_RE outright. Approaches diverge on the policy question:

  • Yours: opt-in via link_resolution.global_basename, default off — strict backward compat.
  • Mine: bare wikilinks just work by default — no flag needed.

Worth weighing: the user who hits #972 won't know to flip the config knob, so they'll experience the silent-drop the issue describes until someone tells them about it. The 71→12 wikilink ratio in the original repro feels bad enough to me to advocate default-on.

Happy to test your patch against my ~440K-page brain once it lands and report numbers. Either way thanks for the thorough test coverage — substantially more comprehensive than what I had locally.

ukd1 and others added 3 commits May 30, 2026 19:21
Brings the branch from v0.40.8.1 base up to master v0.41.31.0 (38 commits)
while preserving the garrytan#972 opt-in global-basename wikilink feature.

Conflict resolutions:
- Migration collision: our v93 (links_link_source_widen) renumbered to v109,
  since master claimed v93-v108. The widened CHECK now enumerates the FULL
  union ('markdown','frontmatter','manual','mentions','wikilink-resolved') so
  it doesn't clobber master's v95 'mentions' widening (runs after it). Mirrors
  master's sql + sqlFor.pglite convention.
- schema.sql + pglite-schema.ts: link_source CHECK = union of master's
  'mentions' (+ new link_kind column) and our 'wikilink-resolved'.
- extract.ts: kept master's runSlidingPool worker-pool refactor (--workers N)
  and threaded our { globalBasename } option through both call sites.
- doctor.ts: kept both import sets (our link-extraction helpers + master's
  git-head/CHUNKER_VERSION).
- VERSION/package.json/CHANGELOG bumped to 0.41.32.0 (next after master).
- schema-embedded.ts + llms-full.txt regenerated from source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The garrytan#972 feature shipped with no user-facing docs — only CHANGELOG + CLAUDE.md.
Anyone migrating an Obsidian/Notion vault with bare [[name]] wikilinks couldn't
discover the link_resolution.global_basename flag unless gbrain doctor happened
to surface its hint.

- README "Self-wiring knowledge graph": one sentence on the opt-in mode for
  Obsidian-style cross-folder bare wikilinks + the doctor pre-check, linking to
  the install step.
- INSTALL_FOR_AGENTS Step 4.5 (Wire the Knowledge Graph): a dedicated agent-
  facing subsection — when bare [[name]] links need it, the enable command,
  re-running extract, the doctor opportunity hint, and the multi-match behavior.
- Regenerated llms-full.txt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ame-wikilinks

The fork's master (ukd1, v0.41.31.0) trailed upstream by two releases
(v0.41.32.0 staleness fix + v0.41.33.0 adaptive return-sizing), both of which
collided with this branch. Synced to real upstream so PR garrytan#1388 merges clean.

Re-resolutions on top of the prior merge:
- Version: upstream used 0.41.32.0; bumped to 0.41.34.0 (next after 0.41.33.0).
- Migration: upstream claimed v109 (sources_newest_content_at); our
  links_link_source_widen renumbered v109 → v110. Full-union CHECK
  ('markdown','frontmatter','manual','mentions','wikilink-resolved') preserved.
- CHANGELOG: our 0.41.34.0 entry on top, upstream's 0.41.33.0 + 0.41.32.0 below.
- schema-embedded.ts + llms-full.txt regenerated; link_source union verified
  intact in schema.sql + pglite-schema.ts after auto-merge.

Verified: typecheck clean, privacy/jsonb/source-id guards pass, 369 surface
tests (migrate/link-extraction/extract-fs/doctor) + 7 global-basename E2E green;
migrations apply [109] sources_newest_content_at → [110] wikilink-basename.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ukd1 ukd1 changed the title v0.40.8.2 fix(extract): opt-in global-basename wikilink resolution (closes #972) v0.41.34.0 feat(extract): opt-in global-basename wikilink resolution (closes #972) May 30, 2026

ukd1 commented May 30, 2026

Contributor Author

@rayers it's working well for mine! LMK!

"The 71→12 wikilink ratio in the original repro feels bad enough to me to advocate default-on." (@rayers)

@garrytan thoughts on default on?

Upstream shipped again after the last sync (v0.41.34.0 "retrieval cathedral" —
garrytan#1657), which took version 0.41.34.0 (ours) and added migrations v110 (page_aliases)
+ v111 (search_telemetry_rank1_columns). Re-synced so PR garrytan#1388 merges clean.

Re-resolutions:
- Version: bumped 0.41.34.0 → 0.41.35.0 (next after upstream's 0.41.34.0).
- Migration: our links_link_source_widen renumbered v110 → v112 (upstream now
  owns v110 + v111). Full-union CHECK preserved
  ('markdown','frontmatter','manual','mentions','wikilink-resolved').
- CHANGELOG: our 0.41.35.0 entry on top, upstream's 0.41.34.0 below.
- schema-embedded.ts + llms-full.txt regenerated; link_source union verified
  intact in schema.sql + pglite-schema.ts after auto-merge against the new
  page_aliases / alias-hop schema.

Verified: typecheck clean, privacy/jsonb/source-id guards pass, 376 surface +
E2E tests green; migrations apply [110] page_aliases ... [112] wikilink-basename.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ukd1 ukd1 changed the title v0.41.34.0 feat(extract): opt-in global-basename wikilink resolution (closes #972) v0.41.35.0 feat(extract): opt-in global-basename wikilink resolution (closes #972) May 30, 2026
Upstream shipped 4 more releases (v0.41.35.0–v0.41.38.0). Only version-plane
collisions this round; our migration v112 stayed past upstream's v111 high-water.

- Version: bumped 0.41.35.0 → 0.41.39.0 (next after upstream's 0.41.38.0).
- CHANGELOG: our 0.41.39.0 entry on top, upstream's 4 new entries below.
- migrate.ts auto-merged clean (v112 unchanged); schema-embedded.ts + llms-full.txt
  regenerated; link_source union verified intact.

Verified: typecheck clean, guards pass, 376 surface + E2E tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ukd1 ukd1 changed the title v0.41.35.0 feat(extract): opt-in global-basename wikilink resolution (closes #972) v0.41.39.0 feat(extract): opt-in global-basename wikilink resolution (closes #972) May 30, 2026

rayers commented May 31, 2026


@ukd1 nice — glad it's holding up on your vault.

I'm fully on board with default-on. The 71→12 ratio in #972 is the tell: bare [[note-name]] is the common Obsidian/Notion shape, not the exception, so silently dropping cross-folder links is a broken default, not a safe one. Strict backward-compat is protecting behavior nobody actually wants.

Given yours is tested and working, I'd rather converge on it than ship two implementations. My #1233 (and the older local/integration commit that just strips the DIR_PATTERN gate in WIKILINK_RE) solves the same bug class a blunter way. If you flip the flag to default-on, I'll close #1233 in favor of #1388 — one resolver path is better than two. If @garrytan would rather keep it opt-in for one release and flip the default later, that works too; I'll just rebase #1233 down to whatever's left.

Either way, happy to review the resolver code if a second pair of eyes helps it land.

garrytan and others added 10 commits June 1, 2026 22:13
Re-sync the garrytan#972 opt-in global-basename wikilink branch onto current master.

- VERSION/package.json -> 0.42.6.0 (next after master's 0.42.5.0)
- CHANGELOG: our entry renumbered 0.41.39.0 -> 0.42.6.0, on top of master's
- Migration v112 (links_link_source_widen_for_wikilink_basename) merged clean;
  master tops at v111, so v112 is free. Full-union CHECK preserved
  (markdown/frontmatter/manual/mentions/wikilink-resolved).
- schema-embedded.ts + llms-full.txt regenerated from source.
- bun install refreshed node_modules; typecheck clean.

Prepares the branch for the eng-review + codex remediation (T7/T8/T9
source-correctness fixes, T1 FS provenance, T2 matcher consolidation, T3 doctor bound).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lay text

Codex outside-voice [P1]: `[[struktura|the project]]` resolved the basename
"the project" (the alias) instead of `struktura` (the target), because
extractPageLinks called resolveBasenameMatches(ref.name) and the doctor check
keyed basenameIndex.get(e.name). ref.name is the display alias (match[2]);
ref.slug is the wikilink target (match[1]).

- extractPageLinks resolves ref.slug; context excerpt locates ref.slug.
- doctor link_resolution_opportunity keys e.slug so its estimate matches
  what extraction actually resolves.
- Test: aliased wikilink calls resolveBasenameMatches with the target, never
  the display text.
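A minimal sketch of the distinction the fix relies on: in `[[target|display]]` the first capture is the target and the second is the alias, and basename resolution must key on the target. The regex is illustrative, not gbrain's `WIKILINK_RE`; `parseWikilinks` and `basenameLookupKeys` are hypothetical.

```typescript
// Illustrative regex: group 1 = link target, group 2 = optional display alias.
const WIKILINK = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;

interface WikiRef {
  slug: string; // link target: what resolution must key on
  name: string; // display text (the alias if present, else the target)
}

function parseWikilinks(text: string): WikiRef[] {
  const refs: WikiRef[] = [];
  for (const m of text.matchAll(WIKILINK)) {
    const slug = m[1].trim();
    const alias = m[2];
    refs.push({ slug, name: alias !== undefined ? alias.trim() : slug });
  }
  return refs;
}

// After the fix: basename lookups use ref.slug, never ref.name.
function basenameLookupKeys(text: string): string[] {
  return parseWikilinks(text).map((ref) => ref.slug);
}
```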

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-link

Codex outside-voice [P1]: put_page's reconcilableOut filter excluded
link_source='wikilink-resolved', so a basename edge written by auto-link
survived after the bare wikilink was deleted from the page OR the
link_resolution.global_basename flag was turned off (the stale-removal loop
only iterates reconcilableOut). Add 'wikilink-resolved' to the reconcilable
set; manual edges still untouched.

Test: write page with [[struktura]] (flag on) → edge lands; re-put without
the wikilink → edge reconciled away.
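The reconciliation rule can be sketched as a set difference restricted to reconcilable sources. Hypothetical TypeScript: only 'wikilink-resolved' being reconcilable and 'manual' being exempt come from the commit; the rest of the set and the `staleEdges` helper are assumptions.

```typescript
type LinkSource = "markdown" | "frontmatter" | "manual" | "mentions" | "wikilink-resolved";

interface Edge {
  to: string;
  link_source: LinkSource;
}

// Assumed membership apart from 'wikilink-resolved'; 'manual' is never reconciled.
const RECONCILABLE: ReadonlySet<LinkSource> = new Set<LinkSource>([
  "markdown",
  "frontmatter",
  "wikilink-resolved",
]);

const edgeKey = (e: Edge): string => JSON.stringify([e.to, e.link_source]);

// Existing reconcilable edges that the fresh extraction no longer produces.
function staleEdges(existing: Edge[], extractedNow: Edge[]): Edge[] {
  const stillPresent = new Set(extractedNow.map(edgeKey));
  return existing.filter((e) => RECONCILABLE.has(e.link_source) && !stillPresent.has(edgeKey(e)));
}
```

Turning the flag off has the same effect as deleting the wikilink: the next extraction produces no basename edge, so the old one lands in the stale list.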

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…edges)

Codex outside-voice [P1]: makeResolver.resolveBasenameMatches called
engine.getAllSlugs() unscoped, so a bare [[name]] could resolve to a
same-tail page in a DIFFERENT source and create a cross-source edge. The
engine exposes getAllSlugs({sourceId}) precisely to prevent this. garrytan#972 is
"global basename across folders," not "cross-source federation" — the
canonical gbrain multi-source bug class.

- makeResolver gains opts.sourceId; ensureBasenameIndex passes it to
  getAllSlugs (unscoped only when sourceId omitted — back-compat).
- runAutoLink (put_page) passes opts.sourceId; extractLinksFromDB passes
  sourceIdFilter. FS extract is already single-source (walks one dir).
- Tests: scoped index returns only the source's slugs (no cross-source);
  unscoped call stays brain-wide.
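Scoping the basename index by source can be illustrated with an in-memory stand-in for the engine. `FakeEngine`, `PageRow`, and `makeBasenameResolver` are hypothetical; only the `getAllSlugs({ sourceId })` shape is taken from the commit text, and the real signature may differ.

```typescript
interface PageRow {
  slug: string;
  sourceId: string;
}

// Hypothetical in-memory stand-in for the engine.
class FakeEngine {
  private readonly pages: PageRow[];
  constructor(pages: PageRow[]) {
    this.pages = pages;
  }
  getAllSlugs(opts: { sourceId?: string } = {}): string[] {
    return this.pages
      .filter((p) => opts.sourceId === undefined || p.sourceId === opts.sourceId)
      .map((p) => p.slug);
  }
}

function makeBasenameResolver(engine: FakeEngine, opts: { sourceId?: string } = {}) {
  let index: Map<string, string[]> | undefined; // built lazily on first lookup
  return (name: string): string[] => {
    if (index === undefined) {
      const built = new Map<string, string[]>();
      // Scoped when sourceId is given; brain-wide only when it is omitted.
      for (const slug of engine.getAllSlugs({ sourceId: opts.sourceId })) {
        const tail = slug.slice(slug.lastIndexOf("/") + 1);
        built.set(tail, [...(built.get(tail) ?? []), slug]);
      }
      index = built;
    }
    return index.get(name) ?? [];
  };
}
```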

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nk-resolved'

The FS extract path is the issue's default repro (gbrain extract links with no
--source db). ExtractedLink had no link_source field, so FS basename edges
landed with the engine default ('markdown') instead of the 'wikilink-resolved'
provenance the DB / put_page paths set and the docs promise. The e2e FS test
only asserted link_type, so it was blind to this.

- ExtractedLink gains link_source?; extractLinksFromFile sets it to
  'wikilink-resolved' on basename edges (undefined for ordinary markdown).
- Carries through the addLinksBatch snapshots automatically (LinkBatchInput
  already has link_source); single-row addLink fallback now passes it too.
- e2e FS repro asserts link_source === 'wikilink-resolved'.
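A sketch of the provenance fix: basename edges carry an explicit `link_source`, and anything without one falls back to the engine default. The `ExtractedLink` shape here is simplified and hypothetical apart from the `link_type`/`link_source` values named in the PR; `basenameEdges` and `storedLinkSource` are illustrative helpers.

```typescript
// Simplified, hypothetical shape; gbrain's real ExtractedLink has more fields.
interface ExtractedLink {
  from: string;
  to: string;
  link_type: string;
  // Set only on basename edges; undefined means the engine default applies.
  link_source?: "wikilink-resolved";
}

const ENGINE_DEFAULT_LINK_SOURCE = "markdown";

function basenameEdges(from: string, matches: string[]): ExtractedLink[] {
  return matches.map((to): ExtractedLink => ({
    from,
    to,
    link_type: "wikilink_basename",
    link_source: "wikilink-resolved",
  }));
}

// The value the stored row ends up with; before the fix, FS edges always hit the default.
function storedLinkSource(link: ExtractedLink): string {
  return link.link_source ?? ENGINE_DEFAULT_LINK_SOURCE;
}
```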

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…S/doctor

Codex outside-voice [P2] DRY: three surfaces each hand-rolled a basename
matcher with divergent key sets — the doctor omitted the slugified key, so its
link_resolution_opportunity estimate undercounted what extraction resolves, and
the resolver returned matches in unsorted getAllSlugs bucket order.

New shared exports in link-extraction.ts: buildBasenameIndex(slugs) +
queryBasenameIndex(index, name) (keys raw/lower/slugified tail; stable sort
shorter-first then lexical) + normalizeBasename.

- makeResolver.resolveBasenameMatches → queryBasenameIndex (now stable-sorted).
- extract.ts resolveBasenameMatchesFromSlugs → delegates to the shared pair.
- doctor link_resolution_opportunity → shared builder/query (slugified key
  added; estimate now matches extraction).
- Test: doctor counts a slugified-only match ([[Fast Weigh]] → companies/fast-weigh).
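One way the shared matcher could look. The export names `buildBasenameIndex`, `queryBasenameIndex`, and `normalizeBasename` come from the commit, but these bodies are a hedged sketch: the slugify rule (ASCII letters and digits kept, every other run collapsed to `-`) is an assumption, and `keysFor` is a hypothetical helper.

```typescript
// Sketch only: names follow the PR, bodies are assumptions.
type BasenameIndex = Map<string, Set<string>>;

function normalizeBasename(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // assumed slugify rule
    .replace(/^-+|-+$/g, "");
}

// Raw, lowercased, and slugified forms of the same name.
function keysFor(name: string): string[] {
  return [name, name.toLowerCase(), normalizeBasename(name)].filter((k) => k.length > 0);
}

function buildBasenameIndex(slugs: Iterable<string>): BasenameIndex {
  const index: BasenameIndex = new Map();
  for (const slug of slugs) {
    const tail = slug.slice(slug.lastIndexOf("/") + 1);
    for (const key of keysFor(tail)) {
      let bucket = index.get(key);
      if (bucket === undefined) {
        bucket = new Set<string>();
        index.set(key, bucket);
      }
      bucket.add(slug);
    }
  }
  return index;
}

function queryBasenameIndex(index: BasenameIndex, name: string): string[] {
  const hits = new Set<string>();
  for (const key of keysFor(name)) {
    const bucket = index.get(key);
    if (bucket) bucket.forEach((slug) => hits.add(slug));
  }
  // Deterministic order: shorter slugs first, then lexical.
  return Array.from(hits).sort((a, b) => a.length - b.length || (a < b ? -1 : a > b ? 1 : 0));
}
```

With this rule, `[[Fast Weigh]]` reaches `companies/fast-weigh` only through the slugified key, the case the doctor estimate previously missed.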

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… decision

Codex outside-voice P2 findings:
- P2a markdown-label masking: a wikilink inside a markdown-link label
  ([see [[acme]]](companies/acme.md)) spawned a stray generic basename ref.
  Pass-1 can't match the nested brackets, so a new MARKDOWN_LABEL_WIKILINK_RE
  masks those spans out of pass 2c. Inner [[acme]] is now inert.
- P2b FS code-fence: the FS path (extractMarkdownLinks on raw content) didn't
  strip code blocks like the DB path. extractLinksFromFile now scans
  stripCodeBlocks(content) so [[name]] inside a fence creates no FS edge.
- P2c self-link guard: a basename [[own-tail]] on its own page resolved back
  to itself. Dropped in both extractPageLinks and the FS path.
- P2d dedup: documented the decision to KEEP qualified + bare edges to the
  same target as separate rows (distinct provenance/audit trail).
- P2e: skipFrontmatter unresolved-contract tests added.

Tests: P2a inert-label, P2c self-link drop, P2b code-fence, P2e unresolved.
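The three guards compose as a pre-pass over the text plus a filter over resolved matches. A hedged sketch: the regexes are illustrative stand-ins for gbrain's `stripCodeBlocks` and `MARKDOWN_LABEL_WIKILINK_RE`, only triple-backtick fences are handled, and `basenameTargets` with its `resolve` callback is hypothetical.

```typescript
// Illustrative stand-ins, not gbrain's stripCodeBlocks / MARKDOWN_LABEL_WIKILINK_RE.
const FENCED_BLOCK = /`{3}[\s\S]*?`{3}/g;
// A markdown link whose label contains a wikilink, e.g. [see [[acme]]](companies/acme.md)
const LABEL_WITH_WIKILINK = /\[[^\[\]]*\[\[[^\]]+\]\][^\[\]]*\]\([^)]*\)/g;
const BARE_WIKILINK = /\[\[([^\]|]+)(?:\|[^\]]+)?\]\]/g;

// Overwrite a span with spaces (keeping newlines) so offsets stay aligned.
function blankOut(span: string): string {
  return span.replace(/[^\n]/g, " ");
}

function basenameTargets(
  content: string,
  ownSlug: string,
  resolve: (name: string) => string[],
): string[] {
  const scannable = content
    .replace(FENCED_BLOCK, blankOut) // P2b: no edges from code fences
    .replace(LABEL_WITH_WIKILINK, blankOut); // P2a: inner [[x]] in a link label is inert
  const targets: string[] = [];
  for (const m of scannable.matchAll(BARE_WIKILINK)) {
    for (const slug of resolve(m[1].trim())) {
      if (slug !== ownSlug) targets.push(slug); // P2c: never link a page to itself
    }
  }
  return targets;
}
```

Because the self-link guard filters resolved slugs rather than names, `[[struktura]]` on `projects/struktura` still links to `archive/struktura`.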

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The check did listAllPageRefs() + a getPage() per page under a 60s budget.
On a large brain (the eng-review concern) it hit the budget on every non-fast
doctor run and returned a perpetual partial result, adding ~60s.

Now batch-loads the 1000 most-recent pages in ONE query
(ORDER BY id DESC LIMIT SAMPLE_LIMIT) and scans in memory, with the 60s cap
kept as a backstop. Mirrors the v0.40.9 sampling convention. The estimate
message names the bound when the brain exceeds the sample
("scanned the 1000 most-recent of N pages").
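The bounded scan pairs a fixed sample with the time budget as a backstop. In this sketch `sampleRecent` stands in for the `ORDER BY id DESC LIMIT SAMPLE_LIMIT` query, the clock is injectable for testing, all names other than `SAMPLE_LIMIT` are hypothetical, and the "scanned all N pages" wording for small brains is invented.

```typescript
const SAMPLE_LIMIT = 1000;
const TIME_BUDGET_MS = 60000; // backstop only

interface SampledPage {
  id: number;
  slug: string;
  body: string;
}

// In-memory stand-in for: ... ORDER BY id DESC LIMIT SAMPLE_LIMIT
function sampleRecent(rows: SampledPage[], limit: number = SAMPLE_LIMIT): SampledPage[] {
  return [...rows].sort((a, b) => b.id - a.id).slice(0, limit);
}

function scanWithBudget(
  sample: SampledPage[],
  scan: (p: SampledPage) => void,
  budgetMs: number = TIME_BUDGET_MS,
  now: () => number = Date.now,
): { scanned: number; partial: boolean } {
  const deadline = now() + budgetMs;
  let scanned = 0;
  for (const p of sample) {
    if (now() > deadline) return { scanned, partial: true };
    scan(p);
    scanned++;
  }
  return { scanned, partial: false };
}

function scopeNote(totalPages: number, limit: number = SAMPLE_LIMIT): string {
  return totalPages > limit
    ? `scanned the ${limit} most-recent of ${totalPages} pages`
    : `scanned all ${totalPages} pages`;
}
```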

Test: source-grep pins the bounded query + the absence of the per-page
getPage walk.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…112 / 0.42.6.0

Merge churn left intermediate refs: schema.sql + schema-embedded.ts said
"migration v93", CLAUDE.md said "v0.41.32.0 / Migration v109", CHANGELOG said
"Migration v93". Reconciled all to migration v112 / shipping 0.42.6.0. The
CLAUDE.md annotation is also refreshed to describe the final behavior (shared
matcher, source-scoping, alias-by-target, stale-edge reconciliation, bounded
doctor scan) and credit @rayers + @ukd1. Regenerated schema-embedded + llms.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second re-sync onto current master to ship garrytan#972 as v0.42.10.0.

- VERSION/package.json -> 0.42.10.0 (buffer over master's 0.42.8.0 for queue
  collisions, per request).
- CHANGELOG: our 0.42.10.0 entry on top; master's 0.42.8.0/0.42.7.0/0.42.6.0
  history preserved below.
- Migration collision: master claimed v112 (pages_links_extracted_at, garrytan#1696),
  so our links_link_source_widen renumbered v112 -> v113. Full-union CHECK
  preserved (markdown/frontmatter/manual/mentions/wikilink-resolved).
- extract.ts: unioned imports (master's LINK_EXTRACTOR_VERSION_TS/LinkCandidate
  + our buildBasenameIndex/queryBasenameIndex/stripCodeBlocks).
- doctor.test.ts: kept both garrytan#972 link_resolution describe and master's garrytan#1699
  quarantine/flagged describe.
- CLAUDE.md: de-glommed import-file line, kept master's garrytan#1699 entries + our
  garrytan#972 annotation (v0.42.10.0 / migration v113).
- schema.sql comment v112 -> v113; schema-embedded + llms regenerated.
- typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@garrytan garrytan changed the title v0.41.39.0 feat(extract): opt-in global-basename wikilink resolution (closes #972) v0.42.10.0 feat(extract): opt-in global-basename wikilink resolution (closes #972) Jun 2, 2026
…to 800KB

Two full-suite gate failures from the re-sync:
- doctor-categories drift guard: the new `link_resolution_opportunity` check
  wasn't in any category set. Added to BRAIN_CHECK_NAMES (alongside
  graph_coverage / orphan_ratio — it's a graph-quality signal).
- build-llms size budget: the garrytan#972 Key Files annotation (landing with master's
  garrytan#1696/garrytan#1699 waves) pushed llms-full.txt past 750KB. Bumped FULL_SIZE_BUDGET
  750KB→800KB, the established "budget tracks CLAUDE.md's legitimate per-feature
  growth" pattern (600→700→750→800 across releases).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit f09f917 into garrytan:master Jun 2, 2026
rayers added a commit to rayers/gbrain that referenced this pull request Jun 3, 2026
…inks) into local/integration

# Conflicts:
#	src/commands/extract.ts
#	src/core/link-extraction.ts
#	test/link-extraction.test.ts
mgunnin added a commit to mgunnin/gbrain that referenced this pull request Jun 3, 2026
* upstream/master:
  v0.42.23.0 feat(jobs): --nice scheduling-priority flag for jobs work/supervisor (garrytan#1815) (garrytan#1820)
  v0.42.22.0 fix(minions): supervisor progress watchdog + worker DB self-defense — alive-but-wedged worker self-heals (garrytan#1801) (garrytan#1824)
  v0.42.21.0 fix(postgres): module-singleton ownership — canonical landing for the dream-cycle "connect() has not been called" class (garrytan#1404/garrytan#1471/garrytan#1619) (garrytan#1805)
  v0.42.20.0 fix: reliability wave — PGLite capture lock-pin + Postgres reconnect race + search embed-hang (garrytan#1762 garrytan#1745 garrytan#1775) (garrytan#1810)
  v0.42.19.0 fix(skillopt): close the last gap in the AI SDK v6 tool-loop fix (write-capture mapper + regression test) (garrytan#1809)
  v0.42.18.0 fix: sync orphan-pileup watchdog (garrytan#1633) + links-lag µs stamp (garrytan#1768) (garrytan#1807)
  v0.42.17.0 fix(sync): resumable incremental sync — killed mid-import no longer loses progress (garrytan#1794) (garrytan#1808)
  v0.42.16.0 feat(doctor): brain health as a solved problem — cause-ranked doctor + OOM-loop line + auto-drain + pool-reap (garrytan#1685) (garrytan#1802)
  v0.42.15.0 fix: decouple CLI primary output from process.stdout.isTTY (garrytan#1784) (garrytan#1806)
  v0.42.14.0 fix(zero-config): code-* readiness signal + init embedding-key validation + lock self-heal (garrytan#1780) (garrytan#1804)
  v0.42.13.0 fix(search): archive/ content findable by default, demoted not hard-excluded (garrytan#1777) (garrytan#1797)
  v0.42.12.0 feat: self-upgrading gbrain — invocation-riding update check + opt-in auto-upgrade (garrytan#1798)
  v0.42.11.0 feat(skillopt): held-out eval gate, honest receipts, ENFORCE + ablation opts (garrytan#1759)
  v0.42.10.0 feat(extract): opt-in global-basename wikilink resolution (closes garrytan#972) (garrytan#1388)
@ukd1 ukd1 deleted the fix-972-global-basename-wikilinks branch June 7, 2026 15:56


Development

Successfully merging this pull request may close these issues.

Feature: opt-in global-basename wikilink resolution for Obsidian-convention vaults

3 participants