
fs wikilink extraction should resolve against synced slugified page slugs #874

@Cossackx


Summary

In dry-run mode, gbrain extract links --source fs --dir <vault> discovers RAZSOC/Obsidian wikilinks, but a real run inserts very few or no links when the synced page slugs differ from the raw markdown relative paths.

In this vault, synced pages use slugified lowercase path segments, while the extractor appears to resolve links against raw file paths. As a result, GBrain can hold thousands of pages and many Obsidian [[wikilinks]] yet still show a low link count and a high orphan count after native extraction.

Observed impact

On a large Obsidian vault migration:

  • Native extraction dry-run reported thousands of candidate wikilinks.
  • Actual DB link count stayed near baseline because most candidates did not match synced page slugs.
  • A local slugify-aware bridge resolved ~7.6k unique wikilink pairs and restored graph health.

Current post-bridge baseline:

  • Pages: 7,091
  • Links: 7,637
  • Raw wikilinks scanned: 16,480
  • Unique resolved page pairs: 7,615
  • Remaining unresolved: 76
  • Ambiguous: 0

Expected behavior

Native fs wikilink extraction should resolve targets using the same slugification/canonicalization that sync uses for page slugs, including:

  • lowercase path segments
  • spaces -> hyphens
  • stripped punctuation/diacritics
  • basename/title/alias fallback where appropriate
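
The rules above can be sketched roughly as follows. This is a minimal illustration, not GBrain's actual sync slugifier: the function names (slugify_segment, slugify_path, resolve_target) are hypothetical, and the exact punctuation and separator rules are assumptions that would need to match whatever sync really does. Title and alias fallback are left out; only the basename fallback is shown, and it refuses to guess when more than one page matches.

```python
import re
import unicodedata
from typing import Optional, Set


def slugify_segment(segment: str) -> str:
    """Slugify one path segment (assumed rules, not GBrain's real ones)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", segment)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = stripped.lower()
    # Remove punctuation, keeping word characters, whitespace, and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", lowered)
    # Collapse runs of whitespace or hyphens into a single hyphen.
    return re.sub(r"[\s-]+", "-", cleaned).strip("-")


def slugify_path(rel_path: str) -> str:
    """Slugify a vault-relative path or wikilink target, dropping a .md suffix."""
    if rel_path.lower().endswith(".md"):
        rel_path = rel_path[:-3]
    segments = [slugify_segment(s) for s in rel_path.replace("\\", "/").split("/")]
    return "/".join(s for s in segments if s)


def resolve_target(target: str, page_slugs: Set[str]) -> Optional[str]:
    """Return the matching page slug, or None if unresolved or ambiguous."""
    slug = slugify_path(target)
    if slug in page_slugs:
        return slug
    # Basename fallback: accept only a unique match on the last segment.
    base = slug.rsplit("/", 1)[-1]
    matches = [s for s in page_slugs if s.rsplit("/", 1)[-1] == base]
    return matches[0] if len(matches) == 1 else None
```

Returning None for ambiguous basenames keeps the extractor from inserting a wrong edge; those cases could be reported separately, as in the "Ambiguous" counter above.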

Repro shape

  1. Sync an Obsidian vault where filenames/paths include spaces, mixed case, punctuation, and aliases.
  2. Run gbrain extract links --source fs --dir <vault> --dry-run.
  3. Run native extraction without dry-run.
  4. Compare DB link count / orphan count with the number of detected wikilinks.
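
For anyone who wants a small vault to run these steps against, a throwaway fixture along these lines should exercise the relevant cases. The file names, note contents, and the build_fixture_vault helper are invented for illustration and are not taken from the affected vault.

```python
from pathlib import Path
from typing import List

# Hypothetical notes covering spaces, mixed case, punctuation, and an alias.
FIXTURE_NOTES = {
    "Projects/My Note (v2).md": "Links to [[Daily Log]] and [[Projects/Other Idea|the idea]].\n",
    "Projects/Other Idea.md": "Back to [[My Note (v2)]] and [[Log]].\n",
    "Journal/Daily Log.md": "---\naliases: [Log]\n---\nSee [[Other Idea]].\n",
}


def build_fixture_vault(root: Path) -> List[Path]:
    """Write the fixture notes under root and return the created file paths."""
    written = []
    for rel_path, body in FIXTURE_NOTES.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written
```

After syncing the generated directory, steps 2 to 4 can be run against it with --dir pointing at the fixture root.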

Local workaround

I added a RAZSOC-local bridge script that walks markdown, resolves links against GBrain pages.slug, and inserts wikilink edges idempotently. It is intentionally local, but the same resolution logic probably belongs in upstream extraction.
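
The shape of that bridge is roughly the following. This is a simplified sketch rather than the actual script: it uses an in-memory SQLite database, and the links table, its columns, and the helper names (make_demo_db, extract_targets, insert_links) are assumptions made for the example, not GBrain's real schema. Idempotency comes from a unique constraint plus INSERT OR IGNORE; on Postgres the equivalent would be ON CONFLICT DO NOTHING.

```python
import re
import sqlite3
from typing import Iterable, List

# Matches [[Target]], [[Target|Alias]], and [[Target#Heading]]; captures Target only.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_targets(markdown: str) -> List[str]:
    """Return the raw wikilink targets found in a markdown string."""
    return [m.group(1).strip() for m in WIKILINK_RE.finditer(markdown)]


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical links table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links ("
        " from_slug TEXT NOT NULL,"
        " to_slug TEXT NOT NULL,"
        " link_type TEXT NOT NULL,"
        " UNIQUE (from_slug, to_slug, link_type))"
    )
    return conn


def insert_links(conn: sqlite3.Connection, from_slug: str, to_slugs: Iterable[str]) -> int:
    """Insert wikilink edges, skipping existing ones; return rows actually added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO links (from_slug, to_slug, link_type)"
        " VALUES (?, ?, 'wikilink')",
        [(from_slug, to_slug) for to_slug in to_slugs],
    )
    return conn.total_changes - before
```

Running insert_links twice with the same arguments adds rows only the first time, which is what makes the bridge safe to re-run after each sync.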

Related PR: #871 fixes a separate large-migration failure where git diff --name-status can exceed Node's default exec buffer.
