fix(extract): normalize slugs to lowercase via pathToSlug() (T-OBS-1) #736

Closed

Freddy-Cach wants to merge 2 commits into garrytan:master from Freddy-Cach:claude/gbrain-extract-lowercase-slugs

Conversation

Freddy-Cach (Contributor) commented May 8, 2026

The extractor was generating from_slug and the allSlugs lookup set from
relPath.replace('.md', '') in 5 places, producing CAPS slugs for files
named ETHOS.md, AGENTS.md, ROADMAP.md, etc.

Pages persist in the DB with lowercase slug (core/sync.ts pathToSlug()
applies .toLowerCase()). The CAPS extractor output mismatched the DB rows,
so INSERT ... JOIN pages ON pages.slug = v.from_slug silently dropped
links from CAPS-named source files. The link batch returned 'inserted'
counts that were lower than the wikilinks actually present, with no error.

Reproduction (in a brain with CAPS-named canonical docs):

  1. echo 'See [agents](agents.md).' > ETHOS.md
  2. gbrain put ethos < ETHOS.md # page row: slug='ethos'
  3. gbrain extract links --source fs
  4. gbrain backlinks agents → [] (expected: contains 'ethos')

Fix: import pathToSlug from core/sync.ts and use it in all 5 sites:

  • extractLinksFromFile (line 200): from_slug derivation
  • runIncrementalExtractInternal (line 456): allSlugs set
  • extractLinksFromDir (line 552): allSlugs set
  • timeline loop (line 643): from_slug for timeline entries
  • extractLinksForSlugs (line 673): allSlugs set used by sync hook

This single-line-per-site change keeps the extractor consistent with the
sync layer's slug normalization and doesn't introduce any new behavior
for already-lowercase paths (idempotent).
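
A minimal sketch of the normalization (illustrative only — the real pathToSlug lives in core/sync.ts and may differ in detail beyond the .toLowerCase() call described above):

```typescript
// Sketch of pathToSlug as described in this PR: strip the .md extension,
// then lowercase. The exact extension handling in core/sync.ts may differ.
function pathToSlug(relPath: string): string {
  return relPath.replace(/\.md$/, "").toLowerCase();
}

// Before (buggy, at each of the 5 extractor sites):
//   const fromSlug = relPath.replace(".md", ""); // "ETHOS.md" -> "ETHOS"
// After:
//   const fromSlug = pathToSlug(relPath);        // "ETHOS.md" -> "ethos"
```

For already-lowercase paths the function is a no-op apart from the extension strip, which is why the change is idempotent.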

Tests: added 'extractLinksFromFile — slug normalization (T-OBS-1
regression)' suite with 4 cases covering CAPS, mixed-case, idempotent
lowercase, and nested path. Full extract suite (54 → 58 tests) passes.


Freddy-Cach and others added 2 commits May 7, 2026 16:14
… SDK and Voyage's actual contract

The @ai-sdk/openai-compatible package treats Voyage as if it were
OpenAI-shaped, but Voyage's /v1/embeddings endpoint diverges in three places
that combine into a hard-blocking incompatibility:

OUTBOUND request:
  - 'encoding_format=float' (SDK default) is rejected; Voyage only accepts 'base64'
  - 'dimensions' parameter (OpenAI name) is rejected; Voyage uses 'output_dimension'

INBOUND response:
  - With encoding_format=base64, 'embedding' is returned as a base64 string,
    but the SDK's Zod schema (openaiTextEmbeddingResponseSchema) expects an
    'array of number'. The schema fails with 'Invalid JSON response' even
    though the JSON is well-formed.
  - 'usage' lacks 'prompt_tokens'; the schema requires it when usage is present.

Without this patch, ALL embedding requests to Voyage fail. Reproducible by
running 'gbrain put <slug> < text' with embedding_model=voyage:voyage-* and
any current voyage model (voyage-3-large, voyage-3, voyage-4-large).

Solution: pass a custom 'fetch' to createOpenAICompatible only when
recipe.id === 'voyage'. The fetch wrapper:
  1. Forces encoding_format='base64' on outbound (Voyage's only accepted value)
  2. Translates dimensions -> output_dimension on outbound
  3. Drops Content-Length so the runtime recomputes from the mutated body
  4. Decodes base64 embeddings to Float32 arrays on inbound (so the Zod schema
     sees what it expects)
  5. Synthesizes prompt_tokens from total_tokens when missing

This is a minimal, targeted fix. It only activates for Voyage and falls
through cleanly for all other providers. No public API changes.
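
The translation steps above can be sketched as pure helpers (hypothetical names — the actual patch performs this inline in the fetch wrapper passed to createOpenAICompatible; assumes Node's Buffer):

```typescript
// Outbound: rewrite the JSON request body to Voyage's contract.
function toVoyageRequestBody(
  body: Record<string, unknown>
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...body };
  out.encoding_format = "base64"; // Voyage rejects the SDK default "float"
  if ("dimensions" in out) {
    out.output_dimension = out.dimensions; // OpenAI name -> Voyage name
    delete out.dimensions;
  }
  return out;
}

// Inbound: decode a base64-packed little-endian Float32 embedding so the
// SDK's Zod schema sees the array of numbers it expects.
function decodeBase64Embedding(b64: string): number[] {
  const bytes = Uint8Array.from(Buffer.from(b64, "base64")); // fresh, aligned copy
  return Array.from(new Float32Array(bytes.buffer));
}

// Inbound: synthesize prompt_tokens from total_tokens when Voyage omits it.
function fixUsage(usage: { total_tokens?: number; prompt_tokens?: number }) {
  return { ...usage, prompt_tokens: usage.prompt_tokens ?? usage.total_tokens ?? 0 };
}
```

Because the outbound body is mutated after serialization, the wrapper also drops the original Content-Length header so the runtime recomputes it from the rewritten body.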
fix(extract): normalize slugs to lowercase via pathToSlug() (T-OBS-1)
Reported by Claude Code (Opus 4.7) during Obsidian PKM integration on
the gstack-plan Living Repo, where ~111 wikilinks pointing to ETHOS,
AGENTS, ROADMAP, etc. failed to count toward brain_score (54/100 vs
expected 75+/100). Documented as T-OBS-1 in the consumer's blocked.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan (Owner) commented

Closing — your fix landed in master via the v0.30.3 fix-wave PR #776 (merged at ff53a4c9): "normalize slugs to lowercase via pathToSlug()".

Thank you for the contribution — credit is preserved in the v0.30.3 CHANGELOG entry. 🙏

2 participants