code-stage sync writes pages with malformed frontmatter (MISSING_OPEN); reindex-code reports 'No code pages to reindex' #712

@metaWuming

What happened?

After running code-stage sync against 8 separate repos, gbrain doctor reports 1237 frontmatter issues (1235 MISSING_OPEN, 2 NESTED_QUOTES) across all 8 code sources. As a downstream consequence, gbrain reindex-code --yes returns "No code pages to reindex" even though the sync output explicitly says pages were imported and embedded.

Inspection: pages imported from code-stage sync end up with type=note in the brain (presumably the default fallback when frontmatter parsing fails to find a type), instead of type=code — so reindex-code (which presumably filters on type=code) finds nothing to do.

What did you expect?

  1. Code-stage sync should write valid frontmatter with opening --- delimiter so the parser recognises type: code.
  2. Pages imported via --strategy code should be queryable via reindex-code.
  3. Doctor's frontmatter_integrity should not warn about content the tool itself wrote.

Steps to reproduce

  1. cd <git-repo-with-origin>
  2. gbrain sources add my-code --path . --federated && gbrain sync --strategy code --source my-code
  3. Sync reports e.g. 17 file(s) imported, 75 chunks, 17 pages embedded
  4. Run gbrain doctor --json | jq '.checks[] | select(.name=="frontmatter_integrity")' → see MISSING_OPEN count for that source
  5. Run gbrain reindex-code --yes → "No code pages to reindex"
  6. Run gbrain stats → page-type breakdown shows new pages classified as note, not code

Across my 8 code sources I get:

| Source | Issues (MISSING_OPEN unless noted) |
| --- | --- |
| gstack-code-tixtrack-pro-822357 | 366 |
| gstack-code-iva-ticketing-140d80 | 446 |
| gstack-code-tix-workspace-b6a666 | 82 |
| gstack-code-il-mcp-server-721c7a | 90 |
| gstack-code-garrytan-gbrain | 236 (MISSING_OPEN=234, NESTED_QUOTES=2) |
| gstack-code-ocial-content-dcd58d | 12 |
| gstack-code-nal-assistant-5c840f | 4 |
| gstack-code-dev-research-d88794 | 1 |
| Total | 1237 |

Likely root cause (guess from outside)

The code page writer probably emits something like:

type: code
language: typescript
---
<file body>

…missing the leading --- line that opens the YAML frontmatter block. The parser then can't find an opening delimiter, falls back to treating the whole file as body, and the type: code line ends up as content rather than metadata. The doctor output suggests gbrain frontmatter validate <source-path> --fix as a workaround, but I haven't tried it yet.
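To make the suspected failure mode concrete, here is a minimal sketch of a frontmatter parser that behaves the way I'm guessing gbrain's does. Everything in it (parsePage, ParsedPage, DEFAULT_TYPE, the flat key: value reader) is hypothetical and not taken from gbrain's source; it only illustrates how a missing opening delimiter would turn a code page into a note.

```typescript
// Hypothetical sketch: none of these names come from gbrain's source.
interface ParsedPage {
  type: string;
  body: string;
  missingOpen: boolean;
}

// Assumed fallback type, matching what `gbrain stats` shows for these pages.
const DEFAULT_TYPE = "note";

function parsePage(raw: string): ParsedPage {
  const lines = raw.split("\n");

  // Frontmatter is only recognised when the very first line is the opener.
  if ((lines[0] ?? "").trim() !== "---") {
    // Assumption: a stray closing delimiter is what triggers MISSING_OPEN.
    const hasStrayClose = lines.some((line) => line.trim() === "---");
    return { type: DEFAULT_TYPE, body: raw, missingOpen: hasStrayClose };
  }

  // Find the closing delimiter after the opener.
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) {
    return { type: DEFAULT_TYPE, body: raw, missingOpen: false };
  }

  // Read flat `key: value` pairs only; real YAML parsing is out of scope.
  const meta = new Map<string, string>();
  for (const line of lines.slice(1, close)) {
    const sep = line.indexOf(":");
    if (sep > 0) {
      meta.set(line.slice(0, sep).trim(), line.slice(sep + 1).trim());
    }
  }

  return {
    type: meta.get("type") ?? DEFAULT_TYPE,
    body: lines.slice(close + 1).join("\n"),
    missingOpen: false,
  };
}

// The shape I suspect the writer emits, and the same page with the opener.
const broken = "type: code\nlanguage: typescript\n---\nexport const x = 1;\n";
const fixed = "---\n" + broken;

console.log(parsePage(broken).type, parsePage(broken).missingOpen);
console.log(parsePage(fixed).type, parsePage(fixed).missingOpen);
```

Under these assumptions the broken page parses as a note with the metadata lines left in the body, while the fixed page parses as code. That would explain why a type=code filter finds nothing.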

Environment

  • gbrain version: 0.28.6
  • OS: macOS Darwin 25.3.0
  • Bun version: 1.3.12
  • Database: PGLite

gbrain doctor --json excerpt

{
  "name": "frontmatter_integrity",
  "status": "warn",
  "message": "1237 frontmatter issue(s) across 8 source(s). gstack-code-dev-research-d88794: 1 (MISSING_OPEN=1); gstack-code-garrytan-gbrain: 236 (MISSING_OPEN=234, NESTED_QUOTES=2); gstack-code-il-mcp-server-721c7a: 90 (MISSING_OPEN=90); gstack-code-iva-ticketing-140d80: 446 (MISSING_OPEN=446); gstack-code-nal-assistant-5c840f: 4 (MISSING_OPEN=4); gstack-code-ocial-content-dcd58d: 12 (MISSING_OPEN=12); gstack-code-tix-workspace-b6a666: 82 (MISSING_OPEN=82); gstack-code-tixtrack-pro-822357: 366 (MISSING_OPEN=366). Fix: gbrain frontmatter validate <source-path> --fix"
}
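The per-source numbers in that message can be tallied mechanically to check the totals. The sketch below is a standalone helper, not part of gbrain; the function names are made up and the regexes assume the message format seen in this single excerpt.

```typescript
// Hypothetical helpers; the message format is inferred from one excerpt only.

// Sums pairs such as MISSING_OPEN=234 across the whole message.
function tallyIssueCodes(message: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const match of message.matchAll(/\b([A-Z_]+)=(\d+)/g)) {
    const code = match[1];
    const count = Number(match[2]);
    totals.set(code, (totals.get(code) ?? 0) + count);
  }
  return totals;
}

// Counts entries of the form "<source-id>: <n> (" to get the source count.
function countSources(message: string): number {
  return [...message.matchAll(/[\w-]+: \d+ \(/g)].length;
}

// The message string from the doctor excerpt, split for readability.
const doctorMessage =
  "1237 frontmatter issue(s) across 8 source(s). " +
  "gstack-code-dev-research-d88794: 1 (MISSING_OPEN=1); " +
  "gstack-code-garrytan-gbrain: 236 (MISSING_OPEN=234, NESTED_QUOTES=2); " +
  "gstack-code-il-mcp-server-721c7a: 90 (MISSING_OPEN=90); " +
  "gstack-code-iva-ticketing-140d80: 446 (MISSING_OPEN=446); " +
  "gstack-code-nal-assistant-5c840f: 4 (MISSING_OPEN=4); " +
  "gstack-code-ocial-content-dcd58d: 12 (MISSING_OPEN=12); " +
  "gstack-code-tix-workspace-b6a666: 82 (MISSING_OPEN=82); " +
  "gstack-code-tixtrack-pro-822357: 366 (MISSING_OPEN=366). " +
  "Fix: gbrain frontmatter validate <source-path> --fix";

const totals = tallyIssueCodes(doctorMessage);
console.log(Object.fromEntries(totals), countSources(doctorMessage));
```

On this message the tally comes to 1235 MISSING_OPEN plus 2 NESTED_QUOTES across 8 sources, which adds up to the reported 1237.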

Possible fix path

  • Add a ---\n prefix to whatever code-stage sync's page-write helper emits before the YAML block
  • After fix, run gbrain frontmatter validate <each-source> --fix on existing brains to retro-fix
  • After retro-fix, reindex-code should find these pages and presumably move them to type=code
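A rough sketch of what the first two bullets could look like in code. Both functions are hypothetical: renderCodePage stands in for whatever helper code-stage sync actually uses, and repairMissingOpen is my guess at a retro-fix, not a description of what gbrain frontmatter validate --fix does.

```typescript
// Hypothetical writer: serialises flat metadata with BOTH delimiters.
function renderCodePage(meta: Record<string, string>, body: string): string {
  const yaml = Object.entries(meta)
    .map(([key, value]) => `${key}: ${value}`)
    .join("\n");
  return `---\n${yaml}\n---\n${body}`;
}

// Hypothetical retro-fix: prepend the opener only when the page starts with
// `key: value` lines that run up to a closing `---`.
function repairMissingOpen(raw: string): string {
  if (raw.startsWith("---\n")) return raw; // already has an opener

  const lines = raw.split("\n");
  const close = lines.findIndex((line) => line.trim() === "---");
  if (close <= 0) return raw; // no closing delimiter to anchor on

  // Every line before the closer must look like a flat YAML key (or be blank).
  const looksLikeYaml = lines
    .slice(0, close)
    .every((line) => line.trim() === "" || /^[A-Za-z_][\w-]*:(\s|$)/.test(line));

  return looksLikeYaml ? `---\n${raw}` : raw;
}

const page = renderCodePage(
  { type: "code", language: "typescript" },
  "export const x = 1;\n",
);
const legacy = "type: code\nlanguage: typescript\n---\nexport const x = 1;\n";

console.log(page === repairMissingOpen(legacy));
```

The repair is deliberately conservative: it leaves pages alone when the lines before the first --- don't look like metadata, so an ordinary document that uses --- as a horizontal rule isn't rewritten. Running it twice gives the same result as running it once.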
