fix(frontmatter): defense-in-depth against JSON-style arrays in YAML#1238

Closed
garrytan-agents wants to merge 1 commit into garrytan:master from garrytan-agents:fix/frontmatter-json-array-guard

@garrytan-agents

Contributor

Problem

The #1 source of NESTED_QUOTES frontmatter errors is JSON-style arrays in YAML:

# This is what LLMs and ingestion code produce:
tags: ["yc", "w2025"]

# This is what YAML needs:
tags: ['yc', 'w2025']

`JSON.stringify()` wraps values in double quotes. When code does `tags: [${items.map(t => JSON.stringify(t)).join(', ')}]`, it produces broken YAML. This caused 6,981 validation errors across a 105K-page brain, the single largest contributor to the frontmatter integrity health score.
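To make the trap concrete, here is a minimal sketch of the two serialization paths. `yamlSingleQuote` is a hypothetical helper, not a function from this repository; it doubles embedded apostrophes, which is the standard YAML escape inside single-quoted scalars (the auto-fix described below falls back to double quotes instead).

```typescript
// Illustrative only: helper names are assumptions, not repository code.
const items: string[] = ["yc", "w2025"];

// The trap: JSON.stringify wraps each item in double quotes.
const jsonStyle = `tags: [${items.map((t) => JSON.stringify(t)).join(", ")}]`;

// One alternative: emit single-quoted YAML scalars, doubling any embedded
// apostrophe (the YAML escape for ' inside a single-quoted scalar).
function yamlSingleQuote(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

const singleQuoted = `tags: [${items.map(yamlSingleQuote).join(", ")}]`;

console.log(jsonStyle);    // tags: ["yc", "w2025"]
console.log(singleQuoted); // tags: ['yc', 'w2025']
```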

Fix: Four Layers

1. Auto-fix on frontmatter validate --fix (brain-writer.ts)

New step 3a in autoFixFrontmatter() detects JSON-style arrays and rewrites them to single-quoted YAML. Handles apostrophes by falling back to double quotes.
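A rough sketch of what such a rewrite could look like for a single frontmatter line. The regex and the function names `requote` and `rewriteJsonArrayLine` are assumptions for illustration; the actual step 3a in `autoFixFrontmatter()` may be structured differently.

```typescript
// Hypothetical sketch; not the code in brain-writer.ts.
// Matches a line such as `tags: ["yc", "w2025"]` whose flow array holds
// only double-quoted items with no backslash escapes.
const JSON_ARRAY_LINE =
  /^(\s*[A-Za-z_][\w-]*:\s*)\[\s*("[^"\\]*"(?:\s*,\s*"[^"\\]*")*)\s*\]\s*$/;

function requote(item: string): string {
  const inner = item.slice(1, -1);
  // Apostrophe fallback: keep double quotes when the value contains '.
  return inner.includes("'") ? `"${inner}"` : `'${inner}'`;
}

function rewriteJsonArrayLine(line: string): string {
  const match = JSON_ARRAY_LINE.exec(line);
  if (!match) return line; // not a JSON-style array line: leave untouched
  const prefix = match[1] ?? "";
  const body = match[2] ?? "";
  const items = body.match(/"[^"\\]*"/g) ?? [];
  return `${prefix}[${items.map(requote).join(", ")}]`;
}

console.log(rewriteJsonArrayLine('tags: ["yc", "w2025"]')); // tags: ['yc', 'w2025']
```

Lines that do not match the pattern, such as a plain scalar or an unquoted array, pass through unchanged.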

2. Better detection in validator (markdown.ts)

NESTED_QUOTES detection now has two sub-patterns:

  • 5a: JSON-style arrays (clearer message: "use single quotes")
  • 5b: Original nested scalar quotes

3. Auto-normalize on put_page (operations.ts)

Every put_page call now runs autoFixFrontmatter() on incoming content before import. Non-blocking — if normalization throws, original content passes through. Agent-written pages with JSON arrays are silently fixed on write.
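The non-blocking behavior proposed here amounts to a try/catch that falls back to the input. A minimal sketch, where `normalizeOrPassThrough` is a hypothetical wrapper and the `normalize` parameter stands in for `autoFixFrontmatter()` (whose real signature is not shown in this PR):

```typescript
// Hypothetical wrapper; `normalize` stands in for autoFixFrontmatter().
function normalizeOrPassThrough(
  content: string,
  normalize: (input: string) => string,
): string {
  try {
    return normalize(content);
  } catch {
    // Non-blocking: a failed normalization must never reject the write.
    return content;
  }
}

// Usage: a throwing normalizer leaves the content as it arrived.
const untouched = normalizeOrPassThrough('tags: ["yc"]', () => {
  throw new Error("normalizer failed");
});
console.log(untouched); // tags: ["yc"]
```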

4. Agent guidance (frontmatter-guard SKILL.md)

New "Prevention" section with correct/incorrect YAML examples, explaining WHY JSON.stringify causes the bug and what to do instead.

Companion PRs

Impact

  • Prevents ~7K validation errors per brain
  • Silently heals agent-written pages on ingest
  • Teaches agents to avoid the pattern in the first place

Four layers to stop NESTED_QUOTES from recurring:

1. **autoFixFrontmatter (brain-writer.ts):** New step 3a detects and
   rewrites JSON-style arrays (`["x", "y"]` → `['x', 'y']`) before
   the existing nested-quote scalar fix. Handles apostrophes in values
   by falling back to double quotes. Runs on `frontmatter validate --fix`
   and `writeBrainPage({autoFix: true})`.

2. **Validator (markdown.ts):** NESTED_QUOTES detection now has two
   sub-patterns — 5a catches JSON-style arrays specifically (with a
   clearer error message: "use single quotes") and 5b catches the
   original nested scalar quotes.

3. **put_page normalization (operations.ts):** Every `put_page` call now
   runs `autoFixFrontmatter()` on incoming content before import.
   Non-blocking — if normalization throws, original content is used.
   This means agent-written pages with JSON arrays are silently fixed
   on write instead of accumulating thousands of validation errors.

4. **Agent guidance (frontmatter-guard SKILL.md):** New "Prevention"
   section with correct/incorrect YAML examples, explaining WHY
   JSON.stringify causes the bug and what to do instead. Agents that
   read this skill before writing frontmatter will avoid the pattern.

Root cause: LLMs and ingestion code use JSON.stringify for YAML array
items, producing `tags: ["yc", "w2025"]` which breaks YAML parsing.
This caused 6,981 errors across a 105K-page brain.

Companion to PR garrytan#1217 (serializer fix in frontmatter-inference.ts).
@garrytan

Owner

Superseded by #1252 (v0.37.6.0 wave).

This PR's four-layer defense-in-depth was reviewed against the already-merged PR #1229 (validator fix, shipped as v0.37.5.0) and absorbed into the wave with two changes:

Kept (with refinements):

  • ✅ Layer 1 (auto-fix engine): narrowed allow-list to tags: / aliases: keys only. The original broad regex ([A-Za-z_][\w-]*) would have rewritten typed-numeric arrays like scores: ["1", "2"] into string arrays — caught by codex outside-voice review.
  • ✅ Layer 1: shared nestedQuotesFixed dedup gate with existing step 3 so a file with both JSON-array AND nested-scalar rewrites surfaces as ONE NESTED_QUOTES audit entry, not two.
  • ✅ Layer 4 (SKILL.md Prevention section): absorbed verbatim with v0.37.5.0-aware framing.
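For readers following along, the two Layer 1 refinements can be sketched roughly as below. This is an illustration under assumptions, not the shipped code: `fixAllowListedArrays`, `ALLOWED_KEYS`, and `ARRAY_LINE` are hypothetical names, and only the `nestedQuotesFixed` flag and the `NESTED_QUOTES` audit code come from the discussion above. In the described design the same flag is also set by the existing step 3, which this sketch omits.

```typescript
// Hypothetical sketch of the narrowed rule; not the code in brain-writer.ts.
const ALLOWED_KEYS = new Set(["tags", "aliases"]);
const ARRAY_LINE =
  /^(\s*)([A-Za-z_][\w-]*)(:\s*)\[\s*("[^"\\]*"(?:\s*,\s*"[^"\\]*")*)\s*\]\s*$/;

function fixAllowListedArrays(frontmatter: string): { content: string; audit: string[] } {
  let nestedQuotesFixed = false; // shared gate: at most one audit entry per file
  const out: string[] = [];
  for (const line of frontmatter.split("\n")) {
    const m = ARRAY_LINE.exec(line);
    const key = m?.[2] ?? "";
    if (!m || !ALLOWED_KEYS.has(key)) {
      out.push(line); // e.g. scores: ["1", "2"] stays exactly as written
      continue;
    }
    const items = ((m[4] ?? "").match(/"[^"\\]*"/g) ?? []).map((item) => {
      const inner = item.slice(1, -1);
      return inner.includes("'") ? item : `'${inner}'`; // apostrophe fallback
    });
    const rewritten = `${m[1] ?? ""}${key}${m[3] ?? ""}[${items.join(", ")}]`;
    if (rewritten !== line) nestedQuotesFixed = true;
    out.push(rewritten);
  }
  return { content: out.join("\n"), audit: nestedQuotesFixed ? ["NESTED_QUOTES"] : [] };
}

const demo = fixAllowListedArrays('tags: ["yc"]\naliases: ["a", "b"]\nscores: ["1", "2"]');
console.log(demo.audit.length); // 1
```

Two rewritten lines still produce a single audit entry, and the `scores:` line is left alone because its key is not on the allow-list.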

Dropped:

Thank you @garrytan-agents — Layer 1 + Layer 4 + the original framing made it into v0.37.6.0 via #1252. Attribution preserved via Co-Authored-By: trailer on the wave commit.

@garrytan garrytan closed this May 21, 2026
garrytan added a commit that referenced this pull request May 21, 2026
Aligns the auto-fix engine, the inferred-frontmatter serializer, and the
agent-facing skill on a single canonical YAML shape for tag arrays. v0.37.5.0
fixed the validator (it stopped flagging valid YAML); this release lines up
everything else with that fix.

Layer 1 (brain-writer.ts step 3a): allow-listed to `tags:` / `aliases:` keys.
Rewrites `tags: ["yc"]` to `tags: ['yc']`; apostrophe fallback for
`"Men's Fashion"`. Shares a NESTED_QUOTES dedup gate with the existing
step 3 so one file with both rewrites surfaces as one audit entry, not two.

Layer 4 (frontmatter-inference.ts): serializer emits the same canonical
single-quote form by default. Inferred frontmatter on import and `--fix`
output now match byte-for-byte.

Layer 5 (frontmatter-guard SKILL.md): new "Prevention" section showing
canonical vs JSON-style arrays + the JSON.stringify trap that produces
the non-canonical form. Future agent writes start canonical.

Parity test added to markdown-validation.test.ts pinning agreement between
per-value safeLoad parsing and gray-matter full-document parse on the
load-bearing inputs.

PR #1238's "Layer 3" (put_page auto-normalization) was dropped during
plan review: put_page parses YAML into typed fields and hashes them, so
single-quoted vs double-quoted arrays are functionally identical in
storage. The fix lives where the writes happen, not on the read path.

Source PRs absorbed: #1217 (closed, serializer fix) + #1238 (closed,
four-layer defense-in-depth narrowed to three layers). PR #1229 already
merged as v0.37.5.0.

Co-Authored-By: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request May 21, 2026
…ays (#1252)
