v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes #1533) by garrytan · Pull Request #1543 · garrytan/gbrain

garrytan · 2026-05-27T06:38:05Z

Summary

Your reformatted Circleback meeting transcripts now actually parse. Before the fix, the parser reported no_match on every reformatted meeting page, even when 78% of lines were valid chat. After the fix, 113 of 367 Circleback pages parse and 20,167 messages flow through to the fact extractor (previously 0).

Two stacked problems:

Detector miss. scorePattern() only looked at the first 10 non-blank lines. Meeting pages start with ## Summary + blockquote + ## Transcript (~10 lines of prose) before the actual chat, so every pattern scored 0 in the head window and parseConversation() short-circuited to no_match without ever seeing the transcript.
Missing pattern. None of the 12 built-in patterns recognized the **Speaker** (HH:MM): or **Speaker** (HH:MM:SS): shape that wintermute's meeting-ingestion pipeline produces from Circleback exports.
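A minimal TypeScript sketch of the detector miss, assuming a scorer that counts matches over the first 10 non-blank lines. `HEAD_LINES = 10` and the head-only behaviour come from the PR text; the helper names `getNonBlankLines` and `scoreFromLines` are borrowed from the commit message, but their bodies, the `scorePatternHead` name, the chat-line regex, and the sample page are illustrative assumptions, not gbrain's code.

```typescript
// Hypothetical sketch of the pre-fix head-window scorer (not gbrain's source).
const HEAD_LINES = 10;

// Split a page body into trimmed, non-blank lines.
function getNonBlankLines(body: string): string[] {
  return body
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Fraction of `lines` that match `re` (0 for empty input). `re` must not use
// the /g flag, otherwise .test() would carry lastIndex state between lines.
function scoreFromLines(lines: string[], re: RegExp): number {
  if (lines.length === 0) return 0;
  const hits = lines.filter((line) => re.test(line)).length;
  return hits / lines.length;
}

// Pre-fix behaviour: only the first HEAD_LINES non-blank lines are scored.
function scorePatternHead(body: string, re: RegExp): number {
  return scoreFromLines(getNonBlankLines(body).slice(0, HEAD_LINES), re);
}

// Illustrative regex for the **Speaker** (HH:MM[:SS]): shape.
const chatLine = /^\*\*[^*]+\*\* \(\d{1,2}:\d{2}(?::\d{2})?\):/;

// 10 preamble lines (heading, 8 blockquote lines, heading), then 20 chat lines.
const preamble = [
  "## Summary",
  ...Array.from({ length: 8 }, (_, i) => `> Summary point ${i + 1}`),
  "## Transcript",
];
const chat = Array.from(
  { length: 20 },
  (_, i) => `**Participant ${(i % 2) + 1}** (00:${String(i).padStart(2, "0")}): line ${i}`,
);
const page = [...preamble, ...chat].join("\n");

const headScore = scorePatternHead(page, chatLine); // head window is all preamble
const fullScore = scoreFromLines(getNonBlankLines(page), chatLine); // 20 of 30 lines
console.log({ headScore, fullScore });
```

The head pass sees only preamble, so the page scores 0 even though two thirds of its non-blank lines are chat.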

Fix shape (in 3 commits):

Threshold-gated fallback (commit 8d7a18a, closes the #1533 bug class): SCORING_HEAD_TRIGGER_THRESHOLD = 0.3 triggers a full-body re-score whenever the head pass scores below 0.3 (NOT only when it is === 0; the Codex consult caught the stray-head-match bug class, where a blockquote that accidentally matches an unrelated pattern at 0.1 would suppress the fallback entirely). SCORING_MIN_ACCEPTANCE = 0.05 is a final-acceptance floor that blocks essay false positives.
New bold-paren-time built-in pattern (commit 80208f2, closes the user-facing half of #1533): the regex captures both the (HH:MM) and (HH:MM:SS) sub-shapes via a non-capturing optional seconds group. The date comes from frontmatter and the time is treated as wall-clock 24h. Circleback timestamps are actually elapsed time from meeting start; that limitation is documented as v0.42+ scope.
Version + docs (commits 9b766a1 + 1710020).
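The two gates can be sketched as a pure selection function. The constant names and values are from the PR; the `ScoredPattern` type, `selectPattern`, and the sample scores (0.01, 0.77) are hypothetical illustrations of the described behaviour, not the parser's real API.

```typescript
// Constants as named in the PR; everything else is a hypothetical illustration.
const SCORING_HEAD_TRIGGER_THRESHOLD = 0.3;
const SCORING_MIN_ACCEPTANCE = 0.05;

interface ScoredPattern {
  name: string;
  score: number; // fraction of scored lines that matched, in [0, 1]
}

type Selection =
  | { phase: "regex_match"; pattern: string; score: number; via: "head" | "full" }
  | { phase: "no_match" };

// Highest score wins; on a tie the earlier-declared pattern is kept.
function best(scores: ScoredPattern[]): ScoredPattern | undefined {
  let top: ScoredPattern | undefined;
  for (const s of scores) {
    if (top === undefined || s.score > top.score) top = s;
  }
  return top;
}

function selectPattern(
  headScores: ScoredPattern[],
  fullScores: () => ScoredPattern[], // lazy: only evaluated when the fallback fires
): Selection {
  const headTop = best(headScores);
  // Fast path: a head score at or above the trigger is accepted as-is.
  if (headTop !== undefined && headTop.score >= SCORING_HEAD_TRIGGER_THRESHOLD) {
    return { phase: "regex_match", pattern: headTop.name, score: headTop.score, via: "head" };
  }
  // Fallback fires on any head score below 0.3, not only on === 0.
  const fullTop = best(fullScores());
  // Acceptance floor: one stray matching line in a long essay is not a conversation.
  if (fullTop === undefined || fullTop.score < SCORING_MIN_ACCEPTANCE) {
    return { phase: "no_match" };
  }
  return { phase: "regex_match", pattern: fullTop.name, score: fullTop.score, via: "full" };
}

// Chat-only page: matches from line 1, so the full-body pass never runs.
const chatOnly = selectPattern([{ name: "imessage-slack", score: 1 }], () => {
  throw new Error("fallback should not run");
});

// Stray head match (illustrative scores): irc-classic at 0.1 does not block the fallback.
const stray = selectPattern(
  [{ name: "irc-classic", score: 0.1 }, { name: "imessage-slack", score: 0 }],
  () => [{ name: "irc-classic", score: 0.01 }, { name: "imessage-slack", score: 0.77 }],
);

// Essay: 1 matching line out of 301 scores ~0.0033, under the 0.05 floor.
const essay = selectPattern([{ name: "imessage-slack", score: 0 }], () => [
  { name: "imessage-slack", score: 1 / 301 },
]);

console.log(chatOnly, stray, essay);
```

Making the full-body scores a thunk keeps the fast path free of any extra work, which mirrors the PR's claim that chat-only pages skip the fallback entirely.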

Test Coverage

11 new unit cases in test/conversation-parser/parse.test.ts (92 tests pass parser-wide):

[+] src/core/conversation-parser/parse.ts
  ├── scorePattern() — head=10 fast path
  │   ├── [★★★ TESTED] empty/blank/mismatch returns 0
  │   ├── [★★★ TESTED] 10-match + 1-nonmatch scores 1.0 (line 11 ignored)
  │   └── [★★★ TESTED] 9-nonmatch + 1-match at line 10 scores 0.1 (pins cap behavior)
  ├── scorePatternFull() — full-body NEW
  │   └── [★★★ TESTED] empty / preamble+match / preamble-only-no-match
  └── parseConversation() fallback + acceptance gate
      ├── [★★★ TESTED] #1533 IRON-RULE regression (meeting page → regex_match, 20 messages)
      ├── [★★★ TESTED] honest unmatched_line_count after fallback
      ├── [★★★ TESTED] pure-prose 50-line essay stays no_match
      ├── [★★★ TESTED] 300-line preamble + 50 chat lines hits fallback
      ├── [★★★ TESTED] Codex P1 #1 stray-head-match (irc-classic 0.1 doesn't win; imessage-slack wins on full body)
      └── [★★★ TESTED] Codex P1 #2 essay false-positive guard (1/301 score below floor → no_match)

[+] src/core/conversation-parser/builtins.ts
  └── bold-paren-time pattern (Circleback)
      ├── [★★★ TESTED] matches (HH:MM) shape
      ├── [★★★ TESTED] matches (HH:MM:SS) shape
      ├── [★★★ TESTED] imessage-slack still wins on full-datetime overlap
      └── [★★★ TESTED] meeting page with preamble + bold-paren-time hits fallback

COVERAGE: 13/13 paths tested (100%) | All new branches covered
QUALITY: ★★★:13

Tests: 88 → 99 (+11 new). All pass. Typecheck clean.

Pre-Landing Review

Ran /plan-eng-review on the design (CLEAR, 0 issues, 2 user decisions captured) + /codex consult on the plan (10 findings, 7 absorbed, 2 dismissed with rationale, 1 partial). Codex's P1 #1 (trigger-threshold bug class) became D-CODEX-1.B and shaped the threshold-gated fallback. Diff is small (438 ins / 26 del); no anti-patterns (no console.log, no any, no TODO/FIXME).

Plan Completion

All P1/P2 plan items DONE across 3 commits:

T1+T2 (DRY refactor + threshold gates + acceptance floor + scorePatternFull) → 8d7a18a
T3 (7 unit cases for #1533 + Codex P1 #1 + P1 #2) → 8d7a18a
T5 (cap test reshape per Codex P2 #6) → 8d7a18a
D-FOLLOWUP-1.B (bold-paren-time pattern) → 80208f2
T6 (manual smoke against 3 real Circleback files) → 528 messages verified
T7 (close #1533 with comments) → done pre-PR
T8 (branch rename for Conductor IRON RULE) → done (garrytan/fix-conversation-parser-line-count → garrytan/worcester-v1)
D-FOLLOWUP-2.B (honest dream-cycle claim + drop T4) → in this PR's CHANGELOG + Context

Plan + decision trail: ~/.claude/plans/system-instruction-you-are-working-starry-frost.md

Empirical Smoke Test

Ran the fixed parser against ALL 367 Circleback meetings at ~/git/brain/meetings/*.md:

| | Files | Messages flowing through |
|---|---|---|
| Pre-fix parsed | 0 | 0 |
| Post-fix parsed | 113 (112 via bold-paren-time + 1 via telegram-bracket) | 20,167 |
| Remain no_match | 254 | none (notes-only meetings without inline transcripts) |

3 representative files:
- 2026-03-19-yc-partner-strategy-ai-leverage-review.md → 225 messages
- 2026-01-15-ro-khanna-c4.md → 294 messages
- 2026-04-01-narrative-arc-equity-regrets.md → 9 messages (HH:MM:SS variant)
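The reported counts reconcile with each other; as a quick worked check (all figures taken from the table and smoke-test list above, with 528 matching the T6 smoke-test total):

```latex
\begin{aligned}
113 &= 112\ (\text{bold-paren-time}) + 1\ (\text{telegram-bracket}) \\
367 &= 113\ (\text{parsed}) + 254\ (\text{no\_match}) \\
528 &= 225 + 294 + 9 \quad (\text{3 smoke-test files})
\end{aligned}
```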

Pre-existing test flakes (NOT introduced by this branch)

Full-suite local run had 4 pre-existing serial-pass failures, all of which pass in isolation; root cause confirmed as cross-test cycle-lock state pollution plus parallel-shard OOM (rc=137) on this Mac. Files: test/schema-cli.test.ts, test/cycle-last-full-cycle-at.test.ts, test/dream.test.ts. Zero relation to parser code. Parser tests (92/92) clean.

TODOS

No TODOS.md items completed by this PR.

Documentation

CLAUDE.md src/core/conversation-parser/ entry updated to v0.41.21.0 (13 patterns, threshold gates, empirical numbers).
llms-full.txt regenerated.

Test plan

bun test test/conversation-parser/ — 88 pass
bun test test/conversation-parser/parse.test.ts — 53 pass
bun test test/extract-conversation-facts.test.ts — 27 pass
bun run typecheck — clean
Manual smoke against 3 real Circleback meeting files via the fixed parser (528 messages parsed)
All 367 Circleback meetings re-scanned: 113 parse, 20,167 messages flowing

Closes #1533.

🤖 Generated with Claude Code

…closes #1533)

`gbrain conversation-parser scan` reported `phase: no_match` on meeting pages where 175 of 226 lines (77.8%) were valid `imessage-slack` format. The 36 reformatted Circleback meetings could not flow through the conversation facts pipeline.

Root cause: `scorePattern` only scans the first 10 non-blank lines. A meeting page's `## Summary` + blockquote + `## Transcript` preamble takes all 10 head slots, so every pattern scored 0 and the orchestrator short-circuited to `no_match` without ever seeing the transcript.

Fix: two-tier scoring with threshold gates.

1. Fast path unchanged: chat-only pages match on line 1, score 1.0, and skip the fallback entirely.
2. Full-body fallback fires when `top.score < SCORING_HEAD_TRIGGER_THRESHOLD` (0.3), NOT `=== 0`. Codex P1 #1 caught the bug class where a stray head match (a blockquote that accidentally matches an unrelated pattern at 0.1) would suppress the fallback. 0.3 leaves the fast path untouched while triggering on any preamble-dominated page.
3. Minimum acceptance floor `SCORING_MIN_ACCEPTANCE` (0.05) prevents essay false positives: a 300-line essay with one stray `**Name** (date time):` line scores ~0.003, and without the floor it would flip to `regex_match` with `messages.length = 1`. Closes Codex P1 #2.

DRY refactor: extract `getNonBlankLines` + `scoreFromLines` so the quick_reject + regex loop lives in one place. New exported `scorePatternFull` for direct unit testing. The fallback pre-splits the body ONCE per pass to avoid 12 redundant splits.

Plan + decisions + Codex consult absorption at: ~/.claude/plans/system-instruction-you-are-working-starry-frost.md

Tests: 10 new cases in test/conversation-parser/parse.test.ts (87 pass).
Highlights:
- #1533 IRON-RULE regression pin (meeting page → regex_match, imessage-slack, 20 messages)
- Stray-head-match guard (Codex P1 #1: irc-classic 0.1 in head does not suppress fallback; imessage-slack wins on full body)
- Essay false-positive guard (Codex P1 #2: 1/301 score below acceptance floor stays no_match)
- 300-line preamble + 50 chat lines hits fallback
- Cap test reshaped (Codex P2 #6): pins behavior, not the constant value

Once this lands and a brain has `cycle.conversation_facts_backfill.enabled = true` (opt-in), the 36 Circleback meetings flow through the fact extractor automatically. Operators on the manual path run `gbrain extract-conversation-facts <source>` directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
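The DRY refactor and the "split once per pass" point can be sketched as follows. `getNonBlankLines`, `scoreFromLines`, and `scorePatternFull` are names from the commit message, but their bodies here, the `PatternEntry` fields (`line`, `quickReject`), `rescoreFullBody`, and both sample regexes are hypothetical; the real quick_reject contract may differ.

```typescript
// Hypothetical sketch; field names and helper bodies are assumptions.
interface PatternEntry {
  name: string;
  line: RegExp; // non-global, so .test() carries no lastIndex state
  quickReject?: (line: string) => boolean; // true means "cannot match", skip the regex
}

function getNonBlankLines(body: string): string[] {
  return body.split(/\r?\n/).map((l) => l.trim()).filter((l) => l.length > 0);
}

// The one shared quick_reject + regex loop used by every scoring path.
function scoreFromLines(lines: string[], p: PatternEntry): number {
  if (lines.length === 0) return 0;
  let hits = 0;
  for (const line of lines) {
    if (p.quickReject?.(line)) continue; // cheap pre-filter before the regex
    if (p.line.test(line)) hits++;
  }
  return hits / lines.length;
}

// Full-body score for a single pattern (exported in the real module for tests).
function scorePatternFull(body: string, p: PatternEntry): number {
  return scoreFromLines(getNonBlankLines(body), p);
}

// Fallback pass: split the body ONCE, then score every candidate on that array.
function rescoreFullBody(body: string, patterns: PatternEntry[]): { name: string; score: number }[] {
  const lines = getNonBlankLines(body);
  return patterns.map((p) => ({ name: p.name, score: scoreFromLines(lines, p) }));
}

const patterns: PatternEntry[] = [
  { name: "irc-like", line: /^\[\d{2}:\d{2}\] <[^>]+> /, quickReject: (l) => !l.startsWith("[") },
  {
    name: "bold-paren-time",
    line: /^\*\*[^*]+\*\* \(\d{1,2}:\d{2}(?::\d{2})?\):/,
    quickReject: (l) => !l.startsWith("**"),
  },
];

const page = [
  "## Summary",
  "> Discussed Q3 plans.",
  "## Transcript",
  "**Participant 1** (00:00): Hello.",
  "**Participant 2** (00:05): Hi there.",
  "**Participant 1** (00:00:09): Shall we start?",
].join("\n");

const results = rescoreFullBody(page, patterns);
console.log(results, scorePatternFull(page, patterns[1]));
```

With 13 built-ins, calling `scorePatternFull` per pattern would split the same body 13 times; `rescoreFullBody` pays for the split once and reuses the array.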

…es user-facing half of #1533)

Per /codex follow-up D-FOLLOWUP-1.B: the threshold-gated fallback fix (8d7a18a) closed the bug CLASS, but the user's actual 112 Circleback meeting files at ~/git/brain/meetings/ use a transcript shape that no existing built-in pattern matches:

`**Participant 2** (00:00): Companies that we...` ← (HH:MM)
`**Participant 1** (00:00:00): We found the...` ← (HH:MM:SS)

Without a pattern that matches this shape, even the fallback re-scoring all 12 candidates against the full body returned zero matches → no_match.

Add `bold-paren-time` as the 13th built-in. Both sub-shapes are covered via a non-capturing optional seconds group, so capture indexes stay identical. `date_source: frontmatter`, so the page's `date:` provides the day anchor.

Time semantics: Circleback timestamps are elapsed time from meeting start, not wall-clock. The parser treats them as wall-clock 24h on the frontmatter date, so every message lands on the same day at HH:MM. The fact extractor only cares about speaker + content, so this is honest enough; precise per-line wall-clock would need an elapsed_time flag on PatternEntry (v0.42+ scope).

Declaration position: after imessage-slack and telegram-bracket, so on the rare tie those more-specific patterns win. The regex requires `\)` immediately after the time group, so imessage-slack's `(2024-03-15 9:00 AM)` shape falls through correctly.
EMPIRICAL RESULT against all Circleback meetings in ~/git/brain/meetings:
- 367 total files with `source: circleback`
- Pre-fix: 0 parsed (no pattern matched the shape)
- Post-fix: 113 parsed (112 via bold-paren-time, 1 via telegram-bracket)
- 20,167 messages flow through to the fact extractor (was 0)
- 254 remain no_match (notes-only meetings without inline transcripts; in those cases the transcripts live in separate files referenced via blockquote, not in the meeting body)

Smoke-tested manually against 3 representative files:
- 2026-03-19-yc-partner-strategy-ai-leverage-review.md → 225 messages
- 2026-01-15-ro-khanna-c4.md → 294 messages
- 2026-04-01-narrative-arc-equity-regrets.md → 9 messages (HH:MM:SS variant)

Tests: 54 unit cases pass (88 across full parser suite). 3 new cases in parse.test.ts pin the contract: (HH:MM) shape matches, (HH:MM:SS) shape matches, imessage-slack still wins on full-datetime overlap, and meeting page with preamble + bold-paren-time transcript hits the threshold fallback correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
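A hypothetical sketch of matching the bold-paren-time shape and anchoring it to the frontmatter date. The regex is an assumption that follows the commit's description (speaker in bold, a non-capturing optional seconds group, `\)` immediately after the time); `parseLine` and `Message` are illustrative names, not the `PatternEntry` API in builtins.ts.

```typescript
// Illustrative regex, assumed to approximate the bold-paren-time built-in:
// group 1 = speaker, group 2 = H:MM / HH:MM / HH:MM:SS, group 3 = message text.
const BOLD_PAREN_TIME = /^\*\*(.+?)\*\* \((\d{1,2}:\d{2}(?::\d{2})?)\):\s*(.*)$/;

interface Message {
  speaker: string;
  timestamp: string; // local wall-clock ISO-8601 string, no timezone
  text: string;
}

function parseLine(line: string, frontmatterDate: string): Message | null {
  const m = BOLD_PAREN_TIME.exec(line);
  if (m === null) return null;
  const [, speaker, time, text] = m;
  const [hh, mm, ss = "00"] = time.split(":"); // seconds are optional
  // Elapsed-from-start time is treated as wall-clock 24h on the page's date.
  const timestamp = `${frontmatterDate}T${hh.padStart(2, "0")}:${mm}:${ss}`;
  return { speaker, timestamp, text };
}

const hhmm = parseLine("**Participant 2** (00:00): Companies that we...", "2026-03-19");
const hhmmss = parseLine("**Participant 1** (00:00:00): We found the...", "2026-04-01");
const shortHour = parseLine("**Participant 1** (9:05): Later.", "2026-01-15");
// imessage-slack shape: the date inside the parens breaks the \d{1,2}: match.
const slack = parseLine("**Alice** (2024-03-15 9:00 AM): hi", "2026-03-19");

console.log(hhmm, hhmmss, shortHour, slack);
```

Because the seconds group is non-capturing, both sub-shapes yield the same three capture groups, which is what keeps downstream indexing identical.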

…sation-parser-line-count

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…eshold gates) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…+ CLAUDE.md

CI verify failed on check:privacy: the new bold-paren-time pattern added in 80208f2 referenced the private OpenClaw fork name in builtins.ts:125 (comment) and builtins.ts:178 (source_doc), and the CLAUDE.md doc-sync commit 1710020 leaked it once more.

CLAUDE.md privacy rule (line 550): the private fork name is banned in CHANGELOG.md, README.md, docs/, skills/, PR titles + bodies, commit messages, and comments in checked-in code. Canonical replacement: "your OpenClaw" or "OpenClaw reference deployment".

This commit rewrites all three sites. Source pipeline attribution stays accurate ("OpenClaw meeting-ingestion pipeline reformat of Circleback transcripts") without naming the specific private fork.

bun run verify: 28/28 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
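A scoped banned-term scan of the kind described can be sketched in a few lines. This is a hypothetical illustration, not gbrain's check:privacy: the term is a placeholder (the real name is deliberately not shown), the scope list paraphrases the quoted rule plus CLAUDE.md (which this commit also fixed), and whole `.ts` files are scanned rather than only their comments, which is stricter than the rule.

```typescript
// Hypothetical check:privacy-style scan; term and scope are placeholders.
const BANNED_TERMS = ["private-fork-name"];
const SCOPED_FILES = new Set(["CHANGELOG.md", "README.md", "CLAUDE.md"]);
const SCOPED_DIRS = ["docs/", "skills/"];

interface Violation {
  path: string;
  line: number; // 1-based
  term: string;
}

// Simplification: every .ts file is scanned in full, not just its comments.
function inScope(path: string): boolean {
  return SCOPED_FILES.has(path) || SCOPED_DIRS.some((d) => path.startsWith(d)) || path.endsWith(".ts");
}

function checkPrivacy(files: Record<string, string>): Violation[] {
  const violations: Violation[] = [];
  for (const [path, text] of Object.entries(files)) {
    if (!inScope(path)) continue;
    text.split("\n").forEach((content, i) => {
      const lower = content.toLowerCase(); // case-insensitive match
      for (const term of BANNED_TERMS) {
        if (lower.includes(term.toLowerCase())) violations.push({ path, line: i + 1, term });
      }
    });
  }
  return violations;
}

const violations = checkPrivacy({
  "src/core/conversation-parser/builtins.ts": "const x = 1;\n// from the Private-Fork-Name pipeline",
  "CHANGELOG.md": "Circleback transcripts via your OpenClaw.",
  "notes/scratch.txt": "private-fork-name", // out of scope, so ignored
});
console.log(violations);
```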

# Conflicts:
#   CHANGELOG.md
#   VERSION
#   package.json

Queue reservation by user — 0.41.22.0 / 0.41.23.0 slots left for sibling worktrees. Bumps VERSION, package.json, CHANGELOG header, and the CLAUDE.md entry's version tag in lockstep. llms-full.txt regenerated. bun run verify: 28/28 green. Parser tests: 92/92. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts:
#   CHANGELOG.md
#   VERSION
#   package.json

* upstream/master:
  v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572)
  v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571)
  v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566)
  v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543)
  v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541)
  v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562)
  v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542)
  v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545)
  v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544)
  feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537)
  v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521)
  v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519)
  v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510)
  v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)

garrytan and others added 8 commits May 26, 2026 22:50

Merge remote-tracking branch 'origin/master' into garrytan/fix-conver…

fa3f743

…sation-parser-line-count

chore: bump version and changelog (v0.41.21.0)

9b766a1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs: bump conversation-parser entry to v0.41.21.0 (13 patterns + thr…

1710020

…eshold gates) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/master' into garrytan/worcester-v1

fa861f9

# Conflicts:
#   CHANGELOG.md
#   VERSION
#   package.json

garrytan added 3 commits May 27, 2026 07:03

Merge remote-tracking branch 'origin/master' into garrytan/worcester-v1

4ba658e

# Conflicts:
#   CHANGELOG.md
#   VERSION
#   package.json

Merge remote-tracking branch 'origin/master' into garrytan/worcester-v1

2677462

# Conflicts:
#   CHANGELOG.md
#   VERSION
#   package.json

Merge remote-tracking branch 'origin/master' into garrytan/worcester-v1

36e7af9

# Conflicts:
#   CHANGELOG.md
#   VERSION
#   package.json

garrytan merged commit 726dfff into master May 27, 2026
21 checks passed

garrytan merged 11 commits into master from garrytan/worcester-v1

garrytan commented May 27, 2026


1 participant