v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes #1533)#1543
Merged
Conversation
…closes #1533) `gbrain conversation-parser scan` reported `phase: no_match` on meeting pages where 175 of 226 lines (77.8%) were valid `imessage-slack` format. The 36 reformatted Circleback meetings could not flow through the conversation facts pipeline. Root cause: `scorePattern` only scans the first 10 non-blank lines. A meeting page's `## Summary` + blockquote + `## Transcript` preamble takes all 10 head slots, so every pattern scored 0 and the orchestrator short-circuited to `no_match` without ever seeing the transcript. Fix: two-tier scoring with threshold gates. 1. Fast path unchanged: chat-only pages match on line 1, scoring 1.0, skipping the fallback entirely. 2. Full-body fallback fires when `top.score < SCORING_HEAD_TRIGGER_THRESHOLD` (0.3). NOT `=== 0` — Codex P1 #1 caught the bug class where a stray head match (blockquote that accidentally matches an unrelated pattern at 0.1) would suppress the fallback. 0.3 leaves the fast path untouched while triggering on any preamble-dominated page. 3. Minimum acceptance floor `SCORING_MIN_ACCEPTANCE` (0.05) prevents essay false positives: a 300-line essay with one stray `**Name** (date time):` line scores ~0.003 — without the floor it would flip to `regex_match` with `messages.length = 1`. Closes Codex P1 #2. DRY refactor: extract `getNonBlankLines` + `scoreFromLines` so the quick_reject + regex loop lives in one place. New exported `scorePatternFull` for direct unit testing. Fallback pre-splits the body ONCE per pass to avoid 12 redundant splits. Plan + decisions + Codex consult absorption at: ~/.claude/plans/system-instruction-you-are-working-starry-frost.md Tests: 10 new cases in test/conversation-parser/parse.test.ts (87 pass). Highlights: - #1533 IRON-RULE regression pin (meeting page → regex_match, imessage-slack, 20 messages) - Stray-head-match guard (Codex P1 #1: irc-classic 0.1 in head does not suppress fallback; imessage-slack wins on full body) - Essay false-positive guard (Codex P1 #2: 1/301 score below acceptance floor stays no_match) - 300-line preamble + 50 chat lines hits fallback - Cap test reshaped (Codex P2 #6): pins behavior not constant value Once landed and a brain has `cycle.conversation_facts_backfill.enabled = true` (opt-in), the 36 Circleback meetings flow through the fact extractor automatically. Operators on the manual path run `gbrain extract-conversation-facts <source>` directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…es user-facing half of #1533) Per /codex follow-up D-FOLLOWUP-1.B: the threshold-gated fallback fix (8d7a18a) closed the bug CLASS — but the user's actual 112 Circleback meeting files at ~/git/brain/meetings/ use a transcript shape that no existing built-in pattern matches: **Participant 2** (00:00): Companies that we... ← (HH:MM) **Participant 1** (00:00:00): We found the... ← (HH:MM:SS) Without a pattern that matches this shape, even the fallback re-scoring all 12 candidates against the full body returned zero matches → no_match. Add `bold-paren-time` as the 13th built-in. Two sub-shapes covered via non-capturing optional seconds group; capture indexes stay identical. date_source: frontmatter so the page's `date:` provides the day anchor. Time semantics: Circleback timestamps are elapsed-time-from-meeting- start, not wall-clock. Parser treats them as wall-clock 24h on the frontmatter date, so every message lands on the same day at HH:MM. The fact extractor only cares about speaker + content, so this is honest enough; precise per-line wall-clock would need an elapsed_time flag on PatternEntry (v0.42+ scope). Declaration position: after imessage-slack and telegram-bracket so on the rare tie those more-specific patterns win. The regex requires `\)` immediately after the time group, so imessage-slack's `(2024-03-15 9:00 AM)` shape falls through correctly. EMPIRICAL RESULT against all Circleback meetings in ~/git/brain/meetings: - 367 total files with `source: circleback` - Pre-fix: 0 parsed (no pattern matched the shape) - Post-fix: 113 parsed (112 via bold-paren-time, 1 via telegram-bracket) - 20,167 messages flow through to the fact extractor (was 0) - 254 remain no_match (notes-only meetings without inline transcripts — transcripts in those cases live in separate files referenced via blockquote, not in the meeting body) Smoke-tested manually against 3 representative files: - 2026-03-19-yc-partner-strategy-ai-leverage-review.md → 225 messages - 2026-01-15-ro-khanna-c4.md → 294 messages - 2026-04-01-narrative-arc-equity-regrets.md → 9 messages (HH:MM:SS variant) Tests: 54 unit cases pass (88 across full parser suite). 3 new cases in parse.test.ts pin the contract: (HH:MM) shape matches, (HH:MM:SS) shape matches, imessage-slack still wins on full-datetime overlap, and meeting page with preamble + bold-paren-time transcript hits the threshold fallback correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sation-parser-line-count
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eshold gates) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ CLAUDE.md CI verify failed on check:privacy — the new bold-paren-time pattern added in 80208f2 referenced the private OpenClaw fork name in builtins.ts:125 (comment) and builtins.ts:178 (source_doc), and the CLAUDE.md doc-sync commit 1710020 leaked it once more. CLAUDE.md privacy rule (line 550): the private fork name is banned in CHANGELOG.md, README.md, docs/, skills/, PR titles + bodies, commit messages, and comments in checked-in code. Canonical replacement: "your OpenClaw" or "OpenClaw reference deployment". This commit rewrites all three sites. Source pipeline attribution stays accurate ("OpenClaw meeting-ingestion pipeline reformat of Circleback transcripts") without naming the specific private fork. bun run verify: 28/28 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # CHANGELOG.md # VERSION # package.json
Queue reservation by user — 0.41.22.0 / 0.41.23.0 slots left for sibling worktrees. Bumps VERSION, package.json, CHANGELOG header, and the CLAUDE.md entry's version tag in lockstep. llms-full.txt regenerated. bun run verify: 28/28 green. Parser tests: 92/92. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts: # CHANGELOG.md # VERSION # package.json
# Conflicts: # CHANGELOG.md # VERSION # package.json
# Conflicts: # CHANGELOG.md # VERSION # package.json
mgunnin
added a commit
to mgunnin/gbrain
that referenced
this pull request
May 28, 2026
* upstream/master: v0.41.26.1 fix: lock-renewal cathedral — closes ~39 worker crashes/day (supersedes garrytan#1567) (garrytan#1572) v0.41.26.0 fix: dream --source + ingest junk titles + emoji-crash (supersedes garrytan#1559, garrytan#1561) (garrytan#1571) v0.41.25.0 perf(sync): batched deletes + global page-generation clock (supersedes garrytan#1538) (garrytan#1566) v0.41.24.0 fix(conversation-parser): threshold gates + bold-paren-time pattern — 20,167 Circleback messages unblocked (closes garrytan#1533) (garrytan#1543) v0.41.23.0 feat: extract operator surfaces + pack-driven extractables (garrytan#1541) v0.41.22.1 feat: brainstorm/lsd judge fixes (closes garrytan#1540 end-to-end) (garrytan#1562) v0.41.22.0 feat: type-unification cathedral — 94 types → 15 canonical (closes garrytan#1479) (garrytan#1542) v0.41.21.0 feat(ops): 5 daily-driver pains fixed in one wave (garrytan#1545) v0.41.20.0 feat: gbrain status + doctor --scope=brain (fix wave 2: items garrytan#6 + garrytan#7) (garrytan#1544) feat: v0.41.19.0 Supavisor Retry Cathedral (garrytan#1537) v0.41.18.0: gbrain onboard — the activation surface gbrain didn't have before (garrytan#1521) v0.41.17.0 feat: --workers N on every bulk command + facts dim doctor parity (garrytan#1519) v0.41.16.0 feat: conversation parser cathedral + progressive-batch primitive (closes garrytan#1461) (garrytan#1510) v0.41.15.0 feat(sync): --timeout + --max-age + partial status (closes garrytan#1472 RFC) (garrytan#1506)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Your reformatted Circleback meeting transcripts now actually parse. Pre-fix the parser reported
no_matchon every reformatted meeting page even when 78% of lines were valid chat. Post-fix: 113 of 367 Circleback pages parse, 20,167 messages flow through to the fact extractor (was 0).Two stacked problems:
scorePattern()only looked at the first 10 lines. Meeting pages start with## Summary+ blockquote +## Transcript(~10 lines of prose) before the actual chat — every pattern scored 0 in the head window, soparseConversation()short-circuited tono_matchwithout ever seeing the transcript.**Speaker** (HH:MM):or**Speaker** (HH:MM:SS):shape that wintermute's meeting-ingestion pipeline produces from Circleback exports.Fix shape (in 3 commits):
SCORING_HEAD_TRIGGER_THRESHOLD = 0.3triggers a full-body re-score when the head pass falls below 0.3 (NOT just=== 0— Codex consult caught the stray-head-match bug class where a blockquote that accidentally matches an unrelated pattern at 0.1 used to suppress the fallback entirely).SCORING_MIN_ACCEPTANCE = 0.05is a final-acceptance floor that blocks essay false-positives.bold-paren-timebuilt-in pattern (commit 80208f2, closes user-facing half of bug: conversation-parser no_match on pages with 78% pattern match rate #1533): regex captures both(HH:MM)and(HH:MM:SS)sub-shapes via a non-capturing optional seconds group. Date from frontmatter, time treated as wall-clock 24h (the elapsed-time-not-wall-clock limitation is documented as v0.42+ scope).Test Coverage
11 new unit cases in
test/conversation-parser/parse.test.ts(92 tests pass parser-wide):Tests: 88 → 99 (+11 new). All pass. Typecheck clean.
Pre-Landing Review
Ran
/plan-eng-reviewon the design (CLEAR, 0 issues, 2 user decisions captured) +/codex consulton the plan (10 findings, 7 absorbed, 2 dismissed with rationale, 1 partial). Codex's P1 #1 (trigger-threshold bug class) became D-CODEX-1.B and shaped the threshold-gated fallback. Diff is small (438 ins / 26 del); no anti-patterns (noconsole.log, noany, no TODO/FIXME).Plan Completion
All P1/P2 plan items DONE across 3 commits:
garrytan/fix-conversation-parser-line-count→garrytan/worcester-v1)Plan + decision trail:
~/.claude/plans/system-instruction-you-are-working-starry-frost.mdEmpirical Smoke Test
Ran fixed parser against ALL 367 Circleback meetings at
~/git/brain/meetings/*.md:3 representative files:
2026-03-19-yc-partner-strategy-ai-leverage-review.md→ 225 messages;2026-01-15-ro-khanna-c4.md→ 294 messages;2026-04-01-narrative-arc-equity-regrets.md→ 9 messages (HH:MM:SS variant).Pre-existing test flakes (NOT introduced by this branch)
Full-suite local run had 4 pre-existing serial-pass failures that all pass in isolation — confirmed cross-test cycle-lock state pollution + parallel shard OOM (rc=137) on this Mac. Files:
test/schema-cli.test.ts,test/cycle-last-full-cycle-at.test.ts,test/dream.test.ts. Zero relation to parser code. Parser tests (92/92) clean.TODOS
No TODOS.md items completed by this PR.
Documentation
src/core/conversation-parser/entry updated to v0.41.21.0 (13 patterns, threshold gates, empirical numbers).llms-full.txtregenerated.Test plan
bun test test/conversation-parser/— 88 passbun test test/conversation-parser/parse.test.ts— 53 passbun test test/extract-conversation-facts.test.ts— 27 passbun run typecheck— cleanCloses #1533.
🤖 Generated with Claude Code