v0.41.29.0 feat(conversation-parser): bold-name-no-time builtin + fix(orphans): source-scoped orphan_ratio (supersedes #1613) #1620
Merged
…Granola/Zoom, no timestamp)

The 14th built-in pattern parses `**Speaker:** text` transcripts with NO per-line timestamp, the shape Circleback / Granola / Zoom emit. Every prior builtin required a time anchor, so this shape matched nothing: a production brain had 104 conversation pages + 3,423 eligible pages silently extracting zero facts.

Messages anchor at T00:00:00Z of the frontmatter date (no fabricated wall-clock; line order preserves sequence), same convention as irc-classic.

Hardening beyond the original community proposal:
- regex `/^\*\*(?!\[)(.+?):\*\*\s*(.*)$/`: the colon-inside-bold (NOT declaration order) is what prevents shadowing bold-paren-time; the `(?!\[)` lookahead rejects telegram-bracket `**[18:37] Name:**` so disabling telegram-bracket yields an honest no_match instead of speaker="[18:37] Name".
- new optional PatternEntry.score_full_body: `**Label:** text` is a common prose idiom, so a notes page with bold labels clustered in its first 10 lines scored 0.3 on the head pass (NOT < SCORING_HEAD_TRIGGER_THRESHOLD, so the full-body fallback never fired) and cleared the 0.05 floor. parse.ts now recomputes the winner's score over the full body before the floor, so such a page drops to its true low density and stays no_match.
- scrubbed pre-existing real names from bold-paren-time test_positive samples (privacy rule). Fixtures use placeholder names only.

Pinned by new bold-name-no-time + clustered-head no_match cases in parse.test.ts and the eval corpus.

Co-Authored-By: garrytan-agents <noreply@github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
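The regex, the midnight-UTC anchoring, and the head-versus-full-body scoring described in this commit can be illustrated with a minimal sketch. Only the regex, the 10-line head window, and the 0.05 floor come from the commit message. The `Message` shape, the function names, the density formula (matching lines over non-empty lines), and the `>=` floor comparison are assumptions made for illustration, and the `scoreFullBody` flag simplifies the real head-trigger logic in `parse.ts`, which this PR page does not show.

```typescript
// Regex quoted in the commit message.
const BOLD_NAME_NO_TIME = /^\*\*(?!\[)(.+?):\*\*\s*(.*)$/;

// Hypothetical message shape; the real parser's types are not shown here.
interface Message {
  speaker: string;
  text: string;
  timestamp: string;
}

// Parse line by line, anchoring every message at T00:00:00Z of the
// frontmatter date. Array order carries the sequence.
function parseBoldNameNoTime(body: string, frontmatterDate: string): Message[] {
  const messages: Message[] = [];
  for (const line of body.split("\n")) {
    const m = BOLD_NAME_NO_TIME.exec(line.trim());
    if (m) {
      messages.push({
        speaker: m[1] ?? "",
        text: m[2] ?? "",
        timestamp: `${frontmatterDate}T00:00:00Z`,
      });
    }
  }
  return messages;
}

// Assumed scoring rule: matching lines divided by non-empty lines.
function density(lines: string[]): number {
  const nonEmpty = lines.filter((l) => l.trim() !== "");
  if (nonEmpty.length === 0) return 0;
  const hits = nonEmpty.filter((l) => BOLD_NAME_NO_TIME.test(l.trim())).length;
  return hits / nonEmpty.length;
}

const SCORE_FLOOR = 0.05; // floor quoted in the commit message
const HEAD_LINES = 10; // head window quoted in the commit message

// Simplified: score either the head window or the whole body, then apply the floor.
function passesFloor(body: string, scoreFullBody: boolean): boolean {
  const lines = body.split("\n");
  const scored = scoreFullBody ? lines : lines.slice(0, HEAD_LINES);
  return density(scored) >= SCORE_FLOOR;
}
```

Under these assumptions, a notes page with three bold labels in its first 10 lines and 97 lines of prose after them clears the floor on the head window alone but falls below it once the whole body is scored, which is the false positive `score_full_body` is meant to remove.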
…linkable denominator
`gbrain doctor --source <id>` and `gbrain orphans --source <id>` now scope
the orphan scan to that source instead of reporting brain-wide. Three fixes:
- findOrphanPages(opts?: { sourceId?, sourceIds? }) on both engines scopes the
CANDIDATE set (scalar `= $1` or federated `= ANY($1::text[])`). Inbound links
from ANY source still count, so a page in source X linked FROM source Y is
reachable and NOT an orphan of X (the deliberate, less-surprising definition).
- corrected the total_linkable denominator in findOrphans: it now enumerates
all live pages (scoped) and subtracts every excluded-by-slug page, not just
excluded orphans. The old `total - excludedOrphans` left excluded NON-orphan
pages (templates/, scratch/) with inbound links in the denominator, inflating
it and suppressing warnings. Changes orphan_ratio output for every brain, in
the accurate direction.
- the find_orphans MCP op threads sourceScopeOpts(ctx), closing a cross-source
read leak where a source-bound OAuth client saw brain-wide orphans (v0.34.1
source-isolation class).
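The first two fixes above can be sketched with an in-memory model. This is an illustration under stated assumptions, not the engines' SQL: the `Page` and `Link` shapes, the `orphanStats` name, and globally unique slugs are hypothetical, and only the `templates/` and `scratch/` prefixes and the two denominator formulas come from the commit message.

```typescript
// Hypothetical in-memory model; the real engines query Postgres / PGLite.
interface Page {
  slug: string; // assumed unique across sources for this sketch
  sourceId: string;
}
interface Link {
  from: string;
  to: string;
}

// Excluded-by-slug prefixes named in the commit message.
const EXCLUDED_PREFIXES = ["templates/", "scratch/"];

function isExcluded(slug: string): boolean {
  return EXCLUDED_PREFIXES.some((p) => slug.startsWith(p));
}

function orphanStats(pages: Page[], links: Link[], sourceId?: string) {
  // Scope the CANDIDATE set only; inbound links from any source still count.
  const candidates = sourceId ? pages.filter((p) => p.sourceId === sourceId) : pages;
  const linked = new Set(links.map((l) => l.to));
  const orphans = candidates.filter((p) => !linked.has(p.slug));

  const excludedOrphans = orphans.filter((p) => isExcluded(p.slug)).length;
  const excludedAll = candidates.filter((p) => isExcluded(p.slug)).length;

  return {
    orphans: orphans.length - excludedOrphans,
    // Old formula: excluded NON-orphan pages stay in the denominator.
    oldDenominator: candidates.length - excludedOrphans,
    // New formula: every excluded-by-slug page is subtracted.
    newDenominator: candidates.length - excludedAll,
  };
}
```

With source X holding `a`, `b`, `templates/t1`, and `scratch/s1`, a link from a page in source Y to `a`, and a link from `a` to `templates/t1`, the scoped scan reports one orphan (`b`). The old denominator is 3 and the corrected one is 2, so the ratio rises from 1/3 to 1/2.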
doctor uses an explicit `--source` flag parse (NOT resolveSourceWithTier, which
would scope bare invocations to a default), and under explicit --source reports
the ratio with a low-scale caveat below 100 entity pages instead of a vacuous
"ok". Thin-client doctor --source orphan_ratio deferred (TODOS.md).
Pinned by test/orphans-source-scope.test.ts (PGLite: scoping, cross-source
inbound, denominator, find_orphans op scope) + a Postgres↔PGLite parity case
in test/e2e/engine-parity.test.ts (scalar + federated binding).
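The scalar and federated bindings that the parity case exercises could be produced by a small filter builder along these lines. This is a hedged sketch: only the option names and the two comparison forms (`= $1` and `= ANY($1::text[])`) appear in the commit message. The `source_id` column name, the `sourceFilter` helper, and the precedence of `sourceIds` over `sourceId` are assumptions.

```typescript
interface SourceScope {
  sourceId?: string;
  sourceIds?: string[];
}

interface SqlFilter {
  clause: string;
  params: unknown[];
}

// Hypothetical helper: builds the WHERE fragment that scopes the candidate set.
function sourceFilter(opts: SourceScope = {}): SqlFilter {
  if (opts.sourceIds && opts.sourceIds.length > 0) {
    // Federated: one array parameter compared with ANY.
    return { clause: "source_id = ANY($1::text[])", params: [opts.sourceIds] };
  }
  if (opts.sourceId) {
    // Scalar: a single source id.
    return { clause: "source_id = $1", params: [opts.sourceId] };
  }
  // No scope requested: brain-wide scan.
  return { clause: "TRUE", params: [] };
}
```

Keeping both forms on a single `$1` placeholder means the surrounding query text stays identical whichever binding is chosen.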
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
VERSION + package.json → 0.41.29.0; CHANGELOG entry; CLAUDE.md conversation-parser (13→14 patterns) + orphans source-scoping notes; regenerated llms bundles; TODOS for thin-client doctor --source + check-test-real-names widening.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request Jun 3, 2026
* upstream/master:
  v0.41.29.0 feat(conversation-parser): bold-name-no-time builtin + fix(orphans): source-scoped orphan_ratio (supersedes garrytan#1613) (garrytan#1620)
  v0.41.27.0 fix: withRetry self-heals on null singleton + facts:absorb drain + disconnect audit (closes garrytan#1570) (garrytan#1608)
  v0.41.27.0 fix(doctor): git-aware sync_freshness (supersedes garrytan#1564) (garrytan#1573)
Supersedes #1613 (re-homed from the fork so CI can run with secrets; real names scrubbed for the privacy rule; `orphan_ratio` `--source` scoping bundled in).

## What ships (two fixes, 3 bisect-friendly commits)

### 1. `bold-name-no-time` conversation parser builtin (the 14th)

Parses `**Speaker:** text` transcripts with no per-line timestamp, the shape Circleback / Granola / Zoom emit. Every prior builtin required a time anchor, so this shape matched nothing: a production brain had 104 conversation pages + 3,423 eligible pages silently extracting zero conversation-facts. This is the unlock for whole-brain conversation-facts extraction.

- Messages anchor at `T00:00:00Z` of the frontmatter date (no fabricated wall-clock; line order preserves sequence), same convention as `irc-classic`.
- Regex `/^\*\*(?!\[)(.+?):\*\*\s*(.*)$/`: the colon-inside-bold (not declaration order) prevents shadowing `bold-paren-time`; the `(?!\[)` lookahead rejects telegram-bracket `**[18:37] Name:**` so disabling telegram-bracket yields an honest `no_match` instead of `speaker="[18:37] Name"`.
- New optional `PatternEntry.score_full_body`: `**Label:** text` is a common prose idiom, so a notes page with bold labels clustered in its first 10 lines scored 0.3 on the head pass (NOT `< SCORING_HEAD_TRIGGER_THRESHOLD`, so the full-body fallback never fired) and cleared the 0.05 floor. `parse.ts` now recomputes the winner over the full body before the floor, so such a page stays `no_match`.
- Scrubbed pre-existing real names from `bold-paren-time` test samples (privacy rule).

### 2. orphan_ratio / `find_orphans` source scoping

`gbrain doctor --source <id>` and `gbrain orphans --source <id>` now scope to that source instead of reporting brain-wide.

- `findOrphanPages(opts?: { sourceId?, sourceIds? })` on both engines scopes the candidate set (scalar `= $1` / federated `= ANY($1::text[])`). Cross-source inbound links still count, so a page in X linked from Y is reachable (not an orphan of X).
- Corrected the `total_linkable` denominator: excluded pages (templates/, scratch/) that have inbound links no longer inflate it and suppress warnings. Changes orphan_ratio output for every brain, in the accurate direction.
- The `find_orphans` MCP op threads `sourceScopeOpts(ctx)`, closing a cross-source read leak for source-bound OAuth clients (v0.34.1 source-isolation class).
- Under explicit `--source` below 100 entity pages, `orphan_ratio` reports the ratio with a low-scale caveat instead of a vacuous "ok". Thin-client `doctor --source` deferred (TODOS.md).

## Review

`/plan-eng-review` (cleared) + `/codex` outside-voice (8 findings, 7 actioned + 1 confirmed-good). Codex caught the `score_full_body` floor gap, the bracket-timestamp mis-capture, the pre-existing denominator bug, and the find_orphans MCP leak.

## Tests

- `bun run verify`: 29 checks green (typecheck, privacy, conversation-parser eval).
- … `doctor-orphan-ratio` + `orphan-reduction` E2E): 73 pass.
- `engine-parity` on real Postgres (scalar + federated parity): 11 pass.

🤖 Generated with Claude Code