v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) #1296
Merged
…2.0 Commit 1)
Substrate work for v0.40.2.0 Track B (trajectory routing for temporal +
knowledge_update). This commit lands the schema + the shared formatter;
think wiring + LongMemEval extractor + intent routing come in Commits 2-4.
Migration v81 (facts_event_type_column):
ALTER TABLE facts ADD COLUMN event_type TEXT (nullable, metadata-only).
Lets the v0.35.4 typed-claim substrate carry event-shaped rows
(event_type='meeting'/'job_change'/'location_change') alongside the
metric-shaped rows (claim_metric/claim_value etc) it has carried since
v67. Temporal-reasoning questions ("when did I last meet Marco") need
the event shape; the metric shape doesn't fit them.
Engine changes (pglite + postgres parity):
- TrajectoryPoint.event_type: string | null added; projection in both
findTrajectory SQL paths returns the column.
- TrajectoryOpts.kind?: 'metric' | 'event' | 'all' added (default 'all').
Defensive option that future-proofs filtering once event rows accumulate.
- Both engines apply the new kind filter at SQL level when set.
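A minimal sketch of what the `kind` filter could look like, assuming metric rows are identified by a non-NULL `claim_metric` and event rows by a non-NULL `event_type`. The helper names `kindPredicateSql` and `matchesKind` are hypothetical; the real predicates live inside each engine's `findTrajectory` SQL.

```typescript
// Hypothetical sketch of the TrajectoryOpts.kind filter; not the engines' actual SQL.
type TrajectoryKind = "metric" | "event" | "all";

// Subset of TrajectoryPoint relevant to the filter.
interface TrajectoryPointShape {
  metric: string | null;     // populated for metric-shaped rows (claim_metric)
  event_type: string | null; // populated for event-shaped rows (new column)
}

// SQL fragment appended to the WHERE clause; 'all' (the default) adds nothing.
function kindPredicateSql(kind: TrajectoryKind = "all"): string {
  switch (kind) {
    case "metric":
      return "AND claim_metric IS NOT NULL";
    case "event":
      return "AND event_type IS NOT NULL";
    case "all":
      return "";
  }
}

// In-memory mirror of the same predicate, useful for engine-parity tests.
function matchesKind(p: TrajectoryPointShape, kind: TrajectoryKind = "all"): boolean {
  if (kind === "metric") return p.metric !== null;
  if (kind === "event") return p.event_type !== null;
  return true;
}
```

Keeping `'all'` as a no-op default is what lets existing callers stay byte-identical until they opt into a narrower kind.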
Back-compat (codex outside-voice concern):
Existing callers (founder-scorecard, eval-trajectory) already defensively
skip metric === null rows in their per-metric math. Event-only rows
(metric=NULL, event_type='meeting') ride through invisibly to those
callers — verified by the new regression test that asserts byte-identical
computeFounderScorecard + computeTrajectoryStats output with and without
event rows in the input. Both callers now pass kind:'metric' explicitly
for call-site clarity (no behavior change).
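The back-compat argument rests on callers skipping `metric === null` rows before any per-metric math. A minimal sketch of that pattern, assuming a simplified point shape; `latestValueByMetric` is a hypothetical stand-in for the founder-scorecard / eval-trajectory math, not their actual code.

```typescript
interface Point {
  metric: string | null;
  event_type: string | null;
  value: number | null;
  valid_from: string; // ISO date; sorts correctly as a string
}

// Hypothetical per-metric reducer: latest value per metric, in valid_from order.
function latestValueByMetric(points: Point[]): Record<string, number> {
  const out: Record<string, number> = {};
  const ordered = [...points].sort((a, b) => a.valid_from.localeCompare(b.valid_from));
  for (const p of ordered) {
    // Event-only rows (metric = NULL) drop out here, so they cannot change the output.
    if (p.metric === null || p.value === null) continue;
    out[p.metric] = p.value;
  }
  return out;
}
```

Comparing `JSON.stringify` output with and without event rows is one cheap way to express the byte-identical contract that the regression test pins.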
MCP find_trajectory op:
- event_type added to the wire-shape map.
- kind param added to the op declaration (enum metric/event/all).
Shared formatter (src/core/trajectory-format.ts, new):
formatTrajectoryBlock(points, entitySlug, opts) — sibling shape to
renderTakesBlock + renderChatBlock. Groups by (metric ?? event_type).
Per-metric cap 20, total cap 100 (prompt-budget guardrail). For
knowledge_update intent, annotates value-change rows with
"(superseded prior)" — the explicit signal codex flagged was missing
from default RRF-ordered retrieval. Promoted to src/core/ so both
gbrain think (Commit 2) and the LongMemEval harness (Commit 4)
consume one source of truth.
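A minimal sketch of the grouping, capping, and supersession logic described above. Assumptions not stated in the source: the per-group cap keeps the most recent 20 points, the annotation lands on the row whose value differs from the previous one in its group, and the wrapper tag and line format are illustrative. Sanitization of `text` is omitted; the real helper routes extracted text through `INJECTION_PATTERNS`.

```typescript
type Intent = "temporal" | "knowledge_update" | "other";

interface Point {
  metric: string | null;
  event_type: string | null;
  value: number | null;
  valid_from: string; // ISO date
  text: string;
}

const PER_GROUP_CAP = 20; // per metric / event_type
const TOTAL_CAP = 100;    // whole block (prompt-budget guardrail)

function formatTrajectoryBlockSketch(points: Point[], entitySlug: string, intent: Intent): string {
  // Group by metric, falling back to event_type for event-shaped rows.
  const groups = new Map<string, Point[]>();
  for (const p of points) {
    const key = p.metric ?? p.event_type;
    if (key === null) continue;
    const bucket = groups.get(key) ?? [];
    bucket.push(p);
    groups.set(key, bucket);
  }

  const lines: string[] = [];
  // Sorted keys + sorted points keep the output deterministic.
  for (const key of [...groups.keys()].sort()) {
    if (lines.length >= TOTAL_CAP) break;
    const ordered = (groups.get(key) ?? []).sort((a, b) => a.valid_from.localeCompare(b.valid_from));
    let prior: number | null = null;
    for (const p of ordered.slice(-PER_GROUP_CAP)) {
      if (lines.length >= TOTAL_CAP) break;
      const superseded = intent === "knowledge_update" && prior !== null && p.value !== null && p.value !== prior;
      const body = p.value !== null ? String(p.value) : p.text;
      lines.push(`- ${p.valid_from} ${key}: ${body}${superseded ? " (superseded prior)" : ""}`);
      if (p.value !== null) prior = p.value;
    }
  }

  // Empty block: return "" so the caller can skip the header entirely.
  if (lines.length === 0) return "";
  return `<trajectory entity="${entitySlug}">\n${lines.join("\n")}\n</trajectory>`;
}
```

Returning an empty string rather than an empty wrapper is what lets the prompt builder omit the "Known trajectory:" header.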
Prompt-injection coverage (codex Problem 10):
src/core/think/sanitize.ts INJECTION_PATTERNS extended with three
new entries — close-trajectory, open-trajectory, xml-attr-inject —
so adversarial </trajectory> sequences in extracted text get
escaped before reaching the model. Parity with the existing
</take> coverage.
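A minimal sketch of the three new entries, assuming regex-based patterns that neutralize the tag by escaping its leading `<` and that strip the trajectory-specific attribute names. The exact regexes and replacement strategy in `src/core/think/sanitize.ts` may differ; the names below are illustrative.

```typescript
interface InjectionPattern {
  name: string;
  re: RegExp;
  replace: (match: string) => string;
}

// Hypothetical trajectory entries; order matters because they run sequentially.
const TRAJECTORY_PATTERNS: InjectionPattern[] = [
  // </trajectory>, including whitespace and case variants such as "</ TRAJECTORY >".
  { name: "close-trajectory", re: /<\s*\/\s*trajectory\s*>/gi, replace: (m) => m.replace("<", "&lt;") },
  // <trajectory> with or without attributes.
  { name: "open-trajectory", re: /<\s*trajectory\b[^>]*>/gi, replace: (m) => m.replace("<", "&lt;") },
  // Attribute names the wrapper itself uses; other names (class=, id=) are left alone.
  { name: "xml-attr-inject", re: /\b(?:entity|metric|event_type|kind)\s*=\s*/gi, replace: () => "" },
];

function sanitizeTrajectoryText(text: string): string {
  let out = text;
  for (const p of TRAJECTORY_PATTERNS) out = out.replace(p.re, p.replace);
  return out;
}
```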
Tests (all hermetic, no DATABASE_URL):
- test/trajectory-format.test.ts (17 cases, all green): grouping,
caps, sanitization, supersession annotation, determinism,
provenance, text-cap, adversarial </trajectory> escape.
- test/engine-parity-event-type.test.ts (6 cases): PGLite round-trip
of the column + kind filter matrix.
- test/regressions/v0_40_2_0-trajectory-backcompat.test.ts (4 cases):
pins the byte-identical-output contract that founder-scorecard's
per-metric math ignores event rows.
- test/migrate.test.ts: v81 round-trip verified via existing
structural assertion harness.
- 209 tests across 5 impacted suites pass; bun run verify clean
(17 pre-checks including privacy, jsonb, type, fuzz purity).
Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md
GSTACK REVIEW REPORT: CEO + ENG + CODEX CLEARED.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
….40.2.0 Commit 2)
Wires the v0.40.2.0 substrate (Commit 1's facts.event_type column +
formatTrajectoryBlock) into the production `gbrain think` surface.
Default ON; flip `think.trajectory_enabled=false` to opt out.
New pure modules (zero engine dependency):
- src/core/think/intent.ts — classifyIntent(question): regex-first
routing into 'temporal' | 'knowledge_update' | 'other'. KU wins over
temporal when both match. The 'other' fast path short-circuits with
zero SQL.
- src/core/think/entity-extract.ts — extractCandidateEntities() pulls
high-precision candidates from retrieval slugs (people/, companies/,
organizations/, deals/) and medium-precision noun phrases from the
question. Word-level tokenization + stop-word boundaries stitch
"Blue Bottle" as one candidate while splitting "I last meet Marco"
correctly. Leading-verb stripper drops "meet", "visit" etc so
"marco" surfaces cleanly. Cap of 5 per question.
Engine-touching wiring (src/core/think/index.ts):
- RunThinkOpts gains 4 fields: withTrajectory (default true),
sourceId, allowedSources, remote.
- readThinkTrajectoryEnabled() reads the config kill switch; default
true; survives missing config table on legacy brains.
- Trajectory orchestration sits between gather and prompt assembly:
intent classify → extract candidates → per-candidate
resolveEntitySlugWithSource → skip fallback_slugify → 5s timeout
Promise.race + 3-wide concurrency cap → formatTrajectoryBlock.
Any error degrades to "no block" + TRAJECTORY_INJECTION_FAILED
warning; the think call itself never crashes from trajectory.
- On success, TRAJECTORY_INJECTED_<N>_POINTS warning records the
count for downstream telemetry.
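The orchestration step above (resolve, skip `fallback_slugify`, 5s `Promise.race` timeout, 3-wide concurrency, fail-open) can be sketched with plain promises. This is a minimal sketch under stated assumptions: `resolve` and `findTrajectory` are injected stand-ins for `resolveEntitySlugWithSource` and `engine.findTrajectory`, and `withTimeout`, `mapSettledWithLimit`, and `gatherTrajectories` are hypothetical helpers, not gbrain's.

```typescript
type ResolutionSource = "exact_page" | "fuzzy_match" | "fallback_slugify";
interface Resolved { slug: string; source: ResolutionSource }

// Reject if `p` has not settled within `ms`; always clear the timer afterwards.
function withTimeout<T>(p: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

// Like Promise.allSettled, but with at most `limit` calls in flight.
async function mapSettledWithLimit<T, R>(items: T[], limit: number, fn: (item: T) => Promise<R>): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = new Array(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { status: "fulfilled", value: await fn(items[i]) };
      } catch (reason) {
        results[i] = { status: "rejected", reason };
      }
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}

async function gatherTrajectories<P>(
  candidates: string[],
  resolve: (name: string) => Promise<Resolved | null>,
  findTrajectory: (slug: string) => Promise<P[]>,
): Promise<{ slug: string; points: P[] }[]> {
  const settled = await mapSettledWithLimit(candidates, 3, async (name) => {
    const r = await resolve(name);
    // Invented slugs can only return [], so skip the SQL round-trip entirely.
    if (r === null || r.source === "fallback_slugify") return null;
    const points = await withTimeout(findTrajectory(r.slug), 5_000, `findTrajectory(${r.slug})`);
    return { slug: r.slug, points };
  });
  const found: { slug: string; points: P[] }[] = [];
  for (const s of settled) {
    // Rejections (engine throw, timeout) degrade to "no block" for that candidate.
    if (s.status === "fulfilled" && s.value !== null && s.value.points.length > 0) found.push(s.value);
  }
  return found;
}
```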
Prompt placement (src/core/think/prompt.ts) — Codex Problem 6 fix:
buildThinkUserMessage's trajectoryBlock slot honors BOTH existing
orderings — calibration mode inserts trajectory between calibration
and question; default mode inserts between retrieval and the output
instruction. NO third ordering is introduced. Empty trajectoryBlock
skips the "Known trajectory:" header entirely (so the model is not
cued that a lookup was attempted).
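A minimal sketch of the two orderings, assuming the user message is assembled from already-rendered section strings. The `Sections` shape and the function name are hypothetical; the orderings follow the description here (trajectory between calibration and question in calibration mode, between retrieval and the output instruction in default mode).

```typescript
interface Sections {
  question: string;
  pages: string;           // retrieved pages
  takes: string;           // retrieved takes
  calibration?: string;    // present only in calibration mode
  trajectoryBlock: string; // "" when nothing was found
  instruction: string;     // output instruction
}

function buildThinkUserMessageSketch(s: Sections): string {
  // Empty block: omit the header too, so the model is not cued that a lookup happened.
  const trajectory = s.trajectoryBlock.trim() !== "" ? `Known trajectory:\n${s.trajectoryBlock}` : "";
  const parts =
    s.calibration !== undefined
      ? [s.pages, s.takes, s.calibration, trajectory, s.question, s.instruction] // calibration mode
      : [s.question, s.pages, s.takes, trajectory, s.instruction]; // default mode
  return parts.filter((part) => part !== "").join("\n\n");
}
```

Both branches reuse the existing sequence and only slot the trajectory section in, which is the "no third ordering" constraint expressed in code.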
Resolution-source signal (src/core/entities/resolve.ts) — Codex Problem 5:
New companion resolveEntitySlugWithSource() returns
{slug, source: 'exact_page' | 'fuzzy_match' | 'fallback_slugify'}
so trajectory routing can skip fallback-only resolutions —
querying findTrajectory on an invented slug always returns [] and
wastes a SQL round-trip. The original resolveEntitySlug keeps its
contract for pre-v0.40 callers.
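A minimal sketch of the three-way signal and the back-compat wrapper, assuming resolution against an in-memory list of page slugs. The matching rules (exact slug, then slugified name or first-name prefix, then bare slugify) and the function names are illustrative assumptions; the real resolver queries the brain.

```typescript
type ResolutionSource = "exact_page" | "fuzzy_match" | "fallback_slugify";

function slugify(name: string): string {
  return name
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accents: "Zoë" -> "Zoe"
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function resolveWithSourceSketch(name: string, pageSlugs: string[]): { slug: string; source: ResolutionSource } | null {
  const trimmed = name.trim();
  if (trimmed === "") return null;
  if (pageSlugs.includes(trimmed)) return { slug: trimmed, source: "exact_page" };
  const base = slugify(trimmed);
  if (base === "") return null;
  // Fuzzy: same slug tail, or a bare first name that prefixes a page ("marco" -> people/marco-smith).
  const hit = pageSlugs.find((s) => {
    const tail = s.slice(s.lastIndexOf("/") + 1);
    return tail === base || tail.startsWith(`${base}-`);
  });
  if (hit !== undefined) return { slug: hit, source: "fuzzy_match" };
  return { slug: base, source: "fallback_slugify" }; // invented slug: no page backs it
}

// Back-compat wrapper that keeps the slug-only contract.
function resolveSlugSketch(name: string, pageSlugs: string[]): string | null {
  return resolveWithSourceSketch(name, pageSlugs)?.slug ?? null;
}
```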
MCP think op handler (src/core/operations.ts):
Extracts sourceScopeOpts(ctx) into scalar sourceId + allowedSources
+ remote, threads through to runThink. CLI callers omit (engine
default source, remote=false). Mirrors the same source-scope
discipline applied to all other read paths in v0.34.1.0.
Sanitization (Commit 1 already extended INJECTION_PATTERNS for
</trajectory> — consumed here).
Test coverage (all hermetic, no DATABASE_URL, no API keys):
- test/think-intent.test.ts (14 cases) — temporal, KU, other,
precedence (KU wins when both match), defensive non-string inputs.
- test/think-entity-extract.test.ts (10 cases) — retrieved-slug
source, noun-phrase source, stop-word stripping, leading-verb
stripping, dedup across sources, 5-candidate cap.
- test/think-trajectory-injection.test.ts (7 cases against PGLite
in-memory) — temporal intent injection happy path with superseded-
prior annotation, "other" intent short-circuit, withTrajectory:
false bypass, think.trajectory_enabled=false config bypass,
empty-trajectory skip, engine.findTrajectory throw is caught
(Promise.allSettled defense), TRAJECTORY_INJECTED warning count.
- Existing test/think-pipeline.serial.test.ts re-asserted unchanged
(10 cases — calibration mode parity, gather, sanitization,
cite-render all intact).
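The intent and entity-extraction behaviour these tests pin (KU precedence, defensive non-string input, stop-word boundaries, leading-verb stripping, dedup, the 5-candidate cap) can be sketched as two small pure functions. The keyword regexes, word lists, and function names below are hypothetical; only the precedence rule, the slug prefixes, and the cap come from the description.

```typescript
type Intent = "temporal" | "knowledge_update" | "other";

// Hypothetical patterns; the shipped set lives in src/core/think/intent.ts.
const KU_RE = /\b(current(ly)?|now|still|latest|update[sd]?|change[sd]?)\b/i;
const TEMPORAL_RE = /\b(when|last|first|before|after|ago|since|how long)\b/i;

function classifyIntentSketch(question: unknown): Intent {
  if (typeof question !== "string") return "other"; // defensive: non-string input
  if (KU_RE.test(question)) return "knowledge_update"; // KU wins when both match
  if (TEMPORAL_RE.test(question)) return "temporal";
  return "other"; // fast path: caller skips all trajectory SQL
}

const ENTITY_PREFIXES = ["people/", "companies/", "organizations/", "deals/"];
const STOP_WORDS = new Set(["when", "did", "i", "last", "at", "the", "a", "an", "my", "is", "was", "what", "how", "to", "of", "in", "on", "with"]);
const LEADING_VERBS = new Set(["meet", "visit", "see", "call"]);
const MAX_CANDIDATES = 5;

function extractCandidatesSketch(question: string, retrievedSlugs: string[]): string[] {
  const out: string[] = [];
  const push = (c: string): void => {
    if (c !== "" && !out.includes(c) && out.length < MAX_CANDIDATES) out.push(c);
  };
  // High precision: entity-shaped slugs from retrieval, e.g. people/marco-smith -> "marco smith".
  for (const slug of retrievedSlugs) {
    const prefix = ENTITY_PREFIXES.find((p) => slug.startsWith(p));
    if (prefix !== undefined) push(slug.slice(prefix.length).replace(/-/g, " "));
  }
  // Medium precision: runs of non-stop words; stop words act as phrase boundaries.
  let run: string[] = [];
  const flush = (): void => {
    while (run.length > 0 && LEADING_VERBS.has(run[0])) run.shift(); // "meet marco" -> "marco"
    push(run.join(" "));
    run = [];
  };
  for (const word of question.toLowerCase().match(/[a-z0-9']+/g) ?? []) {
    if (STOP_WORDS.has(word)) flush();
    else run.push(word);
  }
  flush();
  return out;
}
```

Because stop words only act as boundaries, "Blue Bottle" survives as one phrase while "I last meet Marco" reduces to "marco".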
72 tests pass across 7 impacted suites; bun run verify clean (17 pre-
checks). Defaulted on per CEO + Eng D1; kill switch via config.
Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(v0.40.2.0 Commit 3)
Populates the LongMemEval benchmark brain's facts table inline at
import time so Commit 4's intent routing has data to retrieve. Per the
CHANGELOG D1 decision, this is full-haystack preprocessing — disclosed
explicitly in the benchmark output's methodology_note field (Commit 4).
New module src/eval/longmemeval/extract.ts:
extractAndInsertClaims({engine, client, model, sessionSlug,
sessionId, sessionBody, sourceId, aliasMap})
- Hashes the session body (sha256) for cache lookup.
- Cache hit → reuses parsed claims (cuts a 3-iteration benchmark
run from $1.50 to $0.50 when sessions repeat across questions,
as they do in LongMemEval).
- Cache miss → one Haiku call. System prompt asks for
{entity, metric, value, unit, period, event_type, valid_from, text}[]
JSON. New parseExtractedJsonArray() helper does fence-strip + parse
(parseModelJSON from cross-modal-eval is shaped for scored objects,
not arrays — different parser needed here).
- Per-record validateClaim() drops malformed records (missing
entity, bad date) silently; the rest land in NewFact rows.
- Per-question AliasMap (Codex Problem 4 — semantics pinned):
"Marco" + "Marco Smith" + "marco" in the SAME question collapse
to one slug via first-mention-wins canonicalization. Across
questions, the harness creates a fresh map (no leak).
- Real-page-aware entity resolution via the v0.40.2.0
resolveEntitySlugWithSource (Commit 2). Slugify-fallback rows
still insert (we need the data); the resolution_source signal
is only consulted at trajectory retrieval time (Commit 4).
- Bulk insert via engine.insertFacts with the
`gbrain-allow-direct-insert` allow-list comment per the
check-system-of-record CI guard contract — benchmark brain is
ephemeral in-memory PGLite, no markdown source-of-truth applies.
- Fail-open posture: Haiku throw, malformed JSON, insert collision
all return inserted=0 without throwing. One bad session never
kills the per-question loop.
- getCacheStats() exposes hits/misses/size for the per-run stderr
telemetry Codex Problem 14 asked for (empirical hit-rate
reporting; the optimistic claim self-verifies).
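A minimal sketch of the cache-then-parse path, assuming `callModel` stands in for the Haiku call and returns raw text. The sha256 key via `node:crypto` matches the description; the fence-stripping regex, the validation rules, the reduced `Claim` field set, and all function names are illustrative assumptions, not the shipped `extract.ts`. Failed model calls are deliberately left uncached here so a later identical session can retry; the source does not say which policy the real module uses.

```typescript
import { createHash } from "node:crypto";

// Subset of the extracted fields (unit and period omitted for brevity).
interface Claim { entity: string; metric: string | null; value: number | null; event_type: string | null; valid_from: string; text: string }

// Strip an optional ```json fence, parse, and fail open to [] on anything but an array.
function parseJsonArraySketch(raw: string): unknown[] {
  const trimmed = raw.trim();
  const fenced = trimmed.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/i);
  try {
    const parsed: unknown = JSON.parse(fenced ? fenced[1] : trimmed);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Drop malformed records (missing entity, unparseable date) instead of throwing.
function validateClaimSketch(rec: unknown): Claim | null {
  if (typeof rec !== "object" || rec === null) return null;
  const { entity, valid_from, value, metric, event_type, text } = rec as Record<string, unknown>;
  if (typeof entity !== "string" || entity.trim() === "") return null;
  if (typeof valid_from !== "string" || Number.isNaN(Date.parse(valid_from))) return null;
  const str = (v: unknown): string | null => (typeof v === "string" ? v : null);
  return {
    entity: entity.trim(),
    metric: str(metric),
    value: typeof value === "number" ? value : null,
    event_type: str(event_type),
    valid_from,
    text: str(text) ?? "",
  };
}

const cache = new Map<string, Claim[]>();
const stats = { hits: 0, misses: 0 };

async function extractClaimsCached(body: string, callModel: (body: string) => Promise<string>): Promise<Claim[]> {
  const key = createHash("sha256").update(body).digest("hex");
  const cached = cache.get(key);
  if (cached !== undefined) {
    stats.hits++;
    return cached;
  }
  stats.misses++;
  let raw: string;
  try {
    raw = await callModel(body);
  } catch {
    return []; // fail open; not cached so a later run can retry
  }
  const claims = parseJsonArraySketch(raw)
    .map(validateClaimSketch)
    .filter((c): c is Claim => c !== null);
  cache.set(key, claims);
  return claims;
}

function getCacheStatsSketch(): { hits: number; misses: number; size: number } {
  return { ...stats, size: cache.size };
}
```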
Substrate plumbing (extends Commit 1):
- NewFact.event_type?: string | null added in engine.ts so the
extractor can pass event-shaped rows through to insertFacts.
- PGLite engine + Postgres engine insertFacts() now persist
event_type. Param-positional dispatch extended to 20/21 placeholders
(null-embedding vs embedding-present); tx.unsafe vector cast on
Postgres path unchanged.
Test coverage (test/longmemeval-extract.test.ts, 13 cases, hermetic):
- Happy path: typed-claim + event rows both insert with correct
kind (event_type='meeting' → kind='event'; claim_metric='mrr'
→ kind='fact').
- Alias map: per-session collapsing ("Marco" + "Marco Smith"),
cross-session persistence within one question, fresh map per
question (caller-clears semantics pinned).
- Content-hash cache: identical body → cache hit, only ONE Haiku
call across two sessions; different bodies miss; getCacheStats
reports hits/misses/size.
- Fail-open: malformed JSON, Haiku throw, empty array output,
invalid records (missing entity, bad date) — none crash; 0
inserted in each case.
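A minimal sketch of the per-question, first-mention-wins alias map, assuming mentions are matched on a normalized full name or, failing that, on their first token. The class and its matching rule are hypothetical; they reproduce the "Marco" / "Marco Smith" / "marco" collapse described above, but may merge more (or less) aggressively than the shipped alias map. A new instance per question is what prevents cross-question leaks.

```typescript
class AliasMapSketch {
  private readonly byName = new Map<string, string>();       // normalized mention -> slug
  private readonly byFirstToken = new Map<string, string>(); // first token -> slug

  canonicalize(mention: string): string | null {
    const name = mention.trim().toLowerCase().replace(/\s+/g, " ");
    if (name === "") return null;
    const first = name.split(" ")[0];
    // First mention wins: reuse whatever slug an earlier mention established.
    const slug =
      this.byName.get(name) ??
      this.byFirstToken.get(first) ??
      name.replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
    if (slug === "") return null;
    this.byName.set(name, slug);
    if (!this.byFirstToken.has(first)) this.byFirstToken.set(first, slug);
    return slug;
  }
}
```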
55 tests pass across 4 impacted suites; bun run verify clean.
Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ology disclosure (v0.40.2.0 Commit 4)
The final wiring: per-question intent classification + trajectory call
+ block splice into the answer-gen prompt. Plus the methodology
disclosure stamps that close out the Codex D1 contract.
New module src/eval/longmemeval/intent.ts:
classifyIntent(q): prefers q.question_type from the dataset
(LongMemEval ships labels like 'temporal-reasoning',
'knowledge-update', 'single-session-user') before falling back to
the SHARED regex set imported from src/core/think/intent.ts.
Single source of truth for the regex — think and longmemeval
cannot drift.
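A minimal sketch of dataset-label-first routing, assuming the six LongMemEval `question_type` labels and an injected regex fallback standing in for the shared classifier from `src/core/think/intent.ts`. Mapping the four labels that are neither temporal nor knowledge-update to `'other'` is an assumption. A `Map` is used so a label such as `"constructor"` cannot resolve to an `Object.prototype` member.

```typescript
type Intent = "temporal" | "knowledge_update" | "other";

// Assumed mapping for the six LongMemEval question types.
const LABEL_TO_INTENT = new Map<string, Intent>([
  ["temporal-reasoning", "temporal"],
  ["knowledge-update", "knowledge_update"],
  ["single-session-user", "other"],
  ["single-session-assistant", "other"],
  ["single-session-preference", "other"],
  ["multi-session", "other"],
]);

interface LongMemEvalQuestion { question: string; question_type?: string }

function classifyLongMemEvalIntent(q: LongMemEvalQuestion, regexFallback: (text: string) => Intent): Intent {
  // Dataset label trumps question text; unknown or missing labels fall through.
  const fromLabel = q.question_type !== undefined ? LABEL_TO_INTENT.get(q.question_type) : undefined;
  return fromLabel ?? regexFallback(q.question);
}
```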
Harness wiring in src/commands/eval-longmemeval.ts:
- runEvalLongMemEval() spawns an extractor model via resolveModel
(tier:'utility' → haiku) when trajectory routing is enabled.
Calls resetExtractorState() once per benchmark run so the
content-hash cache + counters start clean.
- runOneQuestion() creates a FRESH per-question AliasMap (Codex
Problem 4 — first-mention-wins canonicalization stays scoped to
one question, never leaks across).
- Per session: after importFromContent lands, extractAndInsertClaims
populates the facts table. Fail-open if the Haiku call errors;
next session keeps going.
- After hybridSearch returns: classifyIntent(q) routes
temporal/knowledge_update through extractCandidateEntities (the
SHARED helper from Commit 2's think/entity-extract) → per-candidate
findTrajectory with 5s Promise.race timeout → formatTrajectoryBlock.
First candidate with a non-empty trajectory wins.
- generateAnswer() splices the trajectory block BEFORE the
Retrieved sessions block. Empty block (no entity match / no
points) → no "Known trajectory:" header (so the model is not cued
that a lookup was attempted).
- JSON envelope gains 5 fields per question when trajectory routing
is on: intent, trajectory_points, entity_resolved,
resolution_source, methodology_note. methodology_note also
written to stderr at run completion.
Resolution-source gate DIVERGES from think (intentional):
In the think production path, fallback_slugify results are skipped
because querying invented slugs wastes SQL — production brains have
canonical pages. In the LongMemEval benchmark, there ARE no
canonical pages; both the extractor and the lookup go through
slugify-fallback on the same free-form name, so they cohere on the
same slug. Applying the think-path gate here would permanently
block trajectory injection on the benchmark. Comment in
runOneQuestion documents the divergence.
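The benchmark path tries candidates one at a time and stops at the first non-empty trajectory. Unlike the think path, it does not drop slugify-fallback resolutions before querying. A minimal sketch of that loop, with `find` standing in for `findTrajectory`; the function name and the per-candidate timeout wiring are hypothetical.

```typescript
async function firstNonEmptyTrajectory<P>(
  slugs: string[],
  find: (slug: string) => Promise<P[]>,
  timeoutMs = 5_000,
): Promise<{ slug: string; points: P[] } | null> {
  for (const slug of slugs) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`findTrajectory(${slug}) timed out`)), timeoutMs);
      });
      const points = await Promise.race([find(slug), timeout]);
      if (points.length > 0) return { slug, points }; // first hit wins; later candidates are never queried
    } catch {
      // Fail open: a throw or timeout just moves on to the next candidate.
    } finally {
      clearTimeout(timer);
    }
  }
  return null;
}
```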
New CLI flag --no-trajectory:
Bypasses BOTH the Haiku extractor AND the per-question intent
routing. Used by the measurement protocol to baseline default-on vs
no-trajectory across 3 seeds per condition with paired-bootstrap
CI. Documented in the help text.
New RunOpts fields:
- extractorClient?: ThinkLLMClient — separate stub from the
answer-gen client so tests can isolate the two surfaces.
- extractorModel?: string — model override for the Haiku call.
methodology_note = 'extractor=haiku-preprocess-full-haystack-v1'
stamped on:
- Every per-question JSON envelope row.
- Stderr summary at run completion.
This is the Codex D1 contract: the temporal-reasoning delta we
publish is "gbrain + Haiku-preprocess pipeline" vs "gbrain alone",
not directly comparable to LongMemEval's published baselines
without that disclosure.
Extractor cache hit-rate stderr summary (Codex Problem 14):
'[longmemeval] extractor.cache_hits: 412 / 489 sessions (84.2%,
cached_bodies=412)' — empirical verification of the optimistic
hit-rate claim. The optimistic number self-verifies per run.
Test coverage (all hermetic, no API keys):
- test/longmemeval-intent.test.ts (9 cases) — dataset
question_type → Intent mapping for all six LongMemEval labels;
dataset label trumps question-text signal; unknown labels fall
through to the regex classifier.
- test/longmemeval-trajectory-routing.test.ts (4 cases) —
end-to-end through runEvalLongMemEval with both clients stubbed:
trajectory block lands in answer-gen prompt for temporal
intent + absent for 'other'; --no-trajectory bypasses extractor
AND injection AND omits envelope fields; methodology_note
stamped on every routed row; perf gate preserved (< 10s for
2-question fixture).
118 tests pass across 11 impacted suites; bun run verify clean.
Wave complete. CHANGELOG draft + measurement plan live in the plan
file. v0.40.2.0 ready for /ship after a real-LLM spot-check run.
Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Master shipped v0.38.0.0 (ingestion-cathedral wave, #1275), which claimed migration slot v81 with `pages_provenance_columns`. The v0.40.2.0 trajectory-routing wave's `facts_event_type_column` migration is renumbered to v82. Engine + test code that reference the new migration are updated to v82. Master's v81 tests (`pages_provenance_columns`) remain intact and test master's migration unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.40.2.0 trajectory routing wave: gbrain think now grounds answers about temporal/knowledge-update questions in the typed-claim timeline the brain has been quietly building via the extract_facts cycle phase. Default ON; flip think.trajectory_enabled=false to opt out. LongMemEval-side wiring lands the same plumbing in the benchmark harness with explicit methodology disclosure (extractor=haiku-preprocess-full-haystack-v1) in the JSON envelope and stderr summary. The published temporal-reasoning number is "gbrain + Haiku-preprocess" vs "gbrain alone", not directly comparable to LongMemEval's published baselines without that disclosure.
Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md
3 review passes: CEO + ENG + CODEX all CLEARED.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md, README.md, AGENTS.md extended with the v0.40.2.0 trajectory routing surface: gbrain think integration (default ON via think.trajectory_enabled config key), facts.event_type schema column + TrajectoryPoint.event_type + TrajectoryOpts.kind filter, shared formatTrajectoryBlock helper in src/core/trajectory-format.ts, LongMemEval extractor + intent routing + methodology disclosure, migration v82. llms-full.txt regenerated to match CLAUDE.md edits (CI test/build-llms gate).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
588d720 to
1f5fc17
… v82-v85)
Master shipped v0.38.1.0 (provider-agnostic subagent loop, #1289), which claimed migration slots v82-v85:
- v82: subagent_tool_executions_stable_id
- v83: mcp_spend_reservations
- v84: oauth_clients_budget_usd_per_day
- v85: oauth_clients_agent_binding
The v0.40.2.0 trajectory-routing wave's `facts_event_type_column` migration is renumbered to v86. Engine + test + CLAUDE.md references updated. CHANGELOG reconstructed: v0.40.2.0 entry kept at the top (our entry), master's v0.38.1.0 entry inserted below, both intact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit of the trajectory-routing wave's test surface vs the shipped
code surfaced 7 gaps. All filled, all green. Total: 343 tests across
17 impacted suites (was 272 pre-fill).
Gap 1 — Migration v86 structural tests (11 new in test/migrate.test.ts):
- v86 entry exists with documented name + idempotent
- exactly one event_type column add to facts
- IF NOT EXISTS guard
- column is nullable (no NOT NULL, no DEFAULT regression guard)
- does NOT create any index (event_type is selectivity-poor)
- does NOT touch any other table (blast-radius pin)
- does NOT carry a sqlFor override (engine-shared SQL contract)
- PGLite round-trip: column exists with right type + nullable
- event_type INSERT/SELECT round-trip
- NULL round-trip for legacy + metric-only rows
- LATEST_VERSION >= 86 contract pin
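Several of these structural pins reduce to pattern checks on the migration's SQL text. A minimal sketch, assuming the migration is a single SQL string; the function name and regexes are illustrative, not the assertions in `test/migrate.test.ts`.

```typescript
// Hypothetical structural checks on the facts_event_type_column migration SQL.
function checkEventTypeMigration(sql: string): string[] {
  const problems: string[] = [];
  const guardedAdds = sql.match(/ALTER\s+TABLE\s+facts\s+ADD\s+COLUMN\s+IF\s+NOT\s+EXISTS\s+event_type\b/gi) ?? [];
  if (guardedAdds.length !== 1) problems.push(`expected exactly one guarded event_type add, found ${guardedAdds.length}`);
  // "IF NOT EXISTS" does not match here; only NOT NULL or DEFAULT do.
  if (/\bNOT\s+NULL\b|\bDEFAULT\b/i.test(sql)) problems.push("column must stay nullable with no DEFAULT");
  if (/\bCREATE\s+(?:UNIQUE\s+)?INDEX\b/i.test(sql)) problems.push("must not create an index (event_type is selectivity-poor)");
  // Blast-radius pin: every ALTER TABLE must target facts.
  const tables = Array.from(sql.matchAll(/ALTER\s+TABLE\s+(\w+)/gi), (m) => m[1].toLowerCase());
  if (tables.some((t) => t !== "facts")) problems.push("must not touch tables other than facts");
  return problems;
}
```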
Gap 2 — resolveEntitySlugWithSource branch coverage (12 new in
test/entity-resolve.test.ts):
- exact_page branch (full slug, slug-shape match)
- fuzzy_match branch (Title-cased display name, bare first name via
prefix expansion)
- fallback_slugify branch (unseeded name, multi-word non-match
phrase, accented input)
- null tail (empty + whitespace)
- back-compat parity with resolveEntitySlug for both exact_page and
fallback_slugify branches
Gap 3 — INJECTION_PATTERNS dedicated coverage for new entries (18 new
in test/think-sanitize-trajectory.test.ts):
- close-trajectory entry registered + matches canonical and
whitespace/case variations
- open-trajectory entry registered + matches both no-attr and
with-attrs forms
- xml-attr-inject strips entity=/metric=/event_type=/kind=
- does NOT strip non-trajectory attribute names (class/id/title)
- combined multi-vector attack: all three patterns fire
- formatTrajectoryBlock end-to-end with adversarial extractor text:
one live </trajectory> (the wrapper, not the injection); one
entity= attribute (the wrapper, not the injection)
- pattern ordering invariant: new entries land after close-take
Gap 4 — runThink calibration-mode placement contract (3 new in
test/think-trajectory-injection.test.ts):
- default mode: question → pages → takes → trajectory → instruction
- calibration mode: pages → takes → calibration → trajectory →
question → instruction (Codex P6 — no third ordering invented)
- empty trajectory in calibration mode preserves the existing
calibration shape (no false-positive cue)
Gap 5 — runThink resolution_source != fallback_slugify gate (1 new
in test/think-trajectory-injection.test.ts):
- candidate that only matches via fallback_slugify is NOT queried
(think-path divergence from longmemeval-path which accepts it)
Gap 6 — E2E for runThink trajectory injection (7 new in
test/e2e/think-trajectory-pglite.test.ts):
- full pipeline lands <trajectory> block in answer-gen prompt
- knowledge_update intent annotates value-change rows with
(superseded prior)
- 'other' intent short-circuits (no block, no SQL)
- think.trajectory_enabled=false config bypasses entire path
- empty brain → graceful no-op (no crash, no block)
- multi-entity deterministic ordering
- adversarial </trajectory> in seeded fact text is escaped before
reaching the LLM (end-to-end sanitization gate)
Gap 7 — longmemeval extractor stress + persistence pins (6 new in
test/longmemeval-extract.test.ts):
- alias map cross-session stress with 12 sessions in one question;
all 12 rows collapse under ONE entity_slug
- different entities stay separate across many sessions
- embedding + embedded_at both NULL on benchmark-inserted rows
(regression guard against accidental embed-on-write)
- row_num sequential + source_markdown_slug stamped per session
(v0.32.2 partial UNIQUE index contract)
- source field stamped "longmemeval:extractor" (audit-tag pin)
- cache key invariance: same body hash hits cache across different
sessionId/slug
bun run verify clean (17 pre-checks). No regressions in any of the
14 non-new impacted suites.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… v86)
Master shipped v0.38.2.0 (#1297 doctor frontmatter scan) and v0.39.0.0 (#1283 brainstorm cost cathedral). v0.39.0.0 claimed migration v86 with `page_links_view_alias`. The v0.40.2.0 trajectory-routing wave's `facts_event_type_column` migration renumbers v86 → v87. All references updated in:
- src/core/migrate.ts: migration entry now v87; renumber comment notes the full v81→v82→v86→v87 history across three master merges.
- src/core/engine.ts, src/core/pglite-engine.ts, src/core/postgres-engine.ts: inline comments bumped to v87.
- test/migrate.test.ts: my describe blocks (11 structural + 4 round-trip cases) bumped to v87. LATEST_VERSION assertion bumped to >= 87.
- CLAUDE.md: v0.40.2.0 entry mentions v87. Master's v0.39.0.0 references to v86 (page_links_view_alias) preserved intact.
- CHANGELOG: reconstructed cleanly; v0.40.2.0 entry at top with v87 reference, master's v0.39.0.0 + v0.38.2.0 + v0.38.1.0 entries inserted in order below.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schema-bootstrap-coverage CI guard (test/schema-bootstrap-coverage.test.ts)
enforces that every ALTER TABLE ADD COLUMN in MIGRATIONS is covered by
applyForwardReferenceBootstrap OR by PGLITE_SCHEMA_SQL's CREATE TABLE
bodies OR by COLUMN_EXEMPTIONS.
v0.40.2.0's migration v87 adds facts.event_type but deliberately ships
without a bootstrap probe because:
- No CREATE INDEX in PGLITE_SCHEMA_SQL references event_type
- No FK references event_type
- All existing callers (founder-scorecard, eval-trajectory, gbrain
think trajectory injection) defensively skip NULL-metric rows in
per-metric math, so event_type=NULL on pre-v87 brains is invisible
- Pre-v87 brains land event_type=NULL via the migration ALTER
Exactly mirrors the precedent set by facts.claim_metric / claim_value /
claim_unit / claim_period exemptions (v67 typed-claim columns) which
are exempted for the same structural reason: column-only migration,
no forward-reference index, no downstream filter breaks on old brains.
Adding facts.event_type to COLUMN_EXEMPTIONS with a brief rationale
comment matching the existing v0.35.6 entry shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… v87+v88)
Master shipped v0.39.1.0 (#1248 schema packs: bring your own shape), which claimed migration slots v87 + v88:
- v87: takes_kind_drop_check
- v88: eval_candidates_schema_pack_per_source
The v0.40.2.0 trajectory-routing wave's `facts_event_type_column` migration renumbers v87 → v89. All references updated in:
- src/core/migrate.ts: migration entry now v89; renumber comment notes the full v81→v82→v86→v87→v89 history across four master merges.
- src/core/engine.ts, src/core/pglite-engine.ts, src/core/postgres-engine.ts: inline comments bumped to v89.
- test/migrate.test.ts: my describe blocks (11 structural + 4 round-trip cases) bumped to v89. LATEST_VERSION assertion bumped to >= 89.
- test/schema-bootstrap-coverage.test.ts: COLUMN_EXEMPTIONS entry comment bumped to v89. Master's v0.39.1.0 also added eval_candidates.schema_pack_per_source to the exemption list; both kept (file has no conflicts after stitching).
- CLAUDE.md: v0.40.2.0 entry mentions v89. Master's v0.39 references to v86 (page_links_view_alias) preserved intact.
- CHANGELOG.md: my v0.40.2.0 entry's Substrate header bumped to v89.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nflict, CHANGELOG stitched)
Master shipped agent-voice (v0.40.0.0), the capture-fix wave (v0.39.3.0), and autopilot per-source fan-out (v0.39.2.0) while v0.40.2.0 trajectory-routing was in review. No migration version collision: v89 facts_event_type_column slots cleanly past master's v88 eval_candidates_schema_pack_per_source. VERSION/package.json kept at 0.40.2.0. CHANGELOG.md stitched with v0.40.2.0 on top, then master's three new entries in date order, then existing entries from v0.39.0.0 down.
Post-merge audit: VERSION/package.json/top-CHANGELOG all agree on 0.40.2.0. typecheck clean. 243 wave-impacted tests pass. llms-full.txt regenerated for the merged CLAUDE.md.
Master shipped v0.40.1.0 Track D: eval infrastructure (hermetic qrels gate + nightly cross-modal probe + --by-type/--by-type-floor on longmemeval). Conflicts in CHANGELOG.md, CLAUDE.md, VERSION, package.json, llms-full.txt, and src/commands/eval-longmemeval.ts.
- VERSION/package.json kept at 0.40.2.0.
- CHANGELOG.md stitched with v0.40.2.0 on top, then v0.40.1.0, then existing entries.
- CLAUDE.md merged: kept both master's new entries (eval-replay-gate + nightly-quality-probe) AND the v0.40.2.0 trajectory entries, and merged both the v0.40.1.0 Track D and v0.40.2.0 extensions onto the same eval-longmemeval annotation.
- src/commands/eval-longmemeval.ts: combined both ParsedArgs additions (noTrajectory + byType + byTypeFloor), both CLI parser branches, both help-text blocks, and both terminal-output branches (extractor cache hit-rate + by-type summary emission).
Post-merge audit: VERSION/package.json/top-CHANGELOG all agree on 0.40.2.0. typecheck clean. 65 longmemeval-impacted tests pass (--by-type + trajectory routing co-exist). llms-full.txt regenerated.
mgunnin added a commit to mgunnin/gbrain that referenced this pull request on May 28, 2026
* upstream/master: (22 commits)
v0.41.4.0 wave: local providers + cross-platform stdin + gateway-routed dream judge (6 community PRs) (garrytan#1377)
v0.41.3.0 fix(security/mcp): OAuth CORS lockdown + pre-register without DCR + validator surface (garrytan#1403)
v0.41.2.0 feat: lens packs + epistemology unification — atoms + concepts as first-class units, calibration profile widening, gstack-learnings bridge (garrytan#1364)
v0.41.1.0 feat: eval-loop wave — gbrain bench publish + gbrain eval gate close the LOOP (garrytan#1352)
v0.41.0.0 feat(minions): fleet you supervise (4 field bugs + cathedral) (garrytan#1367)
v0.40.10.0 feat: content sanity defense — junk-pattern throw + oversize-skip-embed (garrytan#1351)
v0.40.9.0 feat(chunker): .sql indexing via tree-sitter + code-def on SQL DDL (garrytan#1173) (garrytan#1350)
v0.40.8.1 docs: README rewrite + personal-brain + company-brain tutorials (garrytan#1345)
v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes (garrytan#1313)
v0.40.6.1 docs(todos): file v0.41 wave commitments + 7 verified-missing items (garrytan#1333)
v0.40.7.0 Schema Cathedral v3 — agent-on-ramp + production rebuild of PR garrytan#1321 (garrytan#1327)
v0.40.6.0 feat(sync): parallel sync --all + per-source lock invariant + sources status dashboard (productionized from PR garrytan#1314) (garrytan#1324)
v0.40.5.0 Federated Sync v2 — parallel source sync + push triggers + per-source health (garrytan#1322)
v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification (garrytan#1300)
v0.40.3.0 feat: contextual retrieval + cache invalidation gate + 4 deferred-item closures (garrytan#1323)
v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) (garrytan#1296)
v0.40.1.0 Track D — eval infrastructure (catch retrieval regressions, prove answer-quality wins) (garrytan#1298)
v0.40.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm (garrytan#1128)
v0.39.3.0: productionize the v0.38 ingestion cathedral (smoke-test fix wave from PR garrytan#1299) (garrytan#1308)
v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy (garrytan#1295)
...
Summary
v0.40.2.0 trajectory routing wave: closes the gap between gbrain's
typed-claim substrate (shipped v0.35.7, currently dormant at answer-gen
time) and the production `gbrain think` surface, which should be grounding
temporal/knowledge-update answers with it.
Six commits, ~3.1K LOC, 81 new tests:
- feat(facts) (Commit 1): Substrate. Migration v82 (renumbered to v89 by the later master merges) adds nullable `facts.event_type`. `TrajectoryPoint.event_type` + `TrajectoryOpts.kind` filter. New shared `src/core/trajectory-format.ts` consumed by both think and longmemeval (no DRY violation). `INJECTION_PATTERNS` extended to escape `</trajectory>` adversarial sequences. Founder-scorecard + eval-trajectory pass `kind: 'metric'` explicitly for clarity (no behavior change: they already skipped NULL-metric rows).
- feat(think) (Commit 2): `gbrain think` trajectory integration, default ON. New `src/core/think/intent.ts` (regex-first classifier, no LLM call on the `'other'` fast path). New `src/core/think/entity-extract.ts` shared with longmemeval. Per-candidate `findTrajectory` with 5s Promise.race timeout + concurrency cap 3. MCP think op handler extracts `sourceScopeOpts(ctx)` for federated-read OAuth client scoping. `think.trajectory_enabled=false` is the kill switch.
- feat(longmemeval) (Commit 3): Inline Haiku claim extractor. Content-hash cache (cuts a 3-iteration benchmark run from ~$1.50 to ~$0.50 when sessions repeat across questions). Per-question alias map collapses "Marco" + "Marco Smith" + "marco" to one slug; fresh map per question, no cross-question leak. Fail-open on malformed JSON / Haiku throw / insert collision.
- feat(longmemeval) (Commit 4): Intent routing + prompt splice + methodology disclosure. Per-question `classifyIntent` prefers the dataset's `question_type` field and falls back to the shared regex set from think (single source of truth). `--no-trajectory` CLI flag for A/B baselining. JSON envelope adds `intent`, `trajectory_points`, `entity_resolved`, `resolution_source`, and `methodology_note: "extractor=haiku-preprocess-full-haystack-v1"` per the Codex D1 disclosure contract: the published temporal-reasoning number is "gbrain + Haiku-preprocess pipeline" vs "gbrain alone", NOT directly comparable to LongMemEval's published baselines without this note.
- merge master + chore: bump: master jumped to v0.38.0.0 (ingestion cathedral: gbrain capture + write-through + IngestionSource contract, #1275) mid-wave; my v81 `facts_event_type_column` migration was renumbered to v82. Engine + test code updated to v82. Master's v81 (`pages_provenance_columns`) tests remain intact.
- docs: CLAUDE.md, README.md, AGENTS.md updated for the new surface. llms-full.txt regenerated.
Test Coverage
- test/trajectory-format.test.ts (grouping, caps, sanitization, supersession annotation, determinism, provenance, text-cap, adversarial `</trajectory>` escape)
- test/engine-parity-event-type.test.ts
- test/regressions/v0_40_2_0-trajectory-backcompat.test.ts
- test/think-intent.test.ts
- test/think-entity-extract.test.ts
- test/think-trajectory-injection.test.ts
- test/longmemeval-extract.test.ts
- test/longmemeval-intent.test.ts
- test/longmemeval-trajectory-routing.test.ts
272 tests pass across 14 impacted suites; `bun run verify` clean. No regressions.
Coverage gate: every changed codepath has a corresponding test. Fail-open
paths exercised (engine throw, Haiku throw, malformed JSON, empty array,
invalid records).
Pre-Landing Review
3 review passes completed during planning, all CLEARED:
- `buildThinkUserMessage` injection point, no back-compat filter needed (callers already defensively skip NULL-metric rows). D1 decided default-ON rollout.
- `resolveEntitySlugWithSource` resolution_source signal, prompt placement preserving BOTH calibration and default ordering, INJECTION_PATTERNS for `</trajectory>`, 5s findTrajectory + 10s extractor timeouts, doctor check deferred to v0.40.3+, real-LLM spot-check added, success metric broadened). The benchmark methodology contamination was the load-bearing decision, accepted with explicit CHANGELOG + JSON-envelope disclosure.
Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md
Documentation
- `facts.event_type` column + `TrajectoryOpts.kind` filter; documented the new shared `src/core/trajectory-format.ts` helper; extended the `src/core/think/index.ts` entry with v0.40.2.0 trajectory injection (default ON, `think.trajectory_enabled` config key, `sourceScopeOpts` threading, `GBRAIN_THINK_DEBUG` env); extended the `src/commands/eval-longmemeval.ts` entry with the inline Haiku extractor + intent routing + methodology disclosure note.
- `gbrain think` now grounds temporal/knowledge-update answers in the typed-claim timeline by default, with the opt-out config key and the LongMemEval methodology note disclosed.
- `gbrain think` now uses this substrate automatically, plus the `kind: 'event' | 'all'` filter for non-metric event rows.
Test plan
Open follow-ups for v0.40.3+
- `trajectory_health` doctor check (deferred per Codex P16: premature on a column that's mostly NULL in production until users populate `event_type` via cycle phase).
- `gbrain auto-think`, `gbrain dream synthesize`, calibration recall-footer.
- `event_type TEXT` alone is impoverished for things like "moved to SF" → needs object/actor/location).
- `extract_facts` cycle phase event extraction (so production users get event rows in their `facts` table without manual seeding).
🤖 Generated with Claude Code