
v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) #1296

Merged: garrytan merged 14 commits into master from garrytan/v0.40.2.0-trajectory-routing on May 23, 2026

garrytan (Owner) commented May 22, 2026

Summary

v0.40.2.0 trajectory routing wave — closes the gap between gbrain's
typed-claim substrate (shipped v0.35.7, currently dormant at answer-gen
time) and the production gbrain think surface that should be grounding
temporal/knowledge-update answers with it.

Six commits, ~3.1K LOC, 81 new tests:

  • feat(facts) (Commit 1): Substrate. Migration v82 adds nullable
    facts.event_type. TrajectoryPoint.event_type + TrajectoryOpts.kind
    filter. New shared src/core/trajectory-format.ts consumed by both
    think and longmemeval (no DRY violation). INJECTION_PATTERNS
    extended to escape </trajectory> adversarial sequences. Founder-
    scorecard + eval-trajectory pass kind: 'metric' explicitly for
    clarity (no behavior change — they already skipped NULL-metric rows).

  • feat(think) (Commit 2): gbrain think trajectory integration,
    default ON. New src/core/think/intent.ts (regex-first classifier,
    no LLM call on the 'other' fast path). New
    src/core/think/entity-extract.ts shared with longmemeval. Per-
    candidate findTrajectory with 5s Promise.race timeout + concurrency
    cap 3. MCP think op handler extracts sourceScopeOpts(ctx) for
    federated-read OAuth client scoping. think.trajectory_enabled=false
    is the kill switch.

  • feat(longmemeval) (Commit 3): Inline Haiku claim extractor.
    Content-hash cache (cuts 3-iteration benchmark run from ~$1.50 to
    ~$0.50 when sessions repeat across questions). Per-question alias
    map collapses "Marco" + "Marco Smith" + "marco" to one slug; fresh
    map per question, no cross-question leak. Fail-open on malformed
    JSON / Haiku throw / insert collision.

  • feat(longmemeval) (Commit 4): Intent routing + prompt splice +
    methodology disclosure. Per-question classifyIntent prefers
    dataset's question_type field, falls back to the shared regex set
    from think (single source of truth). --no-trajectory CLI flag for
    A/B baselining. JSON envelope adds intent, trajectory_points,
    entity_resolved, resolution_source, methodology_note: "extractor=haiku-preprocess-full-haystack-v1" per the Codex D1
    disclosure contract — the published temporal-reasoning number is
    "gbrain + Haiku-preprocess pipeline" vs "gbrain alone", NOT
    directly comparable to LongMemEval's published baselines without
    this note.

  • merge master + chore: bump: master jumped to v0.38.0.0
    (ingestion cathedral, #1275) mid-wave; my v81 facts_event_type_column
    migration renumbered to v82. Engine + test code updated to v82.
    Master's v81 (pages_provenance_columns) tests remain intact.

  • docs: CLAUDE.md, README.md, AGENTS.md updated for the new
    surface. llms-full.txt regenerated.

Test Coverage

| Test file | Cases |
|---|---|
| test/trajectory-format.test.ts | 17 (grouping, caps, supersession annotation, adversarial `</trajectory>` escape) |
| test/engine-parity-event-type.test.ts | 6 (PGLite round-trip + kind filter matrix) |
| test/regressions/v0_40_2_0-trajectory-backcompat.test.ts | 4 (byte-identical founder-scorecard math with/without event rows) |
| test/think-intent.test.ts | 14 (temporal/KU/other, precedence, non-string defense) |
| test/think-entity-extract.test.ts | 10 (retrieved-slug + noun-phrase sources, stop-word + leading-verb stripping, dedup) |
| test/think-trajectory-injection.test.ts | 7 (intent routing, kill switches, empty-trajectory skip, throw caught by Promise.allSettled) |
| test/longmemeval-extract.test.ts | 13 (JSON repair, alias map per-question scope, content-hash cache, fail-open) |
| test/longmemeval-intent.test.ts | 9 (dataset label → Intent mapping for all 6 LongMemEval labels) |
| test/longmemeval-trajectory-routing.test.ts | 4 (end-to-end stubbed, methodology_note presence, perf gate) |

272 tests pass across 14 impacted suites; `bun run verify` clean. No regressions.

Coverage gate: every changed codepath has a corresponding test. Fail-open
paths exercised (engine throw, Haiku throw, malformed JSON, empty array,
invalid records).

Pre-Landing Review

3 review passes completed during planning, all CLEARED:

  • CEO Review: premise reframed (typed-claim shape vs event-chronology shape distinction); single-PR bisect-commit slicing per the "bisect commits over PR splits" rule.
  • Eng Review: verified migration slot, buildThinkUserMessage injection point, no back-compat filter needed (callers already defensively skip NULL-metric rows). D1 decided default-ON rollout.
  • Codex Outside Voice: 18 findings; 6 load-bearing folded as design decisions (alias-map wording fix, resolveEntitySlugWithSource resolution_source signal, prompt-placement preserving BOTH calibration and default ordering, INJECTION_PATTERNS for </trajectory>, 5s findTrajectory + 10s extractor timeouts, doctor check deferred to v0.40.3+, real-LLM spot-check added, success metric broadened). The benchmark methodology contamination was the load-bearing decision — accepted with explicit CHANGELOG + JSON-envelope disclosure.

Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md

Documentation

  • CLAUDE.md: extended the v0.35.7 trajectory entry with migration v82 + facts.event_type column + TrajectoryOpts.kind filter; documented the new shared src/core/trajectory-format.ts helper; extended src/core/think/index.ts entry with v0.40.2.0 trajectory injection (default ON, think.trajectory_enabled config key, sourceScopeOpts threading, GBRAIN_THINK_DEBUG env); extended src/commands/eval-longmemeval.ts entry with the inline Haiku extractor + intent routing + methodology disclosure note.
  • README.md: added a v0.40.2.0 banner explaining gbrain think now grounds temporal/knowledge-update answers in the typed-claim timeline by default, with the opt-out config key and the LongMemEval methodology note disclosed.
  • AGENTS.md: extended the v0.35.7 trajectory bullet to note gbrain think now uses this substrate automatically + the kind: 'event' | 'all' filter for non-metric event rows.
  • llms-full.txt: regenerated via `bun run build:llms` to match CLAUDE.md edits (CI `test/build-llms.test.ts` gate).

Test plan

  • All 272 trajectory + think + longmemeval + impacted-suite tests pass
  • `bun run verify` clean (17 pre-checks)
  • Migration v82 applies cleanly on both PGLite + Postgres parity test
  • CHANGELOG + VERSION + package.json all agree at v0.40.2.0
  • llms-full.txt regenerated to match CLAUDE.md edits

Open follow-ups for v0.40.3+

  • trajectory_health doctor check (deferred per Codex P16 — premature on a column that's mostly NULL in production until users populate event_type via cycle phase).
  • Trajectory injection in gbrain auto-think, gbrain dream synthesize, calibration recall-footer.
  • Structured event fields (event_type TEXT alone is impoverished for things like "moved to SF" → needs object/actor/location).
  • Production extract_facts cycle phase event extraction (so production users get event rows in their facts table without manual seeding).
  • Real-LLM full LongMemEval run (3 seeds per condition, paired-bootstrap CI) — the methodology spec is in the plan file; the actual numbers come from the post-merge measurement run.

🤖 Generated with Claude Code

garrytan and others added 5 commits May 22, 2026 07:58
…2.0 Commit 1)

Substrate work for v0.40.2.0 Track B (trajectory routing for temporal +
knowledge_update). This commit lands the schema + the shared formatter;
think wiring + LongMemEval extractor + intent routing come in Commits 2-4.

Migration v81 (facts_event_type_column):
  ALTER TABLE facts ADD COLUMN event_type TEXT (nullable, metadata-only).
  Lets the v0.35.4 typed-claim substrate carry event-shaped rows
  (event_type='meeting'/'job_change'/'location_change') alongside the
  metric-shaped rows (claim_metric/claim_value etc) it has carried since
  v67. Temporal-reasoning questions ("when did I last meet Marco") need
  the event shape; the metric shape doesn't fit them.

Engine changes (pglite + postgres parity):
  - TrajectoryPoint.event_type: string | null added; projection in both
    findTrajectory SQL paths returns the column.
  - TrajectoryOpts.kind?: 'metric' | 'event' | 'all' added (default 'all').
    Defensive opt that future-proofs filtering once event rows accumulate.
  - Both engines apply the new kind filter at SQL level when set.
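
The kind filter's semantics can be illustrated in memory. This is only a sketch: the engines apply the filter at SQL level, and `TrajectoryPointLite`, `matchesKind`, and `filterByKind` are hypothetical names. It also assumes that a metric row is one with a non-null `metric` and an event row is one with a non-null `event_type`, which is inferred from the description above rather than confirmed.

```typescript
// Hypothetical, trimmed-down shape; the real TrajectoryPoint carries more fields.
type TrajectoryKind = "metric" | "event" | "all";

interface TrajectoryPointLite {
  metric: string | null;
  event_type: string | null;
  text: string;
}

// Assumption: 'metric' keeps rows with a non-null metric, 'event' keeps rows
// with a non-null event_type, and 'all' (the default) keeps every row.
function matchesKind(p: TrajectoryPointLite, kind: TrajectoryKind = "all"): boolean {
  if (kind === "metric") return p.metric !== null;
  if (kind === "event") return p.event_type !== null;
  return true;
}

function filterByKind(
  points: TrajectoryPointLite[],
  kind: TrajectoryKind = "all",
): TrajectoryPointLite[] {
  return points.filter((p) => matchesKind(p, kind));
}
```

Under these assumptions, a caller passing `kind: 'metric'` never sees an event-only row, which is consistent with the "no behavior change" claim for callers that already skip NULL-metric rows.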

Back-compat (codex outside-voice concern):
  Existing callers (founder-scorecard, eval-trajectory) already defensively
  skip metric === null rows in their per-metric math. Event-only rows
  (metric=NULL, event_type='meeting') ride through invisibly to those
  callers — verified by the new regression test that asserts byte-identical
  computeFounderScorecard + computeTrajectoryStats output with and without
  event rows in the input. Both callers now pass kind:'metric' explicitly
  for call-site clarity (no behavior change).

MCP find_trajectory op:
  - event_type added to the wire-shape map.
  - kind param added to the op declaration (enum metric/event/all).

Shared formatter (src/core/trajectory-format.ts, new):
  formatTrajectoryBlock(points, entitySlug, opts) — sibling shape to
  renderTakesBlock + renderChatBlock. Groups by (metric ?? event_type).
  Per-metric cap 20, total cap 100 (prompt-budget guardrail). For
  knowledge_update intent, annotates value-change rows with
  "(superseded prior)" — the explicit signal codex flagged was missing
  from default RRF-ordered retrieval. Promoted to src/core/ so both
  gbrain think (Commit 2) and the LongMemEval harness (Commit 4)
  consume one source of truth.
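
A minimal sketch of the grouping, capping, and supersession annotation described above. The line layout, the choice to keep the most recent 20 rows per group, and the string comparison of values are all assumptions; only the grouping key, the two caps, the "(superseded prior)" marker, and the `<trajectory entity=...>` wrapper come from this PR. Text sanitization is deliberately left out.

```typescript
// Hypothetical, trimmed-down point shape for illustration only.
interface Point {
  metric: string | null;
  event_type: string | null;
  value: string | null;
  valid_from: string; // ISO date, so string order is chronological order
  text: string;
}

type Intent = "temporal" | "knowledge_update" | "other";

const PER_GROUP_CAP = 20;
const TOTAL_CAP = 100;

function formatTrajectorySketch(points: Point[], entitySlug: string, intent: Intent): string {
  // Group by (metric ?? event_type), preserving first-seen group order.
  const groups = new Map<string, Point[]>();
  for (const p of points) {
    const key = p.metric ?? p.event_type;
    if (key === null) continue; // a row with neither field has no group
    const bucket = groups.get(key) ?? [];
    bucket.push(p);
    groups.set(key, bucket);
  }

  const lines: string[] = [];
  for (const [key, bucket] of groups) {
    const sorted = [...bucket].sort((a, b) => a.valid_from.localeCompare(b.valid_from));
    // Assumption: when a group exceeds the cap, keep its most recent rows.
    const capped = sorted.slice(-PER_GROUP_CAP);
    let prev: string | null = null;
    for (const p of capped) {
      if (lines.length >= TOTAL_CAP) break;
      const changed =
        intent === "knowledge_update" && prev !== null && p.value !== null && p.value !== prev;
      lines.push(`${p.valid_from} ${key}: ${p.text}${changed ? " (superseded prior)" : ""}`);
      if (p.value !== null) prev = p.value;
    }
  }

  // An empty result lets the caller skip the block entirely.
  if (lines.length === 0) return "";
  return `<trajectory entity="${entitySlug}">\n${lines.join("\n")}\n</trajectory>`;
}
```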

Prompt-injection coverage (codex Problem 10):
  src/core/think/sanitize.ts INJECTION_PATTERNS extended with three
  new entries — close-trajectory, open-trajectory, xml-attr-inject —
  so adversarial </trajectory> sequences in extracted text get
  escaped before reaching the model. Parity with the existing
  </take> coverage.

Tests (all hermetic, no DATABASE_URL):
  - test/trajectory-format.test.ts (17 cases, all green): grouping,
    caps, sanitization, supersession annotation, determinism,
    provenance, text-cap, adversarial </trajectory> escape.
  - test/engine-parity-event-type.test.ts (6 cases): PGLite round-trip
    of the column + kind filter matrix.
  - test/regressions/v0_40_2_0-trajectory-backcompat.test.ts (4 cases):
    pins the byte-identical-output contract that founder-scorecard's
    per-metric math ignores event rows.
  - test/migrate.test.ts: v81 round-trip verified via existing
    structural assertion harness.
  - 209 tests across 5 impacted suites pass; bun run verify clean
    (17 pre-checks including privacy, jsonb, type, fuzz purity).

Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md
GSTACK REVIEW REPORT: CEO + ENG + CODEX CLEARED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
….40.2.0 Commit 2)

Wires the v0.40.2.0 substrate (Commit 1's facts.event_type column +
formatTrajectoryBlock) into the production `gbrain think` surface.
Default ON; flip `think.trajectory_enabled=false` to opt out.

New pure modules (zero engine dependency):
  - src/core/think/intent.ts — classifyIntent(question): regex-first
    routing into 'temporal' | 'knowledge_update' | 'other'. KU wins over
    temporal when both match. The 'other' fast path short-circuits with
    zero SQL.
  - src/core/think/entity-extract.ts — extractCandidateEntities() pulls
    high-precision candidates from retrieval slugs (people/, companies/,
    organizations/, deals/) and medium-precision noun phrases from the
    question. Word-level tokenization + stop-word boundaries stitch
    "Blue Bottle" as one candidate while splitting "I last meet Marco"
    correctly. Leading-verb stripper drops "meet", "visit" etc so
    "marco" surfaces cleanly. Cap of 5 per question.

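The regex-first routing with knowledge-update precedence might look roughly like the sketch below. The pattern lists are invented for illustration and are not the patterns shipped in src/core/think/intent.ts; only the three intent labels, the KU-over-temporal precedence, and the non-string defense come from this commit.

```typescript
type Intent = "temporal" | "knowledge_update" | "other";

// Hypothetical pattern sets (assumption); none carry the /g flag, so
// RegExp.test stays stateless across calls.
const KU_PATTERNS: RegExp[] = [
  /\b(current|currently|latest|now|still)\b/i,
  /\bmost recent\b/i,
];
const TEMPORAL_PATTERNS: RegExp[] = [
  /\bwhen\b/i,
  /\bhow long (ago|since)\b/i,
  /\b(before|after)\b/i,
];

function classifyIntentSketch(question: unknown): Intent {
  // Defensive: anything that is not a string routes to the cheap path.
  if (typeof question !== "string") return "other";
  // KU is checked first so it wins when both sets match.
  if (KU_PATTERNS.some((re) => re.test(question))) return "knowledge_update";
  if (TEMPORAL_PATTERNS.some((re) => re.test(question))) return "temporal";
  return "other";
}
```

Because the function is pure and synchronous, the 'other' path costs a handful of regex tests and no SQL or LLM call.
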
Engine-touching wiring (src/core/think/index.ts):
  - RunThinkOpts gains 4 fields: withTrajectory (default true),
    sourceId, allowedSources, remote.
  - readThinkTrajectoryEnabled() reads the config kill switch; default
    true; survives missing config table on legacy brains.
  - Trajectory orchestration sits between gather and prompt assembly:
    intent classify → extract candidates → per-candidate
    resolveEntitySlugWithSource → skip fallback_slugify → 5s timeout
    Promise.race + 3-wide concurrency cap → formatTrajectoryBlock.
    Any error degrades to "no block" + TRAJECTORY_INJECTION_FAILED
    warning; the think call itself never crashes from trajectory.
  - On success, TRAJECTORY_INJECTED_<N>_POINTS warning records the
    count for downstream telemetry.
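
The timeout-plus-concurrency-cap pattern described above can be sketched as follows. All names here (`withTimeout`, `mapSettledWithLimit`, `gatherTrajectories`, the `FindTrajectory` stand-in) are hypothetical, and the real wiring in src/core/think/index.ts may be structured differently. Note that racing against a timer does not cancel the underlying call; it only stops waiting for it.

```typescript
// Reject if `work` does not settle within `ms`; the timer is always cleared.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Run `fn` over `items` with at most `limit` calls started at once.
// Never rejects: each slot records a fulfilled or rejected outcome.
async function mapSettledWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { status: "fulfilled", value: await fn(items[i]) };
      } catch (reason) {
        results[i] = { status: "rejected", reason };
      }
    }
  }
  const width = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: width }, () => worker()));
  return results;
}

// Hypothetical stand-in for engine.findTrajectory (assumption).
type FindTrajectory = (slug: string) => Promise<string[]>;

async function gatherTrajectories(
  slugs: string[],
  find: FindTrajectory,
  timeoutMs = 5000,
): Promise<string[]> {
  const settled = await mapSettledWithLimit(slugs, 3, (s) => withTimeout(find(s), timeoutMs));
  // Failures and timeouts degrade to "no points" instead of throwing.
  return settled.flatMap((r) => (r.status === "fulfilled" ? r.value : []));
}
```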

Prompt placement (src/core/think/prompt.ts) — Codex Problem 6 fix:
  buildThinkUserMessage's trajectoryBlock slot honors BOTH existing
  orderings — calibration mode inserts trajectory between calibration
  and question; default mode inserts between retrieval and the output
  instruction. NO third ordering is introduced. Empty trajectoryBlock
  skips the "Known trajectory:" header entirely, so the model gets no
  cue that a lookup was attempted.

Resolution-source signal (src/core/entities/resolve.ts) — Codex Problem 5:
  New companion resolveEntitySlugWithSource() returns
  {slug, source: 'exact_page' | 'fuzzy_match' | 'fallback_slugify'}
  so trajectory routing can skip fallback-only resolutions —
  querying findTrajectory on an invented slug always returns [] and
  wastes a SQL round-trip. The original resolveEntitySlug keeps its
  contract for pre-v0.40 callers.
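
A rough sketch of how a resolver can report where its answer came from, and how the think path can use that to skip invented slugs. The matching rules (exact slug, then leaf match or first-name prefix, then slugify) are assumptions made for illustration, and `knownSlugs` stands in for the pages table; the real resolveEntitySlugWithSource is async and engine-backed.

```typescript
type ResolutionSource = "exact_page" | "fuzzy_match" | "fallback_slugify";

interface Resolution {
  slug: string;
  source: ResolutionSource;
}

function slugify(name: string): string {
  return name.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Hypothetical resolver: `knownSlugs` is an in-memory stand-in for real pages.
function resolveWithSourceSketch(name: string, knownSlugs: string[]): Resolution | null {
  const trimmed = name.trim();
  if (trimmed === "") return null; // empty or whitespace-only input
  if (knownSlugs.includes(trimmed)) return { slug: trimmed, source: "exact_page" };

  const leaf = slugify(trimmed);
  const fuzzy = knownSlugs.find((s) => {
    const pageLeaf = s.split("/").pop() ?? s;
    // Assumption: a display name or a bare first name may match a page leaf.
    return pageLeaf === leaf || pageLeaf.startsWith(leaf + "-");
  });
  if (fuzzy !== undefined) return { slug: fuzzy, source: "fuzzy_match" };

  // Nothing matched a real page: the slug is invented from the name.
  return { slug: leaf, source: "fallback_slugify" };
}

// Think-path gate: only query trajectories for slugs backed by a real page.
function shouldQueryTrajectory(r: Resolution | null): boolean {
  return r !== null && r.source !== "fallback_slugify";
}
```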

MCP think op handler (src/core/operations.ts):
  Extracts sourceScopeOpts(ctx) into scalar sourceId + allowedSources
  + remote, threads through to runThink. CLI callers omit (engine
  default source, remote=false). Mirrors the same source-scope
  discipline applied to all other read paths in v0.34.1.0.

Sanitization (Commit 1 already extended INJECTION_PATTERNS for
</trajectory> — consumed here).

Test coverage (all hermetic, no DATABASE_URL, no API keys):
  - test/think-intent.test.ts (14 cases) — temporal, KU, other,
    precedence (KU wins when both match), defensive non-string inputs.
  - test/think-entity-extract.test.ts (10 cases) — retrieved-slug
    source, noun-phrase source, stop-word stripping, leading-verb
    stripping, dedup across sources, 5-candidate cap.
  - test/think-trajectory-injection.test.ts (7 cases against PGLite
    in-memory) — temporal intent injection happy path with superseded-
    prior annotation, "other" intent short-circuit, withTrajectory:
    false bypass, think.trajectory_enabled=false config bypass,
    empty-trajectory skip, engine.findTrajectory throw is caught
    (Promise.allSettled defense), TRAJECTORY_INJECTED warning count.
  - Existing test/think-pipeline.serial.test.ts re-asserted unchanged
    (10 cases — calibration mode parity, gather, sanitization,
    cite-render all intact).

72 tests pass across 7 impacted suites; bun run verify clean (17 pre-
checks). Defaulted on per CEO + Eng D1; kill switch via config.

Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(v0.40.2.0 Commit 3)

Populates the LongMemEval benchmark brain's facts table inline at
import time so Commit 4's intent routing has data to retrieve. Per the
CHANGELOG D1 decision, this is full-haystack preprocessing — disclosed
explicitly in the benchmark output's methodology_note field (Commit 4).

New module src/eval/longmemeval/extract.ts:
  extractAndInsertClaims({engine, client, model, sessionSlug,
                          sessionId, sessionBody, sourceId, aliasMap})
  - Hashes the session body (sha256) for cache lookup.
  - Cache hit → reuses parsed claims (cuts a 3-iteration benchmark
    run from $1.50 to $0.50 when sessions repeat across questions,
    as they do in LongMemEval).
  - Cache miss → one Haiku call. System prompt asks for
    {entity, metric, value, unit, period, event_type, valid_from, text}[]
    JSON. New parseExtractedJsonArray() helper does fence-strip + parse
    (parseModelJSON from cross-modal-eval is shaped for scored objects,
    not arrays — different parser needed here).
  - Per-record validateClaim() drops malformed records (missing
    entity, bad date) silently; the rest land in NewFact rows.
  - Per-question AliasMap (Codex Problem 4 — semantics pinned):
    "Marco" + "Marco Smith" + "marco" in the SAME question collapse
    to one slug via first-mention-wins canonicalization. Across
    questions, the harness creates a fresh map (no leak).
  - Real-page-aware entity resolution via the v0.40.2.0
    resolveEntitySlugWithSource (Commit 2). Slugify-fallback rows
    still insert (we need the data); the resolution_source signal
    is only consulted at trajectory retrieval time (Commit 4).
  - Bulk insert via engine.insertFacts with the
    `gbrain-allow-direct-insert` allow-list comment per the
    check-system-of-record CI guard contract — benchmark brain is
    ephemeral in-memory PGLite, no markdown source-of-truth applies.
  - Fail-open posture: Haiku throw, malformed JSON, insert collision
    all return inserted=0 without throwing. One bad session never
    kills the per-question loop.
  - getCacheStats() exposes hits/misses/size for the per-run stderr
    telemetry Codex Problem 14 asked for (empirical hit-rate
    reporting; the optimistic claim self-verifies).
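
The cache and parse steps above can be approximated as below. This is a sketch under stated assumptions, not the module's code: `extractCached`, `stripFenceAndParseArray`, and `getCacheStatsSketch` are hypothetical names, `callModel` stands in for the Haiku call, and the choice not to cache a thrown model call is an assumption (the source does not say whether failures are cached).

```typescript
import { createHash } from "node:crypto";

// Built with repeat() so the example itself contains no literal code fence.
const FENCE = "`".repeat(3);
const OPEN_FENCE = new RegExp("^" + FENCE + "(?:json)?\\s*", "i");
const CLOSE_FENCE = new RegExp("\\s*" + FENCE + "$");

// Strip an optional markdown fence, then parse; anything that is not a
// JSON array degrades to [] (fail-open).
function stripFenceAndParseArray(raw: string): unknown[] {
  const unfenced = raw.trim().replace(OPEN_FENCE, "").replace(CLOSE_FENCE, "");
  try {
    const parsed: unknown = JSON.parse(unfenced);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

const cache = new Map<string, unknown[]>();
let hits = 0;
let misses = 0;

// `callModel` is a hypothetical stand-in for the extractor LLM call.
async function extractCached(
  body: string,
  callModel: (body: string) => Promise<string>,
): Promise<unknown[]> {
  // Key on the body alone, so session id and slug do not affect lookups.
  const key = createHash("sha256").update(body).digest("hex");
  const cached = cache.get(key);
  if (cached !== undefined) {
    hits++;
    return cached;
  }
  misses++;
  let raw: string;
  try {
    raw = await callModel(body);
  } catch {
    return []; // assumption: a thrown call is not cached, so a retry can succeed
  }
  const claims = stripFenceAndParseArray(raw);
  cache.set(key, claims);
  return claims;
}

function getCacheStatsSketch(): { hits: number; misses: number; size: number } {
  return { hits, misses, size: cache.size };
}
```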

Substrate plumbing (extends Commit 1):
  - NewFact.event_type?: string | null added in engine.ts so the
    extractor can pass event-shaped rows through to insertFacts.
  - PGLite engine + Postgres engine insertFacts() now persist
    event_type. Param-positional dispatch extended to 20/21 placeholders
    (null-embedding vs embedding-present); tx.unsafe vector cast on
    Postgres path unchanged.

Test coverage (test/longmemeval-extract.test.ts, 13 cases, hermetic):
  - Happy path: typed-claim + event rows both insert with correct
    kind (event_type='meeting' → kind='event'; claim_metric='mrr'
    → kind='fact').
  - Alias map: per-session collapsing ("Marco" + "Marco Smith"),
    cross-session persistence within one question, fresh map per
    question (caller-clears semantics pinned).
  - Content-hash cache: identical body → cache hit, only ONE Haiku
    call across two sessions; different bodies miss; getCacheStats
    reports hits/misses/size.
  - Fail-open: malformed JSON, Haiku throw, empty array output,
    invalid records (missing entity, bad date) — none crash; 0
    inserted in each case.
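
The first-mention-wins alias semantics pinned by these tests can be pictured with a small class. The matching heuristic used here (two names collapse when they share a first token) is purely an assumption for illustration; the source only states that "Marco", "Marco Smith", and "marco" collapse to one slug within a question and that each question gets a fresh map.

```typescript
// Hypothetical sketch; the real AliasMap may match names differently.
class AliasMapSketch {
  // normalized name -> canonical slug chosen at first mention
  private canonical = new Map<string, string>();

  resolve(name: string): string {
    const norm = name.trim().toLowerCase().replace(/\s+/g, " ");
    const exact = this.canonical.get(norm);
    if (exact !== undefined) return exact;

    // Assumption: a later mention that shares its first token with an
    // earlier one reuses the earlier mention's slug.
    const firstToken = norm.split(" ")[0];
    for (const [seen, slug] of this.canonical) {
      if (seen.split(" ")[0] === firstToken) {
        this.canonical.set(norm, slug);
        return slug;
      }
    }

    const slug = norm.replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
    this.canonical.set(norm, slug);
    return slug;
  }
}
```

Creating a new `AliasMapSketch` per question is what keeps canonicalization from leaking: the same name can resolve to a different slug in the next question if it is first mentioned in a different form.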

55 tests pass across 4 impacted suites; bun run verify clean.

Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ology disclosure (v0.40.2.0 Commit 4)

The final wiring: per-question intent classification + trajectory call
+ block splice into the answer-gen prompt. Plus the methodology
disclosure stamps that close out the Codex D1 contract.

New module src/eval/longmemeval/intent.ts:
  classifyIntent(q): prefers q.question_type from the dataset
  (LongMemEval ships labels like 'temporal-reasoning',
  'knowledge-update', 'single-session-user') before falling back to
  the SHARED regex set imported from src/core/think/intent.ts.
  Single source of truth for the regex — think and longmemeval
  cannot drift.
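
A small sketch of the label-first routing. The label strings are the three quoted above; mapping 'single-session-user' to 'other' is an assumption, as are the names `LABEL_TO_INTENT` and `classifyFromDataset`. The regex fallback is passed in as a function so the sketch stays independent of the shared classifier.

```typescript
type Intent = "temporal" | "knowledge_update" | "other";

// Only labels named in this PR are listed; the target for
// 'single-session-user' is assumed, not confirmed.
const LABEL_TO_INTENT: Record<string, Intent> = {
  "temporal-reasoning": "temporal",
  "knowledge-update": "knowledge_update",
  "single-session-user": "other",
};

interface DatasetQuestion {
  question: string;
  question_type?: string;
}

function classifyFromDataset(
  q: DatasetQuestion,
  regexFallback: (text: string) => Intent,
): Intent {
  const label = q.question_type;
  // A known dataset label wins over any signal in the question text.
  if (label !== undefined && Object.prototype.hasOwnProperty.call(LABEL_TO_INTENT, label)) {
    return LABEL_TO_INTENT[label];
  }
  // Unknown or missing labels fall through to the shared regex classifier.
  return regexFallback(q.question);
}
```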

Harness wiring in src/commands/eval-longmemeval.ts:
  - runEvalLongMemEval() spawns an extractor model via resolveModel
    (tier:'utility' → haiku) when trajectory routing is enabled.
    Calls resetExtractorState() once per benchmark run so the
    content-hash cache + counters start clean.
  - runOneQuestion() creates a FRESH per-question AliasMap (Codex
    Problem 4 — first-mention-wins canonicalization stays scoped to
    one question, never leaks across).
  - Per session: after importFromContent lands, extractAndInsertClaims
    populates the facts table. Fail-open if the Haiku call errors;
    next session keeps going.
  - After hybridSearch returns: classifyIntent(q) routes
    temporal/knowledge_update through extractCandidateEntities (the
    SHARED helper from Commit 2's think/entity-extract) → per-candidate
    findTrajectory with 5s Promise.race timeout → formatTrajectoryBlock.
    First candidate with a non-empty trajectory wins.
  - generateAnswer() splices the trajectory block BEFORE the
    Retrieved sessions block. Empty block (no entity match / no
    points) → no "Known trajectory:" header, so the model gets no cue
    that a lookup was attempted.
  - JSON envelope gains 5 fields per question when trajectory routing
    is on: intent, trajectory_points, entity_resolved,
    resolution_source, methodology_note. methodology_note also
    written to stderr at run completion.
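
For readers consuming the benchmark output, the five added fields could be typed roughly as follows. The field names and the methodology_note value come from this PR; the field types (for example, `trajectory_points` as a count and the nullable fields) and the `BaseRow` shape are assumptions.

```typescript
type Intent = "temporal" | "knowledge_update" | "other";
type ResolutionSource = "exact_page" | "fuzzy_match" | "fallback_slugify";

interface TrajectoryEnvelopeFields {
  intent: Intent;
  trajectory_points: number; // assumed to be a count of injected points
  entity_resolved: string | null; // assumed null when no entity matched
  resolution_source: ResolutionSource | null;
  methodology_note: string;
}

const METHODOLOGY_NOTE = "extractor=haiku-preprocess-full-haystack-v1";

// Hypothetical minimal shape of a per-question row before routing fields.
interface BaseRow {
  question_id: string;
  answer: string;
}

type RoutingInfo = Omit<TrajectoryEnvelopeFields, "methodology_note">;

// Pass `null` for routing to model a run with trajectory routing disabled,
// in which case none of the five fields are emitted.
function buildEnvelopeRow(
  base: BaseRow,
  routing: RoutingInfo | null,
): BaseRow & Partial<TrajectoryEnvelopeFields> {
  if (routing === null) return { ...base };
  return { ...base, ...routing, methodology_note: METHODOLOGY_NOTE };
}
```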

Resolution-source gate DIVERGES from think (intentional):
  In the think production path, fallback_slugify results are skipped
  because querying invented slugs wastes SQL — production brains have
  canonical pages. In the LongMemEval benchmark, there ARE no
  canonical pages; both the extractor and the lookup go through
  slugify-fallback on the same free-form name, so they cohere on the
  same slug. Applying the think-path gate here would permanently
  block trajectory injection on the benchmark. Comment in
  runOneQuestion documents the divergence.

New CLI flag --no-trajectory:
  Bypasses BOTH the Haiku extractor AND the per-question intent
  routing. Used by the measurement protocol to baseline default-on vs
  no-trajectory across 3 seeds per condition with paired-bootstrap
  CI. Documented in the help text.

New RunOpts fields:
  - extractorClient?: ThinkLLMClient — separate stub from the
    answer-gen client so tests can isolate the two surfaces.
  - extractorModel?: string — model override for the Haiku call.

methodology_note = 'extractor=haiku-preprocess-full-haystack-v1'
stamped on:
  - Every per-question JSON envelope row.
  - Stderr summary at run completion.
This is the Codex D1 contract: the temporal-reasoning delta we
publish is "gbrain + Haiku-preprocess pipeline" vs "gbrain alone",
not directly comparable to LongMemEval's published baselines
without that disclosure.

Extractor cache hit-rate stderr summary (Codex Problem 14):
  '[longmemeval] extractor.cache_hits: 412 / 489 sessions (84.2%,
  cached_bodies=412)' — empirical verification of the optimistic
  hit-rate claim. The optimistic number self-verifies per run.

Test coverage (all hermetic, no API keys):
  - test/longmemeval-intent.test.ts (9 cases) — dataset
    question_type → Intent mapping for all six LongMemEval labels;
    dataset label trumps question-text signal; unknown labels fall
    through to the regex classifier.
  - test/longmemeval-trajectory-routing.test.ts (4 cases) —
    end-to-end through runEvalLongMemEval with both clients stubbed:
    trajectory block lands in answer-gen prompt for temporal
    intent + absent for 'other'; --no-trajectory bypasses extractor
    AND injection AND omits envelope fields; methodology_note
    stamped on every routed row; perf gate preserved (< 10s for
    2-question fixture).

118 tests pass across 11 impacted suites; bun run verify clean.

Wave complete. CHANGELOG draft + measurement plan live in the plan
file. v0.40.2.0 ready for /ship after a real-LLM spot-check run.

Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Master shipped v0.38.0.0 (ingestion-cathedral wave, #1275) which claimed
migration slot v81 with `pages_provenance_columns`. The v0.40.2.0
trajectory-routing wave's `facts_event_type_column` migration is renumbered
to v82.

Engine + test code that reference the new migration are updated to v82.
Master's v81 tests (`pages_provenance_columns`) remain intact and test
master's migration unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan changed the title from "v0.40.0.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval)" to "v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval)" on May 22, 2026
garrytan and others added 2 commits May 22, 2026 09:17
v0.40.2.0 trajectory routing wave — gbrain think now grounds answers
about temporal/knowledge-update questions in the typed-claim timeline
the brain has been quietly building via the extract_facts cycle phase.
Default ON; flip think.trajectory_enabled=false to opt out.

LongMemEval-side wiring lands the same plumbing in the benchmark
harness with explicit methodology disclosure (extractor=haiku-preprocess-
full-haystack-v1) in the JSON envelope and stderr summary — the published
temporal-reasoning number is "gbrain + Haiku-preprocess" vs "gbrain alone",
not directly comparable to LongMemEval's published baselines without that
disclosure.

Plan: ~/.claude/plans/system-instruction-you-are-working-crystalline-owl.md
3 review passes: CEO + ENG + CODEX all CLEARED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md, README.md, AGENTS.md extended with the v0.40.2.0 trajectory
routing surface: gbrain think integration (default ON via
think.trajectory_enabled config key), facts.event_type schema column +
TrajectoryPoint.event_type + TrajectoryOpts.kind filter, shared
formatTrajectoryBlock helper in src/core/trajectory-format.ts,
LongMemEval extractor + intent routing + methodology disclosure,
migration v82.

llms-full.txt regenerated to match CLAUDE.md edits (CI test/build-llms
gate).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan force-pushed the garrytan/v0.40.2.0-trajectory-routing branch from 588d720 to 1f5fc17 on May 22, 2026 16:18
garrytan and others added 7 commits May 22, 2026 09:28
… v82-v85)

Master shipped v0.38.1.0 (provider-agnostic subagent loop, #1289) which
claimed migration slots v82-v85:
  v82 — subagent_tool_executions_stable_id
  v83 — mcp_spend_reservations
  v84 — oauth_clients_budget_usd_per_day
  v85 — oauth_clients_agent_binding

The v0.40.2.0 trajectory-routing wave's `facts_event_type_column`
migration is renumbered to v86. Engine + test + CLAUDE.md references
updated.

CHANGELOG reconstructed: v0.40.2.0 entry kept at the top (our entry),
master's v0.38.1.0 entry inserted below, both intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit of the trajectory-routing wave's test surface vs the shipped
code surfaced 7 gaps. All filled, all green. Total: 343 tests across
17 impacted suites (was 272 pre-fill).

Gap 1 — Migration v86 structural tests (11 new in test/migrate.test.ts):
  - v86 entry exists with documented name + idempotent
  - exactly one event_type column add to facts
  - IF NOT EXISTS guard
  - column is nullable (no NOT NULL, no DEFAULT regression guard)
  - does NOT create any index (event_type is selectivity-poor)
  - does NOT touch any other table (blast-radius pin)
  - does NOT carry a sqlFor override (engine-shared SQL contract)
  - PGLite round-trip: column exists with right type + nullable
  - event_type INSERT/SELECT round-trip
  - NULL round-trip for legacy + metric-only rows
  - LATEST_VERSION >= 86 contract pin
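
Several of these structural pins amount to string checks on the migration SQL. The sketch below assumes a simplified migration shape and a one-statement SQL body; `MigrationSketch` and `structuralProblems` are hypothetical, and the real entries in src/core/migrate.ts carry more fields (including a version number, omitted here because it was renumbered several times).

```typescript
// Hypothetical, simplified migration entry.
interface MigrationSketch {
  name: string;
  sql: string;
}

const factsEventTypeColumn: MigrationSketch = {
  name: "facts_event_type_column",
  // Assumed SQL body, based on the ALTER described in Commit 1 plus the
  // IF NOT EXISTS guard listed above.
  sql: "ALTER TABLE facts ADD COLUMN IF NOT EXISTS event_type TEXT;",
};

// Returns a list of violated pins; an empty list means the shape is as expected.
function structuralProblems(m: MigrationSketch): string[] {
  const problems: string[] = [];
  const adds = m.sql.match(/ADD\s+COLUMN/gi) ?? [];
  if (adds.length !== 1) problems.push("expected exactly one ADD COLUMN");
  if (!/ADD\s+COLUMN\s+IF\s+NOT\s+EXISTS\s+event_type\b/i.test(m.sql)) {
    problems.push("missing IF NOT EXISTS guard on event_type");
  }
  if (/\bNOT\s+NULL\b/i.test(m.sql)) problems.push("column must stay nullable");
  if (/\bDEFAULT\b/i.test(m.sql)) problems.push("column must not carry a DEFAULT");
  if (/\bCREATE\s+(UNIQUE\s+)?INDEX\b/i.test(m.sql)) problems.push("must not create an index");
  const tables = [...m.sql.matchAll(/ALTER\s+TABLE\s+(?:IF\s+EXISTS\s+)?(\w+)/gi)].map((x) =>
    x[1].toLowerCase(),
  );
  if (tables.some((t) => t !== "facts")) problems.push("touches a table other than facts");
  return problems;
}
```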

Gap 2 — resolveEntitySlugWithSource branch coverage (12 new in
test/entity-resolve.test.ts):
  - exact_page branch (full slug, slug-shape match)
  - fuzzy_match branch (Title-cased display name, bare first name via
    prefix expansion)
  - fallback_slugify branch (unseeded name, multi-word non-match
    phrase, accented input)
  - null tail (empty + whitespace)
  - back-compat parity with resolveEntitySlug for both exact_page and
    fallback_slugify branches

Gap 3 — INJECTION_PATTERNS dedicated coverage for new entries (18 new
in test/think-sanitize-trajectory.test.ts):
  - close-trajectory entry registered + matches canonical and
    whitespace/case variations
  - open-trajectory entry registered + matches both no-attr and
    with-attrs forms
  - xml-attr-inject strips entity=/metric=/event_type=/kind=
  - does NOT strip non-trajectory attribute names (class/id/title)
  - combined multi-vector attack: all three patterns fire
  - formatTrajectoryBlock end-to-end with adversarial extractor text:
    one live </trajectory> (the wrapper, not the injection); one
    entity= attribute (the wrapper, not the injection)
  - pattern ordering invariant: new entries land after close-take

Gap 4 — runThink calibration-mode placement contract (3 new in
test/think-trajectory-injection.test.ts):
  - default mode: question → pages → takes → trajectory → instruction
  - calibration mode: pages → takes → calibration → trajectory →
    question → instruction (Codex P6 — no third ordering invented)
  - empty trajectory in calibration mode preserves the existing
    calibration shape (no false-positive cue)
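
The two orderings can be expressed as a single assembly function. This is a sketch, assuming each section is already rendered to a string and joined with blank lines; `PromptParts` and `assemblePromptSketch` are hypothetical, and the real buildThinkUserMessage takes different inputs.

```typescript
// Hypothetical pre-rendered sections.
interface PromptParts {
  question: string;
  pages: string;
  takes: string;
  instruction: string;
  calibration?: string; // present only in calibration mode
  trajectoryBlock?: string; // may be empty or absent
}

function assemblePromptSketch(parts: PromptParts): string {
  // An empty block drops the header too, so the prompt shape is unchanged.
  const hasTrajectory = parts.trajectoryBlock !== undefined && parts.trajectoryBlock.trim() !== "";
  const trajectory = hasTrajectory ? `Known trajectory:\n${parts.trajectoryBlock}` : null;

  const sections: (string | null)[] =
    parts.calibration !== undefined
      ? // calibration mode: pages, takes, calibration, trajectory, question, instruction
        [parts.pages, parts.takes, parts.calibration, trajectory, parts.question, parts.instruction]
      : // default mode: question, pages, takes, trajectory, instruction
        [parts.question, parts.pages, parts.takes, trajectory, parts.instruction];

  return sections.filter((s): s is string => s !== null).join("\n\n");
}
```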

Gap 5 — runThink resolution_source != fallback_slugify gate (1 new
in test/think-trajectory-injection.test.ts):
  - candidate that only matches via fallback_slugify is NOT queried
    (think-path divergence from longmemeval-path which accepts it)

Gap 6 — E2E for runThink trajectory injection (7 new in
test/e2e/think-trajectory-pglite.test.ts):
  - full pipeline lands <trajectory> block in answer-gen prompt
  - knowledge_update intent annotates value-change rows with
    (superseded prior)
  - 'other' intent short-circuits (no block, no SQL)
  - think.trajectory_enabled=false config bypasses entire path
  - empty brain → graceful no-op (no crash, no block)
  - multi-entity deterministic ordering
  - adversarial </trajectory> in seeded fact text is escaped before
    reaching the LLM (end-to-end sanitization gate)

Gap 7 — longmemeval extractor stress + persistence pins (6 new in
test/longmemeval-extract.test.ts):
  - alias map cross-session stress with 12 sessions in one question;
    all 12 rows collapse under ONE entity_slug
  - different entities stay separate across many sessions
  - embedding + embedded_at both NULL on benchmark-inserted rows
    (regression guard against accidental embed-on-write)
  - row_num sequential + source_markdown_slug stamped per session
    (v0.32.2 partial UNIQUE index contract)
  - source field stamped "longmemeval:extractor" (audit-tag pin)
  - cache key invariance: same body hash hits cache across different
    sessionId/slug

bun run verify clean (17 pre-checks). No regressions in any of the
14 non-new impacted suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… v86)

Master shipped v0.38.2.0 (#1297 doctor frontmatter scan) and v0.39.0.0
(#1283 brainstorm cost cathedral). v0.39.0.0 claimed migration v86
with `page_links_view_alias`.

The v0.40.2.0 trajectory-routing wave's `facts_event_type_column`
migration renumbers v86 → v87. All references updated in:
  - src/core/migrate.ts: migration entry now v87, renumber comment
    notes the full v81→v82→v86→v87 history across three master merges.
  - src/core/engine.ts, src/core/pglite-engine.ts,
    src/core/postgres-engine.ts: inline comments bumped to v87.
  - test/migrate.test.ts: my describe blocks (11 structural + 4
    round-trip cases) bumped to v87. LATEST_VERSION assertion bumped
    to >= 87.
  - CLAUDE.md: v0.40.2.0 entry mentions v87. Master's v0.39.0.0
    references to v86 (page_links_view_alias) preserved intact.
  - CHANGELOG: reconstructed cleanly — v0.40.2.0 entry at top with
    v87 reference, master's v0.39.0.0 + v0.38.2.0 + v0.38.1.0
    entries inserted in order below.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schema-bootstrap-coverage CI guard (test/schema-bootstrap-coverage.test.ts)
enforces that every ALTER TABLE ADD COLUMN in MIGRATIONS is covered by
applyForwardReferenceBootstrap OR by PGLITE_SCHEMA_SQL's CREATE TABLE
bodies OR by COLUMN_EXEMPTIONS.

v0.40.2.0's migration v87 adds facts.event_type but deliberately ships
without a bootstrap probe because:
  - No CREATE INDEX in PGLITE_SCHEMA_SQL references event_type
  - No FK references event_type
  - All existing callers (founder-scorecard, eval-trajectory, gbrain
    think trajectory injection) defensively skip NULL-metric rows in
    per-metric math, so event_type=NULL on pre-v87 brains is invisible
  - Pre-v87 brains land event_type=NULL via the migration ALTER

Exactly mirrors the precedent set by facts.claim_metric / claim_value /
claim_unit / claim_period exemptions (v67 typed-claim columns) which
are exempted for the same structural reason: column-only migration,
no forward-reference index, no downstream filter breaks on old brains.

Adding facts.event_type to COLUMN_EXEMPTIONS with a brief rationale
comment matching the existing v0.35.6 entry shape.
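
The three-way coverage rule described above can be sketched roughly as follows. This is not the real test: the `Migration` shape, the regex, the set-based inputs, and the sample `ALTER TABLE` statement are all assumptions made for illustration.

```typescript
// Hypothetical, simplified stand-in for a migration entry.
interface Migration {
  version: number;
  name: string;
  sql: string;
}

// Collects "table.column" for every ALTER TABLE ... ADD COLUMN in the SQL.
function addedColumns(sql: string): string[] {
  const re =
    /ALTER\s+TABLE\s+(?:IF\s+EXISTS\s+)?(\w+)\s+ADD\s+COLUMN\s+(?:IF\s+NOT\s+EXISTS\s+)?(\w+)/gi;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(sql)) !== null) {
    out.push(`${m[1]}.${m[2]}`.toLowerCase());
  }
  return out;
}

// A column passes if ANY of the three sources covers it: the bootstrap
// probe, the CREATE TABLE bodies, or the explicit exemption list.
function uncoveredColumns(
  migrations: Migration[],
  bootstrapCovered: Set<string>,
  schemaCovered: Set<string>,
  exemptions: Set<string>,
): string[] {
  const missing: string[] = [];
  for (const migration of migrations) {
    for (const col of addedColumns(migration.sql)) {
      const covered =
        bootstrapCovered.has(col) || schemaCovered.has(col) || exemptions.has(col);
      if (!covered) missing.push(col);
    }
  }
  return missing;
}

// Assumed SQL text for the migration; the actual statement is not shown here.
const migrations: Migration[] = [
  {
    version: 87,
    name: "facts_event_type_column",
    sql: "ALTER TABLE facts ADD COLUMN IF NOT EXISTS event_type TEXT",
  },
];

const withoutExemption = uncoveredColumns(migrations, new Set(), new Set(), new Set());
const withExemption = uncoveredColumns(
  migrations,
  new Set(),
  new Set(),
  new Set(["facts.event_type"]),
);
```

Under these assumptions the guard would flag `facts.event_type` until the exemption entry is added, which matches the change this commit describes.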

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… v87+v88)

Master shipped v0.39.1.0 (#1248 schema packs — bring your own shape)
which claimed migration slots v87 + v88:
  v87 — takes_kind_drop_check
  v88 — eval_candidates_schema_pack_per_source

The v0.40.2.0 trajectory-routing wave's `facts_event_type_column`
migration renumbers v87 → v89. All references updated in:
  - src/core/migrate.ts: migration entry now v89, renumber comment
    notes the full v81→v82→v86→v87→v89 history across four master merges.
  - src/core/engine.ts, src/core/pglite-engine.ts,
    src/core/postgres-engine.ts: inline comments bumped to v89.
  - test/migrate.test.ts: my describe blocks (11 structural + 4
    round-trip cases) bumped to v89. LATEST_VERSION assertion bumped
    to >= 89.
  - test/schema-bootstrap-coverage.test.ts: COLUMN_EXEMPTIONS entry
    comment bumped to v89. Master's v0.39.1.0 also added
    eval_candidates.schema_pack_per_source to the exemption list —
    both kept (file has no conflicts after stitching).
  - CLAUDE.md: v0.40.2.0 entry mentions v89. Master's v0.39 references
    to v86 (page_links_view_alias) preserved intact.
  - CHANGELOG.md: my v0.40.2.0 entry's Substrate header bumped to v89.
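
The collision that forced this renumber can be sketched as a duplicate-version check. The migration names and version numbers come from this commit message; the `MigrationEntry` shape and the `findVersionCollisions` helper are hypothetical and are not part of src/core/migrate.ts.

```typescript
// Hypothetical minimal view of a migration registry entry.
interface MigrationEntry {
  version: number;
  name: string;
}

// Returns every version number claimed by more than one migration.
function findVersionCollisions(entries: MigrationEntry[]): Map<number, string[]> {
  const byVersion = new Map<number, string[]>();
  for (const entry of entries) {
    const names = byVersion.get(entry.version) ?? [];
    names.push(entry.name);
    byVersion.set(entry.version, names);
  }
  const collisions = new Map<number, string[]>();
  byVersion.forEach((names, version) => {
    if (names.length > 1) collisions.set(version, names);
  });
  return collisions;
}

// State right after merging master, before the renumber.
const beforeRenumber: MigrationEntry[] = [
  { version: 87, name: "takes_kind_drop_check" },
  { version: 88, name: "eval_candidates_schema_pack_per_source" },
  { version: 87, name: "facts_event_type_column" },
];

// State after moving facts_event_type_column to v89.
const afterRenumber: MigrationEntry[] = [
  { version: 87, name: "takes_kind_drop_check" },
  { version: 88, name: "eval_candidates_schema_pack_per_source" },
  { version: 89, name: "facts_event_type_column" },
];

const collisionsBefore = findVersionCollisions(beforeRenumber);
const collisionsAfter = findVersionCollisions(afterRenumber);
const latestVersion = Math.max(...afterRenumber.map((e) => e.version));
```

A check of this kind complements the `LATEST_VERSION >= 89` assertion: the assertion pins the floor, while the collision check would catch two branches claiming the same slot.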

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nflict, CHANGELOG stitched)

master shipped agent-voice (v0.40.0.0), capture-fix wave (v0.39.3.0),
and autopilot per-source fan-out (v0.39.2.0) while v0.40.2.0
trajectory-routing was in review. No migration version collision —
v89 facts_event_type_column slots cleanly past master's v88
eval_candidates_schema_pack_per_source.

VERSION/package.json kept at 0.40.2.0. CHANGELOG.md stitched with
v0.40.2.0 on top, then master's three new entries in date order,
then existing entries from v0.39.0.0 down.

Post-merge audit: VERSION/package.json/top-CHANGELOG all agree on
0.40.2.0. typecheck clean. 243 wave-impacted tests pass.
llms-full.txt regenerated for the merged CLAUDE.md.

master shipped v0.40.1.0 Track D — eval infrastructure (hermetic qrels
gate + nightly cross-modal probe + --by-type/--by-type-floor on
longmemeval). Conflicts in CHANGELOG.md, CLAUDE.md, VERSION,
package.json, llms-full.txt, and src/commands/eval-longmemeval.ts.

VERSION/package.json kept at 0.40.2.0. CHANGELOG.md stitched with
v0.40.2.0 on top, then v0.40.1.0, then existing entries.

CLAUDE.md merged: kept both master's new entries (eval-replay-gate +
nightly-quality-probe) AND v0.40.2.0 trajectory entries, and merged
both v0.40.1.0 Track D + v0.40.2.0 extensions onto the same
eval-longmemeval annotation.

src/commands/eval-longmemeval.ts: combined both ParsedArgs additions
(noTrajectory + byType + byTypeFloor), both CLI parser branches,
both help-text blocks, and both terminal-output branches
(extractor cache hit-rate + by-type summary emission).
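
A rough sketch of how the two sets of ParsedArgs additions could sit side by side in one parser. Only the field names `noTrajectory`, `byType`, `byTypeFloor` and the flags `--by-type` / `--by-type-floor` appear in this message; the `--no-trajectory` spelling, the numeric type of the floor, and the parsing loop are assumptions, not the real src/commands/eval-longmemeval.ts.

```typescript
// Assumed field types; the real ParsedArgs has more fields than shown here.
interface ParsedArgs {
  noTrajectory: boolean;
  byType: boolean;
  byTypeFloor: number | null;
}

function parseArgs(argv: string[]): ParsedArgs {
  const parsed: ParsedArgs = { noTrajectory: false, byType: false, byTypeFloor: null };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--no-trajectory") {
      // Branch from the trajectory-routing side (flag spelling assumed).
      parsed.noTrajectory = true;
    } else if (arg === "--by-type") {
      // Branch from the Track D side.
      parsed.byType = true;
    } else if (arg === "--by-type-floor") {
      const raw = argv[i + 1];
      const value = Number(raw);
      if (raw === undefined || Number.isNaN(value)) {
        throw new Error("--by-type-floor expects a numeric value");
      }
      parsed.byTypeFloor = value;
      i += 1; // consume the value token
    }
  }
  return parsed;
}

const combined = parseArgs(["--no-trajectory", "--by-type", "--by-type-floor", "0.5"]);
const defaults = parseArgs([]);
```

Because the branches test distinct flag strings, the two feature sets do not interfere with each other in this sketch.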

Post-merge audit: VERSION/package.json/top-CHANGELOG all agree on
0.40.2.0. typecheck clean. 65 longmemeval-impacted tests pass
(--by-type + trajectory routing co-exist). llms-full.txt regenerated.
garrytan merged commit a19ee8b into master on May 23, 2026
8 checks passed
mgunnin added a commit to mgunnin/gbrain that referenced this pull request May 28, 2026
* upstream/master: (22 commits)
  v0.41.4.0 wave: local providers + cross-platform stdin + gateway-routed dream judge (6 community PRs) (garrytan#1377)
  v0.41.3.0 fix(security/mcp): OAuth CORS lockdown + pre-register without DCR + validator surface (garrytan#1403)
  v0.41.2.0 feat: lens packs + epistemology unification — atoms + concepts as first-class units, calibration profile widening, gstack-learnings bridge (garrytan#1364)
  v0.41.1.0 feat: eval-loop wave — gbrain bench publish + gbrain eval gate close the LOOP (garrytan#1352)
  v0.41.0.0 feat(minions): fleet you supervise (4 field bugs + cathedral) (garrytan#1367)
  v0.40.10.0 feat: content sanity defense — junk-pattern throw + oversize-skip-embed (garrytan#1351)
  v0.40.9.0 feat(chunker): .sql indexing via tree-sitter + code-def on SQL DDL (garrytan#1173) (garrytan#1350)
  v0.40.8.1 docs: README rewrite + personal-brain + company-brain tutorials (garrytan#1345)
  v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes (garrytan#1313)
  v0.40.6.1 docs(todos): file v0.41 wave commitments + 7 verified-missing items (garrytan#1333)
  v0.40.7.0 Schema Cathedral v3 — agent-on-ramp + production rebuild of PR garrytan#1321 (garrytan#1327)
  v0.40.6.0 feat(sync): parallel sync --all + per-source lock invariant + sources status dashboard (productionized from PR garrytan#1314) (garrytan#1324)
  v0.40.5.0 Federated Sync v2 — parallel source sync + push triggers + per-source health (garrytan#1322)
  v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification (garrytan#1300)
  v0.40.3.0 feat: contextual retrieval + cache invalidation gate + 4 deferred-item closures (garrytan#1323)
  v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) (garrytan#1296)
  v0.40.1.0 Track D — eval infrastructure (catch retrieval regressions, prove answer-quality wins) (garrytan#1298)
  v0.40.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm (garrytan#1128)
  v0.39.3.0: productionize the v0.38 ingestion cathedral (smoke-test fix wave from PR garrytan#1299) (garrytan#1308)
  v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy (garrytan#1295)
  ...