v0.42.27.0 feat(idea-lineage): first-class idea_lineage op + feature eval + skill hardening#1940

Open
garrytan wants to merge 12 commits into master from codex/idea-lineage-skill

garrytan (Owner) commented Jun 7, 2026

Summary

Builds on community PR #1830 (the original idea-lineage thinking skill by @davidNbreslauer) and takes it from a markdown workflow to a first-class, tested capability:

  1. Hardened + deepened the skill: graph/timeline-aware evidence gathering (get_backlinks, traverse_graph depth-2, get_timeline), a concrete high/medium/low confidence rubric + degraded-evidence note, and a hardened routing boundary (adversarial fixtures separating it from concept-synthesis and trajectory queries; new low-collision "changed my mind about" trigger).
  2. First-class idea_lineage op (src/core/operations.ts): resolve a free-text idea to its best concept/page anchor, then gather dated evidence (matches, related concepts via backlinks + graph, timeline anchors, takes, optional entity trajectory, idea-scoped cached contradictions). Handler-orchestrated over existing engine primitives (no new engine method), resolve→gather two-phase, scope:'read' + localOnly with an in-handler ctx.remote reject. CLI (gbrain idea-lineage <idea>) + MCP tool auto-generated from the contract. Returns candidate anchors + disambiguation_needed and a degraded flag.
  3. Feature-recovery eval: synthetic-corpus op test asserting lineage recovery, a cross-engine determinism case, and gbrain eval idea-lineage <idea> reporting evidence coverage (new lineage_evidence_coverage glossary metric), persisting to .gbrain-evals/ (gitignored).
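For readers who want the shape of the resolve→gather flow from item 2, here is a minimal sketch. It is not the code in src/core/operations.ts: the `Primitives` interface, the `ideaLineage` function name, the 0.05 score gap used for `disambiguation_needed`, and the rule for `degraded` are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real contract lives in src/core/operations.ts.
interface Candidate { slug: string; score: number }
interface Ctx { remote: boolean }
interface LineageResult {
  anchor: string | null;
  candidates: Candidate[];
  disambiguation_needed: boolean;
  degraded: boolean;
  evidence: Record<string, string[]>;
}

// Stand-ins for existing engine read primitives (assumed signatures).
interface Primitives {
  search(idea: string): Promise<Candidate[]>;
  backlinks(slug: string): Promise<string[]>;
  timeline(slug: string): Promise<string[]>;
}

async function ideaLineage(ctx: Ctx, idea: string, eng: Primitives): Promise<LineageResult> {
  // v1 is local-only: reject remote callers inside the handler.
  if (ctx.remote) throw new Error("idea_lineage is local-only");

  // Phase 1: resolve the free-text idea to ranked candidate anchors.
  const candidates = (await eng.search(idea)).slice().sort((a, b) => b.score - a.score);
  const top = candidates[0];
  if (top === undefined) {
    return { anchor: null, candidates, disambiguation_needed: false, degraded: true, evidence: {} };
  }
  // Assumed heuristic: flag disambiguation when the top two scores are close.
  const second = candidates[1];
  const disambiguation_needed = second !== undefined && top.score - second.score < 0.05;

  // Phase 2: gather evidence around the chosen anchor.
  const [backlinks, timeline] = await Promise.all([eng.backlinks(top.slug), eng.timeline(top.slug)]);
  // Assumed rule: mark the result degraded when no channel returned anything.
  const degraded = backlinks.length === 0 && timeline.length === 0;
  return { anchor: top.slug, candidates, disambiguation_needed, degraded, evidence: { backlinks, timeline } };
}
```

The handler returns evidence buckets and flags only, which matches the stated goal of leaving classification to the agent.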

The op is local-only by design: it composes read primitives whose source/visibility filtering isn't uniform yet, so v1 is scoped to local/trusted callers (see TODOS.md for the federated/remote follow-up). Classification (reversal vs abandoned branch) stays agent-side; the op returns evidence, not narrative, so it stays deterministic and engine-parity-testable.

Design review

Went through /plan-eng-review + a Codex outside-voice plan review (4 cross-model tensions decided: handler-orchestration over an engine method, keep the op thin, local-only v1, deterministic parity assertions) and a Codex adversarial diff review (takes source-scoping + eval source-resolution fixed; the GBRAIN_HOME-preload isolation was replaced with per-test isolation after it was found to leak into subprocess tests).

Testing

  • idea-lineage op test (synthetic-corpus recovery + contract) 10/0; metric-glossary 18/0; engine-parity (incl. new idea_lineage case) 14/0 against real Postgres + PGLite; skills-conformance/resolver/skillpack 371/0; routing-eval 116/0 (0 false positives); bun run verify 30/30; typecheck clean.
  • Test isolation fix: the suite's "no API key configured" assertions now isolate from a developer's real ~/.gbrain/config.json (they previously passed in CI but failed locally on a configured brain). Verified the 6 affected files 85/0 under the real config.

Supersedes

Supersedes #1830 (cross-repo fork PR — these improvements build on that work; original skill credited in the CHANGELOG).

🤖 Generated with Claude Code

Documentation

Docs synced for this release:

  • CHANGELOG.md — v0.42.27.0 entry (sell-test: what / why / how, with copy-paste commands).
  • docs/eval/METRIC_GLOSSARY.md — regenerated with the new lineage_evidence_coverage metric (CI freshness guard green).
  • TODOS.md — filed the federated/remote idea_lineage follow-up (deferred from v1's local-only scope).
  • skills/RESOLVER.md + skills/manifest.json + operations-descriptions.ts — idea-lineage skill row + idea_lineage op description (the canonical reference surfaces).

Documentation debt (pre-existing, not from this PR)

  • ⚠️ CLAUDE.md says "GBrain ships 29 skills" but the manifest now holds 51 — long-standing drift from prior releases that didn't update the prose count. Out of scope here (touching CLAUDE.md triggers an llms-bundle regen); worth a standalone cleanup. Reference-quadrant fix.

davidNbreslauer and others added 8 commits June 3, 2026 14:51
Add graph/timeline tools (get_backlinks, traverse_graph depth-2, get_timeline)
to the skill, a concrete high/medium/low confidence rubric + degraded-evidence
note, and a new low-collision "changed my mind about" trigger. Expand
routing-eval fixtures with adversarial paraphrases + trajectory/query negatives,
plus a protective concept-synthesis boundary case. RESOLVER + llms-full updated.

Resolve a free-text idea to its best concept/page anchor, then gather dated
evidence (matches, backlinks + depth-2 graph, timeline, takes, optional entity
trajectory, cached contradictions). Handler-orchestrated over existing engine
primitives (no new engine method); resolve→gather two-phase; scope:'read' +
localOnly with a ctx.remote reject (sidesteps the federated-scope/visibility
gaps in getBacklinks/getTimeline). Embeddings stripped at the wire boundary.
Extract the contradiction slug-filter into a shared contradiction-filter helper
reused by find_contradictions (DRY). Pin IDEA_LINEAGE_DESCRIPTION.
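The two helpers this commit mentions (stripping embeddings at the wire boundary and a shared contradiction slug-filter) might look roughly like the sketch below. The row shapes and the function names `stripEmbeddings` and `filterContradictionsBySlugs` are hypothetical; the actual helper in the repo may differ.

```typescript
// Hypothetical row shapes for illustration only.
interface Contradiction { slugA: string; slugB: string; note: string }

// Drop the embedding vector from each row before it leaves the process.
function stripEmbeddings<T extends { embedding?: unknown }>(rows: T[]): Omit<T, "embedding">[] {
  return rows.map(({ embedding: _embedding, ...rest }) => rest);
}

// Keep only contradictions that touch at least one slug in the idea's scope.
// Assumption: "idea-scoped" means either side of the pair matches.
function filterContradictionsBySlugs(all: Contradiction[], slugs: string[]): Contradiction[] {
  const scope = new Set(slugs);
  return all.filter((c) => scope.has(c.slugA) || scope.has(c.slugB));
}
```

Keeping the filter in one helper lets both idea_lineage and find_contradictions apply the same scoping rule.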
…eage

Synthetic-corpus op test asserting lineage recovery (resolution, disambiguation,
idea-scoped contradictions, embedding-stripped wire shape, remote reject,
empty→degraded). Cross-engine parity case asserting deterministic evidence
(top-result + non-vector set-equal). New `gbrain eval idea-lineage <idea>` CLI
reporting evidence coverage, persisting to .gbrain-evals/idea-lineage-results.jsonl
(explicit persistence; gitignored). Add lineage_evidence_coverage glossary metric
+ render group; regenerate METRIC_GLOSSARY.md.
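The "top-result + non-vector set-equal" parity assertion can be read as: both engines must agree on the first result, and on the remaining evidence as an unordered set. A minimal sketch of that comparison follows; `sameSet` and `enginesAgree` are hypothetical names, and the inputs are assumed to be ranked slug lists from each engine.

```typescript
// True when both lists contain the same distinct members, ignoring order.
function sameSet(a: string[], b: string[]): boolean {
  const sa = new Set(a);
  const sb = new Set(b);
  return sa.size === sb.size && a.every((x) => sb.has(x));
}

// Parity check: identical top result, and set-equal evidence overall.
function enginesAgree(postgres: string[], pglite: string[]): boolean {
  return postgres[0] === pglite[0] && sameSet(postgres, pglite);
}
```

Comparing sets rather than full orderings keeps the assertion deterministic even when the two engines break ties differently below the top result.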

Point GBRAIN_HOME at a throwaway temp dir in the shared preload so the suite
never reads the developer's real ~/.gbrain/config.json. Without this, tests that
assert "no API key configured" behavior (think degradation, hasAnthropicKey,
probeLlmAvailability, ZE-key health, dream synthesize) pass in CI but fail on any
machine with a configured brain, because loadConfig() resolves the key from the
config file even after the env var is deleted. Makes local runs match CI.
…solution

Address adversarial-review findings:
- Replace the GBRAIN_HOME-in-preload approach with per-test suppressAnthropicKey
  (file-level beforeAll/afterAll) on the in-process no-key tests. The preload
  leaked GBRAIN_HOME into HOME-isolated subprocess tests (skillpack-check,
  doctor-home-dir, init-migrate-only, …) and broke their child-process
  isolation. Per-test isolation touches no subprocess test.
- idea_lineage: source-scope the takes gather (was unscoped while every other
  evidence bucket was scoped).
- gbrain eval idea-lineage: resolve the source via the canonical resolveSourceId
  chain (--source / GBRAIN_SOURCE / .gbrain-source) instead of hardcoding default.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
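To illustrate the per-test isolation pattern described in the first bullet, here is a sketch of a scoped override with a restore function. The PR does not show how suppressAnthropicKey is implemented, so the body below, the `resolveAnthropicKey` lookup, and the `anthropic_api_key` config field are assumptions; only the save/restore shape is the point.

```typescript
// Hypothetical key lookup: env var first, then a value from the config file.
type Env = Record<string, string | undefined>;
interface FileConfig { anthropic_api_key?: string }

let keySuppressed = false;

function resolveAnthropicKey(env: Env, file: FileConfig): string | undefined {
  if (keySuppressed) return undefined; // test-only override wins over both sources
  return env["ANTHROPIC_API_KEY"] ?? file.anthropic_api_key;
}

// Returns a restore function so the override is scoped to a single test file.
function suppressAnthropicKey(): () => void {
  const previous = keySuppressed;
  keySuppressed = true;
  return () => {
    keySuppressed = previous;
  };
}

// Intended use in a test file (shown as a comment to keep the sketch framework-free):
//   let restore: () => void;
//   beforeAll(() => { restore = suppressAnthropicKey(); });
//   afterAll(() => restore());
```

Because the override lives in the test process and is restored in afterAll, it does not reach child processes, which is the property the preload approach lacked.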
garrytan and others added 4 commits June 7, 2026 08:37
…Ids[]

getBacklinks, getTimeline, searchTakes, and searchTakesVector now accept the
federated sourceIds[] array (array path wins over scalar; neither = no filter),
mirroring findTrajectory's source predicate. searchTakes/searchTakesVector gain
real source_id isolation (previously holder-allow-list only — holder-scope is
not a source boundary). getTimeline's 8-case branch collapses to one composed
query. Both engines move in lockstep; engine-parity asserts cross-source
EXCLUSION for each method.
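The source-predicate rule stated above (array path wins over scalar; neither means no filter) can be sketched as a small resolver. The option and type names here are hypothetical, and treating an empty `sourceIds` array as "match nothing" is an assumption, not something the commit message specifies.

```typescript
interface SourceOpts { sourceId?: string; sourceIds?: string[] }
type SourceFilter = { kind: "none" } | { kind: "in"; ids: string[] };

// Array path wins over scalar; neither present means no source filter.
function resolveSourceFilter(opts: SourceOpts): SourceFilter {
  if (opts.sourceIds !== undefined) return { kind: "in", ids: opts.sourceIds };
  if (opts.sourceId !== undefined) return { kind: "in", ids: [opts.sourceId] };
  return { kind: "none" };
}

// In-memory stand-in for the SQL predicate, to show cross-source exclusion.
function applySourceFilter<T extends { source_id: string }>(rows: T[], filter: SourceFilter): T[] {
  if (filter.kind === "none") return rows;
  // Assumption: an empty ids array matches no rows.
  return rows.filter((r) => filter.ids.indexOf(r.source_id) !== -1);
}
```

Resolving the filter once and passing it to every method is one way to keep the predicate identical across getBacklinks, getTimeline, and the takes searches.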

Drop localOnly + the runtime ctx.remote reject; thread one validated
sourceScopeOpts scope to all five gather channels. findTrajectory now threads
remote=ctx.remote===true (world-only facts for remote — fixes a hardcoded
remote:false private-fact leak). Contradictions (global, unscoped trend) are
omitted for remote callers. p.source is validated against ctx.auth.allowedSources
for remote callers (closes a cross-source IDOR). Phase-2 gather uses
Promise.allSettled with a partial/errors flag; schema_version 1->2. HTTP/OAuth
MCP only — not added to the subagent allow-list (deferred). Remote-safety unit
tests + description updated; TODOS follow-ups filed.
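A rough sketch of two pieces of this commit: validating the requested source against the caller's allowed sources, and a phase-2 gather that survives individual channel failures via Promise.allSettled. The names `validateSource`, `gatherAll`, and the `Gathered` shape are hypothetical, and how the real op reports `partial` and `errors` is an assumption.

```typescript
interface Gathered {
  evidence: Record<string, unknown[]>;
  partial: boolean;
  errors: Record<string, string>;
}

// Remote callers may only request a source they are allowed to read.
function validateSource(requested: string | undefined, remote: boolean, allowed: string[]): void {
  if (remote && requested !== undefined && allowed.indexOf(requested) === -1) {
    throw new Error(`source not allowed: ${requested}`);
  }
}

// Run every gather channel; a failed channel yields an empty bucket plus an error entry.
async function gatherAll(channels: Record<string, () => Promise<unknown[]>>): Promise<Gathered> {
  const entries = Object.entries(channels);
  const settled = await Promise.allSettled(entries.map(([, run]) => run()));
  const evidence: Record<string, unknown[]> = {};
  const errors: Record<string, string> = {};
  entries.forEach(([name], i) => {
    const outcome = settled[i];
    if (outcome === undefined) return; // unreachable; satisfies strict index checks
    if (outcome.status === "fulfilled") {
      evidence[name] = outcome.value;
    } else {
      evidence[name] = [];
      errors[name] = String(outcome.reason);
    }
  });
  return { evidence, partial: Object.keys(errors).length > 0, errors };
}
```

With allSettled, one slow or failing channel degrades the result instead of failing the whole call, and the caller can see which buckets are incomplete.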
…skill

# Conflicts:
#	CHANGELOG.md
#	TODOS.md
#	VERSION
#	package.json
#	test/helpers/no-anthropic-key.ts
#	test/think-pipeline.serial.test.ts
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>