v0.42.27.0 feat(idea-lineage): first-class idea_lineage op + feature eval + skill hardening #1940
Open
garrytan wants to merge 12 commits into
Add graph/timeline tools (get_backlinks, traverse_graph depth-2, get_timeline) to the skill, a concrete high/medium/low confidence rubric + degraded-evidence note, and a new low-collision "changed my mind about" trigger. Expand routing-eval fixtures with adversarial paraphrases + trajectory/query negatives, plus a protective concept-synthesis boundary case. RESOLVER + llms-full updated.
Resolve a free-text idea to its best concept/page anchor, then gather dated evidence (matches, backlinks + depth-2 graph, timeline, takes, optional entity trajectory, cached contradictions). Handler-orchestrated over existing engine primitives (no new engine method); resolve→gather two-phase; scope:'read' + localOnly with a ctx.remote reject (sidesteps the federated-scope/visibility gaps in getBacklinks/getTimeline). Embeddings stripped at the wire boundary. Extract the contradiction slug-filter into a shared contradiction-filter helper reused by find_contradictions (DRY). Pin IDEA_LINEAGE_DESCRIPTION.
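To illustrate "embeddings stripped at the wire boundary", here is a minimal sketch. The `EvidenceRow` shape and the `stripEmbedding` name are assumptions made for illustration; the PR text does not show the real types or helper.

```typescript
// Hypothetical row shape; the real types in the repo may differ.
interface EvidenceRow {
  slug: string;
  title: string;
  date?: string;
  embedding?: number[]; // large vector, only useful inside the engine
}

// The wire shape is the same row without the vector.
type WireRow = Omit<EvidenceRow, "embedding">;

// Drop the embedding just before a row leaves the handler.
function stripEmbedding(row: EvidenceRow): WireRow {
  const { embedding: _embedding, ...rest } = row;
  return rest;
}

const rows: EvidenceRow[] = [
  { slug: "concepts/a", title: "A", embedding: [0.1, 0.2] },
  { slug: "concepts/b", title: "B", date: "2024-01-01" },
];

const wire: WireRow[] = rows.map(stripEmbedding);
console.log(JSON.stringify(wire));
```

Stripping at a single boundary function, rather than in each gather channel, keeps internal code free to use the vectors while guaranteeing they never reach a CLI or MCP response.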
…eage Synthetic-corpus op test asserting lineage recovery (resolution, disambiguation, idea-scoped contradictions, embedding-stripped wire shape, remote reject, empty→degraded). Cross-engine parity case asserting deterministic evidence (top-result + non-vector set-equal). New `gbrain eval idea-lineage <idea>` CLI reporting evidence coverage, persisting to .gbrain-evals/idea-lineage-results.jsonl (explicit persistence; gitignored). Add lineage_evidence_coverage glossary metric + render group; regenerate METRIC_GLOSSARY.md.
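A rough sketch of what an evidence-coverage record might look like when appended to a `.jsonl` file. The coverage formula (fraction of non-empty evidence buckets), the field names `idea` and `ran_at`, and both function names are assumptions; the PR only names the `lineage_evidence_coverage` metric and the output path.

```typescript
// Hypothetical evidence buckets, keyed by gather channel name.
type Buckets = Record<string, unknown[]>;

// ASSUMPTION: coverage = fraction of buckets holding at least one item.
// The real metric definition lives in the glossary and may differ.
function evidenceCoverage(buckets: Buckets): number {
  const names = Object.keys(buckets);
  if (names.length === 0) return 0;
  const nonEmpty = names.filter((name) => buckets[name].length > 0).length;
  return nonEmpty / names.length;
}

// JSONL means one complete JSON object per line, newline-terminated.
function toJsonlLine(idea: string, buckets: Buckets, now: Date = new Date()): string {
  const record = {
    idea,
    ran_at: now.toISOString(),
    lineage_evidence_coverage: evidenceCoverage(buckets),
  };
  return JSON.stringify(record) + "\n";
}

const line = toJsonlLine("remote work", {
  matches: ["concepts/remote-work"],
  backlinks: [],
  timeline: ["2023-05-01"],
  takes: [],
});
console.log(line);
```

Appending one self-contained line per run means earlier results are never rewritten, so the file can be tailed or diffed across eval runs.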
Point GBRAIN_HOME at a throwaway temp dir in the shared preload so the suite never reads the developer's real ~/.gbrain/config.json. Without this, tests that assert "no API key configured" behavior (think degradation, hasAnthropicKey, probeLlmAvailability, ZE-key health, dream synthesize) pass in CI but fail on any machine with a configured brain, because loadConfig() resolves the key from the config file even after the env var is deleted. Makes local runs match CI.
…solution
Address adversarial-review findings:
- Replace the GBRAIN_HOME-in-preload approach with per-test suppressAnthropicKey (file-level beforeAll/afterAll) on the in-process no-key tests. The preload leaked GBRAIN_HOME into HOME-isolated subprocess tests (skillpack-check, doctor-home-dir, init-migrate-only, …) and broke their child-process isolation. Per-test isolation touches no subprocess test.
- idea_lineage: source-scope the takes gather (was unscoped while every other evidence bucket was scoped).
- gbrain eval idea-lineage: resolve the source via the canonical resolveSourceId chain (--source / GBRAIN_SOURCE / .gbrain-source) instead of hardcoding default.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Ids[] getBacklinks, getTimeline, searchTakes, and searchTakesVector now accept the federated sourceIds[] array (array path wins over scalar; neither = no filter), mirroring findTrajectory's source predicate. searchTakes/searchTakesVector gain real source_id isolation (previously holder-allow-list only — holder-scope is not a source boundary). getTimeline's 8-case branch collapses to one composed query. Both engines move in lockstep; engine-parity asserts cross-source EXCLUSION for each method.
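The precedence rule described here (array wins over scalar; neither means no filter) can be sketched as a small predicate builder. The `SourceScope` and `SqlFragment` types, the function name, and the Postgres-style placeholders are assumptions for illustration; only the `source_id` column and the precedence rule come from the commit message. Treating an empty array as "match nothing" is also an assumption, not something the commit states.

```typescript
// Hypothetical scope options accepted by a read method.
interface SourceScope {
  sourceId?: string;    // scalar form
  sourceIds?: string[]; // federated form
}

interface SqlFragment {
  clause: string;
  params: unknown[];
}

// Array path wins over scalar; neither present means no source filter.
function sourcePredicate(scope: SourceScope, paramIndex = 1): SqlFragment {
  if (scope.sourceIds !== undefined) {
    // ASSUMPTION: an empty array matches no rows (fail closed).
    return { clause: `source_id = ANY($${paramIndex})`, params: [scope.sourceIds] };
  }
  if (scope.sourceId !== undefined) {
    return { clause: `source_id = $${paramIndex}`, params: [scope.sourceId] };
  }
  return { clause: "TRUE", params: [] };
}

const federated = sourcePredicate({ sourceId: "a", sourceIds: ["b", "c"] });
const scalar = sourcePredicate({ sourceId: "a" }, 3);
const unfiltered = sourcePredicate({});
console.log(federated.clause, scalar.clause, unfiltered.clause);
```

Building the predicate in one place is what lets several methods share identical isolation semantics, which is the property the engine-parity exclusion tests check.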
Drop localOnly + the runtime ctx.remote reject; thread one validated sourceScopeOpts scope to all five gather channels. findTrajectory now threads remote=ctx.remote===true (world-only facts for remote — fixes a hardcoded remote:false private-fact leak). Contradictions (global, unscoped trend) are omitted for remote callers. p.source is validated against ctx.auth.allowedSources for remote callers (closes a cross-source IDOR). Phase-2 gather uses Promise.allSettled with a partial/errors flag; schema_version 1->2. HTTP/OAuth MCP only — not added to the subagent allow-list (deferred). Remote-safety unit tests + description updated; TODOS follow-ups filed.
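A minimal sketch of a `Promise.allSettled` gather that reports a partial/errors flag instead of failing the whole call when one channel throws. The `Channel` and `GatherResult` types, the channel names, and `gatherEvidence` itself are hypothetical; the real handler's result shape is not shown in this PR text.

```typescript
// A gather channel is any async producer of evidence.
type Channel = () => Promise<unknown>;

interface GatherResult {
  evidence: Record<string, unknown>;
  partial: boolean; // true when at least one channel failed
  errors: { channel: string; message: string }[];
}

async function gatherEvidence(channels: Record<string, Channel>): Promise<GatherResult> {
  const entries = Object.entries(channels);
  // allSettled never rejects, so one failing channel cannot sink the rest.
  const settled = await Promise.allSettled(entries.map(([, run]) => run()));

  const evidence: Record<string, unknown> = {};
  const errors: GatherResult["errors"] = [];

  settled.forEach((outcome, i) => {
    const name = entries[i][0];
    if (outcome.status === "fulfilled") {
      evidence[name] = outcome.value;
    } else {
      // Failed channels yield an empty bucket plus a recorded error.
      evidence[name] = [];
      const reason: unknown = outcome.reason;
      errors.push({
        channel: name,
        message: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });

  return { evidence, partial: errors.length > 0, errors };
}

const demo = gatherEvidence({
  backlinks: async () => ["concepts/a"],
  timeline: async () => {
    throw new Error("timeline unavailable");
  },
});
demo.then((result) => console.log(result.partial, result.errors));
```

Compared with `Promise.all`, this trades a single clear failure for a result the caller must inspect, which fits an op that returns evidence for an agent to weigh rather than a yes/no answer.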
…skill # Conflicts: # CHANGELOG.md # TODOS.md # VERSION # package.json # test/helpers/no-anthropic-key.ts # test/think-pipeline.serial.test.ts
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Builds on community PR #1830 (the original `idea-lineage` thinking skill by @davidNbreslauer) and takes it from a markdown workflow to a first-class, tested capability:
- Skill hardening: graph/timeline tools (`get_backlinks`, `traverse_graph` depth-2, `get_timeline`), a concrete high/medium/low confidence rubric + degraded-evidence note, and a hardened routing boundary (adversarial fixtures separating it from `concept-synthesis` and trajectory queries; new low-collision `changed my mind about` trigger).
- `idea_lineage` op (`src/core/operations.ts`): resolve a free-text idea to its best concept/page anchor, then gather dated evidence (matches, related concepts via backlinks + graph, timeline anchors, takes, optional entity trajectory, idea-scoped cached contradictions). Handler-orchestrated over existing engine primitives (no new engine method), resolve→gather two-phase, `scope:'read'` + `localOnly` with an in-handler `ctx.remote` reject. CLI (`gbrain idea-lineage <idea>`) + MCP tool auto-generated from the contract. Returns candidate anchors + `disambiguation_needed` and a `degraded` flag.
- Feature eval: `gbrain eval idea-lineage <idea>` reporting evidence coverage (new `lineage_evidence_coverage` glossary metric), persisting to `.gbrain-evals/` (gitignored).

The op is local-only by design: it composes read primitives whose source/visibility filtering isn't uniform yet, so v1 is scoped to local/trusted callers (see TODOS.md for the federated/remote follow-up). Classification (reversal vs abandoned branch) stays agent-side; the op returns evidence, not narrative, so it stays deterministic and engine-parity-testable.
Design review
Went through `/plan-eng-review` + a Codex outside-voice plan review (4 cross-model tensions decided: handler-orchestration over an engine method, keep the op thin, local-only v1, deterministic parity assertions) and a Codex adversarial diff review (takes source-scoping + eval source-resolution fixed; the GBRAIN_HOME-preload isolation was replaced with per-test isolation after it was found to leak into subprocess tests).
Testing
- (… `idea_lineage` case) 14/0 against real Postgres + PGLite; skills-conformance/resolver/skillpack 371/0; routing-eval 116/0 (0 false positives); `bun run verify` 30/30; typecheck clean.
- No-API-key tests no longer depend on the developer's real `~/.gbrain/config.json` (they previously passed in CI but failed locally on a configured brain). Verified the 6 affected files 85/0 under the real config.
Supersedes
Supersedes #1830 (cross-repo fork PR — these improvements build on that work; original skill credited in the CHANGELOG).
🤖 Generated with Claude Code
Documentation
Docs synced for this release:
- `lineage_evidence_coverage` metric (CI freshness guard green).
- `idea_lineage` follow-up (deferred from v1's local-only scope).
- `idea_lineage` op description (the canonical reference surfaces).
Documentation debt (pre-existing, not from this PR)
`CLAUDE.md` says "GBrain ships 29 skills" but the manifest now holds 51: long-standing drift from prior releases that didn't update the prose count. Out of scope here (touching `CLAUDE.md` triggers an llms-bundle regen); worth a standalone cleanup. Reference-quadrant fix.