feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) by garrytan · Pull Request #622 · garrytan/gstack

garrytan · 2026-03-29T06:48:09Z

Summary

Every session now makes the next one smarter. Per-project institutional knowledge that compounds across skills.

New: learnings persistence, /learn skill, confidence calibration, cross-project discovery, confidence decay, learnings count in preamble.

Infrastructure: 2 bin scripts, 2 resolver modules, 9 template integrations, 1 design doc.

Test Coverage

601 tests pass, 0 fail. 13 new unit tests for bin scripts.

Pre-Landing Review

CEO + Eng Review CLEARED. 2 Codex outside voice runs (4 findings accepted).

Test plan

bun test — 601 pass, 0 fail
Manual bin script verification
gen:skill-docs freshness check

🤖 Generated with Claude Code

…cture

Three new resolvers for the self-learning system:
- LEARNINGS_SEARCH: tells skills to load prior learnings before analysis
- LEARNINGS_LOG: tells skills to capture discoveries after completing work
- CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gstack-learnings-log: validates JSON, auto-injects a timestamp, and appends to ~/.gstack/projects/$SLUG/learnings.jsonl. The file is append-only (no mutation).

gstack-learnings-search: reads, filters, and dedupes learnings with confidence decay (observed and inferred entries lose 1 point per 30 days), cross-project discovery, and "latest winner" resolution per key+type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
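The write path described in this commit can be illustrated with a minimal Python sketch. It is not the actual bin script: the function name `log_learning` and the timestamp field name `ts` are assumptions made for illustration. Only the validate, inject-timestamp, and append-only behaviors come from the commit message.

```python
import json
import os
from datetime import datetime, timezone


def log_learning(raw: str, path: str) -> dict:
    """Validate one JSON learning and append it to a JSONL file.

    Hypothetical helper; the "ts" field name is an assumption.
    """
    entry = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad input
    if not isinstance(entry, dict):
        raise ValueError("a learning must be a JSON object")
    # Inject a timestamp only when the caller did not supply one.
    entry.setdefault("ts", datetime.now(timezone.utc).isoformat())
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Mode "a" only adds a line at the end; existing lines are never rewritten.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Because corrections are appended as new lines rather than edited in place, the reader is the one that decides which entry wins.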

Every skill now prints "LEARNINGS: N entries loaded" during preamble, making the compounding loop visible to the user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
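A rough sketch of how such a preamble line could be produced. The helper name `learnings_banner` is hypothetical, and counting non-blank lines of the JSONL file is an assumption; the PR does not say whether N is the raw or the deduplicated count.

```python
import os


def learnings_banner(path: str) -> str:
    """Return the preamble line for a learnings file (hypothetical helper)."""
    count = 0
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            # Assumption: every non-blank line is one entry.
            count = sum(1 for line in fh if line.strip())
    return f"LEARNINGS: {count} entries loaded"
```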

Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}} placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours, investigate, retro, and cso templates. Regenerated all SKILL.md files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
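For readers unfamiliar with the placeholder approach, here is a minimal sketch of how `{{NAME}}` markers in a template could be expanded by resolver functions. It is an illustration under assumptions, not the gen:skill-docs implementation: `render_template` and the resolver mapping are hypothetical names.

```python
import re

# Matches placeholders such as {{LEARNINGS_SEARCH}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_template(template: str, resolvers: dict) -> str:
    """Replace each {{NAME}} with the text returned by resolvers[NAME]()."""
    def substitute(match):
        name = match.group(1)
        if name not in resolvers:
            # Fail loudly so a typo in a template is caught at generation time.
            raise KeyError("no resolver for " + match.group(0))
        return resolvers[name]()
    return PLACEHOLDER.sub(substitute, template)
```

Keeping the shared instructions in resolvers means a change to the learnings wording is made once and picked up by every template on regeneration.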

New skill for reviewing, searching, pruning, and exporting what gstack has learned across sessions. Commands: /learn, /learn search, /learn prune, /learn export, /learn stats, /learn add. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony (v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound Engineering, adapted to GStack's architecture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Tests gstack-learnings-log (valid/invalid JSON, timestamp injection, append-only) and gstack-learnings-search (dedup, type/query/limit filters, confidence decay, user-stated no-decay, malformed JSONL skip). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… free Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp preservation, special characters, files array, sort order, type grouping, combined filtering, missing fields, confidence floor at 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
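The read-side behaviors exercised by these tests (latest winner per key+type, decay for observed/inferred entries, no decay for user-stated entries, skipping malformed lines, confidence floor at 0) can be sketched as follows. The field names `ts`, `source`, and `confidence` are assumptions, as are the function names; timestamps are assumed to be timezone-aware ISO 8601 strings.

```python
import json
from datetime import datetime

# Assumption: only these sources decay; user-stated entries keep their score.
DECAYING_SOURCES = {"observed", "inferred"}


def effective_confidence(entry: dict, now: datetime) -> int:
    """Subtract 1 point per full 30 days of age, never going below 0."""
    confidence = int(entry.get("confidence", 0))
    if entry.get("source") not in DECAYING_SOURCES:
        return confidence
    age_days = max(0, (now - datetime.fromisoformat(entry["ts"])).days)
    return max(0, confidence - age_days // 30)


def search(lines, now: datetime) -> list:
    """Keep the newest entry per (key, type) and apply decay at read time."""
    latest = {}
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed JSONL lines are skipped, not fatal
        if not isinstance(entry, dict) or "ts" not in entry:
            continue
        slot = (entry.get("key"), entry.get("type"))
        kept = latest.get(slot)
        if kept is None or (
            datetime.fromisoformat(entry["ts"]) >= datetime.fromisoformat(kept["ts"])
        ):
            latest[slot] = entry
    return [dict(e, confidence=effective_confidence(e, now)) for e in latest.values()]
```

Applying decay when reading keeps the log itself immutable: the stored confidence never changes, only the score a skill sees.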

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Main landed v0.13.4.0 (Sidebar Defense) while this branch also used v0.13.4.0 (GStack Learns). Resolved by bumping this branch to v0.13.5.0 and keeping both entries in chronological order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-29T15:52:51Z

E2E Evals: ✅ PASS

61/61 tests passed | $5.71 total cost | 12 parallel runners

Suite	Result	Status	Cost
e2e-browse	7/7	✅	$0.33
e2e-deploy	6/6	✅	$0.97
e2e-design	3/3	✅	$0.48
e2e-plan	7/7	✅	$1.07
e2e-qa-workflow	3/3	✅	$0.85
e2e-review	6/6	✅	$1.05
e2e-workflow	4/4	✅	$0.46
llm-judge	25/25	✅	$0.50

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite
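As a quick sanity check, the per-suite rows can be summed and compared with the headline figures. The numbers below are copied from the table; costs are held in cents to avoid floating-point rounding.

```python
# (tests passed, cost in cents) per suite, copied from the table above.
suites = {
    "e2e-browse": (7, 33),
    "e2e-deploy": (6, 97),
    "e2e-design": (3, 48),
    "e2e-plan": (7, 107),
    "e2e-qa-workflow": (3, 85),
    "e2e-review": (6, 105),
    "e2e-workflow": (4, 46),
    "llm-judge": (25, 50),
}

total_tests = sum(tests for tests, _ in suites.values())
total_cents = sum(cents for _, cents in suites.values())

print(f"{total_tests} tests, ${total_cents / 100:.2f}")  # prints "61 tests, $5.71"
```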

Main landed v0.13.5.0 (Factory Droid Compatibility) while this branch already had v0.13.5.0 (GStack Learns). Bumped to v0.13.6.0 and kept both entries in chronological order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Same pattern as .claude/skills/ and .agents/. These SKILL.md files are generated from .tmpl templates by gen:skill-docs --host factory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Seeds N+1 query pattern, stale cache pitfall, and rubocop preference into learnings.jsonl, then runs /learn and checks that at least 2/3 appear in the agent's output. Gate tier, ~$0.25/run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
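The pass condition described here, at least 2 of the 3 seeded learnings appearing in the agent's output, could be checked along these lines. This is a sketch only: the keyword lists and function names are hypothetical and the real E2E test may match differently.

```python
# Hypothetical keywords standing in for the three seeded learnings.
SEEDED_LEARNINGS = {
    "n-plus-one": ["n+1"],
    "stale-cache": ["stale cache"],
    "rubocop": ["rubocop"],
}


def count_surfaced(agent_output: str) -> int:
    """Count seeded learnings with at least one keyword in the output."""
    text = agent_output.lower()
    return sum(
        1
        for keywords in SEEDED_LEARNINGS.values()
        if any(keyword in text for keyword in keywords)
    )


def passes_gate(agent_output: str, minimum: int = 2) -> bool:
    """True when enough seeded learnings were surfaced."""
    return count_surfaced(agent_output) >= minimum
```

Requiring 2 of 3 rather than all 3 leaves some room for variation in how the agent phrases its summary.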

Main landed v0.13.5.1 (gitignore .factory) while this branch had v0.13.6.0 (GStack Learns). Kept v0.13.6.0 with both CHANGELOG entries. Resolved learn/SKILL.md rename/delete conflict (new file on our branch, git confused it with a .factory/ rename). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…t skill

Inspired by gstack's per-project learning system (garrytan/gstack#622). Adapts three key ideas to IronClaw's v2 engine:

Retrieval engine improvements (crates/ironclaw_engine/src/memory/retrieval.rs):
1. Confidence decay: older docs score lower in retrieval via exponential decay (half-life ~2 years). Specs and Lessons have a floor of 0.3 so hard-won knowledge never fully fades. Notes and Summaries can decay to zero, preventing stale scratch notes from polluting context.
2. Title-based dedup: when multiple docs share a title (corrections superseding old learnings), only the most recently updated survives at retrieval time. Append-only write + read-time dedup, no write coordination needed.
3. Both features compose: score = (keyword_relevance + type_weight) × decay.

New skill (skills/learn/SKILL.md):
- "show learnings": browse extracted skills, lessons, insights grouped by type
- "learning stats": counts, usage metrics, top performers
- "prune learnings": identify stale entries (unused skills, old intel)
- "export learnings": structured markdown export to context/learnings-export.md
- "add a lesson": manual learning entry
- "learning quality": assess which learnings are actually helping

Tests: 4 new tests for recency_factor and dedup_by_title. All 203 engine tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
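The scoring rule in this commit message can be worked through in a short Python sketch (the original is Rust). The names `recency_factor` and `dedup_by_title` come from the commit message; everything else is assumed: the half-life is taken as exactly 730 days, the floor is applied as a simple maximum, and the document field names are hypothetical.

```python
HALF_LIFE_DAYS = 730.0  # assumption: "~2 years" taken as 730 days

# Specs and Lessons keep a floor of 0.3; other types may decay to zero.
DECAY_FLOOR = {"spec": 0.3, "lesson": 0.3}


def recency_factor(age_days: float, doc_type: str) -> float:
    """Exponential decay with a per-type floor."""
    decay = 0.5 ** (max(age_days, 0.0) / HALF_LIFE_DAYS)
    return max(decay, DECAY_FLOOR.get(doc_type, 0.0))


def score(keyword_relevance: float, type_weight: float,
          age_days: float, doc_type: str) -> float:
    """score = (keyword_relevance + type_weight) x decay."""
    return (keyword_relevance + type_weight) * recency_factor(age_days, doc_type)


def dedup_by_title(docs: list) -> list:
    """Keep only the most recently updated doc for each title."""
    newest = {}
    for doc in docs:
        kept = newest.get(doc["title"])
        if kept is None or doc["updated_at"] > kept["updated_at"]:
            newest[doc["title"]] = doc
    return list(newest.values())
```

Under these assumptions a two-year-old Note scores half of what a fresh one would, while a twenty-year-old Lesson still keeps 30% of its score.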

….4.0) (garrytan#622)

* feat: learnings + confidence resolvers — cross-skill memory infrastructure

  Three new resolvers for the self-learning system:
  - LEARNINGS_SEARCH: tells skills to load prior learnings before analysis
  - LEARNINGS_LOG: tells skills to capture discoveries after completing work
  - CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: learnings bin scripts — append-only JSONL read/write

  gstack-learnings-log: validates JSON, auto-injects timestamp, appends to ~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation).

  gstack-learnings-search: reads/filters/dedupes learnings with confidence decay (observed/inferred lose 1pt/30d), cross-project discovery, and "latest winner" resolution per key+type.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: learnings count in preamble output

  Every skill now prints "LEARNINGS: N entries loaded" during preamble, making the compounding loop visible to the user.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate learnings + confidence into 9 skill templates

  Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}} placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours, investigate, retro, and cso templates. Regenerated all SKILL.md files.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /learn skill — manage project learnings

  New skill for reviewing, searching, pruning, and exporting what gstack has learned across sessions. Commands: /learn, /learn search, /learn prune, /learn export, /learn stats, /learn add.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: self-learning roadmap — 5-release design doc

  Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony (v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound Engineering, adapted to GStack's architecture.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: learnings bin script unit tests — 13 tests, free

  Tests gstack-learnings-log (valid/invalid JSON, timestamp injection, append-only) and gstack-learnings-search (dedup, type/query/limit filters, confidence decay, user-stated no-decay, malformed JSONL skip).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.4.0)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: learnings resolver + bin script edge case tests — 21 new tests, free

  Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp preservation, special characters, files array, sort order, type grouping, combined filtering, missing fields, confidence floor at 0.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sync package.json version with VERSION file (0.13.4.0)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: gitignore .factory/ — generated output, not source

  Same pattern as .claude/skills/ and .agents/. These SKILL.md files are generated from .tmpl templates by gen:skill-docs --host factory.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: /learn E2E — seed 3 learnings, verify agent surfaces them

  Seeds N+1 query pattern, stale cache pitfall, and rubocop preference into learnings.jsonl, then runs /learn and checks that at least 2/3 appear in the agent's output. Gate tier, ~$0.25/run.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan and others added 11 commits March 28, 2026 22:52

feat: learnings count in preamble output

ef487a9

Every skill now prints "LEARNINGS: N entries loaded" during preamble, making the compounding loop visible to the user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: bump version and changelog (v0.13.4.0)

a910e9d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: sync package.json version with VERSION file (0.13.4.0)

1e3f5c4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan and others added 4 commits March 29, 2026 13:19

chore: gitignore .factory/ — generated output, not source

04d4baf

Same pattern as .claude/skills/ and .agents/. These SKILL.md files are generated from .tmpl templates by gen:skill-docs --host factory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

garrytan merged commit ae0a9ad into main Mar 29, 2026
18 checks passed

ilblackdragon mentioned this pull request Mar 30, 2026

feat(engine): learning system improvements — confidence decay, dedup, /learn skill nearai/ironclaw#1751

Draft

5 tasks

This was referenced May 27, 2026

gstack-learnings-search cross-project trust gate fails open for rows missing the trusted field #1745

Closed

fix(learnings): fail closed when cross-project learning lacks trusted field #1746

Closed

Pablosinyores mentioned this pull request May 27, 2026

fix(security): cross-project trust gate must fail closed in gstack-learnings-search (#1745) #1749

Closed

3 tasks
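The follow-up issues above concern a cross-project trust gate that fails open when a row has no trusted field. A fail-closed version of such a check might look like the sketch below. Only the existence of a `trusted` field comes from the issue titles; the `project` field, the function names, and the rule that a project's own rows are always visible are assumptions.

```python
def is_trusted(row: dict) -> bool:
    """Fail closed: a missing or non-boolean "trusted" value counts as untrusted."""
    return row.get("trusted") is True


def visible_rows(rows: list, current_project: str) -> list:
    """Keep the current project's rows plus explicitly trusted foreign rows.

    Assumption: each row carries a "project" field naming its origin.
    """
    return [
        row for row in rows
        if row.get("project") == current_project or is_trusted(row)
    ]
```

Comparing with `is True` instead of testing truthiness means a missing field, `None`, or a string such as "yes" are all rejected.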

garrytan merged 15 commits into main from garrytan/ce-features
