feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0)#622
Merged
Conversation
…cture Three new resolvers for the self-learning system: - LEARNINGS_SEARCH: tells skills to load prior learnings before analysis - LEARNINGS_LOG: tells skills to capture discoveries after completing work - CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gstack-learnings-log: validates JSON, auto-injects timestamp, appends to ~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation). gstack-learnings-search: reads/filters/dedupes learnings with confidence decay (observed/inferred lose 1pt/30d), cross-project discovery, and "latest winner" resolution per key+type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every skill now prints "LEARNINGS: N entries loaded" during preamble, making the compounding loop visible to the user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}}
placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours,
investigate, retro, and cso templates. Regenerated all SKILL.md files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New skill for reviewing, searching, pruning, and exporting what gstack has learned across sessions. Commands: /learn, /learn search, /learn prune, /learn export, /learn stats, /learn add. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony (v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound Engineering, adapted to GStack's architecture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests gstack-learnings-log (valid/invalid JSON, timestamp injection, append-only) and gstack-learnings-search (dedup, type/query/limit filters, confidence decay, user-stated no-decay, malformed JSONL skip). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… free Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp preservation, special characters, files array, sort order, type grouping, combined filtering, missing fields, confidence floor at 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Main landed v0.13.4.0 (Sidebar Defense) while this branch also used v0.13.4.0 (GStack Learns). Resolved by bumping this branch to v0.13.5.0 and keeping both entries in chronological order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
E2E Evals: ✅ PASS61/61 tests passed | $5.71 total cost | 12 parallel runners
12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite |
Main landed v0.13.5.0 (Factory Droid Compatibility) while this branch already had v0.13.5.0 (GStack Learns). Bumped to v0.13.6.0 and kept both entries in chronological order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same pattern as .claude/skills/ and .agents/. These SKILL.md files are generated from .tmpl templates by gen:skill-docs --host factory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seeds N+1 query pattern, stale cache pitfall, and rubocop preference into learnings.jsonl, then runs /learn and checks that at least 2/3 appear in the agent's output. Gate tier, ~$0.25/run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Main landed v0.13.5.1 (gitignore .factory) while this branch had v0.13.6.0 (GStack Learns). Kept v0.13.6.0 with both CHANGELOG entries. Resolved learn/SKILL.md rename/delete conflict (new file on our branch, git confused it with a .factory/ rename). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ilblackdragon
added a commit
to nearai/ironclaw
that referenced
this pull request
Mar 30, 2026
…t skill Inspired by gstack's per-project learning system (garrytan/gstack#622). Adapts three key ideas to IronClaw's v2 engine: Retrieval engine improvements (crates/ironclaw_engine/src/memory/retrieval.rs): 1. Confidence decay — older docs score lower in retrieval via exponential decay (half-life ~2 years). Specs and Lessons have a floor of 0.3 so hard-won knowledge never fully fades. Notes and Summaries can decay to zero, preventing stale scratch notes from polluting context. 2. Title-based dedup — when multiple docs share a title (corrections superseding old learnings), only the most recently updated survives at retrieval time. Append-only write + read-time dedup, no write coordination needed. 3. Both features compose: score = (keyword_relevance + type_weight) × decay. New skill (skills/learn/SKILL.md): - "show learnings" — browse extracted skills, lessons, insights grouped by type - "learning stats" — counts, usage metrics, top performers - "prune learnings" — identify stale entries (unused skills, old intel) - "export learnings" — structured markdown export to context/learnings-export.md - "add a lesson" — manual learning entry - "learning quality" — assess which learnings are actually helping Tests: 4 new tests for recency_factor and dedup_by_title. All 203 engine tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5 tasks
This was referenced May 27, 2026
gstack-learnings-search cross-project trust gate fails open for rows missing the
trusted field
#1745
Closed
Closed
3 tasks
mathiasmora2232
pushed a commit
to mathiasmora2232/gstack
that referenced
this pull request
May 30, 2026
….4.0) (garrytan#622) * feat: learnings + confidence resolvers — cross-skill memory infrastructure Three new resolvers for the self-learning system: - LEARNINGS_SEARCH: tells skills to load prior learnings before analysis - LEARNINGS_LOG: tells skills to capture discoveries after completing work - CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: learnings bin scripts — append-only JSONL read/write gstack-learnings-log: validates JSON, auto-injects timestamp, appends to ~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation). gstack-learnings-search: reads/filters/dedupes learnings with confidence decay (observed/inferred lose 1pt/30d), cross-project discovery, and "latest winner" resolution per key+type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: learnings count in preamble output Every skill now prints "LEARNINGS: N entries loaded" during preamble, making the compounding loop visible to the user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: integrate learnings + confidence into 9 skill templates Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}} placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours, investigate, retro, and cso templates. Regenerated all SKILL.md files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /learn skill — manage project learnings New skill for reviewing, searching, pruning, and exporting what gstack has learned across sessions. Commands: /learn, /learn search, /learn prune, /learn export, /learn stats, /learn add. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: self-learning roadmap — 5-release design doc Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony (v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound Engineering, adapted to GStack's architecture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: learnings bin script unit tests — 13 tests, free Tests gstack-learnings-log (valid/invalid JSON, timestamp injection, append-only) and gstack-learnings-search (dedup, type/query/limit filters, confidence decay, user-stated no-decay, malformed JSONL skip). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.4.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: learnings resolver + bin script edge case tests — 21 new tests, free Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp preservation, special characters, files array, sort order, type grouping, combined filtering, missing fields, confidence floor at 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sync package.json version with VERSION file (0.13.4.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: gitignore .factory/ — generated output, not source Same pattern as .claude/skills/ and .agents/. These SKILL.md files are generated from .tmpl templates by gen:skill-docs --host factory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: /learn E2E — seed 3 learnings, verify agent surfaces them Seeds N+1 query pattern, stale cache pitfall, and rubocop preference into learnings.jsonl, then runs /learn and checks that at least 2/3 appear in the agent's output. Gate tier, ~$0.25/run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Every session now makes the next one smarter. Per-project institutional knowledge that compounds across skills.
New: learnings persistence, /learn skill, confidence calibration, cross-project discovery, confidence decay, learnings count in preamble.
Infrastructure: 2 bin scripts, 2 resolver modules, 9 template integrations, 1 design doc.
Test Coverage
601 tests pass, 0 fail. 13 new unit tests for bin scripts.
Pre-Landing Review
CEO + Eng Review CLEARED. 2 Codex outside voice runs (4 findings accepted).
Test plan
🤖 Generated with Claude Code