Skip to content

feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0)#622

Merged
garrytan merged 15 commits into
mainfrom
garrytan/ce-features
Mar 29, 2026
Merged

feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0)#622
garrytan merged 15 commits into
mainfrom
garrytan/ce-features

Conversation

@garrytan

Copy link
Copy Markdown
Owner

Summary

Every session now makes the next one smarter. Per-project institutional knowledge that compounds across skills.

New: learnings persistence, /learn skill, confidence calibration, cross-project discovery, confidence decay, learnings count in preamble.

Infrastructure: 2 bin scripts, 2 resolver modules, 9 template integrations, 1 design doc.

Test Coverage

601 tests pass, 0 fail. 13 new unit tests for bin scripts.

Pre-Landing Review

CEO + Eng Review CLEARED. 2 Codex outside voice runs (4 findings accepted).

Test plan

  • bun test — 601 pass, 0 fail
  • Manual bin script verification
  • gen:skill-docs freshness check

🤖 Generated with Claude Code

garrytan and others added 11 commits March 28, 2026 22:52
…cture

Three new resolvers for the self-learning system:
- LEARNINGS_SEARCH: tells skills to load prior learnings before analysis
- LEARNINGS_LOG: tells skills to capture discoveries after completing work
- CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gstack-learnings-log: validates JSON, auto-injects timestamp, appends to
~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation).

gstack-learnings-search: reads/filters/dedupes learnings with confidence
decay (observed/inferred lose 1pt/30d), cross-project discovery, and
"latest winner" resolution per key+type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every skill now prints "LEARNINGS: N entries loaded" during preamble,
making the compounding loop visible to the user.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}}
placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours,
investigate, retro, and cso templates. Regenerated all SKILL.md files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New skill for reviewing, searching, pruning, and exporting what gstack
has learned across sessions. Commands: /learn, /learn search, /learn prune,
/learn export, /learn stats, /learn add.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony
(v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound
Engineering, adapted to GStack's architecture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests gstack-learnings-log (valid/invalid JSON, timestamp injection,
append-only) and gstack-learnings-search (dedup, type/query/limit filters,
confidence decay, user-stated no-decay, malformed JSONL skip).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… free

Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and
CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp
preservation, special characters, files array, sort order, type grouping,
combined filtering, missing fields, confidence floor at 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Main landed v0.13.4.0 (Sidebar Defense) while this branch also used
v0.13.4.0 (GStack Learns). Resolved by bumping this branch to v0.13.5.0
and keeping both entries in chronological order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Mar 29, 2026

Copy link
Copy Markdown

E2E Evals: ✅ PASS

61/61 tests passed | $5.71 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 7/7 $0.33
e2e-deploy 6/6 $0.97
e2e-design 3/3 $0.48
e2e-plan 7/7 $1.07
e2e-qa-workflow 3/3 $0.85
e2e-review 6/6 $1.05
e2e-workflow 4/4 $0.46
llm-judge 25/25 $0.5

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

garrytan and others added 4 commits March 29, 2026 13:19
Main landed v0.13.5.0 (Factory Droid Compatibility) while this branch
already had v0.13.5.0 (GStack Learns). Bumped to v0.13.6.0 and kept
both entries in chronological order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same pattern as .claude/skills/ and .agents/. These SKILL.md files are
generated from .tmpl templates by gen:skill-docs --host factory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Seeds N+1 query pattern, stale cache pitfall, and rubocop preference
into learnings.jsonl, then runs /learn and checks that at least 2/3
appear in the agent's output. Gate tier, ~$0.25/run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Main landed v0.13.5.1 (gitignore .factory) while this branch had
v0.13.6.0 (GStack Learns). Kept v0.13.6.0 with both CHANGELOG entries.
Resolved learn/SKILL.md rename/delete conflict (new file on our branch,
git confused it with a .factory/ rename).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit ae0a9ad into main Mar 29, 2026
18 checks passed
ilblackdragon added a commit to nearai/ironclaw that referenced this pull request Mar 30, 2026
…t skill

Inspired by gstack's per-project learning system (garrytan/gstack#622).
Adapts three key ideas to IronClaw's v2 engine:

Retrieval engine improvements (crates/ironclaw_engine/src/memory/retrieval.rs):

1. Confidence decay — older docs score lower in retrieval via exponential
   decay (half-life ~2 years). Specs and Lessons have a floor of 0.3 so
   hard-won knowledge never fully fades. Notes and Summaries can decay
   to zero, preventing stale scratch notes from polluting context.

2. Title-based dedup — when multiple docs share a title (corrections
   superseding old learnings), only the most recently updated survives
   at retrieval time. Append-only write + read-time dedup, no write
   coordination needed.

3. Both features compose: score = (keyword_relevance + type_weight) × decay.

New skill (skills/learn/SKILL.md):
- "show learnings" — browse extracted skills, lessons, insights grouped by type
- "learning stats" — counts, usage metrics, top performers
- "prune learnings" — identify stale entries (unused skills, old intel)
- "export learnings" — structured markdown export to context/learnings-export.md
- "add a lesson" — manual learning entry
- "learning quality" — assess which learnings are actually helping

Tests: 4 new tests for recency_factor and dedup_by_title. All 203 engine
tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mathiasmora2232 pushed a commit to mathiasmora2232/gstack that referenced this pull request May 30, 2026
….4.0) (garrytan#622)

* feat: learnings + confidence resolvers — cross-skill memory infrastructure

Three new resolvers for the self-learning system:
- LEARNINGS_SEARCH: tells skills to load prior learnings before analysis
- LEARNINGS_LOG: tells skills to capture discoveries after completing work
- CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: learnings bin scripts — append-only JSONL read/write

gstack-learnings-log: validates JSON, auto-injects timestamp, appends to
~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation).

gstack-learnings-search: reads/filters/dedupes learnings with confidence
decay (observed/inferred lose 1pt/30d), cross-project discovery, and
"latest winner" resolution per key+type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: learnings count in preamble output

Every skill now prints "LEARNINGS: N entries loaded" during preamble,
making the compounding loop visible to the user.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate learnings + confidence into 9 skill templates

Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}}
placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours,
investigate, retro, and cso templates. Regenerated all SKILL.md files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /learn skill — manage project learnings

New skill for reviewing, searching, pruning, and exporting what gstack
has learned across sessions. Commands: /learn, /learn search, /learn prune,
/learn export, /learn stats, /learn add.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: self-learning roadmap — 5-release design doc

Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony
(v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound
Engineering, adapted to GStack's architecture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: learnings bin script unit tests — 13 tests, free

Tests gstack-learnings-log (valid/invalid JSON, timestamp injection,
append-only) and gstack-learnings-search (dedup, type/query/limit filters,
confidence decay, user-stated no-decay, malformed JSONL skip).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.4.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: learnings resolver + bin script edge case tests — 21 new tests, free

Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and
CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp
preservation, special characters, files array, sort order, type grouping,
combined filtering, missing fields, confidence floor at 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sync package.json version with VERSION file (0.13.4.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: gitignore .factory/ — generated output, not source

Same pattern as .claude/skills/ and .agents/. These SKILL.md files are
generated from .tmpl templates by gen:skill-docs --host factory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: /learn E2E — seed 3 learnings, verify agent surfaces them

Seeds N+1 query pattern, stale cache pitfall, and rubocop preference
into learnings.jsonl, then runs /learn and checks that at least 2/3
appear in the agent's output. Gate tier, ~$0.25/run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant