Hypothesis
We believe recurring patterns in feedback memory + recent session transcripts can be mechanically surfaced into actionable framework changes (rule edits, hook tightenings, new skill drafts).
We will know we're right when the first dry-run report on real session history surfaces ≥3 patterns the operator confirms are real and worth acting on (vs being noise, one-offs, or things already known).
Budget
2 days of one engineer. At the cap, the spike ENDS — either the dry-run report is good enough to justify a follow-up [Feature] for the full --propose mode, or the report is mostly noise and we write a memo and move on.
Kill Criteria
- Answered YES: dry-run report surfaces ≥3 confirmed-useful patterns from real session history; operator agrees these would be worth acting on. Promote to [Feature] for the full --propose loop (suppression ledger, threshold tuning, PR proposal flow).
- Answered NO: report is dominated by noise, one-offs, or patterns already captured in MEMORY.md. Signal-to-noise too low to justify the full loop. Discard with a memo.
- Out of budget: 2 days elapsed without a clear yes/no signal. Treat as NO; write the memo with what was learned about why the signal was unclear.
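The three criteria above can be read as a small decision rule. The sketch below is one possible encoding, not part of the spike itself: the names `Outcome` and `evaluate_spike` are hypothetical, and it assumes that any reviewed report with fewer than 3 confirmed patterns counts as NO.

```python
from enum import Enum


class Outcome(Enum):
    PROMOTE = "promote"    # file the follow-up [Feature]
    DISCARD = "discard"    # write the memo and move on
    CONTINUE = "continue"  # still inside the budget, no verdict yet


MIN_CONFIRMED = 3  # ">=3 confirmed-useful patterns" from the criteria
BUDGET_DAYS = 2.0  # "2 days of one engineer"


def evaluate_spike(confirmed_useful, days_elapsed):
    """Map the operator's review to an outcome.

    confirmed_useful is the number of patterns the operator confirmed,
    or None if no report has been reviewed yet.
    """
    if confirmed_useful is not None:
        # Assumption: a reviewed report below the threshold is a NO.
        if confirmed_useful >= MIN_CONFIRMED:
            return Outcome.PROMOTE
        return Outcome.DISCARD
    if days_elapsed >= BUDGET_DAYS:
        return Outcome.DISCARD  # out of budget is treated as NO
    return Outcome.CONTINUE
```

For example, `evaluate_spike(None, 2.0)` yields `Outcome.DISCARD`, matching the "treat as NO" rule for an exhausted budget.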
Disposition
PROMOTE on YES → file a fresh [Feature] for the full /learn skill (--propose mode opening /task tickets, suppression ledger at .claude/session/learn-rejected.json, threshold tuning, choice between framework-PR and adopter-config-only proposals).
DISCARD on NO → write docs/spike-memos/learn-skill-feasibility.md covering: what patterns the dry-run found, why the signal was insufficient, what would have to change about the inputs (memory density, transcript filtering, etc.) to make this viable.
Approach
Smallest test that answers the hypothesis:
- Read all *.md files under ~/.claude/projects/-Users-ahmed-Projects-apexstack/memory/: parse YAML frontmatter, extract Why: / How to apply: lines.
- Walk the last ~30 days of session JSONL files under ~/.claude/projects/.../sessions/: filter for hook blocks, user "no don't" / "stop" / corrections, missed ▸ Activating markers.
- Bundle the inputs + the current .claude/rules/*.md and hand to an LLM with the prompt: "What patterns recur ≥3 times that suggest a rule edit, hook tightening, or new skill?"
- Output: a markdown report (no writes, no PRs). Operator reads and judges.
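The first two steps above might look roughly like the following stdlib-only sketch. It rests on several assumptions: frontmatter is flat `key: value` pairs (anything nested would need a real YAML parser), file modification time stands in for session date, and each JSONL line is an object with `role` and `text` fields. That transcript schema and every function name here are hypothetical; the real session format may differ. Hook blocks and missed ▸ Activating markers are left out because their on-disk shape is not described here.

```python
import json
import re
import time
from pathlib import Path

# The text names "no don't" and "stop" as correction cues; the exact
# matching rule below is an assumption.
CORRECTION_RE = re.compile(r"\b(no,? don'?t|stop)\b", re.IGNORECASE)


def parse_memory_entry(text):
    """Split one memory file into frontmatter fields and Why / How to apply lines."""
    meta, body = {}, text
    if text.startswith("---\n"):
        header, sep, rest = text[4:].partition("\n---")
        if sep:  # closing delimiter found
            body = rest
            for line in header.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
    entry = {"meta": meta, "why": [], "how_to_apply": []}
    for line in body.splitlines():
        # Tolerate list bullets and bold markers around the labels.
        stripped = line.strip().lstrip("-* ").replace("**", "")
        if stripped.startswith("Why:"):
            entry["why"].append(stripped[len("Why:"):].strip())
        elif stripped.startswith("How to apply:"):
            entry["how_to_apply"].append(stripped[len("How to apply:"):].strip())
    return entry


def recent_files(directory, pattern, max_age_days=30, now=None):
    """Return files matching pattern modified within the last max_age_days."""
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    return sorted(p for p in Path(directory).glob(pattern)
                  if p.stat().st_mtime >= cutoff)


def find_corrections(jsonl_lines):
    """Yield user messages that look like corrections (hypothetical schema)."""
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if not isinstance(event, dict):
            continue
        text = event.get("text")
        if (event.get("role") == "user" and isinstance(text, str)
                and CORRECTION_RE.search(text)):
            yield text
```

A driver would call `recent_files(sessions_dir, "*.jsonl")`, feed each file's lines to `find_corrections`, and pass the collected corrections plus the parsed memory entries into the LLM prompt. Keeping the filters as plain functions makes it cheap to inspect what reaches the model before judging the report.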
Fail-fast boundary: if the first run produces zero patterns, or only obvious, already-known ones, kill immediately; there is no need to spend the second day.
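As a rough illustration of that check, the sketch below compares the patterns from the first run against ones the operator already knows. The function name `should_kill_early` is hypothetical, and matching patterns by normalized label is an assumption; in practice the operator's judgment decides what counts as already known.

```python
def should_kill_early(found, already_known):
    """Return True when the first run found nothing new.

    found: pattern labels from the first dry-run report.
    already_known: labels the operator considers obvious or already captured.
    """
    known = {label.strip().casefold() for label in already_known}
    novel = [label for label in found if label.strip().casefold() not in known]
    return not novel  # empty report, or everything was already known
```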
Inspiration
Question raised in the live "FROM SENIOR DEV TO ONE-MAN ARMY" stream: "Do you have a pipeline where agents are constantly learning from mistakes — some sort of prompt tuning without filling the context with too many rules?"
The framework today has the input side (memory/ captures user corrections in plain markdown) and the output side (skill / rule / hook files), but nothing in the middle that propagates memory entries into structural updates. This spike tests whether the middle is buildable cheaply.