[Spike] /learn — auto-surface framework drift from feedback memory + session transcripts

## Hypothesis

We believe recurring patterns in feedback memory + recent session transcripts can be mechanically surfaced into actionable framework changes (rule edits, hook tightenings, new skill drafts).

We will know we're right when the first dry-run report on real session history surfaces ≥3 patterns the operator confirms are real and worth acting on (vs being noise, one-offs, or things already known).

## Budget

2 days of one engineer. At the cap, the spike ENDS — either the dry-run report is good enough to justify a follow-up `[Feature]` for the full `--propose` mode, or the report is mostly noise and we write a memo and move on.

## Kill Criteria

- **Answered YES** — dry-run report surfaces ≥3 confirmed-useful patterns from real session history; operator agrees these would be worth acting on. Promote to `[Feature]` for the full `--propose` loop (suppression ledger, threshold tuning, PR proposal flow).
- **Answered NO** — report is dominated by noise, one-offs, or patterns already captured in `MEMORY.md`. Signal-to-noise too low to justify the full loop. Discard with a memo.
- **Out of budget** — 2 days elapsed without a clear yes/no signal. Treat as NO; write the memo with what was learned about why the signal was unclear.

## Disposition

**PROMOTE on YES** → file a fresh `[Feature]` for the full `/learn` skill (`--propose` mode opening `/task` tickets, suppression ledger at `.claude/session/learn-rejected.json`, threshold tuning, choice between framework-PR and adopter-config-only proposals).

**DISCARD on NO** → write `docs/spike-memos/learn-skill-feasibility.md` covering: what patterns the dry-run found, why the signal was insufficient, what would have to change about the inputs (memory density, transcript filtering, etc.) to make this viable.

## Approach

Smallest test that answers the hypothesis:

- Read all `*.md` files under `~/.claude/projects/-Users-ahmed-Projects-apexstack/memory/` — parse YAML frontmatter, extract `Why:` / `How to apply:` lines.
- Walk the last ~30 days of session JSONL files under `~/.claude/projects/.../sessions/` — filter for hook blocks, user "no don't" / "stop" / corrections, missed `▸ Activating` markers.
- Bundle the inputs + the current `.claude/rules/*.md` and hand to an LLM with the prompt: *"What patterns recur ≥3 times that suggest a rule edit, hook tightening, or new skill?"*
- Output: a markdown report (no writes, no PRs). Operator reads and judges.

**Boundary to fail-fast**: if the first run produces zero patterns or only obvious-already-known ones, kill immediately — no need to spend the second day.

## Inspiration

Question raised in the live "FROM SENIOR DEV TO ONE-MAN ARMY" stream: *"Do you have a pipeline where agents are constantly learning from mistakes — some sort of prompt tuning without filling the context with too many rules?"*

The framework today has the input side (`memory/` captures user corrections in plain markdown) and the output side (skill / rule / hook files), but nothing in the middle that **propagates** memory entries into structural updates. This spike tests whether the middle is buildable cheaply.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Spike] /learn — auto-surface framework drift from feedback memory + session transcripts #241

Hypothesis

Budget

Kill Criteria

Disposition

Approach

Inspiration

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Spike] /learn — auto-surface framework drift from feedback memory + session transcripts #241

Description

Hypothesis

Budget

Kill Criteria

Disposition

Approach

Inspiration

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions