Skip to content

[Spike] /learn — auto-surface framework drift from feedback memory + session transcripts #241

@atlas-apex

Description

@atlas-apex

Hypothesis

We believe recurring patterns in feedback memory + recent session transcripts can be mechanically surfaced into actionable framework changes (rule edits, hook tightenings, new skill drafts).

We will know we're right when the first dry-run report on real session history surfaces ≥3 patterns the operator confirms are real and worth acting on (vs being noise, one-offs, or things already known).

Budget

2 days of one engineer. At the cap, the spike ENDS — either the dry-run report is good enough to justify a follow-up [Feature] for the full --propose mode, or the report is mostly noise and we write a memo and move on.

Kill Criteria

  • Answered YES — dry-run report surfaces ≥3 confirmed-useful patterns from real session history; operator agrees these would be worth acting on. Promote to [Feature] for the full --propose loop (suppression ledger, threshold tuning, PR proposal flow).
  • Answered NO — report is dominated by noise, one-offs, or patterns already captured in MEMORY.md. Signal-to-noise too low to justify the full loop. Discard with a memo.
  • Out of budget — 2 days elapsed without a clear yes/no signal. Treat as NO; write the memo with what was learned about why the signal was unclear.

Disposition

PROMOTE on YES → file a fresh [Feature] for the full /learn skill (--propose mode opening /task tickets, suppression ledger at .claude/session/learn-rejected.json, threshold tuning, choice between framework-PR and adopter-config-only proposals).

DISCARD on NO → write docs/spike-memos/learn-skill-feasibility.md covering: what patterns the dry-run found, why the signal was insufficient, what would have to change about the inputs (memory density, transcript filtering, etc.) to make this viable.

Approach

Smallest test that answers the hypothesis:

  • Read all *.md files under ~/.claude/projects/-Users-ahmed-Projects-apexstack/memory/ — parse YAML frontmatter, extract Why: / How to apply: lines.
  • Walk the last ~30 days of session JSONL files under ~/.claude/projects/.../sessions/ — filter for hook blocks, user "no don't" / "stop" / corrections, missed ▸ Activating markers.
  • Bundle the inputs + the current .claude/rules/*.md and hand to an LLM with the prompt: "What patterns recur ≥3 times that suggest a rule edit, hook tightening, or new skill?"
  • Output: a markdown report (no writes, no PRs). Operator reads and judges.

Boundary to fail-fast: if the first run produces zero patterns or only obvious-already-known ones, kill immediately — no need to spend the second day.

Inspiration

Question raised in the live "FROM SENIOR DEV TO ONE-MAN ARMY" stream: "Do you have a pipeline where agents are constantly learning from mistakes — some sort of prompt tuning without filling the context with too many rules?"

The framework today has the input side (memory/ captures user corrections in plain markdown) and the output side (skill / rule / hook files), but nothing in the middle that propagates memory entries into structural updates. This spike tests whether the middle is buildable cheaply.

Metadata

Metadata

Assignees

No one assigned

    Labels

    spikeThrow-away exploration; exempt from AgDR + coverage gates

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions