Envelope D2: provenance classifier (Qwen3 substrate) + Pass-3 regex by jpwinans · Pull Request #5 · jpwinans/mempalace

jpwinans · 2026-05-11T22:05:23Z

Summary

Envelope D2 from 2026-05-11 paired build (Task MemPalace#15, Phase 1). Wires the production local-substrate classifier (Qwen3-Coder-30B at mlx_lm.server :8802) for provenance candidate validation. D1 (PR #4) shipped the heuristic + stub; D2 ships the production classifier + calibration proof + Pass-3 regex extension for cases the heuristic was missing.

Changes

Package conversion

mempalace/provenance.py → mempalace/provenance/__init__.py (via git mv, history preserved). Public import path unchanged — mempalace.provenance.extract_candidates / validate_candidate / ProvenanceCandidate / ProvenanceRecord / WING_LINEAGE_SCHEMA_DOC all still resolve.

`mempalace.provenance.classifier` (new module)

qwen3_classifier(context) -> dict — production classifier. POSTs to /v1/chat/completions with a strict JSON-output prompt; parses response; handles markdown-fence stripping; returns the dict shape validate_candidate expects.
Failure-soft: network / HTTP / decode / shape errors return the rejection dict {is_provenance: False, confidence: 0.0} rather than raising. Mining pipeline never crashes when substrate is unavailable.
Env overrides: MEMPALACE_PROVENANCE_CLASSIFIER_URL, _MODEL, _TIMEOUT.
temperature=0 for reproducibility across mining batches.

Pass-3 in `extract_candidates`

Capitalized bare-relation as subject + attribution + quote. Catches calibration fixture MemPalace#14 ("Dad always told me 'never trust a smiling investor'") that Pass-1 (requires possessive prefix) and Pass-2 (also requires possessive) miss. Capitalize-only constraint keeps false-positives manageable; classifier filters lowercase mid-sentence cases via context.

Calibration result

14 hand-labeled fixtures from architect envelope. Target: precision ≥ 0.85, recall ≥ 0.85.

Metric	Value
Precision	1.000
Recall	1.000
TP / FP / TN / FN	8 / 0 / 6 / 0
Confidence range (positives)	0.90 – 0.95
Confidence range (negatives)	0.00
Avg latency / call	~0.9 s

Clean separation between positives and negatives — no threshold ambiguity. Recalibrate if either metric drifts below 0.85 on a future model swap.

Tests (49 total, 49 pass in 17s)

test_provenance.py — 26 existing + 4 new Pass-3 (dad-always-told, mom-used-to-say, roshi-taught, capitalization-required-negative).
test_classifier.py — 18 unit tests with mocked urllib.request.urlopen. Happy path, code-fence stripping (both ```json and bare ```), request-shape (model id, temp=0, max_tokens), 6 failure-soft paths (URLError, HTTPError, TimeoutError, malformed outer JSON, missing choices, malformed inner JSON, missing is_provenance key, non-dict inner), confidence coercion (string→0, >1 clamp, <0 clamp), env-var overrides.
test_classifier_calibration.py — 1 live-substrate test against the pinned 14-fixture set. Auto-skipped when substrate unreachable (HEAD probe on /v1/models). Asserts precision and recall ≥ 0.85.

Scope honored

validate_candidate(classifier=None) default still uses the D1 stub — test path unchanged for downstream consumers.
Production paths explicitly pass qwen3_classifier. The D3 envelope wires this into mempalace.miner.convo_miner.
No changes outside mempalace/provenance/ + tests/. No new dependencies (stdlib urllib.request).

Discipline

Branch base: jpwinans/mempalace main (e9085e3, post-PR Envelope D1: provenance module (heuristic + classifier interface) #4 merge).
Fresh worktree ~/mempalace-worktrees/classifier.
PR targets jpwinans/mempalace (used -R flag — calibrated from yesterday's misfire on PR layers: add read_diary public API MemPalace/mempalace#1464).
Cross-coder LGTM requested via hearing channel.

ENVELOPE D2 from 2026-05-11 paired build (Task MemPalace#15, Phase 1). Wires the real local-substrate classifier (Qwen3-Coder-30B at mlx_lm.server :8802) for provenance candidate validation. D1 shipped the heuristic + stub; D2 ships the production classifier + calibration proof. Changes: - Convert mempalace/provenance.py to mempalace/provenance/ package via git mv to __init__.py. Public import path unchanged (mempalace.provenance.extract_candidates / validate_candidate / ProvenanceCandidate / ProvenanceRecord still resolve). - Add mempalace.provenance.classifier module with: - qwen3_classifier(context) -> dict matching the validate_candidate classifier interface - urllib-based POST to /v1/chat/completions (mirrors closet_llm pattern in this repo) - Strict JSON prompt template with rules: vague "she said" rejected, operational content rejected, person-mention without attribution rejected, conservative when in doubt - Failure-soft: network/HTTP/decode/shape errors return the rejection dict {is_provenance: False, confidence: 0.0} rather than raising — mining pipeline never crashes on unavailable substrate - Env overrides: MEMPALACE_PROVENANCE_CLASSIFIER_URL, _MODEL, _TIMEOUT - temperature=0 for reproducibility across mining batches - Markdown code-fence stripping for ```json...``` wrapped outputs the model occasionally emits - Add Pass-3 to extract_candidates: capitalized bare-relation as subject + attribution + quote. Catches calibration fixture MemPalace#14 ("Dad always told me 'never trust a smiling investor'") which Pass-1 (requires possessive prefix) and Pass-2 (also requires possessive) miss. Capitalize-only constraint keeps false- positives manageable; classifier filters lowercase mid-sentence cases via context. Calibration (live mlx_lm.server, 2026-05-11): 14 hand-labeled fixtures from architect envelope. Target: precision >= 0.85, recall >= 0.85. Result: precision = 1.000, recall = 1.000. TP=8 FP=0 TN=6 FN=0. Confidence range: 0.90-0.95 on positives, 0.00 on negatives — clean separation, no calibration ambiguity. Average latency: 0.9s per call. Tests (49 total, 49 pass in 17s): - tests/test_provenance.py: 26 existing + 4 new Pass-3 tests (dad-always-told-me variant, mom-used-to-say, roshi-taught-me, capitalization-required-negative). - tests/test_classifier.py: 18 unit tests covering happy-path, code-fence stripping (both ```json``` and bare ```), request-shape validation (model id, temp=0, max_tokens), all six failure-soft paths (URLError, HTTPError, TimeoutError, malformed outer JSON, missing choices, malformed inner JSON, missing is_provenance key, non-dict inner), confidence coercion (string->0, >1 clamped, <0 clamped), env-var overrides for endpoint + model. - tests/test_classifier_calibration.py: 1 live-substrate test against the pinned 14-fixture set. Auto-skipped when substrate unreachable (HEAD probe on /v1/models). Asserts precision >= 0.85 and recall >= 0.85. Scope honored: validate_candidate's classifier=None default still uses the stub (test path unchanged); production callers explicitly pass qwen3_classifier. D3 mining integration is the next envelope.

…iner (#6) ENVELOPE D3 from 2026-05-11 paired build (Task MemPalace#15, Phase 1 final). Wires extract_candidates + qwen3_classifier into mempalace.convo_miner so new diary mining produces wing_lineage drawers in addition to the operational wing. Phase 1 of Task MemPalace#15 closes with this PR — Phase 2 (60k existing-drawer backfill) is its own scoping task. Changes: - New mempalace/provenance/mining.py with mine_chunk_for_provenance: take a chunk, run extract_candidates -> validate with classifier (default: qwen3_classifier from D2) -> rewrite transitive attributions -> dedupe -> upsert into wing_lineage. - Transitive-attribution rewrite (architect-flagged from D2 calibration case MemPalace#11): when classifier returns speaker name (e.g., "James") for text containing "<possessive> <relation>'s" (e.g., "his father's saying"), redirect to room=<relation> (e.g., "father"). Without rewrite, "Tonight James reminded me: 'measure twice' — his father's saying" files under room='james' and a future search for "father saying" misses it. - Dedup by (person, quote, source_file) hash baked into the drawer_id. Re-mining same source -> existing drawer; same attribution in different source files -> distinct drawers (intentional — distinct attribution events tracked separately). - MEMPALACE_PROVENANCE_DISABLED env var (truthy: 1/true/yes, case-insensitive) makes mine_chunk_for_provenance a no-op. For environments where the classifier substrate is unavailable, CI, fresh checkouts, or backfill jobs that handle their own pass. - convo_miner._file_chunks_locked: after the operational upsert inside the per-chunk loop, call mine_chunk_for_provenance. Run AFTER operational durability is established so a slow classifier call doesn't delay the canonical write. Failure-soft at three layers: the inner call is itself failure-soft, the convo_miner wrapper catches anything that escapes, operational mining proceeds regardless. - DEFAULT_CONFIDENCE_THRESHOLD = 0.7 per design doc §D1. D2 calibration showed positives at 0.90-0.95 and negatives at 0.00 — 0.7 sits cleanly in the gap. Tunable via kwarg. Schema (per Provenance-Preservation-Design §D3): Drawer content rendered as YAML-ish PROVENANCE: block with Person / Relation / Quote / Context / Source lines. Metadata includes wing=wing_lineage, room=<person_slug>, person, relation_type, is_quote, confidence, extracted_by, source_file, source_session, filed_at, filed_at_ts. Tests (14 new in test_provenance_mining.py; 62 total mempalace provenance tests): - Happy path: chunk + accepting classifier -> 1 wing_lineage drawer with correct meta + design-doc content shape. - Threshold: below-default-threshold rejected; custom threshold lets lower-confidence through. - Dedup: same chunk+source twice -> 1 drawer; different sources -> distinct drawers. - Disabled mode: MEMPALACE_PROVENANCE_DISABLED with 1/true/yes variants all yield 0 drawers. - No-candidates returns 0; operational mining unaffected. - Failure-soft: classifier raising -> 0 drawers, no crash. - Transitive-attribution rewrite (case MemPalace#11): classifier surfaces speaker name, _rewrite_speaker_to_source redirects to relation when "<possessive> <relation>'s" appears in candidate or context. - Unit tests on _rewrite_speaker_to_source directly (positive, negative, None-input cases). - End-to-end convo_miner integration: _file_chunks_locked with a chunk produces BOTH operational drawer (wing=wing_test) AND wing_lineage drawer (wing=wing_lineage). 62/62 pass in <100ms (no live substrate required — tests inject mock classifiers). Phase 1 status after this merges: - D1 (PR #4): heuristic + classifier interface — MERGED - D2 (PR #5): qwen3_classifier + Pass-3 + calibration — MERGED - D3 (this PR): mining integration — pending After merge: forward-only provenance preservation is operational. No new diary mining loses biographical/relational lineage. Phase 2 (60k existing-drawer backfill) is a separate scoped task.

jpwinans merged commit 9349760 into main May 11, 2026
0 of 6 checks passed

jpwinans mentioned this pull request May 11, 2026

Envelope D3: provenance mining integration — wing_lineage from convo_miner #6

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Envelope D2: provenance classifier (Qwen3 substrate) + Pass-3 regex#5

Envelope D2: provenance classifier (Qwen3 substrate) + Pass-3 regex#5
jpwinans merged 1 commit into
mainfrom
feat/mempalace-provenance-classifier

jpwinans commented May 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jpwinans commented May 11, 2026

Summary

Changes

Package conversion

mempalace.provenance.classifier (new module)

Pass-3 in extract_candidates

Calibration result

Tests (49 total, 49 pass in 17s)

Scope honored

Discipline

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

`mempalace.provenance.classifier` (new module)

Pass-3 in `extract_candidates`