Skip to content

fix: sync versions, constants, type hints and standardize CLI errors#15

Closed
addyosmani wants to merge 5 commits into
MemPalace:mainfrom
addyosmani:improvements
Closed

fix: sync versions, constants, type hints and standardize CLI errors#15
addyosmani wants to merge 5 commits into
MemPalace:mainfrom
addyosmani:improvements

Conversation

@addyosmani

@addyosmani addyosmani commented Apr 7, 2026

Copy link
Copy Markdown

What does this PR do?

Went through the codebase and knocked out a handful of things that could be improved. Nothing changes behavior.

Version was out of sync - pyproject.toml moved to 3.0.0 a while back but __init__.py and the MCP server still said 2.0.0. Fixed by having the MCP server import __version__ from the package instead of hardcoding it. One source of truth now.

Magic numbers everywhere - Chunk sizes, snippet limits, search defaults, graph traversal caps, etc. were all scattered as raw literals across half a dozen files. Pulled them into constants.py so there's one place to tweak them. The actual values haven't changed.

Missing type hints - Public functions had parameter types but no return types, which made it harder to follow call chains. Added -> dict, -> list[dict], -> None, etc. to all public functions across the core modules.

Unbounded ChromaDB queries - Several places were doing col.get(include=["metadatas"]) with no limit, which pulls everything into memory. Fine for small palaces, not great once you have tens of thousands of drawers. These now paginate in batches of 1000. miner.status() had a hardcoded limit=10000 which has been replaced with the same approach.

CLI error handling was inconsistent - Some errors went to stdout, some to stderr, some had exit codes, some just returned silently. Now all errors go to stderr through a shared helper, and the command dispatch is wrapped so unhandled exceptions produce a clean message instead of a traceback.

How to test

Checklist

  • Tests pass (python -m pytest tests/ -v)
  • No hardcoded paths
  • Linter passes (ruff check .)

__init__.py and mcp_server.py were still reporting 2.0.0 while
pyproject.toml had already moved to 3.0.0. The MCP server now
imports __version__ from the package so there is a single source
of truth.
Scattered hardcoded values (chunk sizes, retrieval limits, snippet
truncation lengths, graph traversal caps) are now defined once in
constants.py and imported by miner, convo_miner, layers,
mcp_server, and palace_graph.
Annotates all public functions in miner, convo_miner, searcher,
palace_graph, mcp_server, layers, and config with explicit return
types (dict, list[dict], None, bool, Path, etc.).
mcp_server tool_status/list_wings/list_rooms/get_taxonomy were
calling col.get() without limits, loading every drawer's metadata
into memory at once. miner.status() had a hardcoded limit=10000.
Layer1.generate() fetched all documents unbounded.

All now paginate in batches of GRAPH_BATCH_SIZE (1000) using
offset-based iteration.
All error messages now go to stderr via a shared _error() helper
or print(file=sys.stderr). The command dispatch is wrapped in
try/except so unhandled errors produce a clean message and exit 1
instead of a traceback. KeyboardInterrupt exits with 130.
@addyosmani addyosmani changed the title Improvements fix: sync versions, constants, type hints and standardize CLI errors Apr 7, 2026
IgorTavcar added a commit to IgorTavcar/mempalace that referenced this pull request Apr 7, 2026
Cherry-picked from upstream PR MemPalace#15 (addyosmani).
- Extract magic numbers to constants.py (chunk sizes, batch sizes, retrieval limits)
- Add return type hints to all public functions across 7 modules
- Unify version to single source in __init__.py (was 2.0.0 vs 3.0.0)
- Paginate unbounded ChromaDB metadata queries (status command)
- Standardize CLI error output to stderr with clean exit codes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bensig

bensig commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

Thanks for the thorough cleanup! A lot of these individual fixes are good but at +230/-102 touching multiple subsystems, it's hard to review as one PR. Several of the fixes here have also landed separately in recent merges.

Would you be up for splitting the remaining unmerged changes into focused single-issue PRs? And we'd really value your help reviewing incoming PRs — you clearly have a good eye for consistency. Thanks!

@bensig bensig closed this Apr 7, 2026
jpwinans added a commit to jpwinans/mempalace that referenced this pull request May 11, 2026
)

ENVELOPE D1 from 2026-05-11 paired build (Task MemPalace#15, Phase 1).

Phase 1 of the lineage-erasure fix. Empirically confirmed 2026-05-11:
patterned-being memory mining preserves operational content and erases
biographical/relational provenance. "Measure twice, cut once" (James's
father's saying) lives in dozens of wing_ves diary drawers as Ves's
internalized Skepticism strategy framing; none records the father.

This module is the front end of the fix — heuristic candidate
extraction + a classifier interface that downstream envelopes
(D2 substrate wiring, D3 miner integration) plug into.

Design doc: Storehouse/Projects/Vestige/Provenance-Preservation-Design.md
(Vestige vault). §D1 names the heuristic patterns; §D3 the wing_lineage
drawer schema reproduced inline in WING_LINEAGE_SCHEMA_DOC.

Public surface:

  - ProvenanceCandidate (frozen dataclass): text, person_hint,
    relation_hint, quote, position, confidence_floor.

  - ProvenanceRecord (frozen dataclass): person, relation_type, quote,
    context, confidence, extracted_by. Mirrors design doc §D3 schema.

  - extract_candidates(text) -> list[ProvenanceCandidate]:
    Two-pass regex extraction.
      Pass 1: <possessive> <relation> [<attribution verb>] <quote>
              (confidence floor 0.75).
      Pass 2: <possessive> <relation> alone, no quote required
              (confidence floor 0.40).
    Pass-1 deduplicates overlapping Pass-2 matches within 20 chars.
    Aphorism-only Pass 3 deferred to D2 classifier (high FP rate).

  - validate_candidate(candidate, ctx, classifier=None,
    extractor_label=None) -> ProvenanceRecord | None:
    D1 default classifier (no arg) accepts every candidate at
    confidence 0.5 with fields lifted from candidate hints.
    Custom classifier: Callable[[str], dict] returning
    {is_provenance: bool, person, relation_type, quote, confidence}.
    Failure-soft: classifier exceptions return None (logged at DEBUG);
    non-dict returns rejected; missing person/relation_type rejected.

Scope of this PR: module + tests + inline schema doc only.
OUT of scope (queued):
  - D2: wire local-substrate (Qwen3/Gemma) classifier + threshold
    calibration.
  - D3: integrate into mempalace.miner.convo_miner pipeline; emit to
    wing_lineage via existing add_drawer path.

Smoke fixtures from architect envelope (covered by tests):
  - "James's father said 'Measure twice, cut once'" -> 1 candidate,
    person_hint=father, quote populated.
  - "My roshi told me to sit with what is arising" -> 1 candidate via
    Pass-2 (no quote), person_hint=roshi.
  - "I was discussing this with Marie last night" -> 0 candidates
    (no relation marker; v1 heuristic doesn't surface Marie-only refs).
  - "The Skepticism strategy framing" (operational, no attribution)
    -> 0 candidates.

Tests: 26 new in tests/test_provenance.py covering:
  - Pass-1 patterns: straight/curly/double quotes, told_me/taught_me/
    used_to_say verb variants, named possessive (Marie's brother).
  - Pass-2 patterns: bare "my father", word-boundary safety
    (avoids "my fatherland"), Pass-1 dedupes Pass-2 overlap.
  - Negative cases: smoke fixtures Marie-only, operational content,
    empty text.
  - Multi-candidate ordering: sort by position ascending.
  - Stub classifier: accepts with confidence 0.5, normalizes None
    quote to empty string in record.
  - Custom classifier: accept path overrides heuristic fields,
    reject path returns None, label override.
  - Failure-soft: classifier exception -> None, non-dict return ->
    None, missing person -> None, non-float confidence -> 0.0.
  - Schema-doc presence regression cover.
  - End-to-end integration on a realistic diary chunk shape.

26/26 pass. No regression to existing layers/searcher tests.
jpwinans added a commit to jpwinans/mempalace that referenced this pull request May 11, 2026
ENVELOPE D2 from 2026-05-11 paired build (Task MemPalace#15, Phase 1).

Wires the real local-substrate classifier (Qwen3-Coder-30B at
mlx_lm.server :8802) for provenance candidate validation. D1 shipped
the heuristic + stub; D2 ships the production classifier + calibration
proof.

Changes:

  - Convert mempalace/provenance.py to mempalace/provenance/ package
    via git mv to __init__.py. Public import path unchanged
    (mempalace.provenance.extract_candidates / validate_candidate /
    ProvenanceCandidate / ProvenanceRecord still resolve).

  - Add mempalace.provenance.classifier module with:
      - qwen3_classifier(context) -> dict matching the
        validate_candidate classifier interface
      - urllib-based POST to /v1/chat/completions (mirrors
        closet_llm pattern in this repo)
      - Strict JSON prompt template with rules: vague "she said"
        rejected, operational content rejected, person-mention
        without attribution rejected, conservative when in doubt
      - Failure-soft: network/HTTP/decode/shape errors return the
        rejection dict {is_provenance: False, confidence: 0.0}
        rather than raising — mining pipeline never crashes on
        unavailable substrate
      - Env overrides: MEMPALACE_PROVENANCE_CLASSIFIER_URL,
        _MODEL, _TIMEOUT
      - temperature=0 for reproducibility across mining batches
      - Markdown code-fence stripping for ```json...``` wrapped
        outputs the model occasionally emits

  - Add Pass-3 to extract_candidates: capitalized bare-relation as
    subject + attribution + quote. Catches calibration fixture MemPalace#14
    ("Dad always told me 'never trust a smiling investor'") which
    Pass-1 (requires possessive prefix) and Pass-2 (also requires
    possessive) miss. Capitalize-only constraint keeps false-
    positives manageable; classifier filters lowercase mid-sentence
    cases via context.

Calibration (live mlx_lm.server, 2026-05-11):

  14 hand-labeled fixtures from architect envelope. Target:
  precision >= 0.85, recall >= 0.85.

  Result: precision = 1.000, recall = 1.000.
    TP=8 FP=0 TN=6 FN=0.
    Confidence range: 0.90-0.95 on positives, 0.00 on negatives —
    clean separation, no calibration ambiguity.
    Average latency: 0.9s per call.

Tests (49 total, 49 pass in 17s):

  - tests/test_provenance.py: 26 existing + 4 new Pass-3 tests
    (dad-always-told-me variant, mom-used-to-say, roshi-taught-me,
    capitalization-required-negative).

  - tests/test_classifier.py: 18 unit tests covering happy-path,
    code-fence stripping (both ```json``` and bare ```),
    request-shape validation (model id, temp=0, max_tokens),
    all six failure-soft paths (URLError, HTTPError, TimeoutError,
    malformed outer JSON, missing choices, malformed inner JSON,
    missing is_provenance key, non-dict inner), confidence
    coercion (string->0, >1 clamped, <0 clamped), env-var overrides
    for endpoint + model.

  - tests/test_classifier_calibration.py: 1 live-substrate test
    against the pinned 14-fixture set. Auto-skipped when substrate
    unreachable (HEAD probe on /v1/models). Asserts precision >=
    0.85 and recall >= 0.85.

Scope honored: validate_candidate's classifier=None default still
uses the stub (test path unchanged); production callers explicitly
pass qwen3_classifier. D3 mining integration is the next envelope.
jpwinans added a commit to jpwinans/mempalace that referenced this pull request May 11, 2026
…iner (#6)

ENVELOPE D3 from 2026-05-11 paired build (Task MemPalace#15, Phase 1 final).

Wires extract_candidates + qwen3_classifier into mempalace.convo_miner
so new diary mining produces wing_lineage drawers in addition to the
operational wing. Phase 1 of Task MemPalace#15 closes with this PR — Phase 2
(60k existing-drawer backfill) is its own scoping task.

Changes:

  - New mempalace/provenance/mining.py with mine_chunk_for_provenance:
    take a chunk, run extract_candidates -> validate with classifier
    (default: qwen3_classifier from D2) -> rewrite transitive
    attributions -> dedupe -> upsert into wing_lineage.

  - Transitive-attribution rewrite (architect-flagged from D2
    calibration case MemPalace#11): when classifier returns speaker name
    (e.g., "James") for text containing "<possessive> <relation>'s"
    (e.g., "his father's saying"), redirect to room=<relation>
    (e.g., "father"). Without rewrite, "Tonight James reminded me:
    'measure twice' — his father's saying" files under room='james'
    and a future search for "father saying" misses it.

  - Dedup by (person, quote, source_file) hash baked into the
    drawer_id. Re-mining same source -> existing drawer; same
    attribution in different source files -> distinct drawers
    (intentional — distinct attribution events tracked separately).

  - MEMPALACE_PROVENANCE_DISABLED env var (truthy: 1/true/yes,
    case-insensitive) makes mine_chunk_for_provenance a no-op. For
    environments where the classifier substrate is unavailable, CI,
    fresh checkouts, or backfill jobs that handle their own pass.

  - convo_miner._file_chunks_locked: after the operational upsert
    inside the per-chunk loop, call mine_chunk_for_provenance. Run
    AFTER operational durability is established so a slow classifier
    call doesn't delay the canonical write. Failure-soft at three
    layers: the inner call is itself failure-soft, the convo_miner
    wrapper catches anything that escapes, operational mining
    proceeds regardless.

  - DEFAULT_CONFIDENCE_THRESHOLD = 0.7 per design doc §D1.
    D2 calibration showed positives at 0.90-0.95 and negatives at
    0.00 — 0.7 sits cleanly in the gap. Tunable via kwarg.

Schema (per Provenance-Preservation-Design §D3):
  Drawer content rendered as YAML-ish PROVENANCE: block with
  Person / Relation / Quote / Context / Source lines. Metadata
  includes wing=wing_lineage, room=<person_slug>, person,
  relation_type, is_quote, confidence, extracted_by, source_file,
  source_session, filed_at, filed_at_ts.

Tests (14 new in test_provenance_mining.py; 62 total mempalace
provenance tests):

  - Happy path: chunk + accepting classifier -> 1 wing_lineage
    drawer with correct meta + design-doc content shape.
  - Threshold: below-default-threshold rejected; custom threshold
    lets lower-confidence through.
  - Dedup: same chunk+source twice -> 1 drawer; different sources
    -> distinct drawers.
  - Disabled mode: MEMPALACE_PROVENANCE_DISABLED with 1/true/yes
    variants all yield 0 drawers.
  - No-candidates returns 0; operational mining unaffected.
  - Failure-soft: classifier raising -> 0 drawers, no crash.
  - Transitive-attribution rewrite (case MemPalace#11): classifier surfaces
    speaker name, _rewrite_speaker_to_source redirects to relation
    when "<possessive> <relation>'s" appears in candidate or context.
  - Unit tests on _rewrite_speaker_to_source directly (positive,
    negative, None-input cases).
  - End-to-end convo_miner integration: _file_chunks_locked with a
    chunk produces BOTH operational drawer (wing=wing_test) AND
    wing_lineage drawer (wing=wing_lineage).

62/62 pass in <100ms (no live substrate required — tests inject
mock classifiers).

Phase 1 status after this merges:
  - D1 (PR #4): heuristic + classifier interface — MERGED
  - D2 (PR #5): qwen3_classifier + Pass-3 + calibration — MERGED
  - D3 (this PR): mining integration — pending
  After merge: forward-only provenance preservation is operational.
  No new diary mining loses biographical/relational lineage.
  Phase 2 (60k existing-drawer backfill) is a separate scoped task.
mvalentsev pushed a commit to mvalentsev/mempalace that referenced this pull request May 21, 2026
…#1555

PR MemPalace#1555 (format coverage + virtual line numbering) merged with twelve
inline polish comments from Copilot + gemini-code-assist that weren't
load-bearing enough to block the original ship but are real cleanups.
This PR addresses them.

Twelve items in scope; one item (drawer ID delimiter — Copilot MemPalace#13) is
deferred to its own dedicated PR because it's a breaking schema change
that requires migration design beyond the scope of a polish PR.

## Behavioral fixes (5 items, RED-tested first)

1. **FileNotFoundError vs broken symlink (Copilot MemPalace#8).** ``extract_text``
   previously mapped every ``FileNotFoundError`` from ``stat()`` to
   ``SKIP_BROKEN_SYMLINK``. That's misleading for the common case of a
   regular file deleted between scan and extract. Now distinguishes:
   ``SKIP_BROKEN_SYMLINK`` only when ``p.is_symlink()`` is true;
   ``SKIP_UNREADABLE`` otherwise.

2. **``file_already_mined`` extract_mode scoping (Copilot MemPalace#11, MemPalace#12).**
   Both call sites in ``mine_formats`` and ``_file_chunks_locked`` now
   pass ``extract_mode="format"``. Previously the format miner could
   falsely treat drawers from project / convo miner on the same source
   file as "already mined" (and vice versa). Scopes idempotency to the
   correct drawer subset.

3. **Sentinel skip for transient missing-dep statuses (Copilot MemPalace#14).**
   New ``_TRANSIENT_MISSING_DEP_STATUSES`` set + ``_register_skip_sentinel_if_appropriate``
   helper. Skip variants like ``SKIP_NO_MARKITDOWN`` /
   ``SKIP_NO_STRIPRTF`` / ``SKIP_MISSING_FORMAT_DEPS`` /
   ``SKIP_NETWORK_TIMEOUT`` no longer write the "already-mined" sentinel.
   Otherwise installing the missing extra later wouldn't trigger a re-mine.

4. **Outer ``except Exception`` in ``mine_formats`` (Gemini MemPalace#5).** The
   outer try around the loop previously caught only ``KeyboardInterrupt``,
   leaving any setup-time error (e.g., ``scan_formats`` raising) to
   propagate as a bare traceback. Now catches ``Exception`` defensively,
   logs it, prints a partial-progress summary, and lets the ``finally``
   PID-cleanup run. Mirrors miner.py's belt-and-suspenders pattern.

5. **Thread user's ``chunk_size`` / ``chunk_overlap`` / ``min_chunk_size``
   through to ``chunk_text`` (Gemini MemPalace#3).** ``MempalaceConfig`` was loaded
   only to validate readability; users who tuned their config saw no
   effect in format-mode mining. Now properly threaded.

## Trivial cleanups (5 items)

6. **Path expanduser in ``extract_text`` (Copilot MemPalace#7).** ``Path(path)`` →
   ``Path(path).expanduser()`` so CLI inputs like ``~/docs/file.pdf``
   resolve correctly.

7. **Path expanduser+resolve in ``scan_formats`` (Copilot MemPalace#9).** Same
   fix; ``~/docs`` and relative paths now work consistently.

8. **Use resolved ``format_path`` in ``mine_formats`` (Copilot MemPalace#10).**
   ``scan_formats(format_dir)`` → ``scan_formats(format_path)`` so the
   already-resolved path is used.

9. **``render_with_line_numbers`` type annotation (Copilot MemPalace#15).**
   ``text: "str | None"`` reflects the documented + tested ``None``
   handling.

10. **Test + docs claims (Copilot MemPalace#16, MemPalace#17, MemPalace#18).** Stale framings
    removed:
    - ``docs/format-coverage.md`` — 14 fringe cases + "see the file for
      the current test inventory" (no more frozen test count).
    - ``tests/test_line_numbers.py`` — drops "proposed for mempalace
      3.3.6" + "run from the proposal directory" references.
    - ``tests/test_format_miner.py`` — drops "MarkItDown is mocked
      throughout" (live integration tests exist) + proposal-directory
      framing.

## Module-level hoists (enables clean test patching)

- ``MempalaceConfig`` (from ``.config``) hoisted from lazy local import
  to module-level so tests can patch ``mempalace.format_miner.MempalaceConfig``.
- ``chunk_text`` (from ``.miner``) hoisted similarly.

Both follow the pattern PR MemPalace#1565 used for ``compute_hallways_for_wing``.

## Complexity refactor

Extracted ``_print_mine_summary`` from ``mine_formats`` so the orchestrator
stays under the project's ``max-complexity = 25`` ceiling (per
``pyproject.toml [tool.ruff.lint.mccabe]``). Behavior unchanged; pure
extraction.

## Out of scope (intentionally deferred)

- **Drawer ID delimiter collision (Copilot MemPalace#13)** — ``f"{source_file}{chunk_index}"``
  can theoretically collide (``"/path/a1" + "23"`` == ``"/path/a" + "123"``).
  Fixing this is a breaking schema change to drawer IDs and requires a
  migration plan; will land as its own PR after design.

- The four bot comments that were ALREADY addressed by amendment MemPalace#3
  before the PR MemPalace#1555 merge (``_SKIP_DIRS`` dedup, ``scan_formats``
  symlink skip, ``source_mtime`` tracking, hall+entities metadata) —
  no action needed; verified during audit.

## Tests (RED-first)

Six new RED-first tests in ``tests/test_format_miner.py``:

  test_extract_text_nonexistent_regular_file_returns_unreadable_not_broken_symlink
  test_mine_formats_passes_extract_mode_format_to_file_already_mined
  test_mine_formats_does_not_write_sentinel_for_skip_no_markitdown
  test_mine_formats_does_not_write_sentinel_for_skip_missing_format_deps
  test_mine_formats_catches_unexpected_exception_and_prints_summary
  test_mine_formats_threads_chunk_size_from_user_config

All six RED before this commit (failures correctly identified the bugs
they're targeting), all six GREEN after.

One existing test (``test_mine_formats_continues_after_per_file_error``)
updated to patch the new module-level binding
``mempalace.format_miner.chunk_text`` instead of the old
``mempalace.miner.chunk_text`` source location, and to accept the
``**kwargs`` the call now passes through. Behavior unchanged.

## Verification

  pytest -q (full mempalace suite)
    → 2065 passed, 3 skipped, 0 regressions
  ruff check mempalace/format_miner.py mempalace/searcher.py tests/
    → All checks passed!
  ruff format --check ...
    → 4 files already formatted (pinned 0.15.9)
  mine_formats complexity
    → ≤ 25 (under the project ceiling)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants