Skip to content

Commit 942aae3

Browse files
jpheinclaude
andcommitted
fix(hnsw): address @igorls's MemPalace#1173 review
Restore three `_pin_hnsw_threads` tests that the previous integrity-gate commit deleted. The function is still live code on develop (defined at chroma.py:207, called from chroma.py:705 + mcp_server.py), so the deletion left the import unused (ruff F401) and dropped coverage on a function unrelated to this PR's scope. Restored verbatim from main. Plus three nits @igorls flagged: - **Thread-safety doc**: `_quarantined_paths` mutation is lock-free; documented that idempotency of `quarantine_stale_hnsw` is the safety property (concurrent same-palace calls produce a benign redundant rename attempt that no-ops, no need for a lock). - **Pickle protocol assumption**: `_segment_appears_healthy` requires PROTO ≥ 2 (`0x80`). Documented; matches what chromadb writes today, and a future protocol-0/1 emission would conservatively quarantine + lazy-rebuild rather than mis-classify as healthy. - Class-level vs module-level scope: keeping class-level — the conftest reset is the controlled case, and module-level wouldn't remove the foot-gun, just relocate it. Conftest reset documented in the existing comment is the right pattern for test isolation. Style nit (`Path(marker).touch()` vs `open(marker, "a").close()`) deferred — that pattern lives in MemPalace#1177's territory, not MemPalace#1173's. 37/37 tests pass on the PR branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 74ff5e6 commit 942aae3

2 files changed

Lines changed: 63 additions & 0 deletions

File tree

mempalace/backends/chroma.py

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -72,6 +72,12 @@ def _segment_appears_healthy(seg_dir: str) -> bool:
7272
can execute arbitrary code, and the byte-sniff is sufficient to
7373
distinguish a complete write from truncation, zero-fill, or
7474
partial-flush corruption.
75+
76+
Assumes pickle protocol >= 2 (``0x80`` PROTO marker). Matches what
77+
chromadb writes today; if a future chromadb version emits protocol
78+
0/1 segments, this check would start returning False on healthy
79+
files and quarantine_stale_hnsw would conservatively rename them
80+
out of the way (lazy rebuild on next open recovers).
7581
"""
7682
meta_path = os.path.join(seg_dir, "index_metadata.pickle")
7783
if not os.path.isfile(meta_path):
@@ -621,6 +627,14 @@ def _client(self, palace_path: str):
621627
# Real runtime drift is still handled — palace-daemon's ``_auto_repair``
622628
# calls :func:`quarantine_stale_hnsw` directly on observed HNSW errors,
623629
# which bypasses this gate.
630+
#
631+
# Thread-safety: this set is mutated without a lock. Two concurrent
632+
# ``make_client()`` calls for the same palace can both pass the
633+
# membership check and both invoke ``quarantine_stale_hnsw``. That's
634+
# safe because the function is idempotent (mtime check + timestamped
635+
# rename of distinct directories), so the worst-case race produces
636+
# one redundant rename attempt that no-ops. Idempotency is the
637+
# safety property; locking would add cost without correctness gain.
624638
_quarantined_paths: set[str] = set()
625639

626640
@staticmethod

tests/test_backends.py

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -578,3 +578,52 @@ def _spy(path, stale_seconds=300.0):
578578
assert calls == [palace_a, palace_b]
579579

580580

581+
# ── _pin_hnsw_threads (per-process retrofit, separate from this PR's gate) ──
582+
583+
584+
def test_pin_hnsw_threads_retrofits_legacy_collection(tmp_path):
585+
"""Legacy collections (created without num_threads) get the retrofit applied."""
586+
palace_path = tmp_path / "legacy-palace"
587+
palace_path.mkdir()
588+
589+
client = chromadb.PersistentClient(path=str(palace_path))
590+
col = client.create_collection(
591+
"mempalace_drawers",
592+
metadata={"hnsw:space": "cosine"}, # no num_threads — legacy
593+
)
594+
assert col.configuration_json.get("hnsw", {}).get("num_threads") is None
595+
596+
_pin_hnsw_threads(col)
597+
598+
assert col.configuration_json["hnsw"]["num_threads"] == 1
599+
600+
601+
def test_pin_hnsw_threads_swallows_all_errors():
602+
"""Retrofit never raises even when collection.modify explodes."""
603+
604+
class _ExplodingCollection:
605+
def modify(self, *args, **kwargs):
606+
raise RuntimeError("boom")
607+
608+
_pin_hnsw_threads(_ExplodingCollection()) # must not raise
609+
610+
611+
def test_get_collection_applies_retrofit_on_existing_palace(tmp_path):
612+
"""ChromaBackend.get_collection(create=False) applies the retrofit."""
613+
palace_path = tmp_path / "palace"
614+
palace_path.mkdir()
615+
616+
# Simulate a legacy palace: create collection without num_threads
617+
bootstrap_client = chromadb.PersistentClient(path=str(palace_path))
618+
bootstrap_client.create_collection("mempalace_drawers", metadata={"hnsw:space": "cosine"})
619+
del bootstrap_client # drop reference so a fresh client reopens cleanly
620+
621+
wrapper = ChromaBackend().get_collection(
622+
str(palace_path),
623+
collection_name="mempalace_drawers",
624+
create=False,
625+
)
626+
627+
assert wrapper._collection.configuration_json["hnsw"]["num_threads"] == 1
628+
629+

0 commit comments

Comments
 (0)