docs(embedding): warn custom-EF authors about chromadb 1.5+ embed_query requirement by jphein · Pull Request #80 · techempower-org/mempalace

jphein · 2026-05-15T07:30:17Z

Summary

Adds a defensive doc comment in `mempalace/embedding.py` flagging a chromadb 1.5+ EF-protocol gotcha that's easy to fall into when implementing a custom embedding function.

The trap, condensed

class FTCodeEF:
    @staticmethod
    def name(): return "default"
    def __call__(self, input): ...  # the only method we implement

`Collection.upsert(documents=[...])` → works (legacy `call` dispatch).
`Collection.query(query_texts=[...])` → `AttributeError: 'FTCodeEF' object has no attribute 'embed_query'`.
mempalace's searcher catches the AttributeError and silently falls back to BM25.
Vector retrieval looks like it ran; results are returned; numbers are plausible.
But the encoder you thought you installed is never actually exercised on queries.

Fix (one base class)

from chromadb.api.types import EmbeddingFunction

class FTCodeEF(EmbeddingFunction):
    @staticmethod
    def name(): return "default"
    def __call__(self, input): ...
    # embed_query is inherited; it delegates to __call__

`EmbeddingFunction` provides a default `embed_query` that delegates to `call`, which is the right thing for most custom encoders. Override for asymmetric query/document encoders.

How we found this

Cross-encoder fusion verifier (techempower-org/mempalace#77) produced byte-identical per-probe ranks across two genuinely different SentenceTransformer fine-tunes. Direct `.encode()` on the models showed they were distinct; the EF-wrapped version showed they weren't. Root cause was the silent BM25 fallback above.

This affects anyone who builds a custom `EmbeddingFunction` against mempalace without inheriting from the chromadb base — a common case for SentenceTransformer wrappers (cf. @nakata-app's adaptmem work in MemPalace/mempalace#1384).

What this PR does

24-line comment block above `_build_ef_class` in `embedding.py`.
No code changes — our internal `_MempalaceONNX` already inherits from `ONNXMiniLM_L6_V2` which has both `call` and `embed_query`, so we're unaffected. The comment is for downstream users.

Test plan

No code changes — existing test suite passes unchanged.
Comment is in the right place (above `_build_ef_class` so it's visible to anyone reading the EF construction path).
Optional upstream PR to MemPalace/mempalace once we've validated more of the surrounding context.

🤖 Generated with Claude Code

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

…ry requirement Defensive doc comment in ``mempalace/embedding.py`` flagging the chromadb 1.5+ ``EmbeddingFunction.embed_query`` requirement for any custom encoder. ``Collection.upsert`` works via ``__call__``; ``Collection.query(query_texts=...)`` requires ``embed_query``. Without inheritance from ``chromadb.api.types.EmbeddingFunction``, queries raise ``AttributeError`` and mempalace's searcher silently falls back to BM25 — invisible failure mode. Diagnosed against a SentenceTransformer adaptmem FT-Code wrapper that produced byte-identical ranks across two different fine-tuned checkpoints (BM25 fallback). Fix: subclass ``EmbeddingFunction``. See MemPalace#1384 for the encoder-axis discussion that surfaced this. Our internal ``_MempalaceONNX`` (already subclasses ONNXMiniLM_L6_V2 which has both methods) is unaffected; the comment is for downstream users who want to drop in a SentenceTransformer or similar without going through ONNXMiniLM_L6_V2. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 15, 2026 07:30

Copilot started reviewing on behalf of jphein May 15, 2026 07:30 View session

jphein mentioned this pull request May 15, 2026

Add optional PostgreSQL backend with pg_sorted_heap support MemPalace/mempalace#665

Open

3 tasks

Copilot AI reviewed May 15, 2026

jphein force-pushed the docs/embedding-function-protocol-1384 branch from caa8eb9 to 35a625b Compare May 15, 2026 07:35

jphein merged commit d70b977 into main May 15, 2026
7 checks passed

jphein deleted the docs/embedding-function-protocol-1384 branch May 15, 2026 07:36

This was referenced May 15, 2026

docs: refresh README + CHANGELOG for 2026-05-15 postgres cutover complete #81

Merged

Question: should MemPalace recommend a single-writer/gateway pattern for multi-agent setups? MemPalace/mempalace#1497

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(embedding): warn custom-EF authors about chromadb 1.5+ embed_query requirement#80

docs(embedding): warn custom-EF authors about chromadb 1.5+ embed_query requirement#80
jphein merged 1 commit into
mainfrom
docs/embedding-function-protocol-1384

jphein commented May 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jphein commented May 15, 2026

Summary

The trap, condensed

Fix (one base class)

How we found this

What this PR does

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants