fix(embeddings): fall back to model context length when no window is set #1718

Merged
jundot merged 1 commit into jundot:main from JimStenstrom:fix/embedding-max-length-model-fallback
Jun 8, 2026

@JimStenstrom

Contributor

Follow-up to #1694, which resolved the 512-token embedding truncation from #1687.

#1694's get_embedding_max_length() falls back to a hard 512 when neither the request nor the server's max_context_window pins a limit. In that no-window case it re-introduces the original #1687 symptom: a long-context embedding model is capped at 512 even though MLXEmbeddingModel._resolve_max_length() can read its real limit from max_position_embeddings / tokenizer model_max_length.

That model-side resolver only consults the model metadata when it receives max_length=None, but the server always hands it a concrete 512 first, so on the /v1/embeddings path the metadata lookup never runs and the config-aware resolution added in #1694 is effectively bypassed.
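
To illustrate the bypass, here is a minimal sketch of a model-side resolver. It is not the actual MLXEmbeddingModel._resolve_max_length() code: the function name, its parameters, and the order in which the two metadata sources are consulted are assumptions made for illustration. Only the overall behavior (an explicit value wins, None triggers the metadata lookup, 512 is the last resort) comes from the description above.

```python
from typing import Optional

# Conservative final fallback described in this PR.
FALLBACK_MAX_LENGTH = 512


def resolve_max_length_sketch(
    max_length: Optional[int],
    max_position_embeddings: Optional[int] = None,
    tokenizer_model_max_length: Optional[int] = None,
) -> int:
    """Hypothetical stand-in for the model-side resolver.

    The lookup order (config value first, tokenizer value second) is an
    assumption, not taken from the real implementation.
    """
    # An explicit caller-supplied value short-circuits everything else.
    if max_length is not None:
        return max_length
    # Only reached when the caller passes None: consult model metadata.
    for candidate in (max_position_embeddings, tokenizer_model_max_length):
        if candidate is not None and candidate > 0:
            return candidate
    # No usable metadata: keep the conservative default.
    return FALLBACK_MAX_LENGTH


# Server hands over a concrete 512: the 8192 limit is never consulted.
capped = resolve_max_length_sketch(512, max_position_embeddings=8192)

# Server hands over None: the model's own limit is used.
resolved = resolve_max_length_sketch(None, max_position_embeddings=8192)

# No metadata at all: 512 remains the final fallback.
default = resolve_max_length_sketch(None)
```

With a hypothetical model that advertises 8192 positions, the first call stays capped at 512 while the second resolves to 8192, which is the difference this PR targets.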

Change: return None from get_embedding_max_length() when nothing pins a limit, letting the existing model resolver do its job. 512 is preserved as that resolver's final fallback when the model exposes no usable metadata, so the conservative default is unchanged.

  • get_embedding_max_length() now returns int | None; its single caller passes the value straight to embed(max_length=...), which already accepts None.
  • Added TestGetEmbeddingMaxLength covering the request-override, configured-window, and no-window (returns None) paths.
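
The three paths covered by the new tests can be sketched roughly as follows. This is a simplified stand-in rather than the real get_embedding_max_length(): the function name, its two parameters, and the assumption that a request override takes precedence over the configured window are illustrative only.

```python
from typing import Optional


def get_embedding_max_length_sketch(
    request_max_length: Optional[int],
    max_context_window: Optional[int],
) -> Optional[int]:
    """Hypothetical server-side helper mirroring the behavior described here.

    Assumes the request override is checked before the configured window.
    """
    # Path 1: the request pins a limit explicitly.
    if request_max_length is not None:
        return request_max_length
    # Path 2: the server's configured context window pins a limit.
    if max_context_window is not None:
        return max_context_window
    # Path 3: nothing pins a limit. Return None (previously a hard 512)
    # so the model-side resolver can pick the model's own context length.
    return None


request_override = get_embedding_max_length_sketch(1024, 4096)
configured_window = get_embedding_max_length_sketch(None, 4096)
no_window = get_embedding_max_length_sketch(None, None)
```

Returning None instead of a number keeps the decision with the component that can actually read the model metadata, and the caller forwards the value unchanged to embed(max_length=...).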

Verification: pytest tests/test_embedding.py → 83 passed.

get_embedding_max_length() returned a hard 512 when neither the request nor
the server's max_context_window pinned a limit, re-truncating long-context
embedding models in exactly the no-config case jundot#1687 was about. Return None
instead so the engine/model embed() path resolves the model's own context
length (max_position_embeddings / tokenizer model_max_length, already in
MLXEmbeddingModel._resolve_max_length), keeping 512 only as that resolver's
final fallback. Follow-up to jundot#1694.
@jundot

jundot commented Jun 8, 2026

Owner

Thanks for the follow-up. I checked the server and embedding engine path, and passing None here is safe because MLXEmbeddingModel resolves the model/tokenizer limit and still keeps 512 as the final fallback when no metadata exists.

The request override and configured context-window paths stay unchanged, and CI is green. This looks good to me, and I'm going to merge it.

@jundot jundot merged commit 9fbb89a into jundot:main Jun 8, 2026
4 checks passed
@JimStenstrom JimStenstrom deleted the fix/embedding-max-length-model-fallback branch June 8, 2026 14:24