fix(embeddings): fall back to model context length when no window is set #1718
Merged
jundot merged 1 commit on Jun 8, 2026
get_embedding_max_length() returned a hard 512 when neither the request nor the server's max_context_window pinned a limit, re-truncating long-context embedding models in exactly the no-config case jundot#1687 was about. Return None instead so the engine/model embed() path resolves the model's own context length (max_position_embeddings / tokenizer model_max_length, already in MLXEmbeddingModel._resolve_max_length), keeping 512 only as that resolver's final fallback. Follow-up to jundot#1694.
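The server-side behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the parameter names `request_max_length` and `max_context_window` as function arguments, and the assumption that a request override takes precedence over the configured window, are hypothetical.

```python
from typing import Optional


def get_embedding_max_length(
    request_max_length: Optional[int],
    max_context_window: Optional[int],
) -> Optional[int]:
    """Return the limit pinned by the request or the server, else None.

    Hypothetical signature: the real function may read these values
    from request and settings objects rather than plain arguments.
    """
    # Assumed precedence: an explicit per-request override wins.
    if request_max_length is not None:
        return request_max_length
    # Otherwise use the server's configured context window, if any.
    if max_context_window is not None:
        return max_context_window
    # Before this change the function returned a hard 512 here.
    # Returning None lets the model-side resolver pick the limit.
    return None
```

With nothing pinned, the caller would receive `None` and forward it to `embed(max_length=None)`, which is the case the model resolver is meant to handle.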
Owner
Thanks for the follow-up. I checked the server and embedding engine path, and passing None here is safe because MLXEmbeddingModel resolves the model/tokenizer limit and still keeps 512 as the final fallback when no metadata exists. The request override and configured context-window paths stay unchanged, and CI is green. This looks good to me, and I'm going to merge it.
Follow-up to #1694, which resolved the 512-token embedding truncation from #1687.

#1694's `get_embedding_max_length()` falls back to a hard `512` when neither the request nor the server's `max_context_window` pins a limit. In that no-window case it reintroduces the original #1687 symptom: a long-context embedding model is capped at 512 even though `MLXEmbeddingModel._resolve_max_length()` can read its real limit from `max_position_embeddings` / tokenizer `model_max_length`.

That model-side resolver only runs when it receives `max_length=None`, but the server always hands it a concrete `512` first, so for `/v1/embeddings` it never fires, and the config-aware resolution added in #1694 is effectively bypassed.

Change: return `None` from `get_embedding_max_length()` when nothing pins a limit, letting the existing model resolver do its job. `512` is preserved as that resolver's final fallback when the model exposes no usable metadata, so the conservative default is unchanged.

- `get_embedding_max_length()` now returns `int | None`; its single caller passes the value straight to `embed(max_length=...)`, which already accepts `None`.
- `TestGetEmbeddingMaxLength` covers the request-override, configured-window, and no-window (returns `None`) paths.

Verification: `pytest tests/test_embedding.py` → 83 passed.
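To illustrate why a concrete 512 bypasses the model-side resolution, here is a minimal sketch of a resolver in the spirit of `MLXEmbeddingModel._resolve_max_length()`. It is not the project's implementation: the standalone function `resolve_max_length`, the lookup order, and the `SENTINEL_CAP` sanity threshold are assumptions made for this example.

```python
from types import SimpleNamespace
from typing import Any, Optional

DEFAULT_MAX_LENGTH = 512  # final fallback when no usable metadata exists
SENTINEL_CAP = 1_000_000  # assumed threshold for "no real limit" sentinel values


def resolve_max_length(max_length: Optional[int], config: Any, tokenizer: Any) -> int:
    """Pick an effective max length (hypothetical helper, assumed lookup order)."""
    # A concrete value from the caller short-circuits everything below.
    if max_length is not None:
        return max_length
    candidates = (
        getattr(config, "max_position_embeddings", None),
        getattr(tokenizer, "model_max_length", None),
    )
    for candidate in candidates:
        # Skip missing values and implausibly large placeholder values.
        if isinstance(candidate, int) and 0 < candidate < SENTINEL_CAP:
            return candidate
    return DEFAULT_MAX_LENGTH


# Stand-ins for a long-context model's config and tokenizer.
long_ctx_config = SimpleNamespace(max_position_embeddings=8192)
unset_tokenizer = SimpleNamespace(model_max_length=int(1e30))

old_behavior = resolve_max_length(512, long_ctx_config, unset_tokenizer)
new_behavior = resolve_max_length(None, long_ctx_config, unset_tokenizer)
no_metadata = resolve_max_length(None, SimpleNamespace(), SimpleNamespace())
```

Under these assumptions, `old_behavior` stays at 512 because the resolver never inspects the model metadata, `new_behavior` picks up the model's 8192, and `no_metadata` keeps the conservative 512 default.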