fix(embeddings): fall back to model context length when no window is set by JimStenstrom · Pull Request #1718 · jundot/omlx

JimStenstrom · 2026-06-07T13:44:39Z

Follow-up to #1694, which resolved the 512-token embedding truncation from #1687.

#1694's get_embedding_max_length() falls back to a hard-coded 512 tokens when neither the request nor the server's max_context_window pins a limit. In that no-window case it reintroduces the original #1687 symptom: a long-context embedding model is capped at 512 tokens even though MLXEmbeddingModel._resolve_max_length() can read its real limit from max_position_embeddings / tokenizer model_max_length.

That model-side resolver only runs when it receives max_length=None, but the server always hands it a concrete value (512 in the no-window case), so for /v1/embeddings it never fires and the config-aware resolution added in #1694 is effectively bypassed.

Change: return None from get_embedding_max_length() when nothing pins a limit, letting the existing model resolver do its job. 512 is preserved as that resolver's final fallback when the model exposes no usable metadata, so the conservative default is unchanged.

get_embedding_max_length() now returns int | None; its single caller passes the value straight to embed(max_length=...), which already accepts None.
Added TestGetEmbeddingMaxLength covering the request-override, configured-window, and no-window (returns None) paths.
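
For readers skimming the thread, the resolution order described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the function signatures, the parameter names, the precedence of the request override over the configured window, and the sanity bound on metadata values are all assumptions made for the example. Only the names get_embedding_max_length, _resolve_max_length, max_context_window, max_position_embeddings, model_max_length and the 512 fallback come from the description.

```python
from typing import Optional

# Conservative final fallback named in the PR description.
DEFAULT_MAX_LENGTH = 512


def get_embedding_max_length(
    request_max_length: Optional[int],
    max_context_window: Optional[int],
) -> Optional[int]:
    """Server side (hypothetical signature): return a limit only if one is pinned."""
    if request_max_length is not None:
        return request_max_length  # request override (precedence assumed)
    if max_context_window is not None:
        return max_context_window  # configured server window
    # Before this change the no-window case returned 512 here,
    # which prevented the model-side resolver from ever running.
    return None


def resolve_max_length(
    max_length: Optional[int],
    max_position_embeddings: Optional[int] = None,
    model_max_length: Optional[int] = None,
) -> int:
    """Model side: stand-in for MLXEmbeddingModel._resolve_max_length()."""
    if max_length is not None:
        return max_length  # caller pinned a limit, so metadata is not consulted
    for candidate in (max_position_embeddings, model_max_length):
        # Assumed sanity bound: skip missing or implausibly large values.
        if candidate is not None and 0 < candidate < 1_000_000:
            return candidate
    return DEFAULT_MAX_LENGTH  # no usable metadata


# No window configured, model advertises an 8192-token context.
limit = resolve_max_length(
    get_embedding_max_length(None, None),
    max_position_embeddings=8192,
)
print(limit)
```

Under these assumptions the no-window case resolves to the model's own limit instead of 512, while a request override or a configured window still passes through unchanged.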

Verification: pytest tests/test_embedding.py → 83 passed.

get_embedding_max_length() returned a hard 512 when neither the request nor the server's max_context_window pinned a limit, re-truncating long-context embedding models in exactly the no-config case jundot#1687 was about. Return None instead so the engine/model embed() path resolves the model's own context length (max_position_embeddings / tokenizer model_max_length, already in MLXEmbeddingModel._resolve_max_length), keeping 512 only as that resolver's final fallback. Follow-up to jundot#1694.

jundot · 2026-06-08T09:23:49Z

Thanks for the follow-up. I checked the server and embedding engine path, and passing None here is safe because MLXEmbeddingModel resolves the model/tokenizer limit and still keeps 512 as the final fallback when no metadata exists.

The request override and configured context-window paths stay unchanged, and CI is green. This looks good to me, and I'm going to merge it.

jundot merged commit 9fbb89a into jundot:main Jun 8, 2026
4 checks passed

JimStenstrom deleted the fix/embedding-max-length-model-fallback branch June 8, 2026 14:24

jundot merged 1 commit into jundot:main from JimStenstrom:fix/embedding-max-length-model-fallback
