fix(model_metadata): fall through to defaults when endpoint has model but no context_length#12059
Open
LucidPaths wants to merge 1 commit into
Open
Conversation
… but no context_length When any custom endpoint (Ollama, Chutes, LM Studio, etc.) returns a model in its /v1/models response but without context_length metadata, the resolution chain previously returned DEFAULT_FALLBACK_CONTEXT (128K) immediately, bypassing models.dev, OpenRouter, and hardcoded defaults. Any OpenAI-compatible server may list models without context_length — this is not provider-specific. The resolver should keep trying rather than giving up at step 2. Changes: - Remove early return from step 2 when endpoint matches a model but has no context_length. Fall through to steps 3-8 (local probe, Anthropic API, models.dev, hardcoded defaults). - Move local server query out of the _is_known_provider_base_url guard so it still works for known provider URLs that need local probing. - Add ollama->ollama-cloud mapping in PROVIDER_TO_MODELS_DEV so Ollama Cloud endpoints can resolve model context via models.dev. - Fix regression test to use an unknown custom endpoint URL (not a known provider URL that skips the patched code path entirely). - Remove misleading `or` assertion that would accept the old buggy 128K value. - Rename test to clarify it tests empty vs no-context-length metadata.
This was referenced Apr 25, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When any OpenAI-compatible custom endpoint (Ollama Cloud, Chutes, LM Studio, local vLLM, etc.) returns a model in its
/v1/modelsresponse but without acontext_lengthfield, the context resolution chain inget_model_context_length()immediately returnedDEFAULT_FALLBACK_CONTEXT(128K) at step 2, bypassing models.dev, OpenRouter, and hardcoded defaults entirely.This is a provider-agnostic bug — any server that lists models without complete metadata hits this wall. The specific trigger was GLM-5.1 on Ollama Cloud resolving to 128K instead of the correct 202,752 window, but the same issue affects any model on any custom endpoint with incomplete
/v1/modelsresponses.Root Cause
In
agent/model_metadata.py, when step 2 matched a model at a custom endpoint but found nocontext_lengthin the response, two things happened before the early return:_is_known_provider_base_url()returnedTrue, the code skipped the endpoint query entirely (correct — Copilot/OpenAI report provider-limited context, not model context)._is_known_provider_base_url()returnedFalseandis_local_endpoint()returnedFalse, the code immediately returnedDEFAULT_FALLBACK_CONTEXT(128K) — incorrect. It should have continued to steps 3-8 which check models.dev and hardcoded defaults.Changes
agent/model_metadata.pyreturn DEFAULT_FALLBACK_CONTEXTwhen an endpoint matches a model butcontext_lengthis missing. Instead, log adebugmessage and fall through to steps 3-8 (local probe, Anthropic API, models.dev, hardcoded defaults)._is_known_provider_base_urlguard so it still fires for known-provider local endpoints. No behavioral change with current_URL_TO_PROVIDERentries (all remote), but eliminates an incorrect nesting dependency.agent/models_dev.py"ollama" → "ollama-cloud"mapping inPROVIDER_TO_MODELS_DEV. When a user configuresprovider: ollama(the natural name for Ollama Cloud), step 5'slookup_models_dev_context("ollama-cloud", model)can now resolve context windows via models.dev.tests/agent/test_model_metadata.pyhttps://llm.chutes.ai/v1(unknown custom endpoint) instead ofhttps://ollama.com/v1(known provider). The old URL skipped the patched code path entirely —ollama.comis in_URL_TO_PROVIDER, so_is_known_provider_base_url()returnedTrue, and the endpoint metadata block was never entered.assert result == DEFAULT_FALLBACK_CONTEXT or result == 202752— this would accept the old buggy 128K value.test_custom_endpoint_without_metadata_skips_name_based_default→test_custom_endpoint_empty_metadata_falls_through_to_defaults(empty endpoint response) and addtest_custom_endpoint_match_no_context_length_falls_through(endpoint returns model without context_length).mock_endpoint_fetch.return_value = {}to the empty-metadata test instead of relying on MagicMock default behavior.Testing
Both new tests specifically exercise the patched code path:
test_custom_endpoint_empty_metadata_falls_through_to_defaults— empty endpoint response, model resolves via hardcoded defaultstest_custom_endpoint_match_no_context_length_falls_through— endpoint returns model name but no context_length, resolves via hardcoded defaults instead of 128KArchitecture Context
get_model_context_length()has a 10-step resolution chain. Steps 1-2 are fast local lookups (cache, endpoint metadata). Steps 3-8 are progressively more expensive (local probe, Anthropic API, OpenRouter, models.dev, hardcoded defaults). Step 9 is a final local probe. Step 10 is the 128K fallback.The fix ensures that failing at step 2 doesn't short-circuit the entire chain. This aligns with the documented resolution order and the design intent of having multiple fallback layers.