Skip to content

fix(model_metadata): fall through to defaults when endpoint has model but no context_length#12059

Open
LucidPaths wants to merge 1 commit into
NousResearch:mainfrom
LucidPaths:fix/context-length-custom-endpoint-fallback
Open

fix(model_metadata): fall through to defaults when endpoint has model but no context_length#12059
LucidPaths wants to merge 1 commit into
NousResearch:mainfrom
LucidPaths:fix/context-length-custom-endpoint-fallback

Conversation

@LucidPaths

@LucidPaths LucidPaths commented Apr 18, 2026

Copy link
Copy Markdown
Contributor

Problem

When any OpenAI-compatible custom endpoint (Ollama Cloud, Chutes, LM Studio, local vLLM, etc.) returns a model in its /v1/models response but without a context_length field, the context resolution chain in get_model_context_length() immediately returned DEFAULT_FALLBACK_CONTEXT (128K) at step 2, bypassing models.dev, OpenRouter, and hardcoded defaults entirely.

This is a provider-agnostic bug — any server that lists models without complete metadata hits this wall. The specific trigger was GLM-5.1 on Ollama Cloud resolving to 128K instead of the correct 202,752 window, but the same issue affects any model on any custom endpoint with incomplete /v1/models responses.

Root Cause

In agent/model_metadata.py, when step 2 matched a model at a custom endpoint but found no context_length in the response, two things happened before the early return:

  1. If _is_known_provider_base_url() returned True, the code skipped the endpoint query entirely (correct — Copilot/OpenAI report provider-limited context, not model context).
  2. If _is_known_provider_base_url() returned False and is_local_endpoint() returned False, the code immediately returned DEFAULT_FALLBACK_CONTEXT (128K) — incorrect. It should have continued to steps 3-8 which check models.dev and hardcoded defaults.

Changes

agent/model_metadata.py

  • Remove early return DEFAULT_FALLBACK_CONTEXT when an endpoint matches a model but context_length is missing. Instead, log a debug message and fall through to steps 3-8 (local probe, Anthropic API, models.dev, hardcoded defaults).
  • Move local server query out of the _is_known_provider_base_url guard so it still fires for known-provider local endpoints. No behavioral change with current _URL_TO_PROVIDER entries (all remote), but eliminates an incorrect nesting dependency.

agent/models_dev.py

  • Add "ollama" → "ollama-cloud" mapping in PROVIDER_TO_MODELS_DEV. When a user configures provider: ollama (the natural name for Ollama Cloud), step 5's lookup_models_dev_context("ollama-cloud", model) can now resolve context windows via models.dev.

tests/agent/test_model_metadata.py

  • Fix regression test to use https://llm.chutes.ai/v1 (unknown custom endpoint) instead of https://ollama.com/v1 (known provider). The old URL skipped the patched code path entirely — ollama.com is in _URL_TO_PROVIDER, so _is_known_provider_base_url() returned True, and the endpoint metadata block was never entered.
  • Remove misleading assert result == DEFAULT_FALLBACK_CONTEXT or result == 202752 — this would accept the old buggy 128K value.
  • Rename test from test_custom_endpoint_without_metadata_skips_name_based_defaulttest_custom_endpoint_empty_metadata_falls_through_to_defaults (empty endpoint response) and add test_custom_endpoint_match_no_context_length_falls_through (endpoint returns model without context_length).
  • Add explicit mock_endpoint_fetch.return_value = {} to the empty-metadata test instead of relying on MagicMock default behavior.

Testing

$ python -m pytest tests/agent/test_model_metadata.py -v
81 passed in 1.48s

Both new tests specifically exercise the patched code path:

  • test_custom_endpoint_empty_metadata_falls_through_to_defaults — empty endpoint response, model resolves via hardcoded defaults
  • test_custom_endpoint_match_no_context_length_falls_through — endpoint returns model name but no context_length, resolves via hardcoded defaults instead of 128K

Architecture Context

get_model_context_length() has a 10-step resolution chain. Steps 1-2 are fast local lookups (cache, endpoint metadata). Steps 3-8 are progressively more expensive (local probe, Anthropic API, OpenRouter, models.dev, hardcoded defaults). Step 9 is a final local probe. Step 10 is the 128K fallback.

The fix ensures that failing at step 2 doesn't short-circuit the entire chain. This aligns with the documented resolution order and the design intent of having multiple fallback layers.

… but no context_length

When any custom endpoint (Ollama, Chutes, LM Studio, etc.) returns a
model in its /v1/models response but without context_length metadata,
the resolution chain previously returned DEFAULT_FALLBACK_CONTEXT (128K)
immediately, bypassing models.dev, OpenRouter, and hardcoded defaults.

Any OpenAI-compatible server may list models without context_length —
this is not provider-specific. The resolver should keep trying rather
than giving up at step 2.

Changes:
- Remove early return from step 2 when endpoint matches a model but
  has no context_length. Fall through to steps 3-8 (local probe,
  Anthropic API, models.dev, hardcoded defaults).
- Move local server query out of the _is_known_provider_base_url guard
  so it still works for known provider URLs that need local probing.
- Add ollama->ollama-cloud mapping in PROVIDER_TO_MODELS_DEV so Ollama
  Cloud endpoints can resolve model context via models.dev.
- Fix regression test to use an unknown custom endpoint URL (not a
  known provider URL that skips the patched code path entirely).
- Remove misleading `or` assertion that would accept the old buggy
  128K value.
- Rename test to clarify it tests empty vs no-context-length metadata.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants