
fix: correct context-length resolution for kimi-k2.6 on Ollama Cloud and Kimi Coding#23980

Merged
kshitijk4poor merged 3 commits into NousResearch:main from kshitijk4poor:fix/kimi-context-length-resolution
May 11, 2026



kshitijk4poor commented May 11, 2026

Collaborator

Summary

kimi-k2.6 (which supports a 262,144-token context) was incorrectly resolved as 32K (32,768 tokens), tripping the 64K minimum-context guard and making the model unusable on Ollama Cloud, Kimi Coding / Moonshot, and any custom Ollama endpoint.
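The failure is plain arithmetic. A minimal sketch of the minimum-context check, assuming the "64K" guard means 64 × 1024 = 65,536 tokens; the guard's real constant and function are not shown in this PR, so `MIN_CONTEXT_TOKENS` and `passes_min_context` are hypothetical names. It also shows that 262,144 tokens is 256 × 1024, which is why the same limit is quoted as both "262K" and "256K".

```python
# Hypothetical constant: assumes the "64K" minimum-context guard is 64 * 1024 tokens.
MIN_CONTEXT_TOKENS = 64 * 1024

STALE_OPENROUTER_CTX = 32 * 1024  # 32768, the value OpenRouter reported
KIMI_K2_6_CTX = 256 * 1024        # 262144, the GGUF value for kimi-k2.6


def passes_min_context(ctx: int, minimum: int = MIN_CONTEXT_TOKENS) -> bool:
    """Return True if a resolved context length clears the minimum guard."""
    return ctx >= minimum


if __name__ == "__main__":
    for ctx in (STALE_OPENROUTER_CTX, KIMI_K2_6_CTX):
        print(ctx, passes_min_context(ctx))
```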

Closes #23949.

Root cause

The context-length resolution chain had three gaps:

  1. Ollama's native /api/show was never queried. The OpenAI-compatible /v1/models endpoint omits context_length by design, but Hermes never fell back to the native endpoint, which returns authoritative GGUF metadata (262144 for kimi-k2.6).
  2. models.dev stores the model as kimi-k2.6:cloud, but lookup_models_dev_context only searched for the bare name and so missed the :cloud-suffixed entry.
  3. OpenRouter reports 32768 for moonshotai/kimi-k2.6. This stale metadata was accepted as truth and overrode the project's own curated DEFAULT_CONTEXT_LENGTHS table.

Changes

1. _query_ollama_api_show() — provider-agnostic Ollama native API probe

Queries /api/show at two points in the resolution chain:

  • Step 2b — for custom/unknown endpoints (after /v1/models fails)
  • Step 5e — for known providers with any base_url

For non-Ollama servers, the POST fails fast with 404/405. Results are cached per model+URL pair. For hosted Ollama, the GGUF model_info.*.context_length value is preferred over num_ctx.
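A minimal standard-library sketch of such a probe. The helper names `context_from_api_show` and `query_api_show`, the `/v1` suffix stripping, and the `prefer_model_info` switch are assumptions for illustration, not the PR's `_query_ollama_api_show()`. It relies on Ollama's `/api/show` response shape: a `model_info` object whose architecture-prefixed keys include `<arch>.context_length`, and a `parameters` string with one `name value` pair per line (where `num_ctx` appears if it was set).

```python
import json
import urllib.request
from functools import lru_cache
from typing import Any, Mapping, Optional


def context_from_api_show(
    payload: Mapping[str, Any], prefer_model_info: bool = True
) -> Optional[int]:
    """Extract a context length from an Ollama /api/show response body."""
    # GGUF metadata: keys look like "<architecture>.context_length".
    gguf_ctx = None
    for key, value in (payload.get("model_info") or {}).items():
        if key.endswith(".context_length") and isinstance(value, int):
            gguf_ctx = value
            break

    # Modelfile parameters: one "name value" pair per line, e.g. "num_ctx 4096".
    num_ctx = None
    for line in (payload.get("parameters") or "").splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "num_ctx" and parts[1].isdigit():
            num_ctx = int(parts[1])

    # Hosted Ollama: users can't set num_ctx there, so trust the GGUF value first.
    if prefer_model_info:
        return gguf_ctx or num_ctx
    return num_ctx or gguf_ctx


@lru_cache(maxsize=256)  # caches per (base_url, model, ...), misses included
def query_api_show(
    base_url: str, model: str, prefer_model_info: bool = True, timeout: float = 3.0
) -> Optional[int]:
    """POST to <root>/api/show; return None for non-Ollama servers or errors."""
    root = base_url.rstrip("/")
    if root.endswith("/v1"):  # assumption: OpenAI-compat base URLs end in /v1
        root = root[: -len("/v1")]
    request = urllib.request.Request(
        f"{root}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # HTTPError (404/405), URLError and socket timeouts are OSError subclasses;
        # ValueError covers a body that is not valid JSON.
        return None
    if not isinstance(payload, dict):
        return None
    return context_from_api_show(payload, prefer_model_info)
```

Because `lru_cache` also stores `None`, a transient network failure is remembered for the life of the process; a production version might cache only definitive answers.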

2. :cloud/-cloud suffix fallback in lookup_models_dev_context()

When the exact lookup fails, also try appending the :cloud and -cloud suffixes. This lets the bare name kimi-k2.6 match the kimi-k2.6:cloud entry in models.dev.
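The fallback order can be sketched as follows, assuming (as a simplification) that the models.dev data has already been flattened into a plain `model name -> context length` mapping. `lookup_with_cloud_suffix` and `CLOUD_SUFFIXES` are illustrative names, not the real lookup_models_dev_context() signature.

```python
from typing import Mapping, Optional

# Suffixes tried, in order, after the exact name misses.
CLOUD_SUFFIXES = (":cloud", "-cloud")


def lookup_with_cloud_suffix(model: str, contexts: Mapping[str, int]) -> Optional[int]:
    """Exact match first, then the :cloud and -cloud variants."""
    for candidate in (model, *(model + suffix for suffix in CLOUD_SUFFIXES)):
        ctx = contexts.get(candidate)
        if ctx is not None:
            return ctx
    return None
```

Trying the exact name first keeps existing behavior intact: a suffixed entry is only consulted when the bare name has no entry of its own.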

3. Gate OpenRouter metadata behind "if not effective_provider"

Based on PR #23950 by @nicoechaniz. Known providers should not be overridden by community-maintained OpenRouter data. When a provider is known (inferred from URL or set in config), skip OR and fall through to models.dev + curated defaults.
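A sketch of the gated ordering, with each data source passed in as a callable so the control flow stands on its own. `resolve_context_length` and its parameters are hypothetical; the real get_model_context_length() has more steps (including the /api/show probes) than shown here.

```python
from typing import Callable, Optional

Lookup = Callable[[str], Optional[int]]


def resolve_context_length(
    model: str,
    effective_provider: Optional[str],
    openrouter_lookup: Lookup,
    models_dev_lookup: Lookup,
    curated_default: Lookup,
) -> Optional[int]:
    """Consult OpenRouter metadata only when no provider is known."""
    if not effective_provider:
        or_ctx = openrouter_lookup(model)
        if or_ctx:
            return or_ctx
    # Known providers (and unknown ones OpenRouter could not answer) land here.
    for lookup in (models_dev_lookup, curated_default):
        ctx = lookup(model)
        if ctx:
            return ctx
    return None
```

With no provider known, a stale OpenRouter value still wins in this sketch; that remaining gap is what the Kimi-family 32K guard addresses.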

4. Kimi-family 32K guard (inside the OR gate)

For unknown providers where OR is still consulted: if OR returns exactly 32768 and _model_name_suggests_kimi() matches, reject and fall through to hardcoded defaults ("kimi": 262144).
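A minimal sketch of the guard. The name-matching rule (a model name or path segment starting with `kimi` or `moonshot`, following the commit message's `kimi-*, moonshot*`) is an assumption; `model_name_suggests_kimi` and `filter_openrouter_context` are illustrative stand-ins, not the PR's _model_name_suggests_kimi().

```python
import re
from typing import Optional

STALE_KIMI_OPENROUTER_CTX = 32768  # the exact value OpenRouter reported

# Assumption: match "kimi"/"moonshot" at the start of the name or after a "/".
_KIMI_NAME = re.compile(r"(?:^|/)(?:kimi|moonshot)", re.IGNORECASE)


def model_name_suggests_kimi(model: str) -> bool:
    """Heuristic check for Kimi-family model names."""
    return _KIMI_NAME.search(model) is not None


def filter_openrouter_context(model: str, or_ctx: Optional[int]) -> Optional[int]:
    """Drop exactly 32768 for Kimi-family names so resolution falls through."""
    if or_ctx == STALE_KIMI_OPENROUTER_CTX and model_name_suggests_kimi(model):
        return None
    return or_ctx
```

Because the check requires both the exact value and a Kimi-family name, it stops firing on its own once OpenRouter corrects its metadata.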

5. Add "kimi" and "moonshot" to PROVIDER_TO_MODELS_DEV

Maps these bare provider names to kimi-for-coding, consistent with existing kimi-coding and kimi-coding-cn entries. From PR #23950 by @nicoechaniz.
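The resulting entries can be pictured as below. Only the four Kimi-related keys are shown and the rest of the real PROVIDER_TO_MODELS_DEV table is omitted; `models_dev_provider_for` is a hypothetical helper, and its lowercasing of the input is an assumption.

```python
from typing import Optional

# Excerpt-style sketch: other providers in the real table are omitted.
PROVIDER_TO_MODELS_DEV = {
    "kimi-coding": "kimi-for-coding",     # existing entry
    "kimi-coding-cn": "kimi-for-coding",  # existing entry
    "kimi": "kimi-for-coding",            # added in this PR
    "moonshot": "kimi-for-coding",        # added in this PR
}


def models_dev_provider_for(provider: str) -> Optional[str]:
    """Map a provider name to its models.dev provider id, if one is known."""
    return PROVIDER_TO_MODELS_DEV.get(provider.strip().lower())
```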

Test plan

  • All 177 existing tests pass (test_model_metadata, test_models_dev, test_ollama_num_ctx, test_ollama_cloud_provider)
  • E2E verified: ollama-cloud, kimi-coding, and kimi-coding-cn all resolve kimi-k2.6 to 262144

alt-glitch added labels May 11, 2026: type/bug (Something isn't working), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/kimi (Kimi / Moonshot), provider/ollama (Ollama / local models), P2 (Medium: degraded but workaround exists)
kshitijk4poor force-pushed the fix/kimi-context-length-resolution branch from b362822 to d63de45 (May 11, 2026 19:52)
…and Kimi Coding

Kimi-k2.6 (which supports 262K context) was incorrectly resolved as 32K,
tripping the 64K minimum-context guard and preventing use of the model on
Ollama Cloud and Kimi Coding / Moonshot providers.

Three fixes in the context-length resolution chain:

1. Ollama Cloud native /api/show query: new _query_ollama_api_show()
   queries the Ollama native API for authoritative GGUF model_info
   context_length.  For hosted Ollama, prefers model_info over num_ctx
   since users can't set their own num_ctx on Cloud.  Added at step 5e
   in get_model_context_length(), before the models.dev fallback.

2. models.dev :cloud/-cloud suffix fallback: lookup_models_dev_context()
   now also tries appending :cloud and -cloud suffixes when the bare
   model name doesn't match.  models.dev stores 'kimi-k2.6:cloud' but
   users and the live API use bare 'kimi-k2.6'.

3. Kimi-family 32K guard: after the OpenRouter metadata step, reject
   exactly 32768 for Kimi-named models (kimi-*, moonshot*) and fall
   through to hardcoded defaults ('kimi': 262144).  OpenRouter reports
   32768 for moonshotai/kimi-k2.6 but the model actually supports 262K.
   Narrow filter — only 32768, only Kimi-family — becomes dead code
   when OpenRouter updates its metadata.

---
kshitijk4poor force-pushed the fix/kimi-context-length-resolution branch from d63de45 to a20971d (May 11, 2026 19:55)
…onshot to PROVIDER_TO_MODELS_DEV

Based on PR NousResearch#23950 by @nicoechaniz.

- Add "kimi" and "moonshot" to PROVIDER_TO_MODELS_DEV → kimi-for-coding
- Gate OpenRouter metadata step behind "if not effective_provider":
  known providers should not be overridden by community-maintained OR data
- Keep the targeted Kimi-family 32k guard as a secondary safety net
  inside the OR gate (for unknown providers with Kimi models)

Co-authored-by: nicoechaniz <nicoechaniz@altermundi.net>
kshitijk4poor merged commit 9a63b5f into NousResearch:main May 11, 2026
6 of 7 checks passed



Development

Linked issue:

kimi-k2.6 on Ollama Cloud detected as 32K context despite API reporting 256K
