feat: Ollama improvements — Cloud provider, GLM continuation, think=false, surrogate sanitization, /v1 hint by teknium1 · Pull Request #10782 · NousResearch/hermes-agent

teknium1 · 2026-04-16T05:41:00Z

Summary

Consolidates 5 community PRs into a single Ollama improvement package. Each contributor's commit is preserved with original authorship via cherry-pick (rebase merge to keep attribution).

Changes

1. Ollama Cloud as built-in provider (cherry-picked from PR #6038 by @kshitijk4poor)

Full first-class provider: --provider ollama-cloud, PROVIDER_REGISTRY, CANONICAL_PROVIDERS
Dynamic model discovery from ollama.com/v1/models + models.dev, disk-cached (1hr TTL)
OLLAMA_API_KEY env var, URL auto-detection, model:tag passthrough normalization
37 provider-specific tests
Closes #3926 (Add Ollama Cloud as built-in provider)
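
The disk-cached discovery described above can be pictured roughly as follows. This is a minimal sketch, not the code from PR #6038: the helper name `load_cached_models`, the cache file layout, and the injected `fetch` callable are all hypothetical. Only the 1-hour TTL comes from the description. The fetcher is passed in so the sketch makes no network calls itself.

```python
import json
import time
from pathlib import Path
from typing import Callable, List

CACHE_TTL_SECONDS = 3600  # 1hr TTL, as stated in the PR description


def load_cached_models(
    cache_path: Path,
    fetch: Callable[[], List[str]],
    now: Callable[[], float] = time.time,
) -> List[str]:
    """Return model IDs from a fresh disk cache, else call fetch() and rewrite it.

    Hypothetical helper: the layout {"fetched_at": ..., "models": [...]}
    is an assumption made for illustration.
    """
    try:
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        if now() - cached["fetched_at"] < CACHE_TTL_SECONDS:
            return list(cached["models"])
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, unreadable, or malformed cache: fall through and refetch

    models = fetch()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(
        json.dumps({"fetched_at": now(), "models": models}), encoding="utf-8"
    )
    return models
```

Treating a corrupt cache file the same as a missing one keeps a bad write from blocking model discovery until the file is deleted by hand.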

2. Continue Ollama GLM replies after stop misreports (cherry-picked from PR #10740 by @LeonSGP43)

Ollama-hosted GLM models misreport finish_reason="stop" on truncated tool-call responses
Narrowly-scoped heuristic reclassifies as "length" to trigger continuation
5 guard conditions (Ollama+GLM only), 3 test cases
Closes #10711 (Bug: Ollama returns finish_reason='stop' on truncated GLM responses, causing agent to silently drop final output)
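
The shape of such a narrowly scoped reclassification might look like the sketch below. The five guards used in PR #10740 are not listed in this description, so every condition here is an illustrative guess, and the function name `effective_finish_reason` and its parameters are hypothetical. The only facts taken from the text are that the heuristic applies to Ollama plus GLM and turns a misreported "stop" into "length".

```python
import json
from typing import Optional


def _is_complete_json(text: str) -> bool:
    """True when text parses as JSON, i.e. the tool-call arguments were not cut off."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def effective_finish_reason(
    finish_reason: Optional[str],
    base_url: str,
    model: str,
    tool_call_arguments: Optional[str],
) -> Optional[str]:
    """Reclassify "stop" as "length" when a tool call looks truncated.

    All guards below are assumptions for illustration, not the PR's actual checks.
    """
    # 11434 is Ollama's default port; the substring check is a rough guess.
    looks_like_ollama = "ollama" in base_url.lower() or ":11434" in base_url
    is_glm = "glm" in model.lower()
    if (
        finish_reason == "stop"
        and looks_like_ollama
        and is_glm
        and tool_call_arguments  # a tool call was started...
        and not _is_complete_json(tool_call_arguments)  # ...but its JSON is cut off
    ):
        return "length"
    return finish_reason
```

Requiring every guard to hold keeps the override away from other providers and models, where "stop" should be taken at face value.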

3. Pass think=false for Ollama when reasoning disabled (cherry-picked from PR #3197 by @Mibayy)

Ollama ignores OpenRouter-style reasoning extra_body; inject native think: false instead
Prevents Qwen3 etc. from generating <think> blocks when effort=none
4 test cases
Closes #3191 ([Feature]: Pass Ollama think: false parameter when reasoning_effort: none is set for custom/Ollama providers)
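
A rough picture of the injection, assuming a reasoning config shaped like `{"effort": ..., "enabled": ...}`. The function `build_extra_body` is a hypothetical stand-in for the provider-specific block the commit message places in `_build_api_kwargs()`, and the `else` branch that forwards an OpenRouter-style `reasoning` field to other providers is an assumption.

```python
from typing import Any, Dict, Optional


def build_extra_body(
    provider: str,
    reasoning_config: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build extra_body for a chat-completions request.

    Hypothetical sketch; the reasoning_config shape is assumed.
    """
    extra_body: Dict[str, Any] = {}
    if not reasoning_config:
        return extra_body  # nothing was explicitly requested

    disabled = (
        reasoning_config.get("effort") == "none"
        or reasoning_config.get("enabled") is False
    )
    if provider == "custom":
        if disabled:
            extra_body["think"] = False  # Ollama's native switch
    else:
        # Assumed behaviour for other providers: OpenRouter-style field.
        extra_body["reasoning"] = dict(reasoning_config)
    return extra_body
```

With the OpenAI Python SDK, a dict like this would be passed through the `extra_body` argument of `client.chat.completions.create(...)`.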

4. Sanitize surrogate characters from Ollama model output (cherry-picked from PR #5074 by @ygd58)

Proactively strip lone surrogates (U+D800-U+DFFF) before API calls
Prevents json.dumps() crashes inside the OpenAI SDK
Closes #5059 ([Bug]: Surrogate Code Points Crash UTF-8 Serialization with Ollama Models)
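
Stripping lone surrogates can be done with a single regular expression. The sketch below is an illustration rather than the code from PR #5074, and the name `strip_surrogates` is hypothetical. It relies on the fact that a Python `str` stores code points, so any character in U+D800-U+DFFF is by definition an unpaired surrogate that UTF-8 cannot encode.

```python
import re

# Any code point in U+D800..U+DFFF inside a Python str is a lone surrogate;
# real astral characters (e.g. emoji) are stored as single code points above U+FFFF.
_SURROGATES = re.compile(r"[\ud800-\udfff]")


def strip_surrogates(text: str) -> str:
    """Remove surrogate code points so the text can be encoded as UTF-8."""
    return _SURROGATES.sub("", text)
```

For example, `"ok\ud83d!".encode("utf-8")` raises `UnicodeEncodeError`, while `strip_surrogates("ok\ud83d!")` returns `"ok!"` and encodes cleanly. Replacing with U+FFFD instead of deleting would be an equally valid choice if the position of the damage should stay visible.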

5. Hint about /v1 suffix for local model endpoints (our addition)

When configuring a custom endpoint via hermes model, detect local-looking URLs without /v1
Prompt: "Did you mean to add /v1 at the end? Most local model servers require it."
Addresses feedback from #4617 (fix(agent): append /v1 to custom openai-compatible base URLs (#4600)) and #7906 (fix: is_local_endpoint misses Docker/Podman DNS names) without auto-modifying the URL
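
One way to detect "local-looking URLs without /v1" is sketched below. The helpers `looks_local` and `v1_hint` are hypothetical, and the locality rules (loopback or private IPs, `localhost`, `.local` names, and dot-free names such as Docker/Podman service names) are assumptions rather than the project's actual checks. The prompt string is the one quoted above. The function only returns a hint and never rewrites the URL.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlparse

HINT = "Did you mean to add /v1 at the end? Most local model servers require it."


def looks_local(host: str) -> bool:
    """Heuristic (assumed): loopback/private IPs, localhost, .local or dot-free names."""
    try:
        ip = ipaddress.ip_address(host)
        return ip.is_loopback or ip.is_private
    except ValueError:
        pass  # not an IP literal; fall back to name checks
    return host == "localhost" or host.endswith(".local") or "." not in host


def v1_hint(base_url: str) -> Optional[str]:
    """Return the hint for a local-looking URL lacking /v1, else None."""
    parsed = urlparse(base_url)
    host = parsed.hostname or ""  # empty when the URL has no scheme
    if not host or not looks_local(host):
        return None
    if parsed.path.rstrip("/").endswith("/v1"):
        return None
    return HINT
```

Under these assumptions, `v1_hint("http://localhost:11434")` returns the hint, while `v1_hint("http://localhost:11434/v1/")` and `v1_hint("https://api.openai.com")` both return `None`.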

Contributor Attribution

All cherry-picked commits preserve original authorship in git log. Use rebase merge to keep individual commit attribution.

PRs to close after merge

feat: add Ollama Cloud as built-in provider #6038 (Ollama Cloud provider — @kshitijk4poor)
fix(agent): continue Ollama GLM replies after stop misreports #10740 (GLM stop misreport — @LeonSGP43)
feat(ollama): pass think=false to custom providers when reasoning_effort is none #3197 (think=false — @Mibayy)
fix(agent): sanitize surrogate characters from API responses and before API calls #5074 (surrogate sanitization — @ygd58)
fix: register Ollama Cloud as known provider for context length resolution #5490 (context length — @LucidPaths, subsumed by feat: add Ollama Cloud as built-in provider #6038)
feat(auth): add Ollama Cloud, Google/Gemini, xAI, and Ollama Local as built-in providers #3709 (multi-provider bundle — @simplenamebox-ops, Ollama Cloud portion covered)

Test Results

37/37 Ollama Cloud provider tests pass
296/298 run_agent tests pass (2 pre-existing failures unrelated to this PR)
Full hermes_cli suite: 2068 passed (30 failures + 44 errors are pre-existing on main)

Add ollama-cloud as a first-class provider with full parity to existing API-key providers (gemini, zai, minimax, etc.):

- PROVIDER_REGISTRY entry with OLLAMA_API_KEY env var
- Provider aliases: ollama -> custom (local), ollama_cloud -> ollama-cloud
- models.dev integration for accurate context lengths
- URL-to-provider mapping (ollama.com -> ollama-cloud)
- Passthrough model normalization (preserves Ollama model:tag format)
- Default auxiliary model (nemotron-3-nano:30b)
- HermesOverlay in providers.py
- CLI --provider choices, CANONICAL_PROVIDERS entry
- Dynamic model discovery with disk caching (1hr TTL)
- 37 provider-specific tests

Cherry-picked from PR #6038 by kshitijk4poor.

Closes #3926

…ort is none

When a custom/Ollama provider is used and reasoning_effort is set to 'none' (or enabled: false), inject 'think': false into the request extra_body.

Ollama does not recognise the OpenRouter-style 'reasoning' extra_body field, so thinking-capable models (Qwen3, etc.) generate <think> blocks regardless of the reasoning_effort setting. This produces empty-response errors that corrupt session state.

The fix adds a provider-specific block in _build_api_kwargs() that sets think=false in extra_body whenever self.provider == 'custom' and reasoning is explicitly disabled.

Closes #3191


When a user enters a local model server URL (Ollama, vLLM, llama.cpp) without a /v1 suffix during 'hermes model' custom endpoint setup, prompt them to add it. Most OpenAI-compatible local servers require /v1 in the base URL for chat completions to work.

arsaboo · 2026-04-16T13:01:23Z

@teknium1 How do you envision handling local Ollama? I am working on adding local Ollama as a built-in provider. Should we have OLLAMA_LOCAL_BASE_URL?

Fills documentation gaps that accumulated as features merged ahead of their docs updates. All additions are verified against code and the originating PRs.

Providers:
- Ollama Cloud (#10782): new provider section, env vars, quickstart/fallback rows
- xAI Grok Responses API + TTS (#10783): provider note, TTS table + config
- Google Gemini CLI OAuth (#11270): quickstart/fallback/cli-commands entries
- NVIDIA NIM (#11774): NVIDIA_API_KEY / NVIDIA_BASE_URL in env-vars reference
- HERMES_INFERENCE_PROVIDER enum updated

Messaging:
- DISCORD_ALLOWED_ROLES (#11608): env-vars, discord.md access control section
- DingTalk QR device-flow (#11574): wizard path in Option A + openClaw disclosure
- Feishu document comment intelligent reply (#11898): full section + 3-tier access control + CLI

Skills / commands:
- concept-diagrams skill (#11363): optional-skills-catalog entry
- /gquota (#11270): slash-commands reference

Build: docusaurus build passes, ascii-guard lint 0 errors.

Cat0x00 · 2026-04-21T09:59:31Z

@teknium1 please see #3197 (comment); there is a bug in one of these merges (pull requests).


kshitijk4poor and others added 5 commits April 15, 2026 22:32

fix(agent): continue ollama glm truncation replies

8baafc3

fix(agent): sanitize surrogate characters from API responses and befo…

7d8eba1

…re API calls

teknium1 merged commit 5c39787 into main Apr 16, 2026
5 of 7 checks passed

teknium1 deleted the hermes/hermes-c66666e2 branch April 16, 2026 09:22

This was referenced Apr 16, 2026

fix: Ollama Cloud shows 0 models in /model TUI picker #10964

Closed

[Bug]: Ollama Cloud shows 0 models in /model TUI picker #10977

Closed

teknium1 mentioned this pull request Apr 18, 2026

docs: backfill coverage for recently-merged features #11942

Merged

github-actions Bot mentioned this pull request Apr 24, 2026

chore: bump NousResearch/hermes-agent version from v2026.4.16 to v2026.4.23 Docker-Hub-sirmark/docker-hermes-agent#3

Merged

el-analista mentioned this pull request Apr 26, 2026

feat(zai): add GLM-5.1 support, thinking param, and coding plan safeguards #6013

Closed

5 tasks

teknium1 merged 5 commits into main from hermes/hermes-c66666e2

7 participants
