fix(cli): allow custom/local endpoints without API key#2556
Merged
Conversation
Local LLM servers (llama.cpp, ollama, vLLM, etc.) typically don't require authentication. When a custom base_url is configured but no API key is found, use a placeholder instead of failing with 'Provider resolver returned an empty API key.' The OpenAI SDK accepts any string as api_key, and local servers simply ignore the Authorization header. Fixes issue reported by @ThatWolfieGuy — llama.cpp stopped working after updating because the new runtime provider resolver enforces non-empty API keys even for keyless local endpoints.
|
lol just what i was looking for |
teknium1
added a commit
that referenced
this pull request
Mar 30, 2026
…auth Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't require API keys, but the auxiliary client's _resolve_custom_runtime() rejected endpoints with empty keys — causing the auto-detection chain to skip the user's local server entirely. This broke compression, summarization, and memory flush for users running local models without an OpenRouter/cloud API key. The main CLI already had this fix (PR #2556, 'no-key-required' placeholder), but the auxiliary client's resolution path was missed. Two fixes: - _resolve_custom_runtime(): use 'no-key-required' placeholder instead of returning None when base_url is present but key is empty - resolve_provider_client() custom branch: same placeholder fallback for explicit_base_url without explicit_api_key Updates 2 tests that expected the old (broken) behavior.
teknium1
added a commit
that referenced
this pull request
Mar 30, 2026
…auth (#3842) Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't require API keys, but the auxiliary client's _resolve_custom_runtime() rejected endpoints with empty keys — causing the auto-detection chain to skip the user's local server entirely. This broke compression, summarization, and memory flush for users running local models without an OpenRouter/cloud API key. The main CLI already had this fix (PR #2556, 'no-key-required' placeholder), but the auxiliary client's resolution path was missed. Two fixes: - _resolve_custom_runtime(): use 'no-key-required' placeholder instead of returning None when base_url is present but key is empty - resolve_provider_client() custom branch: same placeholder fallback for explicit_base_url without explicit_api_key Updates 2 tests that expected the old (broken) behavior.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 27, 2026
…fdcb4c4a fix(cli): allow custom/local endpoints without API key
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 27, 2026
…auth (NousResearch#3842) Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't require API keys, but the auxiliary client's _resolve_custom_runtime() rejected endpoints with empty keys — causing the auto-detection chain to skip the user's local server entirely. This broke compression, summarization, and memory flush for users running local models without an OpenRouter/cloud API key. The main CLI already had this fix (PR NousResearch#2556, 'no-key-required' placeholder), but the auxiliary client's resolution path was missed. Two fixes: - _resolve_custom_runtime(): use 'no-key-required' placeholder instead of returning None when base_url is present but key is empty - resolve_provider_client() custom branch: same placeholder fallback for explicit_base_url without explicit_api_key Updates 2 tests that expected the old (broken) behavior.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…fdcb4c4a fix(cli): allow custom/local endpoints without API key
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…auth (NousResearch#3842) Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't require API keys, but the auxiliary client's _resolve_custom_runtime() rejected endpoints with empty keys — causing the auto-detection chain to skip the user's local server entirely. This broke compression, summarization, and memory flush for users running local models without an OpenRouter/cloud API key. The main CLI already had this fix (PR NousResearch#2556, 'no-key-required' placeholder), but the auxiliary client's resolution path was missed. Two fixes: - _resolve_custom_runtime(): use 'no-key-required' placeholder instead of returning None when base_url is present but key is empty - resolve_provider_client() custom branch: same placeholder fallback for explicit_base_url without explicit_api_key Updates 2 tests that expected the old (broken) behavior.
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
…fdcb4c4a fix(cli): allow custom/local endpoints without API key
olympus-terminal
pushed a commit
to olympus-terminal/hermes-agent
that referenced
this pull request
May 16, 2026
…auth (NousResearch#3842) Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't require API keys, but the auxiliary client's _resolve_custom_runtime() rejected endpoints with empty keys — causing the auto-detection chain to skip the user's local server entirely. This broke compression, summarization, and memory flush for users running local models without an OpenRouter/cloud API key. The main CLI already had this fix (PR NousResearch#2556, 'no-key-required' placeholder), but the auxiliary client's resolution path was missed. Two fixes: - _resolve_custom_runtime(): use 'no-key-required' placeholder instead of returning None when base_url is present but key is empty - resolve_provider_client() custom branch: same placeholder fallback for explicit_base_url without explicit_api_key Updates 2 tests that expected the old (broken) behavior.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…fdcb4c4a fix(cli): allow custom/local endpoints without API key
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…auth (NousResearch#3842) Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't require API keys, but the auxiliary client's _resolve_custom_runtime() rejected endpoints with empty keys — causing the auto-detection chain to skip the user's local server entirely. This broke compression, summarization, and memory flush for users running local models without an OpenRouter/cloud API key. The main CLI already had this fix (PR NousResearch#2556, 'no-key-required' placeholder), but the auxiliary client's resolution path was missed. Two fixes: - _resolve_custom_runtime(): use 'no-key-required' placeholder instead of returning None when base_url is present but key is empty - resolve_provider_client() custom branch: same placeholder fallback for explicit_base_url without explicit_api_key Updates 2 tests that expected the old (broken) behavior.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…fdcb4c4a fix(cli): allow custom/local endpoints without API key
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…auth (NousResearch#3842) Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't require API keys, but the auxiliary client's _resolve_custom_runtime() rejected endpoints with empty keys — causing the auto-detection chain to skip the user's local server entirely. This broke compression, summarization, and memory flush for users running local models without an OpenRouter/cloud API key. The main CLI already had this fix (PR NousResearch#2556, 'no-key-required' placeholder), but the auxiliary client's resolution path was missed. Two fixes: - _resolve_custom_runtime(): use 'no-key-required' placeholder instead of returning None when base_url is present but key is empty - resolve_provider_client() custom branch: same placeholder fallback for explicit_base_url without explicit_api_key Updates 2 tests that expected the old (broken) behavior.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Users running local LLM servers (llama.cpp, ollama, vLLM) get:
This broke after the runtime provider resolver was tightened to enforce non-empty API keys. Local servers don't need auth.
Fix
In
_ensure_runtime_credentials(): when a custom base_url IS configured but no API key was found, use a placeholder ("no-key-required") instead of failing. Local servers ignore the Authorization header.Only applies when the base URL is NOT an OpenRouter URL (those always need real keys).
Reported by @ThatWolfieGuy — llama.cpp with Qwen 3.5 stopped working after updating to v0.4.0.