fix: forward auth + use LM Studio native endpoint for local ctx probes #13293

Merged: teknium1 merged 2 commits into main from hermes/hermes-7394be4f on Apr 21, 2026
@teknium1

Local model users get the actual runtime context length of their loaded LM Studio model (from loaded_instances[].config.context_length) instead of the model's theoretical max or a stale 128k fallback — fixes a common source of compression failures and context-overrun crashes on local setups.

Salvage of #3185 (@tannerfokkens-maker) onto current main. The original branch was 2472 commits behind and had merge conflicts; his commit is preserved via cherry-pick with authorship intact.

Changes

  • agent/model_metadata.py: new _auth_headers() helper; thread api_key through detect_local_server_type, _query_local_context_length, and query_ollama_num_ctx (the last one added here for symmetry with the Ollama auth-proxy case). New LM-Studio-native fast path in fetch_endpoint_model_metadata — gated on is_local_endpoint() AND detect_local_server_type() == 'lm-studio', so ollama/vLLM/llama.cpp fall through unchanged.
  • gateway/run.py: the @ context-reference ctx-length lookup now forwards the runtime base_url + api_key (previously only self._base_url, no api_key).
  • run_agent.py: the startup Ollama num_ctx detection now forwards self.api_key (small follow-up for parity with the LM Studio fix).
  • scripts/release.py: AUTHOR_MAP entry for @tannerfokkens-maker's local-hostname commit email.
  • tests/agent/test_model_metadata_local_ctx.py: new tests for auth forwarding + LM Studio native endpoint.
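The two ideas in the list above (send an auth header only when a key is configured, and read the runtime context length from `loaded_instances[].config.context_length`) can be sketched roughly as follows. This is a minimal illustration, not the code from `agent/model_metadata.py`: the function names `auth_headers` and `runtime_context_length`, the Bearer scheme, and the `max_context_length` field in the sample entry are all assumptions made for the example.

```python
from typing import Any, Optional


def auth_headers(api_key: Optional[str]) -> dict[str, str]:
    """Hypothetical helper: build request headers for a local-server probe.

    Assumes a Bearer scheme. Returns an empty dict when no key is
    configured, so unauthenticated setups send no auth header at all.
    """
    if api_key:
        return {"Authorization": f"Bearer {api_key}"}
    return {}


def runtime_context_length(model_entry: dict[str, Any]) -> Optional[int]:
    """Return the context length of the first loaded instance, if any.

    `model_entry` stands in for one model object from LM Studio's native
    models listing; only the loaded_instances[].config.context_length path
    comes from the PR description, the rest of the shape is assumed.
    """
    for instance in model_entry.get("loaded_instances") or []:
        ctx = (instance.get("config") or {}).get("context_length")
        if isinstance(ctx, int) and ctx > 0:
            return ctx
    return None  # nothing loaded: the caller falls back to its usual path


# Sample entry: model supports 131072 tokens but was loaded at 24576.
sample_entry = {
    "max_context_length": 131072,  # assumed field name for the theoretical max
    "loaded_instances": [{"config": {"context_length": 24576}}],
}
loaded_ctx = runtime_context_length(sample_entry)
```

Returning `None` when no instance is loaded keeps the decision about fallbacks with the caller, which matches the description's point that other servers fall through unchanged.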

Validation

| Check | Before | After |
| --- | --- | --- |
| Targeted tests (test_model_metadata_local_ctx + test_ollama_num_ctx) | | 30/30 pass |
| Metadata/ctx-length area (-k 'model_metadata or context_length or ollama') | | 108/108 pass |
| E2E: stub LM Studio w/ loaded_instance ctx=24576, max=131072 | returns 131072 | returns 24576 |
| E2E: vLLM stub (non-LM-Studio) | works | works (no regression) |
| E2E: no api_key configured | no auth header | no auth header (baseline preserved) |
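The E2E rows can be read as a small decision rule: take the loaded-instance value only when the endpoint is local and the server is detected as LM Studio, and otherwise leave the existing result alone. A hedged sketch of that gate follows; `probe_context_length`, its parameters, and the `max_context_length` field are hypothetical stand-ins, and falling back to the reported maximum is an assumption about the pre-existing path rather than something the PR states.

```python
from typing import Any, Optional


def probe_context_length(
    server_type: str,
    is_local: bool,
    model_entry: dict[str, Any],
) -> Optional[int]:
    """Hypothetical gate mirroring the E2E scenarios in the table.

    Prefers the runtime context length of a loaded LM Studio instance;
    every other case returns the model's reported maximum unchanged.
    """
    reported_max = model_entry.get("max_context_length")  # assumed field name
    if is_local and server_type == "lm-studio":
        for instance in model_entry.get("loaded_instances") or []:
            ctx = (instance.get("config") or {}).get("context_length")
            if isinstance(ctx, int) and ctx > 0:
                return ctx
    return reported_max


stub = {
    "max_context_length": 131072,
    "loaded_instances": [{"config": {"context_length": 24576}}],
}

lm_studio_ctx = probe_context_length("lm-studio", True, stub)  # runtime value
vllm_ctx = probe_context_length("vllm", True, stub)            # falls through
remote_ctx = probe_context_length("lm-studio", False, stub)    # gate not met
```

Keeping both conditions in the gate means a remote endpoint that happens to look like LM Studio is never probed through the native path.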

Closes #3185.

Tanner Fokkens and others added 2 commits April 20, 2026 20:49
Pass the user's configured api_key through local-server detection and
context-length probes (detect_local_server_type, _query_local_context_length,
query_ollama_num_ctx) and use LM Studio's native /api/v1/models endpoint in
fetch_endpoint_model_metadata when a loaded instance is present — so the
probed context length is the actual runtime value the user loaded the model
at, not just the model's theoretical max.

Helps local-LLM users whose auto-detected context length was wrong, causing
compression failures and context-overrun crashes.

Follow-up for salvaged PR #3185:
- run_agent.py: pass self.api_key to query_ollama_num_ctx() so Ollama
  behind an auth proxy (same issue class as the LM Studio fix) can be
  probed successfully.
- scripts/release.py AUTHOR_MAP: map @tannerfokkens-maker's local-hostname
  commit email.
teknium1 merged commit e00d963 into main on Apr 21, 2026
6 of 7 checks passed
teknium1 deleted the hermes/hermes-7394be4f branch on April 21, 2026 03:51