
feat(agent): add LM Studio as first-class provider (salvage #17061) #17102

Merged
kshitijk4poor merged 6 commits into main from salvage/lmstudio-17061
Apr 28, 2026

@kshitijk4poor kshitijk4poor commented Apr 28, 2026

Collaborator

Summary

Salvage of #17061 by @rugvedS07 — promotes LM Studio from a "custom" provider alias to a first-class provider with JIT model loading, reasoning effort negotiation, and no-auth support.

Cherry-picked all 5 contributor commits onto current main with follow-up fixes from self-review.

What this PR does

From the contributor (#17061)

  • First-class provider registration: PROVIDER_REGISTRY, HERMES_OVERLAYS, aliases (lmstudio, lm-studio, lm_studio) now resolve to "lmstudio" instead of "custom"
  • Live model discovery via LM Studio's native /api/v1/models endpoint (filters embedding models)
  • JIT model loading via POST /api/v1/models/load — automatically loads the selected model with at least 64K context if not already loaded with sufficient context, capped at max_context_length
  • Reasoning effort negotiation — reads per-model capabilities.reasoning.allowed_options from LM Studio and clamps the user's reasoning config to what the model supports. Shared resolve_lmstudio_effort() keeps transport and summary paths in sync
  • No-auth mode with LMSTUDIO_NOAUTH_PLACEHOLDER for users running LM Studio without auth enabled
  • /model autocomplete with cached LM Studio model list (30s TTL, gated on LM Studio config)
  • hermes status shows LM Studio reachability and model count
  • Model validation via native API: distinguishes auth failure (401) from unreachable server from empty model list
  • Context length handling: excludes LM Studio from persistent context cache (loaded context is transient — user can reload with different context_length anytime)
  • Docs update for provider page
  • 450+ tests covering reasoning effort clamping, provider resolution, credential resolution, model validation, picker probing
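
The reasoning effort negotiation above can be illustrated with a minimal sketch. This is not the PR's resolve_lmstudio_effort(); the function name clamp_effort, the EFFORT_ORDER ranking, and the nearest-neighbour tie-breaking are all assumptions made for illustration. It only shows the general idea of clamping a requested effort to the options a model advertises in capabilities.reasoning.allowed_options.

```python
from typing import Optional, Sequence

# Hypothetical ranking of effort levels, weakest to strongest (an assumption,
# not taken from the PR or from LM Studio's documentation).
EFFORT_ORDER = ["none", "minimal", "low", "medium", "high"]


def clamp_effort(requested: str, allowed: Sequence[str]) -> Optional[str]:
    """Return the allowed option closest to `requested`.

    Returns None when the model advertises no reasoning options at all.
    """
    if not allowed:
        return None
    if requested in allowed:
        return requested
    # Only options we know how to rank can be compared by distance.
    ranked = [opt for opt in allowed if opt in EFFORT_ORDER]
    if not ranked:
        # Unknown vocabulary (e.g. "off"/"on"): fall back to the first option.
        return allowed[0]
    if requested in EFFORT_ORDER:
        target = EFFORT_ORDER.index(requested)
    else:
        target = EFFORT_ORDER.index("medium")
    return min(ranked, key=lambda opt: abs(EFFORT_ORDER.index(opt) - target))
```

Routing both the transport path and the summary path through one function like this is what keeps them from disagreeing about the effective effort.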

Follow-up fixes (from self-review)

| Finding | Fix |
| --- | --- |
| Dead code: _lmstudio_loaded_context set but never read in run_agent.py | Removed; context_compressor.update_model() is the actual consumer |
| N+1 HTTP probe: empty reasoning options not cached → per-turn HTTP call for non-reasoning models | Added 60s TTL cache for empty results; non-empty results cached permanently |
| Code duplication: URL-strip + auth-header + HTTP-call pattern repeated in 3 functions in models.py | Extracted _lmstudio_server_root(), _lmstudio_request_headers(), _lmstudio_fetch_raw_models() shared helpers |
| Pre-existing test flake: TUI test_make_agent_passes_resolved_provider | Relaxed target_model assertion to avoid module-cache ordering issue |
| Missing AUTHOR_MAP: rugved@lmstudio.ai | Added → rugvedS07 |
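
The caching fix for the N+1 probe can be sketched roughly as follows. The class name ReasoningOptionsCache, the injected fetch callable, and the injectable clock are hypothetical; only the policy (empty results expire after 60 seconds, non-empty results never expire) comes from the description above.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

EMPTY_TTL_SECONDS = 60.0  # TTL for empty results, as stated in the PR


class ReasoningOptionsCache:
    """Caches per-model reasoning options (hypothetical helper).

    Non-empty results are kept for the life of the process; empty results
    are re-probed once their TTL has elapsed.
    """

    def __init__(
        self,
        fetch: Callable[[str], List[str]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        # model id -> (options, expiry time or None for "never expires")
        self._entries: Dict[str, Tuple[List[str], Optional[float]]] = {}

    def get(self, model_id: str) -> List[str]:
        entry = self._entries.get(model_id)
        if entry is not None:
            options, expires_at = entry
            if expires_at is None or self._clock() < expires_at:
                return options
        # Cache miss or expired empty entry: probe the server again.
        options = self._fetch(model_id)
        expires_at = None if options else self._clock() + EMPTY_TTL_SECONDS
        self._entries[model_id] = (options, expires_at)
        return options
```

Giving empty results a short TTL rather than caching them forever lets a model that is later reloaded with reasoning support be picked up without a restart, while still avoiding one HTTP call per turn.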

Scoped out from the original PR

Config version bump 22→23 removed. The original PR added a migration that wrote LM_API_KEY=dummy-lm-api-key to .env when provider: lmstudio was set in config.yaml. This is unnecessary — resolve_api_key_provider_credentials() already substitutes LMSTUDIO_NOAUTH_PLACEHOLDER at runtime when no key is found. The migration would write a fake credential to .env (which is for secrets only), confuse users who inspect it, and create a maintenance asymmetry if the placeholder value ever changes. The runtime fallback covers all code paths (CLI, gateway, TUI, cron, auxiliary_client).
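
A minimal sketch of the runtime fallback described above, assuming a simplified standalone function rather than the real resolve_api_key_provider_credentials(). The function name resolve_lmstudio_api_key and the placeholder string are hypothetical; the PR only names the LMSTUDIO_NOAUTH_PLACEHOLDER constant and the LM_API_KEY variable.

```python
import os
from typing import Mapping, Optional

# The actual placeholder value is not given in the PR; this string is an
# assumption used purely for illustration.
LMSTUDIO_NOAUTH_PLACEHOLDER = "lmstudio-noauth"


def resolve_lmstudio_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured LM Studio key, or the no-auth placeholder.

    Nothing is written to disk: the placeholder exists only in memory for
    the duration of the call, so .env keeps holding real secrets only.
    """
    env = os.environ if env is None else env
    key = (env.get("LM_API_KEY") or "").strip()
    return key or LMSTUDIO_NOAUTH_PLACEHOLDER
```

Because the substitution happens at resolution time, changing the placeholder later needs no migration of existing .env files.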

runtime_provider.py base URL precedence change reverted. The original PR changed the base URL resolution for all api_key providers (not just LM Studio) — flipping priority from config > env > default to env > config > default when both model.base_url in config AND the provider-specific env var (e.g. GLM_BASE_URL, LM_BASE_URL, KIMI_BASE_URL) are set. This affected ~30 providers (ZAI, MiniMax, Kimi, DeepSeek, HF, NVIDIA, etc.) for the sake of one LM Studio edge case. Reverted to preserve the established contract — LM Studio works correctly with the original logic since resolve_api_key_provider_credentials already incorporates LM_BASE_URL into creds["base_url"], and saved model.base_url from config properly overrides the registry default.
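
The preserved contract (config > env > default) can be written down as a small sketch. The function resolve_base_url and its parameters are hypothetical and much simpler than runtime_provider.py; the URLs in the usage lines are illustrative values only.

```python
from typing import Optional


def resolve_base_url(
    config_base_url: Optional[str],
    env_base_url: Optional[str],
    registry_default: str,
) -> str:
    """Pick a base URL with precedence: saved config > env var > default."""
    for candidate in (config_base_url, env_base_url):
        # Treat None and blank strings as "not set".
        if candidate and candidate.strip():
            return candidate.strip()
    return registry_default


# Illustrative values: the config entry wins when both are set,
# and the env var only applies when the config has no base_url.
default = "http://localhost:1234/v1"
both_set = resolve_base_url("http://box-a:1234/v1", "http://box-b:1234/v1", default)
env_only = resolve_base_url(None, "http://box-b:1234/v1", default)
```

The reverted change would have swapped the first two candidates in this loop for every api_key provider, not only LM Studio.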

API stability

All LM Studio endpoints used are officially documented v1 API (stable since LM Studio 0.4.0):

  • GET /api/v1/models — model listing with capabilities
  • POST /api/v1/models/load with context_length param
  • capabilities.reasoning.allowed_options — documented since v0.3.16
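
As a rough illustration of the JIT loading rule (at least 64K context, capped at max_context_length, skipped when the model is already loaded with enough context), the decision could look like the sketch below. The function plan_model_load is hypothetical, 64K is assumed to mean 65536 tokens, and the "model" key in the request body is an assumption; only the context_length parameter is named in this PR.

```python
from typing import Dict, Optional, Union

MIN_CONTEXT_TOKENS = 64 * 1024  # assumption: "64K" means 65536 tokens


def plan_model_load(
    model_id: str,
    max_context_length: int,
    loaded_context_length: Optional[int],
) -> Optional[Dict[str, Union[str, int]]]:
    """Return a JSON body for POST /api/v1/models/load, or None to skip.

    `loaded_context_length` is None when the model is not loaded at all.
    """
    # Ask for 64K, but never more than the model itself supports.
    target = min(MIN_CONTEXT_TOKENS, max_context_length)
    if loaded_context_length is not None and loaded_context_length >= target:
        return None  # already loaded with sufficient context
    # The "model" field name is an assumption about the request schema.
    return {"model": model_id, "context_length": target}
```

Keeping the decision separate from the HTTP call makes the three cases (not loaded, loaded too small, loaded large enough) easy to cover in unit tests without a running server.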

No competing agent (Continue, Cline, OpenCode, Aider) implements JIT model loading or reasoning negotiation — this is novel.

Test plan

  • 5351 passed, 0 new failures (29 failures all pre-existing on main)
  • 450 PR-specific tests pass
  • E2E: alias resolution, PROVIDER_REGISTRY, no-auth placeholder, env var override, HERMES_OVERLAYS, auto-detection skip, reasoning effort, model metadata cache bypass
  • E2E: runtime_provider base_url precedence unchanged for ZAI, LM Studio
  • Self-review: 3-agent parallel review → all criticals/warnings addressed
  • CI (rely on GitHub CI for full suite)

Based on PR #17061 by @rugvedS07.

Comment thread hermes_cli/models.py Dismissed
Comment thread hermes_cli/models.py Dismissed
Comment thread hermes_cli/models.py Fixed
@alt-glitch alt-glitch added the type/feature, P3, comp/agent, comp/cli, comp/tui, and area/config labels Apr 28, 2026
@kshitijk4poor kshitijk4poor force-pushed the salvage/lmstudio-17061 branch 2 times, most recently from bced623 to 36131c5 April 28, 2026 18:36
Comment thread hermes_cli/models.py Dismissed
rugvedS07 and others added 6 commits April 29, 2026 00:56
- Remove dead _lmstudio_loaded_context attribute from run_agent.py (set
  but never read — the loaded context is pushed to context_compressor.update_model
  which is the actual consumer)
- Cache empty reasoning options with 60s TTL to avoid per-turn HTTP probe
  for non-reasoning LM Studio models. Non-empty results cached permanently.
- Extract _lmstudio_server_root(), _lmstudio_request_headers(), and
  _lmstudio_fetch_raw_models() shared helpers in models.py — eliminates
  URL-strip + auth-header + HTTP-call duplication across probe_lmstudio_models,
  ensure_lmstudio_model_loaded, and lmstudio_model_reasoning_options
- Revert runtime_provider.py base_url precedence change: preserve the
  established contract (saved config.base_url > env var > default) for all
  api_key providers
- Remove unnecessary config version bump 22→23
- Fix TUI test: relax target_model assertion to avoid module-cache flake
- AUTHOR_MAP: added rugved@lmstudio.ai → rugvedS07
@kshitijk4poor kshitijk4poor force-pushed the salvage/lmstudio-17061 branch from 36131c5 to ca8f414 April 28, 2026 19:27
@kshitijk4poor kshitijk4poor merged commit 5d2f9b5 into main Apr 28, 2026
9 of 12 checks passed
@kshitijk4poor kshitijk4poor deleted the salvage/lmstudio-17061 branch April 28, 2026 19:27

Labels

  • area/config: Config system, migrations, profiles
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • comp/cli: CLI entry point, hermes_cli/, setup wizard
  • comp/tui: Terminal UI (ui-tui/ + tui_gateway/)
  • P3: Low (cosmetic, nice to have)
  • type/feature: New feature or request


4 participants