Skip to content

fix(model_metadata): respect context_length from custom_providers config#7998

Closed
fancydirty wants to merge 7 commits into
NousResearch:mainfrom
fancydirty:fix/model-metadata-custom-context
Closed

fix(model_metadata): respect context_length from custom_providers config#7998
fancydirty wants to merge 7 commits into
NousResearch:mainfrom
fancydirty:fix/model-metadata-custom-context

Conversation

@fancydirty

Copy link
Copy Markdown
Contributor

Problem

When a user sets a custom provider model as the auxiliary compression/summary model (e.g. `glm-5.1` via `zai-customize`), and explicitly declares its `context_length` inside `custom_providers[].models[]` in `config.yaml`, `get_model_context_length()` ignores that override.

Instead, it falls through to probing, cache lookup, or the default 128k fallback. This causes the compression feasibility check to falsely report that the summary model's context window is too small, disabling context compression entirely.

Fix

Teach `get_model_context_length()` to check `custom_providers[].models..context_length` immediately after explicit `config_context_length` args, before any network probing or hardcoded defaults.

Changes

  • `agent/model_metadata.py`: add config-driven lookup for per-model context_length overrides inside custom_providers.

Verification

  • `python -m pytest tests/agent/test_model_metadata.py -q` → 80 passed
  • Manual check: `get_model_context_length('glm-5.1')` now returns `200000` (from config) instead of `128000` (fallback).

The /new and /reset commands were not calling shutdown_memory_provider()
on the cached agent before eviction. This caused OpenViking (and any
memory provider that relies on session-end shutdown) to skip commit,
leaving memories un-indexed until idle timeout or gateway shutdown.

Add the missing shutdown_memory_provider() call in _handle_reset_command(),
matching the behavior already present in the session expiry watcher.

Fixes NousResearch#7759
When the active model supports vision natively (e.g. kimi-for-coding,
gpt-4o, claude-3, gemini), pass image attachments as base64 image_url
content parts directly instead of pre-describing them via vision
enrichment. This saves an API call and improves image understanding.

- Add AIAgent._check_native_vision_support() with multi-source resolution:
  1. VISION_NATIVE_PASSTHROUGH env override
  2. User config agent.vision_native_models (config.yaml override)
  3. OpenRouter official /api/v1/models API (cached, authoritative for OR)
  4. models.dev community registry (broad provider coverage)
  5. Conservative hardcoded whitelist (emergency fallback)
- Add AIAgent._fetch_openrouter_vision_support() with in-memory + disk cache
- Add AIAgent._load_user_vision_native_models() to read config.yaml overrides
- Add user_message_content param to run_conversation for multimodal input
- Implement gateway passthrough logic with fallback to enrichment
- Add comprehensive tests for config override, OpenRouter, models.dev,
  whitelist and gateway paths
- Fix existing gateway test mocks to match updated signatures
- Bump config schema version to 15
Resolved conflicts in:
- gateway/run.py (native vision passthrough vs session/runtime resolution)
- run_agent.py (image serialization in conversation messages)

Kept native vision passthrough logic and integrated with latest session
key/runtime resolution. All affected tests passing (470 passed).
- _load_config() now merges disk config (config.yaml) with runtime
  config (CLI_CONFIG), fixing stale delegation settings in long-running
  gateway processes.
- Add tests for disk-only, runtime-override, and empty-runtime merge.
- Fix STT test return value unpacking from merge.
get_model_context_length() now checks custom_providers[].models[]
for a config-driven context_length override before falling back to
probing / defaults. This fixes compression feasibility warnings for
models like glm-5.1 whose context_length is declared in config.yaml
but was previously ignored.
Copilot AI review requested due to automatic review settings April 11, 2026 23:43

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix model metadata resolution so get_model_context_length() respects per-model context_length overrides from custom_providers config, but it also introduces broader changes around delegation config loading and native vision passthrough support in the agent + gateway.

Changes:

  • Add config-driven lookup for custom_providers[].models[model].context_length in get_model_context_length().
  • Merge delegation config from disk + runtime for delegate_task behavior.
  • Add native vision passthrough plumbing (vision support detection, gateway message_content propagation) and related tests/config defaults.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
agent/model_metadata.py Adds config-driven per-model context length override lookup via custom_providers[].models.
tools/delegate_tool.py Changes delegation config loading to merge disk + runtime configs.
tests/tools/test_delegate.py Adds tests for _load_config() merge behavior.
run_agent.py Adds native vision capability detection (OpenRouter/models.dev/whitelist), caching, and user_message_content passthrough.
hermes_cli/config.py Adds agent.vision_native_models to default config and bumps _config_version.
gateway/run.py Propagates message_content for native vision and updates inbound message preparation to return (text, content_parts).
gateway/platforms/telegram.py Adjusts Telegram media group wait defaults (env-configurable, longer default).
gateway/platforms/feishu.py Increases default media batch delay.
tests/gateway/test_telegram_documents.py Speeds up Telegram adapter waits for tests.
tests/gateway/test_stt_config.py Updates for new _prepare_inbound_message_text() tuple return.
tests/gateway/test_session_model_override_routing.py Updates test agent stub signature to accept user_message_content.
tests/gateway/test_run_progress_topics.py Updates test agent stub signatures to accept user_message_content.
tests/gateway/test_reasoning_command.py Updates test agent stub signature to accept user_message_content.
tests/gateway/test_fast_command.py Updates test agent stub signature to accept user_message_content.
tests/run_agent/test_native_vision_support.py Adds unit tests for vision support detection and message content passthrough/persistence behavior.
tests/gateway/test_native_vision_gateway.py Adds gateway-level tests for native vision passthrough vs enrichment fallback.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread agent/model_metadata.py
Comment on lines +940 to +949
_cfg = load_config()
_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
for cp in _custom_providers:
if not isinstance(cp, dict):
continue
_models = cp.get("models", {})
if isinstance(_models, dict):
_ctx = _models.get(model, {}).get("context_length")
if isinstance(_ctx, int) and _ctx > 0:
return _ctx

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The custom_providers per-model context_length override is applied without checking which custom provider is active. Because this ignores the provider and base_url arguments, a matching model name in any custom_providers entry can override context length even for unrelated providers/endpoints (and ordering would decide which wins). Consider scoping the lookup to the relevant custom_providers entry (match on provider name/slug and/or base_url) before returning the override.

Suggested change
_cfg = load_config()
_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
for cp in _custom_providers:
if not isinstance(cp, dict):
continue
_models = cp.get("models", {})
if isinstance(_models, dict):
_ctx = _models.get(model, {}).get("context_length")
if isinstance(_ctx, int) and _ctx > 0:
return _ctx
def _normalize_provider_value(value: Any) -> Optional[str]:
if not isinstance(value, str):
return None
value = value.strip().lower()
return value or None
def _normalize_base_url_value(value: Any) -> Optional[str]:
if not isinstance(value, str):
return None
value = value.strip()
if not value:
return None
parsed = urlparse(value)
scheme = parsed.scheme.lower()
netloc = parsed.netloc.lower()
path = parsed.path.rstrip("/")
if scheme or netloc:
return f"{scheme}://{netloc}{path}"
return value.rstrip("/").lower()
_cfg = load_config()
_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
_active_provider = _normalize_provider_value(provider)
_active_base_url = _normalize_base_url_value(base_url)
_valid_custom_provider_count = sum(1 for cp in _custom_providers if isinstance(cp, dict))
for cp in _custom_providers:
if not isinstance(cp, dict):
continue
_provider_keys = ("provider", "slug", "name", "id")
_base_url_keys = ("base_url", "api_base", "endpoint", "url")
_cp_provider_values = {
v for v in (_normalize_provider_value(cp.get(key)) for key in _provider_keys) if v
}
_cp_base_urls = {
v for v in (_normalize_base_url_value(cp.get(key)) for key in _base_url_keys) if v
}
_has_scope_metadata = bool(_cp_provider_values or _cp_base_urls)
if _active_provider or _active_base_url:
_provider_matches = bool(
_active_provider and _cp_provider_values and _active_provider in _cp_provider_values
)
_base_url_matches = bool(
_active_base_url and _cp_base_urls and _active_base_url in _cp_base_urls
)
if _has_scope_metadata:
_matched_available_scope = False
if _active_provider and _cp_provider_values:
_matched_available_scope = _matched_available_scope or _provider_matches
if _active_base_url and _cp_base_urls:
_matched_available_scope = _matched_available_scope or _base_url_matches
if not _matched_available_scope:
continue
elif _valid_custom_provider_count != 1:
# Unscoped entries are ambiguous when multiple custom providers exist.
continue
_models = cp.get("models", {})
if isinstance(_models, dict):
_model_entry = _models.get(model, {})
if isinstance(_model_entry, dict):
_ctx = _model_entry.get("context_length")
if isinstance(_ctx, int) and _ctx > 0:
return _ctx

Copilot uses AI. Check for mistakes.
Comment thread agent/model_metadata.py
Comment on lines +937 to +949
# 0a. Config-driven overrides from custom_providers[].models[].context_length
try:
from hermes_cli.config import load_config
_cfg = load_config()
_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
for cp in _custom_providers:
if not isinstance(cp, dict):
continue
_models = cp.get("models", {})
if isinstance(_models, dict):
_ctx = _models.get(model, {}).get("context_length")
if isinstance(_ctx, int) and _ctx > 0:
return _ctx

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new config-driven custom_providers[].models[].context_length path isn’t covered by tests in tests/agent/test_model_metadata.py. Adding a unit test that sets load_config() to return a custom_providers entry with a per-model context_length (and verifies correct scoping to the selected provider/base_url) would prevent regressions.

Suggested change
# 0a. Config-driven overrides from custom_providers[].models[].context_length
try:
from hermes_cli.config import load_config
_cfg = load_config()
_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
for cp in _custom_providers:
if not isinstance(cp, dict):
continue
_models = cp.get("models", {})
if isinstance(_models, dict):
_ctx = _models.get(model, {}).get("context_length")
if isinstance(_ctx, int) and _ctx > 0:
return _ctx
# 0a. Config-driven overrides from custom_providers[].models[].context_length.
# Scope these overrides to the selected custom provider when base_url is
# known, so identical model IDs on different endpoints do not leak across
# providers. If no base_url is provided, preserve the historical behavior
# of allowing any matching custom provider model override.
try:
from hermes_cli.config import load_config
_cfg = load_config()
_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
_normalized_base_url = base_url.rstrip("/") if isinstance(base_url, str) and base_url else None
for cp in _custom_providers:
if not isinstance(cp, dict):
continue
if _normalized_base_url is not None:
_provider_url = cp.get("base_url") or cp.get("url") or cp.get("endpoint")
if not isinstance(_provider_url, str):
continue
if _provider_url.rstrip("/") != _normalized_base_url:
continue
_models = cp.get("models", {})
if isinstance(_models, dict):
_model_cfg = _models.get(model, {})
if isinstance(_model_cfg, dict):
_ctx = _model_cfg.get("context_length")
if isinstance(_ctx, int) and _ctx > 0:
return _ctx

Copilot uses AI. Check for mistakes.
Comment thread run_agent.py
Comment on lines +5651 to +5654
@staticmethod
def _fetch_openrouter_vision_support(model: str) -> Optional[bool]:
"""Query OpenRouter /api/v1/models for vision support. Returns True/False or None on failure."""
global _OPENROUTER_MODELS_CACHE, _OPENROUTER_MODELS_CACHE_TIME

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title/description focus on model context_length overrides, but this PR also introduces native vision passthrough detection/caching logic (plus associated gateway + config changes). Please update the PR description to include these additional changes or split them into separate PRs so review/rollback scope is clear.

Copilot uses AI. Check for mistakes.
Comment thread run_agent.py
cfg = yaml.safe_load(f) or {}
models = cfg.get("agent", {}).get("vision_native_models", [])
if isinstance(models, list):
return [m.lower() for m in models if isinstance(m, str)]

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_load_user_vision_native_models() includes any string entries verbatim (lowercased). If the list contains an empty/whitespace string, the subsequent any(m in model_id ...) check will match every model and enable native vision passthrough globally. Filter out empty/whitespace-only entries (e.g., strip before including) to avoid this footgun.

Suggested change
return [m.lower() for m in models if isinstance(m, str)]
return [normalized for m in models if isinstance(m, str) and (normalized := m.strip().lower())]

Copilot uses AI. Check for mistakes.
Comment thread gateway/run.py
Comment on lines +2829 to +2833
try:
from run_agent import AIAgent
_gw_cfg = _load_gateway_config()
_model, _runtime = self._resolve_session_agent_runtime(
source=source, user_config=_gw_cfg

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This async function calls AIAgent._check_native_vision_support() inline to decide passthrough. That helper can do blocking I/O (OpenRouter /api/v1/models via requests, models.dev fetch, and config.yaml disk reads), which would block the event loop when images arrive. Consider ensuring this detection path is non-blocking (e.g., only use cached/local heuristics here, or run the blocking capability lookup in an executor and cache the result).

Copilot uses AI. Check for mistakes.
Comment thread gateway/run.py
Comment on lines +2845 to +2864
_parts = []
if message_text:
_parts.append({"type": "text", "text": message_text})
for _img_path in image_paths:
try:
import base64
from pathlib import Path
_img_data = Path(_img_path).read_bytes()
_b64 = base64.b64encode(_img_data).decode("ascii")
_suffix = Path(_img_path).suffix.lower()
_mime = {
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
".bmp": "image/bmp",
}.get(_suffix, "image/jpeg")
_parts.append({
"type": "image_url",
"image_url": {"url": f"data:{_mime};base64,{_b64}"},
})

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When native passthrough is enabled, images are read and base64-encoded synchronously (Path.read_bytes + base64.b64encode) inside the async event loop. For large images or bursts this can noticeably block the loop. Consider offloading the file read/encode to a thread (asyncio.to_thread) or using async I/O, and enforcing/validating a max size to avoid excessive memory/latency.

Suggested change
_parts = []
if message_text:
_parts.append({"type": "text", "text": message_text})
for _img_path in image_paths:
try:
import base64
from pathlib import Path
_img_data = Path(_img_path).read_bytes()
_b64 = base64.b64encode(_img_data).decode("ascii")
_suffix = Path(_img_path).suffix.lower()
_mime = {
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
".bmp": "image/bmp",
}.get(_suffix, "image/jpeg")
_parts.append({
"type": "image_url",
"image_url": {"url": f"data:{_mime};base64,{_b64}"},
})
_max_native_image_bytes = 10 * 1024 * 1024
async def _encode_image_for_native_vision(_img_path: str) -> Dict[str, Any]:
def _read_and_encode_image() -> Dict[str, Any]:
import base64
_path = Path(_img_path)
_size = _path.stat().st_size
if _size > _max_native_image_bytes:
raise ValueError(
f"image size {_size} exceeds native passthrough limit "
f"of {_max_native_image_bytes} bytes"
)
_img_data = _path.read_bytes()
_b64 = base64.b64encode(_img_data).decode("ascii")
_suffix = _path.suffix.lower()
_mime = {
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
".bmp": "image/bmp",
}.get(_suffix, "image/jpeg")
return {
"type": "image_url",
"image_url": {"url": f"data:{_mime};base64,{_b64}"},
}
return await asyncio.to_thread(_read_and_encode_image)
_parts = []
if message_text:
_parts.append({"type": "text", "text": message_text})
for _img_path in image_paths:
try:
_parts.append(await _encode_image_for_native_vision(_img_path))

Copilot uses AI. Check for mistakes.
Comment thread tools/delegate_tool.py
Comment on lines +945 to +950
# Merge: disk config is the base, runtime overrides only non-empty values.
# This fixes stale CLI_CONFIG at runtime when config.yaml is edited after
# the process starts (e.g. gateway sessions).
merged = dict(disk_cfg)
for k, v in runtime_cfg.items():
if v not in (None, ""):

Copilot AI Apr 11, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The merge logic will still keep stale runtime values for non-string fields like max_iterations/default_toolsets: CLI_CONFIG always has non-empty defaults (e.g. max_iterations=45), so edits to delegation.max_iterations in config.yaml won’t take effect at runtime even though the comment says this fixes stale CLI_CONFIG. Consider preferring disk_cfg when available and only overlaying runtime_cfg for keys that are truly runtime-only/explicitly set (or treat empty containers / default values as non-overrides).

Suggested change
# Merge: disk config is the base, runtime overrides only non-empty values.
# This fixes stale CLI_CONFIG at runtime when config.yaml is edited after
# the process starts (e.g. gateway sessions).
merged = dict(disk_cfg)
for k, v in runtime_cfg.items():
if v not in (None, ""):
# Merge: prefer persistent config when present, and only use runtime
# values to fill keys missing from disk. This avoids stale CLI_CONFIG
# defaults (for example non-string fields like max_iterations or
# default_toolsets) masking config.yaml edits after the process starts.
merged = dict(disk_cfg)
for k, v in runtime_cfg.items():
if k not in merged and v not in (None, ""):

Copilot uses AI. Check for mistakes.
@fancydirty

Copy link
Copy Markdown
Contributor Author

Superseded by #8005 (cleaner branch with no unrelated vision passthrough changes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants