fix(model_metadata): respect context_length from custom_providers config by fancydirty · Pull Request #7998 · NousResearch/hermes-agent

fancydirty · 2026-04-11T23:43:13Z

Problem

When a user sets a custom provider model as the auxiliary compression/summary model (e.g. `glm-5.1` via `zai-customize`), and explicitly declares its `context_length` inside `custom_providers[].models[]` in `config.yaml`, `get_model_context_length()` ignores that override.

Instead, it falls through to probing, cache lookup, or the default 128k fallback. This causes the compression feasibility check to falsely report that the summary model's context window is too small, disabling context compression entirely.

Fix

Teach `get_model_context_length()` to check `custom_providers[].models..context_length` immediately after explicit `config_context_length` args, before any network probing or hardcoded defaults.

Changes

`agent/model_metadata.py`: add config-driven lookup for per-model context_length overrides inside custom_providers.

Verification

`python -m pytest tests/agent/test_model_metadata.py -q` → 80 passed
Manual check: `get_model_context_length('glm-5.1')` now returns `200000` (from config) instead of `128000` (fallback).

The /new and /reset commands were not calling shutdown_memory_provider() on the cached agent before eviction. This caused OpenViking (and any memory provider that relies on session-end shutdown) to skip commit, leaving memories un-indexed until idle timeout or gateway shutdown. Add the missing shutdown_memory_provider() call in _handle_reset_command(), matching the behavior already present in the session expiry watcher. Fixes NousResearch#7759

When the active model supports vision natively (e.g. kimi-for-coding, gpt-4o, claude-3, gemini), pass image attachments as base64 image_url content parts directly instead of pre-describing them via vision enrichment. This saves an API call and improves image understanding. - Add AIAgent._check_native_vision_support() with multi-source resolution: 1. VISION_NATIVE_PASSTHROUGH env override 2. User config agent.vision_native_models (config.yaml override) 3. OpenRouter official /api/v1/models API (cached, authoritative for OR) 4. models.dev community registry (broad provider coverage) 5. Conservative hardcoded whitelist (emergency fallback) - Add AIAgent._fetch_openrouter_vision_support() with in-memory + disk cache - Add AIAgent._load_user_vision_native_models() to read config.yaml overrides - Add user_message_content param to run_conversation for multimodal input - Implement gateway passthrough logic with fallback to enrichment - Add comprehensive tests for config override, OpenRouter, models.dev, whitelist and gateway paths - Fix existing gateway test mocks to match updated signatures - Bump config schema version to 15

…it hallucination

…ishu albums

Resolved conflicts in: - gateway/run.py (native vision passthrough vs session/runtime resolution) - run_agent.py (image serialization in conversation messages) Kept native vision passthrough logic and integrated with latest session key/runtime resolution. All affected tests passing (470 passed).

- _load_config() now merges disk config (config.yaml) with runtime config (CLI_CONFIG), fixing stale delegation settings in long-running gateway processes. - Add tests for disk-only, runtime-override, and empty-runtime merge. - Fix STT test return value unpacking from merge.

get_model_context_length() now checks custom_providers[].models[] for a config-driven context_length override before falling back to probing / defaults. This fixes compression feasibility warnings for models like glm-5.1 whose context_length is declared in config.yaml but was previously ignored.

Copilot

Pull request overview

This PR aims to fix model metadata resolution so get_model_context_length() respects per-model context_length overrides from custom_providers config, but it also introduces broader changes around delegation config loading and native vision passthrough support in the agent + gateway.

Changes:

Add config-driven lookup for custom_providers[].models[model].context_length in get_model_context_length().
Merge delegation config from disk + runtime for delegate_task behavior.
Add native vision passthrough plumbing (vision support detection, gateway message_content propagation) and related tests/config defaults.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
`agent/model_metadata.py`	Adds config-driven per-model context length override lookup via `custom_providers[].models`.
`tools/delegate_tool.py`	Changes delegation config loading to merge disk + runtime configs.
`tests/tools/test_delegate.py`	Adds tests for `_load_config()` merge behavior.
`run_agent.py`	Adds native vision capability detection (OpenRouter/models.dev/whitelist), caching, and `user_message_content` passthrough.
`hermes_cli/config.py`	Adds `agent.vision_native_models` to default config and bumps `_config_version`.
`gateway/run.py`	Propagates `message_content` for native vision and updates inbound message preparation to return `(text, content_parts)`.
`gateway/platforms/telegram.py`	Adjusts Telegram media group wait defaults (env-configurable, longer default).
`gateway/platforms/feishu.py`	Increases default media batch delay.
`tests/gateway/test_telegram_documents.py`	Speeds up Telegram adapter waits for tests.
`tests/gateway/test_stt_config.py`	Updates for new `_prepare_inbound_message_text()` tuple return.
`tests/gateway/test_session_model_override_routing.py`	Updates test agent stub signature to accept `user_message_content`.
`tests/gateway/test_run_progress_topics.py`	Updates test agent stub signatures to accept `user_message_content`.
`tests/gateway/test_reasoning_command.py`	Updates test agent stub signature to accept `user_message_content`.
`tests/gateway/test_fast_command.py`	Updates test agent stub signature to accept `user_message_content`.
`tests/run_agent/test_native_vision_support.py`	Adds unit tests for vision support detection and message content passthrough/persistence behavior.
`tests/gateway/test_native_vision_gateway.py`	Adds gateway-level tests for native vision passthrough vs enrichment fallback.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-11T23:49:12Z

+        _cfg = load_config()
+        _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
+        for cp in _custom_providers:
+            if not isinstance(cp, dict):
+                continue
+            _models = cp.get("models", {})
+            if isinstance(_models, dict):
+                _ctx = _models.get(model, {}).get("context_length")
+                if isinstance(_ctx, int) and _ctx > 0:
+                    return _ctx


The custom_providers per-model context_length override is applied without checking which custom provider is active. Because this ignores the provider and base_url arguments, a matching model name in any custom_providers entry can override context length even for unrelated providers/endpoints (and ordering would decide which wins). Consider scoping the lookup to the relevant custom_providers entry (match on provider name/slug and/or base_url) before returning the override.

Suggested change

_cfg = load_config()

_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []

for cp in _custom_providers:

if not isinstance(cp, dict):

continue

_models = cp.get("models", {})

if isinstance(_models, dict):

_ctx = _models.get(model, {}).get("context_length")

if isinstance(_ctx, int) and _ctx > 0:

return _ctx

def _normalize_provider_value(value: Any) -> Optional[str]:

if not isinstance(value, str):

return None

value = value.strip().lower()

return value or None

def _normalize_base_url_value(value: Any) -> Optional[str]:

if not isinstance(value, str):

return None

value = value.strip()

if not value:

return None

parsed = urlparse(value)

scheme = parsed.scheme.lower()

netloc = parsed.netloc.lower()

path = parsed.path.rstrip("/")

if scheme or netloc:

return f"{scheme}://{netloc}{path}"

return value.rstrip("/").lower()

_cfg = load_config()

_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []

_active_provider = _normalize_provider_value(provider)

_active_base_url = _normalize_base_url_value(base_url)

_valid_custom_provider_count = sum(1 for cp in _custom_providers if isinstance(cp, dict))

for cp in _custom_providers:

if not isinstance(cp, dict):

continue

_provider_keys = ("provider", "slug", "name", "id")

_base_url_keys = ("base_url", "api_base", "endpoint", "url")

_cp_provider_values = {

v for v in (_normalize_provider_value(cp.get(key)) for key in _provider_keys) if v

}

_cp_base_urls = {

v for v in (_normalize_base_url_value(cp.get(key)) for key in _base_url_keys) if v

}

_has_scope_metadata = bool(_cp_provider_values or _cp_base_urls)

if _active_provider or _active_base_url:

_provider_matches = bool(

_active_provider and _cp_provider_values and _active_provider in _cp_provider_values

)

_base_url_matches = bool(

_active_base_url and _cp_base_urls and _active_base_url in _cp_base_urls

)

if _has_scope_metadata:

_matched_available_scope = False

if _active_provider and _cp_provider_values:

_matched_available_scope = _matched_available_scope or _provider_matches

if _active_base_url and _cp_base_urls:

_matched_available_scope = _matched_available_scope or _base_url_matches

if not _matched_available_scope:

continue

elif _valid_custom_provider_count != 1:

# Unscoped entries are ambiguous when multiple custom providers exist.

continue

_models = cp.get("models", {})

if isinstance(_models, dict):

_model_entry = _models.get(model, {})

if isinstance(_model_entry, dict):

_ctx = _model_entry.get("context_length")

if isinstance(_ctx, int) and _ctx > 0:

return _ctx

Copilot · 2026-04-11T23:49:13Z

+    # 0a. Config-driven overrides from custom_providers[].models[].context_length
+    try:
+        from hermes_cli.config import load_config
+        _cfg = load_config()
+        _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
+        for cp in _custom_providers:
+            if not isinstance(cp, dict):
+                continue
+            _models = cp.get("models", {})
+            if isinstance(_models, dict):
+                _ctx = _models.get(model, {}).get("context_length")
+                if isinstance(_ctx, int) and _ctx > 0:
+                    return _ctx


This new config-driven custom_providers[].models[].context_length path isn’t covered by tests in tests/agent/test_model_metadata.py. Adding a unit test that sets load_config() to return a custom_providers entry with a per-model context_length (and verifies correct scoping to the selected provider/base_url) would prevent regressions.

Suggested change

# 0a. Config-driven overrides from custom_providers[].models[].context_length

try:

from hermes_cli.config import load_config

_cfg = load_config()

_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []

for cp in _custom_providers:

if not isinstance(cp, dict):

continue

_models = cp.get("models", {})

if isinstance(_models, dict):

_ctx = _models.get(model, {}).get("context_length")

if isinstance(_ctx, int) and _ctx > 0:

return _ctx

# 0a. Config-driven overrides from custom_providers[].models[].context_length.

# Scope these overrides to the selected custom provider when base_url is

# known, so identical model IDs on different endpoints do not leak across

# providers. If no base_url is provided, preserve the historical behavior

# of allowing any matching custom provider model override.

try:

from hermes_cli.config import load_config

_cfg = load_config()

_custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []

_normalized_base_url = base_url.rstrip("/") if isinstance(base_url, str) and base_url else None

for cp in _custom_providers:

if not isinstance(cp, dict):

continue

if _normalized_base_url is not None:

_provider_url = cp.get("base_url") or cp.get("url") or cp.get("endpoint")

if not isinstance(_provider_url, str):

continue

if _provider_url.rstrip("/") != _normalized_base_url:

continue

_models = cp.get("models", {})

if isinstance(_models, dict):

_model_cfg = _models.get(model, {})

if isinstance(_model_cfg, dict):

_ctx = _model_cfg.get("context_length")

if isinstance(_ctx, int) and _ctx > 0:

return _ctx

Copilot · 2026-04-11T23:49:13Z

+    @staticmethod
+    def _fetch_openrouter_vision_support(model: str) -> Optional[bool]:
+        """Query OpenRouter /api/v1/models for vision support. Returns True/False or None on failure."""
+        global _OPENROUTER_MODELS_CACHE, _OPENROUTER_MODELS_CACHE_TIME


The PR title/description focus on model context_length overrides, but this PR also introduces native vision passthrough detection/caching logic (plus associated gateway + config changes). Please update the PR description to include these additional changes or split them into separate PRs so review/rollback scope is clear.

Copilot · 2026-04-11T23:49:13Z

+                    cfg = yaml.safe_load(f) or {}
+                models = cfg.get("agent", {}).get("vision_native_models", [])
+                if isinstance(models, list):
+                    return [m.lower() for m in models if isinstance(m, str)]


_load_user_vision_native_models() includes any string entries verbatim (lowercased). If the list contains an empty/whitespace string, the subsequent any(m in model_id ...) check will match every model and enable native vision passthrough globally. Filter out empty/whitespace-only entries (e.g., strip before including) to avoid this footgun.

Suggested change

return [m.lower() for m in models if isinstance(m, str)]

return [normalized for m in models if isinstance(m, str) and (normalized := m.strip().lower())]

Copilot · 2026-04-11T23:49:14Z

+                try:
+                    from run_agent import AIAgent
+                    _gw_cfg = _load_gateway_config()
+                    _model, _runtime = self._resolve_session_agent_runtime(
+                        source=source, user_config=_gw_cfg


This async function calls AIAgent._check_native_vision_support() inline to decide passthrough. That helper can do blocking I/O (OpenRouter /api/v1/models via requests, models.dev fetch, and config.yaml disk reads), which would block the event loop when images arrive. Consider ensuring this detection path is non-blocking (e.g., only use cached/local heuristics here, or run the blocking capability lookup in an executor and cache the result).

Copilot · 2026-04-11T23:49:14Z

+                    _parts = []
+                    if message_text:
+                        _parts.append({"type": "text", "text": message_text})
+                    for _img_path in image_paths:
+                        try:
+                            import base64
+                            from pathlib import Path
+                            _img_data = Path(_img_path).read_bytes()
+                            _b64 = base64.b64encode(_img_data).decode("ascii")
+                            _suffix = Path(_img_path).suffix.lower()
+                            _mime = {
+                                ".png": "image/png",
+                                ".gif": "image/gif",
+                                ".webp": "image/webp",
+                                ".bmp": "image/bmp",
+                            }.get(_suffix, "image/jpeg")
+                            _parts.append({
+                                "type": "image_url",
+                                "image_url": {"url": f"data:{_mime};base64,{_b64}"},
+                            })


When native passthrough is enabled, images are read and base64-encoded synchronously (Path.read_bytes + base64.b64encode) inside the async event loop. For large images or bursts this can noticeably block the loop. Consider offloading the file read/encode to a thread (asyncio.to_thread) or using async I/O, and enforcing/validating a max size to avoid excessive memory/latency.

Suggested change

_parts = []

if message_text:

_parts.append({"type": "text", "text": message_text})

for _img_path in image_paths:

try:

import base64

from pathlib import Path

_img_data = Path(_img_path).read_bytes()

_b64 = base64.b64encode(_img_data).decode("ascii")

_suffix = Path(_img_path).suffix.lower()

_mime = {

".png": "image/png",

".gif": "image/gif",

".webp": "image/webp",

".bmp": "image/bmp",

}.get(_suffix, "image/jpeg")

_parts.append({

"type": "image_url",

"image_url": {"url": f"data:{_mime};base64,{_b64}"},

})

_max_native_image_bytes = 10 * 1024 * 1024

async def _encode_image_for_native_vision(_img_path: str) -> Dict[str, Any]:

def _read_and_encode_image() -> Dict[str, Any]:

import base64

_path = Path(_img_path)

_size = _path.stat().st_size

if _size > _max_native_image_bytes:

raise ValueError(

f"image size {_size} exceeds native passthrough limit "

f"of {_max_native_image_bytes} bytes"

)

_img_data = _path.read_bytes()

_b64 = base64.b64encode(_img_data).decode("ascii")

_suffix = _path.suffix.lower()

_mime = {

".png": "image/png",

".gif": "image/gif",

".webp": "image/webp",

".bmp": "image/bmp",

}.get(_suffix, "image/jpeg")

return {

"type": "image_url",

"image_url": {"url": f"data:{_mime};base64,{_b64}"},

}

return await asyncio.to_thread(_read_and_encode_image)

_parts = []

if message_text:

_parts.append({"type": "text", "text": message_text})

for _img_path in image_paths:

try:

_parts.append(await _encode_image_for_native_vision(_img_path))

Copilot · 2026-04-11T23:49:14Z

+    # Merge: disk config is the base, runtime overrides only non-empty values.
+    # This fixes stale CLI_CONFIG at runtime when config.yaml is edited after
+    # the process starts (e.g. gateway sessions).
+    merged = dict(disk_cfg)
+    for k, v in runtime_cfg.items():
+        if v not in (None, ""):


The merge logic will still keep stale runtime values for non-string fields like max_iterations/default_toolsets: CLI_CONFIG always has non-empty defaults (e.g. max_iterations=45), so edits to delegation.max_iterations in config.yaml won’t take effect at runtime even though the comment says this fixes stale CLI_CONFIG. Consider preferring disk_cfg when available and only overlaying runtime_cfg for keys that are truly runtime-only/explicitly set (or treat empty containers / default values as non-overrides).

Suggested change

# Merge: disk config is the base, runtime overrides only non-empty values.

# This fixes stale CLI_CONFIG at runtime when config.yaml is edited after

# the process starts (e.g. gateway sessions).

merged = dict(disk_cfg)

for k, v in runtime_cfg.items():

if v not in (None, ""):

# Merge: prefer persistent config when present, and only use runtime

# values to fill keys missing from disk. This avoids stale CLI_CONFIG

# defaults (for example non-string fields like max_iterations or

# default_toolsets) masking config.yaml edits after the process starts.

merged = dict(disk_cfg)

for k, v in runtime_cfg.items():

if k not in merged and v not in (None, ""):

fancydirty · 2026-04-12T00:05:28Z

Superseded by #8005 (cleaner branch with no unrelated vision passthrough changes).

fancydirty added 7 commits April 11, 2026 22:26

fix(gateway): increase Telegram media group wait to prevent album spl…

ba15b5e

…it hallucination

fix(gateway): increase media batch delay defaults for Telegram and Fe…

20a109a

…ishu albums

Copilot AI review requested due to automatic review settings April 11, 2026 23:43

Copilot started reviewing on behalf of fancydirty April 11, 2026 23:43 View session

Copilot AI reviewed Apr 11, 2026

View reviewed changes

fancydirty closed this Apr 12, 2026

crazyn2 mentioned this pull request Apr 13, 2026

Feature Request: Allow manual capability declaration (vision/reasoning/tools) in custom_providers #8731

Open

10 tasks

alt-glitch mentioned this pull request May 12, 2026

fix(context): respect custom provider per-model context_length overrides #24460

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(model_metadata): respect context_length from custom_providers config#7998

fix(model_metadata): respect context_length from custom_providers config#7998
fancydirty wants to merge 7 commits into
NousResearch:mainfrom
fancydirty:fix/model-metadata-custom-context

fancydirty commented Apr 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

Copilot AI Apr 11, 2026

Uh oh!

fancydirty commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-        _cfg = load_config()
-        _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
-        for cp in _custom_providers:
-            if not isinstance(cp, dict):
-                continue
-            _models = cp.get("models", {})
-            if isinstance(_models, dict):
-                _ctx = _models.get(model, {}).get("context_length")
-                if isinstance(_ctx, int) and _ctx > 0:
-                    return _ctx
+        def _normalize_provider_value(value: Any) -> Optional[str]:
+            if not isinstance(value, str):
+                return None
+            value = value.strip().lower()
+            return value or None
+        def _normalize_base_url_value(value: Any) -> Optional[str]:
+            if not isinstance(value, str):
+                return None
+            value = value.strip()
+            if not value:
+                return None
+            parsed = urlparse(value)
+            scheme = parsed.scheme.lower()
+            netloc = parsed.netloc.lower()
+            path = parsed.path.rstrip("/")
+            if scheme or netloc:
+                return f"{scheme}://{netloc}{path}"
+            return value.rstrip("/").lower()
+        _cfg = load_config()
+        _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else []
+        _active_provider = _normalize_provider_value(provider)
+        _active_base_url = _normalize_base_url_value(base_url)
+        _valid_custom_provider_count = sum(1 for cp in _custom_providers if isinstance(cp, dict))
+        for cp in _custom_providers:
+            if not isinstance(cp, dict):
+                continue
+            _provider_keys = ("provider", "slug", "name", "id")
+            _base_url_keys = ("base_url", "api_base", "endpoint", "url")
+            _cp_provider_values = {
+                v for v in (_normalize_provider_value(cp.get(key)) for key in _provider_keys) if v
+            }
+            _cp_base_urls = {
+                v for v in (_normalize_base_url_value(cp.get(key)) for key in _base_url_keys) if v
+            }
+            _has_scope_metadata = bool(_cp_provider_values or _cp_base_urls)
+            if _active_provider or _active_base_url:
+                _provider_matches = bool(
+                    _active_provider and _cp_provider_values and _active_provider in _cp_provider_values
+                )
+                _base_url_matches = bool(
+                    _active_base_url and _cp_base_urls and _active_base_url in _cp_base_urls
+                )
+                if _has_scope_metadata:
+                    _matched_available_scope = False
+                    if _active_provider and _cp_provider_values:
+                        _matched_available_scope = _matched_available_scope or _provider_matches
+                    if _active_base_url and _cp_base_urls:
+                        _matched_available_scope = _matched_available_scope or _base_url_matches
+                    if not _matched_available_scope:
+                        continue
+                elif _valid_custom_provider_count != 1:
+                    # Unscoped entries are ambiguous when multiple custom providers exist.
+                    continue
+            _models = cp.get("models", {})
+            if isinstance(_models, dict):
+                _model_entry = _models.get(model, {})
+                if isinstance(_model_entry, dict):
+                    _ctx = _model_entry.get("context_length")
+                    if isinstance(_ctx, int) and _ctx > 0:
+                        return _ctx

	return [m.lower() for m in models if isinstance(m, str)]
	return [normalized for m in models if isinstance(m, str) and (normalized := m.strip().lower())]

Conversation

fancydirty commented Apr 11, 2026

Problem

Fix

Changes

Verification

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

fancydirty commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants