fix(model_metadata): respect context_length from custom_providers config#7998
fix(model_metadata): respect context_length from custom_providers config#7998fancydirty wants to merge 7 commits into
Conversation
The /new and /reset commands were not calling shutdown_memory_provider() on the cached agent before eviction. This caused OpenViking (and any memory provider that relies on session-end shutdown) to skip commit, leaving memories un-indexed until idle timeout or gateway shutdown. Add the missing shutdown_memory_provider() call in _handle_reset_command(), matching the behavior already present in the session expiry watcher. Fixes NousResearch#7759
When the active model supports vision natively (e.g. kimi-for-coding, gpt-4o, claude-3, gemini), pass image attachments as base64 image_url content parts directly instead of pre-describing them via vision enrichment. This saves an API call and improves image understanding. - Add AIAgent._check_native_vision_support() with multi-source resolution: 1. VISION_NATIVE_PASSTHROUGH env override 2. User config agent.vision_native_models (config.yaml override) 3. OpenRouter official /api/v1/models API (cached, authoritative for OR) 4. models.dev community registry (broad provider coverage) 5. Conservative hardcoded whitelist (emergency fallback) - Add AIAgent._fetch_openrouter_vision_support() with in-memory + disk cache - Add AIAgent._load_user_vision_native_models() to read config.yaml overrides - Add user_message_content param to run_conversation for multimodal input - Implement gateway passthrough logic with fallback to enrichment - Add comprehensive tests for config override, OpenRouter, models.dev, whitelist and gateway paths - Fix existing gateway test mocks to match updated signatures - Bump config schema version to 15
Resolved conflicts in: - gateway/run.py (native vision passthrough vs session/runtime resolution) - run_agent.py (image serialization in conversation messages) Kept native vision passthrough logic and integrated with latest session key/runtime resolution. All affected tests passing (470 passed).
- _load_config() now merges disk config (config.yaml) with runtime config (CLI_CONFIG), fixing stale delegation settings in long-running gateway processes. - Add tests for disk-only, runtime-override, and empty-runtime merge. - Fix STT test return value unpacking from merge.
get_model_context_length() now checks custom_providers[].models[] for a config-driven context_length override before falling back to probing / defaults. This fixes compression feasibility warnings for models like glm-5.1 whose context_length is declared in config.yaml but was previously ignored.
There was a problem hiding this comment.
Pull request overview
This PR aims to fix model metadata resolution so get_model_context_length() respects per-model context_length overrides from custom_providers config, but it also introduces broader changes around delegation config loading and native vision passthrough support in the agent + gateway.
Changes:
- Add config-driven lookup for
custom_providers[].models[model].context_lengthinget_model_context_length(). - Merge delegation config from disk + runtime for
delegate_taskbehavior. - Add native vision passthrough plumbing (vision support detection, gateway message_content propagation) and related tests/config defaults.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
agent/model_metadata.py |
Adds config-driven per-model context length override lookup via custom_providers[].models. |
tools/delegate_tool.py |
Changes delegation config loading to merge disk + runtime configs. |
tests/tools/test_delegate.py |
Adds tests for _load_config() merge behavior. |
run_agent.py |
Adds native vision capability detection (OpenRouter/models.dev/whitelist), caching, and user_message_content passthrough. |
hermes_cli/config.py |
Adds agent.vision_native_models to default config and bumps _config_version. |
gateway/run.py |
Propagates message_content for native vision and updates inbound message preparation to return (text, content_parts). |
gateway/platforms/telegram.py |
Adjusts Telegram media group wait defaults (env-configurable, longer default). |
gateway/platforms/feishu.py |
Increases default media batch delay. |
tests/gateway/test_telegram_documents.py |
Speeds up Telegram adapter waits for tests. |
tests/gateway/test_stt_config.py |
Updates for new _prepare_inbound_message_text() tuple return. |
tests/gateway/test_session_model_override_routing.py |
Updates test agent stub signature to accept user_message_content. |
tests/gateway/test_run_progress_topics.py |
Updates test agent stub signatures to accept user_message_content. |
tests/gateway/test_reasoning_command.py |
Updates test agent stub signature to accept user_message_content. |
tests/gateway/test_fast_command.py |
Updates test agent stub signature to accept user_message_content. |
tests/run_agent/test_native_vision_support.py |
Adds unit tests for vision support detection and message content passthrough/persistence behavior. |
tests/gateway/test_native_vision_gateway.py |
Adds gateway-level tests for native vision passthrough vs enrichment fallback. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| _cfg = load_config() | ||
| _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else [] | ||
| for cp in _custom_providers: | ||
| if not isinstance(cp, dict): | ||
| continue | ||
| _models = cp.get("models", {}) | ||
| if isinstance(_models, dict): | ||
| _ctx = _models.get(model, {}).get("context_length") | ||
| if isinstance(_ctx, int) and _ctx > 0: | ||
| return _ctx |
There was a problem hiding this comment.
The custom_providers per-model context_length override is applied without checking which custom provider is active. Because this ignores the provider and base_url arguments, a matching model name in any custom_providers entry can override context length even for unrelated providers/endpoints (and ordering would decide which wins). Consider scoping the lookup to the relevant custom_providers entry (match on provider name/slug and/or base_url) before returning the override.
| _cfg = load_config() | |
| _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else [] | |
| for cp in _custom_providers: | |
| if not isinstance(cp, dict): | |
| continue | |
| _models = cp.get("models", {}) | |
| if isinstance(_models, dict): | |
| _ctx = _models.get(model, {}).get("context_length") | |
| if isinstance(_ctx, int) and _ctx > 0: | |
| return _ctx | |
| def _normalize_provider_value(value: Any) -> Optional[str]: | |
| if not isinstance(value, str): | |
| return None | |
| value = value.strip().lower() | |
| return value or None | |
| def _normalize_base_url_value(value: Any) -> Optional[str]: | |
| if not isinstance(value, str): | |
| return None | |
| value = value.strip() | |
| if not value: | |
| return None | |
| parsed = urlparse(value) | |
| scheme = parsed.scheme.lower() | |
| netloc = parsed.netloc.lower() | |
| path = parsed.path.rstrip("/") | |
| if scheme or netloc: | |
| return f"{scheme}://{netloc}{path}" | |
| return value.rstrip("/").lower() | |
| _cfg = load_config() | |
| _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else [] | |
| _active_provider = _normalize_provider_value(provider) | |
| _active_base_url = _normalize_base_url_value(base_url) | |
| _valid_custom_provider_count = sum(1 for cp in _custom_providers if isinstance(cp, dict)) | |
| for cp in _custom_providers: | |
| if not isinstance(cp, dict): | |
| continue | |
| _provider_keys = ("provider", "slug", "name", "id") | |
| _base_url_keys = ("base_url", "api_base", "endpoint", "url") | |
| _cp_provider_values = { | |
| v for v in (_normalize_provider_value(cp.get(key)) for key in _provider_keys) if v | |
| } | |
| _cp_base_urls = { | |
| v for v in (_normalize_base_url_value(cp.get(key)) for key in _base_url_keys) if v | |
| } | |
| _has_scope_metadata = bool(_cp_provider_values or _cp_base_urls) | |
| if _active_provider or _active_base_url: | |
| _provider_matches = bool( | |
| _active_provider and _cp_provider_values and _active_provider in _cp_provider_values | |
| ) | |
| _base_url_matches = bool( | |
| _active_base_url and _cp_base_urls and _active_base_url in _cp_base_urls | |
| ) | |
| if _has_scope_metadata: | |
| _matched_available_scope = False | |
| if _active_provider and _cp_provider_values: | |
| _matched_available_scope = _matched_available_scope or _provider_matches | |
| if _active_base_url and _cp_base_urls: | |
| _matched_available_scope = _matched_available_scope or _base_url_matches | |
| if not _matched_available_scope: | |
| continue | |
| elif _valid_custom_provider_count != 1: | |
| # Unscoped entries are ambiguous when multiple custom providers exist. | |
| continue | |
| _models = cp.get("models", {}) | |
| if isinstance(_models, dict): | |
| _model_entry = _models.get(model, {}) | |
| if isinstance(_model_entry, dict): | |
| _ctx = _model_entry.get("context_length") | |
| if isinstance(_ctx, int) and _ctx > 0: | |
| return _ctx |
| # 0a. Config-driven overrides from custom_providers[].models[].context_length | ||
| try: | ||
| from hermes_cli.config import load_config | ||
| _cfg = load_config() | ||
| _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else [] | ||
| for cp in _custom_providers: | ||
| if not isinstance(cp, dict): | ||
| continue | ||
| _models = cp.get("models", {}) | ||
| if isinstance(_models, dict): | ||
| _ctx = _models.get(model, {}).get("context_length") | ||
| if isinstance(_ctx, int) and _ctx > 0: | ||
| return _ctx |
There was a problem hiding this comment.
This new config-driven custom_providers[].models[].context_length path isn’t covered by tests in tests/agent/test_model_metadata.py. Adding a unit test that sets load_config() to return a custom_providers entry with a per-model context_length (and verifies correct scoping to the selected provider/base_url) would prevent regressions.
| # 0a. Config-driven overrides from custom_providers[].models[].context_length | |
| try: | |
| from hermes_cli.config import load_config | |
| _cfg = load_config() | |
| _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else [] | |
| for cp in _custom_providers: | |
| if not isinstance(cp, dict): | |
| continue | |
| _models = cp.get("models", {}) | |
| if isinstance(_models, dict): | |
| _ctx = _models.get(model, {}).get("context_length") | |
| if isinstance(_ctx, int) and _ctx > 0: | |
| return _ctx | |
| # 0a. Config-driven overrides from custom_providers[].models[].context_length. | |
| # Scope these overrides to the selected custom provider when base_url is | |
| # known, so identical model IDs on different endpoints do not leak across | |
| # providers. If no base_url is provided, preserve the historical behavior | |
| # of allowing any matching custom provider model override. | |
| try: | |
| from hermes_cli.config import load_config | |
| _cfg = load_config() | |
| _custom_providers = _cfg.get("custom_providers", []) if isinstance(_cfg, dict) else [] | |
| _normalized_base_url = base_url.rstrip("/") if isinstance(base_url, str) and base_url else None | |
| for cp in _custom_providers: | |
| if not isinstance(cp, dict): | |
| continue | |
| if _normalized_base_url is not None: | |
| _provider_url = cp.get("base_url") or cp.get("url") or cp.get("endpoint") | |
| if not isinstance(_provider_url, str): | |
| continue | |
| if _provider_url.rstrip("/") != _normalized_base_url: | |
| continue | |
| _models = cp.get("models", {}) | |
| if isinstance(_models, dict): | |
| _model_cfg = _models.get(model, {}) | |
| if isinstance(_model_cfg, dict): | |
| _ctx = _model_cfg.get("context_length") | |
| if isinstance(_ctx, int) and _ctx > 0: | |
| return _ctx |
| @staticmethod | ||
| def _fetch_openrouter_vision_support(model: str) -> Optional[bool]: | ||
| """Query OpenRouter /api/v1/models for vision support. Returns True/False or None on failure.""" | ||
| global _OPENROUTER_MODELS_CACHE, _OPENROUTER_MODELS_CACHE_TIME |
There was a problem hiding this comment.
The PR title/description focus on model context_length overrides, but this PR also introduces native vision passthrough detection/caching logic (plus associated gateway + config changes). Please update the PR description to include these additional changes or split them into separate PRs so review/rollback scope is clear.
| cfg = yaml.safe_load(f) or {} | ||
| models = cfg.get("agent", {}).get("vision_native_models", []) | ||
| if isinstance(models, list): | ||
| return [m.lower() for m in models if isinstance(m, str)] |
There was a problem hiding this comment.
_load_user_vision_native_models() includes any string entries verbatim (lowercased). If the list contains an empty/whitespace string, the subsequent any(m in model_id ...) check will match every model and enable native vision passthrough globally. Filter out empty/whitespace-only entries (e.g., strip before including) to avoid this footgun.
| return [m.lower() for m in models if isinstance(m, str)] | |
| return [normalized for m in models if isinstance(m, str) and (normalized := m.strip().lower())] |
| try: | ||
| from run_agent import AIAgent | ||
| _gw_cfg = _load_gateway_config() | ||
| _model, _runtime = self._resolve_session_agent_runtime( | ||
| source=source, user_config=_gw_cfg |
There was a problem hiding this comment.
This async function calls AIAgent._check_native_vision_support() inline to decide passthrough. That helper can do blocking I/O (OpenRouter /api/v1/models via requests, models.dev fetch, and config.yaml disk reads), which would block the event loop when images arrive. Consider ensuring this detection path is non-blocking (e.g., only use cached/local heuristics here, or run the blocking capability lookup in an executor and cache the result).
| _parts = [] | ||
| if message_text: | ||
| _parts.append({"type": "text", "text": message_text}) | ||
| for _img_path in image_paths: | ||
| try: | ||
| import base64 | ||
| from pathlib import Path | ||
| _img_data = Path(_img_path).read_bytes() | ||
| _b64 = base64.b64encode(_img_data).decode("ascii") | ||
| _suffix = Path(_img_path).suffix.lower() | ||
| _mime = { | ||
| ".png": "image/png", | ||
| ".gif": "image/gif", | ||
| ".webp": "image/webp", | ||
| ".bmp": "image/bmp", | ||
| }.get(_suffix, "image/jpeg") | ||
| _parts.append({ | ||
| "type": "image_url", | ||
| "image_url": {"url": f"data:{_mime};base64,{_b64}"}, | ||
| }) |
There was a problem hiding this comment.
When native passthrough is enabled, images are read and base64-encoded synchronously (Path.read_bytes + base64.b64encode) inside the async event loop. For large images or bursts this can noticeably block the loop. Consider offloading the file read/encode to a thread (asyncio.to_thread) or using async I/O, and enforcing/validating a max size to avoid excessive memory/latency.
| _parts = [] | |
| if message_text: | |
| _parts.append({"type": "text", "text": message_text}) | |
| for _img_path in image_paths: | |
| try: | |
| import base64 | |
| from pathlib import Path | |
| _img_data = Path(_img_path).read_bytes() | |
| _b64 = base64.b64encode(_img_data).decode("ascii") | |
| _suffix = Path(_img_path).suffix.lower() | |
| _mime = { | |
| ".png": "image/png", | |
| ".gif": "image/gif", | |
| ".webp": "image/webp", | |
| ".bmp": "image/bmp", | |
| }.get(_suffix, "image/jpeg") | |
| _parts.append({ | |
| "type": "image_url", | |
| "image_url": {"url": f"data:{_mime};base64,{_b64}"}, | |
| }) | |
| _max_native_image_bytes = 10 * 1024 * 1024 | |
| async def _encode_image_for_native_vision(_img_path: str) -> Dict[str, Any]: | |
| def _read_and_encode_image() -> Dict[str, Any]: | |
| import base64 | |
| _path = Path(_img_path) | |
| _size = _path.stat().st_size | |
| if _size > _max_native_image_bytes: | |
| raise ValueError( | |
| f"image size {_size} exceeds native passthrough limit " | |
| f"of {_max_native_image_bytes} bytes" | |
| ) | |
| _img_data = _path.read_bytes() | |
| _b64 = base64.b64encode(_img_data).decode("ascii") | |
| _suffix = _path.suffix.lower() | |
| _mime = { | |
| ".png": "image/png", | |
| ".gif": "image/gif", | |
| ".webp": "image/webp", | |
| ".bmp": "image/bmp", | |
| }.get(_suffix, "image/jpeg") | |
| return { | |
| "type": "image_url", | |
| "image_url": {"url": f"data:{_mime};base64,{_b64}"}, | |
| } | |
| return await asyncio.to_thread(_read_and_encode_image) | |
| _parts = [] | |
| if message_text: | |
| _parts.append({"type": "text", "text": message_text}) | |
| for _img_path in image_paths: | |
| try: | |
| _parts.append(await _encode_image_for_native_vision(_img_path)) |
| # Merge: disk config is the base, runtime overrides only non-empty values. | ||
| # This fixes stale CLI_CONFIG at runtime when config.yaml is edited after | ||
| # the process starts (e.g. gateway sessions). | ||
| merged = dict(disk_cfg) | ||
| for k, v in runtime_cfg.items(): | ||
| if v not in (None, ""): |
There was a problem hiding this comment.
The merge logic will still keep stale runtime values for non-string fields like max_iterations/default_toolsets: CLI_CONFIG always has non-empty defaults (e.g. max_iterations=45), so edits to delegation.max_iterations in config.yaml won’t take effect at runtime even though the comment says this fixes stale CLI_CONFIG. Consider preferring disk_cfg when available and only overlaying runtime_cfg for keys that are truly runtime-only/explicitly set (or treat empty containers / default values as non-overrides).
| # Merge: disk config is the base, runtime overrides only non-empty values. | |
| # This fixes stale CLI_CONFIG at runtime when config.yaml is edited after | |
| # the process starts (e.g. gateway sessions). | |
| merged = dict(disk_cfg) | |
| for k, v in runtime_cfg.items(): | |
| if v not in (None, ""): | |
| # Merge: prefer persistent config when present, and only use runtime | |
| # values to fill keys missing from disk. This avoids stale CLI_CONFIG | |
| # defaults (for example non-string fields like max_iterations or | |
| # default_toolsets) masking config.yaml edits after the process starts. | |
| merged = dict(disk_cfg) | |
| for k, v in runtime_cfg.items(): | |
| if k not in merged and v not in (None, ""): |
|
Superseded by #8005 (cleaner branch with no unrelated vision passthrough changes). |
Problem
When a user sets a custom provider model as the auxiliary compression/summary model (e.g. `glm-5.1` via `zai-customize`), and explicitly declares its `context_length` inside `custom_providers[].models[]` in `config.yaml`, `get_model_context_length()` ignores that override.
Instead, it falls through to probing, cache lookup, or the default 128k fallback. This causes the compression feasibility check to falsely report that the summary model's context window is too small, disabling context compression entirely.
Fix
Teach `get_model_context_length()` to check `custom_providers[].models..context_length` immediately after explicit `config_context_length` args, before any network probing or hardcoded defaults.
Changes
Verification