Feature Request: Allow manual capability declaration (vision/reasoning/tools) in custom_providers #8731

@crazyn2

Problem Statement

Currently, Hermes determines model capabilities (vision, reasoning, tools) through:

  1. The models.dev database (https://models.dev/api.json)
  2. Hardcoded patterns in the codebase

This works well for popular cloud models, but breaks for custom/local models that:

  • Are not listed in models.dev
  • Run on private/internal API endpoints
  • Have custom fine-tunes with different capabilities than the base model
  • Use proxies/gateways that report generic model names

Current behavior

When using a custom provider with a local model:

custom_providers:
  - name: my-local-vllm
    base_url: http://localhost:8000/v1
    api_key: dummy
    models:
      my-llava-model:
        context_length: 8192

Hermes has no way to know that my-llava-model supports vision, so:

  • The vision_analyze tool falls back to an auxiliary vision model (an extra API call)
  • browser_vision uses the auxiliary model instead of native passthrough
  • Reasoning parameters are not sent, even if the model supports them
  • Tool schemas may be incorrectly filtered

Users discover these limitations through degraded performance or cryptic errors, and there is no way to declare the model's capabilities up front in configuration.

Proposed Solution

Extend the custom_providers configuration to allow explicit capability declaration per model:

custom_providers:
  - name: my-local-vllm
    base_url: http://localhost:8000/v1
    api_key: dummy
    models:
      my-llava-model:
        context_length: 8192
        capabilities:
          vision: true           # Supports image input
          reasoning: false       # Does not support reasoning params
          tools: true            # Supports function calling
          streaming: true        # Supports streaming output
      my-reasoning-model:
        context_length: 32768
        capabilities:
          vision: false
          reasoning: true        # Supports reasoning_effort parameter
          tools: true
          streaming: true
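A minimal Python sketch of how the proposed block could be read, assuming the YAML above is loaded into plain dicts (for example by PyYAML, which maps true/false/null to True/False/None). `ModelCapabilities` and `parse_capabilities` are hypothetical names, not existing Hermes code. Each field is `Optional[bool]`, so an omitted key stays `None`, meaning "auto-detect".

```python
from collections.abc import Mapping
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class ModelCapabilities:
    """Declared capabilities for one model (hypothetical type).

    None means "not declared": fall back to auto-detection for that key.
    """
    vision: Optional[bool] = None
    reasoning: Optional[bool] = None
    tools: Optional[bool] = None
    streaming: Optional[bool] = None


def parse_capabilities(model_entry: Mapping[str, Any]) -> ModelCapabilities:
    """Read the optional `capabilities` block from a models.<model> entry."""
    raw = model_entry.get("capabilities")
    if not isinstance(raw, Mapping):
        raw = {}  # block omitted (or malformed): everything auto-detects
    known = {f.name for f in fields(ModelCapabilities)}
    # Keep only known keys whose values are real booleans; anything else
    # stays None so that auto-detection still applies to it.
    kwargs = {k: v for k, v in raw.items() if k in known and isinstance(v, bool)}
    return ModelCapabilities(**kwargs)
```

For `my-llava-model` above this yields `vision=True`, `reasoning=False`, `tools=True`, `streaming=True`; for an entry with no `capabilities` block every field stays `None`.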

Design Details

1. Configuration Schema

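Extend the custom provider schema in hermes_cli/config.py (_VALID_CUSTOM_PROVIDER_FIELDS and the checks on models.<model> entries) to accept a capabilities dict. The new keys are nested inside each model entry rather than added as top-level provider fields: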

_VALID_CUSTOM_PROVIDER_FIELDS = {
    "name", "base_url", "api_key", "api_mode", "model", "models",
    "context_length", "rate_limit_delay",  # existing
    # Proposed new fields inside models.<model>:
    #   capabilities.vision: bool
    #   capabilities.reasoning: bool
    #   capabilities.tools: bool
    #   capabilities.streaming: bool
}
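Because the new keys sit under `models.<model>` rather than at the provider's top level, a flat field set alone cannot check them. Below is a sketch of the extra per-model check that `validate_config_structure()` might run; `KNOWN_CAPABILITY_KEYS` and `validate_capabilities_block` are hypothetical names, and the warning texts are illustrative only.

```python
from typing import Any, List

# Hypothetical: the four capability keys proposed in this issue.
KNOWN_CAPABILITY_KEYS = frozenset({"vision", "reasoning", "tools", "streaming"})


def validate_capabilities_block(provider: str, model: str, block: Any) -> List[str]:
    """Return human-readable warnings for one models.<model>.capabilities block."""
    where = f"custom_providers[{provider}].models.{model}.capabilities"
    if block is None:
        return []  # omitted: auto-detect, nothing to validate
    if not isinstance(block, dict):
        return [f"{where} should be a mapping, got {type(block).__name__}"]
    warnings: List[str] = []
    for key, value in block.items():
        if key not in KNOWN_CAPABILITY_KEYS:
            # Likely a typo (e.g. "visoin"); warn rather than fail the whole config.
            warnings.append(f"{where}: unknown key '{key}' (ignored)")
        elif value is not None and not isinstance(value, bool):
            warnings.append(f"{where}.{key} should be true, false, or null, got {value!r}")
    return warnings
```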

2. Capability Resolution Order

When determining model capabilities, Hermes should check these sources in order of priority:

  1. Explicit config override (custom_providers[].models.<model>.capabilities)
  2. models.dev database (if available)
  3. Built-in pattern matching (regex on model name)
  4. Conservative defaults (tools: true, vision/reasoning: false)
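The resolution order amounts to "the first source with an opinion wins". In this sketch each source is a callable returning `True`, `False`, or `None` (no opinion). `resolve_capability` and `name_pattern_source` are hypothetical, and the regex merely stands in for Hermes's real built-in patterns, which this issue does not show. An explicit `null` in config also yields `None`, so it falls through to auto-detection as intended.

```python
import re
from typing import Callable, Dict, Optional, Sequence

# A source answers True/False if it knows, or None if it has no opinion.
Source = Callable[[str], Optional[bool]]

# Step 4 of the proposed order. Assumption: capabilities without a listed
# default (e.g. streaming) resolve to False in this sketch.
CONSERVATIVE_DEFAULTS: Dict[str, bool] = {"tools": True, "vision": False, "reasoning": False}


def resolve_capability(capability: str, sources: Sequence[Source]) -> bool:
    """Return the first non-None answer, in priority order, else the default."""
    for source in sources:
        answer = source(capability)
        if answer is not None:
            return answer
    return CONSERVATIVE_DEFAULTS.get(capability, False)


def name_pattern_source(model_name: str) -> Source:
    """Hypothetical stand-in for the built-in regex heuristics (step 3)."""
    def guess(capability: str) -> Optional[bool]:
        if capability == "vision" and re.search(r"llava|[-_]vl\b", model_name, re.IGNORECASE):
            return True
        return None
    return guess
```

Typical use passes the sources in priority order, e.g. `resolve_capability("vision", [config_caps.get, models_dev_caps.get, name_pattern_source(model)])`, where the two dicts are the config override and the models.dev entry (empty if the model is not listed).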

3. Integration Points

Modify these functions to respect config overrides:

  • agent/models_dev.py:get_model_capabilities()

    • Add a parameter that accepts the config override
    • Check custom_providers before falling back to models.dev
  • run_agent.py:AIAgent._check_native_vision_support()

    • Check config override before pattern matching
  • run_agent.py:AIAgent._supports_reasoning_extra_body()

    • Check config override for reasoning capability
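All three integration points need the same lookup. A sketch, assuming the provider is identified by its `name` (matching on `base_url` would work the same way) and that `models` is a mapping as in the YAML above. `find_declared_capability` and `check_native_vision_support` are hypothetical helpers; `pattern_check` stands in for the existing pattern matching in `_check_native_vision_support()`, whose real signature is not shown here.

```python
from typing import Any, Callable, Iterable, Mapping, Optional


def find_declared_capability(
    custom_providers: Iterable[Mapping[str, Any]],
    provider_name: str,
    model: str,
    capability: str,
) -> Optional[bool]:
    """Look up custom_providers[].models.<model>.capabilities.<capability>.

    Returns None when the provider, model, block, or key is absent, or when
    the value is null, so callers fall through to auto-detection.
    """
    for provider in custom_providers:
        if provider.get("name") != provider_name:
            continue
        entry = (provider.get("models") or {}).get(model) or {}
        value = (entry.get("capabilities") or {}).get(capability)
        return value if isinstance(value, bool) else None
    return None


def check_native_vision_support(
    custom_providers: Iterable[Mapping[str, Any]],
    provider_name: str,
    model: str,
    pattern_check: Callable[[str], bool],
) -> bool:
    """Proposed order: explicit config override first, then pattern matching."""
    declared = find_declared_capability(custom_providers, provider_name, model, "vision")
    if declared is not None:
        return declared
    return pattern_check(model)
```

The reasoning check would call `find_declared_capability(..., "reasoning")` the same way before deciding whether to send reasoning parameters.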

4. Validation & UX

  • Config validation: Warn if capabilities contains unknown keys
  • hermes doctor: Check that declared capabilities match detected ones (warn on mismatch)
  • Setup wizard: When adding a custom provider, optionally ask about capabilities
$ hermes setup custom
...
Does my-llava-model support vision/multimodal? [y/N]: y
Does it support tool calling? [Y/n]: y
Does it support reasoning parameters? [y/N]: n
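For the `hermes doctor` check, "detected" would be whatever models.dev and pattern matching report with the override ignored. Since overrides exist precisely because detection can be wrong, a mismatch is best reported as a warning rather than an error. A hypothetical sketch (`capability_mismatches` is not an existing Hermes function):

```python
from typing import Dict, List, Optional


def capability_mismatches(
    model: str,
    declared: Dict[str, Optional[bool]],
    detected: Dict[str, Optional[bool]],
) -> List[str]:
    """Warn where a declared capability disagrees with auto-detection.

    Keys that are undeclared (None) or undetectable (None or missing) are
    skipped, since there is nothing to compare.
    """
    warnings: List[str] = []
    for cap, want in sorted(declared.items()):
        have = detected.get(cap)
        if want is None or have is None or want == have:
            continue
        warnings.append(
            f"{model}: config declares {cap}={str(want).lower()}, "
            f"but auto-detection reports {cap}={str(have).lower()}"
        )
    return warnings
```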

Benefits

  1. Correct behavior for local models: LLaVA, Qwen-VL, and local fine-tunes are treated according to their actual capabilities
  2. Reduced API costs: No unnecessary auxiliary model calls for vision
  3. Better UX: Users configure once, Hermes behaves correctly everywhere
  4. Foundation for future features: Enables smart routing based on declared capabilities

Backwards Compatibility

  • Fully backwards compatible: Existing configs without capabilities continue to work
  • Optional field: All capability fields default to null (auto-detect)
  • Explicit over implicit: Only declared capabilities override auto-detection
# Old config continues to work unchanged
custom_providers:
  - name: legacy-config
    base_url: http://localhost:8000/v1
    models:
      some-model:
        context_length: 4096
        # capabilities omitted - auto-detect as before

Related Issues

Implementation Notes

This feature builds on the foundation laid by:

The implementation should follow the same pattern used for the context_length override.

Checklist

  • Extend DEFAULT_CONFIG schema for custom_providers
  • Update _VALID_CUSTOM_PROVIDER_FIELDS
  • Modify get_model_capabilities() to check config override
  • Update vision support detection in run_agent.py
  • Update reasoning support detection in run_agent.py
  • Add config validation in validate_config_structure()
  • Add hermes doctor checks
  • Update setup wizard (optional)
  • Add tests
  • Update documentation

Metadata

Assignees

No one assigned

    Labels

    P3 (Low: cosmetic, nice to have)
    area/config (Config system, migrations, profiles)
    comp/cli (CLI entry point, hermes_cli/, setup wizard)
    type/feature (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
