Bug Description
When using the computer_use tool (backed by cua-driver), image analysis in SOM (Set-of-Mark) mode ignores the auxiliary.vision configuration and uses the main session model instead. This causes HTTP 404 errors when the main model does not support image input.
Steps to Reproduce
- Configure Hermes Agent with:
  - Main model: tencent/hy3-preview (no image support on OpenRouter)
  - auxiliary.vision.provider: openrouter
  - auxiliary.vision.model: google/gemini-2.5-flash
- Enable the computer_use toolset
- Call computer_use with action='capture', mode='som'
Expected Behavior
cua-driver should route the image analysis request to the model specified in auxiliary.vision (google/gemini-2.5-flash).
Actual Behavior
cua-driver attempts to use the main session model (tencent/hy3-preview) for image analysis, resulting in:
🔌 Provider: openrouter Model: tencent/hy3-preview
📝 Error: HTTP 404: No endpoints found that support image input
Configuration
model:
  default: tencent/hy3-preview
  provider: openrouter
auxiliary:
  vision:
    provider: openrouter
    model: google/gemini-2.5-flash
Log Evidence
2026-05-11 ... ⚠️ API call failed (attempt1/3): NotFoundError [HTTP 404]
🔌 Provider: openrouter Model: tencent/hy3-preview
📝 Error: HTTP 404: No endpoints found that support image input
Suggested Fix
Update the cua-driver / computer_use tool to check for an auxiliary.vision configuration and, when one is present, use that provider and model for image analysis instead of the main session model.
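The lookup suggested above could be sketched roughly as follows. This is only an illustration, not the actual Hermes Agent or cua-driver code: the function name resolve_vision_target is hypothetical, and the config is assumed to be available as a plain dict mirroring the YAML in the Configuration section.

```python
from typing import Any, Mapping, Tuple


def resolve_vision_target(config: Mapping[str, Any]) -> Tuple[str, str]:
    """Return (provider, model) to use for image analysis.

    Hypothetical helper: prefers auxiliary.vision and falls back to the
    main model for any field that auxiliary.vision leaves unset.
    """
    main = config.get("model") or {}
    vision = (config.get("auxiliary") or {}).get("vision") or {}

    provider = vision.get("provider") or main.get("provider")
    model = vision.get("model") or main.get("default")

    if not provider or not model:
        raise ValueError("no provider/model configured for image analysis")
    return provider, model


# Assumed in-memory form of the configuration from this report.
config = {
    "model": {"default": "tencent/hy3-preview", "provider": "openrouter"},
    "auxiliary": {
        "vision": {"provider": "openrouter", "model": "google/gemini-2.5-flash"}
    },
}

print(resolve_vision_target(config))
```

With the configuration from this report, the helper selects openrouter and google/gemini-2.5-flash. When auxiliary.vision is absent it falls back to the main model, which keeps the current behavior for users who have not configured a vision model.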