feat(kanban): attach images referenced in task bodies to worker vision by teknium1 · Pull Request #34210 · NousResearch/hermes-agent

teknium1 · 2026-05-29T00:41:12Z

Summary

Kanban workers now route images referenced in a task body to the model's vision input on their first turn — matching how the CLI and gateway already handle inbound images. Before this PR, pasting /home/me/screenshot.png or https://example.com/img.png into a task description sent it to the model as plain text and the pixels were never seen.

Why it was broken

The dispatcher spawns a worker as hermes -p <profile> chat -q "work kanban task <id>". The actual task description never appears in argv; the worker reads it via the kanban tool later in the loop, where the response is a plain tool-result string — multimodal content parts can't ride in tool messages.

How the fix works

agent/image_routing.py gains extract_image_refs(text) → (paths, urls). It mirrors gateway/platforms/base.py:extract_local_files semantics (absolute / ~/-relative paths, image extensions only, ignores fenced and inline code) and adds an http(s)://…<image-ext> matcher.
build_native_content_parts() now accepts an optional image_urls= kwarg and emits passthrough image_url parts for remote URLs alongside the base64 data: URLs used for local files. Each attached image still gets a [Image attached at: …] / [Image attached: …] hint in the text part so tools that take an image path/URL as a string argument can be invoked on it.
cli.py single-query/quiet branch (the path every dispatcher-spawned worker takes) detects HERMES_KANBAN_TASK, reads the task body via hermes_cli.kanban_db.get_task, runs extract_image_refs, and threads the results into the existing image-routing decision (native vs text). Best-effort: any failure in extraction logs at debug and leaves worker startup unaffected.
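The extraction rules described above can be sketched as follows. This is a minimal illustration, not the code in agent/image_routing.py: the function name `extract_image_refs_sketch`, the extension list, the regexes, and the `require_exists` flag are all assumptions made for the example.

```python
import os
import re

# Assumed extension set; the real list lives in the repository.
IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp")
_EXT_ALT = "|".join(e.lstrip(".") for e in IMAGE_EXTS)

_FENCED = re.compile(r"```.*?```", re.DOTALL)   # fenced code blocks
_INLINE = re.compile(r"`[^`\n]*`")              # inline code spans
_URL = re.compile(rf"https?://[^\s<>()]+\.(?:{_EXT_ALT})\b", re.IGNORECASE)
# Absolute or ~/-relative paths that do not start in the middle of another token.
_PATH = re.compile(rf"(?<![\w/:.~])(?:~/|/)[^\s<>()]+\.(?:{_EXT_ALT})\b", re.IGNORECASE)


def extract_image_refs_sketch(text, require_exists=True):
    """Return (paths, urls) found in text, skipping anything inside code."""
    stripped = _INLINE.sub(" ", _FENCED.sub(" ", text))

    urls = []
    for match in _URL.finditer(stripped):
        if match.group(0) not in urls:          # de-duplicate, keep order
            urls.append(match.group(0))

    # Blank out image URLs so their path component is not re-read as a local path.
    without_urls = _URL.sub(" ", stripped)

    paths = []
    for match in _PATH.finditer(without_urls):
        path = os.path.expanduser(match.group(0))
        if require_exists and not os.path.isfile(path):
            continue                            # assumed: missing files are skipped
        if path not in paths:
            paths.append(path)
    return paths, urls
```

Stripping code spans before matching is what keeps a path quoted in a snippet from being attached by accident.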

The image-routing decision (agent.image_input_mode config + model capability) is unchanged — vision-capable models get native attachment, others fall back to the existing vision_analyze text pipeline. Local images on the text path still go through _preprocess_images_with_vision as before; URL-only bodies on the text path are left as plain text since the existing text pipeline doesn't know how to download URLs.
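The outcome of that decision for kanban-sourced references can be summarised in a small sketch. The function `plan_image_handling` and its `use_native` flag are hypothetical; how the real code derives the flag from agent.image_input_mode and model capability is not shown here.

```python
def plan_image_handling(use_native, paths, urls):
    """Describe what happens to each reference on the native and text paths."""
    if use_native:
        # Vision-capable model: local files and remote URLs are both attached.
        return {"attach": list(paths) + list(urls), "preprocess": [], "left_as_text": []}
    # Text path: local files go through the existing vision preprocessing;
    # URLs stay in the prompt as plain text because nothing downloads them.
    return {"attach": [], "preprocess": list(paths), "left_as_text": list(urls)}
```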

Changes

agent/image_routing.py — new extract_image_refs(text); build_native_content_parts gains image_urls= kwarg (backward-compatible default None).
cli.py — kanban-body extraction wired into the -q single-query path between _collect_query_images and the image-routing decision.
tests/agent/test_image_routing.py — +22 tests covering the extractor (paths, URLs, code blocks, dedup, case-insensitive extensions, missing files) and URL pass-through in build_native_content_parts.
tests/hermes_cli/test_kanban_worker_image_extraction.py — new, +10 tests driving the full pipeline against a real kanban DB (create task → read body → extract refs → build multimodal parts).
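The best-effort wiring in cli.py might look roughly like the sketch below. The helper `kanban_image_refs` and its injected `load_body` and `extract` callables are hypothetical stand-ins for hermes_cli.kanban_db.get_task and extract_image_refs, whose exact signatures are not given in this PR description.

```python
import logging
import os

log = logging.getLogger(__name__)


def kanban_image_refs(load_body, extract, environ=None):
    """Return (paths, urls) for the current kanban task, or ([], []) on any failure."""
    environ = os.environ if environ is None else environ
    task_id = environ.get("HERMES_KANBAN_TASK")
    if not task_id:
        return [], []                      # not a dispatcher-spawned worker
    try:
        body = load_body(task_id) or ""    # hypothetical: returns the task body text
        paths, urls = extract(body)
        return list(paths), list(urls)
    except Exception:
        # Enrichment is optional: log at debug and let the worker start normally.
        log.debug("kanban image extraction failed", exc_info=True)
        return [], []
```

Catching broadly here is deliberate in the sketch: a missing task or unreadable database should cost the worker its images, not its startup.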

Validation

| Suite | Result |
| --- | --- |
| `tests/agent/test_image_routing.py` | 76 → 98 passing (22 new) |
| `tests/hermes_cli/test_kanban_worker_image_extraction.py` | 10 / 10 (new) |
| `tests/agent/test_vision_routing_31179.py` | 12 / 12 (no regression) |
| `tests/hermes_cli/test_kanban_cli.py` | 46 / 46 (no regression) |
| `tests/hermes_cli/test_kanban_core_functionality.py` | 167 / 167 (no regression) |
| `tests/plugins/test_kanban_worker_runs.py` | 11 / 11 (no regression) |
| `tests/tools/test_vision_native_fast_path.py` | passes |
| `tests/tools/test_computer_use_vision_routing.py` | passes |

E2E: created a kanban task with a body referencing both a local PNG and an https://example.com/target.png URL, drove the same code path the dispatcher-spawned worker takes (HERMES_KANBAN_TASK set, _collect_query_images → extraction → build_native_content_parts), and confirmed the worker pipeline produces a 3-part multimodal user turn: 1 text part with both hints, 1 image_url data-URL for the local file, 1 image_url passthrough for the remote URL.
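The 3-part turn from the E2E check can be pictured with the sketch below. It assumes the OpenAI-style content-part layout (`type: "text"` and `type: "image_url"`); the function `build_parts_sketch` is hypothetical, and the pairing of each hint string with local files versus URLs is a guess, not taken from build_native_content_parts.

```python
import base64
import mimetypes


def build_parts_sketch(text, image_paths=(), image_urls=None):
    """Build one text part plus one image_url part per attached image."""
    urls = list(image_urls or [])
    # Hints keep the path/URL visible so string-argument tools can still use it.
    hints = [f"[Image attached at: {p}]" for p in image_paths]
    hints += [f"[Image attached: {u}]" for u in urls]
    parts = [{"type": "text", "text": "\n".join([text, *hints])}]

    for path in image_paths:
        mime = mimetypes.guess_type(path)[0] or "image/png"
        with open(path, "rb") as fh:
            encoded = base64.b64encode(fh.read()).decode("ascii")
        # Local files are inlined as base64 data: URLs.
        parts.append({"type": "image_url",
                      "image_url": {"url": f"data:{mime};base64,{encoded}"}})

    for url in urls:
        # Remote URLs are passed through untouched.
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return parts
```

With one local PNG and one remote URL this yields three parts, in the same order as the E2E result above: text, data URL, passthrough URL.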

Infographic

Kanban workers now scan the task body for local image paths and http(s) image URLs and attach them to the worker's first user turn, matching the CLI/gateway behaviour for inbound images. Before, a user pasting `/home/me/screenshot.png` or `https://example.com/img.png` into a kanban task description had it sent to the model as plain text and the pixels were never seen.

How it works:

* agent/image_routing.py gains extract_image_refs(text) → (paths, urls) that mirrors gateway/platforms/base.py:extract_local_files (absolute / ~-relative paths, image extensions only, ignores fenced/inline code).
* build_native_content_parts() accepts an optional image_urls= kwarg and emits passthrough image_url parts for remote URLs alongside the base64 data: URLs used for local paths.
* cli.py (single-query/quiet branch, the path every dispatcher-spawned worker takes) detects HERMES_KANBAN_TASK, reads the task body via kanban_db.get_task, runs extract_image_refs, and threads the results into the existing image-routing decision (native vs text). Best-effort: enrichment failures never block worker startup.

Tested:

* tests/agent/test_image_routing.py: 22 new tests for extract_image_refs and URL pass-through in build_native_content_parts.
* tests/hermes_cli/test_kanban_worker_image_extraction.py: 10 new tests driving a real kanban_db round-trip (create task → read body → extract refs → build parts).
* E2E: created a fake kanban task with a body referencing both a local PNG and an https URL; verified the worker pipeline produces a multimodal user turn with 1 text part + 2 image_url parts (data URL for the local file, passthrough URL for the remote).

teknium1 merged commit 769ee86 into main May 29, 2026
25 checks passed

teknium1 deleted the hermes/hermes-084c5af1 branch May 29, 2026 00:50

Haderach-Ram mentioned this pull request May 29, 2026: Ecosystem Digest — 2026-05-29 Haderach-Ram/openclaw-radar#22 (Open)

BrewTestBot mentioned this pull request May 29, 2026: hermes-agent 2026.5.29 Homebrew/homebrew-core#285204 (Merged)

github-actions Bot mentioned this pull request May 29, 2026: chore: bump NousResearch/hermes-agent version from v2026.5.16 to v2026.5.29 Docker-Hub-sirmark/docker-hermes-agent#7 (Merged)

BrewTestBot mentioned this pull request Jun 6, 2026: hermes-agent 2026.6.5 Homebrew/homebrew-core#286569 (Merged)

github-actions Bot mentioned this pull request Jun 6, 2026: chore: bump NousResearch/hermes-agent version from v2026.5.29.2 to v2026.6.5 Docker-Hub-sirmark/docker-hermes-agent#9 (Merged)
