Skip to content

feat(kanban): attach images referenced in task bodies to worker vision#34210

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-084c5af1
May 29, 2026
Merged

feat(kanban): attach images referenced in task bodies to worker vision#34210
teknium1 merged 1 commit into
mainfrom
hermes/hermes-084c5af1

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Kanban workers now route images referenced in a task body to the model's vision input on their first turn — matching how the CLI and gateway already handle inbound images. Before this PR, pasting /home/me/screenshot.png or https://example.com/img.png into a task description sent it to the model as plain text and the pixels were never seen.

Why it was broken

The dispatcher spawns a worker as hermes -p <profile> chat -q "work kanban task <id>". The actual task description never appears in argv; the worker reads it via the kanban tool later in the loop, where the response is a plain tool-result string — multimodal content parts can't ride in tool messages.

How the fix works

  1. agent/image_routing.py gains extract_image_refs(text)(paths, urls). It mirrors gateway/platforms/base.py:extract_local_files semantics (absolute / ~/-relative paths, image extensions only, ignores fenced and inline code) and adds an http(s)://…<image-ext> matcher.
  2. build_native_content_parts() now accepts an optional image_urls= kwarg and emits passthrough image_url parts for remote URLs alongside the base64 data: URLs used for local files. Each attached image still gets a [Image attached at: …] / [Image attached: …] hint in the text part so tools that take an image path/URL as a string argument can be invoked on it.
  3. cli.py single-query/quiet branch (the path every dispatcher-spawned worker takes) detects HERMES_KANBAN_TASK, reads the task body via hermes_cli.kanban_db.get_task, runs extract_image_refs, and threads the results into the existing image-routing decision (native vs text). Best-effort: any failure in extraction logs at debug and leaves worker startup unaffected.

The image-routing decision (agent.image_input_mode config + model capability) is unchanged — vision-capable models get native attachment, others fall back to the existing vision_analyze text pipeline. Local images on the text path still go through _preprocess_images_with_vision as before; URL-only bodies on the text path are left as plain text since the existing text pipeline doesn't know how to download URLs.

Changes

  • agent/image_routing.py — new extract_image_refs(text); build_native_content_parts gains image_urls= kwarg (backward-compatible default None).
  • cli.py — kanban-body extraction wired into the -q single-query path between _collect_query_images and the image-routing decision.
  • tests/agent/test_image_routing.py — +22 tests covering the extractor (paths, URLs, code blocks, dedup, case-insensitive extensions, missing files) and URL pass-through in build_native_content_parts.
  • tests/hermes_cli/test_kanban_worker_image_extraction.py — new, +10 tests driving the full pipeline against a real kanban DB (create task → read body → extract refs → build multimodal parts).

Validation

Suite Result
tests/agent/test_image_routing.py 76 → 98 passing (22 new)
tests/hermes_cli/test_kanban_worker_image_extraction.py 10 / 10 (new)
tests/agent/test_vision_routing_31179.py 12 / 12 (no regression)
tests/hermes_cli/test_kanban_cli.py 46 / 46 (no regression)
tests/hermes_cli/test_kanban_core_functionality.py 167 / 167 (no regression)
tests/plugins/test_kanban_worker_runs.py 11 / 11 (no regression)
tests/tools/test_vision_native_fast_path.py passes
tests/tools/test_computer_use_vision_routing.py passes

E2E: created a kanban task with a body referencing both a local PNG and an https://example.com/target.png URL, drove the same code path the dispatcher-spawned worker takes (HERMES_KANBAN_TASK set, _collect_query_images → extraction → build_native_content_parts), and confirmed the worker pipeline produces a 3-part multimodal user turn: 1 text part with both hints, 1 image_url data-URL for the local file, 1 image_url passthrough for the remote URL.

Infographic

kanban-image-ref-autoattach

Kanban workers now scan the task body for local image paths and
http(s) image URLs and attach them to the worker's first user turn —
matching the CLI/gateway behaviour for inbound images. Before, a
user pasting `/home/me/screenshot.png` or `https://example.com/img.png`
into a kanban task description had it sent to the model as plain
text and the pixels were never seen.

How it works:
* agent/image_routing.py gains extract_image_refs(text) → (paths, urls)
  that mirrors gateway/platforms/base.py:extract_local_files (absolute /
  ~-relative paths, image extensions only, ignores fenced/inline code).
* build_native_content_parts() accepts an optional image_urls= kwarg
  and emits passthrough image_url parts for remote URLs alongside the
  base64 data: URLs used for local paths.
* cli.py (single-query/quiet branch — the path every dispatcher-spawned
  worker takes) detects HERMES_KANBAN_TASK, reads the task body via
  kanban_db.get_task, runs extract_image_refs, and threads the results
  into the existing image-routing decision (native vs text). Best-effort:
  enrichment failures never block worker startup.

Tested:
* tests/agent/test_image_routing.py — 22 new tests for extract_image_refs
  and URL pass-through in build_native_content_parts.
* tests/hermes_cli/test_kanban_worker_image_extraction.py — 10 new tests
  driving real kanban_db round-trip (create task → read body → extract
  refs → build parts).
* E2E: created a fake kanban task with a body referencing both a local
  PNG and an https URL; verified the worker pipeline produces a
  multimodal user turn with 1 text part + 2 image_url parts (data URL
  for the local file, passthrough URL for the remote).
@teknium1 teknium1 merged commit 769ee86 into main May 29, 2026
25 checks passed
@teknium1 teknium1 deleted the hermes/hermes-084c5af1 branch May 29, 2026 00:50
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
NousResearch#34210)

Kanban workers now scan the task body for local image paths and
http(s) image URLs and attach them to the worker's first user turn —
matching the CLI/gateway behaviour for inbound images. Before, a
user pasting `/home/me/screenshot.png` or `https://example.com/img.png`
into a kanban task description had it sent to the model as plain
text and the pixels were never seen.

How it works:
* agent/image_routing.py gains extract_image_refs(text) → (paths, urls)
  that mirrors gateway/platforms/base.py:extract_local_files (absolute /
  ~-relative paths, image extensions only, ignores fenced/inline code).
* build_native_content_parts() accepts an optional image_urls= kwarg
  and emits passthrough image_url parts for remote URLs alongside the
  base64 data: URLs used for local paths.
* cli.py (single-query/quiet branch — the path every dispatcher-spawned
  worker takes) detects HERMES_KANBAN_TASK, reads the task body via
  kanban_db.get_task, runs extract_image_refs, and threads the results
  into the existing image-routing decision (native vs text). Best-effort:
  enrichment failures never block worker startup.

Tested:
* tests/agent/test_image_routing.py — 22 new tests for extract_image_refs
  and URL pass-through in build_native_content_parts.
* tests/hermes_cli/test_kanban_worker_image_extraction.py — 10 new tests
  driving real kanban_db round-trip (create task → read body → extract
  refs → build parts).
* E2E: created a fake kanban task with a body referencing both a local
  PNG and an https URL; verified the worker pipeline produces a
  multimodal user turn with 1 text part + 2 image_url parts (data URL
  for the local file, passthrough URL for the remote).
#AI commit#
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request May 31, 2026
NousResearch#34210)

Kanban workers now scan the task body for local image paths and
http(s) image URLs and attach them to the worker's first user turn —
matching the CLI/gateway behaviour for inbound images. Before, a
user pasting `/home/me/screenshot.png` or `https://example.com/img.png`
into a kanban task description had it sent to the model as plain
text and the pixels were never seen.

How it works:
* agent/image_routing.py gains extract_image_refs(text) → (paths, urls)
  that mirrors gateway/platforms/base.py:extract_local_files (absolute /
  ~-relative paths, image extensions only, ignores fenced/inline code).
* build_native_content_parts() accepts an optional image_urls= kwarg
  and emits passthrough image_url parts for remote URLs alongside the
  base64 data: URLs used for local paths.
* cli.py (single-query/quiet branch — the path every dispatcher-spawned
  worker takes) detects HERMES_KANBAN_TASK, reads the task body via
  kanban_db.get_task, runs extract_image_refs, and threads the results
  into the existing image-routing decision (native vs text). Best-effort:
  enrichment failures never block worker startup.

Tested:
* tests/agent/test_image_routing.py — 22 new tests for extract_image_refs
  and URL pass-through in build_native_content_parts.
* tests/hermes_cli/test_kanban_worker_image_extraction.py — 10 new tests
  driving real kanban_db round-trip (create task → read body → extract
  refs → build parts).
* E2E: created a fake kanban task with a body referencing both a local
  PNG and an https URL; verified the worker pipeline produces a
  multimodal user turn with 1 text part + 2 image_url parts (data URL
  for the local file, passthrough URL for the remote).
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
NousResearch#34210)

Kanban workers now scan the task body for local image paths and
http(s) image URLs and attach them to the worker's first user turn —
matching the CLI/gateway behaviour for inbound images. Before, a
user pasting `/home/me/screenshot.png` or `https://example.com/img.png`
into a kanban task description had it sent to the model as plain
text and the pixels were never seen.

How it works:
* agent/image_routing.py gains extract_image_refs(text) → (paths, urls)
  that mirrors gateway/platforms/base.py:extract_local_files (absolute /
  ~-relative paths, image extensions only, ignores fenced/inline code).
* build_native_content_parts() accepts an optional image_urls= kwarg
  and emits passthrough image_url parts for remote URLs alongside the
  base64 data: URLs used for local paths.
* cli.py (single-query/quiet branch — the path every dispatcher-spawned
  worker takes) detects HERMES_KANBAN_TASK, reads the task body via
  kanban_db.get_task, runs extract_image_refs, and threads the results
  into the existing image-routing decision (native vs text). Best-effort:
  enrichment failures never block worker startup.

Tested:
* tests/agent/test_image_routing.py — 22 new tests for extract_image_refs
  and URL pass-through in build_native_content_parts.
* tests/hermes_cli/test_kanban_worker_image_extraction.py — 10 new tests
  driving real kanban_db round-trip (create task → read body → extract
  refs → build parts).
* E2E: created a fake kanban task with a body referencing both a local
  PNG and an https URL; verified the worker pipeline produces a
  multimodal user turn with 1 text part + 2 image_url parts (data URL
  for the local file, passthrough URL for the remote).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant