feat(kanban): attach images referenced in task bodies to worker vision #34210
Merged
Kanban workers now scan the task body for local image paths and http(s) image URLs and attach them to the worker's first user turn, matching the CLI/gateway behaviour for inbound images. Before, a user pasting `/home/me/screenshot.png` or `https://example.com/img.png` into a kanban task description had it sent to the model as plain text and the pixels were never seen.

How it works:

* `agent/image_routing.py` gains `extract_image_refs(text)` → `(paths, urls)` that mirrors `gateway/platforms/base.py:extract_local_files` (absolute / `~`-relative paths, image extensions only, ignores fenced/inline code).
* `build_native_content_parts()` accepts an optional `image_urls=` kwarg and emits passthrough `image_url` parts for remote URLs alongside the base64 `data:` URLs used for local paths.
* `cli.py` (single-query/quiet branch, the path every dispatcher-spawned worker takes) detects `HERMES_KANBAN_TASK`, reads the task body via `kanban_db.get_task`, runs `extract_image_refs`, and threads the results into the existing image-routing decision (native vs text). Best-effort: enrichment failures never block worker startup.

Tested:

* `tests/agent/test_image_routing.py`: 22 new tests for `extract_image_refs` and URL pass-through in `build_native_content_parts`.
* `tests/hermes_cli/test_kanban_worker_image_extraction.py`: 10 new tests driving a real `kanban_db` round-trip (create task → read body → extract refs → build parts).
* E2E: created a fake kanban task with a body referencing both a local PNG and an https URL; verified the worker pipeline produces a multimodal user turn with 1 text part + 2 `image_url` parts (data URL for the local file, passthrough URL for the remote).
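The extraction step described above could look roughly like the sketch below. It is not the code from `agent/image_routing.py`: the function name `extract_image_refs_sketch`, the extension list, the regexes, and the `must_exist` flag are all assumptions made for illustration. It only demonstrates the stated semantics: absolute or `~/`-relative paths, image extensions only, fenced and inline code ignored, http(s) URLs collected separately, duplicates dropped.

```python
import os
import re

# Assumed extension set; the real extractor may accept a different list.
IMAGE_EXTS = ("png", "jpg", "jpeg", "gif", "webp", "bmp")
_EXT = "|".join(IMAGE_EXTS)

# Fenced blocks (```...```) and inline code spans (`...`).
_CODE_RE = re.compile(r"```.*?```|`[^`\n]*`", re.DOTALL)
# http(s) URLs ending in an image extension.
_URL_RE = re.compile(rf"https?://[^\s<>()\"'`]+\.(?:{_EXT})\b", re.IGNORECASE)
# Absolute or ~/-relative paths; the lookbehind rejects relative paths
# such as "docs/a.png" or "./a.png".
_PATH_RE = re.compile(rf"(?<![\w/.~])~?/[^\s<>()\"'`]+\.(?:{_EXT})\b", re.IGNORECASE)


def _unique(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract_image_refs_sketch(text, must_exist=True):
    """Return (paths, urls) for image references found outside code spans."""
    prose = _CODE_RE.sub(" ", text)      # ignore fenced and inline code
    urls = _unique(_URL_RE.findall(prose))
    prose = _URL_RE.sub(" ", prose)      # keep URL tails out of path matching
    paths = []
    for raw in _PATH_RE.findall(prose):
        path = os.path.expanduser(raw)   # resolve a leading ~/
        if must_exist and not os.path.isfile(path):
            continue                     # skip references to missing files
        paths.append(path)
    return _unique(paths), urls
```

Removing URLs before matching paths keeps the path segment of a URL from being reported a second time as a local file.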
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request on May 29, 2026.
KKT-OPT pushed a commit to KKT-OPT/hermes-agent that referenced this pull request on May 31, 2026.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request on Jun 2, 2026.
Summary
Kanban workers now route images referenced in a task body to the model's vision input on their first turn, matching how the CLI and gateway already handle inbound images. Before this PR, pasting `/home/me/screenshot.png` or `https://example.com/img.png` into a task description sent it to the model as plain text and the pixels were never seen.

Why it was broken

The dispatcher spawns a worker as `hermes -p <profile> chat -q "work kanban task <id>"`. The actual task description never appears in argv; the worker reads it via the kanban tool later in the loop, where the response is a plain tool-result string, and multimodal content parts can't ride in tool messages.

How the fix works

* `agent/image_routing.py` gains `extract_image_refs(text)` → `(paths, urls)`. It mirrors `gateway/platforms/base.py:extract_local_files` semantics (absolute / `~/`-relative paths, image extensions only, ignores fenced and inline code) and adds an `http(s)://…<image-ext>` matcher.
* `build_native_content_parts()` now accepts an optional `image_urls=` kwarg and emits passthrough `image_url` parts for remote URLs alongside the base64 `data:` URLs used for local files. Each attached image still gets a `[Image attached at: …]` / `[Image attached: …]` hint in the text part so tools that take an image path/URL as a string argument can be invoked on it.
* `cli.py` single-query/quiet branch (the path every dispatcher-spawned worker takes) detects `HERMES_KANBAN_TASK`, reads the task body via `hermes_cli.kanban_db.get_task`, runs `extract_image_refs`, and threads the results into the existing image-routing decision (native vs text). Best-effort: any failure in extraction logs at debug and leaves worker startup unaffected.

The image-routing decision (`agent.image_input_mode` config + model capability) is unchanged: vision-capable models get native attachment, others fall back to the existing `vision_analyze` text pipeline. Local images on the `text` path still go through `_preprocess_images_with_vision` as before; URL-only bodies on the `text` path are left as plain text since the existing text pipeline doesn't know how to download URLs.

Changes

* `agent/image_routing.py`: new `extract_image_refs(text)`; `build_native_content_parts` gains an `image_urls=` kwarg (backward-compatible default `None`).
* `cli.py`: kanban-body extraction wired into the `-q` single-query path between `_collect_query_images` and the image-routing decision.
* `tests/agent/test_image_routing.py`: +22 tests covering the extractor (paths, URLs, code blocks, dedup, case-insensitive extensions, missing files) and URL pass-through in `build_native_content_parts`.
* `tests/hermes_cli/test_kanban_worker_image_extraction.py`: new, +10 tests driving the full pipeline against a real kanban DB (create task → read body → extract refs → build multimodal parts).

Validation

* `tests/agent/test_image_routing.py`
* `tests/hermes_cli/test_kanban_worker_image_extraction.py`
* `tests/agent/test_vision_routing_31179.py`
* `tests/hermes_cli/test_kanban_cli.py`
* `tests/hermes_cli/test_kanban_core_functionality.py`
* `tests/plugins/test_kanban_worker_runs.py`
* `tests/tools/test_vision_native_fast_path.py`
* `tests/tools/test_computer_use_vision_routing.py`

E2E: created a kanban task with a body referencing both a local PNG and an `https://example.com/target.png` URL, drove the same code path the dispatcher-spawned worker takes (`HERMES_KANBAN_TASK` set, `_collect_query_images` → extraction → `build_native_content_parts`), and confirmed the worker pipeline produces a 3-part multimodal user turn: 1 text part with both hints, 1 `image_url` data-URL for the local file, 1 `image_url` passthrough for the remote URL.
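To make the 3-part turn from the E2E check concrete, here is a minimal sketch of how such a user turn could be assembled. It is not the PR's `build_native_content_parts`. The names `build_parts_sketch` and `refs_best_effort`, the OpenAI-style `{"type": "image_url", ...}` part shape, the PNG fallback MIME type, and the pairing of `[Image attached at: …]` with local paths and `[Image attached: …]` with URLs are all assumptions.

```python
import base64
import logging
import mimetypes
from pathlib import Path

logger = logging.getLogger(__name__)


def build_parts_sketch(text, image_paths=(), image_urls=None):
    """Build one text part plus one image_url part per attached image."""
    urls = list(image_urls or [])
    # Hints keep the path/URL visible as a string for tools that need it.
    hints = [f"[Image attached at: {p}]" for p in image_paths]
    hints += [f"[Image attached: {u}]" for u in urls]
    parts = [{"type": "text", "text": "\n".join([text, *hints])}]
    for p in image_paths:
        # Local files are inlined as base64 data: URLs.
        mime = mimetypes.guess_type(str(p))[0] or "image/png"
        data = base64.b64encode(Path(p).read_bytes()).decode("ascii")
        parts.append({"type": "image_url",
                      "image_url": {"url": f"data:{mime};base64,{data}"}})
    for u in urls:
        # Remote URLs pass through untouched.
        parts.append({"type": "image_url", "image_url": {"url": u}})
    return parts


def refs_best_effort(task_id, get_body, extract):
    """Best-effort lookup: any failure is logged at debug and yields no refs.

    `get_body` and `extract` are hypothetical callables standing in for the
    task-body read and the reference extractor.
    """
    try:
        return extract(get_body(task_id) or "")
    except Exception:
        logger.debug("kanban image extraction failed", exc_info=True)
        return [], []
```

Called with one local PNG and one remote URL, `build_parts_sketch` returns three parts in the order text, data URL, passthrough URL. `refs_best_effort` returns two empty lists when the body lookup raises, so startup can continue without images.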