✨ feat(hetero-agent): support multimodal input (text + images) across CLI / shared spawn / desktop by arvinxx · Pull Request #14433 · lobehub/lobehub

arvinxx · 2026-05-05T03:27:37Z

💻 Change Type

✨ feat
♻️ refactor

🔗 Related Issue

Refs LOBE-8523 (phase 1a follow-up; unblocks phase 1b ingest by making the shared input shape multimodal-ready).

Builds on #14431 (phase 1a).

🔀 Description of Change

spawnAgent and lh hetero exec previously accepted only a flat string prompt, so attaching images required bypassing the shared package — which is exactly what HeterogeneousAgentCtr did. This PR unifies the input shape, lifts image handling into the shared layer, and exposes multimodal flags to the CLI.

New shared input module (packages/heterogeneous-agents/src/spawn/input/):

types.ts — AgentPromptInput = string | AgentContentBlock[]; image sources accept url (with optional id for cache dedupe), local path, or inline base64.
normalizeImage.ts — fetch / read / decode → { buffer, mediaType, path? }. Optional on-disk URL cache keyed by sha256(id || url). Byte-signature sniffing falls back when MIME is generic (application/octet-stream).
buildAgentInput.ts — single source of truth for per-agent serialization. Claude Code receives base64 image blocks inline in stream-json; Codex receives raw text on stdin + repeatable --image <path> flags.
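The input shape and the two serialization targets described above can be sketched roughly as follows. This is a minimal illustration, not the code in `types.ts` or `buildAgentInput.ts`: the field names inside `ImageSource`, the `NormalizedImage` shape (which carries a base64 string rather than a buffer, for simplicity), and the helpers `toBlocks`, `toClaudeImageBlock`, and `toCodexInvocation` are all assumptions. Only `AgentPromptInput`, `AgentContentBlock`, the `media_type` field, and the `--image <path>` flag come from the PR text.

```typescript
// Hypothetical shapes for illustration; the real definitions live in types.ts.
type ImageSource =
  | { type: 'url'; url: string; id?: string } // id is an optional cache-dedupe key
  | { type: 'path'; path: string }
  | { type: 'base64'; data: string; mediaType: string };

type TextBlock = { type: 'text'; text: string };
type ImageBlock = { type: 'image'; source: ImageSource };
type AgentContentBlock = TextBlock | ImageBlock;
type AgentPromptInput = string | AgentContentBlock[];

// A bare string is sugar for a single text block.
function toBlocks(input: AgentPromptInput): AgentContentBlock[] {
  return typeof input === 'string' ? [{ type: 'text', text: input }] : input;
}

// Assumed result of image normalization, simplified to a base64 payload.
interface NormalizedImage {
  base64: string;
  mediaType: string;
  path?: string;
}

// Claude Code style: the image travels inline as a base64 content block.
function toClaudeImageBlock(img: NormalizedImage) {
  return {
    type: 'image' as const,
    source: { type: 'base64' as const, media_type: img.mediaType, data: img.base64 },
  };
}

// Codex style: text goes to stdin, each image becomes a repeatable --image flag.
// Assumes every image has already been written to a local path.
function toCodexInvocation(
  blocks: AgentContentBlock[],
  imagePaths: string[],
): { stdin: string; args: string[] } {
  const stdin = blocks
    .filter((b): b is TextBlock => b.type === 'text')
    .map((b) => b.text)
    .join('\n');
  const args = imagePaths.flatMap((p) => ['--image', p]);
  return { stdin, args };
}
```

Keeping both serializers behind one input type is what lets a caller stay agnostic about which agent it is driving.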

spawnAgent upgrade:

prompt: AgentPromptInput (string is sugar for one text block).
New inputOptions?: { cacheDir, fetcher, imageMaterializeDir } forwarded to buildAgentInput.
Now async — image normalization (URL fetch / file read) is awaited before spawn so a failed image surfaces before the child starts.
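The "normalize first, spawn second" ordering might look like the sketch below. It is an assumption-laden illustration rather than the package's implementation: `cacheKey`, `spawnWithImages`, `Loader`, and `startChild` are hypothetical names, and the key derivation simply follows the `sha256(id || url)` description from the input module.

```typescript
import { createHash } from 'node:crypto';

// Cache key as described: hash the explicit id when present, otherwise the URL.
function cacheKey(url: string, id?: string): string {
  return createHash('sha256').update(id || url).digest('hex');
}

// Hypothetical loader: resolves one image reference (URL or path) to bytes.
type Loader = (ref: string) => Promise<Uint8Array>;

// Hypothetical spawn wrapper. Every image is resolved before the child starts,
// so a missing file or failed fetch rejects here and no process is launched.
async function spawnWithImages(
  refs: string[],
  load: Loader,
  startChild: (images: Uint8Array[]) => void,
): Promise<void> {
  const images = await Promise.all(refs.map((ref) => load(ref)));
  startChild(images);
}
```

Because the key prefers `id`, two different URLs that carry the same id share one cache entry, which is the dedupe behavior the description implies.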

CLI gets three input modes (apps/cli/src/commands/hetero.ts):

```bash
# Terminal sugar: text + repeatable images
lh hetero exec --type claude-code \
  --prompt 'describe this' \
  --image ./screen.png \
  --image https://cdn.example/photo.jpg \
  --image 'data:image/png;base64,...'

# Programmatic / sandbox: full content-blocks JSON
lh hetero exec --type claude-code --input-json ./input.json
echo '[{"type":"text",...},{"type":"image","source":{...}}]' |
  lh hetero exec --type claude-code --input-json -

# Stdin auto-detect: JSON if first non-whitespace is { or [, else text
cat prompt.txt | lh hetero exec --type claude-code
```

--prompt and --input-json are mutually exclusive; --image is rejected with --input-json (the JSON should hold its own images).
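The stdin auto-detection and flag rules above amount to a couple of small checks. The sketch below is illustrative only: `detectStdinMode`, `validateFlags`, the `Flags` shape, and the error strings are hypothetical and are not taken from `hetero.ts`.

```typescript
type InputMode = 'json' | 'text';

// JSON if the first non-whitespace character is { or [, otherwise plain text.
function detectStdinMode(raw: string): InputMode {
  const first = raw.trimStart().charAt(0);
  return first === '{' || first === '[' ? 'json' : 'text';
}

// Hypothetical parsed-flag shape for this sketch.
interface Flags {
  prompt?: string;
  inputJson?: string;
  image?: string[];
}

// Returns an error message for an invalid combination, or undefined when valid.
function validateFlags(flags: Flags): string | undefined {
  if (flags.prompt !== undefined && flags.inputJson !== undefined) {
    return '--prompt and --input-json are mutually exclusive';
  }
  if (flags.inputJson !== undefined && (flags.image?.length ?? 0) > 0) {
    return '--image cannot be combined with --input-json';
  }
  return undefined;
}
```

Validating before any image work begins means a bad flag combination fails without touching the network or the filesystem.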

Desktop refactor:

HeterogeneousAgentCtr drops ~100 lines of duplicated image cache / mime-sniffing / extension-guessing code. The two helpers (buildStreamJsonInput, resolveCliImagePaths) are now thin wrappers around buildAgentInput / normalizeImage / materializeImageToPath. Driver interface and IPC contract are unchanged.

🧪 How to Test

Tested locally
Added/updated tests

Automated:

bunx vitest run packages/heterogeneous-agents — 127/127 pass (10 new buildAgentInput tests + 2 new spawnAgent multimodal tests + 12 existing migrated to async API).
bunx vitest run apps/cli/src/commands/hetero.test.ts — 11/11 pass (4 new for --image / --input-json / mutual exclusion; existing tests adapted to the now-async spawnAgent).
bunx vitest run apps/desktop/src/main/controllers/__tests__/HeterogeneousAgentCtr.test.ts — 28/28 pass; the path-traversal security tests retargeted to the shared normalizeImage (same invariant, correct layer).
bun run type-check — clean.

End-to-end with real Claude Code:

```bash
# tiny 73-byte red PNG fixture
bun apps/cli/src/index.ts hetero exec --type claude-code \
  --prompt "Reply with one sentence about the color." \
  --image /tmp/red.png
# → 'I see a solid red color.' streamed as JSONL

# JSON mode
bun apps/cli/src/index.ts hetero exec --type claude-code \
  --input-json /tmp/input.json
# → 'I see a deep red/crimson color.'
```

Both runs produced the full event chain (stream_start → stream_chunk × N → step_complete × 2 → stream_end → agent_runtime_end), all stamped with the same auto-generated operationId, and Claude correctly identified the image color.
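A check like the one described could be scripted against the JSONL output along these lines. This is a sketch under assumptions: the `type` field name and the helpers `parseJsonl` and `checkChain` are hypothetical, while the event names and `operationId` come from the description above.

```typescript
// Assumed minimal event shape; real events carry more fields.
interface AgentEvent {
  type: string;
  operationId: string;
}

// Parse one JSON object per non-empty line.
function parseJsonl(output: string): AgentEvent[] {
  return output
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as AgentEvent);
}

// True when the stream starts and ends with the expected events, contains at
// least one chunk, and every event carries the same operationId.
function checkChain(events: AgentEvent[]): boolean {
  if (events.length === 0) return false;
  const ids = new Set(events.map((e) => e.operationId));
  const first = events[0];
  const last = events[events.length - 1];
  return (
    ids.size === 1 &&
    first?.type === 'stream_start' &&
    last?.type === 'agent_runtime_end' &&
    events.some((e) => e.type === 'stream_chunk')
  );
}
```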

📝 Additional Information

Breaking change (internal): spawnAgent is now async. The only caller in-tree was lh hetero exec; it is updated. Desktop's HeterogeneousAgentCtr doesn't use spawnAgent directly (host concerns differ) so it's unaffected.

Why share the input module rather than just the CLI? Desktop was already doing the exact same image normalization manually, just below the shared layer. Lifting it removes the duplication, validates that the shared abstraction can express what desktop needs, and makes phase 1b's cloud-sandbox ingest path naturally multimodal — the server can pass content-blocks JSON straight through lh hetero exec --input-json - without re-encoding.

Why URL fetching in CLI? Cloud sandbox will pass URLs for user-attached images. Doing the fetch in shared (with disk cache) means the server doesn't need a pre-resolve step before invoking lh hetero exec, and existing desktop URL caching semantics are preserved.

🤖 Generated with Claude Code

…awn / desktop

`spawnAgent` and `lh hetero exec` could only take a flat string prompt, so attaching images required bypassing the shared layer (which is what desktop actually did). This adds a unified `AgentPromptInput` shape (string sugar or an array of text/image content blocks) and lifts image handling into the shared `@lobechat/heterogeneous-agents/spawn/input` module.

Image sources accept URL (with optional id for cache dedupe), local path, or inline base64. The shared `normalizeImage` fetches/reads/decodes, with optional on-disk caching keyed by `sha256(id || url)`. `materializeImageToPath` writes buffers to a cache dir (used by Codex `--image <path>`), with byte-signature sniffing fallback when MIME is generic. `buildAgentInput` is the single source of truth for per-agent serialization: Claude Code receives base64 image blocks inline in stream-json; Codex receives text on stdin + repeatable `--image <path>` flags.

CLI gets three input modes: `--prompt <text>` + `--image <path|url|data:>` (repeatable), `--input-json <file|->` for full content-block JSON, and stdin auto-detection (JSON vs plain text by first non-whitespace character). Mutually-exclusive flag combinations error early.

Desktop's `HeterogeneousAgentCtr` drops ~100 lines of duplicated cache / sniffing code; helpers (`buildStreamJsonInput`, `resolveCliImagePaths`) become thin wrappers around the shared functions. Driver interface and IPC contract are unchanged.

`spawnAgent` is now async (image normalization fetches/reads before spawn).

Verified end-to-end: `lh hetero exec --type claude-code --prompt ... --image red.png` → CC replied "I see a solid red color." `--input-json` mode also verified. 28/28 desktop tests, 11/11 CLI hetero tests, 22/22 spawn package tests pass.

Refs LOBE-8523 (phase 1a follow-up before phase 1b ingest).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-05T03:27:42Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
lobehub	Ready	Preview, Comment	May 5, 2026 2:18pm

sourcery-ai

Sorry @arvinxx, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop workspace deps on @lobechat/web-crawler and @lobechat/python-interpreter from @lobechat/types by inlining CrawlSuccessResult / CrawlErrorResult / CrawlUniformResult and PythonOutput / PythonResult into the relevant tool type modules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ceaf114182


codecov · 2026-05-05T03:36:21Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.69%. Comparing base (d83f0a0) to head (4e1bd44).
⚠️ Report is 7 commits behind head on canary.

Additional details and impacted files

@@            Coverage Diff            @@
##           canary   #14433     +/-   ##
=========================================
  Coverage   68.69%   68.69%             
=========================================
  Files        2499     2499             
  Lines      214321   214321             
  Branches    22469    26091   +3622     
=========================================
  Hits       147219   147219             
  Misses      66959    66959             
  Partials      143      143

Flag	Coverage Δ
app	`62.86% <ø> (ø)`
database	`92.37% <ø> (ø)`
packages/agent-runtime	`79.94% <ø> (ø)`
packages/builtin-tool-lobe-agent	`83.41% <ø> (ø)`
packages/context-engine	`83.88% <ø> (ø)`
packages/conversation-flow	`92.43% <ø> (ø)`
packages/file-loaders	`87.60% <ø> (ø)`
packages/memory-user-memory	`74.74% <ø> (ø)`
packages/model-bank	`99.94% <ø> (ø)`
packages/model-runtime	`83.87% <ø> (ø)`
packages/prompts	`69.57% <ø> (ø)`
packages/python-interpreter	`92.90% <ø> (ø)`
packages/ssrf-safe-fetch	`0.00% <ø> (ø)`
packages/types	`5.05% <ø> (ø)`
packages/utils	`88.02% <ø> (ø)`
packages/web-crawler	`88.29% <ø> (ø)`

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
Store	`66.77% <ø> (ø)`
Services	`53.80% <ø> (ø)`
Server	`69.93% <ø> (ø)`
Libs	`53.81% <ø> (ø)`
Utils	`79.95% <ø> (ø)`


…nder header Show the human-readable `description` arg in the gh tool's collapsed inspector chip and result-card header when provided; fall back to the extracted subcommand. Full command is still visible in the expanded Command code block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…sync spawnAgent failures

Two issues raised on PR #14433 review:

**P1: generic Content-Type bypassed sniffing in normalizeImage**

`fetchUrlImage` accepted any non-empty `Content-Type` as the final `mediaType`, so CDN responses defaulting to `application/octet-stream` (or `text/plain`) skipped URL/byte-based detection and forwarded an unrecognized type into Claude Code's stream-json `media_type` field. Anthropic rejects those even when the bytes are a valid PNG/JPEG. The same flaw existed for base64 sources whose declared `mediaType` was generic.

Introduce `pickImageMediaType(headerType, url, buffer)`: the header value is preferred only when it's a recognized `image/*` type we know how to extension-map; otherwise it falls through to URL extension hint → byte-signature sniff → raw header → `image/png` final fallback. Applied uniformly to URL fetch, URL cache hit, and base64 decode paths. Path sources are unchanged (their "header" is the file extension, which is already authoritative when present).

**P2: async spawnAgent rejections crashed the CLI**

`spawnAgent` is now async and can reject during image normalization (missing local `--image` path, fetch failure, decode error). The CLI awaited it outside any try/catch, so user-input errors surfaced as unhandled rejections with stack traces instead of the friendly `log.error + process.exit` path used for prompt validation. Wrap the `await spawnAgent(...)` in try/catch, log the error message, exit 1 (matching the existing "Stream error from agent process" convention).

**Tests**

- `buildAgentInput.test.ts`: 3 new tests covering octet-stream URL Content-Type → byte sniff, octet-stream base64 declared type → byte sniff, generic header + URL extension hint preferred over header.
- `hetero.test.ts`: 1 new test verifying spawnAgent rejection produces clean `exit(1)` instead of an unhandled rejection.

Manually verified: `lh hetero exec --image /tmp/does-not-exist.png` → `[ERROR] Failed to start agent: ENOENT: no such file or directory…` + exit 1

Refs LOBE-8523.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
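The fallback order described for `pickImageMediaType` can be sketched as below. Only the function name, its three parameters, and the precedence (recognized header → URL extension hint → byte-signature sniff → raw header → `image/png`) come from the commit message. The set of recognized types, the extension table, and the helpers `sniffImageType` and `mediaTypeFromUrl` are assumptions made for this illustration.

```typescript
// Assumed set of media types treated as "recognized" in this sketch.
const EXT_TO_MIME: Record<string, string> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  gif: 'image/gif',
  webp: 'image/webp',
};
const KNOWN_TYPES = new Set(Object.values(EXT_TO_MIME));

// Byte-signature sniff for a few common formats.
function sniffImageType(buf: Uint8Array): string | undefined {
  const has = (sig: number[], offset = 0) => sig.every((b, i) => buf[offset + i] === b);
  if (has([0x89, 0x50, 0x4e, 0x47])) return 'image/png';
  if (has([0xff, 0xd8, 0xff])) return 'image/jpeg';
  if (has([0x47, 0x49, 0x46, 0x38])) return 'image/gif';
  // WebP is a RIFF container with "WEBP" at byte offset 8.
  if (has([0x52, 0x49, 0x46, 0x46]) && has([0x57, 0x45, 0x42, 0x50], 8)) return 'image/webp';
  return undefined;
}

// Extension hint taken from the URL path, ignoring query string and fragment.
function mediaTypeFromUrl(url: string | undefined): string | undefined {
  if (!url) return undefined;
  const path = url.split(/[?#]/)[0] ?? '';
  const ext = /\.([a-z0-9]+)$/i.exec(path)?.[1]?.toLowerCase();
  return ext ? EXT_TO_MIME[ext] : undefined;
}

function pickImageMediaType(
  headerType: string | undefined,
  url: string | undefined,
  buffer: Uint8Array,
): string {
  // Drop parameters such as "; charset=..." before comparing.
  const header = (headerType ?? '').split(';')[0]?.trim().toLowerCase() ?? '';
  if (KNOWN_TYPES.has(header)) return header;
  return mediaTypeFromUrl(url) ?? sniffImageType(buffer) ?? (header || 'image/png');
}
```

With this ordering, a generic `application/octet-stream` header never wins over evidence from the URL or the bytes, which is the failure mode P1 describes.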

@hezhijie0327

# 🚀 LobeHub Release (20260509)

**Release Date:** May 9, 2026
**Since v2.1.56:** 236 merged PRs · 19 contributors

> Agent Task System reaches general availability, the Agent Signal pipeline runs nightly self-review with skill-aware policies, the heterogeneous-agent runtime crosses replica boundaries, inline documents become a first-class context source, and bot platforms expand across Messager, Line, and Telegram.

---

## ✨ Highlights

- **Agent Task System (GA)**: End-to-end task execution platform: templates, tracking, comment tools, parent reassignment, scheduled cron, and dependency-ordered batch runs. (#14540, #14515, #14517, #14272, #14246, #14418, #14403, #14488)
- **Agent Signal nightly self-review**: Wired self-review loop with prompt + DB support, exponential-backoff retry on receipt listing, skill-aware policy, and improved skill-intent detection. (#14543, #14542, #14281, #14409, #14526, #14437)
- **Inline documents in KB tool**: BM25 search and `docs_*` read for inline document grounding; agent documents usable as VFS. (#14494, #14222)
- **Inline agent cards in chat**: `lobeAgents` markdown tag renders agent profile cards inline; clickable card after `createAgent`. (#14495, #14493)
- **Heterogeneous agent runtime**: Cloud hetero exec pipeline steps 3+4 land, persistence recovers across Vercel replicas, server-side ingest/finish handler, and `lh hetero exec` CLI. (#14486, #14539, #14444, #14431)
- **Bot platforms expand**: Messager, Line, DM pair policy, and messenger DB tables; Telegram API path restored. (#14442, #14207, #14211, #14496, #14519)
- **Visual analysis tool**: New visual understanding tool, with trigger tracking and flattened schema. (#14378, #14399, #14550)
- **DeepSeek V4 Pro as OSS default**: OSS deployments ship with DeepSeek V4 Pro by default; DeepSeek Anthropic runtime supported. (#14555, #14312)

---

## 🏗️ Core Agent & Architecture

### Agent Task System

- **Task System GA**: End-to-end execution platform now available. (#14540)
- **Templates, comments, reparenting**: Template tracking, comment tools, and parent reassignment. (#14515, #14517, #14488)
- **Cron + dependency-ordered runs**: Scheduled status with cron editor and dependency-ordered subtask batches. (#14246, #14418, #14272)
- **Inspector + chip UI + batch tasks**: Task Inspector/Render registry, batch `createTasks`/`runTasks`, and chip-based agent-documents inspector. (#14403, #14404)
- **Recommend templates regardless of brief count**: Recommendations no longer suppressed when briefs are sparse. (#14508)
- **Scheduling resilience**: Manual run no longer eats next scheduled tick; recurring tasks survive brief resolution. (#14304, #14348)
- **Brief synthesis**: Auto-synthesize topic briefs; brief actions revamp; mute resolved-brief icon on home. (#14324, #14228, #14452)
- **Task list & detail polish**: Topic operation ID exposed; task drawer Gateway reconnect. (#14282)

### Agent Signal pipeline

- **Nightly self-review wired**: Prompt + DB support for the self-review loop. (#14543)
- **Self-review activities push to briefs**: Activities during nightly self-reflection now create briefs. (#14437)
- **Skill management policy**: New policy for Skill management running inside Agent Signal. (#14281)
- **Skill intent detection & routing**: Improved detection plus direct intent handling when `hintIsSkill`. (#14409, #14526)
- **Document tool outcome rendering**: Decision view restores missing document tool outcomes. (#14534)
- **Exponential backoff retry**: Listing signal receipts retries with jittered backoff. (#14542)
- **Easier-to-use signals**: Structural simplification + recent-activities surface for receipts. (#14290, #14326, #14407)

### Heterogeneous agent runtime

- **Cloud hetero exec pipeline (steps 3 + 4)**: Refactor lands the next two stages of the cloud hetero agent execution pipeline. (#14486)
- **Persistence recovery on Vercel**: Hetero state recovered across replica boundaries. (#14539)
- **Server-side ingest/finish + persistence**: `aiAgent.heteroIngest` / `heteroFinish` handlers. (#14444)
- **`lh hetero exec` CLI**: Standalone heterogeneous agent runs from CLI. (#14431)
- **Gateway round-trip loading**: `execAgentTask` keeps the input box in loading state through the full round-trip. (#14503)
- **Provider SDK type routing**: Provider routing now respects SDK type. (#14520)
- **DeepSeek reasoning preserved**: `reasoning_content` preserved in OpenAI-compatible runtime for DeepSeek models. (#14546)

### Knowledge & inline docs

- **KB tool BM25 + docs read**: BM25 search and `docs_*` read integrated for inline documents. (#14494)
- **Agent documents as VFS**: FS-compatible output for agent documents. (#14222)
- **`lobeAgents` markdown tag**: Inline agent cards rendered from a markdown tag. (#14495)
- **Clickable agent card after `createAgent`**: Mentions and recommendations become clickable. (#14493)
- **ExplorerTree**: Generic tree component built on `@pierre/trees` for reusable explorer surfaces. (#14094)
- **Local file mention snapshots**: Mentions can now snapshot local files. (#14278)

### Architecture

- **Agent Hono routes**: New agent routes added on Hono. (#14535)
- **`/api/agent` migrated to Hono**: Remaining `/api/agent` routes finish their migration. (#14478)
- **Agent marketplace merged into web-onboarding**: Reduces package fragmentation. (#14514)
- **Producer pipeline extracted**: Shared package for the producer pipeline. (#14425)
- **`agentDispatcher.selectRuntimeType`**: New runtime selection abstraction. (#14428)
- **pnpm v11 migration**: Workspace consolidated. (#14316)
- **Browser-compatible frontmatter parser**: Replaces `gray-matter`. (#14435)

---

## 📱 Platforms & Integrations

- **Messager support**: New messager package wired into the chat surface. (#14442)
- **Messenger DB tables**: IM bot integration gains its persistence layer. (#14496)
- **Line bot**: Initial Line support and downstream optimization. (#14207, #14448)
- **DM pair policy**: Group/DM pair-based delivery. (#14211)
- **Telegram API restored**: Missing Telegram API path reconnected. (#14519)
- **xAI Responses tools stabilized**: Plus unsupported parameter handling. (#14462, #14445)
- **Volcengine websearch via ResponseAPI**: Built-in websearch for Volcengine. (#14216)

---

## 🤖 Models & Providers

- **DeepSeek V4 Pro default for OSS**: OSS distribution defaults to DeepSeek V4 Pro. (#14555)
- **DeepSeek Anthropic runtime**: Anthropic-shape runtime support for DeepSeek. (#14312)
- **GPT-5.5 / GPT-5.5 Pro**: New OpenAI tier. (#14142)
- **Grok 4.20 / Grok 4.3 / LobeHub-hosted Grok 4.3** (#14253, #14382, #14446)
- **Gemma 4 + provider settings normalization** (#13313)
- **gpt-image-2 + step-image-edit-2** (#14253, #14329)
- **Model bank refresh + original-pricing display**: Batch model updates and pricing surfaces. (#14070, #14391)
- **Hunyuan migrated to TokenHub for Hy3 Preview** (#14108)
- **Reject lobehub model ids no longer in the bank** (#14261)
- **Hide runtime-only aliases**: Runtime-only model aliases no longer leak into the model picker. (#14552)

---

## 🖥️ User Experience

### Onboarding

- **Shared prefix steps**: Language and privacy extracted as shared prefix steps. (#14538)
- **Identity intervention card simplified**: Plus tool result renders cleanup. (#14505, #14506)
- **Welcome polish + web-onboarding tool UI** (#14475)
- **Templates fetched from market API** (#14286)
- **Virtual model id for default onboarding model** (#14311)
- **Skip / mode-switch footer behind feature flag**: Footer guarded for desktop and web initialization. (#14560)

### Home & navigation

- **Home recents performance**: Recents refresh periodically and inline task status; brief and task-template fetch overhead trimmed. (#14518, #14516)
- **Home refactor + skill-connect recommendations**: Restructured home with skill-connect recommendation system. (#14266, #14214)
- **Tasks in agent sidebar**: Tasks moved from welcome card into the sidebar list. (#14500)
- **Sidebar collapse persists**: Home sidebar collapse state stored. (#14473)
- **Agent-specific topic grouping**: Plus improved empty state and agent identity in topic search. (#14225)
- **MentionMenu scroll fix**: Mention menu no longer clips inside chat input. (#14533)

### Conversation & chat

- **Follow-up chips fill input**: Clicking a follow-up chip now fills the input instead of sending immediately. (#14536)
- **Quick-reply chips below assistant messages** (#14350)
- **Inline single-tool assistant group + leading sentence promotion** (#14244)
- **Assistant-group rendering**: Per-segment content overrides flow into MessageContent. (#14504)
- **Tool call timer fix**: Timer no longer resets when tool calls collapse or expand. (#14513)
- **Streaming re-render reduction**: Reference stabilization and self-subscribing components. (#14470)
- **Topic chat drawer feedback input** (#14392)

### Skills, agents, devtools

- **Managed skill folders**: Agent view displays managed skill folders and aligns delete confirmations. (#14553)
- **Review tab + bulk git diffs**: New Review tab with bulk diffs; gating uses effective working directory. (#14334, #14512)
- **Devtools gallery rebuild**: Plus Review polish, queue-tray images. (#14423)
- **Agent mock devtools**: Playback & fixture viewer. (#14436)

### Desktop & CLI

- **App tray visibility setting** (#14463)
- **Notification settings in desktop** (#14491)
- **Multimodal input across CLI / shared spawn / desktop** (#14433)
- **CLI bot + userId guide** (#14258)

---

## 🔧 Tooling

- **Visual analysis tool**: New visual understanding tool with flattened schema. (#14378, #14550)
- **GitHub marketplace tool UI** (#14420)
- **Drop "Local" prefix and `____builtin` suffix from tool names** (#14364, #14289)
- **Sanitize provider tool names**: Avoids invalid characters from external providers. (#14510)
- **Generation moderation context**: Moderation context passed through the generation pipeline. (#14541)
- **Visual analysis trigger tracking** (#14399)
- **Claude thinking signature sanitization**: History signatures sanitized when replaying Claude conversations. (#14499)
- **Responses input media sanitization**: Assistant media sanitized in Responses input. (#14497)

---

## 🔒 Security & Reliability

- **Security:** Removed the `/webapi/proxy` route and dead URL-manifest plugin code to shrink the SSRF surface. (#14549)
- **Security:** Sessions revoked after password reset. (#14424)
- **Reliability:** Added `prompt_cache_key` to OpenAI chat requests for stable cache hits. (#14349)
- **Reliability:** `onFinish` now fires even when the browser tab is backgrounded mid-SSE stream. (#14461)
- **Reliability:** Better-auth session refetch preserves user fields rather than overwriting them. (#14531)
- **Reliability:** User-memory queries sanitize backticks; user-memory errors now explicitly injected so failures stay visible. (#14524, #14525)
- **Reliability:** Auth captcha retries handled; input loading unsticks on `auth_failed` and recoverable `auth_expired`. (#14346, #14419)
- **Reliability:** Trace snapshot finalized on error path. (#14440)
- **Reliability:** Drop `switchTopic` race under rapid sidebar clicks. (#14115)
- **Reliability:** PDF chunking logic fixed to prevent vectorization failure. (#14327)
- **Performance:** Marketplace fork uses a batched API for parallel installs. (#14537)
- **Performance:** Review tab open latency cut ~9× on large dirty trees. (#14338)

---

## 👥 Contributors

Huge thanks to **18 contributors** who shipped **236 merged PRs** this cycle.

@hezhijie0327 · @sxjeru · @yueyinqiu · @octo-patch · @hardy-one · @Coooolfan · @CanYuanA · @BillionClaw · @arvinxx · @tjx666 · @Innei · @neko · @AmAzing129 · @rdmclin2 · @lijian · @sudongyuer · @rivertwilight · @cy948

Plus @lobehubbot for i18n and translation maintenance.

---

**Full Changelog**: v2.1.56...release/weekly-20260509

dosubot Bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) May 5, 2026

sourcery-ai Bot reviewed May 5, 2026

dosubot Bot added the feature:agent (Assistant/Agent configuration and behavior) and feature:vision labels May 5, 2026

arvinxx and others added 3 commits May 5, 2026 11:28

🔧 chore(cli): include types/model-bank/business-const in workspace

ceb190b

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

🔖 chore(cli): bump @lobehub/cli to 0.0.10

f8d3264

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed May 5, 2026

Comment thread packages/heterogeneous-agents/src/spawn/input/normalizeImage.ts Outdated

Comment thread apps/cli/src/commands/hetero.ts Outdated

vercel Bot deployed to Preview May 5, 2026 03:39 View deployment

arvinxx and others added 2 commits May 5, 2026 22:04

vercel Bot deployed to Preview May 5, 2026 14:18 View deployment

arvinxx merged commit 10300ba into canary May 5, 2026
34 checks passed

arvinxx deleted the feat/hetero-multimodal-input branch May 5, 2026 15:06

Innei mentioned this pull request May 9, 2026

🚀 release: 20260509 #14563

Merged

arvinxx merged 6 commits into canary from feat/hetero-multimodal-input
