feat(gateway): add inline image inputs to API server by dinedal · Pull Request #5621 · NousResearch/hermes-agent

dinedal · 2026-04-06T20:00:22Z

What does this PR do?

Adds inline image support to the OpenAI-compatible API server without adding file uploads. /v1/chat/completions now accepts text + image_url parts, /v1/responses accepts input_text + input_image and normalizes them into Hermes's canonical multimodal format, and the Codex/Responses conversion path preserves image parts instead of flattening them to text. This keeps the existing auth, SSE, idempotency, and session continuity behavior intact while unblocking inline image inputs and documenting the remaining file/document limitation.

Related Issue

None.

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

Added OpenAI-style validation and normalization for multimodal request parts in gateway/platforms/api_server.py
Preserved canonical text / image_url parts when converting chat history into Responses/Codex input in run_agent.py
Raised the API server JSON body limit to configurable API_SERVER_MAX_REQUEST_BYTES with a default of 10 MiB
Added coverage for inline images, invalid file/document inputs, base64 data URLs, replay through previous_response_id, and session continuity
Updated the API-server docs to describe inline image support and the remaining uploaded-file/document limitation

How to Test

source .venv/bin/activate && python -m pytest tests/gateway/test_api_server.py tests/test_run_agent_codex_responses.py -q
source .venv/bin/activate && python -m pytest tests/ -v
source .venv/bin/activate && export HERMES_HOME=/tmp/hermes-inline-images-pr API_SERVER_ENABLED=true API_SERVER_PORT=8765 GATEWAY_ALLOW_ALL_USERS=true && mkdir -p "$HERMES_HOME"/{cron,sessions,logs,memories,skills} && touch "$HERMES_HOME"/.env && hermes gateway run
Then, from another shell:
- POST an input_file payload to /v1/responses and expect an unsupported_content_type 400
- POST an input_image data URL payload to /v1/responses and confirm it is accepted past validation (in my environment it reached downstream auth and returned a 401 because no provider credentials were configured)

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS 15.7.4

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Focused regression tests passed: source .venv/bin/activate && python -m pytest tests/gateway/test_api_server.py tests/test_run_agent_codex_responses.py -q -> 137 passed
Full suite in this checkout is not green today: source .venv/bin/activate && python -m pytest tests/ -v -> 16 failed, 7642 passed, 214 skipped, 1 xfailed, 138 warnings, 5 errors
The full-suite errors include missing local acp module imports in tests/acp/*; unrelated failures also reproduce in areas like tests/tools/test_file_read_guards.py
Manual API-server smoke test via hermes gateway run:
- unsupported input_file returned unsupported_content_type with message Inline image inputs are supported, but uploaded files and document inputs are not supported on this endpoint.
- supported input_image accepted the request shape and reached the downstream auth layer, which returned Error code: 401 - {'error': {'message': 'Missing Authentication header', 'code': 401}}

…/responses OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision requests to the API server. Both endpoints accept the canonical OpenAI multimodal shape: Chat Completions: {type: text|image_url, image_url: {url, detail?}} Responses: {type: input_text|input_image, image_url: <str>, detail?} The server validates and converts both into a single internal shape that the existing agent pipeline already handles (Anthropic adapter converts, OpenAI-wire providers pass through). Remote http(s) URLs and data:image/* URLs are supported. Uploaded files (file, input_file, file_id) and non-image data: URLs are rejected with 400 unsupported_content_type. Changes: - gateway/platforms/api_server.py - _normalize_multimodal_content(): validates + normalizes both Chat and Responses content shapes. Returns a plain string for text-only content (preserves prompt-cache behavior on existing callers) or a canonical [{type:text|image_url,...}] list when images are present. - _content_has_visible_payload(): replaces the bare truthy check so a user turn with only an image no longer rejects as 'No user message'. - _handle_chat_completions and _handle_responses both call the new helper for user/assistant content; system messages continue to flatten to text. - Codex conversation_history, input[], and inline history paths all share the same validator. No duplicated normalizers. - run_agent.py - _summarize_user_message_for_log(): produces a short string summary ('[1 image] describe this') from list content for logging, spinner previews, and trajectory writes. Fixes AttributeError when list user_message hit user_message[:80] + '...' / .replace(). - _chat_content_to_responses_parts(): module-level helper that converts chat-style multimodal content to Responses 'input_text'/'input_image' parts. Used in _chat_messages_to_responses_input for Codex routing. - _preflight_codex_input_items() now validates and passes through list content parts for user/assistant messages instead of stringifying. - tests/gateway/test_api_server_multimodal.py (new, 38 tests) - Unit coverage for _normalize_multimodal_content, including both part formats, data URL gating, and all reject paths. - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses verifying multimodal payloads reach _run_agent intact. - 400 coverage for file / input_file / non-image data URL. - tests/run_agent/test_run_agent_multimodal_prologue.py (new) - Regression coverage for the prologue no-crash contract. - _chat_content_to_responses_parts round-trip coverage. - website/docs/user-guide/features/api-server.md - Inline image examples for both endpoints. - Updated Limitations: files still unsupported, images now supported. Validated live against openrouter/anthropic/claude-opus-4.6: POST /v1/chat/completions → 200, vision-accurate description POST /v1/responses → 200, same image, clean output_text POST /v1/chat/completions [file] → 400 unsupported_content_type POST /v1/responses [input_file] → 400 unsupported_content_type POST /v1/responses [non-image data URL] → 400 unsupported_content_type Closes #5621, #8253, #4046, #6632. Co-authored-by: Paul Bergeron <paul@gamma.app> Co-authored-by: zhangxicen <zhangxicen@example.com> Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com> Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>

teknium1 · 2026-04-20T11:16:49Z

Closed in favor of #12969 (merged as f683132). Thanks for the deep work on this — the cross-format input_image / image_url normalization and the Codex Responses preservation in particular informed the final implementation. The rewrite consolidates the multimodal path into a single validator shared by both /v1/chat/completions and /v1/responses, fixes the run_conversation prologue crash on list user_message, and was validated live against a real image via both endpoints. Credited in the merged PR body.

…/responses (NousResearch#12969) OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision requests to the API server. Both endpoints accept the canonical OpenAI multimodal shape: Chat Completions: {type: text|image_url, image_url: {url, detail?}} Responses: {type: input_text|input_image, image_url: <str>, detail?} The server validates and converts both into a single internal shape that the existing agent pipeline already handles (Anthropic adapter converts, OpenAI-wire providers pass through). Remote http(s) URLs and data:image/* URLs are supported. Uploaded files (file, input_file, file_id) and non-image data: URLs are rejected with 400 unsupported_content_type. Changes: - gateway/platforms/api_server.py - _normalize_multimodal_content(): validates + normalizes both Chat and Responses content shapes. Returns a plain string for text-only content (preserves prompt-cache behavior on existing callers) or a canonical [{type:text|image_url,...}] list when images are present. - _content_has_visible_payload(): replaces the bare truthy check so a user turn with only an image no longer rejects as 'No user message'. - _handle_chat_completions and _handle_responses both call the new helper for user/assistant content; system messages continue to flatten to text. - Codex conversation_history, input[], and inline history paths all share the same validator. No duplicated normalizers. - run_agent.py - _summarize_user_message_for_log(): produces a short string summary ('[1 image] describe this') from list content for logging, spinner previews, and trajectory writes. Fixes AttributeError when list user_message hit user_message[:80] + '...' / .replace(). - _chat_content_to_responses_parts(): module-level helper that converts chat-style multimodal content to Responses 'input_text'/'input_image' parts. Used in _chat_messages_to_responses_input for Codex routing. - _preflight_codex_input_items() now validates and passes through list content parts for user/assistant messages instead of stringifying. - tests/gateway/test_api_server_multimodal.py (new, 38 tests) - Unit coverage for _normalize_multimodal_content, including both part formats, data URL gating, and all reject paths. - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses verifying multimodal payloads reach _run_agent intact. - 400 coverage for file / input_file / non-image data URL. - tests/run_agent/test_run_agent_multimodal_prologue.py (new) - Regression coverage for the prologue no-crash contract. - _chat_content_to_responses_parts round-trip coverage. - website/docs/user-guide/features/api-server.md - Inline image examples for both endpoints. - Updated Limitations: files still unsupported, images now supported. Validated live against openrouter/anthropic/claude-opus-4.6: POST /v1/chat/completions → 200, vision-accurate description POST /v1/responses → 200, same image, clean output_text POST /v1/chat/completions [file] → 400 unsupported_content_type POST /v1/responses [input_file] → 400 unsupported_content_type POST /v1/responses [non-image data URL] → 400 unsupported_content_type Closes NousResearch#5621, NousResearch#8253, NousResearch#4046, NousResearch#6632. Co-authored-by: Paul Bergeron <paul@gamma.app> Co-authored-by: zhangxicen <zhangxicen@example.com> Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com> Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>

dinedal force-pushed the add-inline-images-to-openai-compatible branch 2 times, most recently from d62c61b to ba6d96b Compare April 13, 2026 21:21

feat(gateway): add inline image inputs to API server

da8bb90

dinedal force-pushed the add-inline-images-to-openai-compatible branch from ba6d96b to da8bb90 Compare April 18, 2026 23:26

sunxyless mentioned this pull request Apr 19, 2026

feat(api-server): support multimodal image uploads via auxiliary vision #12329

Closed

19 tasks

teknium1 mentioned this pull request Apr 20, 2026

feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses #12969

Merged

teknium1 closed this in f683132 Apr 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(gateway): add inline image inputs to API server#5621

feat(gateway): add inline image inputs to API server#5621
dinedal wants to merge 1 commit into
NousResearch:mainfrom
dinedal:add-inline-images-to-openai-compatible

dinedal commented Apr 6, 2026

Uh oh!

teknium1 commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dinedal commented Apr 6, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Screenshots / Logs

Uh oh!

teknium1 commented Apr 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants