fix(gateway): make api_server body-size cap configurable, raise default to 100 MiB by DataAdvisory · Pull Request #8328 · NousResearch/hermes-agent

DataAdvisory · 2026-04-12T11:14:05Z

Summary

The OpenAI-compat HTTP API server (gateway/platforms/api_server.py) enforced a hardcoded 1 MB POST body limit in two places — the body_limit_middleware Content-Length pre-check and aiohttp's own transport-level cap (which defaulted to its 1 MiB client_max_size because web.Application(...) was constructed without an override). Together they made any non-trivial image attachment fail with Request body too large (HTTP 413) before the request could ever reach the backend model — even when the model itself supports vision and would accept the payload (e.g. gemma3:27b / gemma4:26b on a local Ollama).
This PR makes the limit configurable (matching the precedence pattern already used in the same constructor for host/port/api_key, and the max_body_bytes knob in gateway/platforms/webhook.py) and raises the default to 100 MiB so a Retina screenshot or a small PDF passes through without operator intervention.
Apart from the larger default, behaviour is unchanged for deployments that don't set the new key. Tighter caps (e.g. for hardened public deployments) remain available via the new config key or env var.

Details

MAX_REQUEST_BYTES → renamed to DEFAULT_MAX_REQUEST_BYTES, value 100 * 1024 * 1024.
APIServerAdapter.__init__ reads the effective limit from, in order of precedence:
1. extra.max_request_bytes in the platform config (i.e. ~/.hermes/config.yaml under the api_server platform's extra: block)
2. API_SERVER_MAX_REQUEST_BYTES environment variable
3. DEFAULT_MAX_REQUEST_BYTES (100 MiB)
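The precedence order above can be sketched as a small standalone function. This is an illustration, not the PR's code: `resolve_max_request_bytes` is a hypothetical name. It assumes `extra` is the platform's `extra:` block loaded as a dict and that values parse with `int()`. It also skips any validation a real implementation might add, such as rejecting zero or negative values.

```python
import os
from typing import Any, Mapping, Optional

# Default from the PR: 100 MiB.
DEFAULT_MAX_REQUEST_BYTES = 100 * 1024 * 1024


def resolve_max_request_bytes(
    extra: Optional[Mapping[str, Any]],
    environ: Mapping[str, str] = os.environ,
) -> int:
    """Hypothetical helper: pick the body-size cap by precedence.

    1. extra["max_request_bytes"] from the platform config
    2. API_SERVER_MAX_REQUEST_BYTES from the environment
    3. DEFAULT_MAX_REQUEST_BYTES
    """
    configured = (extra or {}).get("max_request_bytes")
    if configured is not None:
        return int(configured)
    env_value = environ.get("API_SERVER_MAX_REQUEST_BYTES")
    if env_value:
        return int(env_value)
    return DEFAULT_MAX_REQUEST_BYTES


print(resolve_max_request_bytes({}, {"API_SERVER_MAX_REQUEST_BYTES": "2000000"}))
```

Passing `environ` explicitly keeps the function easy to test. The config key wins over the env var, which in turn wins over the default, matching the reviewer check that `API_SERVER_MAX_REQUEST_BYTES=2000000` drops the cap to 2 MB.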
The configured value is stored on request.app["max_request_bytes"] so the module-level body_limit_middleware reads it from the app instead of a stale module global.
web.Application(...) is now constructed with client_max_size=self._max_request_bytes so the aiohttp transport layer agrees with the middleware. Without this fix, raising the middleware constant alone would still leave bodies rejected one layer down by aiohttp's 1 MiB default.
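Here is a minimal aiohttp sketch of how the two layers can be kept in agreement. It assumes the middleware inspects only the declared `Content-Length`. `build_app` is a hypothetical helper. The 413 status and error body mirror the response shown in the repro and are not necessarily the adapter's exact code. Newer aiohttp releases prefer `web.AppKey` over plain string keys and may emit a warning for the string key used here.

```python
from aiohttp import web

DEFAULT_MAX_REQUEST_BYTES = 100 * 1024 * 1024


@web.middleware
async def body_limit_middleware(request: web.Request, handler):
    # Read the cap from the app (set once at construction), not a module global.
    limit = request.app["max_request_bytes"]
    # Cheap pre-check: reject early when the client declares an oversized body.
    if request.content_length is not None and request.content_length > limit:
        return web.json_response(
            {"error": {"message": "Request body too large", "code": "body_too_large"}},
            status=413,
        )
    return await handler(request)


def build_app(max_request_bytes: int = DEFAULT_MAX_REQUEST_BYTES) -> web.Application:
    # client_max_size makes aiohttp's own request.read()/json() cap match the
    # middleware; without it, reads still stop at aiohttp's 1 MiB default.
    app = web.Application(
        middlewares=[body_limit_middleware],
        client_max_size=max_request_bytes,
    )
    app["max_request_bytes"] = max_request_bytes
    return app
```

Chunked requests carry no `Content-Length`, so the middleware lets them through and only `client_max_size` enforces the cap. That is why both layers need the same value.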

Repro of the original bug

```shell
# config.yaml has model.base_url pointing at a local Ollama
hermes gateway start
# from any OpenAI-compat client, POST a chat completion containing a base64'd
# image attachment > 1 MB
#   → 413 {"error": {"message": "Request body too large", "code": "body_too_large"}}
# (the image never reaches the model)
```

After this PR the same request goes through, all the way to the backend model.
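To reproduce without a real screenshot, the request body from the repro can be generated in Python. This is a sketch. `build_vision_request` is a hypothetical helper, and the payload follows the OpenAI multi-part `image_url` data-URL convention. The random bytes stand in for a PNG. They are enough to trip the pre-fix size cap but will not produce a meaningful model reply once the request goes through.

```python
import base64
import json
import os


def build_vision_request(image_bytes: bytes, model: str = "gemma3:27b") -> bytes:
    """Hypothetical repro helper: serialize an OpenAI-style multi-part chat request."""
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
    }
    return json.dumps(payload).encode("utf-8")


# 900 KiB of raw image data already crosses a 1 MiB body cap once base64-encoded,
# because base64 emits 4 output bytes for every 3 input bytes.
body = build_vision_request(os.urandom(900 * 1024))
print(len(body), len(body) > 1024 * 1024)
```

POSTing `body` with `Content-Type: application/json` to `/v1/chat/completions` exercises the same path as the shell repro.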

Test plan

python -c "from gateway.platforms import api_server as a; print(a.DEFAULT_MAX_REQUEST_BYTES); assert not hasattr(a, 'MAX_REQUEST_BYTES')" — module imports cleanly, default is 104857600, old constant is gone.
hermes gateway restart then hermes gateway status — gateway boots cleanly with the refactored code.
Reviewer: POST a > 1 MB body to /v1/chat/completions against an api_server-platform gateway and confirm it now reaches the backend.
Reviewer: set API_SERVER_MAX_REQUEST_BYTES=2000000 and confirm the cap drops back to 2 MB (the env var override path).
Reviewer: confirm existing tests still pass (no MAX_REQUEST_BYTES references found in tests/).

🤖 Generated with Claude Code

…lt to 100 MiB

The OpenAI-compat HTTP API server enforced a hardcoded 1 MB POST body limit in two places: the `body_limit_middleware` Content-Length pre-check and aiohttp's own transport-level cap (which defaulted to its 1 MiB `client_max_size` because the `web.Application` was constructed without an override). Together they made any non-trivial image attachment fail with `Request body too large` (HTTP 413) before the request could ever reach the backend model, even when the model itself supports vision and would happily accept the payload.

This change:

- Renames `MAX_REQUEST_BYTES` → `DEFAULT_MAX_REQUEST_BYTES` and bumps the default to 100 MiB so a Retina screenshot or a small PDF passes through without operator intervention.
- Reads the effective limit from `extra.max_request_bytes` (config.yaml) or the `API_SERVER_MAX_REQUEST_BYTES` env var in `APIServerAdapter.__init__`, mirroring the host/port/api_key precedence pattern already used in the same constructor and the `max_body_bytes` knob in `webhook.py`.
- Stores the configured value on `request.app["max_request_bytes"]` so the module-level middleware reads it from the app instead of a stale module global.
- Passes `client_max_size=self._max_request_bytes` to `web.Application(...)` so the aiohttp transport layer agrees with the middleware. Without this, the middleware limit could be raised but bodies would still be rejected by aiohttp's own 1 MiB cap.

Apart from the larger default, behaviour is unchanged for deployments that don't set the new config key. Tighter caps remain available for hardened deployments via `api_server.max_request_bytes` or the env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

`AIAgent.run_conversation`'s `user_message` parameter is typed as `str`, but in practice it can also be an OpenAI-style multi-part list of content parts when the request comes through the OpenAI-compat HTTP API server (`gateway/platforms/api_server.py` lines 570 / 981 forward the raw `content` field straight through, and for vision requests that field is a list of `{type: "text"|"image_url"|...}` dicts). The list must reach the LLM unchanged so vision-capable models receive their image parts, but three places in `run_agent.py` assumed `str` and crashed or printed garbage on a list:

- `_msg_preview = (user_message[:80] + "...") ...; .replace("\n", " ")` at the top of `run_conversation`: `[:80]` returned a sublist of dict parts, then `.replace()` raised `AttributeError: 'list' object has no attribute 'replace'`, failing the entire turn before the agent even started.
- `f"💬 Starting conversation: '{user_message[:60]}{...}'"` a few lines below: same shape; it didn't crash but printed a raw list of dicts to the user.
- `(user_message or "").strip().lower()` in `_looks_like_codex_intermediate_ack`: would crash any vision request routed through a Codex-family provider.

This change introduces a small module-level helper `_user_message_text(user_message)` that returns a flat string representation of either a `str`, an OpenAI multi-part list (text parts joined; non-text parts summarised as `[image]` / `[audio]` / `[file]` so previews don't lose track of them), `None`, or anything else (`str(...)`). All three call sites now go through the helper. The list itself is left untouched in `messages.append({"role": "user", "content": user_message})`, so the LLM still receives the original multi-part content and vision continues to work.

Verified end-to-end on a local Ollama + gemma4:26b setup: a 14.8 MiB test PNG, base64-encoded into a 19.8 MiB OpenAI multi-part chat-completions request, was POSTed through the gateway api_server and returned `200 OK` with the expected `OK` reply. Previously the same request died with `'list' object has no attribute 'replace'` in this exact preview line.

Note: the bug was effectively dormant because the gateway api_server also enforced a hardcoded 1 MB body cap (see NousResearch#8328) that rejected most image attachments before they could reach this code path. With that cap raised, this latent crash became reachable, hence the companion fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sunxyless · 2026-04-21T13:19:04Z

+1 on this. Just closed my own PR (#12329) that was also addressing the 1 MiB wall — I'll let this one carry the body-size fix since the design here is cleaner (config-key + env, raw bytes, proper DI via request.app["max_request_bytes"] instead of a module-level constant).

Two things I dug into that may be useful if you want to add coverage:

The failure mode is not just 413; it's opaque. Without client_max_size on the Application, aiohttp raises when request.json() reads past its internal 1 MiB cap, and the handler's except (json.JSONDecodeError, Exception) collapses that into "Invalid JSON in request body". Users see the vague error with no hint that the real cause is a size limit. Worth a one-liner regression test asserting that a 2 MiB body no longer turns into "Invalid JSON" after the fix.
Field names in existing envs. The repo already has API_SERVER_HOST / API_SERVER_PORT / API_SERVER_KEY registered in hermes_cli/config.py's env-var catalog (that's where Setup wizard and docs pick up descriptions). If this lands, API_SERVER_MAX_REQUEST_BYTES wants a catalog entry too so operators discover it via hermes config.

(I had test cases for both in #12329; happy to cherry-pick them into here if it'd save you time.)
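Here is a sketch of the suggested regression test, using aiohttp's `TestClient` and `TestServer`. The handler is a hypothetical stand-in, not the real `api_server.py` handler. It reproduces only the broad `except (json.JSONDecodeError, Exception)` described in the comment, and `make_app` and `post_two_mib` are illustrative names. With aiohttp's default cap, the 2 MiB body surfaces as the misleading "Invalid JSON" error. With `client_max_size` raised, it is parsed normally.

```python
import asyncio
import json
from typing import Optional

from aiohttp import web
from aiohttp.test_utils import TestClient, TestServer

TWO_MIB = 2 * 1024 * 1024


async def chat_completions(request: web.Request) -> web.Response:
    # Mirrors the broad except: any failure while reading the body,
    # including aiohttp's own size cap, is reported as "Invalid JSON".
    try:
        body = await request.json()
    except (json.JSONDecodeError, Exception):
        return web.json_response(
            {"error": {"message": "Invalid JSON in request body"}}, status=400
        )
    return web.json_response({"messages": len(body.get("messages", []))})


def make_app(client_max_size: Optional[int] = None) -> web.Application:
    # None keeps aiohttp's 1 MiB default, i.e. the pre-fix behaviour.
    kwargs = {} if client_max_size is None else {"client_max_size": client_max_size}
    app = web.Application(**kwargs)
    app.router.add_post("/v1/chat/completions", chat_completions)
    return app


async def post_two_mib(app: web.Application):
    payload = {"messages": [{"role": "user", "content": "x" * TWO_MIB}]}
    async with TestClient(TestServer(app)) as client:
        resp = await client.post("/v1/chat/completions", json=payload)
        return resp.status, await resp.json()


before = asyncio.run(post_two_mib(make_app()))
after = asyncio.run(post_two_mib(make_app(100 * 1024 * 1024)))
print(before)
print(after)
```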

DataAdvisory mentioned this pull request Apr 14, 2026

fix(agent): handle multi-part user_message in logging/codex-ack helpers #9562

Open

5 tasks

teknium1 mentioned this pull request Apr 20, 2026

feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses #12969

Merged

sunxyless mentioned this pull request Apr 21, 2026

feat(api-server): support multimodal image uploads via auxiliary vision #12329

Closed

19 tasks

alt-glitch mentioned this pull request Apr 22, 2026

Gateway API server rejects image payloads >1MB due to missing client_max_size on aiohttp Application #13875

Open

alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists) and comp/gateway (Gateway runner, session dispatch, delivery) labels on Apr 28, 2026

DataAdvisory wants to merge 1 commit into NousResearch:main from DataAdvisory:fix/gateway-api-server-body-limit
