fix(gateway): make api_server body-size cap configurable, raise default to 100 MiB #8328
…lt to 100 MiB
The OpenAI-compat HTTP API server enforced a hardcoded 1 MB POST body limit in two places: the `body_limit_middleware` Content-Length pre-check and aiohttp's own transport-level cap (which defaulted to its 1 MiB `client_max_size` because the `web.Application` was constructed without an override). Together they made any non-trivial image attachment fail with `Request body too large` (HTTP 413) before the request could ever reach the backend model, even when the model itself supports vision and would happily accept the payload.
This change:
- Renames `MAX_REQUEST_BYTES` → `DEFAULT_MAX_REQUEST_BYTES` and bumps the default to 100 MiB so a Retina screenshot or a small PDF passes through without operator intervention.
- Reads the effective limit from `extra.max_request_bytes` (config.yaml) or the `API_SERVER_MAX_REQUEST_BYTES` env var in `APIServerAdapter.__init__`, mirroring the host/port/api_key precedence pattern already used in the same constructor and the `max_body_bytes` knob in `webhook.py`.
- Stores the configured value on `request.app["max_request_bytes"]` so the module-level middleware reads it from the app instead of a stale module global.
- Passes `client_max_size=self._max_request_bytes` to `web.Application(...)` so the aiohttp transport layer agrees with the middleware. Without this the middleware could be raised but bodies would still be rejected upstream.
No behaviour change for deployments that don't override the new config key beyond the larger default. Tighter caps remain available for hardened deployments via `api_server.max_request_bytes` or the env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`AIAgent.run_conversation`'s `user_message` parameter is typed as `str`
but in practice can also be an OpenAI-style multi-part list of content
parts when the request comes through the OpenAI-compat HTTP API server
(`gateway/platforms/api_server.py` lines 570 / 981 forward the raw
`content` field straight through, and for vision requests that field
is a list of `{type: "text"|"image_url"|...}` dicts). The list must
reach the LLM unchanged so vision-capable models receive their image
parts — but three places in `run_agent.py` assumed `str` and crashed
or printed garbage on a list:
- `_msg_preview = (user_message[:80] + "...") ...; .replace("\n", " ")`
at the top of `run_conversation` — `[:80]` returned a sublist of
dict parts, then `.replace()` raised
`AttributeError: 'list' object has no attribute 'replace'`,
failing the entire turn before the agent even started.
- `f"💬 Starting conversation: '{user_message[:60]}{...}'"` a few
lines below — same shape, didn't crash but printed a raw list of
dicts to the user.
- `(user_message or "").strip().lower()` in
`_looks_like_codex_intermediate_ack` — would crash any vision
request routed through a Codex-family provider.
This change introduces a small module-level helper
`_user_message_text(user_message)` that returns a flat string
representation of either a `str`, an OpenAI multi-part list (text
parts joined; non-text parts summarised as `[image]` / `[audio]` /
`[file]` so previews don't lose track of them), `None`, or anything
else (`str(...)`). All three call sites now go through the helper.
The list itself is left untouched in `messages.append({"role":
"user", "content": user_message})`, so the LLM still receives the
original multi-part content and vision continues to work.
Verified end-to-end on a local Ollama + gemma4:26b setup: a 14.8 MiB
test PNG, base64-encoded into a 19.8 MiB OpenAI multi-part
chat-completions request, was POSTed through the gateway api_server
and returned `200 OK` with the expected `OK` reply — previously the
same request died with `'list' object has no attribute 'replace'`
in this exact preview line.
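The size figures above are consistent with base64's 4:3 expansion. As a rough check, assuming standard padded base64 and ignoring the JSON envelope (the helper names `base64_len` and `max_raw_bytes` are illustrative, not from the codebase):

```python
import base64

MIB = 1024 * 1024


def base64_len(raw_bytes: int) -> int:
    """Length of standard padded base64 output for `raw_bytes` input bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_bytes(cap_bytes: int) -> int:
    """Largest raw payload whose base64 form fits in `cap_bytes` (envelope ignored)."""
    return (cap_bytes // 4) * 3


# Sanity check against the standard library encoder.
assert base64_len(10) == len(base64.b64encode(bytes(10)))

png_bytes = int(14.8 * MIB)            # the test PNG described above
encoded_bytes = base64_len(png_bytes)  # roughly 19.7 MiB before JSON overhead
print(f"encoded: {encoded_bytes / MIB:.1f} MiB")
print(f"largest raw payload under a 1 MiB cap: {max_raw_bytes(MIB) // 1024} KiB")
```

The computed value lands slightly under the reported 19.8 MiB; the difference is plausibly rounding of the PNG size plus the JSON envelope. By the same arithmetic, a raw attachment larger than about 768 KiB cannot fit under a 1 MiB cap once encoded.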
Note: the bug was effectively dormant because the gateway api_server
also enforced a hardcoded 1 MB body cap (see NousResearch#8328) that rejected
most image attachments before they could reach this code path. With
that cap raised, this latent crash became reachable, hence the
companion fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
+1 on this. Just closed my own PR (#12329) that was also addressing the 1 MiB wall. I'll let this one carry the body-size fix since the design here is cleaner (config-key + env, raw bytes, proper DI via …).
Two things I dug into that may be useful if you want to add coverage:
(I had test cases for both in #12329; happy to cherry-pick them into here if it'd save you time.)
Summary
- The OpenAI-compat HTTP API server (`gateway/platforms/api_server.py`) enforced a hardcoded 1 MB POST body limit in two places: the `body_limit_middleware` `Content-Length` pre-check and aiohttp's own transport-level cap (which defaulted to its 1 MiB `client_max_size` because `web.Application(...)` was constructed without an override). Together they made any non-trivial image attachment fail with `Request body too large` (HTTP 413) before the request could ever reach the backend model, even when the model itself supports vision and would accept the payload (e.g. `gemma3:27b` / `gemma4:26b` on a local Ollama).
- This PR makes the cap configurable (mirroring the `max_body_bytes` knob in `gateway/platforms/webhook.py`) and raises the default to 100 MiB so a Retina screenshot or a small PDF passes through without operator intervention.
Details
- `MAX_REQUEST_BYTES` → renamed to `DEFAULT_MAX_REQUEST_BYTES`, value `100 * 1024 * 1024`.
- `APIServerAdapter.__init__` reads the effective limit from, in order of precedence:
  1. `extra.max_request_bytes` in the platform config (i.e. `~/.hermes/config.yaml` under the `api_server` platform's `extra:` block)
  2. `API_SERVER_MAX_REQUEST_BYTES` environment variable
  3. `DEFAULT_MAX_REQUEST_BYTES` (100 MiB)
- The configured value is stored on `request.app["max_request_bytes"]` so the module-level `body_limit_middleware` reads it from the app instead of a stale module global.
- `web.Application(...)` is now constructed with `client_max_size=self._max_request_bytes` so the aiohttp transport layer agrees with the middleware. Without this fix, raising the middleware constant alone would still leave bodies rejected one layer down by aiohttp's 1 MiB default.
Repro of the original bug
After this PR the same request goes through, all the way to the backend model.
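For reference, the precedence order listed under Details can be sketched as a small resolver. This is an illustration under assumptions, not the code in `APIServerAdapter.__init__`: the function name `resolve_max_request_bytes` is hypothetical, and skipping unparsable or non-positive values is an assumed policy the PR does not specify.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_MAX_REQUEST_BYTES = 100 * 1024 * 1024  # 100 MiB, as stated in this PR


def resolve_max_request_bytes(
    extra: Optional[Mapping[str, Any]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> int:
    """Hypothetical helper: config key first, then env var, then the default."""
    extra = extra or {}
    environ = os.environ if environ is None else environ
    candidates = (
        extra.get("max_request_bytes"),               # 1. platform config `extra:` block
        environ.get("API_SERVER_MAX_REQUEST_BYTES"),  # 2. environment variable
    )
    for candidate in candidates:
        if candidate is None or candidate == "":
            continue
        try:
            value = int(candidate)
        except (TypeError, ValueError):
            continue  # assumption: unparsable values fall through to the next source
        if value > 0:
            return value
    return DEFAULT_MAX_REQUEST_BYTES                  # 3. built-in default


limit = resolve_max_request_bytes(
    extra={}, environ={"API_SERVER_MAX_REQUEST_BYTES": "2000000"}
)
```

Whatever value is resolved has to be used for both the middleware check and `client_max_size`, otherwise the lower of the two limits still rejects the request.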
Test plan
- `python -c "from gateway.platforms import api_server as a; print(a.DEFAULT_MAX_REQUEST_BYTES); assert not hasattr(a, 'MAX_REQUEST_BYTES')"`: module imports cleanly, default is `104857600`, old constant is gone.
- `hermes gateway restart` then `hermes gateway status`: gateway boots cleanly with the refactored code.
- … `/v1/chat/completions` against an `api_server`-platform gateway and confirm it now reaches the backend.
- … `API_SERVER_MAX_REQUEST_BYTES=2000000` and confirm the cap drops back to 2 MB (the env var override path).
- … `MAX_REQUEST_BYTES` references found in `tests/`).
🤖 Generated with Claude Code