
fix(gateway): make api_server body-size cap configurable, raise default to 100 MiB #8328

Open
DataAdvisory wants to merge 1 commit into NousResearch:main from DataAdvisory:fix/gateway-api-server-body-limit


@DataAdvisory


Summary

  • The OpenAI-compat HTTP API server (gateway/platforms/api_server.py) enforced a hardcoded 1 MB POST body limit in two places — the body_limit_middleware Content-Length pre-check and aiohttp's own transport-level cap (which defaulted to its 1 MiB client_max_size because web.Application(...) was constructed without an override). Together they made any non-trivial image attachment fail with Request body too large (HTTP 413) before the request could ever reach the backend model — even when the model itself supports vision and would accept the payload (e.g. gemma3:27b / gemma4:26b on a local Ollama).
  • This PR makes the limit configurable (matching the precedence pattern already used in the same constructor for host/port/api_key, and the max_body_bytes knob in gateway/platforms/webhook.py) and raises the default to 100 MiB so a Retina screenshot or a small PDF passes through without operator intervention.
  • Deployments that don't set the new key see no behaviour change other than the larger default. Tighter caps (e.g. for hardened public deployments) remain available via the new config key or env var.

Details

  • MAX_REQUEST_BYTES → renamed to DEFAULT_MAX_REQUEST_BYTES, value 100 * 1024 * 1024.
  • APIServerAdapter.__init__ reads the effective limit from, in order of precedence:
    1. extra.max_request_bytes in the platform config (i.e. ~/.hermes/config.yaml under the api_server platform's extra: block)
    2. API_SERVER_MAX_REQUEST_BYTES environment variable
    3. DEFAULT_MAX_REQUEST_BYTES (100 MiB)
  • The configured value is stored on request.app["max_request_bytes"] so the module-level body_limit_middleware reads it from the app instead of a stale module global.
  • web.Application(...) is now constructed with client_max_size=self._max_request_bytes so the aiohttp transport layer agrees with the middleware. Without this fix, raising the middleware constant alone would still leave bodies rejected one layer down by aiohttp's 1 MiB default.
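The precedence order above can be sketched as a small pure function. This is a minimal sketch, not the PR's code: `resolve_max_request_bytes` and `_as_positive_int` are hypothetical names, and treating a missing, non-numeric, or non-positive value as "unset" (so lookup falls through to the next source) is an assumption about how invalid input should behave.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_MAX_REQUEST_BYTES = 100 * 1024 * 1024  # 104857600 bytes, the PR's new default


def _as_positive_int(value: Any) -> Optional[int]:
    """Coerce a config/env value to a positive int; None means "not set" (assumed behaviour)."""
    if value is None or isinstance(value, bool):  # a YAML `true` should not become 1
        return None
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        return None
    return parsed if parsed > 0 else None


def resolve_max_request_bytes(
    extra: Mapping[str, Any], environ: Mapping[str, str] = os.environ
) -> int:
    # 1. extra.max_request_bytes from the api_server platform config
    from_config = _as_positive_int(extra.get("max_request_bytes"))
    if from_config is not None:
        return from_config
    # 2. API_SERVER_MAX_REQUEST_BYTES environment variable
    from_env = _as_positive_int(environ.get("API_SERVER_MAX_REQUEST_BYTES"))
    if from_env is not None:
        return from_env
    # 3. built-in default
    return DEFAULT_MAX_REQUEST_BYTES


# The one resolved value would then feed both layers, e.g. (aiohttp, not executed here):
#   app = web.Application(middlewares=[body_limit_middleware], client_max_size=limit)
#   app["max_request_bytes"] = limit
limit = resolve_max_request_bytes({}, {"API_SERVER_MAX_REQUEST_BYTES": "2000000"})
print(limit)  # 2000000
```

Resolving the value once and handing the same integer to both the middleware and `client_max_size` is what keeps the two layers from disagreeing.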

Repro of the original bug

```sh
# config.yaml has model.base_url pointing at a local Ollama
hermes gateway start
# from any OpenAI-compat client, POST a chat completion containing a base64'd
# image attachment > 1 MB
#   → 413 {"error": {"message": "Request body too large", "code": "body_too_large"}}
# (the image never reaches the model)
```

After this PR the same request goes through, all the way to the backend model.
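A Content-Length pre-check of the kind `body_limit_middleware` performs can be modelled as a plain function that returns the 413 payload shown in the repro. This is a hypothetical sketch (`precheck_content_length` is not a name from the PR) with headers as a plain dict; in aiohttp the middleware would read the case-insensitive `request.headers` and the limit from `request.app["max_request_bytes"]`. It also illustrates why the transport cap matters: in this sketch, a request without a usable Content-Length header passes the check and is bounded only by aiohttp's `client_max_size`.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

# Error body copied from the 413 response shown in the repro.
BODY_TOO_LARGE: Dict[str, Any] = {
    "error": {"message": "Request body too large", "code": "body_too_large"}
}


def precheck_content_length(
    headers: Mapping[str, str], max_request_bytes: int
) -> Optional[Tuple[int, Dict[str, Any]]]:
    """Return (413, error_body) if the declared Content-Length exceeds the cap, else None."""
    raw = headers.get("Content-Length")  # plain dict here; aiohttp headers are case-insensitive
    if raw is None:
        # No declared length (e.g. a chunked upload): only client_max_size can stop it.
        return None
    try:
        declared = int(raw)
    except ValueError:
        return None
    if declared > max_request_bytes:
        return 413, BODY_TOO_LARGE
    return None


ONE_MIB = 1024 * 1024
old_cap, new_cap = ONE_MIB, 100 * ONE_MIB
two_mib = {"Content-Length": str(2 * ONE_MIB)}
print(precheck_content_length(two_mib, old_cap))  # rejected under a 1 MiB cap
print(precheck_content_length(two_mib, new_cap))  # None: passes under the new default
```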

Test plan

  • python -c "from gateway.platforms import api_server as a; print(a.DEFAULT_MAX_REQUEST_BYTES); assert not hasattr(a, 'MAX_REQUEST_BYTES')" — module imports cleanly, default is 104857600, old constant is gone.
  • hermes gateway restart then hermes gateway status — gateway boots cleanly with the refactored code.
  • Reviewer: POST a > 1 MB body to /v1/chat/completions against an api_server-platform gateway and confirm it now reaches the backend.
  • Reviewer: set API_SERVER_MAX_REQUEST_BYTES=2000000 and confirm the cap drops back to 2 MB (the env var override path).
  • Reviewer: confirm existing tests still pass (no MAX_REQUEST_BYTES references found in tests/).
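For the two reviewer checks that POST a large body, a stdlib-only script along these lines could build an OpenAI-style vision request and report the HTTP status. It is a sketch under assumptions: the script name and argument order are made up, and the `Bearer` header follows the OpenAI convention (check how the api_server actually validates `API_SERVER_KEY`). The endpoint URL, model name, and image path must be supplied for your deployment.

```python
import base64
import json
import os
import sys
import urllib.error
import urllib.request
from typing import Optional


def build_vision_request(image_bytes: bytes, model: str) -> bytes:
    """Encode an OpenAI-style chat completion with one text part and one image part."""
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")
    body = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
    }
    return json.dumps(body).encode("utf-8")


def post_json(url: str, payload: bytes, api_key: Optional[str] = None) -> int:
    """POST the payload and return the HTTP status code, including errors such as 413."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: OpenAI-style bearer auth.
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(url, data=payload, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python post_large_body.py URL MODEL IMAGE_PATH
    url, model, image_path = sys.argv[1:4]
    with open(image_path, "rb") as fh:
        payload = build_vision_request(fh.read(), model)
    print(f"sending {len(payload)} bytes")
    print("status:", post_json(url, payload, os.environ.get("API_SERVER_KEY")))
```

Before the fix, a body over the old cap should come back as 413 from the gateway; after it, the status reflects whatever the backend returns. Restarting the gateway with `API_SERVER_MAX_REQUEST_BYTES=2000000` and sending an image of more than about 1.5 MB (whose base64 form alone exceeds 2,000,000 bytes) should bring the 413 back.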

🤖 Generated with Claude Code

fix(gateway): make api_server body-size cap configurable, raise default to 100 MiB

The OpenAI-compat HTTP API server enforced a hardcoded 1 MB POST body limit
in two places — the `body_limit_middleware` Content-Length pre-check and
aiohttp's own transport-level cap (which defaulted to its 1 MiB
`client_max_size` because the `web.Application` was constructed without an
override). Together they made any non-trivial image attachment fail with
`Request body too large` (HTTP 413) before the request could ever reach the
backend model — even when the model itself supports vision and would happily
accept the payload.

This change:

- Renames `MAX_REQUEST_BYTES` → `DEFAULT_MAX_REQUEST_BYTES` and bumps the
  default to 100 MiB so a Retina screenshot or a small PDF passes through
  without operator intervention.
- Reads the effective limit from `extra.max_request_bytes` (config.yaml) or
  the `API_SERVER_MAX_REQUEST_BYTES` env var in `APIServerAdapter.__init__`,
  mirroring the host/port/api_key precedence pattern already used in the
  same constructor and the `max_body_bytes` knob in `webhook.py`.
- Stores the configured value on `request.app["max_request_bytes"]` so the
  module-level middleware reads it from the app instead of a stale module
  global.
- Passes `client_max_size=self._max_request_bytes` to `web.Application(...)`
  so the aiohttp transport layer agrees with the middleware; without this,
  the middleware limit could be raised but aiohttp would still reject bodies
  over its 1 MiB default.

Deployments that don't set the new config key see no behaviour change other
than the larger default. Tighter caps remain available for hardened
deployments via `extra.max_request_bytes` in the api_server platform config
or the env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DataAdvisory added a commit to DataAdvisory/hermes-agent that referenced this pull request Apr 14, 2026
`AIAgent.run_conversation`'s `user_message` parameter is typed as `str`
but in practice can also be an OpenAI-style multi-part list of content
parts when the request comes through the OpenAI-compat HTTP API server
(`gateway/platforms/api_server.py` lines 570 / 981 forward the raw
`content` field straight through, and for vision requests that field
is a list of `{type: "text"|"image_url"|...}` dicts). The list must
reach the LLM unchanged so vision-capable models receive their image
parts — but three places in `run_agent.py` assumed `str` and crashed
or printed garbage on a list:

- `_msg_preview = (user_message[:80] + "...") ...; .replace("\n", " ")`
  at the top of `run_conversation` — `[:80]` returned a sublist of
  dict parts, then `.replace()` raised
  `AttributeError: 'list' object has no attribute 'replace'`,
  failing the entire turn before the agent even started.
- `f"💬 Starting conversation: '{user_message[:60]}{...}'"` a few
  lines below — same shape, didn't crash but printed a raw list of
  dicts to the user.
- `(user_message or "").strip().lower()` in
  `_looks_like_codex_intermediate_ack` — would crash any vision
  request routed through a Codex-family provider.
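The first and third failures come down to calling `str` methods on a list. The snippet below reproduces just that with a made-up multi-part message; it does not replicate the exact (partly elided) lines in `run_agent.py`.

```python
# Hypothetical multi-part user message in the OpenAI chat-completions shape.
user_message = [
    {"type": "text", "text": "What is in this picture?"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
]

failures = []
checks = [
    # Like the preview line: slicing a list yields a list, which has no .replace().
    ("preview", lambda m: m[:80].replace("\n", " ")),
    # Like _looks_like_codex_intermediate_ack: a non-empty list survives `or ""`.
    ("codex ack", lambda m: (m or "").strip().lower()),
]
for name, check in checks:
    try:
        check(user_message)
    except AttributeError as exc:
        failures.append((name, str(exc)))

for name, message in failures:
    print(f"{name}: {message}")

# The f-string preview does not raise; it just embeds the list's repr.
print(f"Starting conversation: '{user_message[:60]}'")
```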

This change introduces a small module-level helper
`_user_message_text(user_message)` that returns a flat string
representation of either a `str`, an OpenAI multi-part list (text
parts joined; non-text parts summarised as `[image]` / `[audio]` /
`[file]` so previews don't lose track of them), `None`, or anything
else (`str(...)`). All three call sites now go through the helper.
The list itself is left untouched in `messages.append({"role":
"user", "content": user_message})`, so the LLM still receives the
original multi-part content and vision continues to work.

Verified end-to-end on a local Ollama + gemma4:26b setup: a 14.8 MiB
test PNG, base64-encoded into a 19.8 MiB OpenAI multi-part
chat-completions request, was POSTed through the gateway api_server
and returned `200 OK` with the expected `OK` reply — previously the
same request died with `'list' object has no attribute 'replace'`
in this exact preview line.
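The jump from 14.8 MiB to 19.8 MiB matches base64's 4/3 expansion, and the same arithmetic shows why the old cap was hit so easily: under aiohttp's default `client_max_size` of 1 MiB, the base64 text alone leaves room for a raw image of at most 768 KiB, before any JSON or prompt text. The commit's figures are rounded, so the small remaining gap (about 19.73 vs 19.8 MiB) is presumably request overhead rather than an exact match.

```python
import base64

MIB = 1024 * 1024


def base64_len(n_bytes: int) -> int:
    """Length of padded standard base64 output: 4 characters per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


# Figures from the verification above: a ~14.8 MiB PNG becomes ~19.7 MiB of base64 text.
png_bytes = int(14.8 * MIB)
encoded_mib = base64_len(png_bytes) / MIB
print(f"{encoded_mib:.2f} MiB of base64 before the JSON envelope")

# Largest raw image whose base64 form alone fits in a 1 MiB body.
max_raw_under_1mib = 3 * (MIB // 4)
print(max_raw_under_1mib)  # 786432 bytes (768 KiB)
```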

Note: the bug was effectively dormant because the gateway api_server
also enforced a hardcoded 1 MB body cap (see NousResearch#8328) that rejected
most image attachments before they could reach this code path. With
that cap raised, this latent crash became reachable, hence the
companion fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sunxyless


+1 on this. Just closed my own PR (#12329) that was also addressing the 1 MiB wall — I'll let this one carry the body-size fix since the design here is cleaner (config-key + env, raw bytes, proper DI via request.app["max_request_bytes"] instead of a module-level constant).

Two things I dug into that may be useful if you want to add coverage:

  1. The failure mode is not just 413; it's opaque. Without client_max_size on the Application, aiohttp raises when request.json() reads past its internal cap, and the handler's except (json.JSONDecodeError, Exception) collapses that into "Invalid JSON in request body". Users see the vague error and have no hint that the real cause is a size limit. Worth a one-liner regression test asserting that a 2 MiB body doesn't turn into "Invalid JSON" post-fix.

  2. Env-var catalog entry. The repo already has API_SERVER_HOST / API_SERVER_PORT / API_SERVER_KEY registered in hermes_cli/config.py's env-var catalog (that's where the setup wizard and docs pick up descriptions). If this lands, API_SERVER_MAX_REQUEST_BYTES wants a catalog entry too so operators discover it via hermes config.

(I had test cases for both in #12329; happy to cherry-pick them into here if it'd save you time.)
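Point 1 can be illustrated without aiohttp. In this sketch, `BodyTooLarge` is a hypothetical stand-in for aiohttp's size-limit exception (`web.HTTPRequestEntityTooLarge`, raised while reading a body that exceeds `client_max_size`), and the handlers are simplified models, not the api_server code. Because the size error is itself an `Exception`, `except (json.JSONDecodeError, Exception)` swallows it and reports "Invalid JSON"; catching the size error first keeps the 413 visible.

```python
import json
from typing import Any, Tuple


class BodyTooLarge(Exception):
    """Hypothetical stand-in for aiohttp's size-limit exception."""


def read_json(raw: bytes, client_max_size: int) -> Any:
    # Model of a transport layer that refuses bodies over the cap before parsing.
    if len(raw) > client_max_size:
        raise BodyTooLarge(f"{len(raw)} bytes exceeds {client_max_size}")
    return json.loads(raw)


def handler_masking(raw: bytes, client_max_size: int) -> Tuple[int, str]:
    # Mirrors the pattern quoted above: every failure becomes "Invalid JSON".
    try:
        read_json(raw, client_max_size)
        return 200, "ok"
    except (json.JSONDecodeError, Exception):
        return 400, "Invalid JSON in request body"


def handler_explicit(raw: bytes, client_max_size: int) -> Tuple[int, str]:
    # Size errors are handled first, so they surface as 413.
    try:
        read_json(raw, client_max_size)
        return 200, "ok"
    except BodyTooLarge:
        return 413, "Request body too large"
    except json.JSONDecodeError:
        return 400, "Invalid JSON in request body"


MIB = 1024 * 1024
two_mib_body = json.dumps({"padding": "a" * (2 * MIB)}).encode("utf-8")
print(handler_masking(two_mib_body, MIB))   # misleading "Invalid JSON"
print(handler_explicit(two_mib_body, MIB))  # honest 413
```

A regression test in the spirit of point 1 would check that a valid 2 MiB JSON body is parsed (not reported as invalid JSON) once `client_max_size` matches the configured limit.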

alt-glitch added the labels type/bug (Something isn't working), P2 (Medium — degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) on Apr 28, 2026