feat: OpenAI-compatible API server — Chat Completions + Responses API by teknium1 · Pull Request #828 · NousResearch/hermes-agent

teknium1 · 2026-03-10T09:09:28Z

Summary

Adds an OpenAI-compatible HTTP API server as a new gateway platform adapter, enabling any OpenAI-compatible frontend to use hermes-agent as a backend.

What this enables

Open WebUI (126k★), LobeChat (73k★), LibreChat (34k★),
AnythingLLM (56k★), NextChat (87k★), ChatBox (39k★), Jan (26k★),
HF Chat-UI (8k★), big-AGI (7k★), and any OpenAI-compatible client

All connect to hermes-agent by pointing at http://localhost:8642/v1.

Endpoints

Method	Path	Description
POST	`/v1/chat/completions`	OpenAI Chat Completions API (stateless)
POST	`/v1/responses`	OpenAI Responses API (stateful via `previous_response_id`)
GET	`/v1/models`	Lists hermes-agent as available model
GET	`/health`	Health check

Key features

Chat Completions: Stateless. Full conversation in each request via messages array. Returns final agent response only (tool calls execute invisibly server-side).
Responses API: Server-side conversation state via previous_response_id. Stores full internal history (including tool calls and results) so multi-turn context is fully preserved across requests. In-memory LRU store (max 100 responses).
System prompt layering: Frontend system messages are layered ON TOP of hermes-agent's core prompt (tools, memory, skills all preserved).
Bearer token auth: Optional, configured via API_SERVER_KEY env var. No key = allow all (for local-only use).
Binds to 127.0.0.1 by default (secure).

Quick start

echo 'API_SERVER_ENABLED=true' >> ~/.hermes/.env
hermes gateway
# Point Open WebUI at http://localhost:8642/v1

Configuration

Env Var	Default	Description
`API_SERVER_ENABLED`	`false`	Enable the API server
`API_SERVER_PORT`	`8642`	HTTP server port
`API_SERVER_HOST`	`127.0.0.1`	Bind address
`API_SERVER_KEY`	(none)	Bearer token for auth

Files changed

File	Change
`gateway/platforms/api_server.py`	NEW — 563 lines, adapter + response store
`gateway/config.py`	Add Platform.API_SERVER + env var overrides
`gateway/run.py`	Register adapter in _create_adapter()
`tests/gateway/test_api_server.py`	NEW — 816 lines, 51 tests

Phase 1 (this PR)

Non-streaming responses (stream=true returns 501)
Chat Completions + Responses API
Bearer auth, in-memory response store

Future phases

Phase 2: SSE streaming for both endpoints
Phase 3: Tool transparency, model passthrough, CORS, rate limiting

Test results

51 new tests, all passing
Full suite: 2849 passed, 0 failures

Adds a new gateway platform adapter that exposes an HTTP server with OpenAI-compatible endpoints, allowing any OpenAI-compatible frontend (Open WebUI, LobeChat, etc.) to use hermes-agent as a backend. Endpoints: - POST /v1/chat/completions - OpenAI Chat Completions format (stateless) - POST /v1/responses - OpenAI Responses API format (stateful via previous_response_id) - GET /v1/models - lists hermes-agent as an available model - GET /health - health check Features: - Bearer token auth via API_SERVER_KEY env var (unauthenticated when no key set) - System messages/instructions become ephemeral system prompt (layered on top of core prompt) - In-memory LRU response store for Responses API conversation chaining - Agent runs in thread executor (run_conversation is synchronous) - Streaming returns 501 (not yet implemented) New files: - gateway/platforms/api_server.py - APIServerAdapter class (~470 lines) - tests/gateway/test_api_server.py - 51 tests covering all endpoints, auth, config Modified files: - gateway/config.py - Added Platform.API_SERVER enum, env var overrides, connected platforms - gateway/run.py - Added _create_adapter() case, auth map entries, auth bypass

… CORS - GET /v1/responses/{id} — retrieve stored responses - DELETE /v1/responses/{id} — delete stored responses - Tool call items in output (function_call + function_call_output) - Real token counting from AIAgent (prompt/completion/total) - Truncation parameter support (auto mode, max 100 messages) - CORS middleware (Access-Control-Allow-Origin: *) - Full response objects stored for retrieval - 16 new tests (67 total)

Like /title for sessions — clients can name conversations instead of tracking response IDs manually: POST /v1/responses {input: 'hi', conversation: 'my-project'} POST /v1/responses {input: 'next step', conversation: 'my-project'} Server automatically chains to the latest response in that conversation. Mutually exclusive with previous_response_id (returns 400 if both set). Not stored if store=false. 5 new tests (72 total).

- stream=true now returns SSE (role chunk → content chunk → finish → [DONE]) instead of 501. Not token-by-token but compatible with frontends like Open WebUI that require SSE format. - Add conversation parameter for named session chaining (like /title). Mutually exclusive with previous_response_id.

Step-by-step guide for connecting Open WebUI to hermes-agent via the API server. Covers Docker setup, Admin UI config, Chat Completions vs Responses API modes, troubleshooting, and Linux Docker networking.

Token-by-token streaming of LLM responses, disabled by default. Enable via streaming.enabled: true in config.yaml or HERMES_STREAMING_ENABLED=true env var. Core (run_agent.py): - stream_callback parameter on AIAgent - _run_streaming_chat_completion() for Chat Completions streaming - _run_codex_stream() now emits tokens via callback - _interruptible_api_call routes to streaming when callback is set - Graceful fallback to non-streaming on any error Gateway (gateway/run.py): - Read streaming config (master switch + per-platform overrides) - Queue-based callback bridging agent thread to async event loop - stream_preview task: progressive message editing with cursor - Skip normal send when streaming already delivered the response - Thread-safe, respects platform rate limits (1.5s edit interval) API Server (gateway/platforms/api_server.py): - Real token-by-token SSE when stream=true (replaces pseudo-streaming) - stream_callback wired through _create_agent and _run_agent - Background agent task + queue for concurrent streaming Config: - streaming.enabled (master switch, default: false) - Per-platform: streaming.telegram, streaming.discord, etc. - HERMES_STREAMING_ENABLED env var override

… WebUI - website/docs/user-guide/features/api-server.md — full API server docs: endpoints (chat completions, responses, models), system prompt handling, auth, config, compatible frontends matrix, limitations - website/docs/user-guide/features/streaming.md — streaming docs: per-platform support matrix, architecture, config reference, troubleshooting, interaction with tools/compression/interrupts - website/docs/user-guide/messaging/open-webui.md — already existed, step-by-step Open WebUI integration guide - website/docs/user-guide/messaging/index.md — updated to include API server in architecture diagram and description

Cherry-picked from PR #828, rebased onto current main with conflict resolution.

… CORS Cherry-picked from PR #828.

Cherry-picked from PR #828.

Cherry-picked from PR #828, resolved conflicts with main.

… WebUI Cherry-picked from PR #828, resolved conflicts with main.

teknium1 · 2026-03-11T16:02:08Z

Superseded by PR #956, which rebases this onto current main and includes several improvements:

Removed dead _write_sse_chat_completion pseudo-streaming method
Extracted _resolve_model() helper to eliminate duplicated config.yaml parsing
Streaming config cached at GatewayRunner init instead of YAML parsing per-message
API_SERVER_* env vars registered in OPTIONAL_ENV_VARS for hermes setup integration
Security warning added about network exposure without API_SERVER_KEY
Resolved conflicts with EMAIL platform, last_prompt_tokens tracking, and docs

Cherry-picked from PR NousResearch#828, rebased onto current main with conflict resolution.

… CORS Cherry-picked from PR NousResearch#828.

Cherry-picked from PR NousResearch#828.

Cherry-picked from PR NousResearch#828, resolved conflicts with main.

… WebUI Cherry-picked from PR NousResearch#828, resolved conflicts with main.

Cherry-picked from PR NousResearch#828, rebased onto current main with conflict resolution.

… CORS Cherry-picked from PR NousResearch#828.

Cherry-picked from PR NousResearch#828.

Cherry-picked from PR NousResearch#828, resolved conflicts with main.

… WebUI Cherry-picked from PR NousResearch#828, resolved conflicts with main.

Bartok9 mentioned this pull request Mar 10, 2026

Flaky test: test_vision_tools.py::TestErrorLoggingExcInfo::test_analysis_error_logs_exc_info failing on main #835

Closed

teknium1 added 6 commits March 10, 2026 16:05

docs: add Open WebUI integration guide

e00335f

Step-by-step guide for connecting Open WebUI to hermes-agent via the API server. Covers Docker setup, Admin UI config, Chat Completions vs Responses API modes, troubleshooting, and Linux Docker networking.

teknium1 force-pushed the hermes/hermes-21d8bacc branch from 5d14188 to 5a426e0 Compare March 10, 2026 23:30

teknium1 added a commit that referenced this pull request Mar 11, 2026

feat: add OpenAI-compatible API server platform adapter (Phase 1)

58dc5c4

Cherry-picked from PR #828, rebased onto current main with conflict resolution.

teknium1 added a commit that referenced this pull request Mar 11, 2026

feat: enhance Responses API — retrieval, deletion, tool calls, usage,…

7d771c2

… CORS Cherry-picked from PR #828.

teknium1 added a commit that referenced this pull request Mar 11, 2026

feat: add conversation parameter + named session chaining

7ae208b

Cherry-picked from PR #828.

teknium1 added a commit that referenced this pull request Mar 11, 2026

feat: add pseudo-streaming SSE + conversation parameter

b3c798d

Cherry-picked from PR #828.

teknium1 added a commit that referenced this pull request Mar 11, 2026

docs: add Open WebUI integration guide

b2a4092

Cherry-picked from PR #828.

teknium1 added a commit that referenced this pull request Mar 11, 2026

feat: add streaming LLM response support across all platforms

95d221c

Cherry-picked from PR #828, resolved conflicts with main.

teknium1 added a commit that referenced this pull request Mar 11, 2026

docs: comprehensive documentation for API server, streaming, and Open…

d54280e

… WebUI Cherry-picked from PR #828, resolved conflicts with main.

teknium1 mentioned this pull request Mar 11, 2026

feat: OpenAI-compatible API server + streaming support #956

Closed

teknium1 closed this Mar 11, 2026

teknium1 mentioned this pull request Mar 17, 2026

feat: OpenAI-compatible HTTP server platform adapter #1745

Closed

OutThisLife mentioned this pull request Apr 23, 2026

fix(ui-tui): heal post-resize alt-screen drift #14640

Merged

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026

feat: add OpenAI-compatible API server platform adapter (Phase 1)

181bfce

Cherry-picked from PR NousResearch#828, rebased onto current main with conflict resolution.

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026

feat: enhance Responses API — retrieval, deletion, tool calls, usage,…

14aa62e

… CORS Cherry-picked from PR NousResearch#828.

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026

feat: add conversation parameter + named session chaining

dd28361

Cherry-picked from PR NousResearch#828.

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026

feat: add pseudo-streaming SSE + conversation parameter

6f56570

Cherry-picked from PR NousResearch#828.

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026

docs: add Open WebUI integration guide

6ada3ed

Cherry-picked from PR NousResearch#828.

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026

feat: add streaming LLM response support across all platforms

cf9e4d8

Cherry-picked from PR NousResearch#828, resolved conflicts with main.

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026

docs: comprehensive documentation for API server, streaming, and Open…

47c1994

… WebUI Cherry-picked from PR NousResearch#828, resolved conflicts with main.

CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026

feat: add OpenAI-compatible API server platform adapter (Phase 1)

1a7b592

Cherry-picked from PR NousResearch#828, rebased onto current main with conflict resolution.

CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026

feat: enhance Responses API — retrieval, deletion, tool calls, usage,…

ceb37b8

… CORS Cherry-picked from PR NousResearch#828.

CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026

feat: add conversation parameter + named session chaining

c954595

Cherry-picked from PR NousResearch#828.

CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026

feat: add pseudo-streaming SSE + conversation parameter

9fc7b88

Cherry-picked from PR NousResearch#828.

CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026

docs: add Open WebUI integration guide

621e769

Cherry-picked from PR NousResearch#828.

CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026

feat: add streaming LLM response support across all platforms

f60dcce

Cherry-picked from PR NousResearch#828, resolved conflicts with main.

CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026

docs: comprehensive documentation for API server, streaming, and Open…

9845ec6

… WebUI Cherry-picked from PR NousResearch#828, resolved conflicts with main.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: OpenAI-compatible API server — Chat Completions + Responses API#828

feat: OpenAI-compatible API server — Chat Completions + Responses API#828
teknium1 wants to merge 7 commits into
mainfrom
hermes/hermes-21d8bacc

teknium1 commented Mar 10, 2026

Uh oh!

teknium1 commented Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

teknium1 commented Mar 10, 2026

Summary

What this enables

Endpoints

Key features

Quick start

Configuration

Files changed

Phase 1 (this PR)

Future phases

Test results

Uh oh!

teknium1 commented Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant