Skip to content

feat: OpenAI-compatible API server — Chat Completions + Responses API#828

Closed
teknium1 wants to merge 7 commits into
mainfrom
hermes/hermes-21d8bacc
Closed

feat: OpenAI-compatible API server — Chat Completions + Responses API#828
teknium1 wants to merge 7 commits into
mainfrom
hermes/hermes-21d8bacc

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Adds an OpenAI-compatible HTTP API server as a new gateway platform adapter, enabling any OpenAI-compatible frontend to use hermes-agent as a backend.

What this enables

Open WebUI (126k★), LobeChat (73k★), LibreChat (34k★),
AnythingLLM (56k★), NextChat (87k★), ChatBox (39k★), Jan (26k★),
HF Chat-UI (8k★), big-AGI (7k★), and any OpenAI-compatible client

All connect to hermes-agent by pointing at http://localhost:8642/v1.

Endpoints

Method Path Description
POST /v1/chat/completions OpenAI Chat Completions API (stateless)
POST /v1/responses OpenAI Responses API (stateful via previous_response_id)
GET /v1/models Lists hermes-agent as available model
GET /health Health check

Key features

  • Chat Completions: Stateless. Full conversation in each request via messages array. Returns final agent response only (tool calls execute invisibly server-side).
  • Responses API: Server-side conversation state via previous_response_id. Stores full internal history (including tool calls and results) so multi-turn context is fully preserved across requests. In-memory LRU store (max 100 responses).
  • System prompt layering: Frontend system messages are layered ON TOP of hermes-agent's core prompt (tools, memory, skills all preserved).
  • Bearer token auth: Optional, configured via API_SERVER_KEY env var. No key = allow all (for local-only use).
  • Binds to 127.0.0.1 by default (secure).

Quick start

echo 'API_SERVER_ENABLED=true' >> ~/.hermes/.env
hermes gateway
# Point Open WebUI at http://localhost:8642/v1

Configuration

Env Var Default Description
API_SERVER_ENABLED false Enable the API server
API_SERVER_PORT 8642 HTTP server port
API_SERVER_HOST 127.0.0.1 Bind address
API_SERVER_KEY (none) Bearer token for auth

Files changed

File Change
gateway/platforms/api_server.py NEW — 563 lines, adapter + response store
gateway/config.py Add Platform.API_SERVER + env var overrides
gateway/run.py Register adapter in _create_adapter()
tests/gateway/test_api_server.py NEW — 816 lines, 51 tests

Phase 1 (this PR)

  • Non-streaming responses (stream=true returns 501)
  • Chat Completions + Responses API
  • Bearer auth, in-memory response store

Future phases

  • Phase 2: SSE streaming for both endpoints
  • Phase 3: Tool transparency, model passthrough, CORS, rate limiting

Test results

  • 51 new tests, all passing
  • Full suite: 2849 passed, 0 failures

Adds a new gateway platform adapter that exposes an HTTP server with
OpenAI-compatible endpoints, allowing any OpenAI-compatible frontend
(Open WebUI, LobeChat, etc.) to use hermes-agent as a backend.

Endpoints:
- POST /v1/chat/completions - OpenAI Chat Completions format (stateless)
- POST /v1/responses - OpenAI Responses API format (stateful via previous_response_id)
- GET /v1/models - lists hermes-agent as an available model
- GET /health - health check

Features:
- Bearer token auth via API_SERVER_KEY env var (unauthenticated when no key set)
- System messages/instructions become ephemeral system prompt (layered on top of core prompt)
- In-memory LRU response store for Responses API conversation chaining
- Agent runs in thread executor (run_conversation is synchronous)
- Streaming returns 501 (not yet implemented)

New files:
- gateway/platforms/api_server.py - APIServerAdapter class (~470 lines)
- tests/gateway/test_api_server.py - 51 tests covering all endpoints, auth, config

Modified files:
- gateway/config.py - Added Platform.API_SERVER enum, env var overrides, connected platforms
- gateway/run.py - Added _create_adapter() case, auth map entries, auth bypass
… CORS

- GET /v1/responses/{id} — retrieve stored responses
- DELETE /v1/responses/{id} — delete stored responses
- Tool call items in output (function_call + function_call_output)
- Real token counting from AIAgent (prompt/completion/total)
- Truncation parameter support (auto mode, max 100 messages)
- CORS middleware (Access-Control-Allow-Origin: *)
- Full response objects stored for retrieval
- 16 new tests (67 total)
Like /title for sessions — clients can name conversations instead of
tracking response IDs manually:

  POST /v1/responses {input: 'hi', conversation: 'my-project'}
  POST /v1/responses {input: 'next step', conversation: 'my-project'}

Server automatically chains to the latest response in that conversation.
Mutually exclusive with previous_response_id (returns 400 if both set).
Not stored if store=false.

5 new tests (72 total).
- stream=true now returns SSE (role chunk → content chunk → finish → [DONE])
  instead of 501. Not token-by-token but compatible with frontends like
  Open WebUI that require SSE format.
- Add conversation parameter for named session chaining (like /title).
  Mutually exclusive with previous_response_id.
Step-by-step guide for connecting Open WebUI to hermes-agent via the
API server. Covers Docker setup, Admin UI config, Chat Completions
vs Responses API modes, troubleshooting, and Linux Docker networking.
Token-by-token streaming of LLM responses, disabled by default.
Enable via streaming.enabled: true in config.yaml or
HERMES_STREAMING_ENABLED=true env var.

Core (run_agent.py):
- stream_callback parameter on AIAgent
- _run_streaming_chat_completion() for Chat Completions streaming
- _run_codex_stream() now emits tokens via callback
- _interruptible_api_call routes to streaming when callback is set
- Graceful fallback to non-streaming on any error

Gateway (gateway/run.py):
- Read streaming config (master switch + per-platform overrides)
- Queue-based callback bridging agent thread to async event loop
- stream_preview task: progressive message editing with cursor
- Skip normal send when streaming already delivered the response
- Thread-safe, respects platform rate limits (1.5s edit interval)

API Server (gateway/platforms/api_server.py):
- Real token-by-token SSE when stream=true (replaces pseudo-streaming)
- stream_callback wired through _create_agent and _run_agent
- Background agent task + queue for concurrent streaming

Config:
- streaming.enabled (master switch, default: false)
- Per-platform: streaming.telegram, streaming.discord, etc.
- HERMES_STREAMING_ENABLED env var override
@teknium1 teknium1 force-pushed the hermes/hermes-21d8bacc branch from 5d14188 to 5a426e0 Compare March 10, 2026 23:30
… WebUI

- website/docs/user-guide/features/api-server.md — full API server docs:
  endpoints (chat completions, responses, models), system prompt handling,
  auth, config, compatible frontends matrix, limitations
- website/docs/user-guide/features/streaming.md — streaming docs:
  per-platform support matrix, architecture, config reference,
  troubleshooting, interaction with tools/compression/interrupts
- website/docs/user-guide/messaging/open-webui.md — already existed,
  step-by-step Open WebUI integration guide
- website/docs/user-guide/messaging/index.md — updated to include
  API server in architecture diagram and description
teknium1 added a commit that referenced this pull request Mar 11, 2026
Cherry-picked from PR #828, rebased onto current main with conflict resolution.
teknium1 added a commit that referenced this pull request Mar 11, 2026
teknium1 added a commit that referenced this pull request Mar 11, 2026
teknium1 added a commit that referenced this pull request Mar 11, 2026
teknium1 added a commit that referenced this pull request Mar 11, 2026
teknium1 added a commit that referenced this pull request Mar 11, 2026
Cherry-picked from PR #828, resolved conflicts with main.
teknium1 added a commit that referenced this pull request Mar 11, 2026
… WebUI

Cherry-picked from PR #828, resolved conflicts with main.
@teknium1

Copy link
Copy Markdown
Contributor Author

Superseded by PR #956, which rebases this onto current main and includes several improvements:

  • Removed dead _write_sse_chat_completion pseudo-streaming method
  • Extracted _resolve_model() helper to eliminate duplicated config.yaml parsing
  • Streaming config cached at GatewayRunner init instead of YAML parsing per-message
  • API_SERVER_* env vars registered in OPTIONAL_ENV_VARS for hermes setup integration
  • Security warning added about network exposure without API_SERVER_KEY
  • Resolved conflicts with EMAIL platform, last_prompt_tokens tracking, and docs

@teknium1 teknium1 closed this Mar 11, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
Cherry-picked from PR NousResearch#828, rebased onto current main with conflict resolution.
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
Cherry-picked from PR NousResearch#828, resolved conflicts with main.
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
… WebUI

Cherry-picked from PR NousResearch#828, resolved conflicts with main.
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
Cherry-picked from PR NousResearch#828, rebased onto current main with conflict resolution.
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
Cherry-picked from PR NousResearch#828, resolved conflicts with main.
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
… WebUI

Cherry-picked from PR NousResearch#828, resolved conflicts with main.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant