feat: OpenAI-compatible API server — Chat Completions + Responses API#828
Closed
teknium1 wants to merge 7 commits into
Closed
feat: OpenAI-compatible API server — Chat Completions + Responses API#828teknium1 wants to merge 7 commits into
teknium1 wants to merge 7 commits into
Conversation
Adds a new gateway platform adapter that exposes an HTTP server with OpenAI-compatible endpoints, allowing any OpenAI-compatible frontend (Open WebUI, LobeChat, etc.) to use hermes-agent as a backend. Endpoints: - POST /v1/chat/completions - OpenAI Chat Completions format (stateless) - POST /v1/responses - OpenAI Responses API format (stateful via previous_response_id) - GET /v1/models - lists hermes-agent as an available model - GET /health - health check Features: - Bearer token auth via API_SERVER_KEY env var (unauthenticated when no key set) - System messages/instructions become ephemeral system prompt (layered on top of core prompt) - In-memory LRU response store for Responses API conversation chaining - Agent runs in thread executor (run_conversation is synchronous) - Streaming returns 501 (not yet implemented) New files: - gateway/platforms/api_server.py - APIServerAdapter class (~470 lines) - tests/gateway/test_api_server.py - 51 tests covering all endpoints, auth, config Modified files: - gateway/config.py - Added Platform.API_SERVER enum, env var overrides, connected platforms - gateway/run.py - Added _create_adapter() case, auth map entries, auth bypass
… CORS
- GET /v1/responses/{id} — retrieve stored responses
- DELETE /v1/responses/{id} — delete stored responses
- Tool call items in output (function_call + function_call_output)
- Real token counting from AIAgent (prompt/completion/total)
- Truncation parameter support (auto mode, max 100 messages)
- CORS middleware (Access-Control-Allow-Origin: *)
- Full response objects stored for retrieval
- 16 new tests (67 total)
Like /title for sessions — clients can name conversations instead of
tracking response IDs manually:
POST /v1/responses {input: 'hi', conversation: 'my-project'}
POST /v1/responses {input: 'next step', conversation: 'my-project'}
Server automatically chains to the latest response in that conversation.
Mutually exclusive with previous_response_id (returns 400 if both set).
Not stored if store=false.
5 new tests (72 total).
- stream=true now returns SSE (role chunk → content chunk → finish → [DONE]) instead of 501. Not token-by-token but compatible with frontends like Open WebUI that require SSE format. - Add conversation parameter for named session chaining (like /title). Mutually exclusive with previous_response_id.
Step-by-step guide for connecting Open WebUI to hermes-agent via the API server. Covers Docker setup, Admin UI config, Chat Completions vs Responses API modes, troubleshooting, and Linux Docker networking.
Token-by-token streaming of LLM responses, disabled by default. Enable via streaming.enabled: true in config.yaml or HERMES_STREAMING_ENABLED=true env var. Core (run_agent.py): - stream_callback parameter on AIAgent - _run_streaming_chat_completion() for Chat Completions streaming - _run_codex_stream() now emits tokens via callback - _interruptible_api_call routes to streaming when callback is set - Graceful fallback to non-streaming on any error Gateway (gateway/run.py): - Read streaming config (master switch + per-platform overrides) - Queue-based callback bridging agent thread to async event loop - stream_preview task: progressive message editing with cursor - Skip normal send when streaming already delivered the response - Thread-safe, respects platform rate limits (1.5s edit interval) API Server (gateway/platforms/api_server.py): - Real token-by-token SSE when stream=true (replaces pseudo-streaming) - stream_callback wired through _create_agent and _run_agent - Background agent task + queue for concurrent streaming Config: - streaming.enabled (master switch, default: false) - Per-platform: streaming.telegram, streaming.discord, etc. - HERMES_STREAMING_ENABLED env var override
5d14188 to
5a426e0
Compare
… WebUI - website/docs/user-guide/features/api-server.md — full API server docs: endpoints (chat completions, responses, models), system prompt handling, auth, config, compatible frontends matrix, limitations - website/docs/user-guide/features/streaming.md — streaming docs: per-platform support matrix, architecture, config reference, troubleshooting, interaction with tools/compression/interrupts - website/docs/user-guide/messaging/open-webui.md — already existed, step-by-step Open WebUI integration guide - website/docs/user-guide/messaging/index.md — updated to include API server in architecture diagram and description
teknium1
added a commit
that referenced
this pull request
Mar 11, 2026
Cherry-picked from PR #828, rebased onto current main with conflict resolution.
teknium1
added a commit
that referenced
this pull request
Mar 11, 2026
… CORS Cherry-picked from PR #828.
teknium1
added a commit
that referenced
this pull request
Mar 11, 2026
Cherry-picked from PR #828, resolved conflicts with main.
teknium1
added a commit
that referenced
this pull request
Mar 11, 2026
… WebUI Cherry-picked from PR #828, resolved conflicts with main.
Contributor
Author
|
Superseded by PR #956, which rebases this onto current main and includes several improvements:
|
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
Cherry-picked from PR NousResearch#828, rebased onto current main with conflict resolution.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
… CORS Cherry-picked from PR NousResearch#828.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
Cherry-picked from PR NousResearch#828.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
Cherry-picked from PR NousResearch#828.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
Cherry-picked from PR NousResearch#828.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
Cherry-picked from PR NousResearch#828, resolved conflicts with main.
angelburgosrosado
pushed a commit
to angelburgosrosado/hermes-agent
that referenced
this pull request
Apr 28, 2026
… WebUI Cherry-picked from PR NousResearch#828, resolved conflicts with main.
CumulusService
pushed a commit
to Cumulus-Service-GmbH/hermes-agent
that referenced
this pull request
May 30, 2026
Cherry-picked from PR NousResearch#828, rebased onto current main with conflict resolution.
CumulusService
pushed a commit
to Cumulus-Service-GmbH/hermes-agent
that referenced
this pull request
May 30, 2026
… CORS Cherry-picked from PR NousResearch#828.
CumulusService
pushed a commit
to Cumulus-Service-GmbH/hermes-agent
that referenced
this pull request
May 30, 2026
Cherry-picked from PR NousResearch#828.
CumulusService
pushed a commit
to Cumulus-Service-GmbH/hermes-agent
that referenced
this pull request
May 30, 2026
Cherry-picked from PR NousResearch#828.
CumulusService
pushed a commit
to Cumulus-Service-GmbH/hermes-agent
that referenced
this pull request
May 30, 2026
Cherry-picked from PR NousResearch#828.
CumulusService
pushed a commit
to Cumulus-Service-GmbH/hermes-agent
that referenced
this pull request
May 30, 2026
Cherry-picked from PR NousResearch#828, resolved conflicts with main.
CumulusService
pushed a commit
to Cumulus-Service-GmbH/hermes-agent
that referenced
this pull request
May 30, 2026
… WebUI Cherry-picked from PR NousResearch#828, resolved conflicts with main.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds an OpenAI-compatible HTTP API server as a new gateway platform adapter, enabling any OpenAI-compatible frontend to use hermes-agent as a backend.
What this enables
All connect to hermes-agent by pointing at
http://localhost:8642/v1.Endpoints
/v1/chat/completions/v1/responsesprevious_response_id)/v1/models/healthKey features
messagesarray. Returns final agent response only (tool calls execute invisibly server-side).previous_response_id. Stores full internal history (including tool calls and results) so multi-turn context is fully preserved across requests. In-memory LRU store (max 100 responses).API_SERVER_KEYenv var. No key = allow all (for local-only use).Quick start
Configuration
API_SERVER_ENABLEDfalseAPI_SERVER_PORT8642API_SERVER_HOST127.0.0.1API_SERVER_KEYFiles changed
gateway/platforms/api_server.pygateway/config.pygateway/run.pytests/gateway/test_api_server.pyPhase 1 (this PR)
Future phases
Test results