Skip to content

feat: OpenAI-compatible API server + streaming support#956

Closed
teknium1 wants to merge 10 commits into
mainfrom
hermes/hermes-106e92b2
Closed

feat: OpenAI-compatible API server + streaming support#956
teknium1 wants to merge 10 commits into
mainfrom
hermes/hermes-106e92b2

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Rebased and improved version of PR #828. Adds an OpenAI-compatible HTTP API server as a gateway platform adapter, plus streaming support across all platforms.

What this enables

Any OpenAI-compatible frontend — Open WebUI (126k★), LobeChat (73k★), LibreChat (34k★), AnythingLLM (56k★), NextChat (87k★), ChatBox (39k★), etc. — can connect to hermes-agent by pointing at http://localhost:8642/v1.

Endpoints

Method Path Description
POST /v1/chat/completions OpenAI Chat Completions API (stateless)
POST /v1/responses OpenAI Responses API (stateful via previous_response_id or conversation)
GET /v1/responses/{id} Retrieve a stored response
DELETE /v1/responses/{id} Delete a stored response
GET /v1/models Lists hermes-agent as available model
GET /health Health check

Key features

  • Chat Completions: Full conversation in each request, returns final agent response
  • Responses API: Server-side conversation state, named conversations via conversation parameter
  • Streaming: Real SSE streaming for API server + progressive message editing for Telegram/Discord/Slack
  • System prompt layering: Frontend system messages layered ON TOP of core prompt
  • Bearer token auth: Optional via API_SERVER_KEY env var
  • CORS support: Browser-based frontends can connect directly
  • Usage tracking: Real token counts in responses

Improvements over PR #828

  • Removed dead code: Unused _write_sse_chat_completion pseudo-streaming method deleted
  • Deduplicated model resolution: Extracted _resolve_model() helper in gateway/run.py — API server now imports it instead of duplicating the YAML parsing
  • Cached streaming config: Streaming config is loaded once at GatewayRunner.__init__ instead of parsing config.yaml on every message
  • Setup integration: API_SERVER_ENABLED, API_SERVER_KEY, API_SERVER_PORT, API_SERVER_HOST registered in OPTIONAL_ENV_VARS so hermes setup prompts for them
  • Security docs: Added prominent warning about network exposure when binding to 0.0.0.0 without auth
  • Rebased onto current main: Resolved conflicts with EMAIL platform, last_prompt_tokens tracking, and docs

Documentation

  • API server guide (features/api-server.md)
  • Streaming guide (features/streaming.md)
  • Open WebUI integration guide with Docker Compose (messaging/open-webui.md)

Test results

  • 82 new tests (51 API server + 31 streaming), all passing
  • 1 skipped (gateway streaming config test — optional helper)

Cherry-picked from PR #828, rebased onto current main with conflict resolution.
Cherry-picked from PR #828, resolved conflicts with main.
… WebUI

Cherry-picked from PR #828, resolved conflicts with main.
…tion, cache streaming config, add setup integration and security docs

- Remove unused _write_sse_chat_completion pseudo-streaming method (dead code)
- Extract _resolve_model() helper in gateway/run.py, use from api_server
- Cache streaming config at GatewayRunner init instead of YAML parsing per-message
- Add API_SERVER_* env vars to OPTIONAL_ENV_VARS for hermes setup integration
- Add security warning about network exposure without API_SERVER_KEY
Add configurable reply_to_mode for Telegram multi-chunk replies:
- off: never thread replies to original message
- first: only first chunk threads (default, preserves current behavior)
- all: all chunks thread to original message

Configurable via reply_to_mode in platform config or TELEGRAM_REPLY_TO_MODE
env var.

Cherry-picked from PR #855 by raulvidis, rebased onto current main.
Dropped asyncio_mode=auto pyproject.toml change, added @pytest.mark.asyncio
decorators, fixed test IDs to use numeric strings.

Co-authored-by: Raul <77628552+raulvidis@users.noreply.github.com>
teknium1 added a commit that referenced this pull request Mar 17, 2026
Salvaged from PR #956 (teknium1) onto current main.

Adds an OpenAI-compatible HTTP server as a new gateway platform
adapter. Any frontend that speaks the OpenAI format — Open WebUI,
LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc. — can
connect to hermes-agent by pointing at http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions — stateless chat (full conversation per request)
- POST /v1/responses — stateful via previous_response_id or named conversations
- GET/DELETE /v1/responses/{id} — retrieve/delete stored responses
- GET /v1/models — model discovery
- GET /health — health check

Key features:
- Real SSE streaming via stream_delta_callback
- Responses API with server-side conversation state (in-memory LRU, max 100)
- Named conversations via 'conversation' parameter
- System prompt layering (frontend prompts add to core agent prompt)
- Bearer token auth (optional, via HTTP_SERVER_KEY)
- CORS support for browser-based frontends
- Binds to 127.0.0.1 by default (secure)
- Uses aiohttp (existing dependency, no new deps)

Files:
- gateway/platforms/http_server.py — adapter implementation
- gateway/config.py — Platform.HTTP_SERVER enum + env var overrides
- gateway/run.py — adapter factory branch
- toolsets.py — hermes-http_server toolset
- hermes_cli/config.py — setup env vars
- .env.example — HTTP server env vars
- tests/gateway/test_http_server.py — 72 tests
- website/docs/ — HTTP server guide, Open WebUI integration guide

Closes #956
teknium1 added a commit that referenced this pull request Mar 17, 2026
Salvaged from PR #956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
teknium1 added a commit that referenced this pull request Mar 17, 2026
…ix (#1756)

* feat: OpenAI-compatible API server platform adapter

Salvaged from PR #956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…ix (NousResearch#1756)

* feat: OpenAI-compatible API server platform adapter

Salvaged from PR NousResearch#956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR NousResearch#956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR NousResearch#1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR NousResearch#1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
Salvaged from PR NousResearch#956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR NousResearch#956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 28, 2026
Salvaged from PR NousResearch#956 (teknium1) onto current main.

Adds an OpenAI-compatible HTTP server as a new gateway platform
adapter. Any frontend that speaks the OpenAI format — Open WebUI,
LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc. — can
connect to hermes-agent by pointing at http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions — stateless chat (full conversation per request)
- POST /v1/responses — stateful via previous_response_id or named conversations
- GET/DELETE /v1/responses/{id} — retrieve/delete stored responses
- GET /v1/models — model discovery
- GET /health — health check

Key features:
- Real SSE streaming via stream_delta_callback
- Responses API with server-side conversation state (in-memory LRU, max 100)
- Named conversations via 'conversation' parameter
- System prompt layering (frontend prompts add to core agent prompt)
- Bearer token auth (optional, via HTTP_SERVER_KEY)
- CORS support for browser-based frontends
- Binds to 127.0.0.1 by default (secure)
- Uses aiohttp (existing dependency, no new deps)

Files:
- gateway/platforms/http_server.py — adapter implementation
- gateway/config.py — Platform.HTTP_SERVER enum + env var overrides
- gateway/run.py — adapter factory branch
- toolsets.py — hermes-http_server toolset
- hermes_cli/config.py — setup env vars
- .env.example — HTTP server env vars
- tests/gateway/test_http_server.py — 72 tests
- website/docs/ — HTTP server guide, Open WebUI integration guide

Closes NousResearch#956
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…ix (NousResearch#1756)

* feat: OpenAI-compatible API server platform adapter

Salvaged from PR NousResearch#956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR NousResearch#956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR NousResearch#1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR NousResearch#1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…ix (NousResearch#1756)

* feat: OpenAI-compatible API server platform adapter

Salvaged from PR NousResearch#956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR NousResearch#956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR NousResearch#1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR NousResearch#1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
Salvaged from PR NousResearch#956 (teknium1) onto current main.

Adds an OpenAI-compatible HTTP server as a new gateway platform
adapter. Any frontend that speaks the OpenAI format — Open WebUI,
LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc. — can
connect to hermes-agent by pointing at http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions — stateless chat (full conversation per request)
- POST /v1/responses — stateful via previous_response_id or named conversations
- GET/DELETE /v1/responses/{id} — retrieve/delete stored responses
- GET /v1/models — model discovery
- GET /health — health check

Key features:
- Real SSE streaming via stream_delta_callback
- Responses API with server-side conversation state (in-memory LRU, max 100)
- Named conversations via 'conversation' parameter
- System prompt layering (frontend prompts add to core agent prompt)
- Bearer token auth (optional, via HTTP_SERVER_KEY)
- CORS support for browser-based frontends
- Binds to 127.0.0.1 by default (secure)
- Uses aiohttp (existing dependency, no new deps)

Files:
- gateway/platforms/http_server.py — adapter implementation
- gateway/config.py — Platform.HTTP_SERVER enum + env var overrides
- gateway/run.py — adapter factory branch
- toolsets.py — hermes-http_server toolset
- hermes_cli/config.py — setup env vars
- .env.example — HTTP server env vars
- tests/gateway/test_http_server.py — 72 tests
- website/docs/ — HTTP server guide, Open WebUI integration guide

Closes NousResearch#956
CumulusService pushed a commit to Cumulus-Service-GmbH/hermes-agent that referenced this pull request May 30, 2026
Salvaged from PR NousResearch#956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR NousResearch#956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ix (NousResearch#1756)

* feat: OpenAI-compatible API server platform adapter

Salvaged from PR NousResearch#956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR NousResearch#956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR NousResearch#1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR NousResearch#1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…ix (NousResearch#1756)

* feat: OpenAI-compatible API server platform adapter

Salvaged from PR NousResearch#956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR NousResearch#956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR NousResearch#1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR NousResearch#1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants