feat: OpenAI-compatible API server + streaming support by teknium1 · Pull Request #956 · NousResearch/hermes-agent

teknium1 · 2026-03-11T16:01:57Z

Summary

Rebased and improved version of PR #828. Adds an OpenAI-compatible HTTP API server as a gateway platform adapter, plus streaming support across all platforms.

What this enables

Any OpenAI-compatible frontend — Open WebUI (126k★), LobeChat (73k★), LibreChat (34k★), AnythingLLM (56k★), NextChat (87k★), ChatBox (39k★), etc. — can connect to hermes-agent by pointing at http://localhost:8642/v1.

Endpoints

Method	Path	Description
POST	`/v1/chat/completions`	OpenAI Chat Completions API (stateless)
POST	`/v1/responses`	OpenAI Responses API (stateful via `previous_response_id` or `conversation`)
GET	`/v1/responses/{id}`	Retrieve a stored response
DELETE	`/v1/responses/{id}`	Delete a stored response
GET	`/v1/models`	Lists hermes-agent as available model
GET	`/health`	Health check

Key features

Chat Completions: Full conversation in each request, returns final agent response
Responses API: Server-side conversation state, named conversations via conversation parameter
Streaming: Real SSE streaming for API server + progressive message editing for Telegram/Discord/Slack
System prompt layering: Frontend system messages layered ON TOP of core prompt
Bearer token auth: Optional via API_SERVER_KEY env var
CORS support: Browser-based frontends can connect directly
Usage tracking: Real token counts in responses

Improvements over PR #828

Removed dead code: Unused _write_sse_chat_completion pseudo-streaming method deleted
Deduplicated model resolution: Extracted _resolve_model() helper in gateway/run.py — API server now imports it instead of duplicating the YAML parsing
Cached streaming config: Streaming config is loaded once at GatewayRunner.__init__ instead of parsing config.yaml on every message
Setup integration: API_SERVER_ENABLED, API_SERVER_KEY, API_SERVER_PORT, API_SERVER_HOST registered in OPTIONAL_ENV_VARS so hermes setup prompts for them
Security docs: Added prominent warning about network exposure when binding to 0.0.0.0 without auth
Rebased onto current main: Resolved conflicts with EMAIL platform, last_prompt_tokens tracking, and docs

Documentation

API server guide (features/api-server.md)
Streaming guide (features/streaming.md)
Open WebUI integration guide with Docker Compose (messaging/open-webui.md)

Test results

82 new tests (51 API server + 31 streaming), all passing
1 skipped (gateway streaming config test — optional helper)

Cherry-picked from PR #828, rebased onto current main with conflict resolution.

… CORS Cherry-picked from PR #828.

Cherry-picked from PR #828.

Cherry-picked from PR #828, resolved conflicts with main.

… WebUI Cherry-picked from PR #828, resolved conflicts with main.

…tion, cache streaming config, add setup integration and security docs - Remove unused _write_sse_chat_completion pseudo-streaming method (dead code) - Extract _resolve_model() helper in gateway/run.py, use from api_server - Cache streaming config at GatewayRunner init instead of YAML parsing per-message - Add API_SERVER_* env vars to OPTIONAL_ENV_VARS for hermes setup integration - Add security warning about network exposure without API_SERVER_KEY

Add configurable reply_to_mode for Telegram multi-chunk replies: - off: never thread replies to original message - first: only first chunk threads (default, preserves current behavior) - all: all chunks thread to original message Configurable via reply_to_mode in platform config or TELEGRAM_REPLY_TO_MODE env var. Cherry-picked from PR #855 by raulvidis, rebased onto current main. Dropped asyncio_mode=auto pyproject.toml change, added @pytest.mark.asyncio decorators, fixed test IDs to use numeric strings. Co-authored-by: Raul <77628552+raulvidis@users.noreply.github.com>

Salvaged from PR #956 (teknium1) onto current main. Adds an OpenAI-compatible HTTP server as a new gateway platform adapter. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc. — can connect to hermes-agent by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless chat (full conversation per request) - POST /v1/responses — stateful via previous_response_id or named conversations - GET/DELETE /v1/responses/{id} — retrieve/delete stored responses - GET /v1/models — model discovery - GET /health — health check Key features: - Real SSE streaming via stream_delta_callback - Responses API with server-side conversation state (in-memory LRU, max 100) - Named conversations via 'conversation' parameter - System prompt layering (frontend prompts add to core agent prompt) - Bearer token auth (optional, via HTTP_SERVER_KEY) - CORS support for browser-based frontends - Binds to 127.0.0.1 by default (secure) - Uses aiohttp (existing dependency, no new deps) Files: - gateway/platforms/http_server.py — adapter implementation - gateway/config.py — Platform.HTTP_SERVER enum + env var overrides - gateway/run.py — adapter factory branch - toolsets.py — hermes-http_server toolset - hermes_cli/config.py — setup env vars - .env.example — HTTP server env vars - tests/gateway/test_http_server.py — 72 tests - website/docs/ — HTTP server guide, Open WebUI integration guide Closes #956

Salvaged from PR #956, updated for current main. Adds an HTTP API server as a gateway platform adapter that exposes hermes-agent via the OpenAI Chat Completions and Responses APIs. Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless Chat Completions API - POST /v1/responses — stateful Responses API with chaining - GET /v1/responses/{id} — retrieve stored response - DELETE /v1/responses/{id} — delete stored response - GET /v1/models — list hermes-agent as available model - GET /health — health check Features: - Real SSE streaming via stream_delta_callback (uses main's streaming) - In-memory LRU response store for Responses API conversation chaining - Named conversations via 'conversation' parameter - Bearer token auth (optional, via API_SERVER_KEY) - CORS support for browser-based frontends - System prompt layering (frontend system messages on top of core) - Real token usage tracking in responses Integration points: - Platform.API_SERVER in gateway/config.py - _create_adapter() branch in gateway/run.py - API_SERVER_* env vars in hermes_cli/config.py - Env var overrides in gateway/config.py _apply_env_overrides() Changes vs original PR #956: - Removed streaming infrastructure (already on main via stream_consumer.py) - Removed Telegram reply_to_mode (separate feature, not included) - Updated _resolve_model() -> _resolve_gateway_model() - Updated stream_callback -> stream_delta_callback - Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected() - Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK) Tests: 72 new tests, all passing Docs: API server guide, Open WebUI integration guide, env var reference

…ix (#1756) * feat: OpenAI-compatible API server platform adapter Salvaged from PR #956, updated for current main. Adds an HTTP API server as a gateway platform adapter that exposes hermes-agent via the OpenAI Chat Completions and Responses APIs. Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless Chat Completions API - POST /v1/responses — stateful Responses API with chaining - GET /v1/responses/{id} — retrieve stored response - DELETE /v1/responses/{id} — delete stored response - GET /v1/models — list hermes-agent as available model - GET /health — health check Features: - Real SSE streaming via stream_delta_callback (uses main's streaming) - In-memory LRU response store for Responses API conversation chaining - Named conversations via 'conversation' parameter - Bearer token auth (optional, via API_SERVER_KEY) - CORS support for browser-based frontends - System prompt layering (frontend system messages on top of core) - Real token usage tracking in responses Integration points: - Platform.API_SERVER in gateway/config.py - _create_adapter() branch in gateway/run.py - API_SERVER_* env vars in hermes_cli/config.py - Env var overrides in gateway/config.py _apply_env_overrides() Changes vs original PR #956: - Removed streaming infrastructure (already on main via stream_consumer.py) - Removed Telegram reply_to_mode (separate feature, not included) - Updated _resolve_model() -> _resolve_gateway_model() - Updated stream_callback -> stream_delta_callback - Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected() - Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK) Tests: 72 new tests, all passing Docs: API server guide, Open WebUI integration guide, env var reference * feat(whatsapp): make reply prefix configurable via config.yaml Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env. The WhatsApp bridge prepends a header to every outgoing message. This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize or disable it via config.yaml: whatsapp: reply_prefix: '' # disable header reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix How it works: - load_gateway_config() reads whatsapp.reply_prefix from config.yaml and stores it in PlatformConfig.extra['reply_prefix'] - WhatsAppAdapter reads it from config.extra at init - When spawning bridge.js, the adapter passes it as WHATSAPP_REPLY_PREFIX in the subprocess environment - bridge.js handles undefined (default), empty (no header), or custom values with \\n escape support - Self-chat echo suppression uses the configured prefix Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a key 10 (TAVILY_API_KEY), so existing users at v9 would never be prompted for Tavily. Bumped to 10 to close the gap. Added a regression test to prevent this from happening again. Credit: ifrederico (PR #1764) for the bridge.js implementation and the config version gap discovery. --------- Co-authored-by: Test <test@test.com>

…ix (NousResearch#1756) * feat: OpenAI-compatible API server platform adapter Salvaged from PR NousResearch#956, updated for current main. Adds an HTTP API server as a gateway platform adapter that exposes hermes-agent via the OpenAI Chat Completions and Responses APIs. Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless Chat Completions API - POST /v1/responses — stateful Responses API with chaining - GET /v1/responses/{id} — retrieve stored response - DELETE /v1/responses/{id} — delete stored response - GET /v1/models — list hermes-agent as available model - GET /health — health check Features: - Real SSE streaming via stream_delta_callback (uses main's streaming) - In-memory LRU response store for Responses API conversation chaining - Named conversations via 'conversation' parameter - Bearer token auth (optional, via API_SERVER_KEY) - CORS support for browser-based frontends - System prompt layering (frontend system messages on top of core) - Real token usage tracking in responses Integration points: - Platform.API_SERVER in gateway/config.py - _create_adapter() branch in gateway/run.py - API_SERVER_* env vars in hermes_cli/config.py - Env var overrides in gateway/config.py _apply_env_overrides() Changes vs original PR NousResearch#956: - Removed streaming infrastructure (already on main via stream_consumer.py) - Removed Telegram reply_to_mode (separate feature, not included) - Updated _resolve_model() -> _resolve_gateway_model() - Updated stream_callback -> stream_delta_callback - Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected() - Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK) Tests: 72 new tests, all passing Docs: API server guide, Open WebUI integration guide, env var reference * feat(whatsapp): make reply prefix configurable via config.yaml Reworked from PR NousResearch#1764 (ifrederico) to use config.yaml instead of .env. The WhatsApp bridge prepends a header to every outgoing message. This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize or disable it via config.yaml: whatsapp: reply_prefix: '' # disable header reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix How it works: - load_gateway_config() reads whatsapp.reply_prefix from config.yaml and stores it in PlatformConfig.extra['reply_prefix'] - WhatsAppAdapter reads it from config.extra at init - When spawning bridge.js, the adapter passes it as WHATSAPP_REPLY_PREFIX in the subprocess environment - bridge.js handles undefined (default), empty (no header), or custom values with \\n escape support - Self-chat echo suppression uses the configured prefix Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a key 10 (TAVILY_API_KEY), so existing users at v9 would never be prompted for Tavily. Bumped to 10 to close the gap. Added a regression test to prevent this from happening again. Credit: ifrederico (PR NousResearch#1764) for the bridge.js implementation and the config version gap discovery. --------- Co-authored-by: Test <test@test.com>

Salvaged from PR NousResearch#956, updated for current main. Adds an HTTP API server as a gateway platform adapter that exposes hermes-agent via the OpenAI Chat Completions and Responses APIs. Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless Chat Completions API - POST /v1/responses — stateful Responses API with chaining - GET /v1/responses/{id} — retrieve stored response - DELETE /v1/responses/{id} — delete stored response - GET /v1/models — list hermes-agent as available model - GET /health — health check Features: - Real SSE streaming via stream_delta_callback (uses main's streaming) - In-memory LRU response store for Responses API conversation chaining - Named conversations via 'conversation' parameter - Bearer token auth (optional, via API_SERVER_KEY) - CORS support for browser-based frontends - System prompt layering (frontend system messages on top of core) - Real token usage tracking in responses Integration points: - Platform.API_SERVER in gateway/config.py - _create_adapter() branch in gateway/run.py - API_SERVER_* env vars in hermes_cli/config.py - Env var overrides in gateway/config.py _apply_env_overrides() Changes vs original PR NousResearch#956: - Removed streaming infrastructure (already on main via stream_consumer.py) - Removed Telegram reply_to_mode (separate feature, not included) - Updated _resolve_model() -> _resolve_gateway_model() - Updated stream_callback -> stream_delta_callback - Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected() - Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK) Tests: 72 new tests, all passing Docs: API server guide, Open WebUI integration guide, env var reference

Salvaged from PR NousResearch#956 (teknium1) onto current main. Adds an OpenAI-compatible HTTP server as a new gateway platform adapter. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc. — can connect to hermes-agent by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless chat (full conversation per request) - POST /v1/responses — stateful via previous_response_id or named conversations - GET/DELETE /v1/responses/{id} — retrieve/delete stored responses - GET /v1/models — model discovery - GET /health — health check Key features: - Real SSE streaming via stream_delta_callback - Responses API with server-side conversation state (in-memory LRU, max 100) - Named conversations via 'conversation' parameter - System prompt layering (frontend prompts add to core agent prompt) - Bearer token auth (optional, via HTTP_SERVER_KEY) - CORS support for browser-based frontends - Binds to 127.0.0.1 by default (secure) - Uses aiohttp (existing dependency, no new deps) Files: - gateway/platforms/http_server.py — adapter implementation - gateway/config.py — Platform.HTTP_SERVER enum + env var overrides - gateway/run.py — adapter factory branch - toolsets.py — hermes-http_server toolset - hermes_cli/config.py — setup env vars - .env.example — HTTP server env vars - tests/gateway/test_http_server.py — 72 tests - website/docs/ — HTTP server guide, Open WebUI integration guide Closes NousResearch#956

…ix (NousResearch#1756) * feat: OpenAI-compatible API server platform adapter Salvaged from PR NousResearch#956, updated for current main. Adds an HTTP API server as a gateway platform adapter that exposes hermes-agent via the OpenAI Chat Completions and Responses APIs. Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless Chat Completions API - POST /v1/responses — stateful Responses API with chaining - GET /v1/responses/{id} — retrieve stored response - DELETE /v1/responses/{id} — delete stored response - GET /v1/models — list hermes-agent as available model - GET /health — health check Features: - Real SSE streaming via stream_delta_callback (uses main's streaming) - In-memory LRU response store for Responses API conversation chaining - Named conversations via 'conversation' parameter - Bearer token auth (optional, via API_SERVER_KEY) - CORS support for browser-based frontends - System prompt layering (frontend system messages on top of core) - Real token usage tracking in responses Integration points: - Platform.API_SERVER in gateway/config.py - _create_adapter() branch in gateway/run.py - API_SERVER_* env vars in hermes_cli/config.py - Env var overrides in gateway/config.py _apply_env_overrides() Changes vs original PR NousResearch#956: - Removed streaming infrastructure (already on main via stream_consumer.py) - Removed Telegram reply_to_mode (separate feature, not included) - Updated _resolve_model() -> _resolve_gateway_model() - Updated stream_callback -> stream_delta_callback - Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected() - Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK) Tests: 72 new tests, all passing Docs: API server guide, Open WebUI integration guide, env var reference * feat(whatsapp): make reply prefix configurable via config.yaml Reworked from PR NousResearch#1764 (ifrederico) to use config.yaml instead of .env. The WhatsApp bridge prepends a header to every outgoing message. This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize or disable it via config.yaml: whatsapp: reply_prefix: '' # disable header reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix How it works: - load_gateway_config() reads whatsapp.reply_prefix from config.yaml and stores it in PlatformConfig.extra['reply_prefix'] - WhatsAppAdapter reads it from config.extra at init - When spawning bridge.js, the adapter passes it as WHATSAPP_REPLY_PREFIX in the subprocess environment - bridge.js handles undefined (default), empty (no header), or custom values with \\n escape support - Self-chat echo suppression uses the configured prefix Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a key 10 (TAVILY_API_KEY), so existing users at v9 would never be prompted for Tavily. Bumped to 10 to close the gap. Added a regression test to prevent this from happening again. Credit: ifrederico (PR NousResearch#1764) for the bridge.js implementation and the config version gap discovery. --------- Co-authored-by: Test <test@test.com>

Salvaged from PR NousResearch#956 (teknium1) onto current main. Adds an OpenAI-compatible HTTP server as a new gateway platform adapter. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc. — can connect to hermes-agent by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless chat (full conversation per request) - POST /v1/responses — stateful via previous_response_id or named conversations - GET/DELETE /v1/responses/{id} — retrieve/delete stored responses - GET /v1/models — model discovery - GET /health — health check Key features: - Real SSE streaming via stream_delta_callback - Responses API with server-side conversation state (in-memory LRU, max 100) - Named conversations via 'conversation' parameter - System prompt layering (frontend prompts add to core agent prompt) - Bearer token auth (optional, via HTTP_SERVER_KEY) - CORS support for browser-based frontends - Binds to 127.0.0.1 by default (secure) - Uses aiohttp (existing dependency, no new deps) Files: - gateway/platforms/http_server.py — adapter implementation - gateway/config.py — Platform.HTTP_SERVER enum + env var overrides - gateway/run.py — adapter factory branch - toolsets.py — hermes-http_server toolset - hermes_cli/config.py — setup env vars - .env.example — HTTP server env vars - tests/gateway/test_http_server.py — 72 tests - website/docs/ — HTTP server guide, Open WebUI integration guide Closes NousResearch#956

Salvaged from PR NousResearch#956, updated for current main. Adds an HTTP API server as a gateway platform adapter that exposes hermes-agent via the OpenAI Chat Completions and Responses APIs. Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless Chat Completions API - POST /v1/responses — stateful Responses API with chaining - GET /v1/responses/{id} — retrieve stored response - DELETE /v1/responses/{id} — delete stored response - GET /v1/models — list hermes-agent as available model - GET /health — health check Features: - Real SSE streaming via stream_delta_callback (uses main's streaming) - In-memory LRU response store for Responses API conversation chaining - Named conversations via 'conversation' parameter - Bearer token auth (optional, via API_SERVER_KEY) - CORS support for browser-based frontends - System prompt layering (frontend system messages on top of core) - Real token usage tracking in responses Integration points: - Platform.API_SERVER in gateway/config.py - _create_adapter() branch in gateway/run.py - API_SERVER_* env vars in hermes_cli/config.py - Env var overrides in gateway/config.py _apply_env_overrides() Changes vs original PR NousResearch#956: - Removed streaming infrastructure (already on main via stream_consumer.py) - Removed Telegram reply_to_mode (separate feature, not included) - Updated _resolve_model() -> _resolve_gateway_model() - Updated stream_callback -> stream_delta_callback - Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected() - Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK) Tests: 72 new tests, all passing Docs: API server guide, Open WebUI integration guide, env var reference

…ix (NousResearch#1756) * feat: OpenAI-compatible API server platform adapter Salvaged from PR NousResearch#956, updated for current main. Adds an HTTP API server as a gateway platform adapter that exposes hermes-agent via the OpenAI Chat Completions and Responses APIs. Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless Chat Completions API - POST /v1/responses — stateful Responses API with chaining - GET /v1/responses/{id} — retrieve stored response - DELETE /v1/responses/{id} — delete stored response - GET /v1/models — list hermes-agent as available model - GET /health — health check Features: - Real SSE streaming via stream_delta_callback (uses main's streaming) - In-memory LRU response store for Responses API conversation chaining - Named conversations via 'conversation' parameter - Bearer token auth (optional, via API_SERVER_KEY) - CORS support for browser-based frontends - System prompt layering (frontend system messages on top of core) - Real token usage tracking in responses Integration points: - Platform.API_SERVER in gateway/config.py - _create_adapter() branch in gateway/run.py - API_SERVER_* env vars in hermes_cli/config.py - Env var overrides in gateway/config.py _apply_env_overrides() Changes vs original PR NousResearch#956: - Removed streaming infrastructure (already on main via stream_consumer.py) - Removed Telegram reply_to_mode (separate feature, not included) - Updated _resolve_model() -> _resolve_gateway_model() - Updated stream_callback -> stream_delta_callback - Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected() - Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK) Tests: 72 new tests, all passing Docs: API server guide, Open WebUI integration guide, env var reference * feat(whatsapp): make reply prefix configurable via config.yaml Reworked from PR NousResearch#1764 (ifrederico) to use config.yaml instead of .env. The WhatsApp bridge prepends a header to every outgoing message. This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize or disable it via config.yaml: whatsapp: reply_prefix: '' # disable header reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix How it works: - load_gateway_config() reads whatsapp.reply_prefix from config.yaml and stores it in PlatformConfig.extra['reply_prefix'] - WhatsAppAdapter reads it from config.extra at init - When spawning bridge.js, the adapter passes it as WHATSAPP_REPLY_PREFIX in the subprocess environment - bridge.js handles undefined (default), empty (no header), or custom values with \\n escape support - Self-chat echo suppression uses the configured prefix Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a key 10 (TAVILY_API_KEY), so existing users at v9 would never be prompted for Tavily. Bumped to 10 to close the gap. Added a regression test to prevent this from happening again. Credit: ifrederico (PR NousResearch#1764) for the bridge.js implementation and the config version gap discovery. --------- Co-authored-by: Test <test@test.com>

teknium1 added 8 commits March 11, 2026 08:53

feat: add OpenAI-compatible API server platform adapter (Phase 1)

58dc5c4

Cherry-picked from PR #828, rebased onto current main with conflict resolution.

feat: enhance Responses API — retrieval, deletion, tool calls, usage,…

7d771c2

… CORS Cherry-picked from PR #828.

feat: add conversation parameter + named session chaining

7ae208b

Cherry-picked from PR #828.

feat: add pseudo-streaming SSE + conversation parameter

b3c798d

Cherry-picked from PR #828.

docs: add Open WebUI integration guide

b2a4092

Cherry-picked from PR #828.

feat: add streaming LLM response support across all platforms

95d221c

Cherry-picked from PR #828, resolved conflicts with main.

docs: comprehensive documentation for API server, streaming, and Open…

d54280e

… WebUI Cherry-picked from PR #828, resolved conflicts with main.

teknium1 mentioned this pull request Mar 11, 2026

feat: OpenAI-compatible API server — Chat Completions + Responses API #828

Closed

teknium1 mentioned this pull request Mar 11, 2026

feat(gateway): Telegram reply threading modes + forum topic fix #855

Closed

3 tasks

docs: add reply threading mode section to Telegram docs

79b3d36

This was referenced Mar 16, 2026

feat: unified streaming infrastructure — real-time token delivery for CLI + gateway #1538

Merged

Add HTTP API platform adapter #1164

Closed

teknium1 mentioned this pull request Mar 17, 2026

feat: OpenAI-compatible HTTP server platform adapter #1745

Closed

teknium1 mentioned this pull request Mar 17, 2026

feat: OpenAI-compatible API server platform adapter #1756

Merged

teknium1 closed this in #1756 Mar 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: OpenAI-compatible API server + streaming support#956

feat: OpenAI-compatible API server + streaming support#956
teknium1 wants to merge 10 commits into
mainfrom
hermes/hermes-106e92b2

teknium1 commented Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

teknium1 commented Mar 11, 2026

Summary

What this enables

Endpoints

Key features

Improvements over PR #828

Documentation

Test results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants