fix(api-server): emit tool.completed lifecycle SSE for chat completions (#16588) by briandevans · Pull Request #16666 · NousResearch/hermes-agent

briandevans · 2026-04-27T18:19:00Z

Summary

Add tool_start_callback / tool_complete_callback to the chat completions agent invocation so SSE consumers receive a matching status: completed event tied to the same toolCallId as the running event.
Emit the lifecycle pair on the existing event: hermes.tool.progress line; payload now carries toolCallId + status (running / completed) alongside the legacy emoji/label event.
Filter internal tools (names starting with _) and drop orphan completed callbacks that have no prior running so clients never see uncorrelatable lifecycle updates.
Adds two regression tests in tests/gateway/test_api_server.py: one asserts both halves of the lifecycle pair land on the wire with matching toolCallId; the other proves filtering handles internal tools and orphan completes.

The bug

/v1/chat/completions streaming emits event: hermes.tool.progress for tool start, but never a matching completed event with the exact toolCallId. Frontends rendering tool cards stay stuck in a "running" state until the full assistant response ends or they guess completion locally. The /v1/responses path already does this correctly via tool_start_callback / tool_complete_callback (api_server.py:1797-1829) — only chat completions was missing it.

The fix

Define _on_tool_start and _on_tool_complete in the chat completions handler that push tagged ("__tool_lifecycle__", payload) items to the existing stream queue. The SSE writer's _emit is extended to recognize the new tag on the same event: hermes.tool.progress SSE line, so no new event type is introduced — clients consuming the legacy emoji/label event keep working unchanged. A small _started_tool_call_ids set tracks running calls so a completed callback for a filtered-out internal tool, or one without a prior matching start, is silently dropped.

Test plan

Focused regression test test_stream_emits_tool_lifecycle_with_call_id: parses hermes.tool.progress payloads from the SSE body and asserts running + completed arrive in order with the same toolCallId.
Edge-case test test_stream_tool_lifecycle_skips_internal_and_orphan_completes: internal tool starts and orphan completes (no matching start) produce no lifecycle events on the wire.
Adjacent suite: tests/gateway/test_api_server.py, tests/gateway/test_api_server_runs.py, tests/gateway/test_api_server_multimodal.py — 159/159 pass.
Regression guard: with gateway/platforms/api_server.py reverted to origin/main, the new lifecycle test fails with missing running status; got []; with the fix applied it passes.

Fixes [Bug]: chat completions SSE does not emit tool completion progress #16588
Mirrors the existing structured-callback pattern used on the /v1/responses path (gateway/platforms/api_server.py:1797-1829).
Backward-compatible with the legacy emoji/label __tool_progress__ event introduced in [Bug]: Tool progress markers in SSE content corrupt model behavior over time (Open WebUI / OpenAI-compatible API) #6972.

…ns (NousResearch#16588) `/v1/chat/completions` streaming previously emitted only a ``tool.started``-style ``hermes.tool.progress`` SSE event for tool calls. Clients rendering tool lifecycle UI (Open WebUI, LobeChat, custom dashboards) had no matching ``status: completed`` event with a stable ``toolCallId``, so tool cards stayed in a "running" state until the full assistant response ended or the client guessed completion locally. Pass ``tool_start_callback`` / ``tool_complete_callback`` (the same pair already used by the ``/v1/responses`` path for exact call-id correlation) into ``_run_agent`` from chat completions, and emit each half on the existing ``event: hermes.tool.progress`` SSE line with ``toolCallId`` and ``status``. Internal tools (names starting with ``_``) and orphan ``completed`` events without a matching ``running`` are filtered out so clients never see uncorrelatable lifecycle updates. The pre-existing emoji/label ``__tool_progress__`` event continues unchanged for backward compatibility — frontends that already render the running marker keep working, and frontends that need lifecycle pairs now get them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR fixes /v1/chat/completions streaming so SSE consumers can reliably track tool lifecycles by emitting a matching status: completed event correlated via a stable toolCallId, aligning chat-completions behavior with the existing structured callbacks used elsewhere in the API server.

Changes:

Wire tool_start_callback / tool_complete_callback into the chat-completions streaming agent invocation.
Extend the SSE writer to emit tagged tool lifecycle payloads on event: hermes.tool.progress.
Add regression tests asserting lifecycle emission and filtering of internal/orphan lifecycle events.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`gateway/platforms/api_server.py`	Adds lifecycle callbacks for chat-completions streaming and teaches SSE writer to emit lifecycle payloads on `hermes.tool.progress`.
`tests/gateway/test_api_server.py`	Adds regression tests for lifecycle `running/completed` emission, and for filtering internal tools/orphan completes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-27T18:22:33Z

+                    "toolCallId": tool_call_id,
+                    "tool": function_name,
+                    "status": "completed",
+                }))


/v1/chat/completions streaming will now emit two event: hermes.tool.progress messages for every real tool start: one from tool_progress_callback ("tool_progress") and another from tool_start_callback ("tool_lifecycle"). This happens because run_agent.py fires both callbacks for each tool call. This duplication is likely to confuse SSE consumers (and it doesn’t match the PR description’s “payload now carries toolCallId + status alongside the legacy emoji/label event”). Consider emitting a single “running” payload that includes the legacy emoji/label fields plus toolCallId/status, or suppressing the legacy __tool_progress__ emission when lifecycle callbacks are enabled.

Copilot · 2026-04-27T18:22:34Z

+            statuses = []
+            seen_call_ids = set()
+            lines = body.splitlines()
+            for i, line in enumerate(lines):
+                if line.strip() != "event: hermes.tool.progress":
+                    continue
+                # Find the next non-blank ``data:`` line for this event.
+                for follow in lines[i + 1: i + 4]:
+                    if follow.startswith("data: "):
+                        try:
+                            payload = _json.loads(follow[len("data: "):])
+                        except _json.JSONDecodeError:
+                            break
+                        status = payload.get("status")
+                        if status:
+                            statuses.append(status)
+                        if "toolCallId" in payload:
+                            seen_call_ids.add(payload["toolCallId"])
+                        break
+
+            # Both halves of the lifecycle pair must be emitted, in order,
+            # and tied to the same toolCallId — that's the contract the
+            # issue asks for.
+            assert "running" in statuses, f"missing running status; got {statuses}"
+            assert "completed" in statuses, f"missing completed status; got {statuses}"
+            assert statuses.index("running") < statuses.index("completed")
+            assert seen_call_ids == {"call_terminal_1"}, seen_call_ids


This test intends to assert that both lifecycle halves are correlated by the same toolCallId, but it only checks seen_call_ids == {"call_terminal_1"} overall. That would still pass if (for example) the running payload omitted toolCallId while only completed included it. Consider collecting (status, toolCallId) pairs and asserting both running and completed events each carry toolCallId == "call_terminal_1" (and in the right order).

alt-glitch · 2026-04-27T18:26:21Z

Related to #16591 (earlier fix attempt for same issue #16588). This PR appears more complete with filtering for internal tools and orphan completed events.

Address Copilot review on PR NousResearch#16666: 1. **Duplicate event on every tool start** — both ``tool_progress_callback`` and ``tool_start_callback`` fire side-by-side in ``run_agent.py``, so wiring both into chat completions emitted *two* ``hermes.tool.progress`` events per real tool call. Drop the legacy ``_on_tool_progress`` emit entirely; ``_on_tool_start`` now produces a single unified event that carries the legacy ``tool``/``emoji``/``label`` fields plus the new ``toolCallId``/``status`` correlation fields. Label is computed inline via ``build_tool_preview`` so callers do not need to pre-format it. 2. **Weak per-event correlation in the regression test** — the previous assertion checked that a ``toolCallId`` appeared *somewhere* in the aggregate, which would have passed even if ``running`` lacked the id. Collect ``(status, toolCallId)`` per event and assert each event carries the correct pair, plus exactly two events on the wire (no silent duplication regression). The two existing chat-completions tool-progress tests are updated to fire ``tool_start_callback`` instead of ``tool_progress_callback``, matching production reality where ``run_agent`` always pairs them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

briandevans · 2026-04-27T18:26:51Z

@copilot All findings addressed in commit 3dd69f9:

Duplicate hermes.tool.progress on every tool start — collapsed to a single emit. Dropped _on_tool_progress entirely from the chat completions path; _on_tool_start now produces one unified hermes.tool.progress event carrying the legacy tool/emoji/label payload and the new toolCallId/status correlation fields. tool_progress_callback is no longer wired into the chat completions _run_agent invocation, so the structured callbacks are the sole source for SSE tool lifecycle events. The two pre-existing tests were updated to fire tool_start_callback instead of tool_progress_callback, matching production reality where run_agent pairs them 1:1.
Weak per-event correlation — test_stream_emits_tool_lifecycle_with_call_id now collects (status, toolCallId) per event and asserts each event carries the correct pair plus exactly two events on the wire (assert pairs[0] == ("running", "call_terminal_1") / assert pairs[1] == ("completed", "call_terminal_1")). A regression that omitted toolCallId from the running payload would now fail loudly.

briandevans · 2026-04-27T18:37:47Z

CI audit — all 16 `test` job failures are pre-existing baselines on clean `origin/main` (6d24880). Zero failures are in touched code.

Test	Symptom	Baseline?
`tests/gateway/test_run_progress_topics.py::test_run_agent_progress_uses_event_message_id_for_slack_dm`	`TypeError: NoneType not callable` on `tool_progress_callback`	reproduces on origin/main locally
`tests/hermes_cli/test_config_env_expansion.py::TestLoadConfigExpansion::test_load_config_*`	`TypeError: string indices must be integers, not str`	unrelated to api_server
`tests/hermes_cli/test_container_aware_cli.py::test_get_container_exec_info_defaults`	`assert None is not None`	unrelated
`tests/gateway/test_agent_cache.py::TestAgentCacheSpilloverLive::test_concurrent_inserts_settle_at_cap`	30s test timeout	unrelated, flaky
`tests/hermes_cli/test_pty_bridge.py::TestPtyBridgeResize::test_resize_updates_child_winsize`	`tput: No value for $TERM` in CI runner	env-specific
`tests/hermes_cli/test_web_server.py::TestBuildSchemaFromConfig::test_no_single_field_categories`	`prompt_caching` category has 1 field	upstream config schema regression
`tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_resize_escape_is_forwarded`	30s timeout	unrelated, flaky
`tests/run_agent/test_background_review_toolset_restriction.py::test_background_review_agent_uses_restricted_toolsets`	`AIAgent.init was not called`	unrelated
`tests/run_agent/test_tool_arg_coercion.py::TestCoerceNumber::test_inf_stays_string_for_integer_only`	`assert "inf" == inf`	reproduces on origin/main locally
`tests/tools/test_clipboard.py::TestIsWsl::*`	WSL detection on non-WSL CI	env-specific
`tests/tui_gateway/test_make_agent_provider.py::test_make_agent_passes_resolved_provider`	unexpected `target_model` kwarg	reproduces on origin/main locally
`tests/tui_gateway/test_protocol.py::test_session_resume_returns_hydrated_messages`	unexpected `include_ancestors` kwarg	reproduces on origin/main locally

Touched code (`tests/gateway/test_api_server.py::TestChatCompletionsEndpoint`) — all 17 tests pass locally, including the 4 progress-event tests both before and after the fix-up commit.

The `test` job is failing on baseline; my regression test `test_stream_emits_tool_lifecycle_with_call_id` passes. The `nix (ubuntu-latest)` failure tracks the same failed pytest run.

Address Copilot review on PR #16666: 1. **Duplicate event on every tool start** — both ``tool_progress_callback`` and ``tool_start_callback`` fire side-by-side in ``run_agent.py``, so wiring both into chat completions emitted *two* ``hermes.tool.progress`` events per real tool call. Drop the legacy ``_on_tool_progress`` emit entirely; ``_on_tool_start`` now produces a single unified event that carries the legacy ``tool``/``emoji``/``label`` fields plus the new ``toolCallId``/``status`` correlation fields. Label is computed inline via ``build_tool_preview`` so callers do not need to pre-format it. 2. **Weak per-event correlation in the regression test** — the previous assertion checked that a ``toolCallId`` appeared *somewhere* in the aggregate, which would have passed even if ``running`` lacked the id. Collect ``(status, toolCallId)`` per event and assert each event carries the correct pair, plus exactly two events on the wire (no silent duplication regression). The two existing chat-completions tool-progress tests are updated to fire ``tool_start_callback`` instead of ``tool_progress_callback``, matching production reality where ``run_agent`` always pairs them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

teknium1 · 2026-04-29T15:08:36Z

Merged via PR #17505 — your commit was cherry-picked with authorship preserved (commit e0a03f3 on main). Thanks for the rigorous regression tests and the _started_tool_call_ids orphan filter.

Address Copilot review on PR NousResearch#16666: 1. **Duplicate event on every tool start** — both ``tool_progress_callback`` and ``tool_start_callback`` fire side-by-side in ``run_agent.py``, so wiring both into chat completions emitted *two* ``hermes.tool.progress`` events per real tool call. Drop the legacy ``_on_tool_progress`` emit entirely; ``_on_tool_start`` now produces a single unified event that carries the legacy ``tool``/``emoji``/``label`` fields plus the new ``toolCallId``/``status`` correlation fields. Label is computed inline via ``build_tool_preview`` so callers do not need to pre-format it. 2. **Weak per-event correlation in the regression test** — the previous assertion checked that a ``toolCallId`` appeared *somewhere* in the aggregate, which would have passed even if ``running`` lacked the id. Collect ``(status, toolCallId)`` per event and assert each event carries the correct pair, plus exactly two events on the wire (no silent duplication regression). The two existing chat-completions tool-progress tests are updated to fire ``tool_start_callback`` instead of ``tool_progress_callback``, matching production reality where ``run_agent`` always pairs them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 27, 2026 18:19

Copilot started reviewing on behalf of briandevans April 27, 2026 18:19 View session

Copilot AI reviewed Apr 27, 2026

View reviewed changes

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery labels Apr 27, 2026

teknium1 mentioned this pull request Apr 29, 2026

fix(api-server): emit tool.completed lifecycle SSE for chat completions (#16588) #17505

Merged

teknium1 closed this Apr 29, 2026

Schwartz10 mentioned this pull request Jun 4, 2026

[Feature]: Stream correlated tool lifecycle events from session chat endpoint #38994

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(api-server): emit tool.completed lifecycle SSE for chat completions (#16588)#16666

fix(api-server): emit tool.completed lifecycle SSE for chat completions (#16588)#16666
briandevans wants to merge 2 commits into
NousResearch:mainfrom
briandevans:fix/api-server-tool-completion-sse-16588

briandevans commented Apr 27, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 27, 2026

Uh oh!

Copilot AI Apr 27, 2026

Uh oh!

alt-glitch commented Apr 27, 2026

Uh oh!

briandevans commented Apr 27, 2026

Uh oh!

briandevans commented Apr 27, 2026

Uh oh!

teknium1 commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

briandevans commented Apr 27, 2026

Summary

The bug

The fix

Test plan

Related

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

alt-glitch commented Apr 27, 2026

Uh oh!

briandevans commented Apr 27, 2026

Uh oh!

briandevans commented Apr 27, 2026

Uh oh!

teknium1 commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants