Skip to content

fix(gateway): webhook delivery_info persistence + full session id in /status#5942

Merged
teknium1 merged 2 commits into
mainfrom
hermes/hermes-ef25015a
Apr 8, 2026
Merged

fix(gateway): webhook delivery_info persistence + full session id in /status#5942
teknium1 merged 2 commits into
mainfrom
hermes/hermes-ef25015a

Conversation

@teknium1

@teknium1 teknium1 commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

Summary

Salvages two contributor PRs onto current main:

1. Webhook delivery_info persistence (PR #5864 by @jescalan)

When the gateway sends interim messages (fallback-model notifications, context-pressure warnings) before the final response, .pop() on delivery_info consumed the entry on the first interim message. The actual final response then found no delivery_info and silently fell through to "log" delivery — effectively lost.

Fix: .pop().get() so all send() calls read the same delivery_info. Adds TTL-based cleanup (_prune_delivery_info) to prevent memory leak.

2. Full session ID + title in /status (PR #5912 by @xinbenlv)

Session ID was truncated to 12 chars, making it impossible to copy/paste for /resume. Now shows the full ID and the session title when available.

Test plan

  • 44 tests pass (webhook adapter, webhook integration, status command)
  • Cherry-picked cleanly onto current main, no conflicts

Closes #5864, closes #5912

jescalan and others added 2 commits April 7, 2026 14:15
The webhook adapter stored per-request `deliver`/`deliver_extra` config in
`_delivery_info[chat_id]` during POST handling and consumed it via `.pop()`
inside `send()`. That worked for routes whose agent run produced exactly
one outbound message — the final response — but it broke whenever the
agent emitted any interim status message before the final response.

Status messages flow through the same `send(chat_id, ...)` path as the
final response (see `gateway/run.py::_status_callback_sync` →
`adapter.send(...)`). Common triggers include:

  - "🔄 Primary model failed — switching to fallback: ..."
    (run_agent.py::_emit_status when `fallback_providers` activates)
  - context-pressure / compression notices
  - any other lifecycle event routed through `status_callback`

When any of those fired, the first `send()` call popped the entry, so the
subsequent final-response `send()` saw an empty dict and silently
downgraded `deliver_type` from `"telegram"` (or `discord`/`slack`/etc.) to
the default `"log"`. The agent's response was logged to the gateway log
instead of being delivered to the configured cross-platform target — no
warning, no error, just a missing message.

This was easy to hit in practice. Any user with `fallback_providers`
configured saw it the first time their primary provider hiccuped on a
webhook-triggered run. Routes that worked perfectly in dev (where the
primary stays healthy) silently dropped responses in prod.

Fix: read `_delivery_info` with `.get()` so multiple `send()` calls for
the same `chat_id` all see the same delivery config. To keep the dict
bounded without relying on per-send cleanup, add a parallel
`_delivery_info_created` timestamp dict and a `_prune_delivery_info()`
helper that drops entries older than `_idempotency_ttl` (1h, same window
already used by `_seen_deliveries`). Pruning runs on each POST, mirroring
the existing `_seen_deliveries` cleanup pattern.

Worst-case memory footprint is now `rate_limit * TTL = 30/min * 60min =
1800` entries, each ~1KB → under 2 MB. In practice it'll be far smaller
because most webhooks complete in seconds, not the full hour.

Test changes:
  - `test_delivery_info_cleaned_after_send` is replaced with
    `test_delivery_info_survives_multiple_sends`, which is now the
    regression test for this bug — it asserts that two consecutive
    `send()` calls both see the delivery config.
  - A new `test_delivery_info_pruned_via_ttl` covers the TTL cleanup
    behavior.
  - The two integration tests that asserted `chat_id not in
    adapter._delivery_info` after `send()` now assert the opposite, with
    a comment explaining why.

All 40 tests in `tests/gateway/test_webhook_adapter.py` and
`tests/gateway/test_webhook_integration.py` pass. Verified end-to-end
locally against a dynamic `hermes webhook subscribe` route configured
with `--deliver telegram --deliver-chat-id <user>`: with `gpt-5.4` as
the primary (currently flaky) and `claude-opus-4.6` as the fallback,
the fallback notification fires, the agent finishes, and the final
response is delivered to Telegram as expected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants