Skip to content

fix(gateway/webhook): don't pop delivery_info on send#5864

Closed
jescalan wants to merge 1 commit into
NousResearch:mainfrom
jescalan:fix/webhook-delivery-info-pop
Closed

fix(gateway/webhook): don't pop delivery_info on send#5864
jescalan wants to merge 1 commit into
NousResearch:mainfrom
jescalan:fix/webhook-delivery-info-pop

Conversation

@jescalan

@jescalan jescalan commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

Summary

The webhook adapter silently downgrades the configured deliver type to "log" whenever the agent emits any interim status message before the final response. The most common trigger is fallback-model activation:

[webhook] POST event=unknown route=agentmail-email delivery=...
[webhook] Response for ...: 🔄 Primary model failed — switching to fallback: claude-opus-4.6 via anthropic
gateway.run: response ready: platform=webhook ... response=912 chars
[Webhook] Sending response (912 chars) to webhook:agentmail-email:...
[webhook] Response for ...: <agent's actual response>     ← logged, never delivered

The user configured --deliver telegram --deliver-chat-id <id>, but the response lands in the gateway log instead of Telegram. No warning, no error.

Root cause

gateway/platforms/webhook.py::send() consumed _delivery_info[chat_id] via .pop():

delivery = self._delivery_info.pop(chat_id, {})
deliver_type = delivery.get("deliver", "log")

That works when the agent run produces exactly one outbound message — the final response. But interim status messages flow through the same send() path. When run_agent.py::_emit_status() fires (e.g. on fallback activation, context-pressure warnings, or any other lifecycle event routed through status_callback), the gateway calls adapter.send(chat_id, message, ...) for the same chat_id. That first call pops the entry. The subsequent final-response send() call sees an empty dict, falls back to deliver_type = "log", and writes the response to the gateway log instead of crossing over to the configured platform.

This is easy to hit in practice. Anyone with fallback_providers configured runs into it the first time their primary provider hiccups on a webhook-triggered run. Routes that work perfectly in dev silently drop responses in prod.

The bug is not specific to fallback notifications — it affects any code path that emits an interim status message before the final response.

Fix

  1. Read _delivery_info with .get() instead of .pop() so multiple send() calls for the same chat_id all see the same delivery config.
  2. Add a parallel _delivery_info_created: Dict[str, float] timestamp dict.
  3. Add _prune_delivery_info(now) that drops entries older than _idempotency_ttl (1h, the same window already used by _seen_deliveries).
  4. Call _prune_delivery_info() on each POST, mirroring the existing cleanup pattern for _seen_deliveries.

Worst-case bound on _delivery_info size is now rate_limit * TTL = 30/min * 60min = 1800 entries, each ~1 KB → under 2 MB. In practice it'll be much smaller because most webhooks complete in seconds.

Tests

  • test_delivery_info_cleaned_after_send is replaced with test_delivery_info_survives_multiple_sends — the regression test for this bug. It asserts that two consecutive send() calls for the same chat_id both see the delivery config.
  • New test_delivery_info_pruned_via_ttl covers the TTL cleanup behavior.
  • Two integration tests in test_webhook_integration.py that previously asserted chat_id not in adapter._delivery_info after send() now assert the opposite, with a comment explaining why.

All 40 tests in tests/gateway/test_webhook_adapter.py and tests/gateway/test_webhook_integration.py pass:

$ pytest tests/gateway/test_webhook_adapter.py tests/gateway/test_webhook_integration.py -q
........................................                                 [100%]
40 passed in 0.89s

End-to-end verification

Reproduced and verified locally with a dynamic subscription:

hermes webhook subscribe agentmail-email \
  --prompt "{text}\n\nReview this email and decide what to do..." \
  --deliver telegram \
  --deliver-chat-id <user-id> \
  --secret INSECURE_NO_AUTH

With gpt-5.4 as primary (currently flaky in my environment) and claude-opus-4.6 as fallback:

  • Before the fix: every POST hit the fallback path, the fallback notification consumed _delivery_info, and the final response landed in the gateway log. Zero Telegram messages delivered.
  • After the fix: same conditions, fallback still fires, and the final response is delivered to Telegram as expected.

Notes

  • Behavior change is intentional: _delivery_info entries now live until TTL expiry instead of being consumed on send. This is consistent with the existing _seen_deliveries lifecycle and is what the cross-platform delivery contract actually requires.
  • No public API changes.
  • No config changes.

The webhook adapter stored per-request `deliver`/`deliver_extra` config in
`_delivery_info[chat_id]` during POST handling and consumed it via `.pop()`
inside `send()`. That worked for routes whose agent run produced exactly
one outbound message — the final response — but it broke whenever the
agent emitted any interim status message before the final response.

Status messages flow through the same `send(chat_id, ...)` path as the
final response (see `gateway/run.py::_status_callback_sync` →
`adapter.send(...)`). Common triggers include:

  - "🔄 Primary model failed — switching to fallback: ..."
    (run_agent.py::_emit_status when `fallback_providers` activates)
  - context-pressure / compression notices
  - any other lifecycle event routed through `status_callback`

When any of those fired, the first `send()` call popped the entry, so the
subsequent final-response `send()` saw an empty dict and silently
downgraded `deliver_type` from `"telegram"` (or `discord`/`slack`/etc.) to
the default `"log"`. The agent's response was logged to the gateway log
instead of being delivered to the configured cross-platform target — no
warning, no error, just a missing message.

This was easy to hit in practice. Any user with `fallback_providers`
configured saw it the first time their primary provider hiccuped on a
webhook-triggered run. Routes that worked perfectly in dev (where the
primary stays healthy) silently dropped responses in prod.

Fix: read `_delivery_info` with `.get()` so multiple `send()` calls for
the same `chat_id` all see the same delivery config. To keep the dict
bounded without relying on per-send cleanup, add a parallel
`_delivery_info_created` timestamp dict and a `_prune_delivery_info()`
helper that drops entries older than `_idempotency_ttl` (1h, same window
already used by `_seen_deliveries`). Pruning runs on each POST, mirroring
the existing `_seen_deliveries` cleanup pattern.

Worst-case memory footprint is now `rate_limit * TTL = 30/min * 60min =
1800` entries, each ~1KB → under 2 MB. In practice it'll be far smaller
because most webhooks complete in seconds, not the full hour.

Test changes:
  - `test_delivery_info_cleaned_after_send` is replaced with
    `test_delivery_info_survives_multiple_sends`, which is now the
    regression test for this bug — it asserts that two consecutive
    `send()` calls both see the delivery config.
  - A new `test_delivery_info_pruned_via_ttl` covers the TTL cleanup
    behavior.
  - The two integration tests that asserted `chat_id not in
    adapter._delivery_info` after `send()` now assert the opposite, with
    a comment explaining why.

All 40 tests in `tests/gateway/test_webhook_adapter.py` and
`tests/gateway/test_webhook_integration.py` pass. Verified end-to-end
locally against a dynamic `hermes webhook subscribe` route configured
with `--deliver telegram --deliver-chat-id <user>`: with `gpt-5.4` as
the primary (currently flaky) and `claude-opus-4.6` as the fallback,
the fallback notification fires, the agent finishes, and the final
response is delivered to Telegram as expected.
@teknium1

teknium1 commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

Merged via PR #5942. Your commit was cherry-picked onto current main with authorship preserved in git log. Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants