fix(gateway/webhook): don't pop delivery_info on send#5864
Closed
jescalan wants to merge 1 commit into
Closed
Conversation
The webhook adapter stored per-request `deliver`/`deliver_extra` config in
`_delivery_info[chat_id]` during POST handling and consumed it via `.pop()`
inside `send()`. That worked for routes whose agent run produced exactly
one outbound message — the final response — but it broke whenever the
agent emitted any interim status message before the final response.
Status messages flow through the same `send(chat_id, ...)` path as the
final response (see `gateway/run.py::_status_callback_sync` →
`adapter.send(...)`). Common triggers include:
- "🔄 Primary model failed — switching to fallback: ..."
(run_agent.py::_emit_status when `fallback_providers` activates)
- context-pressure / compression notices
- any other lifecycle event routed through `status_callback`
When any of those fired, the first `send()` call popped the entry, so the
subsequent final-response `send()` saw an empty dict and silently
downgraded `deliver_type` from `"telegram"` (or `discord`/`slack`/etc.) to
the default `"log"`. The agent's response was logged to the gateway log
instead of being delivered to the configured cross-platform target — no
warning, no error, just a missing message.
This was easy to hit in practice. Any user with `fallback_providers`
configured saw it the first time their primary provider hiccuped on a
webhook-triggered run. Routes that worked perfectly in dev (where the
primary stays healthy) silently dropped responses in prod.
Fix: read `_delivery_info` with `.get()` so multiple `send()` calls for
the same `chat_id` all see the same delivery config. To keep the dict
bounded without relying on per-send cleanup, add a parallel
`_delivery_info_created` timestamp dict and a `_prune_delivery_info()`
helper that drops entries older than `_idempotency_ttl` (1h, same window
already used by `_seen_deliveries`). Pruning runs on each POST, mirroring
the existing `_seen_deliveries` cleanup pattern.
Worst-case memory footprint is now `rate_limit * TTL = 30/min * 60min =
1800` entries, each ~1KB → under 2 MB. In practice it'll be far smaller
because most webhooks complete in seconds, not the full hour.
Test changes:
- `test_delivery_info_cleaned_after_send` is replaced with
`test_delivery_info_survives_multiple_sends`, which is now the
regression test for this bug — it asserts that two consecutive
`send()` calls both see the delivery config.
- A new `test_delivery_info_pruned_via_ttl` covers the TTL cleanup
behavior.
- The two integration tests that asserted `chat_id not in
adapter._delivery_info` after `send()` now assert the opposite, with
a comment explaining why.
All 40 tests in `tests/gateway/test_webhook_adapter.py` and
`tests/gateway/test_webhook_integration.py` pass. Verified end-to-end
locally against a dynamic `hermes webhook subscribe` route configured
with `--deliver telegram --deliver-chat-id <user>`: with `gpt-5.4` as
the primary (currently flaky) and `claude-opus-4.6` as the fallback,
the fallback notification fires, the agent finishes, and the final
response is delivered to Telegram as expected.
Contributor
|
Merged via PR #5942. Your commit was cherry-picked onto current main with authorship preserved in git log. Thanks for the fix! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The webhook adapter silently downgrades the configured
delivertype to"log"whenever the agent emits any interim status message before the final response. The most common trigger is fallback-model activation:The user configured
--deliver telegram --deliver-chat-id <id>, but the response lands in the gateway log instead of Telegram. No warning, no error.Root cause
gateway/platforms/webhook.py::send()consumed_delivery_info[chat_id]via.pop():That works when the agent run produces exactly one outbound message — the final response. But interim status messages flow through the same
send()path. Whenrun_agent.py::_emit_status()fires (e.g. on fallback activation, context-pressure warnings, or any other lifecycle event routed throughstatus_callback), the gateway callsadapter.send(chat_id, message, ...)for the same chat_id. That first call pops the entry. The subsequent final-responsesend()call sees an empty dict, falls back todeliver_type = "log", and writes the response to the gateway log instead of crossing over to the configured platform.This is easy to hit in practice. Anyone with
fallback_providersconfigured runs into it the first time their primary provider hiccups on a webhook-triggered run. Routes that work perfectly in dev silently drop responses in prod.The bug is not specific to fallback notifications — it affects any code path that emits an interim status message before the final response.
Fix
_delivery_infowith.get()instead of.pop()so multiplesend()calls for the samechat_idall see the same delivery config._delivery_info_created: Dict[str, float]timestamp dict._prune_delivery_info(now)that drops entries older than_idempotency_ttl(1h, the same window already used by_seen_deliveries)._prune_delivery_info()on each POST, mirroring the existing cleanup pattern for_seen_deliveries.Worst-case bound on
_delivery_infosize is nowrate_limit * TTL = 30/min * 60min = 1800entries, each ~1 KB → under 2 MB. In practice it'll be much smaller because most webhooks complete in seconds.Tests
test_delivery_info_cleaned_after_sendis replaced withtest_delivery_info_survives_multiple_sends— the regression test for this bug. It asserts that two consecutivesend()calls for the samechat_idboth see the delivery config.test_delivery_info_pruned_via_ttlcovers the TTL cleanup behavior.test_webhook_integration.pythat previously assertedchat_id not in adapter._delivery_infoaftersend()now assert the opposite, with a comment explaining why.All 40 tests in
tests/gateway/test_webhook_adapter.pyandtests/gateway/test_webhook_integration.pypass:End-to-end verification
Reproduced and verified locally with a dynamic subscription:
With
gpt-5.4as primary (currently flaky in my environment) andclaude-opus-4.6as fallback:_delivery_info, and the final response landed in the gateway log. Zero Telegram messages delivered.Notes
_delivery_infoentries now live until TTL expiry instead of being consumed on send. This is consistent with the existing_seen_deliverieslifecycle and is what the cross-platform delivery contract actually requires.