fix(telegram): _GATEWAY_PROVIDER_ERROR_RE false-positives on legitimate HTTP prose #28670

@teknium1

From post-merge audit of PR #28510 (#24014 salvage, quiet noisy Telegram gateway errors).

Bug

_sanitize_gateway_final_response in gateway/run.py runs only on Telegram final responses. If _GATEWAY_PROVIDER_ERROR_RE matches anywhere in the response text, the ENTIRE answer is replaced with a canned error message. The \bhttp\s*\d{3}\b alternative triggers on any HTTP status reference of the form HTTP NNN in prose.

Reproduction

import re
_GATEWAY_PROVIDER_ERROR_RE = re.compile(
    r'(api\s+(?:call\s+)?failed|provider\s+authentication\s+failed|non-retryable\s+error'
    r'|rate\s+limited\s+after\s+\d+\s+retries|error\s+code\s*:|\bhttp\s*\d{3}\b'
    r'|incorrect\s+api\s+key|invalid\s+api\s+key)',
    re.IGNORECASE,
)

# All of these match, so the agent's answer to a legitimate user question gets replaced:
assert _GATEWAY_PROVIDER_ERROR_RE.search("HTTP 404 means 'not found'")               # MATCH
assert _GATEWAY_PROVIDER_ERROR_RE.search("When you get HTTP 500 errors, check logs")  # MATCH
# MATCH: fine for a real error, but the same phrase in non-error prose also matches
assert _GATEWAY_PROVIDER_ERROR_RE.search("The API call failed because token expired")

Impact

User asks "what does HTTP 404 mean?" on Telegram. Agent answers correctly. Response is silently replaced with "⚠️ The model provider failed after retries. I kept raw provider details out of chat; check gateway logs for diagnostics." Telegram-only — CLI/Discord/Slack unaffected.

Confirmed: "Curl returned 'HTTP/1.1 200 OK'" does NOT match. The pattern allows only optional whitespace between http and the three digits, and the slash breaks that. So the bug specifically hits the common HTTP NNN form.
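A quick check of the boundary cases, isolating only the HTTP alternative from the pattern above. HTTP_STATUS_RE and the sample strings are names made up for this illustration, not anything in gateway/run.py:

```python
import re

# Only the HTTP alternative of the union, isolated for illustration.
HTTP_STATUS_RE = re.compile(r"\bhttp\s*\d{3}\b", re.IGNORECASE)

samples = [
    "HTTP 404 means 'not found'",       # space between HTTP and digits
    "got http500 from upstream",        # no space at all (\s* allows zero)
    "Curl returned 'HTTP/1.1 200 OK'",  # slash directly after HTTP
    "HTTP 4040 is not a status code",   # four digits, so no \b after the third
]

# Map each sample to whether the HTTP alternative fires on it.
results = {text: bool(HTTP_STATUS_RE.search(text)) for text in samples}

for text, matched in results.items():
    print(f"{matched!s:5}  {text}")
```

The first two samples match and the last two do not, which is consistent with the false positive being limited to the HTTP NNN form.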

Proposed fixes (one of)

  1. Length cap: only sanitize messages under N lines or N characters. Real provider errors are short; assistant answers are long.
  2. Require preamble position: only sanitize when the regex matches in the FIRST line, not anywhere in the body.
  3. Require AT LEAST TWO matches from the union to fire. http NNN alone in prose is a false positive; http NNN together with error code: or api call failed signals a real error body.
  4. Drop \bhttp\s*\d{3}\b from the union entirely, rely on the other markers (api call failed, error code:, provider authentication failed, etc.).
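Options (1) to (3) could be sketched as standalone predicates, roughly as follows. This is a minimal sketch, not the code in gateway/run.py: the helper names, the MAX_CHARS threshold, and the choice to split the union into one regex per marker are all assumptions made for illustration.

```python
import re

# Same alternatives as the union in the reproduction, one entry per marker.
MARKERS = [
    r"api\s+(?:call\s+)?failed",
    r"provider\s+authentication\s+failed",
    r"non-retryable\s+error",
    r"rate\s+limited\s+after\s+\d+\s+retries",
    r"error\s+code\s*:",
    r"\bhttp\s*\d{3}\b",
    r"incorrect\s+api\s+key",
    r"invalid\s+api\s+key",
]
MARKER_RES = [re.compile(m, re.IGNORECASE) for m in MARKERS]
UNION_RE = re.compile("(" + "|".join(MARKERS) + ")", re.IGNORECASE)

MAX_CHARS = 400  # hypothetical threshold for option 1, not a tuned value


def short_enough(text: str) -> bool:
    """Option 1: only short messages are candidates for sanitizing."""
    return len(text) <= MAX_CHARS


def matches_first_line(text: str) -> bool:
    """Option 2: a marker must appear in the first non-empty line."""
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return UNION_RE.search(first_line) is not None


def has_two_markers(text: str) -> bool:
    """Option 3: at least two distinct markers must be present."""
    return sum(1 for rx in MARKER_RES if rx.search(text)) >= 2


prose = "HTTP 404 means 'not found'"
answer = "Here is a summary.\nWhen you get HTTP 500 errors, check logs."
raw_error = "API call failed: HTTP 500, error code: 1234"

print(short_enough(prose), matches_first_line(prose), has_two_markers(prose))
print(matches_first_line(answer), has_two_markers(answer))
print(matches_first_line(raw_error), has_two_markers(raw_error))
```

With these sample strings, the one-line prose answer still passes the option (1) and option (2) checks, since it is short and the marker sits on its first line; only the option (3) check rejects it. The multi-line answer is rejected by both (2) and (3), and the made-up raw error body passes both.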

Options (1) and (2) preserve the original PR's intent (suppress noisy provider failures) while avoiding the false positive. Option (4) is simplest but loses coverage for raw bodies that contain ONLY HTTP 500 Internal Server Error and nothing else.
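The option (4) trade-off can be seen by dropping the HTTP alternative from the union and re-running a few strings. NO_HTTP_RE and the third sample string are made up for this illustration:

```python
import re

# Union from the reproduction with the HTTP alternative removed (option 4).
NO_HTTP_RE = re.compile(
    r"(api\s+(?:call\s+)?failed|provider\s+authentication\s+failed|non-retryable\s+error"
    r"|rate\s+limited\s+after\s+\d+\s+retries|error\s+code\s*:"
    r"|incorrect\s+api\s+key|invalid\s+api\s+key)",
    re.IGNORECASE,
)

# Legitimate prose: no longer matches, so the false positive is gone.
prose_hit = NO_HTTP_RE.search("HTTP 404 means 'not found'")

# Bare raw body: also no longer matches, which is the coverage loss.
bare_body_hit = NO_HTTP_RE.search("HTTP 500 Internal Server Error")

# A body carrying another marker is still caught.
labelled_hit = NO_HTTP_RE.search("Error code: 500 - upstream unavailable")

print(bool(prose_hit), bool(bare_body_hit), bool(labelled_hit))
```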

Scope

Telegram only. _sanitize_gateway_final_response short-circuits for every other platform. This limits the blast radius, but the bug is real for Telegram users.

Metadata

    Labels

    P2: Medium, degraded but workaround exists
    comp/gateway: Gateway runner, session dispatch, delivery
    platform/telegram: Telegram bot adapter
    type/bug: Something isn't working
