Skip to content

fix(telegram): log document/video send failures instead of printing to stdout (#13356)#13387

Closed
briandevans wants to merge 1 commit into
NousResearch:mainfrom
briandevans:fix/telegram-document-video-log-errors
Closed

fix(telegram): log document/video send failures instead of printing to stdout (#13356)#13387
briandevans wants to merge 1 commit into
NousResearch:mainfrom
briandevans:fix/telegram-document-video-log-errors

Conversation

@briandevans

@briandevans briandevans commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

Addresses the observability portion of #13356.

TL;DR

TelegramAdapter.send_document and send_video wrote native-send exceptions to stdout via print(...) before falling back to the base adapter. On systemd / Docker / journald-captured gateway deployments stdout isn't captured in the gateway log stream — so operators reporting "Hermes says success, Telegram received nothing" had no visible error to diagnose.

Fix: replace print with logger.error(..., exc_info=True), matching the established pattern in send_voice and send_image_file in the same file.

Root cause

gateway/platforms/telegram.py:

# send_document (line 1777)
except Exception as e:
    print(f"[{self.name}] Failed to send document: {e}")   # ← stdout, invisible
    return await super().send_document(...)                # ← fallback to text
# send_video (line 1808) — same pattern
except Exception as e:
    print(f"[{self.name}] Failed to send video: {e}")
    return await super().send_video(...)

The two adjacent native-send methods in the same file already use the correct pattern:

# send_voice (line 1701)
logger.error(
    "[%s] Failed to send Telegram voice/audio, falling back to base adapter: %s",
    self.name, e, exc_info=True,
)

# send_image_file (line 1738) — identical
logger.error(
    "[%s] Failed to send Telegram local image, falling back to base adapter: %s",
    self.name, e, exc_info=True,
)

Only send_document and send_video drifted to print.

Fix

# send_document
except Exception as e:
    logger.error(
        "[%s] Failed to send Telegram document, falling back to base adapter: %s",
        self.name, e, exc_info=True,
    )
    return await super().send_document(chat_id, file_path, caption, file_name, reply_to)

Identical change for send_video. 4 production-line delta total.

Behaviour matrix

Scenario Before After
Native send fails → text fallback succeeds print to stdout (invisible in journald), fallback returns success logger.error(exc_info=True) to gateway log, fallback returns success
Native send fails → fallback also fails print to stdout, exception propagates logger.error(exc_info=True), exception propagates
Native send succeeds unchanged unchanged

Narrow scope — explicitly not changed

  • Fallback routing. Still falls through to super().send_document(...) / super().send_video(...), matching send_voice / send_image_file. This PR only fixes observability.
  • Return value. SendResult still reflects the text fallback's success, same as the sibling methods.
  • Other adapters. Only Telegram had the print pattern; other platforms already use logger.
  • The deeper reporter question — whether Telegram's native send returning a Message object (with message_id) for a subsequently-dropped file is also a bug — is out of scope here. With proper logging in place operators can now diagnose the actual upstream error and decide if that semantic needs changing.

Regression coverage

tests/gateway/test_telegram_documents.py gets 5 new cases (3 for send_document, 2 for send_video), leveraging caplog + capsys:

TestSendDocument:

  • test_send_document_api_error_is_logged_with_exc_info — asserts an ERROR record lands in gateway.platforms.telegram logger with (1) exc_info attached, (2) exception text present, (3) adapter name in message.
  • test_send_document_logs_do_not_hit_stdout — structural pin: the old "[Telegram] Failed to send document" string must not appear on stdout. Prevents regression back to print.
  • test_send_document_still_invokes_base_fallback_on_error — preserved-behaviour canary: this PR only changes observability, not routing.

TestSendVideo: matching pair (..._is_logged_with_exc_info, ..._logs_do_not_hit_stdout).

4 of 4 stdout-and-logger tests fail on clean origin/main (6fb69229) with the exact symptom:

AssertionError: '[Telegram] Failed to send document' is contained here:
  [Telegram] Failed to send document: boom
AssertionError: Expected an ERROR-level log from gateway.platforms.telegram
  on send_document failure

Validation

source venv/bin/activate
python -m pytest tests/gateway/test_telegram_documents.py -q
# 36 passed

Broader Telegram suite (4 test files) → 68 passed, 0 regressions.

Pre-empted review questions

Q. Why not also remove the super() text fallback — isn't returning success=True misleading when the file didn't actually send?
Maybe, but that's a deliberate scope-narrowing choice. The same "fall through to base on native failure" pattern exists in send_voice and send_image_file — changing it for Telegram specifically would break adapter symmetry and risk regressions in callers that rely on the fallback. Happy to follow up with a separate PR if the maintainer wants that behavior for Telegram.

Q. Why test both caplog and capsys?
caplog pins the positive invariant (error IS logged correctly). capsys pins the negative invariant (error is NOT leaked to stdout). Together they're a two-sided guard — a future refactor that adds print back in addition to logger.error would still be caught by capsys.

Q. Is this a complete fix for #13356?
For the observability portion, yes. For the reporter's deeper "success reported, file never arrives" concern, the fix here at minimum makes any raised exception diagnosable. If Telegram is returning message_id for a file that's subsequently server-side dropped, that's either a python-telegram-bot semantics issue or a message_thread_id / path-access issue that needs separate investigation — with proper logging in place operators can now surface the actual upstream error.


Co-authored via LLM assistance; I've reviewed every line and am responsible for correctness.

Copilot AI review requested due to automatic review settings April 21, 2026 07:36

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Telegram gateway observability by logging native send_document / send_video failures via the module logger (with stack traces) instead of printing to stdout, while preserving the existing base-adapter fallback behavior.

Changes:

  • Replace print(...) in TelegramAdapter.send_document and send_video exception handlers with logger.error(..., exc_info=True).
  • Add regression tests asserting failures are logged (with exc_info) and not emitted to stdout.
  • Add a canary test ensuring the base-adapter fallback path still executes on send_document failures.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
gateway/platforms/telegram.py Switch error reporting from stdout print to structured logging with stack traces for document/video sends.
tests/gateway/test_telegram_documents.py Add caplog/capsys regression coverage to ensure failures are logged (not printed) and fallback behavior remains intact.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +542 to +550
# --- Regression coverage for #13356 -----------------------------------
# Before the fix, ``send_document`` wrote the exception to stdout via
# ``print(...)`` instead of ``logger.error``. On systemd/Docker gateway
# deployments the diagnostic was invisible, so operators seeing
# "Hermes says success, Telegram received nothing" had no way to
# see the underlying error. These tests pin that every failure
# routes through the standard logger with a full stack trace (matches
# the pattern established by ``send_voice`` and ``send_image_file``).

Copilot AI Apr 21, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says this file adds “4 new cases (2 per method)”, but this diff adds 5 new tests (3 under TestSendDocument plus 2 under TestSendVideo). Please update the PR description to match the actual test changes (or drop the extra test if the intent really was 4).

Copilot uses AI. Check for mistakes.
@briandevans

Copy link
Copy Markdown
Contributor Author

Thanks @copilot-pull-request-reviewer — caught pattern #7 (docstring / PR-description accuracy) cleanly. You're right: the prose said "4 new cases (2 per method)" but the actual diff adds 5 (3 under TestSendDocument, 2 under TestSendVideo). The bullet list further down in the PR body was internally consistent (3+2), but the summary line contradicted it.

The 3 send_document tests are deliberate — the extra one (test_send_document_still_invokes_base_fallback_on_error) is a preserved-behaviour canary pinning that this PR changes observability only, not routing. I'll update the PR description to match reality.

No code change needed.

@briandevans

Copy link
Copy Markdown
Contributor Author

Supply-chain FAILURE on this PR is a false positive from a scan bug, not a real finding. Opened #13411 with a root-cause analysis and a one-character fix.

Quick version: .github/workflows/supply-chain-audit.yml computes the PR diff with git diff "$BASE".."$HEAD" (two-dot) which includes every file main has drifted through since the PR was forked. On this PR, that adds hermes_cli/setup.py (modified upstream, not by this PR) to the scan's file list, which trips the install-hook check. The 3-dot merge-base diff shows only this PR's actual 2 files (neither of which matches any attack pattern).

Was masked pre-19db7fa3 because that commit changed the fail condition from critical == 'true' to found == 'true', unmasking the pre-existing 2-dot bug.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery platform/telegram Telegram bot adapter labels Apr 22, 2026
…o stdout (NousResearch#13356)

``TelegramAdapter.send_document`` and ``send_video`` wrote native-send
exceptions to ``stdout`` via ``print(...)`` before falling back to the
base adapter's text-fallback.  On systemd / Docker / journald-captured
gateway deployments ``stdout`` from the Python process isn't captured
in the gateway log stream — so operators who reported "Hermes says
success, Telegram received nothing" had no visible error to diagnose.

The adjacent paths in the same file already use the right pattern:

* ``send_voice`` (line 1701) → ``logger.error(..., exc_info=True)``
* ``send_image_file`` (line 1738) → ``logger.error(..., exc_info=True)``

Only ``send_document`` and ``send_video`` had drifted to ``print``.

Fix
---
Replace ``print`` with ``logger.error`` and attach ``exc_info=True``
so the full stack + exception chain lands in the gateway log.  Matches
the sibling-method pattern exactly.

Narrow scope — explicitly not changed
-------------------------------------
* **Fallback routing.**  Still falls through to
  ``super().send_document(...)`` / ``super().send_video(...)`` on
  native-send failure, same as ``send_voice`` / ``send_image_file``.
  If Teknium wants to change that semantic for Telegram (where
  native support exists so the fallback masks real failures), that's
  a separate, larger-scope PR.
* **Return value.**  ``SendResult`` still reflects the text fallback's
  success, unchanged.
* **Other adapters.**  Only Telegram had this ``print`` pattern;
  other platforms already use ``logger``.

Regression coverage
-------------------
``tests/gateway/test_telegram_documents.py`` gets 4 new cases (2 per
method):

* ``test_send_document_api_error_is_logged_with_exc_info`` — asserts
  an ``ERROR``-level record lands in the ``gateway.platforms.telegram``
  logger with ``exc_info`` attached, the exception text present, and
  the adapter name in the message.
* ``test_send_document_logs_do_not_hit_stdout`` — structural pin:
  the old ``"[Telegram] Failed to send document"`` string must not
  appear on ``stdout`` (would re-introduce the systemd/Docker-invisible
  regression).
* ``test_send_document_still_invokes_base_fallback_on_error`` —
  preserved-behaviour canary: this PR only changes observability, not
  routing.
* Matching pair for ``send_video``.

All 4 new tests fail on clean ``origin/main`` (``6fb69229``) with::

    AssertionError: '[Telegram] Failed to send document' is contained
      here: '[Telegram] Failed to send document: boom\\n'
    AssertionError: Expected an ERROR-level log from
      gateway.platforms.telegram on send_document failure

Validation
----------
``source venv/bin/activate && python -m pytest
tests/gateway/test_telegram_documents.py -q`` → **36 passed**.

Broader Telegram suite
(``test_telegram_documents.py`` + ``test_telegram_approval_buttons.py``
+ ``test_telegram_caption_merge.py`` + ``test_telegram_conflict.py``) →
**68 passed, 0 regressions**.

Related
-------
This PR addresses the observability portion of NousResearch#13356.  The reporter
also describes "reports success, Telegram receives nothing" — the
deeper question of whether the base-adapter text fallback itself
should surface a more distinct failure signal (or whether Telegram
native send silently succeeds for files that later server-side drop)
is out of scope for this PR; with proper logging in place, operators
can now diagnose the actual upstream error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@briandevans briandevans force-pushed the fix/telegram-document-video-log-errors branch from a656776 to adc2b02 Compare April 29, 2026 16:09
@briandevans

Copy link
Copy Markdown
Contributor Author

Rebased onto current origin/main (b2820cd20) — the previous base (6fb69229c) was 8 days behind. Clean rebase: gateway/platforms/telegram.py::send_document shifted from line 1775 to 1942, and send_video from 1806 to 1973, but the diff applied in place — no conflicts, the print(...) lines being replaced are unchanged on main.

Re-ran focused tests on rebased head (adc2b026f):

$ uv run --with pytest --with pytest-xdist python3 -m pytest tests/gateway/test_telegram_documents.py -v
=========== 41 passed in 3.87s ===========

The two new regression tests (test_send_document_api_error_is_logged_with_exc_info, test_send_document_logs_do_not_hit_stdout and their send_video siblings) still pass and remain orthogonal to the intervening Telegram changes (no_proxy honoring, MarkdownV2 link parens, mp4 caching, atomic DM topic writes, etc. — none of which touch the post-API-error path).

Still applicable: print(f"[{self.name}] Failed to send document: {e}") and the matching send_video line are still present verbatim on main (gateway/platforms/telegram.py:1945 and 1976).

@briandevans

Copy link
Copy Markdown
Contributor Author

Closing to keep the queue clean — happy to reopen if this is still useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists platform/telegram Telegram bot adapter type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants