
fix(feishu): add delivery observability and root_id fallback#16131

Closed
highland0971 wants to merge 1 commit into NousResearch:main from highland0971:fix/feishu-silent-message-drop

@highland0971


Problem

Messages sent via the Feishu adapter could be silently dropped with zero observability — no error log, no warning, no trace. This made it impossible to diagnose why a message never reached the user.

Root Cause

Three gaps in the delivery chain combined to create a black hole:

  1. stream_consumer.py _send_new_chunk(): When adapter.send() returns success=False, the method silently sets _edit_supported = False and returns reply_to_id: no log, no retry, no alert. Any failed send is completely invisible.

  2. feishu.py send() and _finalize_send_result(): Neither the send attempt nor its result was ever logged. A failed API call (e.g. invalid post payload, rate limit, permission error) would return SendResult(success=False), which the stream consumer then silently swallowed.

  3. feishu.py thread_id resolution: Some Feishu message events provide root_id but not thread_id. Without the root_id fallback, topic replies could fail to route correctly — another silent failure path.

Fix

1. stream_consumer.py — Log send failures

# Before: silent swallow
else:
    self._edit_supported = False
    return reply_to_id

# After: log with full context
else:
    self._edit_supported = False
    logger.warning(
        "[StreamConsumer] Send failed (success=%s, msg_id=%s, error=%s) "
        "chat=%s reply_to=%s text_len=%d",
        result.success, result.message_id, getattr(result, "error", None),
        self.chat_id, reply_to_id, len(text),
    )
    return reply_to_id
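The failure branch above can be exercised in isolation. The following is a minimal synchronous sketch, not the actual stream_consumer.py code: `SendResult`, `FailingAdapter`, `StreamConsumerSketch`, and `ListHandler` are hypothetical stand-ins, and the real method may be asynchronous.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("stream_consumer_sketch")


@dataclass
class SendResult:
    # Hypothetical stand-in for the adapter's result type.
    success: bool
    message_id: Optional[str] = None
    error: Optional[str] = None


class FailingAdapter:
    # Hypothetical adapter whose send() always fails.
    def send(self, chat_id, text, reply_to=None):
        return SendResult(success=False, error="rate limited")


class StreamConsumerSketch:
    def __init__(self, adapter, chat_id):
        self.adapter = adapter
        self.chat_id = chat_id
        self._edit_supported = True

    def _send_new_chunk(self, text, reply_to_id):
        result = self.adapter.send(self.chat_id, text, reply_to=reply_to_id)
        if result.success:
            return result.message_id
        # Failure path: same fallback as before, plus a warning with context.
        self._edit_supported = False
        logger.warning(
            "[StreamConsumer] Send failed (success=%s, msg_id=%s, error=%s) "
            "chat=%s reply_to=%s text_len=%d",
            result.success, result.message_id, getattr(result, "error", None),
            self.chat_id, reply_to_id, len(text),
        )
        return reply_to_id


class ListHandler(logging.Handler):
    # Collects formatted log messages so the demo can inspect them.
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

consumer = StreamConsumerSketch(FailingAdapter(), chat_id="oc_demo")
returned = consumer._send_new_chunk("hello", reply_to_id="om_parent")
```

The return value and the `_edit_supported` flag are unchanged from the old behavior; the only difference is the warning record.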

2. feishu.py — Add delivery logging to send()

Each chunk send now logs: chat_id, reply_to, chunk index, msg_type, payload_len, content_len.
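As a rough illustration of the per-chunk log line, here is a sketch that logs the listed fields. The helper name `log_chunk_send`, the exact message wording, and the JSON payload shape are assumptions made for this example and are not taken from feishu.py.

```python
import json
import logging

logger = logging.getLogger("feishu_send_sketch")


def log_chunk_send(chat_id, reply_to, index, total, msg_type, content):
    """Log one outgoing chunk; returns the serialized payload."""
    # Hypothetical payload shape: the content wrapped in a small JSON body.
    payload = json.dumps({"msg_type": msg_type, "content": content},
                         ensure_ascii=False)
    logger.info(
        "[Feishu] sending chunk %d/%d: chat_id=%s reply_to=%s "
        "msg_type=%s payload_len=%d content_len=%d",
        index + 1, total, chat_id, reply_to, msg_type,
        len(payload), len(content),
    )
    return payload


class ListHandler(logging.Handler):
    # Collects formatted log messages so the demo can inspect them.
    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)

payload = log_chunk_send("oc_demo", "om_parent", 0, 2, "text", "hello")
```

Logging lengths rather than the body keeps message text out of the gateway logs while still showing whether a chunk was empty or unusually large.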

3. feishu.py — Add result logging to _finalize_send_result()

  • Success: logger.info("[Feishu] send succeeded: msg_id=%s", msg_id)
  • Failure: logger.warning("[Feishu] send failed: code=%s msg=%s", code, msg)
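A sketch of how the two result log lines might sit in a finalize step is shown below. It assumes the API response is available as a plain dict with `code`, `msg`, and `data.message_id` keys and that `code == 0` means success; the real `_finalize_send_result()` may work with an SDK response object instead, and `SendResult` and `ListHandler` here are hypothetical stand-ins.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("feishu_finalize_sketch")


@dataclass
class SendResult:
    # Hypothetical stand-in for the adapter's result type.
    success: bool
    message_id: Optional[str] = None
    error: Optional[str] = None


def finalize_send_result(response: dict) -> SendResult:
    """Turn an API response dict into a SendResult, logging the outcome."""
    code = response.get("code")
    if code == 0:
        msg_id = (response.get("data") or {}).get("message_id")
        logger.info("[Feishu] send succeeded: msg_id=%s", msg_id)
        return SendResult(success=True, message_id=msg_id)
    msg = response.get("msg")
    logger.warning("[Feishu] send failed: code=%s msg=%s", code, msg)
    return SendResult(success=False, error=f"[{code}] {msg}")


class ListHandler(logging.Handler):
    # Collects (level, message) pairs so the demo can inspect them.
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)

ok = finalize_send_result({"code": 0, "msg": "success",
                           "data": {"message_id": "om_123"}})
failed = finalize_send_result({"code": 99, "msg": "demo failure"})
```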

4. feishu.py — Add root_id fallback for thread_id

# Before
thread_id=getattr(message, "thread_id", None) or None,

# After
thread_id=getattr(message, "thread_id", None) or getattr(message, "root_id", None) or None,
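The fallback expression can be checked on its own with stand-in message objects. In this sketch `resolve_thread_id` is a hypothetical helper wrapping the same expression, and the `SimpleNamespace` objects and ID values are made up for illustration.

```python
from types import SimpleNamespace


def resolve_thread_id(message):
    # Same expression as the patched line: prefer thread_id, fall back to
    # root_id, and normalize empty values to None.
    return (getattr(message, "thread_id", None)
            or getattr(message, "root_id", None)
            or None)


with_thread = SimpleNamespace(thread_id="omt_1", root_id="om_root")
root_only = SimpleNamespace(thread_id="", root_id="om_root")
neither = SimpleNamespace()

resolved = [resolve_thread_id(m) for m in (with_thread, root_only, neither)]
```

Because the chain uses `or`, an empty-string `thread_id` also falls through to `root_id`, not only a missing attribute.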

Impact

  • No behavioral changes in patches 1-3, which are logging-only; patch 4 adds a defensive fallback
  • Any future failed send will now produce at least two log entries: one from the feishu adapter and one from the stream consumer
  • The root_id fallback fixes a known issue where Feishu topic replies could fail to route when thread_id is not present in the message event

Testing

  • Verified syntax with py_compile.compile() for both files
  • Tested Feishu API directly with markdown table payloads (create, reply, reply_in_thread) — all return code:0
  • Confirmed the fix branch applies cleanly to main
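For reference, a syntax check of the kind listed above can be scripted as in the sketch below. The helper name `compiles_cleanly` and the temporary files are illustrative; the real check would point at the two modified source files.

```python
import os
import py_compile
import tempfile


def compiles_cleanly(path):
    """Return True if the file at `path` byte-compiles without an error."""
    try:
        # doraise=True raises PyCompileError instead of printing to stderr.
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.py")
    bad = os.path.join(tmp, "bad.py")
    with open(good, "w") as f:
        f.write("x = 1\n")
    with open(bad, "w") as f:
        f.write("def broken(:\n")
    good_ok = compiles_cleanly(good)
    bad_ok = compiles_cleanly(bad)
```

A check like this only confirms that the files parse; it does not execute the new log calls.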

@alt-glitch added the type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), platform/feishu (Feishu / Lark adapter), and P2 (Medium: degraded but workaround exists) labels on Apr 26, 2026
Three fixes for silent message drop in Feishu adapter:

1. stream_consumer: log send failures instead of silently swallowing them
   - When adapter.send() returns success=False, we now log a warning
     with chat_id, reply_to, error details, and content length
   - Previously these failures were invisible — no log, no retry, no alert

2. feishu.py: add root_id fallback for thread_id resolution
   - Some Feishu message events provide root_id but not thread_id
   - Without this fallback, topic replies could fail to route correctly

3. feishu.py: add delivery logging to send() and _finalize_send_result()
   - send() now logs each chunk with chat_id, reply_to, msg_type, sizes
   - _finalize_send_result() logs success (with msg_id) or failure
     (with API code and error message)
   - This makes silent drops diagnosable from gateway logs

Root cause: messages could be dropped at multiple points in the
delivery chain with zero observability. The stream consumer's
_send_new_chunk silently returned on failure, and the feishu adapter
never logged successful or failed sends. Combined, a dropped message
left absolutely no trace in any log file.
@teknium1

teknium1 commented May 4, 2026

Contributor

The root_id fallback portion of your PR was independently salvaged via #19711 (@julysir's parallel fix #16620). The logging-observability additions in your PR are useful but would land cleaner as a dedicated observability PR separate from the bug fix — the feishu send-path is noisy enough that we'd want to pick log levels deliberately. Closing as duplicate on the bug-fix side; thanks for flagging the silent-drop problem!

@teknium1 closed this May 4, 2026