feat: Include audio duration in Feishu voice message payload #16524

@111-test-111

Description

Problem

When sending audio files via the Feishu gateway, the voice message is delivered as a file attachment instead of a playable voice bubble with duration displayed.

The root cause is in _send_uploaded_file_message (feishu.py line 3892):

payload = json.dumps({"file_key": file_key})

The Feishu audio message API requires a duration field (in milliseconds):

{"file_key": "xxx", "duration": 7854}

Without it, the client renders the audio as a generic file attachment (green music note icon) rather than a voice bubble with pre-play duration display.

Steps to Reproduce

  1. Send a voice message from Hermes to a Feishu chat (e.g., via the text_to_speech tool or MEDIA:/path/to/audio.opus)
  2. The message arrives as a file attachment, not a voice bubble
  3. Duration is only visible after clicking play

Proposed Fix

In _send_uploaded_file_message, when resolved_message_type == "audio", extract duration from the audio file using ffprobe (already available in the environment) and include it in the payload:

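import json
import subprocess

def _get_audio_duration_ms(file_path: str) -> int:
    """Return the audio duration in milliseconds via ffprobe, or 0 on failure."""
    try:
        result = subprocess.run(
            ["ffprobe", "-v", "error", "-show_entries", "format=duration",
             "-of", "default=noprint_wrappers=1:nokey=1", file_path],
            capture_output=True, text=True, timeout=10, check=True,
        )
        # ffprobe prints the duration in seconds as a decimal string, e.g. "7.854000".
        # round() avoids losing a millisecond to float truncation.
        return round(float(result.stdout.strip()) * 1000)
    except (OSError, subprocess.SubprocessError, ValueError):
        # ffprobe is missing, timed out, exited non-zero, or printed no usable number.
        return 0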
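
The helper above combines running ffprobe with parsing its output, which makes the parsing hard to test on a machine without ffprobe. A minimal sketch of splitting out the parsing step follows. The name parse_ffprobe_duration_ms is hypothetical and not part of feishu.py, and treating non-positive or non-finite values as "unknown" (0) is an assumption.

```python
import math


def parse_ffprobe_duration_ms(stdout: str) -> int:
    """Convert ffprobe's duration text (seconds) to whole milliseconds.

    Hypothetical helper for illustration. Returns 0 when the text is empty,
    not a number (ffprobe can print "N/A"), non-finite, or not positive.
    """
    try:
        seconds = float(stdout.strip())
    except ValueError:
        return 0
    if not math.isfinite(seconds) or seconds <= 0:
        return 0
    # round() rather than int() so float error cannot drop a millisecond.
    return round(seconds * 1000)
```

With this split, _get_audio_duration_ms would only need to run ffprobe and pass result.stdout to the parser.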

Then in the no-caption branch:

if resolved_message_type == "audio":
    duration_ms = _get_audio_duration_ms(file_path)
    payload = json.dumps({"file_key": file_key, "duration": duration_ms})
else:
    payload = json.dumps({"file_key": file_key})
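
Since _get_audio_duration_ms returns 0 on failure, the branch above would send "duration": 0 whenever ffprobe fails. This issue does not establish how Feishu renders a zero duration, so one cautious option is to omit the field in that case and fall back to the current behavior. A minimal sketch, assuming a hypothetical helper named build_file_message_payload that is not part of feishu.py:

```python
import json


def build_file_message_payload(file_key: str, message_type: str,
                               duration_ms: int = 0) -> str:
    """Build the JSON content string for an uploaded-file message.

    Hypothetical helper for illustration. The duration field is added only
    for audio messages with a known (positive) duration; otherwise the
    payload matches what is sent today.
    """
    payload = {"file_key": file_key}
    if message_type == "audio" and duration_ms > 0:
        payload["duration"] = duration_ms
    return json.dumps(payload)
```

For example, build_file_message_payload("xxx", "audio", 7854) yields the JSON object with both file_key and duration, while a failed probe (duration_ms of 0) yields only file_key.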

Environment

  • Hermes Agent: latest main
  • macOS Apple Silicon
  • Feishu bot API
  • Audio format: opus (also affects mp3, wav, etc.)

Additional Context

The caption branch (lines 3876-3887) uses a post message type with a media tag, which may handle duration differently. The no-caption path (the common case for voice messages) definitely needs the duration field.

Metadata

Labels

  • P2: Medium, degraded but workaround exists
  • comp/gateway: Gateway runner, session dispatch, delivery
  • platform/feishu: Feishu / Lark adapter
  • type/bug: Something isn't working