Problem
When sending audio files via the Feishu gateway, the voice message is delivered as a file attachment instead of a playable voice bubble with duration displayed.
The root cause is in _send_uploaded_file_message (feishu.py line 3892):
payload = json.dumps({"file_key": file_key})
The Feishu audio message API requires a duration field (in milliseconds):
{"file_key": "xxx", "duration": 7854}
Without it, the client renders the audio as a generic file attachment (green music note icon) rather than a voice bubble with pre-play duration display.
Steps to Reproduce
- Send a voice message from Hermes to a Feishu chat (e.g., via the text_to_speech tool or MEDIA:/path/to/audio.opus)
- The message arrives as a file attachment, not a voice bubble
- Duration is only visible after clicking play
Proposed Fix
In _send_uploaded_file_message, when resolved_message_type == "audio", extract duration from the audio file using ffprobe (already available in the environment) and include it in the payload:
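import json
import subprocess


def _get_audio_duration_ms(file_path: str) -> int:
    """Extract audio duration in milliseconds using ffprobe; return 0 on failure."""
    try:
        result = subprocess.run(
            ["ffprobe", "-v", "error", "-show_entries", "format=duration",
             "-of", "default=noprint_wrappers=1:nokey=1", file_path],
            capture_output=True, text=True, timeout=10, check=True,
        )
        # round() rather than int() so float error cannot drop a millisecond.
        return round(float(result.stdout.strip()) * 1000)
    except (OSError, subprocess.SubprocessError, ValueError):
        # ffprobe missing, timed out, exited non-zero, or printed a non-numeric value.
        return 0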
Then in the no-caption branch:
if resolved_message_type == "audio":
    duration_ms = _get_audio_duration_ms(file_path)
    payload = json.dumps({"file_key": file_key, "duration": duration_ms})
else:
    payload = json.dumps({"file_key": file_key})
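One open question with the branch above is what to send when ffprobe fails and the duration comes back as 0. A minimal sketch of one option follows, assuming that omitting the field (and so falling back to today's file-attachment rendering) is preferable to sending a zero duration. The issue does not confirm how Feishu treats "duration": 0, and build_file_payload is a hypothetical helper name, not something that exists in feishu.py.

```python
import json
from typing import Optional


def build_file_payload(file_key: str, message_type: str,
                       duration_ms: Optional[int] = None) -> str:
    """Hypothetical helper: build the JSON content for an uploaded-file message."""
    payload = {"file_key": file_key}
    # Assumption: attach duration only for audio, and only when probing
    # produced a positive value; otherwise keep the current payload shape.
    if message_type == "audio" and duration_ms is not None and duration_ms > 0:
        payload["duration"] = duration_ms
    return json.dumps(payload)


print(build_file_payload("xxx", "audio", 7854))
# prints {"file_key": "xxx", "duration": 7854}
```

Keeping the payload construction in a pure function like this would also make the audio and non-audio cases easy to unit test without calling ffprobe or the Feishu API.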
Environment
- Hermes Agent: latest main
- macOS Apple Silicon
- Feishu bot API
- Audio format: opus (also affects mp3, wav, etc.)
Additional Context
The caption branch (lines 3876-3887) uses a post message type with a media tag, which may handle duration differently. The no-caption path, which is the common case for voice messages, definitely needs the duration field.