feat(tools): add video analysis via ffmpeg frame extraction + vision LLM#2294
feat(tools): add video analysis via ffmpeg frame extraction + vision LLM#2294Add1ct1ve wants to merge 2 commits into
Conversation
Videos sent on Telegram are now downloaded to ~/.hermes/video_cache/, and the agent receives a context note pointing it to the video_analyze tool. The tool extracts frames with ffmpeg (tiered interval, max 30 frames, scaled to 768px), encodes them as base64, and sends them to the configured vision model for multi-frame analysis. Gated on ffmpeg + ffprobe + vision provider availability — the tool silently hides itself from the agent's toolset when requirements are not met.
|
disclaimer this is fully vibe coded based on a need i had like 4 hours ago |
Add 29 unit tests covering frame interval calculation, ffmpeg detection, async tool flow, handler prompt construction, schema validation, registry integration, and requirements checks. Update tools-reference and toolsets-reference docs to include video_analyze in the vision toolset. Fix pre-existing duplicate vision/web sections in tools-reference.
|
Orb Code Review (powered by GLM-4.7 on Orb Cloud) SummaryThis PR adds a new ArchitectureThe PR follows the existing Hermes architecture well:
IssuesMajor Issues1. Hardcoded Frame Quality May Result in Loss of Important Details Location: def _extract_frames(video_path: str, output_dir: str, interval: float) -> list[str]:
subprocess.run(
[
"ffmpeg", "-i", video_path,
"-vf", f"fps=1/{interval},scale=768:-1",
"-q:v", "2",
pattern,
],
...
)Issue: The frame extraction hardcodes Impact:
Suggested Fix: Make resolution and quality configurable: def _extract_frames(
video_path: str,
output_dir: str,
interval: float,
scale_width: int = 1280, # Default to HD
quality: int = 3, # Default to moderate quality
) -> list[str]:
subprocess.run(
[
"ffmpeg", "-i", video_path,
"-vf", f"fps=1/{interval},scale={scale_width}:-1",
"-q:v", str(quality),
pattern,
],
...
)And update the tool signature: async def video_analyze_tool(
video_path: str,
user_prompt: str,
model: Optional[str] = None,
scale_width: int = 1280, # Added
quality: int = 3, # Added
) -> str:2. Timeout Values May Be Insufficient for Long Videos Location: def _get_video_duration(path: str) -> float:
result = subprocess.run(
["ffprobe", "-v", "error", ...],
timeout=30, # Only 30 seconds
)
def _extract_frames(video_path: str, output_dir: str, interval: float) -> list[str]:
subprocess.run(
["ffmpeg", "-i", video_path, ...],
timeout=120, # Only 120 seconds
)Issue: The timeout values are fixed and may be insufficient:
Impact:
Suggested Fix: Make timeout scale with video duration: def _get_video_duration(path: str) -> float:
# First, get approximate duration quickly (smaller timeout)
result = subprocess.run(
["ffprobe", "-v", "error", "-show_entries", "format=duration", "-of", "csv=p=0", path],
capture_output=True,
text=True,
timeout=30,
)
duration = float(result.stdout.strip())
# Scale timeout with duration (5x safety margin)
timeout = max(30, int(duration * 5))
return duration
def _extract_frames(video_path: str, output_dir: str, interval: float, estimated_duration: float) -> list[str]:
# Scale timeout with estimated duration and frame count
num_frames = int(estimated_duration / interval)
# Allow 5 seconds per frame plus 60 second overhead
timeout = max(120, num_frames * 5 + 60)
subprocess.run([...], timeout=timeout)And update the call to # In video_analyze_tool()
frames = await asyncio.to_thread(
_extract_frames, video_path, tmp_dir, interval, duration, # Pass duration
)3. Video Download Limit Is Hardcoded Location: # Telegram Bot API limit: 20 MB for file downloads
MAX_VIDEO_BYTES = 20 * 1024 * 1024
if msg.video.file_size and msg.video.file_size > MAX_VIDEO_BYTES:
...Issue: The 20MB video limit is hardcoded in the Telegram adapter. This may be:
Impact:
Suggested Fix: Make the limit configurable: # In gateway/platforms/base.py
VIDEO_MAX_BYTES = int(os.getenv("VIDEO_MAX_MB", "20")) * 1024 * 1024
# In telegram.py
if msg.video.file_size and msg.video.file_size > VIDEO_MAX_BYTES:Or document it as a platform-specific consideration: # Telegram Bot API limit: 20 MB for file downloads
# Note: This is a Telegram-specific limitation. Other platforms may have
# different limits. Consider adjusting this value based on your deployment.
MAX_VIDEO_BYTES = 20 * 1024 * 10244. Frame Cap May Be Too Restrictive for Complex Videos Location: def _calculate_frame_interval(duration: float) -> float:
MAX_FRAMES = 30 # Hard cap
if duration < 60:
interval = 1.0
elif duration < 300:
interval = 5.0
else:
interval = 10.0
# Enforce max-frames cap
estimated_frames = duration / interval
if estimated_frames > MAX_FRAMES:
interval = duration / MAX_FRAMESIssue: The 30-frame cap means:
This may be insufficient for:
Impact:
Suggested Fix: Make the cap configurable or increase it significantly: VIDEO_MAX_FRAMES = int(os.getenv("VIDEO_MAX_FRAMES", "100")) # Default to 100, allow override
def _calculate_frame_interval(duration: float, max_frames: int = VIDEO_MAX_FRAMES) -> float:
# ... existing logic ...
# Use configured max instead of hardcoded 30
estimated_frames = duration / interval
if estimated_frames > max_frames:
interval = duration / max_frames
return intervalOr provide multiple quality tiers: def _calculate_frame_interval(duration: float, quality: str = "medium") -> float:
# quality: "low", "medium", "high", "max"
quality_settings = {
"low": {"max_frames": 30, "interval_multiplier": 1.0},
"medium": {"max_frames": 60, "interval_multiplier": 5.0},
"high": {"max_frames": 120, "interval_multiplier": 10.0},
"max": {"max_frames": 300, "interval_multiplier": 30.0},
}
settings = quality_settings[quality]
# ... calculate interval using settings ...Warnings5. Frame Extraction Pattern Matching May Miss Files Location: frames = sorted(
str(p) for p in Path(output_dir).glob("frame_*.jpg")
)
return framesIssue: The glob pattern Impact:
Suggested Fix: Use a more robust pattern or check the actual ffmpeg output: # Option 1: More robust glob pattern
frames = sorted(
str(p) for p in Path(output_dir).glob("frame_[0-9]*.jpg")
)
# Option 2: Parse ffmpeg output to get exact frame paths
import re
def _extract_frames(video_path: str, output_dir: str, interval: float) -> list[str]:
result = subprocess.run(
[...],
capture_output=True,
text=True,
check=True,
)
# Parse ffmpeg output to extract frame filenames
frame_files = []
for line in result.stderr.split('\n'):
if 'frame_' in line:
# Extract filename using regex
match = re.search(r'(frame_\d+\.jpg)', line)
if match:
frame_files.append(output_dir + '/' + match.group(0))
return sorted(frame_files)6. Error Messages Could Be More Specific for Debugging Location: if not frames:
return json.dumps({
"success": False,
"error": "No frames extracted from video",
"analysis": "ffmpeg did not produce any frames from the video.",
})Issue: The error message "No frames extracted from video" is generic and doesn't help with debugging. It doesn't indicate:
Impact:
Suggested Fix: Provide more detailed error information: if not frames:
return json.dumps({
"success": False,
"error": (
"No frames extracted from video. "
"This could be due to: (1) ffmpeg execution error, "
"(2) invalid video format, (3) timeout, or "
"(4) insufficient disk space. Check logs for details."
),
"ffmpeg_exit_code": exit_code, # Add if available
"ffmpeg_stderr": stderr_output, # Add if available
"analysis": None,
})Or capture and return ffmpeg's actual error: def _extract_frames(...):
result = subprocess.run(..., capture_output=True, text=True, check=True)
if result.returncode != 0:
raise FrameExtractionError(
f"ffmpeg failed with exit code {result.returncode}: {result.stderr}"
)
return sorted(Path(output_dir).glob("frame_*.jpg"))7. Test Coverage for Error Paths Could Be Expanded Location: The tests are comprehensive but could benefit from:
Impact:
Suggested Fix: Add tests for edge cases: class TestVideoAnalyzeTool:
# ... existing tests ...
@pytest.mark.asyncio
@patch("tools.video_analysis_tool._has_ffmpeg", return_value=True)
@patch("tools.video_analysis_tool._has_ffprobe", return_value=True)
async def test_corrupt_video(self, _probe, _ff, tmp_path):
"""Test handling of corrupt video files."""
# Create a file with invalid video data
corrupt_video = tmp_path / "corrupt.mp4"
corrupt_video.write_bytes(b"NOT_A_VIDEO_FILE")
result = json.loads(await video_analyze_tool(str(corrupt_video), "describe"))
assert result["success"] is False
assert "invalid" in result["error"].lower()
@pytest.mark.asyncio
async def test_zero_length_video(self, _probe, _ff, tmp_path):
"""Test handling of zero-length videos."""
zero_video = tmp_path / "zero.mp4"
zero_video.write_bytes(b"")
result = json.loads(await video_analyze_tool(str(zero_video), "describe"))
assert result["success"] is False
assert "empty" in result["error"].lower() or "zero" in result["error"].lower()
@pytest.mark.asyncio
@patch("tools.video_analysis_tool._get_video_duration", return_value=60.0)
async def test_exact_boundary_60s(self, _probe, _ff, _dur, tmp_path):
"""Test video at exact 60s boundary."""
video_path = _make_video(tmp_path)
# Should use medium tier (5s interval)
result = json.loads(await video_analyze_tool(video_path, "describe"))
assert result["success"] is True
assert result["duration"] == 60.0
# 60/5 = 12 framesCross-file ImpactPublic API ChangesModified:
Added:
Callers AffectedDirect Callers:
Indirect Callers:
DependenciesModified:
No External Dependencies Added:
Test CoverageAdded:
Missing:
DependenciesNo New External Dependencies: Uses system Modified:
User ImpactPositive:
Negative:
Assessment✅ Approve with suggestions This is a well-implemented feature that adds valuable video analysis capabilities to Hermes. The code follows existing architectural patterns, includes comprehensive tests, and is properly integrated with the Telegram platform. The video caching infrastructure and automated cleanup are particularly good additions. However, there are several opportunities for improvement that would make the feature more robust and flexible: Improvements to consider:
These are suggestions, not blockers. The current implementation is functional and well-tested. The feature is ready to merge, and the suggested improvements can be addressed in follow-up PRs. Recommended Path Forward:
This is a solid addition to Hermes that extends its capabilities in a thoughtful and well-tested manner. |
|
Superceded by @alt-glitch |
What does this PR do?
Adds end-to-end video analysis support for the Hermes agent. When a user sends a video on Telegram, it is downloaded, cached locally, and the agent is given a context note with the file path and instructions to use the new
video_analyzetool. The tool extracts frames using ffmpeg, encodes them as base64, and sends them to the configured vision LLM for multi-frame analysis.This follows the same pattern as the existing image analysis pipeline (download → cache → enrich → tool), but intentionally skips auto-analysis since processing video frames is more expensive than a single image.
Type of Change
Changes Made
gateway/platforms/base.py— AddedVIDEO_CACHE_DIR,get_video_cache_dir(),cache_video_from_bytes(),cleanup_video_cache()following the existing image/audio/document cache patterngateway/platforms/telegram.py— Addedelif msg.video:download block with 20MB Telegram Bot API size check, MIME-to-extension mapping, and cache writetools/video_analysis_tool.py(new) — Frame extraction via ffmpeg (fps=1/<interval>, scaled to 768px, JPEG quality 2), tiered interval strategy (<1min: 1s, 1-5min: 5s, 5min+: 10s, hard cap 30 frames), async vision LLM call, tool registrationmodel_tools.py— Added"tools.video_analysis_tool"to_discover_tools()toolsets.py— Added"video_analyze"to_HERMES_CORE_TOOLS,"vision"toolset, and"hermes-acp"presetgateway/run.py— Added video context enrichment block (tells agent a video is available + how to analyze it), addedcleanup_video_cache()to hourly cron tickertests/test_video_analysis_tool.py(new) — 29 unit tests covering frame interval calculation, ffmpeg detection, async tool flow, handler/schema/registry integration, and requirements checkswebsite/docs/reference/tools-reference.md— Addedvideo_analyzeto vision toolset documentation, fixed pre-existing duplicate sectionswebsite/docs/reference/toolsets-reference.md— Addedvideo_analyzeto vision, hermes-acp, and hermes-cli toolset listingsHow to Test
winget install ffmpeg/brew install ffmpeg/apt install ffmpeg)~/.hermes/video_cache/video_analyzeif the user's caption asks about the videoEdge cases:
video_analyzetool should not appear in the agent's toolset (check_fnreturns False)Checklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/A