
🎬 CLI Video Analysis System with Llama 4

πŸ† Winner of the Meta Llama WebAI Track Hackathon: a Llama-powered personal memory system that processes Meta Ray-Ban glasses footage to help you recall conversations, insights, and daily moments through intelligent video analysis.

A command-line multimodal AI video analysis tool that extracts frames, transcribes audio, and performs intelligent analysis using Llama 4 models. It follows a two-step workflow: analyze a video, then chat interactively about the results.

πŸš€ Features

  • πŸ–ΌοΈ Frame Extraction: Extract video frames at custom intervals using OpenCV
  • 🎡 Audio Transcription: Transcribe video audio using OpenAI Whisper
  • πŸ€– Multimodal AI Analysis: Analyze both visual and audio content with Llama 4
  • πŸ’¬ Interactive Chat: Natural language querying of analysis results
  • πŸ“Š Multiple Analysis Modes: Comprehensive, overview, frames-only, or transcript-only
  • πŸ”’ Secure API Management: Environment-based API key configuration
  • πŸ“ Dual Output: Human-readable text + machine-readable JSON results
  • ⚑ CLI Interface: Simple command-line tools with flexible options

πŸ› οΈ Installation

Prerequisites

  1. Python 3.8+
  2. FFmpeg (required for Whisper audio processing):
    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt install ffmpeg
    
    # Windows: Download from https://ffmpeg.org/

Install Dependencies

pip install -r requirements.txt
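
If you just want the core stack without the pinned file, the libraries this README relies on are OpenCV, openai-whisper, the openai client (used against the Llama-compatible endpoint), and python-dotenv. The PyPI names below are the standard ones, though the project's requirements.txt may pin different versions or extras:

pip install opencv-python openai-whisper openai python-dotenv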

βš™οΈ Setup

1. Configure API Key

Create your environment file:

cp .env.example .env

Edit .env and add your Llama API key:

LLAMA4_API_KEY=your_api_key_here

2. Test API Connection

python -c "
from openai import OpenAI
import os
from dotenv import load_dotenv
load_dotenv()

client = OpenAI(
    api_key=os.getenv('LLAMA4_API_KEY'),
    base_url='https://api.llama.com/compat/v1/'
)

response = client.chat.completions.create(
    model='Llama-4-Maverick-17B-128E-Instruct-FP8',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print('βœ… API connection successful!')
print('Model reply:', response.choices[0].message.content)
"

πŸš€ Quick Demo (No Setup Required)

Try the interactive chat with real example data:

Instant Demo - No Video Processing Needed

# Test the interactive chat interface immediately
python interactive_video_chat.py examples/videoNetworking_llama_analysis.json

This uses a pre-processed networking conversation analysis, so you can:

  • βœ… Test the chat interface without API setup
  • βœ… See sample questions and responses
  • βœ… Understand output format before processing your own videos
  • βœ… Demo the system to others instantly

Sample Chat Session

$ python interactive_video_chat.py examples/videoNetworking_llama_analysis.json

🎬 Video Analysis Chat - Ask me anything about the video!
Commands: 'quit', 'exit', 'clear', 'context', 'help'
============================================================

πŸ’¬ You: What were the main topics discussed?
πŸ€– Llama: [Response based on the networking conversation analysis...]

πŸ’¬ You: What networking advice would you give?
πŸ€– Llama: [Insights about the conversation effectiveness...]

πŸ’¬ You: help
πŸ“š Available Commands:
- quit/exit/q: End the chat
- clear: Clear conversation history  
- context: Show video details
- help: Show this help

πŸ’‘ Example Questions:
- "What were the main topics discussed?"
- "How did the participants' body language change?"
- "What networking advice would you give?"
- "Summarize the key insights"

Example Files Included

  • examples/videoNetworking_llama_analysis.json - Complete analysis data for chat interface
  • examples/videoNetworking_llama_analysis.txt - Human-readable analysis results
  • examples/videoNetworking_transcript.txt - Raw transcript for reference
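
Before starting a chat, you can peek at the top-level structure of the example analysis file with a one-liner (no API key needed):

python -c "import json; d = json.load(open('examples/videoNetworking_llama_analysis.json')); print(list(d.keys()))"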

Try These Example Questions

# Start the demo
python interactive_video_chat.py examples/videoNetworking_llama_analysis.json

# Try asking:
"What were the main topics discussed?"
"How effective was this networking conversation?"
"What follow-up actions were mentioned?"
"What could have been improved?"
"Summarize the key insights from this conversation"

🎯 Command Line Usage

Basic Commands

Quick transcript analysis:

python llama_video_analyzer.py data/your_video.MOV --mode transcript_only

Visual frame analysis:

python llama_video_analyzer.py data/your_video.MOV --mode frames_only

Complete multimodal analysis:

python llama_video_analyzer.py data/your_video.MOV --mode comprehensive

Fast overview (recommended for demos):

python llama_video_analyzer.py data/your_video.MOV --mode overview

Command Line Options

python llama_video_analyzer.py <video_file> [options]

Required:
  video_file             Path to video file (MP4, MOV, AVI, etc.)

Options:
  --interval SECONDS     Frame extraction interval (default: 20)
  --whisper MODEL        Whisper model: tiny,base,small,medium,large (default: base)
  --mode MODE            Analysis mode: comprehensive,frames_only,transcript_only,overview
  --output FILE          Output file prefix

Examples:
  # High-quality analysis
  python llama_video_analyzer.py meeting.MOV --interval 10 --whisper medium
  
  # Quick demo mode
  python llama_video_analyzer.py presentation.MP4 --mode overview --interval 30
  
  # Custom output filename
  python llama_video_analyzer.py interview.MOV --output job_interview_analysis
  
  # Transcript only for fast text analysis
  python llama_video_analyzer.py call.MOV --mode transcript_only --whisper large

πŸ“Š Analysis Modes

Mode             Speed        API Calls           Use Case
transcript_only  ⚑ Fast      1                   Text analysis, quick insights
overview         πŸš€ Medium    1                   Demo-ready multimodal analysis
frames_only      ⏱️ Medium    N (one per frame)   Visual-focused analysis
comprehensive    πŸ” Detailed  N + 2               Complete research analysis

πŸ“ Output Files

Each analysis generates two files:

  • filename_llama_analysis.txt - Human-readable results
  • filename_llama_analysis.json - Machine-readable data

Sample CLI Workflow

# 1. Analyze networking video
python llama_video_analyzer.py data/networking_call.MOV --mode comprehensive

# 2. View results
cat networking_call_llama_analysis.txt

# 3. Process JSON data
python -c "import json; data=json.load(open('networking_call_llama_analysis.json')); print(f'Frames: {data[\"frames_extracted\"]}, Transcript: {data[\"transcript_length\"]} chars')"

Sample Output Structure

{
  "video_path": "data/networking_video.MOV",
  "frames_extracted": 5,
  "transcript_length": 2196,
  "analysis": {
    "individual_frames": [...],
    "comprehensive": "...",
    "transcript_only": "..."
  }
}
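
For programmatic post-processing, the same fields can be read with a short script; this assumes the key layout shown above:

import json

# Load the machine-readable results produced alongside the .txt report
with open("networking_call_llama_analysis.json") as f:
    data = json.load(f)

print(f"Video: {data['video_path']}")
print(f"Frames extracted: {data['frames_extracted']}")
print(f"Transcript: {data['transcript_length']} chars")

# Full comprehensive analysis text
print(data["analysis"]["comprehensive"])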

πŸ—οΈ Architecture

CLI Command β†’ Video Input
        ↓
[Frame Extractor] β†’ Base64 Images
        ↓
[Whisper] β†’ Transcript
        ↓
[Llama 4] β†’ Analysis
        ↓
[Output] β†’ .txt + .json files

πŸ”§ Technical Details

Frame Processing

  • Resolution: Auto-resize 1920x1080 β†’ 1280x720
  • Format: JPEG with base64 encoding
  • Timestamps: Precise frame timing metadata
  • Intervals: Configurable extraction frequency
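
A minimal sketch of this stage using OpenCV (cv2); the function name and the exact resize/encode details are illustrative, not the project's actual code:

import base64
import cv2

def extract_frames(video_path, interval_sec=20):
    """Grab one frame every interval_sec seconds as a base64 JPEG with a timestamp."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS metadata is missing
    step = max(1, int(fps * interval_sec))
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frame = cv2.resize(frame, (1280, 720))        # e.g. 1920x1080 -> 1280x720
            ok, buf = cv2.imencode(".jpg", frame)         # JPEG-encode in memory
            if ok:
                frames.append({
                    "timestamp": round(index / fps, 2),   # frame timing metadata
                    "image_b64": base64.b64encode(buf.tobytes()).decode("ascii"),
                })
        index += 1
    cap.release()
    return frames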

Audio Processing

  • Engine: OpenAI Whisper (local processing)
  • Models: tiny, base, small, medium, large
  • Formats: Supports all major video formats
  • Quality: Automatic audio extraction via FFmpeg
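
The transcription stage in isolation is a few lines with the openai-whisper package (FFmpeg must be installed, as noted in Prerequisites); a minimal sketch:

import whisper

# Load a local Whisper model; larger models trade speed for accuracy
model = whisper.load_model("base")

# Whisper extracts the audio track from the video via FFmpeg, then transcribes it
result = model.transcribe("data/your_video.MOV")
print(result["text"])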

AI Analysis

  • Context: Full transcript provided to each frame analysis
  • Focus Areas: Networking, meetings, professional communication
  • Output: Structured insights on dynamics, body language, effectiveness
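
A minimal sketch of a single per-frame call, assuming the endpoint accepts the OpenAI-style image_url content format (the compat base URL from Setup suggests it does); the prompt wording and function name are illustrative, not the project's actual prompt:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(
    api_key=os.getenv("LLAMA4_API_KEY"),
    base_url="https://api.llama.com/compat/v1/",
)

def analyze_frame(image_b64, transcript):
    """Send one base64 JPEG frame plus the full transcript as shared context."""
    response = client.chat.completions.create(
        model="Llama-4-Maverick-17B-128E-Instruct-FP8",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Transcript for context:\n{transcript}\n\n"
                         "Describe the dynamics, body language, and "
                         "effectiveness visible in this frame."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content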

🚨 Troubleshooting

Common CLI Issues

401 Authentication Error:

# Check API key is loaded
python -c "import os; from dotenv import load_dotenv; load_dotenv(); print('Key loaded:', bool(os.getenv('LLAMA4_API_KEY')))"

500 Inference Error (too many frames):

# Use fewer frames
python llama_video_analyzer.py video.MOV --mode overview --interval 60

FFmpeg Not Found:

# Install FFmpeg first
brew install ffmpeg  # macOS
sudo apt install ffmpeg  # Linux

File Not Found:

# Check video file path
ls -la data/your_video.MOV

Permission Issues:

# Make script executable
chmod +x llama_video_analyzer.py

πŸ“ Example: CLI Analysis Workflow

# Step 1: Quick transcript check
python llama_video_analyzer.py data/meeting.MOV --mode transcript_only

# Step 2: If transcript looks good, run full analysis
python llama_video_analyzer.py data/meeting.MOV --mode comprehensive --whisper medium

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • OpenAI Whisper for speech-to-text capabilities
  • Llama 4 for multimodal AI analysis
  • OpenCV for video frame processing

Pure CLI power for video analysis! πŸš€
