A project that improves accessibility by translating sports commentary into synchronized sign-language animations, enabling the Deaf and Hard-of-Hearing community to experience the intensity and emotion of live sports commentary.
## Table of Contents

- Features
- Why and How Modus is Used
- Why and How Dgraph is Used
- Technologies Used
- Prerequisites
- Installation
- Directory Structure
- API Endpoints
- Video Overlay Program
- Known Issues
- Future Improvements
- Contributing
- License
## Features

- Video Upload: Support for sports commentary video uploads (.mp4)
- Audio Extraction: Automated audio extraction using FFmpeg
- Speech-to-Text Transcription: Integration with Google Cloud Speech-to-Text
- AI-Based Normalization: Utilizes Modus AI for context-aware transcription normalization
- Metadata Retrieval: Dgraph-powered knowledge graph for animation and metadata storage
- Subtitle Track Generation: JSON-based subtitle tracks with timing and animation data
- Video Overlay: A C++ library built for this project that renders sign-language overlays and effects on videos. Link to the repo
- Processed Video Delivery: Final video output with synchronized animations
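The audio-extraction step above can be sketched as a thin FFmpeg wrapper. The exact flags the project uses are not shown in this README, so the mono/16 kHz WAV settings below are assumptions chosen to suit Google Cloud Speech-to-Text:

```python
import subprocess

def build_extract_cmd(video_path: str, audio_path: str) -> list:
    """Assemble the FFmpeg command that strips the audio track from an
    uploaded .mp4 into a mono 16 kHz WAV (a format Google Speech-to-Text
    accepts). Sample rate and channel count are assumptions."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", video_path,      # input video
        "-vn",                 # drop the video stream
        "-ac", "1",            # mono audio
        "-ar", "16000",        # 16 kHz sample rate
        audio_path,
    ]

def extract_audio(video_path: str, audio_path: str = "audio.wav") -> str:
    """Run FFmpeg and return the path of the extracted audio file."""
    subprocess.run(build_extract_cmd(video_path, audio_path), check=True)
    return audio_path
```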
## Why and How Modus is Used

- Modus integrates pre-trained large language models (e.g., LLaMA) to handle complex transcription normalization tasks.
- It interprets context, synonyms, and domain-specific terms dynamically, ensuring precise and versatile normalization across various sports.

Normalization:
- Raw transcription text is sent to Modus via a GraphQL query.
- Modus processes the text, returning normalized terms formatted as `[term1, term2, ...]`.

Integration in FastAPI:
- Modus is hosted locally (http://localhost:8686/graphql) and queried from the FastAPI backend for real-time inference.

Use Cases:
- Standardizing commentary phrases (e.g., "He scores!" → "goal").
- Extracting key terms for querying metadata in Dgraph.
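The normalization round-trip can be sketched as below. The `normalize` field name and the response shape are illustrative guesses, not Modus's actual schema, so adjust them to match the deployed function:

```python
import json
import urllib.request

MODUS_URL = "http://localhost:8686/graphql"

def parse_terms(raw: str) -> list:
    """Turn Modus's '[term1, term2, ...]' string into a Python list."""
    return [t.strip() for t in raw.strip("[]").split(",") if t.strip()]

def normalize_transcript(text: str) -> list:
    """POST a GraphQL query to the local Modus endpoint and parse the
    normalized terms out of the response. The `normalize` field is a
    placeholder for the project's actual exported function name."""
    query = "query Normalize($text: String!) { normalize(text: $text) }"
    payload = json.dumps({"query": query, "variables": {"text": text}}).encode()
    req = urllib.request.Request(
        MODUS_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return parse_terms(body["data"]["normalize"])
```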
## Why and How Dgraph is Used

- Dgraph serves as the knowledge graph backend, efficiently storing relationships between terms, animations, synonyms, and intensity levels.
- Its GraphQL-based querying ensures low-latency retrieval of metadata for overlay generation.

Schema Definition:
- The schema includes entities like terms, synonyms, intensity levels, and animations.

Metadata Storage:
- Terms (e.g., "goal") are linked to animations (e.g., goal_loud.mp4), synonyms, and intensity mappings.

Query Integration:
- Queries are made to Dgraph via the FastAPI backend for animations and metadata corresponding to normalized terms.

Use Cases:
- Retrieve animations (e.g., "goal_standard.mp4") for specific terms.
- Provide additional metadata like intensity for customization.
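A metadata lookup against Dgraph could look like the sketch below. The endpoint assumes Dgraph's default GraphQL port, and `queryTerm`/`animation`/`intensity` mirror the schema described above but are illustrative names:

```python
import json
import urllib.request

DGRAPH_URL = "http://localhost:8080/graphql"  # Dgraph's default GraphQL endpoint

def build_lookup(term: str) -> dict:
    """Build a GraphQL payload fetching the animation file and intensity
    for a normalized term. Field names are assumptions, not the project's
    confirmed schema."""
    query = (
        "query Lookup($term: String!) {"
        "  queryTerm(filter: { name: { eq: $term } }) {"
        "    name animation intensity"
        "  }"
        "}"
    )
    return {"query": query, "variables": {"term": term}}

def lookup_animation(term: str) -> list:
    """POST the lookup to Dgraph and return the matching term nodes."""
    req = urllib.request.Request(
        DGRAPH_URL,
        data=json.dumps(build_lookup(term)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["queryTerm"]
```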
## Technologies Used

| Component | Technology |
|---|---|
| Backend | FastAPI |
| Audio/Video Processing | FFmpeg, C++ (video_overlay) |
| AI Normalization | Modus (LLaMA-based NLP) |
| Knowledge Graph | Dgraph |
| Transcription Service | Google Cloud Speech-to-Text |
| Deployment | Docker, Google Cloud Platform (GCP) |
## Prerequisites

- Python 3.8+
- FFmpeg installed and available on the system PATH
- C++ compiler
- Google Cloud credentials
- Running instances of Modus and Dgraph
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/your-repo/sign-language-subtitles.git
   cd sign-language-subtitles
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up Modus:
   - Install and run Modus locally (http://localhost:8686/graphql)

4. Set up Dgraph:
   - Define the schema and populate initial data

5. Build the C++ overlay program:
   - Ensure video_overlay is compiled in Sign-Language-Subtitles/build/

6. Start the FastAPI application:

   ```bash
   uvicorn main:app --reload
   ```

## Directory Structure

```
project-root/
├── main.py                      # FastAPI backend code
├── uploads/                     # Temporary storage for files
├── Sign-Language-Subtitles/     # C++ video overlay program
│   ├── build/                   # Compiled binaries
│   │   ├── video_overlay        # C++ program for overlay
│   ├── images/                  # Static images for overlay
├── app/
│   ├── service-account-key.json # Google Cloud credentials
├── requirements.txt             # Python dependencies
```
## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /transcribe-audio | POST | Upload video, process, and return with overlays |
| /query-phrase | POST | Query metadata and animations for specific terms |
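The /query-phrase endpoint could be exercised from Python as sketched below; the JSON body field name `phrase` is an assumption about the endpoint's request model, so match it to the actual FastAPI schema:

```python
import json
import urllib.request

API_URL = "http://localhost:8000/query-phrase"

def build_request(phrase: str) -> urllib.request.Request:
    """Build the POST request for /query-phrase. The `phrase` body field
    is an assumed name for the endpoint's request model."""
    data = json.dumps({"phrase": phrase}).encode()
    return urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )

def query_phrase(phrase: str) -> dict:
    """Send the request and return the metadata/animation response."""
    with urllib.request.urlopen(build_request(phrase)) as resp:
        return json.load(resp)
```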
Example request:

```bash
curl -X POST "http://localhost:8000/transcribe-audio" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@input_video.mp4" --output processed_video.mp4
```

## Video Overlay Program

The C++ video_overlay library accepts:
- Input video
- Subtitle track JSON
- Static images directory
- Output path
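The subtitle-track JSON handed to video_overlay could be assembled as below; the key names (`start`, `end`, `term`, `animation`) are illustrative and must match whatever keys the C++ program actually parses:

```python
import json

def build_subtitle_track(segments: list) -> str:
    """Serialize timed overlay entries into the JSON string written to the
    subtitle-track file passed to video_overlay. Key names here are
    assumptions, not the program's confirmed schema."""
    entries = [
        {
            "start": seg["start"],          # seconds into the video
            "end": seg["end"],
            "term": seg["term"],            # normalized term from Modus
            "animation": seg["animation"],  # clip filename from Dgraph
        }
        for seg in segments
    ]
    return json.dumps({"subtitles": entries}, indent=2)
```

The returned string would be written to a file such as subtitle_track.json and its path passed as the second argument to video_overlay.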
Usage:

```bash
./video_overlay input_video.mp4 subtitle_track.json images output_video.mp4
```

## Known Issues

- High Latency: Speech-to-Text transcription can be slow for longer videos
- Missing Dgraph Data: Ensure complete term coverage in Dgraph schema
- File Cleanup: Implement scheduled cleanup for temporary files
## Future Improvements

- Real-time processing for live commentary streams
- React-based frontend interface
- Fine-tuned AI models for sports-specific terminology
## Contributing

- Fork the repository
- Create a feature branch
- Submit a pull request with detailed explanation
## License

This project is licensed under the MIT License. See LICENSE for details.

