Inspiration
Hackatra aims to bring awareness on AI and other necessary tools to improve productivity and businesses. By indulging in such activities brings the experience required in the IT industry to make use of tools in a better way to get the most out of it.
What it does
- traverse through all video files in the given source directory and all its sub-folders
- convert all video files to audio files using ffmpeg video encoder
- parallel processing of audio files generated to convert them to text
- merge all generated text files into one single file
- prepare data models to train the chatGPT2 AI engine
How I built it.
- I implemented a comprehensive pipeline that efficiently processes video files into accurate text transcriptions. The project begins by leveraging Python for scripting and FFmpeg for video audio extraction, maintaining compatibility and quality.
- I used OpenAI’s Whisper, a state-of-the-art automatic speech recognition (ASR) model, for the transcription task, enabling high accuracy in converting speech to text.
- Scalability was key, so I utilized parallel processing with Python’s concurrent use of CPUs.
- Logging was set up using Python’s logging module to track processing times and errors, ensuring efficient debugging and monitoring
Challenges I ran into
- Explore Python language and understand its syntax and other code semantics
- Scale to large media files keeping the functionality intact
- Understanding new modules that use AI technology to process video files
Accomplishments that I'm proud of
- handled end to end in silo
- new programming language
- scalability in processing large media files
What I learned
- Python language
- Access and process files in local storage
- transcribing video files to text
- using chatGPT models to train them with sample data
What's next for Untitled
Scalability Architecture for future models (what can be done)
Implementing on large scale
- Using microservices: separate components for the video to audio —> audio to transcripts —> transcripts to merge files —> instruction completion dataset generator
- Using Data compression: for file transfers across file storage systems or uploads, the platform should implement auto file decompression w/o significantly affecting the audio quality and compress the video/audio files during file uploads
- Using cloud computing: Use Azure or AWS lambda functions to support auto scale while processing large sets of video files.
File Management System: building dedicated file management systems gives the advantage of traversing target files at ease and source resources at optimized ratios
Metadata management: use the metadata from extracted files to find patterns and use the information to improve the data pipelines
Built With
- python
- vscode
- whisper
- yaml