Local Directory Video Transcription

total time taken without parallel processing
total time taken with parallel processing

Inspiration

Hackatra aims to raise awareness of AI and other tools that improve productivity and business. Taking part in activities like this builds the experience the IT industry requires to use these tools well and get the most out of them.

What it does

Traverses all video files in the given source directory and all its sub-folders
Converts each video file to an audio file using FFmpeg
Processes the generated audio files in parallel to convert them to text
Merges all generated text files into a single file
Prepares data to train the chatGPT2 AI engine
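The traversal, conversion and merge steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the extension list, the WAV output format and the `===` header written before each transcript are all assumptions. The FFmpeg options used (`-y`, `-i`, `-vn`, `-ac`, `-ar`) are standard ones.

```python
from pathlib import Path

# Assumed set of extensions; the original project may recognise others.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def find_videos(source_dir):
    """Return every video file under source_dir, including sub-folders."""
    root = Path(source_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def ffmpeg_command(video_path, audio_dir):
    """Build the ffmpeg argument list that extracts a mono 16 kHz WAV track."""
    audio_path = Path(audio_dir) / (Path(video_path).stem + ".wav")
    return [
        "ffmpeg", "-y",          # overwrite an existing output file
        "-i", str(video_path),   # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # one audio channel (mono)
        "-ar", "16000",          # 16 kHz sample rate
        str(audio_path),
    ]


def merge_text_files(text_paths, merged_path):
    """Concatenate transcripts into one file, each under a header naming its source."""
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in text_paths:
            out.write(f"=== {Path(path).name} ===\n")
            out.write(Path(path).read_text(encoding="utf-8").strip() + "\n\n")
    return merged_path
```

Each command can be run with `subprocess.run(ffmpeg_command(video, "audio"), check=True)`. Because the output name is derived from the file stem only, two videos with the same name in different sub-folders would overwrite each other; a real implementation would need to keep the relative path in the output name.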

How I built it

I implemented an end-to-end pipeline that turns video files into text transcriptions. The project uses Python for scripting and FFmpeg to extract the audio track from each video while preserving compatibility and quality.
I used OpenAI’s Whisper, a state-of-the-art automatic speech recognition (ASR) model, for the transcription task, which gives high accuracy when converting speech to text.
Scalability was key, so I used Python’s parallel processing to spread the transcription work across multiple CPUs.
Logging was set up with Python’s logging module to track processing times and errors, which made debugging and monitoring easier.
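A minimal sketch of how parallel transcription and logging might fit together is shown below. It assumes the `openai-whisper` package and its `load_model` / `transcribe` calls, and it uses `concurrent.futures` as one plausible way to use several CPUs. The function names, the `"base"` model size and the injectable `transcribe_fn` / `executor_cls` parameters are illustrative choices, not details taken from the project.

```python
import logging
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("transcriber")

_MODEL = None  # cached so each worker process loads the model only once


def whisper_transcribe(audio_path):
    """Transcribe one audio file with Whisper and return the text."""
    global _MODEL
    import whisper  # imported lazily so this module loads without the package
    if _MODEL is None:
        _MODEL = whisper.load_model("base")  # model size is an assumption
    return _MODEL.transcribe(str(audio_path))["text"]


def transcribe_all(audio_paths, transcribe_fn=whisper_transcribe,
                   executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run transcribe_fn over audio_paths in parallel; return {path: text}."""
    results = {}
    started = time.perf_counter()
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(transcribe_fn, p): p for p in audio_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
                log.info("transcribed %s", path)
            except Exception:
                # Log the traceback and carry on with the remaining files.
                log.exception("failed to transcribe %s", path)
    log.info("processed %d/%d files in %.2fs",
             len(results), len(audio_paths), time.perf_counter() - started)
    return results
```

On platforms that start worker processes with spawn (Windows, macOS), the call to `transcribe_all` should sit under an `if __name__ == "__main__":` guard. Passing a different `transcribe_fn` or `executor_cls` makes it easy to compare the run with and without parallel processing.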

Challenges I ran into

Exploring the Python language and understanding its syntax and semantics
Scaling to large media files while keeping the functionality intact
Understanding new modules that use AI technology to process video files

Accomplishments that I'm proud of

Handled the project end to end on my own
Learned a new programming language
Made the processing of large media files scalable

What I learned

The Python language
Accessing and processing files in local storage
Transcribing video files to text
Training chatGPT models with sample data

What's next for Local Directory Video Transcription

Scalability Architecture for future models (what can be done)

Implementing at large scale

Using microservices: separate components for each stage: video to audio -> audio to transcripts -> transcripts to merged file -> instruction-completion dataset generator
Using data compression: for file transfers across storage systems or uploads, the platform should compress video/audio files during upload and decompress them automatically without significantly affecting audio quality
Using cloud computing: use Azure Functions or AWS Lambda to scale automatically while processing large sets of video files
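As one way to picture the microservice and serverless ideas together, the sketch below shows a Lambda-style handler that reads an S3 upload notification and decides which stage should process each new file. It only plans the work and does not run any stage. The stage names and the extension mapping are hypothetical; the `Records` / `s3` / `bucket` / `object` layout follows the standard S3 event notification format.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# Hypothetical stage names, one per microservice in the chain.
NEXT_STAGE = {
    ".mp4": "video-to-audio",
    ".mkv": "video-to-audio",
    ".wav": "audio-to-transcript",
    ".txt": "merge-transcripts",
}


def plan_jobs(event):
    """Turn an S3 event notification into a list of (stage, bucket, key) jobs."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces sent as '+'.
        key = unquote_plus(record["s3"]["object"]["key"])
        stage = NEXT_STAGE.get(PurePosixPath(key).suffix.lower())
        if stage is not None:  # ignore files no stage knows how to handle
            jobs.append((stage, bucket, key))
    return jobs


def handler(event, context):
    """Lambda entry point: report which stage each uploaded object needs."""
    jobs = plan_jobs(event)
    # A real deployment would enqueue each job for its service here.
    return {"jobs": [{"stage": s, "bucket": b, "key": k} for s, b, k in jobs]}
```

Keeping the routing table separate from the handler means each stage can be deployed and scaled on its own, which is the main point of splitting the pipeline into services.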

File management system: a dedicated file management system would make it easier to traverse target files and to allocate storage and compute resources efficiently

Metadata management: use the metadata from extracted files to find patterns and use the information to improve the data pipelines

Built With

python
vscode
whisper
yaml