Inspiration

Hackatra aims to raise awareness of AI and other tools that improve productivity and business outcomes. Taking part in activities like this builds the hands-on experience the IT industry needs to get the most out of these tools.

What it does

  1. Traverses all video files in the given source directory and its sub-folders
  2. Converts each video file to an audio file using FFmpeg
  3. Processes the generated audio files in parallel to convert them to text
  4. Merges all generated text files into a single file
  5. Prepares datasets to train the GPT-2 AI model
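
The traversal, audio extraction, and merge steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the list of video extensions, and the FFmpeg options (mono, 16 kHz WAV output) are all assumptions.

```python
import subprocess
from pathlib import Path

# Assumed set of video extensions; adjust to the files you actually have.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi", ".webm"}


def find_videos(source_dir):
    """Return all video files under source_dir, including sub-folders."""
    return sorted(
        p for p in Path(source_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def ffmpeg_command(video_path, audio_path):
    """Build an ffmpeg command that drops the video stream and writes audio."""
    return [
        "ffmpeg", "-y",          # overwrite an existing output file
        "-i", str(video_path),   # input video
        "-vn",                   # no video in the output
        "-ac", "1",              # mono (assumed setting)
        "-ar", "16000",          # 16 kHz sample rate (assumed setting)
        str(audio_path),
    ]


def extract_audio(video_path, audio_dir):
    """Run ffmpeg (must be on PATH) and return the path of the audio file."""
    audio_dir = Path(audio_dir)
    audio_dir.mkdir(parents=True, exist_ok=True)
    audio_path = audio_dir / (Path(video_path).stem + ".wav")
    subprocess.run(ffmpeg_command(video_path, audio_path), check=True)
    return audio_path


def merge_text_files(text_paths, merged_path):
    """Concatenate transcript files into one file, in the given order."""
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in text_paths:
            out.write(Path(path).read_text(encoding="utf-8").strip() + "\n")
    return Path(merged_path)
```

Note that this sketch names each audio file after the video's stem, so two videos with the same name in different sub-folders would overwrite each other; a real pipeline would need to keep the relative path.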

How I built it

  1. I implemented a pipeline that processes video files into text transcriptions. It uses Python for scripting and FFmpeg to extract audio from the videos while maintaining compatibility and quality.
  2. I used OpenAI’s Whisper, a state-of-the-art automatic speech recognition (ASR) model, for transcription, which gives high accuracy when converting speech to text.
  3. Scalability was key, so I used parallel processing in Python to run transcriptions concurrently across multiple CPU cores.
  4. I set up logging with Python’s logging module to track processing times and errors, which makes debugging and monitoring easier.
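
The parallel transcription and logging steps described above might look something like the sketch below. It is an assumption-laden illustration, not the project's code: `transcribe_all`, `_timed`, and the `use_processes` switch are hypothetical names, the `"base"` model size is a guess, and the Whisper call assumes the open-source `openai-whisper` package is installed.

```python
import logging
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("transcription")


def transcribe_with_whisper(audio_path):
    """Transcribe one audio file with the openai-whisper package."""
    import whisper  # imported lazily so this module loads without whisper installed

    # Loading the model per call keeps the sketch short; a real worker
    # would load it once and reuse it. The "base" size is an assumption.
    model = whisper.load_model("base")
    return model.transcribe(str(audio_path))["text"]


def _timed(transcribe, audio_path):
    """Run transcribe(audio_path) and return (text, elapsed_seconds)."""
    start = time.perf_counter()
    text = transcribe(audio_path)
    return text, time.perf_counter() - start


def transcribe_all(audio_paths, transcribe=transcribe_with_whisper,
                   max_workers=None, use_processes=True):
    """Transcribe files in parallel; return ({path: text}, {path: error})."""
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    results, errors = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(_timed, transcribe, p): p for p in audio_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                text, seconds = future.result()
            except Exception as exc:  # log the failure and keep going
                logger.error("Failed to transcribe %s: %s", path, exc)
                errors[path] = str(exc)
            else:
                logger.info("Transcribed %s in %.2fs", path, seconds)
                results[path] = text
    return results, errors
```

Passing the transcription function as a parameter keeps the parallel and logging logic independent of Whisper, so it can be exercised with a stub. With `use_processes=True` the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.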

Challenges I ran into

  1. Learning Python, including its syntax and semantics
  2. Scaling to large media files while keeping the functionality intact
  3. Understanding new modules that use AI to process video files

Accomplishments that I'm proud of

  1. Handled the project end to end on my own
  2. Learned a new programming language
  3. Achieved scalability when processing large media files

What I learned

  1. The Python language
  2. Accessing and processing files in local storage
  3. Transcribing video files to text
  4. Training GPT models with sample data

What's next for Untitled

Scalability Architecture for future models (what can be done)

Implementing at large scale

  1. Using microservices: separate components for each stage (video to audio, audio to transcripts, transcripts to merged file, and instruction-completion dataset generation)
  2. Using data compression: for file transfers across storage systems or uploads, the platform should compress video/audio files during upload and decompress them automatically without significantly affecting audio quality
  3. Using cloud computing: use Azure Functions or AWS Lambda to scale automatically while processing large sets of video files
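
As one illustration of what the last stage named in the first item, an instruction-completion dataset generator, could do, here is a minimal sketch. The chunking strategy (fixed word counts), the pairing of each chunk with the one after it, and the `prompt`/`completion` JSON Lines format are all assumptions made for the example, not the project's actual design.

```python
import json


def chunk_words(text, chunk_size=50):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def build_records(transcript, chunk_size=50):
    """Pair each chunk (prompt) with the chunk that follows it (completion)."""
    chunks = chunk_words(transcript, chunk_size)
    return [{"prompt": a, "completion": b} for a, b in zip(chunks, chunks[1:])]


def write_jsonl(records, path):
    """Write one JSON object per line and return the number of records."""
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

Because the stage only reads a merged transcript and writes a file, it could run as its own component behind a queue or storage trigger without knowing anything about the earlier stages.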

File management system: building a dedicated file management system makes it easier to traverse target files and to allocate source resources efficiently

Metadata management: use the metadata from extracted files to find patterns and use the information to improve the data pipelines
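
One possible way to collect such metadata is with `ffprobe`, which ships with FFmpeg. The sketch below is illustrative only: the function names and the particular fields kept in the summary are assumptions, and `probe` requires `ffprobe` to be on the PATH.

```python
import json
import subprocess


def ffprobe_command(media_path):
    """Build an ffprobe command that prints container and stream info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        str(media_path),
    ]


def summarize_metadata(ffprobe_json):
    """Reduce ffprobe's JSON output to a few fields useful for planning work."""
    data = json.loads(ffprobe_json)
    fmt = data.get("format", {})
    streams = data.get("streams", [])
    return {
        "duration_seconds": float(fmt.get("duration", 0.0)),
        "size_bytes": int(fmt.get("size", 0)),
        "has_audio": any(s.get("codec_type") == "audio" for s in streams),
        "has_video": any(s.get("codec_type") == "video" for s in streams),
    }


def probe(media_path):
    """Run ffprobe on a file and return the summarized metadata."""
    completed = subprocess.run(
        ffprobe_command(media_path), check=True, capture_output=True, text=True
    )
    return summarize_metadata(completed.stdout)
```

A summary like this could, for example, be used to skip files with no audio stream or to schedule the longest files first.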
