YT Highlight Extractor

Inspiration

This project was inspired by the Mediastream challenge. We wanted to broaden our Python knowledge and use artificial intelligence to accomplish cool things.
Our idea was to capture highlights from YouTube videos and package them so they are convenient to share on social media or anywhere else.

This Python tool takes a YouTube video URL from the user and downloads the video in MP4 format. It uses an open source API to fetch the most replayed section of the video, cuts that section out of the MP4, and saves a copy. It then converts the MP4 to an audio (WAV) file, transcribes the audio to text with a Python speech recognition module, and saves the transcript to a file.
Finally, it can bundle all the files into a folder, with the option of creating a zip archive for portability.
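The bundling step could look roughly like the sketch below. It uses only the standard library, and the function name `bundle_outputs` and its parameters are made up for illustration, not taken from the project's code.

```python
import shutil
from pathlib import Path


def bundle_outputs(files, bundle_dir, make_zip=False):
    """Copy output files into bundle_dir and optionally zip the folder.

    Returns the zip archive path when make_zip is True, else the folder path.
    (Hypothetical helper; names are assumptions, not the project's own.)
    """
    bundle_dir = Path(bundle_dir)
    bundle_dir.mkdir(parents=True, exist_ok=True)

    # Copy each file (video clip, WAV, transcript, ...) into the bundle folder.
    for f in files:
        shutil.copy2(f, bundle_dir / Path(f).name)

    if make_zip:
        # make_archive appends ".zip" to the base name it is given,
        # so the archive lands next to the folder with the same name.
        archive = shutil.make_archive(str(bundle_dir), "zip", root_dir=str(bundle_dir))
        return Path(archive)
    return bundle_dir
```

Calling `bundle_outputs(["highlight.mp4", "transcript.txt"], "my_highlight", make_zip=True)` would leave both a `my_highlight` folder and a `my_highlight.zip` archive beside it.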

We started by looking into Python modules that work with YouTube specifically and came across pytube, which provided the functions we needed to download the video from the input URL.
We then researched ways to fetch the most replayed sections of a video. Google's and YouTube's official APIs offered no way to do this, but we came across an open source API called YouTube operational API that let us find and parse that data. To cut the video at the start and end times of that section, we used a module called moviepy.
For transcription, we used CMU Sphinx, an open source speech recognition toolkit whose acoustic model captures the relationship between audio signals and phonetic units.
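As a rough illustration of the "find and parse" step, the sketch below picks the most replayed segment from a list of replay markers. The marker keys (`startMillis`, `durationMillis`, `intensityScoreNormalized`) are an assumption about the API's response shape, and `find_most_replayed` is a hypothetical helper, so both should be checked against the actual data.

```python
def find_most_replayed(markers, padding_s=0.0):
    """Return (start_s, end_s) for the marker with the highest intensity.

    `markers` is a list of dicts with the assumed keys "startMillis",
    "durationMillis" and "intensityScoreNormalized".
    """
    if not markers:
        raise ValueError("no replay markers available for this video")

    # The peak marker is the one viewers replayed the most.
    peak = max(markers, key=lambda m: m["intensityScoreNormalized"])

    # Convert milliseconds to seconds for the video-trimming step.
    start_s = peak["startMillis"] / 1000
    end_s = start_s + peak["durationMillis"] / 1000

    # Optional padding widens the clip, without going below zero.
    return max(0.0, start_s - padding_s), end_s + padding_s
```

The returned start and end times, in seconds, are the kind of values a trimming call such as moviepy's clip-cutting method would take.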

Networking issues were our main challenge. The slow networks at the hackathon caused trouble when downloading a video: the connection stream would stop prematurely or fail to connect at all.
Our workarounds were to use a personal hotspot, a VPN, or cloud servers with more stable connections.
Another prevalent issue was error handling. We ended up adding multiple retry attempts and sleep delays around both the API calls and the video download.
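A minimal sketch of the retry-with-delay idea follows. The helper name `retry` and its parameters are illustrative assumptions, not the project's actual code; any download or API call could be wrapped this way.

```python
import time


def retry(func, attempts=3, delay_s=2.0, exceptions=(Exception,)):
    """Call func() until it succeeds or attempts run out.

    Sleeps delay_s seconds between failed tries and re-raises the
    last error if every attempt fails. (Hypothetical helper.)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions as err:
            last_error = err
            # Only wait if another attempt is still coming.
            if attempt < attempts:
                time.sleep(delay_s)
    raise last_error
```

For example, `retry(lambda: download(url), attempts=5, delay_s=3)` would try a hypothetical `download` function up to five times with a three-second pause between failures.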

Our accomplishments include incorporating third-party modules and getting the most out of the functions they provide.
Debugging errors and resolving issues was the biggest accomplishment because of the rewarding feeling of fixing something. Implementing speech recognition and seeing how far it has developed over the years was another.
Finally, we used Tkinter to turn the app into a GUI.

The main thing we learned is that network performance is more important than most people realize. An unreliable connection can have a drastic impact on an application's performance and on its users, and it is often beyond the developer's control.
Overall, building good habits around consistent error handling and coding practices was a valuable lesson in writing working code.

In the future, this tool could be expanded in many ways, including using deeper machine learning tools to interpret the video content and find standout moments in a YouTube video more accurately.