Inspiration

Time is a valuable resource, and although video has become more and more prominent, it is still time-consuming to watch and hard to skim. We believed AI could be a very useful tool for determining which parts of a video are worth your time. Two main scenarios stuck out to us as we started to form this idea: condensing class lecture videos and extracting highlights from a sports game. Despite sounding very different, they are functionally very similar: there is a main speaker (the teacher and the announcer, respectively) who provides context and meaning for the scene in the video. We decided to focus on sports, since sports announcing has a more consistent style than lecturing.

What it does

The user uploads a video, and our API generates shorter highlight clips from it based on the audio contained in the video.
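As a minimal sketch of the final clip-selection step, suppose the model has already assigned an excitement score to each second of audio; the function below (entirely illustrative, not our actual API) turns those scores into clip intervals by thresholding, merging nearby runs, and padding:

```python
def select_clips(scores, threshold=0.7, max_gap=2, pad=1):
    """Turn per-second excitement scores into (start, end) clip intervals.

    Seconds scoring at or above `threshold` are kept; runs separated by at
    most `max_gap` seconds are merged, then each clip is padded by `pad`
    seconds on both sides and clamped to the video bounds. All parameter
    values here are placeholders, not tuned settings.
    """
    hot = [i for i, s in enumerate(scores) if s >= threshold]
    clips = []
    for t in hot:
        if clips and t - clips[-1][1] <= max_gap:
            clips[-1][1] = t  # close enough: extend the current clip
        else:
            clips.append([t, t])  # start a new clip
    return [(max(0, a - pad), min(len(scores) - 1, b + pad))
            for a, b in clips]
```

For example, `select_clips([0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1])` merges the two high-scoring runs into a single padded clip, `[(0, 5)]`.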

How I built it

Video Highlights ML is built as a Django app running TensorFlow. The web app and API are hosted on a Google Compute Engine instance. We use Houndify and the Google Natural Language API to generate transcripts of the video and to determine meaning from them. We used real football games and their highlight clips as training data.
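One way to turn games and their published highlight clips into supervised training data is to label each timed transcript segment by whether it overlaps a known highlight interval. The sketch below is a hypothetical illustration of that idea, assuming segments carry timestamps (the data shapes and names are our own, not the project's actual schema):

```python
def label_segments(segments, highlight_intervals):
    """Label each timed transcript segment 1 if it overlaps any known
    highlight interval from the training clips, else 0.

    segments: list of (start_sec, end_sec, text) tuples
    highlight_intervals: list of (start_sec, end_sec) tuples
    """
    def overlaps(a_start, a_end, b_start, b_end):
        # half-open interval overlap test
        return a_start < b_end and b_start < a_end

    labeled = []
    for start, end, text in segments:
        label = int(any(overlaps(start, end, hs, he)
                        for hs, he in highlight_intervals))
        labeled.append((text, label))
    return labeled
```

The resulting (text, label) pairs can then be featurized and fed to a classifier.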

Challenges I ran into

We went through a number of iterations trying to determine which attributes of the video would be the most effective inputs for the neural net before we settled on transcriptions, and even then we had to determine what information to pull out of each transcription.
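To make the "what to pull out of the transcription" question concrete, here is a small illustrative feature extractor. The feature set and vocabulary are our guesses at the kind of signals an announcer transcript carries, not the project's actual inputs:

```python
import re

# Hypothetical excitement vocabulary; the real feature set is not documented here.
EXCITED_WORDS = {"touchdown", "interception", "incredible", "unbelievable", "score"}

def transcript_features(text):
    """Turn one transcript segment into a small numeric feature dict."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)  # avoid division by zero on empty segments
    return {
        "excited_word_ratio": sum(w in EXCITED_WORDS for w in words) / n,
        "exclamations": text.count("!"),
        "word_count": len(words),
        "all_caps_words": sum(w.isupper() and len(w) > 1 for w in text.split()),
    }
```

Each dict can be converted to a fixed-order vector before being passed to the network.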

Accomplishments that I'm proud of

We successfully implemented our neural net, which could become significantly more accurate as data continues to be added and is potentially extensible in the future. This could give us and other developers a solid base to build more in-depth work on. In addition, we hope to extend our work to fields beyond sports by learning more about the meaning and contexts of speech.

What I learned

Neither of us had prior experience with Natural Language Processing. This project let us apply our general AI knowledge to NLP and learn about language-specific techniques and tools such as the Google Natural Language API.

What's next for Video Highlights ML

  • Tune hyperparameters for the neural net.
  • Experiment with different architectures.
  • Gather much more data.
  • Implement a two-step neural net that isolates announcer audio first.
  • Experiment with running the model directly on the audio as opposed to on transcripts.