Inspiration

We're always watching YouTube videos to supplement our learning, and we can very often find high-quality educational content. The only problem: our time!

We have so little time that revisiting a 2-hour video to find one important piece of information is not efficient, especially during crunch time. Moreover, the videos are so long that we must write our own summary or pointers to grasp the material better. This is also time-consuming.

What it does

  • Flow adds two buttons to YouTube videos. "Transcribe" produces a full transcript of the video, which you can then search for an important pointer. "Summarize" produces a shorter text version of what was said in the video. Before a quiz/test, this is what you want to be reading!

  • Parts marked with "[...]" are parts that the algorithm considers redundant and has therefore omitted. You can still view this content by finding the corresponding parts in the full transcript.
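
As a rough illustration of how the "[...]" markers could be placed, the sketch below joins the kept sentences in their original order and inserts a single marker for each run of omitted sentences. The function name `render_summary` and its parameters are hypothetical and are not taken from Flow's code; this is only one plausible way to produce that output.

```python
def render_summary(sentences, kept_indices, marker="[...]"):
    """Join kept sentences in order, inserting one marker per omitted run.

    Hypothetical helper for illustration; not Flow's actual implementation.
    """
    kept = set(kept_indices)
    parts = []
    gap_open = False  # True while we are inside a run of omitted sentences
    for i, sentence in enumerate(sentences):
        if i in kept:
            parts.append(sentence)
            gap_open = False
        elif not gap_open:
            # First omitted sentence of a run: emit exactly one marker
            parts.append(marker)
            gap_open = True
    return " ".join(parts)


if __name__ == "__main__":
    demo = ["Intro.", "Tangent one.", "Tangent two.", "Key point."]
    print(render_summary(demo, [0, 3]))
```

Collapsing each omitted run into one marker keeps the summary short while still showing the reader where to look in the full transcript.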

How we built it

  • Retrieving the transcript of a YouTube video is actually not that simple, since the YouTube API only lets you get the transcript for videos that you own. However, every time a user clicks on a video, the transcript is sent to the client to enable closed captions. Flow makes use of this through some nifty JavaScript and collects the full transcript of the video.

  • This text is then sent to the MeaningCloud Text Summarization API to get the summary. The API performs extractive summarization, where only the sentences that carry the most information are kept. Each sentence is analyzed and given a score based on the TextTeaser & TextRank algorithms. The sentences with the highest scores are kept.

  • The summary is returned and shown to the user in a new window so they can copy it into their personal notes.
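
To make the transcript-collection step more concrete, here is a minimal sketch of turning caption data into one block of text. It assumes the captions arrive as XML with `<text start="..." dur="...">` cues, which is an assumption for illustration; the write-up does not specify the format, and Flow itself does this in JavaScript inside the extension. `SAMPLE` and `captions_to_transcript` are hypothetical names.

```python
import html
import xml.etree.ElementTree as ET

# Assumed caption payload shape (not confirmed by the write-up).
SAMPLE = """<transcript>
  <text start="0.0" dur="2.1">Welcome to the course.</text>
  <text start="2.1" dur="3.4">Today we&amp;#39;ll cover
sorting.</text>
</transcript>"""


def captions_to_transcript(xml_text):
    """Flatten caption cues into a single transcript string."""
    root = ET.fromstring(xml_text)
    cues = []
    for node in root.iter("text"):
        raw = node.text or ""
        # Decode leftover HTML entities and collapse line breaks/extra spaces
        cleaned = " ".join(html.unescape(raw).split())
        if cleaned:
            cues.append((float(node.get("start", "0")), cleaned))
    # Order cues by start time before joining
    cues.sort(key=lambda cue: cue[0])
    return " ".join(text for _, text in cues)


if __name__ == "__main__":
    print(captions_to_transcript(SAMPLE))
```

The joined string is the kind of input that would then be sent on to a summarization service.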

The TextTeaser & TextRank algorithms calculate the scores based on the relative position of the sentence in the text, titles and section headers, presence of words in italics and bold, numbers, and/or some special keywords/phrases.
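
The toy scorer below shows the general idea of feature-based extractive scoring using a few of the signals listed above (position, overlap with the title, numbers, and special keywords). It is a simplified sketch, not the TextTeaser or TextRank algorithm and not MeaningCloud's implementation; the weights, the keyword list, and all function names are assumptions chosen for illustration.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, index, total, title_words, keywords):
    """Score one sentence with hand-picked, illustrative weights."""
    lowered = sentence.lower()
    words = set(re.findall(r"[a-z0-9']+", lowered))
    score = 0.0
    if index == 0 or index == total - 1:
        score += 1.0  # position: opening and closing sentences
    score += len(words & title_words)  # one point per word shared with the title
    if re.search(r"\d", sentence):
        score += 0.5  # presence of numbers
    score += sum(1.0 for k in keywords if k in lowered)  # special keywords
    return score


def top_sentences(text, title, keep=2, keywords=("important", "in summary")):
    """Return indices of the `keep` best sentences, in original order."""
    sentences = split_sentences(text)
    title_words = set(re.findall(r"[a-z0-9']+", title.lower()))
    scored = [
        (score_sentence(s, i, len(sentences), title_words, keywords), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:keep]
    return sorted(i for _, i in best)


if __name__ == "__main__":
    text = ("Hello everyone. Quick sort picks a pivot. I like tea. "
            "It is important that quick sort runs fast on average.")
    print(top_sentences(text, "Quick sort explained"))
```

Returning indices rather than text keeps the original sentence order available, so the omitted sentences can still be located in the full transcript.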

Challenges we ran into

  • Retrieving the YouTube transcript was harder than initially thought; a lot of time went into figuring this out.
  • Injecting HTML into the YouTube page was also time-consuming.

Accomplishments that we're proud of

  • This was my first time building my own Chrome extension. Being able to finish it within ~20 hours on a first attempt is great.
  • The extension will actually be useful for me. I can see myself using it later.

What we learned

  • How to build a Chrome extension.

What's next for Flow

  • Improve the formatting of the transcript extracted from the YouTube video.
  • Create a custom server that exposes a summarization endpoint, which will reduce API consumption costs.
  • Create a custom, more advanced summarization algorithm, mainly abstractive summarization, which involves rewriting the text using neural approaches based on sequence-to-sequence models. But this will require a lot more learning and research!
