Inspiration

The idea behind LectureBoost was inspired by finals at Purdue. One of the best ways to review everything covered in a class is to rewatch lectures from early in the semester to refresh your memory. However, lectures (and sometimes lecturers) can be extremely slow. With less downtime in each lecture, there would be more time to get in all the studying you need before your big exams.

What it does

LectureBoost provides three main features: white space removal, audio transcription with subtitle creation, and video text recognition with notes compilation. All of them are brought together in a webpage hosted on AWS.

White space removal: Edits the downtime out of a video, removing the gaps between sentences and cutting out long pauses.
Subtitle creation: Subtitles are matched up with the frames of the lecture and integrated into the video.
Audio transcription: Audio from the video is converted to text in 10-second intervals and added to a downloadable text file.
Notes compilation: Unique frames from the video are added to a downloadable PDF file.
Video text recognition: Text from unique frames of the video is recorded and added to a downloadable text file.

How we built it

White space removal: Separating the audio (.wav) from the video (.mp4) lets us analyze it byte by byte. In a WAV file, the bytes following the 4-byte "data" chunk tag (and its size field) represent sound amplitudes. Converting these samples to decibels lets us flag everything below a silence threshold, and runs of contiguous sub-threshold samples become the ranges that get cut from the video. The audio analysis is written by hand; the video trimming is done with the moviepy Python library.
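The silence-detection step can be sketched roughly like this. Function names, the -40 dB threshold, and the minimum-gap length are illustrative choices, not the project's actual code; a real pipeline would first pull the 16-bit PCM samples out of the WAV "data" chunk (e.g. with the `wave` and `struct` modules) before scanning them:

```python
import math

def amplitude_to_db(sample, max_amp=32768):
    """Convert a signed 16-bit PCM sample to decibels relative to full scale."""
    if sample == 0:
        return -math.inf
    return 20 * math.log10(abs(sample) / max_amp)

def find_silent_ranges(samples, sample_rate, threshold_db=-40.0, min_len_s=0.5):
    """Return (start_s, end_s) ranges where the audio stays below threshold_db
    for at least min_len_s seconds."""
    ranges = []
    start = None
    for i, s in enumerate(samples):
        quiet = amplitude_to_db(s) < threshold_db
        if quiet and start is None:
            start = i                      # a silent run begins
        elif not quiet and start is not None:
            if (i - start) / sample_rate >= min_len_s:
                ranges.append((start / sample_rate, i / sample_rate))
            start = None                   # the silent run ended
    # handle a silent run that extends to the end of the audio
    if start is not None and (len(samples) - start) / sample_rate >= min_len_s:
        ranges.append((start / sample_rate, len(samples) / sample_rate))
    return ranges
```

The returned time ranges would then be handed to moviepy to cut those intervals out of the video and concatenate what remains.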

Subtitle creation/Audio transcription: Speech recognition aggregates text from the audio at fixed intervals. The video is then decoded and subtitles are added from the aggregated text, while a transcription file with timestamps is also generated so users can search through lectures. The speech recognition is done with the SpeechRecognition Python library.
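Once each 10-second slice has been transcribed (e.g. by a SpeechRecognition recognizer), the timestamped subtitle file can be assembled with plain string formatting. A minimal sketch, assuming SRT-style output and one transcript string per interval; the function names are hypothetical:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def build_srt(chunks, interval=10):
    """Build an SRT document from per-interval transcripts.

    chunks: one transcript string per `interval` seconds of audio,
    e.g. the output of a speech recognizer run on 10-second slices.
    """
    entries = []
    for i, text in enumerate(chunks):
        start, end = i * interval, (i + 1) * interval
        entries.append(
            f"{i + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(entries)
```

The same (timestamp, text) pairs double as the searchable transcription file.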

Notes compilation/Video text recognition: Runs text recognition on frames captured throughout the video. To avoid duplication, pairs of frames are compared using mean squared error on pixel values and sequence matching on the text scraped from the frames. The feature creates a PDF file that appends all the lecture frames that include text, along with a text file whose timestamps match the slides.
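The duplicate check combines the two signals described above. A minimal sketch using only the standard library, with frames modeled as flat lists of grayscale pixel values; the cutoff values are illustrative, not the project's tuned thresholds:

```python
from difflib import SequenceMatcher

def frame_mse(a, b):
    """Mean squared error between two equal-sized grayscale frames (flat lists)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def is_duplicate(frame_a, frame_b, text_a, text_b,
                 mse_cutoff=100.0, text_cutoff=0.9):
    """Treat two frames as the same slide if the pixels barely changed
    or the OCR'd text is nearly identical."""
    if frame_mse(frame_a, frame_b) < mse_cutoff:
        return True
    return SequenceMatcher(None, text_a, text_b).ratio() >= text_cutoff
```

Only frames that fail this check against the previously kept frame would be appended to the notes PDF and the timestamped text file.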

Webpage: The webpage is a React frontend that accepts mp4 files along with user preferences. It relies on a Python Flask backend that performs the requested operations on the uploaded video file. The backend then stores the processed video in an AWS S3 bucket where the user can access it.
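The upload flow on the backend can be sketched as a single Flask route. The route name, field names, and in-memory dictionary standing in for the S3 bucket are all hypothetical; the real backend would call boto3 (e.g. `s3.upload_fileobj`) and run the video-processing steps before storing:

```python
import io
from flask import Flask, request, jsonify

app = Flask(__name__)

# Stand-in for the S3 bucket, so the sketch runs without AWS credentials.
FAKE_BUCKET = {}

@app.route("/process", methods=["POST"])
def process_video():
    """Accept an mp4 upload, run processing (omitted), store the result."""
    video = request.files["video"]
    key = f"processed/{video.filename}"
    FAKE_BUCKET[key] = video.read()  # real code: s3.upload_fileobj(...)
    return jsonify({"key": key})
```

The frontend would then poll or receive this key and build a download link against the S3 bucket.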

Challenges we ran into

Accomplishments that we're proud of

What we learned

What's next for LectureBoost
