Inspiration

We interact with tremendous amounts of information on a daily basis. Thoroughly understanding all of it, however, is time-consuming and challenging. In particular, as students at Columbia, we are often required to read long excerpts from works of literature, sift through lengthy lecture slides, or digest verbose textbook passages. To make our required readings less of a burden on our busy schedules, we were inspired to develop an automated means of text summarization.

What it does

We developed a free-to-use, machine learning-based text summarizer that is available as a web app (https://summarizr1.pythonanywhere.com/) and as a Chrome extension. Given a user's text input (typed in, uploaded as a PDF, or passed in through our extension), our unsupervised algorithm extracts the most important sentences from the input based on a similarity metric (cosine similarity) and returns them to the user. The user can also choose how many summary sentences are returned.

How we built it

We developed the extractive summarizer using the Natural Language Toolkit (NLTK) and networkx packages in Python. The web application was built with the Flask web framework in Python and styled using vanilla JS and CSS. The extension was made using pure HTML, CSS, and JS.
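To make the extractive approach concrete, here is a minimal sketch of one common way to rank sentences by cosine similarity: a simplified TextRank-style ranking over a sentence similarity graph. It is an illustration under assumptions, not our exact code. It uses only the Python standard library so that it runs without any downloads: a regular expression stands in for a proper sentence tokenizer, and a hand-written power iteration stands in for a graph library's PageRank. All function names and parameters (split_sentences, rank_sentences, summarize, damping, iterations) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bag_of_words(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Simplified PageRank-style scores over the sentence similarity graph."""
    n = len(sentences)
    vectors = [bag_of_words(s) for s in sentences]
    # weights[i][j] is the similarity between sentences i and j (no self-loops).
    weights = [
        [cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on its score in proportion to similarity.
            incoming = sum(
                scores[j] * weights[j][i] / out_sums[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(text, num_sentences=3):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return sentences
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i],
                 reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling summarize(text, num_sentences=2) returns the two highest-scoring sentences. The num_sentences argument plays the role of the user-selected summary length. A sentence that shares no words with the rest of the input receives only the baseline score, so it is the least likely to be selected.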

Although it was not deployed in the web app or extension, we also developed an abstractive summarizer using stacked LSTMs with a custom attention layer in TensorFlow. Unfortunately, the model was too large to be hosted on PythonAnywhere for free.

Challenges we ran into / What we learned

Since we were completely new to developing Chrome extensions, we had to learn the project structure and the JavaScript functionality specific to how Chrome extensions are built. Additionally, since we wanted to use the same back end as the web application, we had to use Cross-Origin Resource Sharing (CORS) for the first time, while also finding an effective way to convert website data into a text format that the model could use.
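For readers unfamiliar with CORS, the sketch below shows the basic idea: the back end has to answer the browser's preflight OPTIONS request and attach Access-Control-* headers before a page or extension on another origin can read the response. To stay self-contained it is written as a plain WSGI app rather than in Flask. In Flask the same headers could be attached in an after_request hook or through the Flask-CORS extension. The allowed origin, the JSON shape, and the function names are all assumptions made for illustration.

```python
import json

# Hypothetical origin for the extension; a real Chrome extension's origin
# has the form chrome-extension://<extension-id>.
ALLOWED_ORIGIN = "chrome-extension://example-extension-id"


def cors_headers(request_origin):
    """Return CORS headers, granting access only to the allowed origin."""
    if request_origin != ALLOWED_ORIGIN:
        return []
    return [
        ("Access-Control-Allow-Origin", request_origin),
        ("Access-Control-Allow-Methods", "POST, OPTIONS"),
        ("Access-Control-Allow-Headers", "Content-Type"),
        ("Vary", "Origin"),
    ]


def app(environ, start_response):
    """Minimal WSGI app standing in for a hypothetical summarize endpoint."""
    headers = cors_headers(environ.get("HTTP_ORIGIN", ""))
    if environ["REQUEST_METHOD"] == "OPTIONS":
        # Preflight request: reply with the CORS headers and no body.
        start_response("204 No Content", headers)
        return []
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    # Placeholder for the summarizer: report how much text was received.
    body = json.dumps({"received_chars": len(payload.get("text", ""))}).encode()
    start_response("200 OK", headers + [("Content-Type", "application/json")])
    return [body]
```

Echoing back only a known origin, instead of sending a wildcard, keeps the endpoint from being readable by arbitrary websites. Whether that restriction is needed depends on how public the back end is meant to be.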

Accomplishments that we're proud of

We successfully developed a text summarization web application and our first Chrome extension.

What's next for Summarizr

We plan to turn the user interface into a single-page application using React.js. In addition, we plan on developing a more lightweight abstractive summarization model that can be deployed on the website. For the Chrome extension, we aim to build a more robust system for extracting relevant text, so that the model is not fed text from links, headers, widgets, etc.
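As a rough illustration of the text-extraction goal, the sketch below filters a page's HTML down to body text by skipping everything inside tags that usually hold links, headings, navigation, or scripts. It is written in Python with the standard-library html.parser module purely for illustration. The extension itself is written in JavaScript, and the list of skipped tags and the names BodyTextExtractor and extract_text are assumptions, not existing parts of the project.

```python
from html.parser import HTMLParser

# Tags whose text is skipped; this list is an illustrative assumption.
SKIPPED_TAGS = {
    "a", "nav", "header", "footer", "aside", "script", "style", "button",
    "h1", "h2", "h3", "h4", "h5", "h6",
}


class BodyTextExtractor(HTMLParser):
    """Collects text that is not nested inside any skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped tags are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the page's remaining text as one space-joined string."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

One trade-off of this simple filter is that dropping link text can leave gaps in sentences that contain inline links, so a more robust system would likely treat inline links differently from navigation links.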
