Focii: An Anti-Procrastination Chrome Extension

Focii uses a custom machine-learning algorithm (no OpenAI) to decide whether a website is related to the user's current study keywords, and blocks unrelated, distracting websites.

Front-end GitHub

Back-end GitHub

Inspiration

As with any project, we wanted to build something that we could use in our day-to-day lives. As avid procrastinators, we thought an anti-procrastination Chrome extension would be extremely helpful for our study habits.

What it does

Our extension takes in a list of keywords from the user that describe what they are studying. For example, if someone were studying Vector Calculus, some keywords might be "vector calculus, vectors, calculus, curves, parametric, dot product, cross product". Every website the user visits is compared to these keywords using our custom machine learning algorithm and is blocked if it is unrelated to them, so the user can only visit websites that would help them study.

How we built it

Frontend

We built the front end in vanilla JavaScript, HTML, and CSS. To extract a website's keywords, we obtain all of the text on the page, filter out basic words that are unrelated to the page's meaning, and pass the result to Summary.js, which filters out conjunctions and other meaningless words and outputs a list of keywords that describe the overall content of the website. We then send the user-defined study keywords and the website keywords to the backend, which decides whether or not to block the website.
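
A minimal sketch of the keyword-filtering idea described above, written in Python only to stay consistent with the backend examples (the extension itself does this in JavaScript with Summary.js). The stop-word list, the frequency ranking, and the function name `extract_keywords` are illustrative assumptions; this does not reproduce what Summary.js actually does.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list (an assumption, far smaller than a real one).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "but", "by", "for", "in", "is",
    "it", "of", "on", "or", "so", "the", "to", "with",
}


def extract_keywords(page_text: str, max_keywords: int = 10) -> List[str]:
    """Return the most frequent non-stop-words in the page text."""
    # Lowercase the text and keep alphabetic runs only.
    words = re.findall(r"[a-z]+", page_text.lower())
    # Drop stop words and very short tokens, then count what is left.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(max_keywords)]
```

The resulting list plays the role of the "website keywords" that are sent to the backend together with the study keywords.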

Backend

Our backend is built entirely in Python. We use the pre-trained sentence-transformers model all-mpnet-base-v2 to embed the keywords from the user and from the website so that we can semantically compare the two and determine whether they are similar.
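
A rough sketch of the embedding step, assuming the `sentence-transformers` package is installed. `SentenceTransformer` and `encode` are the library's real entry points; the helper names `embed_keywords` and `cosine_similarity` are hypothetical and not taken from the project's code.

```python
import math
from typing import List, Sequence


def embed_keywords(keywords: Sequence[str]) -> List[List[float]]:
    """Embed each keyword with all-mpnet-base-v2 (one vector per keyword)."""
    # Imported lazily so the similarity helper below works without the model.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-mpnet-base-v2")
    # encode() returns one row per input string; convert to plain lists.
    return model.encode(list(keywords)).tolist()


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In a long-running server the model would normally be loaded once at startup rather than on every call; it is loaded inside the function here only to keep the sketch short.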

We started by simply comparing the word embeddings with the cosine-similarity metric and blocking a website when the score fell below some threshold value, but we found that the accuracy of this alone was not good enough for our classification task. We decided that for each list of keywords, we would instead average all of the word embeddings to obtain a single embedding representation of the keywords, multiply this by a weight, add a constant error term, and then compare the two transformed embeddings using the cosine-similarity metric. By feeding the averaged word embeddings through a linear equation, we could then optimize the weight and error term to minimize our classification error.
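
The averaging and linear-transformation steps can be sketched as follows. The writeup does not say whether the weight and error term are scalars or per-dimension values, so this sketch assumes a single scalar weight and a single scalar error term (called `bias` here) applied to every dimension. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average a list of word embeddings into one vector."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def transform(vector: Sequence[float], weight: float, bias: float) -> Vector:
    """Apply the linear equation weight * x + bias to each dimension."""
    return [weight * x + bias for x in vector]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_block(study_embeddings, site_embeddings,
                 weight: float, bias: float, threshold: float) -> bool:
    """Block when the transformed averages are less similar than the threshold."""
    study = transform(mean_embedding(study_embeddings), weight, bias)
    site = transform(mean_embedding(site_embeddings), weight, bias)
    return cosine_similarity(study, site) < threshold
```

Under the scalar assumption, the similarity depends only on the ratio `bias / weight`, because rescaling both vectors by the same nonzero factor leaves the cosine unchanged. A per-dimension weight would be a more expressive variant of the same idea.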

We collected training data so that we could use supervised learning to optimize our blocking threshold and the parameters of the linear transformation. We use SciPy's Nelder-Mead optimization method (since we didn't have access to gradients), with an objective function that measures the error in our blocking classification, which the optimizer minimizes. We found that by averaging the word embeddings and optimizing the weight, error term, and blocking threshold on our training data, we were able to reduce our classification error by 75%.
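
A hedged sketch of how such a fit could look with `scipy.optimize.minimize` and `method="Nelder-Mead"`. The objective here is the fraction of misclassified training examples, which is one plausible reading of "error with blocking classification"; the data layout, the starting point, and the names `classification_error` and `fit` are assumptions. The scalar weight and error term (`bias`) follow the same simplifying assumption as before.

```python
import math
from typing import List, Sequence, Tuple

# (averaged study embedding, averaged site embedding, True if it should be blocked)
Example = Tuple[List[float], List[float], bool]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classification_error(params, examples: Sequence[Example]) -> float:
    """Fraction of examples whose block/allow prediction is wrong."""
    weight, bias, threshold = params
    wrong = 0
    for study, site, label in examples:
        s = [weight * x + bias for x in study]
        t = [weight * x + bias for x in site]
        predicted = cosine_similarity(s, t) < threshold
        wrong += (predicted != label)
    return wrong / len(examples)


def fit(examples: Sequence[Example], start=(1.0, 0.0, 0.5)):
    """Search for weight, bias, and threshold without using gradients."""
    # Imported lazily so the objective can be used without SciPy installed.
    from scipy.optimize import minimize

    result = minimize(classification_error, x0=list(start),
                      args=(examples,), method="Nelder-Mead")
    return result.x, result.fun
```

Because the misclassification rate is piecewise constant in the parameters, a gradient-free method such as Nelder-Mead is a natural fit, though the result can depend on the starting point.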

Challenges we ran into

All of us had almost zero experience with JavaScript, which was a big hurdle. We started off wanting to do everything in JavaScript, but we eventually realized we lacked the expertise to effectively classify whether websites should be blocked, so we switched our backend to Python, where it was also easier to use machine learning techniques.

Accomplishments that we're proud of

Our blocking algorithm is highly effective at classifying whether or not a website should be blocked based on the website content and the user-defined keywords. Since the back end just takes in two lists of keywords or phrases and compares them for similarity, we could generalize our extension to a whole host of content-filtering applications, including spam, hate speech, spoilers (our personal favorite), and censorship.

We are also proud of the fact that we have a usable Chrome extension!

What we learned

We learned how to link a front end and a back end to create a full-stack application! We also learned that managing permissions, states, and scopes in JavaScript is super hard.

What's next for Focii

Personalized content-filtering (continuous learning based on users' website activity and user feedback to determine optimal parameters for a user's studying and browsing habits)
Keep everything on the front end so that nothing is stored on a server
Add a Pomodoro timer

Built With

chrome
chromewebapi
css
html
javascript
machine-learning
python
pytorch
rest
restapi
scipy
summary.js
transformers

Submitted to

Hack Education
- Winner Third Place

Created by

I worked on the backend in order to classify whether or not the websites should be blocked. I had a fun time playing around with word embeddings, figuring out our algorithm and optimizing the parameters.

Colin Pannikkat
Hello! I am a fourth-year CS student interested in the intersection between machine learning and environmental sustainability.
I helped with graphic design and some of the front-end work. I also worked on some features and functionality that got binned due to time constraints.

David Gesl
Sarvesh Thiruppathi Ahila
Hello! I'm a Junior at OSU interested in Computer Engineering.
Ajinkya Gokule
Hey! I am a junior studying Computer Science, interested in AI and music.