Inspiration

Persuasive language is a 0-day vulnerability that everyone has. When people are aware that someone is trying to influence them, it is easy to step back and process things logically. However, when scrolling social media or reading a news article, we consume content in a passive state. In that state, persuasive language can exploit a backdoor into our psyche and cause us to form opinions without our even realizing it. 
A 2022 NIH study analyzed social media's power as a persuasive platform to influence the behavior of young adults aged 18-24, and concluded that it significantly shaped their behavior. We set out to create a tool that can combat this critical vulnerability in the human psyche. 

What it does

    Our tool, UVT (Ultra Violet Text), is a Chrome extension that aims to expand users' media literacy by highlighting persuasive language. By surfacing that language, we empower our users to consume information responsibly and combat viral misinformation. The extension lets you change the highlight color to better fit the page and blacklist words to flag them on any webpage. 

How we built it

When a user opens a webpage, our Chrome extension immediately tokenizes the DOM into sentences using TreeWalkers and object iterators. Those sentences are sent to a proxy server over HTTPS, since Chrome extensions only allow HTTPS requests. The proxy uses SSH tunneling to expose a compute node that has no public IP, forwarding all uncached sentences to it. The Intel Developer Cloud node runs TorchServe, a Torch-optimized REST server, to efficiently classify every sentence with a RoBERTa model. The proxy caches the compute node's HTTP responses and sends a secure HTTPS response back to the extension, which then highlights potentially persuasive text using cached DOM references. 
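The tokenize-and-batch step can be sketched as below. This is a simplified illustration, not our actual extension code: the real tokenizer walks text nodes with a `TreeWalker`, while this sketch shows only the pure string logic, and both function names are hypothetical.

```javascript
// Split a block of text into rough sentences on terminal punctuation.
// (The real extension gathers this text from DOM text nodes via a
// TreeWalker; here we operate on a plain string for clarity.)
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Group sentences into fixed-size batches so the extension can send
// one HTTPS request per batch instead of one request per sentence.
function batchSentences(sentences, batchSize = 16) {
  const batches = [];
  for (let i = 0; i < sentences.length; i += batchSize) {
    batches.push(sentences.slice(i, i + batchSize));
  }
  return batches;
}
```

Batching at this layer is what keeps the proxy and compute node from being flooded with tiny requests on text-heavy pages.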

Challenges we ran into

    Google Chrome extensions require all data coming into a page fetched over HTTPS to also arrive over HTTPS. We used Tech.domain, Certbot, and DigitalOcean to create an HTTPS proxy server that served as a go-between for our extension and the classification model. 
    The training set for the classification model had an uneven 70/30 class split, which prevented accurate classification. We trimmed the dataset to a 50/50 split.
    Tracking mutations throughout the webpage without overloading the server with requests was hard: our MutationObserver would react to our own highlight edits, creating feedback loops that crashed Google Chrome numerous times. 
    Classifying each sentence sequentially led to many sequential API calls. Rewriting the parsing code to be decoupled from the classification code allowed for batched inference, which reduced both the time to process a page and the number of fetch requests by an order of magnitude.
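The proxy's caching behavior described above can be sketched roughly as follows. This is an illustrative simplification, assuming a hypothetical `classifyBatch` function that stands in for the real TorchServe call; only sentences the cache has never seen are forwarded to the compute node.

```javascript
// Hypothetical sketch of the proxy-side cache. `cache` is a Map from
// sentence text to label; `classifyBatch` represents one batched call
// to the compute node (the real system calls TorchServe over HTTP).
async function classifyWithCache(sentences, cache, classifyBatch) {
  const uncached = sentences.filter((s) => !cache.has(s));
  if (uncached.length > 0) {
    // One batched request instead of one request per sentence.
    const labels = await classifyBatch(uncached);
    uncached.forEach((s, i) => cache.set(s, labels[i]));
  }
  // Answer every sentence from the cache, preserving input order.
  return sentences.map((s) => cache.get(s));
}
```

Repeat visits to a page (or boilerplate shared across pages) then cost zero inference calls, which is what made the $7 backend workable.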

Accomplishments that we're proud of

    We wrote a general-purpose, webpage-agnostic sentence tokenizer. It groups HTML elements on sentence boundaries, organizes them into batches for efficient, low-latency inference, then modifies the DOM to highlight sentences flagged by the classifier.
    We built a minimally invasive user interface for the extension's highlighting functionality, parsing the DOM recursively while supporting user customization.
    We jerry-rigged an actual backend with only $7, free software, SSH tunneling, and a lot of work. It was definitely a great learning experience, since I'd never worked with any networking before, and now I know how to use DNS records and SSL certificates.
