SensibleSpeech

GIF: Extension demonstration on Twitter

Inspiration

Many third-party sites lack the machine learning capabilities needed to filter the content posted in their comment sections and similar areas. A website-independent solution is therefore needed to label and censor hate speech.

What it does

The Chrome extension parses a website's text to determine whether it is harmful to read. If a piece of text is classified as harmful, we replace the HTML content for that specific text and mark it as hate speech.
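The classify-then-replace step could look roughly like the sketch below. It is an illustration only: `TextHolder`, `ScoreFn`, `HATE_THRESHOLD`, `redactHarmful` and the placeholder string are hypothetical names and values, not taken from the project, and the classifier is passed in as a plain function so the sketch runs outside a browser.

```typescript
// Minimal stand-in for a DOM element so the sketch does not need a browser.
interface TextHolder {
  textContent: string | null;
}

// Hypothetical classifier: returns a score in [0, 1], higher = more likely hate speech.
type ScoreFn = (text: string) => number;

// Assumed cut-off; the project does not state which threshold it uses.
const HATE_THRESHOLD = 0.5;

// Replaces the text of every holder whose score reaches the threshold
// and returns how many holders were redacted.
function redactHarmful(holders: TextHolder[], score: ScoreFn): number {
  let redacted = 0;
  for (const holder of holders) {
    const text = holder.textContent ?? "";
    if (text.trim() === "") continue; // nothing to classify
    if (score(text) >= HATE_THRESHOLD) {
      holder.textContent = "[hidden: classified as hate speech]";
      redacted += 1;
    }
  }
  return redacted;
}
```

In a content script the holders would be real page elements, and the score would come from the classification backend rather than a local function.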

How we built it

We use an Azure VM as the backend for classifying sentences as hate speech. The classification model is dehatebert-mono-english from HuggingFace. Currently, the extension supports English only. The Chrome extension is built with JS, CSS and HTML and shows an overview of the hate speech that a user has been exposed to (severity of content and number of views).
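The exposure overview (severity of content and number of views) could be kept as a small running tally. The sketch below is one possible shape for it; the severity buckets, their score boundaries and all names (`ExposureStats`, `severityFromScore`, `recordExposure`) are assumptions made for illustration, not the project's actual data model.

```typescript
type Severity = "low" | "medium" | "high";

interface ExposureStats {
  views: number; // number of flagged items the user has seen
  bySeverity: Record<Severity, number>;
}

function emptyStats(): ExposureStats {
  return { views: 0, bySeverity: { low: 0, medium: 0, high: 0 } };
}

// Assumed mapping from classifier score to severity bucket.
function severityFromScore(score: number): Severity {
  if (score >= 0.9) return "high";
  if (score >= 0.7) return "medium";
  return "low";
}

// Returns a new stats object with one more view counted in the matching bucket.
function recordExposure(stats: ExposureStats, score: number): ExposureStats {
  const level = severityFromScore(score);
  const bySeverity = { ...stats.bySeverity };
  bySeverity[level] += 1;
  return { views: stats.views + 1, bySeverity };
}
```

Returning a fresh object on each update keeps the tally easy to persist and to render in the extension popup.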

Challenges we ran into

Identifying the relevant text containers was the main challenge. The solution currently works on Twitter by identifying the CSS elements that contain the post text; a future version would need a more general DOM parsing method. In addition, the Berkeley deep-learning model presented in the workshop was non-functional, so we switched to a different pre-trained model available on HuggingFace.
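A selector-based lookup of post text might be sketched as follows. The selector `[data-testid="tweetText"]` is an assumption about Twitter's markup and may not match what the extension actually targets; `QueryRoot` and `collectPostTexts` are hypothetical names, and the root is typed with a minimal interface so the function can be exercised without a browser.

```typescript
// Minimal shape of something that supports CSS selector queries.
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<{ textContent: string | null }>;
}

// Assumed selector for tweet text; site markup changes often, so treat this as a placeholder.
const TWEET_TEXT_SELECTOR = '[data-testid="tweetText"]';

// Returns the trimmed, non-empty text of every element matching the selector.
function collectPostTexts(root: QueryRoot, selector: string = TWEET_TEXT_SELECTOR): string[] {
  const texts: string[] = [];
  const matches = root.querySelectorAll(selector);
  for (let i = 0; i < matches.length; i++) {
    const text = (matches[i].textContent ?? "").trim();
    if (text !== "") texts.push(text);
  }
  return texts;
}
```

The dependence on one site-specific selector is exactly what makes this approach brittle: any change to the site's markup silently breaks the lookup.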

Accomplishments that we're proud of

The extension currently works (when used in dark mode): it correctly identifies hate speech and covers it with a red bar (see GIF).

What we learned

Programming Chrome extensions that interact with page content. Deploying VMs on Azure. Working with a team across 9 time zones.

What's next for SensibleSpeech

The next development steps are:

Extend the DOM parser to work independently of CSS classes.
Move model inference to the browser, avoiding costly REST API requests and removing the VM running in the background. This would allow the extension to be free; otherwise it would have to be a paid extension to cover Azure costs.
Provide multilingual model support, as the hate-speech problem is more prevalent in non-English content (https://www.washingtonpost.com/outlook/2021/10/28/misinformation-spanish-facebook-social-media/).
Extend the configurability of which hate-speech concepts to label, which to redact and which to replace with less damaging content (in effect, translating hate speech into normal speech).
Integrate a de-biased hierarchical attention model for better sentence classification.
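As an illustration of the first step, a parser that does not depend on CSS classes could walk the page tree and collect any sufficiently long run of text. The sketch below uses a simplified tree type instead of the real DOM so it runs anywhere; `SimpleNode`, `SKIPPED_TAGS`, `collectTextBlocks` and the default minimum length of 20 characters are all hypothetical choices, not part of the project.

```typescript
// Simplified stand-in for a DOM tree: elements have a tag, text nodes have text.
interface SimpleNode {
  tag?: string;
  text?: string;
  children?: SimpleNode[];
}

// Subtrees that never hold user-visible prose (assumed list).
const SKIPPED_TAGS: string[] = ["script", "style", "noscript"];

// Depth-first walk that collects trimmed text runs of at least minLength characters.
function collectTextBlocks(node: SimpleNode, minLength: number = 20, out: string[] = []): string[] {
  if (node.tag !== undefined && SKIPPED_TAGS.indexOf(node.tag.toLowerCase()) !== -1) {
    return out; // skip the whole subtree
  }
  if (node.text !== undefined) {
    const trimmed = node.text.trim();
    if (trimmed.length >= minLength) out.push(trimmed);
  }
  for (const child of node.children ?? []) {
    collectTextBlocks(child, minLength, out);
  }
  return out;
}
```

The length filter is a crude way to skip buttons and labels; a real version would likely need better heuristics to separate user content from interface text.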

Built With

Updates

Matteo Berchier started this project — Oct 02, 2022 04:53 AM EDT
