Inspiration
Our inspiration comes from the problem of content moderation on social media platforms. The volume of content these companies host keeps growing, and current algorithms are designed to maximize user engagement, so vast amounts of non-numerical data need to be processed and monitored. Today, this job is done by teams of content moderators. Companies have to allocate resources to support these teams and to deal with the mental health effects of moderators constantly viewing and interacting with violent or inappropriate content. Our Python library provides an easy-to-implement solution.
What it does
To address hate speech and content moderation across social media platforms, we created an easy-to-implement Python library for social media companies. The library scans any text posted by a user, as well as links, images, and GIFs. It then determines whether the post contains hate speech, offensive language, or neither. From these results it can produce a graph showing how many times, and to what degree, a user has posted hate speech, offensive language, or neither. It also offers other features, such as retrieving a user's stats on how many times they have said something hateful or offensive, and it provides a rating for the type of language they have used. To demonstrate the library we created a Discord chat bot, but it can also be implemented on other platforms.
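The per-user stats described above could be kept with a small tally like the one below. This is a minimal sketch and not the library's actual code: the `UserStats` class, its method names, and the label strings are hypothetical, and it assumes each post has already been scored with one float per category.

```python
from collections import Counter

# Hypothetical label names for the three outcomes described above.
LABELS = ("hate_speech", "offensive_language", "neither")


class UserStats:
    """Counts, per user, how often each label was the top-scoring one."""

    def __init__(self):
        self._counts = {}

    def record(self, user_id, scores):
        """Store one scored post and return its top label."""
        top = max(LABELS, key=lambda label: scores[label])
        self._counts.setdefault(user_id, Counter())[top] += 1
        return top

    def summary(self, user_id):
        """Return the number of posts per label for this user."""
        counts = self._counts.get(user_id, Counter())
        return {label: counts[label] for label in LABELS}

    def flagged_share(self, user_id):
        """Fraction of the user's posts labeled hateful or offensive."""
        counts = self._counts.get(user_id, Counter())
        total = sum(counts.values())
        if total == 0:
            return 0.0
        flagged = counts["hate_speech"] + counts["offensive_language"]
        return flagged / total
```

A bot command could call `summary(user_id)` to feed a bar chart and `flagged_share(user_id)` as a simple rating; how the real library computes its rating is not stated here.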
How we built it
We started with a flowchart of the program design. We then looked into hatesonar, a prebuilt ML model that takes in a string and returns three float values, one each for hate speech, offensive language, and neither. We built a simple class called HateRater, which takes in a post and uses a regex to check whether it contains a URL. If the URL leads to a webpage, it is passed to HateCrawler, the web scraper, which gets the visible text from the page. If it points to an image or GIF file, it is passed to HateImage, which extracts the text from the image or GIF. The text is then run through hatesonar and the resulting values are averaged. The Discord bot displays that data.
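The routing and averaging steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the HateRater source: `rate_post`, the URL pattern, the extension list, and the three callables (`classify`, `fetch_page_text`, `fetch_image_text`) are hypothetical stand-ins for hatesonar, HateCrawler, and HateImage, passed in as parameters so the sketch runs without those dependencies.

```python
import re
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical label names for the three float values.
LABELS = ("hate_speech", "offensive_language", "neither")
# Assumed URL pattern; the real regex is not given in the write-up.
URL_RE = re.compile(r"https?://\S+")
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")

Scores = Dict[str, float]


def rate_post(
    post: str,
    classify: Callable[[str], Scores],
    fetch_page_text: Callable[[str], str],
    fetch_image_text: Callable[[str], str],
) -> Scores:
    """Score the post's own text and any linked content, then average."""
    texts: List[str] = []

    # The post's own words, with URLs removed.
    plain = URL_RE.sub(" ", post).strip()
    if plain:
        texts.append(plain)

    # Route each URL by file extension: images/GIFs vs. webpages.
    for url in URL_RE.findall(post):
        path = url.split("?", 1)[0].lower()
        if path.endswith(IMAGE_EXTENSIONS):
            texts.append(fetch_image_text(url))
        else:
            texts.append(fetch_page_text(url))

    # Classify every non-empty piece of text and average per label.
    results = [classify(text) for text in texts if text.strip()]
    if not results:
        return {label: 0.0 for label in LABELS}
    return {label: mean(r[label] for r in results) for label in LABELS}
```

Passing the scraper, image reader, and classifier in as functions keeps the routing logic easy to test with stubs; in a real integration they would wrap the actual model and network calls.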
Challenges we ran into
One of our challenges was accounting for variable input types, such as text, emojis, images, URLs (and their associated webpages), and GIFs. Because content is not restricted to text-based input, we had to implement separate algorithms for images, as well as a separate program that checks site content when given a URL by retrieving its HTML. Our biggest challenge was that our base library used a deprecated function from scikit-learn, which we had to significantly adapt and augment in our packages and the resulting library.
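To illustrate the site-content check, here is a minimal sketch of pulling visible text out of retrieved HTML using only Python's standard library. It is not the project's scraper: `VisibleTextParser` and `visible_text` are hypothetical names, the set of skipped tags is an assumption, and downloading the page is left out.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, ignoring content that browsers do not render."""

    # Assumed set of tags whose contents are not visible on the page.
    SKIPPED = {"script", "style", "noscript", "template", "head"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def visible_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The returned string can then be handed to the classifier like any other post text.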
Accomplishments that we're proud of
We’re especially proud of creating something that combined everyone’s individual skill sets, and of how efficient and productive we were throughout the hacking process. We were able to quickly learn and adapt to the different technologies that support our end-product library.
What we learned
With this project, we developed our skills with Python libraries, which we used to create our own library and integrate it with a Discord bot. We also developed our image processing skills, working with a variety of file types rather than text alone. For instance, we learned web scraping techniques and how to extract text from GIF, .jpg, .jpeg and .png files in order to analyze them for hateful or offensive language. We also used image processing techniques from different libraries to extract the background from images and to create graphs.
What's next for Hate Speech Detector
Because our library is scalable, we hope to see it implemented on platforms beyond our Discord bot, including but not limited to social media platforms such as Twitter, Facebook and YouTube. It could also be used in educational settings, such as apps that schools use to communicate with students. Ultimately, we hope wider cross-platform adoption will help reduce the ongoing problem of violence and hate speech online.