Magnify.io front end. Please see "Try it out" replit repository.
Magnify.io continued. We provide an option for user feedback.
Command line, back end. Please see "Try it out" github repository.

Inspiration

The internet has facilitated a lot of hate speech, whether it’s on Twitter, Discord, or through unscientific publications. This has led to many marginalized people feeling unsafe and very uncomfortable on the internet.

As women of color in STEM, we have both experienced firsthand how discriminatory speech can alienate people. That’s why mental health is incredibly important to us. We are both passionate about computer science, and the internet is a big part of that. We think that everyone deserves a place online where they can feel safe.

We decided, in response to the growing problem of hate speech, to build software that determines if a given text is emotionally distressing to people from marginalized communities. No one should have to come across hurtful messages without warning.

What it does

magnify.io's classifier is trained on an open-source, dynamically generated hate-speech dataset. Given a piece of text, it estimates whether that text may be triggering.

How we built it

This project required a bit of math and an elementary understanding of probability and data structures. Our machine learning classifier measured the probability of a word appearing in an offensive sentence versus a non-offensive sentence. We then used Bayes' Theorem to combine these conditional probabilities. The front end was built using HTML and CSS. The font (Verdana) was chosen for its accessibility and because it’s easy to read. The colors for the site were also chosen to be easy to read and aesthetically pleasing.
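A minimal sketch of the word-probability idea described above, written as a small naive Bayes classifier in Python. This is an illustration, not our actual code: the function names, the labels "offensive" and "not_offensive", the whitespace tokenizer, the add-one smoothing, and the toy training sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of sentences
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab

def log_score(text, label, model):
    """log P(label) + sum of log P(word | label), with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / total_docs)
    for word in tokenize(text):
        if word not in vocab:
            continue  # ignore words never seen in training
        count = word_counts[label][word] + 1
        score += math.log(count / (total_words + len(vocab)))
    return score

def classify(text, model):
    """Return the label with the highest posterior score (Bayes' Theorem)."""
    _, doc_counts, _ = model
    return max(doc_counts, key=lambda label: log_score(text, label, model))

# Toy training data, invented for this example only.
toy_examples = [
    ("you are awful", "offensive"),
    ("awful stupid people", "offensive"),
    ("have a lovely day", "not_offensive"),
    ("you are a lovely friend", "not_offensive"),
]
model = train(toy_examples)
```

Calling `classify("awful people", model)` compares the two label scores and returns the larger one. Working in log space avoids multiplying many small probabilities together, and the smoothing keeps a word that appears under only one label from forcing a probability of zero.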

CSV file of generated hate speech used in our project: B. Vidgen, “Bvidgen/dynamically-generated-hate-speech-dataset: Repository for the dynamically generated hate speech dataset by Vidgen et al. (2021),” GitHub, 2021. [Online]. Available: https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset. [Accessed: 28-Aug-2022].

Challenges we ran into

Dealing with a large dataset posed a significant issue at first: a single error caused by poor formatting in the CSV file kept our code from running at all. We solved this by looping through the dataset to find lines that could not be parsed (e.g., empty lines or lines with faulty delimiters).
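The kind of scan described above might look like the following sketch, which uses Python's standard `csv` module. The function name `split_rows`, the expected column count, and the sample data are assumptions for illustration; the real dataset has its own columns and our actual cleanup code may have differed.

```python
import csv
import io

def split_rows(lines, expected_columns):
    """Separate usable CSV rows from lines that cannot be used.

    Returns (good, bad), where bad holds (line_number, row) pairs so the
    problem lines can be inspected or fixed by hand.
    """
    good, bad = [], []
    reader = csv.reader(lines)
    for row in reader:
        # A blank line parses to an empty list, and a line with the wrong
        # delimiter parses to a single field, so both fail this check.
        if len(row) == expected_columns:
            good.append(row)
        else:
            bad.append((reader.line_num, row))
    return good, bad

# Invented sample: a header, one good row, a blank line, a line that uses
# ';' instead of ',', and a quoted field that contains a comma.
sample = io.StringIO(
    "text,label\n"
    "have a lovely day,not_offensive\n"
    "\n"
    "broken line;offensive\n"
    '"quoted, with comma",offensive\n'
)
good_rows, bad_rows = split_rows(sample, expected_columns=2)
```

Here `bad_rows` records the blank line and the semicolon-delimited line along with their line numbers, while the quoted row is kept because the `csv` module handles the embedded comma. When reading from a real file, the `csv` documentation recommends opening it with `newline=""`.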

Because both of us had minimal experience with front-end scripting languages, we set aside our original plan of a Chrome extension, which proved to be very technically demanding. In the future, we absolutely want to explore other UI/UX approaches!

Accomplishments that we're proud of

Relative to our limited experience in software engineering, our project was technically very demanding. It required a lot of planning, debugging, and reformatting data in a way that worked well with our algorithm. This was probably one of our most challenging programming experiences, given the limited amount of time provided. This was our first hackathon, and we are proud of what we were able to accomplish over the course of ~24 hours.

Of equal importance, we took the time to create a platform experience and design we both loved. This is definitely a project we want to revisit in the future! We both spent a lot of time learning more about HTML, JavaScript, and CSS in order to make our website both soothing and enjoyable to look at.

What we learned

In our first hackathon, we learned to pace ourselves more effectively. We each had different skill sets, so working individually on our own areas of relative expertise and then coming together to review each other’s code was difficult at first, but rewarding in the end. We learned many new things from being put in a stressful situation. We were able to research and debug independently, and to discuss our code not only with each other but also with mentors. We solidified our understanding of programming. We put a lot of effort into planning: illustrating the logical structure of our classifier while tinkering with our front-end design.

What's next for magnify.io

The next step would be to connect the front end and back end more seamlessly using an API.

As we gain theoretical knowledge in data science and programming, we would like to add nuance and complexity to our algorithm. How can we account for context, sarcasm, and satire? How about multimodal data, like internet memes that combine both image and text? How can we improve our accuracy and run time? These are all questions we have already considered in the process of creating magnify.io. Perhaps instead of a binary classification, we could rate text on a spectrum: not offensive, microaggressive, or extremely offensive. That would require different data.

We would also love to learn more about how to use technologies such as frameworks, APIs, and libraries. One of our future goals is to create a web scraper that collects the text from a website so it can be analyzed. Finally, we would like to refine the UI/UX design. We will learn more about these processes!