N-dimensional Hatred

Inspiration

Our team was inspired by the machine learning we had studied, and we wanted to do something cool with it.

What it does

Our code identifies hate speech in a Twitter-style corpus. As an example, we used Donald Trump's tweets.

How we built it

We wrote the project in Python using its data analysis libraries (NumPy, Pandas, scikit-learn). We used gensim, an implementation of the Word2Vec algorithm. To run our code we used Croc Cloud.

Challenges we ran into

We realised that identifying hate speech can be complicated even for humans, which is why we started our work by discussing the structure of hate speech and how to operationalise it. The most difficult part was tagging Trump's tweets according to their intensity. Our model also needed features to train on, so we manually analysed the tweets, trying to understand the primary offense of each message.

Accomplishments that we're proud of

We managed to use and manipulate word vectors for NLP. We are also glad that our model reaches 70-78 percent accuracy.
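As a small illustration of what manipulating vectors for NLP can look like, the sketch below averages word vectors and compares them with cosine similarity. The 3-dimensional embeddings are made up for the example (real Word2Vec vectors have many more dimensions), and the `average` and `cosine` helpers are hypothetical names, not part of our project code.

```python
import math

# Made-up 3-dimensional embeddings, for illustration only.
embeddings = {
    "stupid": [0.9, 0.1, 0.0],
    "idiot": [0.8, 0.2, 0.1],
    "great": [0.0, 0.9, 0.3],
    "rally": [0.1, 0.7, 0.6],
}


def average(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


insult = average([embeddings["stupid"], embeddings["idiot"]])
praise = average([embeddings["great"], embeddings["rally"]])

# The averaged "insult" vector sits closer to "idiot" than to the "praise" vector.
print(cosine(insult, embeddings["idiot"]))
print(cosine(insult, praise))
```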

What we learned

We used Word2Vec in our project and so learned it in practice. Our background was mostly in numeric data, and now we have an idea of how to process natural language.

What's next for N-dimensional Hatred

In theory, the algorithm can be used to detect hate speech in media and help prevent the spread of hostility. It can be improved by enlarging the corpus of tagged speech.

Built With

croc-cloud
gensim
machine-learning
nltk
numpy
pandas
python
scikit-learn
word2vec

Updates

Revecca V started this project — Nov 27, 2016 02:14 AM EST
