Inspiration & what it does

This project was created for Ignition Hacks 2020. The machine learning algorithm classifies text from a CSV file as expressing positive or negative sentiment.

How we built it

1. Data Cleaning
   - Remove HTML entity codes (`&amp;`, etc.) and punctuation
   - Remove @twitterhandles
   - Remove stopwords (common words that carry little meaning)
   - Convert all text to lowercase (can be commented out for testing)
   - Split the text into individual words and two-word phrases, as a list
   - Lemmatize each word
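The cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the team's actual code: the stopword list here is a tiny hand-picked stand-in (a real pipeline would use a full list such as NLTK's), and lemmatization is omitted since it needs a library like NLTK's WordNetLemmatizer.

```python
import re

# Tiny illustrative stopword list (assumption: the real pipeline used a
# full list, e.g. NLTK's English stopwords).
STOPWORDS = {"the", "a", "an", "is", "and", "to", "of"}

def clean(text):
    text = text.lower()                  # lowercase (optional step)
    text = re.sub(r"&\w+;?", " ", text)  # strip HTML entities like &amp;
    text = re.sub(r"@\w+", " ", text)    # strip @twitterhandles
    text = re.sub(r"[^\w\s]", " ", text) # strip punctuation
    words = [w for w in text.split() if w not in STOPWORDS]
    # Two-word phrases (bigrams) alongside the individual words
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams

tokens = clean("@user I love the movie &amp; its soundtrack!")
```

Lemmatization would then map each remaining word to its dictionary form (e.g. "loved" to "love") before feature extraction.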

2. Feature Extraction
   - Use a count vectorizer to select features from the generated bag of words

3. Model Selection
   - Test multiple models, including Logistic Regression, Random Forest, and an Artificial Neural Network (ANN)
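Comparing candidate models could look like the sketch below, which scores each with 5-fold cross-validation on synthetic features (an assumption standing in for the vectorized tweets). The ANN is omitted here since it would involve a separate framework such as Keras, but it would slot into the same loop:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the count-vectorized features (assumption).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean cross-validated accuracy per model, to pick the best performer.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}
```

The model with the highest mean score would then be retrained on the full training set.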

Challenges

One challenge our team faced was the memory capacity of our computers and of Google Colab: our laptops crashed while attempting to run some of the algorithms. In addition, some of us used JupyterLab rather than Colab, so pushing the notebooks to GitHub and downloading them onto another teammate's laptop took a long time.

Accomplishments that we're proud of

We managed the work efficiently and collaborated well despite the challenges and time constraints. We ran multiple variations of the data and models, which proved beneficial for finding the best accuracy but detrimental to our computers' memory. Coming into the hackathon, we each had our own areas of specialty and used that to our advantage while coding the Sentiment Analyzer.

What we learned

We all learned a lot about machine learning and natural language processing. The workshops we attended and the research we conducted as a team certainly deepened our knowledge of writing machine learning algorithms. The Ignition Hacks community (communicating via the Discord server) also provided tips and useful information.

What's next for Sentiment Analyzer

In the future, we would like to improve our algorithm to optimize its accuracy. This could come in the form of a bigger dataset and many more variations of data cleaning, such as retaining punctuation or emojis. We could also try models we haven't tested yet, such as Naive Bayes or k-means.
