Inspiration

Every day, thousands of people suffer from depression and anxiety, and many more have only mild symptoms that worsen over time through exposure to harmful content on the internet.

What it does

The project uses Beautiful Soup to scrape a given website for keywords like "depression", "anxiety", etc., and passes the extracted text to the NLP model. The LSTM model performs sentiment analysis to decide whether the user is looking for informational content or emotional content, and outputs an XML file listing all the tags the program thinks should be removed.
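The output stage of this pipeline can be sketched as below. Here `classify` is only a keyword-based stand-in for the trained LSTM model, and the tag ids, keyword list, and XML layout are illustrative assumptions, not the project's exact format.

```python
import xml.etree.ElementTree as ET

def classify(text: str) -> str:
    """Stand-in for the LSTM sentiment model: returns 'negative' or
    'neutral'. The real project feeds the text to the trained network."""
    negative_words = {"depression", "anxiety", "hopeless"}
    return "negative" if set(text.lower().split()) & negative_words else "neutral"

def build_removal_xml(elements):
    """Given (tag_id, text) pairs from the scraper, emit an XML document
    listing the tags the pipeline thinks should be removed."""
    root = ET.Element("remove")
    for tag_id, text in elements:
        if classify(text) == "negative":
            item = ET.SubElement(root, "tag", id=tag_id)
            item.text = text
    return ET.tostring(root, encoding="unicode")

xml_out = build_removal_xml([
    ("p1", "Ten ways to cope with anxiety"),
    ("p2", "Local weather stays sunny this week"),
])
```

In the real system the XML is then consumed by whatever front end strips those tags from the rendered page.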

How we built it

It started with brainstorming about the type of model we should use for the sentiment analysis, since there were two possible approaches: classification and clustering. We chose to go forward with the latter, as it lets us stay flexible about future additions to the scraping attributes. After that, we built the scraping component, which uses Beautiful Soup to scrape a given web page based on the attributes we initially supply.
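The attribute-driven scraping step can be sketched roughly as follows. To keep the sketch dependency-free, the stdlib `html.parser` stands in for Beautiful Soup here, and the keyword list is a placeholder for the attributes the project actually configures.

```python
from html.parser import HTMLParser

KEYWORDS = {"depression", "anxiety"}  # placeholder attribute list

class KeywordScraper(HTMLParser):
    """Collects (parent_tag, text) pairs whose text mentions any keyword.
    Illustrative only: the real project does this with Beautiful Soup."""
    def __init__(self, keywords):
        super().__init__()
        self.keywords = {k.lower() for k in keywords}
        self._stack = []     # open tags, so we know each text node's parent
        self.matches = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if set(data.lower().split()) & self.keywords:
            parent = self._stack[-1] if self._stack else "document"
            self.matches.append((parent, data.strip()))

scraper = KeywordScraper(KEYWORDS)
scraper.feed("<html><body><p>Coping with anxiety daily</p>"
             "<p>Sports results from yesterday</p></body></html>")
```

After feeding a page, `scraper.matches` holds the text blocks that get handed to the sentiment model.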

Then we started building the NLP model. We chose the IMDB movie-review dataset to train the model to recognize natural-language text and extract its sentiment. Finally, we integrated the model with the scraper to test the pipeline end to end against our test cases. The model reports whether it judges a sentence to carry negative or neutral emotion, and based on that value our scraper generates the XML output.
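To make the LSTM part concrete, here is one LSTM time step written out in NumPy. This shows the recurrence the model runs over each token of a review; the gate ordering, dimensions, and random weights are illustrative assumptions, not the trained model's internals.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. x: input embedding (d,), h_prev/c_prev: previous
    hidden and cell state (n,), W: (4n, d), U: (4n, n), b: (4n,)."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b     # all four gate pre-activations at once
    i = sigmoid(z[0:n])            # input gate
    f = sigmoid(z[n:2 * n])        # forget gate
    o = sigmoid(z[2 * n:3 * n])    # output gate
    g = np.tanh(z[3 * n:4 * n])    # candidate cell update
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

# Tiny run over a 3-step sequence of 5-dim embeddings with 4 hidden units
rng = np.random.default_rng(0)
d, n = 5, 4
W = rng.normal(size=(4 * n, d))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for _ in range(3):
    h, c = lstm_step(rng.normal(size=d), h, c, W, U, b)
```

The final hidden state `h` is what a dense output layer would map to a negative/neutral score.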

Challenges we ran into

Cleaning the text data before vectorization was a challenge, as we had to make sure we did not remove important words from the data. Cleaning in general was extremely tough, since the model was built on over 5,000 data points.
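A minimal sketch of that cleaning trade-off: generic stop-word removal would also strip negations like "not", which flip sentiment, so those get whitelisted. The word lists here are toy examples, not the ones the project used.

```python
import re

# Negations flip sentiment ("not helpful" vs "helpful"), so they must
# survive cleaning even though generic stop-word lists contain them.
KEEP = {"not", "no", "never"}
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "not", "no"}  # toy list

def clean(text):
    """Lowercase, drop punctuation, remove stop words -- but keep the
    sentiment-bearing words in KEEP."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS or t in KEEP]

tokens = clean("The therapy was not helpful, sadly.")
```

The surviving tokens are what the vectorizer sees, which is why an over-aggressive cleaner silently degrades the model.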

What's next for Sentiment scraper

Turning this desktop application into a Chrome extension that would automatically take the URL of the website the user is on and remove the harmful content in real time.
