WebMD Drug Review Classification

Introduction: What problem are you trying to solve and why?

The goal of our project is to create a model that can accurately classify online drug reviews. This problem matters because many consumers rely on these reviews before deciding what to buy for their condition. It is especially relevant during the current COVID-19 era, in which many health programs have shifted to a remote setting.

Related Work: Are you aware of any, or is there any prior work that you drew on to do your project?

For our project, we are drawing inspiration from the paper "Syntactically Meaningful and Transferable Recursive Neural Networks for Aspect and Opinion Extraction" by Wang and Pan from Nanyang Technological University. Sentiment analysis is a classification problem, as MonkeyLearn explains: "There’s a few types of sentiment analysis, review based, emotion based, and aspect based. It’s pretty hard to label sentiment as only 60-65% of people agree on the sentiment of a particular text. There’s three types of sentiment analysis algorithms, rule-based, automatic and hybrid. It’s hard to parse emojis, sarcasm and understand comparisons." Although there are existing implementations, such as the one in that paper and "Sentiment analysis on drug reviews" by Szokoly, our implementation differs in that we use TensorFlow LSTMs and LIME rather than PyTorch tools. The paper "'Why Should I Trust You?': Explaining the Predictions of Any Classifier" also guided our use of LIME.

Data: What data are you using (if any)?

The dataset we are using for this project is the WebMD Drug Reviews Dataset from Kaggle. It contains about 360,000 reviews in total. Most preprocessing consists of removing reviews that have no written comment and separating the comments (inputs) from the effectiveness, ease of use, and satisfaction ratings (labels).
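The preprocessing described above could look roughly like the following sketch. The column names (`Reviews`, `Effectiveness`, `EaseofUse`, `Satisfaction`) are assumptions about the CSV header and should be checked against the actual file; the function name `load_reviews` is hypothetical, and skipping rows with malformed ratings is an extra assumption beyond what the proposal states.

```python
import csv
import io

# Assumed column names; verify them against the real Kaggle CSV header.
TEXT_COL = "Reviews"
LABEL_COLS = ("Effectiveness", "EaseofUse", "Satisfaction")


def load_reviews(csv_file):
    """Return (comments, labels), skipping rows without a written comment."""
    comments, labels = [], []
    for row in csv.DictReader(csv_file):
        text = (row.get(TEXT_COL) or "").strip()
        if not text:
            continue  # no written comment, so nothing to classify
        try:
            ratings = tuple(int(float(row[col])) for col in LABEL_COLS)
        except (KeyError, TypeError, ValueError):
            continue  # assumption: also skip rows with missing or malformed ratings
        comments.append(text)
        labels.append(ratings)
    return comments, labels


# Tiny in-memory stand-in for the real file, for illustration only.
sample = io.StringIO(
    "Reviews,Effectiveness,EaseofUse,Satisfaction\n"
    "Worked well for me,5,4,5\n"
    ",3,3,3\n"
    "   ,2,2,2\n"
    "No change after a month,1,4,1\n"
)
comments, labels = load_reviews(sample)
```

Reading the file row by row keeps memory use flat, which may help with the concern about preprocessing time on a dataset of this size.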

Methodology: What is the architecture of your model?

We will train our model on a train split of the Kaggle dataset, as we have done in previous course assignments. We expect the hardest parts of the project to be implementing LIME effectively and making sure preprocessing runs in a reasonable amount of time.
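As a minimal sketch of the train split mentioned above, the helper below shuffles the examples and holds out a fraction for testing. The 80/20 ratio, the fixed seed, and the function name `train_test_split` are assumptions made for illustration, not choices stated in the proposal.

```python
import random


def train_test_split(inputs, labels, test_fraction=0.2, seed=0):
    """Shuffle paired inputs/labels and split them into train and test sets."""
    if len(inputs) != len(labels):
        raise ValueError("inputs and labels must have the same length")
    indices = list(range(len(inputs)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (
        pick(inputs, train_idx),
        pick(labels, train_idx),
        pick(inputs, test_idx),
        pick(labels, test_idx),
    )


# Illustrative data only: ten fake reviews with matching fake labels.
demo_inputs = [f"review {i}" for i in range(10)]
demo_labels = list(range(10))
train_x, train_y, test_x, test_y = train_test_split(demo_inputs, demo_labels)
```

Shuffling indices rather than the data itself keeps each comment paired with its own label.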

Metrics: What constitutes “success?”

We will add up the ratings of the 3 categories for each review: a total of 3-6 will be classified as negative, 7-11 as neutral, and 12-15 as positive. Accuracy will then be measured on whether the model predicts a comment to be positive, neutral, or negative. To run experiments and determine success, we will give our model reviews and check whether it outputs the correct classification.

In the research paper, the authors present the expressiveness and effectiveness of different forms of dependency-tree-based recursive neural networks for both single-domain and cross-domain settings. The deep recursive structure, together with the dependency-tree information, is able to associate automatic feature learning with syntactic structures, which have been proven to be crucial for both single-domain and cross-domain aspect/opinion term extraction. At the same time, a conditional domain adversarial network is incorporated to learn domain-invariant word features based on their inherent syntactic structure.

Our base goal is to correctly classify reviews as positive or negative. Our target goal is to correctly find words/phrases that are significant for identifying whether a review is positive or negative. Our stretch goal is to correctly identify the reviewer's initial condition based on their review.
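The labeling rule and accuracy measure described in this section can be sketched as follows. This assumes each of the three ratings is an integer from 1 to 5 (which is what a 3-15 total implies); the function names `sentiment_label` and `accuracy` are hypothetical.

```python
def sentiment_label(effectiveness, ease_of_use, satisfaction):
    """Map the sum of the three ratings to a sentiment class."""
    total = effectiveness + ease_of_use + satisfaction
    if not 3 <= total <= 15:
        raise ValueError(f"rating total {total} is outside the expected 3-15 range")
    if total <= 6:
        return "negative"  # totals 3-6
    if total <= 11:
        return "neutral"   # totals 7-11
    return "positive"      # totals 12-15


def accuracy(predicted, actual):
    """Fraction of reviews whose predicted class matches the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative values only: two true labels and two made-up predictions.
true_labels = [sentiment_label(5, 4, 5), sentiment_label(1, 4, 1)]
score = accuracy(["positive", "neutral"], true_labels)
```

For the base goal of distinguishing only positive from negative reviews, the same helpers apply after filtering out the neutral class.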

Ethics: What are the broader consequences of your project?

Deep learning is a good approach to this problem because it involves the analysis of words. Embeddings and RNNs can help extract features from the text. In CS1470, we have done multiple projects that involved building a model to draw conclusions from text sources. The major stakeholders are people who use WebMD reviews to inform their decisions about which drugs to use, as well as the drug companies. If our algorithm makes mistakes, people who could benefit from a drug might refrain from using it and leave their condition untreated, or people suffering from a condition might use a drug they expected to help that turns out to be ineffective or to cause harmful side effects. Additionally, misclassified reviews may make a specific company look either worse or better than it should, which could affect its reputation.

Links to the GitHub repo, reflections, and poster are below in the "Try it out" section.

The content here is not the final reflection! It is an edited version of the project proposal deliverable. The final reflection is linked below.