FairBite
What Inspired Us
We began noticing that certain restaurants—especially those serving non-Western cuisines—were consistently underrated on Yelp despite being genuinely excellent. That raised a key question: is this simply a matter of taste, or is something structural influencing how people write reviews?
That question became the foundation of this project.
What We Learned
Bias doesn’t need to be intentional to be measurable. A reviewer doesn’t have to be consciously unfair—they just have to be unfamiliar. When this pattern is aggregated across thousands of reviews, a clear signal emerges.
We also found that the text of a review often tells a different story than its star rating. Our sentiment model effectively provides a second opinion on each review. The gap between what someone writes and what they rate is where bias becomes visible.
We define this gap as the bias score:
bias_score(c, k) = mean_sentiment(c, k) − mean_sentiment(k)

where:
- mean_sentiment(c, k) = mean model sentiment over reviews of cuisine c in city k
- mean_sentiment(k) = mean model sentiment over all reviews in city k
A negative score indicates that a cuisine is linguistically undervalued relative to its surroundings.
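The definition above maps directly onto a grouped aggregation. Here is a minimal sketch with pandas, assuming a DataFrame with `city`, `cuisine`, and `sentiment` columns (the column names are illustrative, not taken from our codebase):

```python
import pandas as pd

def bias_scores(df: pd.DataFrame) -> pd.Series:
    """Bias score per (city, cuisine): mean sentiment of the cuisine's
    reviews minus the mean sentiment of all reviews in the same city."""
    city_mean = df.groupby("city")["sentiment"].transform("mean")
    return (
        df.assign(deviation=df["sentiment"] - city_mean)
          .groupby(["city", "cuisine"])["deviation"]
          .mean()
    )

reviews = pd.DataFrame({
    "city": ["A", "A", "A", "A"],
    "cuisine": ["thai", "thai", "burger", "burger"],
    "sentiment": [0.6, 0.7, 0.9, 0.8],
})
scores = bias_scores(reviews)  # thai: -0.1, burger: +0.1
```

Averaging per-review deviations from the city mean is equivalent to subtracting the two means, since the city mean is constant within a city.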
How We Built It
The pipeline consists of two main components:
1. Sentiment Model
We trained a bidirectional LSTM on Yelp reviews, using normalized star ratings as labels:
label = (stars − 1) / 4
To avoid overfitting to Yelp’s natural skew toward 4–5 star reviews, we balanced the training data with stratified random sampling, drawing equal numbers of reviews from each of the five star levels.
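The balancing step is a one-liner with pandas. A sketch, assuming a review DataFrame with a `stars` column and that every star level has at least `per_class` reviews:

```python
import pandas as pd

def balance_by_stars(df: pd.DataFrame, per_class: int, seed: int = 0) -> pd.DataFrame:
    """Draw the same number of reviews from each star level (1-5),
    so the model cannot lean on Yelp's 4-5 star skew."""
    return (
        df.groupby("stars")
          .sample(n=per_class, random_state=seed)
          .reset_index(drop=True)
    )
```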
The model outputs a continuous score between 0 and 1 using a sigmoid activation and is trained with mean squared error (MSE) loss.
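A minimal sketch of that architecture in PyTorch; the vocabulary size and layer dimensions are illustrative assumptions, not our trained configuration:

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """Bidirectional LSTM regressor: token ids -> sentiment in [0, 1]."""
    def __init__(self, vocab_size=20_000, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        _, (h, _) = self.lstm(x)
        h = torch.cat([h[-2], h[-1]], dim=1)  # final forward + backward states
        return torch.sigmoid(self.head(h)).squeeze(1)

model = SentimentLSTM()
loss_fn = nn.MSELoss()
stars = torch.tensor([5.0, 1.0])
labels = (stars - 1) / 4                       # normalize stars to [0, 1]
preds = model(torch.randint(1, 20_000, (2, 12)))  # dummy token-id batch
loss = loss_fn(preds, labels)
```

The sigmoid keeps predictions in the same [0, 1] range as the normalized labels, so MSE compares like with like.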
2. TF-IDF Analysis
We applied TF-IDF per cuisine on balanced review samples to identify vocabulary that is most distinctive to each cuisine. This provides interpretable insight into where bias appears in the language.
Adjusted Rating
We define the adjusted rating as:
adjusted_rating = clip(original_rating + 0.5 × bias_score, 1, 5)
This shifts ratings slightly based on detected bias while keeping them within the standard 1–5 range.
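The adjustment is straightforward to express in code; this sketch just restates the formula above:

```python
def adjusted_rating(original: float, bias_score: float) -> float:
    """Shift the star rating by half the bias score, clipped to [1, 5]."""
    return min(5.0, max(1.0, original + 0.5 * bias_score))

adjusted_rating(4.8, 0.9)  # 4.8 + 0.45 = 5.25, clipped to 5.0
```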
Tech Stack
- Backend: FastAPI
- Frontend: Next.js
- Deployment: Render
Challenges We Faced
Data Scale
Scoring tens of thousands of reviews individually through a REST API was slow. Batching and progress logging made it manageable, but it remained the primary bottleneck.
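The batching-with-progress pattern looks roughly like this; `score_batch` is a hypothetical stand-in for one model or API call over a list of texts, not our actual endpoint:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scoring")

def score_all(reviews, score_batch, batch_size=256):
    """Score reviews in fixed-size batches, logging progress as we go,
    instead of one slow round trip per review."""
    scores = []
    for start in range(0, len(reviews), batch_size):
        batch = reviews[start:start + batch_size]
        scores.extend(score_batch(batch))
        done = min(start + batch_size, len(reviews))
        log.info("scored %d/%d reviews", done, len(reviews))
    return scores
```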
Model Size
The trained model.pt file was 32 MB, large enough to make ordinary Git pushes impractical. We had to introduce Git LFS mid-project to handle deployment.
Circularity
The biggest conceptual challenge was that our sentiment model is trained on star ratings, which may themselves be biased. In other words, we’re using a potentially biased signal to detect bias.
What makes this approach still useful is that the model learns general language patterns across all cuisines. Deviations at the cuisine level from this baseline remain meaningful—but this is a limitation we explicitly acknowledge.
Research
The research paper we wrote is available in our GitHub repository.
Built With
- fastapi
- next.js
- nltk
- pandas
- pytorch
- typescript