FairBite
What Inspired Us
We began noticing that certain restaurants—especially those serving non-Western cuisines—were consistently underrated on Yelp despite being genuinely excellent. That raised a key question: is this simply a matter of taste, or is something structural influencing how people write reviews?
That question became the foundation of this project.
What We Learned
Bias doesn’t need to be intentional to be measurable. A reviewer doesn’t have to be consciously unfair—they just have to be unfamiliar. When this pattern is aggregated across thousands of reviews, a clear signal emerges.
We also found that the text of a review often tells a different story than its star rating. Our sentiment model effectively provides a second opinion on each review. The gap between what someone writes and what they rate is where bias becomes visible.
We define this gap as the bias score:
bias_score(c, k) = mean_sentiment(c, k) − mean_sentiment(k)

where:
- mean_sentiment(c, k) = mean model sentiment over reviews of cuisine c in city k
- mean_sentiment(k) = mean model sentiment over all reviews in city k
A negative score indicates that a cuisine is linguistically undervalued relative to its surroundings.
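The definition above maps directly onto a grouped aggregation. Here is a minimal sketch with pandas, assuming a DataFrame with `city`, `cuisine`, and `sentiment` columns (the column names are illustrative, not taken from our codebase):

```python
import pandas as pd

def bias_scores(df: pd.DataFrame) -> pd.Series:
    """Bias score per (city, cuisine): mean sentiment of the cuisine's
    reviews minus the mean sentiment of all reviews in the same city."""
    city_mean = df.groupby("city")["sentiment"].transform("mean")
    return (
        df.assign(deviation=df["sentiment"] - city_mean)
          .groupby(["city", "cuisine"])["deviation"]
          .mean()
    )

reviews = pd.DataFrame({
    "city": ["A", "A", "A", "A"],
    "cuisine": ["thai", "thai", "burger", "burger"],
    "sentiment": [0.6, 0.7, 0.9, 0.8],
})
scores = bias_scores(reviews)  # thai: -0.1, burger: +0.1
```

Averaging per-review deviations from the city mean is equivalent to subtracting the two means, since the city mean is constant within a city.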
How We Built It
The pipeline consists of two main components:
1. Sentiment Model
We trained a bidirectional LSTM on Yelp reviews, using normalized star ratings as labels:
label = (stars − 1) / 4
To avoid overfitting to Yelp’s natural skew toward 4–5 star reviews, we balanced the training data with stratified random sampling, drawing equal numbers of reviews from each of the five star levels.
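The balancing step is a one-liner with pandas. A sketch, assuming a review DataFrame with a `stars` column and that every star level has at least `per_class` reviews:

```python
import pandas as pd

def balance_by_stars(df: pd.DataFrame, per_class: int, seed: int = 0) -> pd.DataFrame:
    """Draw the same number of reviews from each star level (1-5),
    so the model cannot lean on Yelp's 4-5 star skew."""
    return (
        df.groupby("stars")
          .sample(n=per_class, random_state=seed)
          .reset_index(drop=True)
    )
```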
The model outputs a continuous score between 0 and 1 using a sigmoid activation and is trained with mean squared error (MSE) loss.
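A minimal sketch of that architecture in PyTorch; the vocabulary size and layer dimensions are illustrative assumptions, not our trained configuration:

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """Bidirectional LSTM regressor: token ids -> sentiment in [0, 1]."""
    def __init__(self, vocab_size=20_000, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        _, (h, _) = self.lstm(x)
        h = torch.cat([h[-2], h[-1]], dim=1)  # final forward + backward states
        return torch.sigmoid(self.head(h)).squeeze(1)

model = SentimentLSTM()
loss_fn = nn.MSELoss()
stars = torch.tensor([5.0, 1.0])
labels = (stars - 1) / 4                       # normalize stars to [0, 1]
preds = model(torch.randint(1, 20_000, (2, 12)))  # dummy token-id batch
loss = loss_fn(preds, labels)
```

The sigmoid keeps predictions in the same [0, 1] range as the normalized labels, so MSE compares like with like.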
2. TF-IDF Analysis
We applied TF-IDF per cuisine on balanced review samples to identify vocabulary that is most distinctive to each cuisine. This provides interpretable insight into where bias appears in the language.
Adjusted Rating
We define the adjusted rating as:
adjusted_rating = clip(original_rating + 0.5 × bias_score, 1, 5)
This shifts ratings slightly based on detected bias while keeping them within the standard 1–5 range.
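The adjustment is straightforward to express in code; this sketch just restates the formula above:

```python
def adjusted_rating(original: float, bias_score: float) -> float:
    """Shift the star rating by half the bias score, clipped to [1, 5]."""
    return min(5.0, max(1.0, original + 0.5 * bias_score))

adjusted_rating(4.8, 0.9)  # 4.8 + 0.45 = 5.25, clipped to 5.0
```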
Tech Stack
- Backend: FastAPI
- Frontend: Next.js
- Deployment: Render
Challenges We Faced
Data Scale
Scoring tens of thousands of reviews individually through a REST API was slow. Batching and progress logging made it manageable, but it remained the primary bottleneck.
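The batching-with-progress pattern looks roughly like this; `score_batch` is a hypothetical stand-in for one model or API call over a list of texts, not our actual endpoint:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scoring")

def score_all(reviews, score_batch, batch_size=256):
    """Score reviews in fixed-size batches, logging progress as we go,
    instead of one slow round trip per review."""
    scores = []
    for start in range(0, len(reviews), batch_size):
        batch = reviews[start:start + batch_size]
        scores.extend(score_batch(batch))
        done = min(start + batch_size, len(reviews))
        log.info("scored %d/%d reviews", done, len(reviews))
    return scores
```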
Model Size
The trained model.pt file was 32 MB, large enough to make ordinary Git pushes impractical. We had to introduce Git LFS mid-project to handle deployment.
Circularity
The biggest conceptual challenge was that our sentiment model is trained on star ratings, which may themselves be biased. In other words, we’re using a potentially biased signal to detect bias.
What makes this approach still useful is that the model learns general language patterns across all cuisines. Deviations at the cuisine level from this baseline remain meaningful—but this is a limitation we explicitly acknowledge.
Research
The research paper we wrote is available in our GitHub repository.
Built With
- fastapi
- next.js
- nltk
- pandas
- pytorch
- typescript