Trusty Krusty Reviews

Overview

Trusty Krusty Reviews is an ML system that assesses the quality and relevance of Google location reviews. It filters out noise (ads, off-topic posts, rants) and surfaces trustworthy feedback so users, businesses, and platforms can rely on cleaner signals at scale.

Problem

Public ratings are often distorted by irrelevant, promotional, or low-effort reviews. The challenge is to design and implement an ML-based system that evaluates the quality and relevance of location reviews and supports policy-aligned filtering at scale.

Solution

Our system classifies each review into one of four categories (Valid, Advertisement, Irrelevant, or Rant), assigns a confidence score to each class, highlights suspicious tokens such as URLs and promotional phrases, and supports bulk processing and export. Inference runs locally for reliability and cost control.
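To illustrate the suspicious-token highlighting, here is a minimal sketch using only the Python standard library. The pattern list, the function names (find_suspicious_spans, highlight), and the "**" marker are assumptions made for this example; the write-up does not list the project's actual rules.

```python
import re

# Hypothetical patterns for illustration; the project's real rule set is not documented here.
SUSPICIOUS_PATTERNS = [
    re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE),  # URLs
    re.compile(r"\b(?:promo code|use code|discount|\d{1,3}% off)\b", re.IGNORECASE),  # promo phrases
]


def find_suspicious_spans(text):
    """Return sorted (start, end, matched_text) tuples for every pattern hit."""
    spans = []
    for pattern in SUSPICIOUS_PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), match.group(0)))
    return sorted(spans)


def highlight(text, marker="**"):
    """Wrap each suspicious span in a marker so a UI layer can render it."""
    pieces, cursor = [], 0
    for start, end, matched in find_suspicious_spans(text):
        if start < cursor:
            continue  # skip spans that overlap one already emitted
        pieces.append(text[cursor:start])
        pieces.append(f"{marker}{matched}{marker}")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Calling highlight on a review returns the same text with each flagged span wrapped in the marker, which a front end could then style however it likes.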

Key Features

Dual modes: Business Mode (CSV upload and analysis) and Places Mode (live Google Places search and review classification)
Multi-class predictions with per-class confidence
Before/after comparison and metrics dashboard
Compact table view of all class confidence scores
Export of full datasets with predictions and classifications
Local inference (no external API keys required at runtime)
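
The per-class confidence table and the export feature could be sketched as follows. This is an illustration under assumptions, not the project's code: it supposes the classifier emits one raw score (logit) per class in the order shown, and the names LABELS, confidence_row, and export_csv are hypothetical.

```python
import csv
import io
import math

# Class order is an assumption; the four labels come from the write-up.
LABELS = ["Valid", "Advertisement", "Irrelevant", "Rant"]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_row(review_id, logits):
    """Build one table row: a confidence per class plus the predicted label."""
    probs = softmax(logits)
    row = {"review_id": review_id}
    row.update({label: round(p, 3) for label, p in zip(LABELS, probs)})
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    row["predicted"] = LABELS[best]
    return row


def export_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["review_id", *LABELS, "predicted"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping every class score in the row, rather than only the top label, is what lets a compact table show how close a borderline review was to another category.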

How It Works

Preprocessing and normalization (emoji handling, token cleanup, optional translation)
Feature engineering (DistilRoBERTa embeddings + numerical features such as review length and relevance scores)
Classification (multi-modal model over the text embeddings and numerical features, plus high-precision rules for obvious ads and low-effort content)
Streamlit UI for analysis, comparison, and export
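
The first three steps above can be sketched end to end with stand-ins. Everything here is an assumption for illustration: the emoji handling uses a Unicode-category check instead of the emoji package, the relevance score is a simple keyword-overlap ratio (the write-up does not define it), and embed and model are placeholder callables standing in for the DistilRoBERTa encoder and the trained classifier. Only an ad rule is shown, and its fixed confidence of 1.0 is likewise an assumption.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def normalize(text):
    """Replace 'Symbol, other' characters (covers most emoji) with spaces and collapse whitespace."""
    cleaned = "".join(" " if unicodedata.category(ch) == "So" else ch for ch in text)
    return re.sub(r"\s+", " ", cleaned).strip()


def numeric_features(text, place_keywords):
    """Return [token count, hypothetical relevance score in the range 0 to 1]."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    token_set = set(tokens)
    hits = sum(1 for kw in place_keywords if kw.lower() in token_set)
    relevance = hits / len(place_keywords) if place_keywords else 0.0
    return [float(len(tokens)), relevance]


def classify(text, place_keywords, embed, model):
    """Apply the high-precision rule first, then fall back to the model."""
    clean = normalize(text)
    if URL_RE.search(clean):
        return "Advertisement", 1.0  # rule fired, so the model is skipped
    # Concatenate the text embedding with the numerical features.
    features = list(embed(clean)) + numeric_features(clean, place_keywords)
    return model(features)
```

Running the rule before the model keeps obvious cases cheap and deterministic, and the model only sees reviews that the rules could not settle.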

Development Tools

VSCode, Jupyter Notebook
Git/GitHub for version control
Streamlit for the interactive application
Local Python environment with optional GPU

APIs Used

Google Maps API (googlemaps) for business details and location metadata
Apify: Google Maps Reviews Scraper and Google Maps Scraper for review and business data collection
LLM-assisted pre-labeling (ChatGPT) followed by manual validation

Libraries and Frameworks

PyTorch, Transformers, Sentence-Transformers, scikit-learn, datasets, safetensors
pandas, numpy, emoji, googletrans, python-dotenv, watchdog
Streamlit, streamlit-folium, matplotlib, seaborn

Assets and Datasets

~3,800 Google Maps reviews collected via Apify scrapers
Semi-automated labeling: LLM-generated initial labels with a manual validation pass
Repository data files:
- dataset/all_reviews.csv (raw)
- dataset/final_df.csv (processed features)
- dataset/label_data.csv (labeled training set)
- dataset/places_data.csv (business metadata)
- assets/sample_reviews.csv (10-row demo subset)

Relevance and Impact

By filtering promotional, off-topic, and unconstructive content and elevating reliable reviews, the system provides a cleaner, policy-aligned signal for any Google location category. Users make better choices, businesses gain fairer representation, and platforms reduce moderation overhead with transparent, reproducible tooling.

Built With

google-maps
googleplacesclient
numpy
pandas
python
streamlit

Updates

Kerway Tan started this project — Aug 30, 2025 06:43 PM EDT
