Trusty Krusty Reviews

Overview

Trusty Krusty Reviews is an ML system that assesses the quality and relevance of Google location reviews. It reduces noise (ads, off-topic posts, rants) and surfaces trustworthy feedback so users, businesses, and platforms can rely on cleaner signals at scale.

Problem

Public ratings are often distorted by irrelevant, promotional, or low-effort reviews. The challenge is to design and implement an ML-based system that evaluates the quality and relevance of location reviews and supports policy-aligned filtering at scale.

Solution

Our system classifies each review into one of four categories (Valid, Advertisement, Irrelevant, or Rant), assigns per-class confidence scores, highlights suspicious tokens (e.g., URLs and promotional phrases), and supports bulk processing and export. Inference runs locally for reliability and cost control.
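
The token-highlighting step can be illustrated with a small regex pass. This is a minimal sketch, not the project's actual rule set: the patterns in `URL_PATTERN`, `PHONE_PATTERN`, and `PROMO_PHRASES` and the helper names `find_suspicious_spans` and `highlight` are hypothetical, chosen only to show how matched spans could be located and marked for display.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration; the project's real rules are not published here.
URL_PATTERN = r"(?:https?://|www\.)\S+"
PHONE_PATTERN = r"\+?\d[\d\s-]{7,}\d"
PROMO_PHRASES = ["discount", "promo code", "use code", "visit our", "call now", "free trial"]

# One case-insensitive regex that matches any URL, phone number, or promo phrase.
SUSPICIOUS = re.compile(
    "|".join([URL_PATTERN, PHONE_PATTERN] + [re.escape(p) for p in PROMO_PHRASES]),
    re.IGNORECASE,
)


def find_suspicious_spans(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for each suspicious token in the review."""
    return [(m.start(), m.end(), m.group(0)) for m in SUSPICIOUS.finditer(text)]


def highlight(text: str, left: str = "[[", right: str = "]]") -> str:
    """Wrap every suspicious span in markers, e.g. for display in a results table."""
    out, last = [], 0
    for start, end, _ in find_suspicious_spans(text):
        out.append(text[last:start])                   # untouched text before the match
        out.append(f"{left}{text[start:end]}{right}")  # the flagged span
        last = end
    out.append(text[last:])
    return "".join(out)
```

For example, `highlight("Great pizza! Use code SAVE10 at www.pizza-deals.com")` returns `"Great pizza! [[Use code]] SAVE10 at [[www.pizza-deals.com]]"`. A production rule list would need tuning (for instance, the URL pattern here also captures trailing punctuation).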

Key Features

  • Dual modes: Business Mode (CSV upload and analysis) and Places Mode (live Google Places search and review classification)
  • Multi-class predictions with per-class confidence
  • Before/after comparison and metrics dashboard
  • Compact table view of all class confidence scores
  • Export of full datasets with predictions and classifications
  • Local model inference (classification needs no external API keys at runtime)
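
Per-class confidences, the compact confidence table, the before/after comparison, and the export all reduce to a few operations on the classifier's outputs. The sketch below assumes the model emits one raw score (logit) per class and that each review carries a star `rating`; the column names (`prediction`, `conf_<Label>`) and the helper functions are hypothetical. It uses only the standard library so it runs without pandas or Streamlit.

```python
import csv
import io
import math
from statistics import mean
from typing import Dict, List, Sequence

LABELS = ["Valid", "Advertisement", "Irrelevant", "Rant"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw model scores into per-class confidences that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_rows(reviews: List[Dict], logits: List[Sequence[float]]) -> List[Dict]:
    """Attach the predicted label and one confidence column per class to each review."""
    rows = []
    for review, scores in zip(reviews, logits):
        probs = softmax(scores)
        best = max(range(len(LABELS)), key=lambda i: probs[i])
        row = dict(review, prediction=LABELS[best])
        row.update({f"conf_{label}": round(p, 3) for label, p in zip(LABELS, probs)})
        rows.append(row)
    return rows


def before_after(rows: List[Dict]) -> Dict[str, float]:
    """Mean star rating over all reviews vs. only the reviews predicted Valid."""
    kept = [r["rating"] for r in rows if r["prediction"] == "Valid"]
    return {
        "before": mean(r["rating"] for r in rows),
        "after": mean(kept) if kept else float("nan"),
        "kept_fraction": len(kept) / len(rows),
    }


def export_csv(rows: List[Dict]) -> str:
    """Serialize the full dataset with predictions to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Comparing the mean rating over all reviews with the mean over reviews predicted Valid is just one simple before/after metric; the dashboard's actual metrics may differ.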

How It Works

  • Preprocessing and normalization (emoji handling, token cleanup, optional translation)
  • Feature engineering (DistilRoBERTa embeddings + numerical features such as review length and relevance scores)
  • Classification (multi-modal model + high-precision rules for obvious ads/low-effort content)
  • Streamlit UI for analysis, comparison, and export
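
The feature-fusion and rule-first classification steps can be sketched as follows. This is an illustration under stated assumptions: the review and place embeddings are passed in as plain vectors (in the project they would come from DistilRoBERTa), the relevance score is assumed to be cosine similarity between a review embedding and a place embedding, the single ad rule (link plus promo phrase) is hypothetical, emoji handling and translation are omitted, and `model` is a stand-in for the trained classifier.

```python
import math
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical high-precision rule: a link plus a promo phrase is treated as an obvious ad.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
PROMO_RE = re.compile(r"\b(?:discount|promo code|use code|coupon)\b", re.IGNORECASE)


def normalize(text: str) -> str:
    """Minimal cleanup: collapse whitespace and lowercase (emoji/translation omitted)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, used here as an assumed relevance score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_features(text: str, review_emb: Sequence[float],
                   place_emb: Sequence[float]) -> List[float]:
    """Concatenate the text embedding with numeric features: length, word count, relevance."""
    clean = normalize(text)
    numeric = [float(len(clean)), float(len(clean.split())), cosine(review_emb, place_emb)]
    return list(review_emb) + numeric


def classify(text: str, review_emb: Sequence[float], place_emb: Sequence[float],
             model: Callable[[List[float]], Dict[str, float]]) -> Dict:
    """Apply the high-precision rule first; otherwise defer to the learned model."""
    clean = normalize(text)
    if URL_RE.search(clean) and PROMO_RE.search(clean):
        return {"label": "Advertisement", "source": "rule"}
    probs = model(build_features(text, review_emb, place_emb))
    return {"label": max(probs, key=probs.get), "source": "model", "probs": probs}
```

Running rules before the model lets obvious cases short-circuit with a traceable `source`, while everything else is decided by the learned classifier.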

Development Tools

  • VSCode, Jupyter Notebook
  • Git/GitHub for version control
  • Streamlit for the interactive application
  • Local Python environment with optional GPU

APIs Used

  • Google Maps API (googlemaps) for business details and location metadata
  • Apify: Google Maps Reviews Scraper and Google Maps Scraper for review and business data collection
  • LLM-assisted pre-labeling (ChatGPT) followed by manual validation

Libraries and Frameworks

  • PyTorch, Transformers, Sentence-Transformers, scikit-learn, datasets, safetensors
  • pandas, numpy, emoji, googletrans, python-dotenv, watchdog
  • Streamlit, streamlit-folium, matplotlib, seaborn

Assets and Datasets

  • ~3,800 Google Maps reviews collected via Apify scrapers
  • Semi-automated labeling: LLM-generated initial labels with a manual validation pass
  • Repository data files:
    • dataset/all_reviews.csv (raw)
    • dataset/final_df.csv (processed features)
    • dataset/label_data.csv (labeled training set)
    • dataset/places_data.csv (business metadata)
    • assets/sample_reviews.csv (10-row demo subset)
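
The semi-automated labeling pass can be pictured as merging two label columns. This sketch assumes hypothetical columns `review_id`, `llm_label` (the ChatGPT pre-label), and `manual_label` (filled in only where a reviewer checked the row); the repository's `dataset/label_data.csv` may be laid out differently.

```python
import csv
import io
from typing import Dict, List

VALID_LABELS = {"Valid", "Advertisement", "Irrelevant", "Rant"}


def load_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse labeled rows from CSV text."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def merge_labels(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Final label = manual label if a reviewer provided one, else the LLM pre-label."""
    merged = []
    for row in rows:
        manual = (row.get("manual_label") or "").strip()
        final = manual if manual else row["llm_label"].strip()
        if final not in VALID_LABELS:
            raise ValueError(f"unknown label {final!r} for review {row.get('review_id')}")
        merged.append({**row, "label": final})
    return merged


def agreement_rate(rows: List[Dict[str, str]]) -> float:
    """Share of manually reviewed rows where the reviewer kept the LLM's label."""
    checked = [r for r in rows if (r.get("manual_label") or "").strip()]
    if not checked:
        return float("nan")
    agree = sum(r["manual_label"].strip() == r["llm_label"].strip() for r in checked)
    return agree / len(checked)
```

The agreement rate gives a rough sense of how often pre-labels survive manual review, which can guide how much of the dataset needs a second look.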

Relevance and Impact

By filtering promotional, off-topic, and unconstructive content and elevating reliable reviews, the system provides a cleaner, policy-aligned signal for any Google location category. Users make better choices, businesses gain fairer representation, and platforms reduce moderation overhead with transparent, reproducible tooling.
