Developed by Team JimJamSlam
Trusty Krusty Reviews is an ML-based system designed to evaluate the quality and relevancy of Google location reviews. Our solution improves trust in reviews by automatically classifying them and ensuring fair representation of locations through intelligent filtering and highlighting of problematic content.
Challenge: Design and implement an ML-based system to evaluate the quality and relevancy of Google location reviews.
Our Solution: We developed a comprehensive review classification system that:
- Automatically categorizes reviews as Valid, Advertisement, Irrelevant, or Rant
- Provides confidence scores and violation detection
- Highlights suspicious content with visual indicators
- Offers both rule-based and ML-based classification approaches
- Enables bulk processing and export of cleaned datasets
This addresses the core challenge of maintaining review quality and relevancy at scale, helping users make informed decisions based on trustworthy location-based feedback.
- IDE/Development Environment: VSCode, Jupyter Notebook
- Version Control: Git/GitHub
- Data Processing: Jupyter Notebook for model training (`training.ipynb`)
- Web Framework: Streamlit for the interactive application
- Model Development: Local Python environment with GPU support
- Google Maps API: For gathering business details and location data (`googlemaps` library)
- Apify APIs:
- Google Maps Reviews Scraper: User review collection
- Google Maps Scraper: Business details and associated reviews
Note: The system was designed to work locally without requiring external API keys for inference, making it suitable for hackathon environments with cost constraints.
- PyTorch (2.6.0): Deep learning framework for model training and inference
- Transformers (4.52.0): Hugging Face transformers for pre-trained language models
- Sentence-Transformers (3.4.1): For text embeddings and similarity calculations
- scikit-learn (1.6.1): Traditional ML algorithms and evaluation metrics
- Datasets (4.0.0): Hugging Face datasets for data handling
- SafeTensors: Secure tensor serialization
- pandas (2.2.3): Data manipulation and analysis
- numpy (2.0.2): Numerical computing
- emoji (2.14.1): Text preprocessing for emoji handling
- googletrans (4.0.0rc1): Language translation utilities
- Streamlit: Interactive web application framework
- Streamlit-Folium: Map visualization components
- python-dotenv: Environment variable management
- watchdog: File system monitoring
- matplotlib: Data visualization
- seaborn: Statistical data visualization
- Google Local Reviews Dataset: 3,800 Google Maps reviews collected via Apify scrapers
- Manual Labeling: Semi-automated approach using ChatGPT-5 for initial labels, followed by manual validation
Located in dataset/ directory:
- `all_reviews.csv`: Complete raw dataset
- `final_df.csv`: Processed dataset with features
- `label_data.csv`: Labeled training data
- `places_data.csv`: Business location metadata
`assets/sample_reviews.csv`: Test dataset with 10 representative reviews for demonstration
- Valid (High Quality): Relevant, meaningful feedback about the location
- Advertisement: Promotional content or external links unrelated to business
- Irrelevant: Content unrelated to the location or business
- Rant: Emotional, unconstructive, or inappropriate content
- Automated Review Classification: Multi-class classification with confidence scores
- Dual-Mode Operation: Business Mode (CSV upload) and Places Mode (live Google Places search)
- Batch Processing: Handle large CSV datasets efficiently
- Interactive Dashboard: Real-time metrics and before/after comparison
- Export Functionality: Download complete datasets with ML predictions and classifications
- Compact Score Display: Clean table format showing all classification confidence scores
- Local Processing: No external API dependencies for inference
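The batch-processing path can be sketched with pandas chunked reads, so large CSV files never have to sit fully in memory. This is a minimal illustration only: `classify_batch` and its URL rule are placeholders standing in for the project's actual classifier, and the file paths are hypothetical.

```python
import pandas as pd

def classify_batch(df: pd.DataFrame) -> pd.DataFrame:
    # Placeholder classifier: flag reviews containing a URL as Advertisement.
    df = df.copy()
    df['label'] = df['review_text'].str.contains(
        r'https?://|www\.', regex=True, na=False
    ).map({True: 'Advertisement', False: 'Valid'})
    return df

def process_csv(path: str, out_path: str, chunk_size: int = 10_000) -> None:
    # Stream the CSV in chunks; append labeled chunks to the output file.
    first = True
    for chunk in pd.read_csv(path, chunksize=chunk_size):
        labeled = classify_batch(chunk)
        labeled.to_csv(out_path, mode='w' if first else 'a',
                       header=first, index=False)
        first = False
```

Reading with `chunksize` keeps peak memory bounded by the chunk size rather than the file size, which is what makes the "handle large CSV datasets" claim workable on a laptop.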
- Python 3.8 or higher
- Git (optional, for cloning)
1. Clone the repository

```bash
git clone https://github.com/alvinnnnnnnnnn/TikTokJamFest.git
cd TikTokJamFest
```

2. Create and activate the virtual environment

```bash
python3 -m venv venv
# Activate (macOS/Linux)
source venv/bin/activate
# Activate (Windows)
venv\Scripts\activate
```

3. Upgrade pip and install dependencies

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

4. Run the app

```bash
streamlit run app.py
```

The app will open automatically at http://localhost:8501.
- Select Business Mode: Click "🏢 Business Mode - CSV Analysis" in the sidebar
- Upload CSV Data: Use the main area file uploader
- Review Results:
- Feed Tab: Compare raw vs processed reviews with classification badges and scores
- Metrics Tab: View classification statistics and breakdowns
- Export Tab: Download complete dataset with ML predictions
- Select Places Mode: Click "📍 Places Mode - Live Review Analysis" in the sidebar
- Enable Location: Grant location permission for nearby search
- Search Places: Enter business names or categories to find locations
- Browse Results: Navigate through places with pagination and view classified reviews
Your input CSV should contain these columns:
- `review_id` or `id`: Unique identifier
- `review_text` or `text`: The review content
- `rating` or `score`: Star rating (1-5)
- `user` or `reviewer`: Username/reviewer
- `timestamp` or `date`: Review date/time
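Since each column accepts two names, the upload step presumably maps whichever alias is present onto a canonical name before processing. A minimal sketch of such a normalization helper (the helper name and error behavior are assumptions, not the app's actual code):

```python
import pandas as pd

# Accepted aliases for each canonical column, mirroring the list above.
COLUMN_ALIASES = {
    'review_id': ['review_id', 'id'],
    'review_text': ['review_text', 'text'],
    'rating': ['rating', 'score'],
    'user': ['user', 'reviewer'],
    'timestamp': ['timestamp', 'date'],
}

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Rename the first matching alias of each column to its canonical name.
    renames = {}
    for canonical, aliases in COLUMN_ALIASES.items():
        for alias in aliases:
            if alias in df.columns:
                renames[alias] = canonical
                break
    missing = set(COLUMN_ALIASES) - set(renames.values())
    if missing:
        raise ValueError(f'Missing required columns: {sorted(missing)}')
    return df.rename(columns=renames)
```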
```bash
# Open the training notebook
jupyter notebook training.ipynb

# Follow the notebook cells to:
# - Load and preprocess the dataset
# - Engineer features (review length, relevance scores)
# - Train the multi-modal model
# - Evaluate performance metrics
```

```bash
# Run the app
streamlit run app.py

# Upload the sample dataset
# File: assets/sample_reviews.csv

# Expected results:
# - 4 Valid reviews (Sarah_M, John_D, CoupleGoals, FoodieExpert)
# - 2 Ad reviews (PromoBot_123, FakeAccount_99)
# - 2 Rant reviews (AngryCustomer, LazyReviewer)
# - 2 Irrelevant reviews (RandomReviewer, SecondHand_Info)
```

The model achieves the following metrics on our test dataset:
| Metric | Validation Set | Test Set |
|---|---|---|
| Weighted F1-Score | 0.81 | 0.78 |
| Weighted Precision | 0.78 | 0.75 |
| Weighted Recall | 0.84 | 0.81 |
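The weighted averages above come from scikit-learn's standard metrics: each class's score is weighted by its support, so the dominant Valid class contributes most to the aggregate. A small sketch with toy labels (illustration only, not the real evaluation data):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels for illustration; the real evaluation uses the held-out sets.
y_true = ['Valid', 'Valid', 'Advertisement', 'Rant', 'Irrelevant', 'Valid']
y_pred = ['Valid', 'Rant', 'Advertisement', 'Rant', 'Valid', 'Valid']

# average='weighted' scales per-class scores by class support.
f1 = f1_score(y_true, y_pred, average='weighted')
precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)
recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)
```

With an imbalanced label distribution (see Limitations), the weighted average can look healthy even when minority classes like Irrelevant score poorly, which is worth keeping in mind when reading the table.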
Generate a larger test dataset for performance evaluation:
```python
import pandas as pd
import random

base_reviews = [
    'Great food and amazing service!',
    'Visit our website www.deals.com for offers!',
    'TERRIBLE PLACE!!!! WORST EVER!!!!',
    'Never been here but heard bad things.',
    'Nice atmosphere, good food.',
]

# Generate 1000 rows for testing
data = []
for i in range(1000):
    review = random.choice(base_reviews)
    data.append({
        'review_text': f'{review} #{i}',
        'rating': random.randint(1, 5),
        'user': f'User_{i}',
        'timestamp': f'2024-01-{(i % 30) + 1:02d}'
    })

df = pd.DataFrame(data)
df.to_csv('large_test.csv', index=False)
```

From the raw dataset, additional features were engineered:
- Review Length: Word count analysis for quality assessment
- Relevance Scoring: Cross-encoder model comparing review text with business category keywords (0-1 similarity score)
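These two features can be sketched in pandas. Note the relevance column here is a simple keyword-overlap stand-in for illustration; the actual system uses a cross-encoder similarity model, which is too heavy to reproduce in a snippet.

```python
import pandas as pd

def add_features(df: pd.DataFrame, category_keywords: set) -> pd.DataFrame:
    df = df.copy()
    # Review length: word count of the review text.
    df['review_length'] = df['review_text'].str.split().str.len()

    # Stand-in relevance score: fraction of category keywords present in the
    # review (the real pipeline uses a cross-encoder score in [0, 1]).
    def overlap(text: str) -> float:
        words = set(text.lower().split())
        return len(words & category_keywords) / max(len(category_keywords), 1)

    df['relevance_score'] = df['review_text'].map(overlap)
    return df
```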
- Text Encoder: DistilRoBERTa for review text processing
- Feature Integration: Combines text embeddings with numerical features
- Classification Head: Multi-class output with confidence scores
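The feature-integration step can be sketched as a small PyTorch module. This is a plausible shape only, assuming 768-dim DistilRoBERTa embeddings and two numerical features; the hidden size, dropout, and layer count are assumptions, not the trained model's actual architecture.

```python
import torch
import torch.nn as nn

class ReviewClassifier(nn.Module):
    """Sketch of the multi-modal head: text embeddings (assumed to come from
    a DistilRoBERTa encoder) are concatenated with numerical features such as
    review length and relevance score, then classified into four labels."""

    def __init__(self, text_dim: int = 768, num_features: int = 2,
                 num_classes: int = 4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + num_features, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, num_classes),
        )

    def forward(self, text_emb: torch.Tensor,
                num_feats: torch.Tensor) -> torch.Tensor:
        # Concatenate both modalities, then project to class logits.
        x = torch.cat([text_emb, num_feats], dim=-1)
        return self.head(x)

model = ReviewClassifier()
logits = model(torch.randn(8, 768), torch.randn(8, 2))
probs = torch.softmax(logits, dim=-1)  # per-class confidence scores
```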
Complementary rule-based system for high-precision detection:
- Empty/missing text → Low Quality
- URL detection → Advertisement check
- Keyword matching → Advertisement flagging
- Pattern recognition → Rant/spam detection
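The four rules above can be sketched as a single function that either returns a high-precision label or defers to the ML model. The keyword list and thresholds here are illustrative assumptions, not the project's exact rules.

```python
import re
from typing import Optional

AD_KEYWORDS = {'promo', 'discount', 'offer', 'visit our website'}  # assumed list
URL_RE = re.compile(r'https?://\S+|www\.\S+', re.IGNORECASE)

def rule_classify(text: str) -> Optional[str]:
    """Return a high-precision label, or None to defer to the ML model."""
    # Empty/missing text -> Low Quality
    if not text or not text.strip():
        return 'Low Quality'
    # URL detection and keyword matching -> Advertisement
    lowered = text.lower()
    if URL_RE.search(text) or any(kw in lowered for kw in AD_KEYWORDS):
        return 'Advertisement'
    # Pattern recognition -> Rant (mostly caps plus repeated punctuation)
    letters = [c for c in text if c.isalpha()]
    caps_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
    if caps_ratio > 0.7 and ('!!!' in text or '???' in text):
        return 'Rant'
    return None
```

Keeping the rules this conservative is the point: they only fire on unambiguous cases, so their precision stays high and everything else falls through to the model.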
Command Not Found Error:

```bash
# Ensure the virtual environment is activated
source venv/bin/activate
pip install -r requirements.txt
streamlit --version  # Verify installation
```

Port Already in Use:

```bash
streamlit run app.py --server.port 8502
```

Import Errors:
- Ensure you're in the project root directory
- Verify all dependencies are installed
- Check Python path includes project directory
This application can be deployed to:
- Streamlit Cloud (recommended)
- Hugging Face Spaces
- Any platform supporting Streamlit apps
- Dataset Imbalance: Majority of reviews categorized as High Quality, potentially causing model bias
- Limited User Context: Could benefit from reviewer history and profile metadata for stronger signals
- Language Support: Currently optimized for English reviews
- Active Learning: Incorporate user feedback to improve model accuracy
- Multilingual Support: Extend classification to multiple languages
- Real-time Processing: Stream processing for live review analysis
- Advanced Features: Sentiment analysis, topic modeling, and trend detection
- Data Collection: Scraping data from Apify using Google Maps Reviews Scraper and Google Maps Scraper
- Data Processing: Development of code for data preprocessing and feature engineering
- Model Development: Development and training of the multi-modal machine learning model
- Backend Logic: Implementation of classification algorithms and rule-based systems
- Model Development: Development and training of machine learning model architecture
- Documentation: Devpost submission preparation and technical documentation
- UI Development: User interface design and implementation for the Streamlit application
- Project Management: Coordination of development workflows and deliverables
- Web Application: Full-stack development of the Streamlit web application
- Video Production: Creation of demonstration video showcasing the system capabilities
- Team Coordination: Project management, timeline coordination, and team communication
- Integration: System integration and deployment preparation
This project is open source and available under the MIT License.