ML-Based System for Evaluating Google Local Reviews

Team Members: Balakrishnan Vaisiya, Atharshlakshmi Vijayakumar

Project Overview

Our project is an ML-based system that evaluates the quality and relevance of Google Local Reviews. It addresses the challenge of automatically detecting low-quality reviews (advertisements, irrelevant content, and rants) while preserving valid, informative ones.

This directly tackles the problem of ensuring trustworthiness and reliability in location-based reviews, which are often cluttered with spam or misleading content.

The system works by:

Classifying reviews into four categories: Ad, Rant, Irrelevant, and Valid.
Engineering features that capture useful signals, such as review length, all-caps ratio (to detect rants), and a heuristic relevancy score (longer reviews with higher sentiment and a lower caps ratio are treated as more trustworthy).
Labeling data with a combination of Qwen few-shot prompting (1000 reviews) and manual hand-labeling (200 reviews) to improve data quality.
Training RoBERTa on the GoEmotions dataset and using it for feature extraction, so that emotions in the review text are captured.
Training a Logistic Regression model to classify reviews into the four categories.
Evaluating performance with precision, recall, and F1-score to measure how well the system identifies each type of low-quality review.

This solution helps platforms enforce content policies and provides more reliable information for users making location-based decisions.
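
As a rough illustration of the evaluation step, the sketch below computes per-class precision, recall, and F1 for the four categories in plain Python. It is not the code from models/logistic_regression.ipynb; the function name `per_class_metrics` and the sample labels are made up for this example, and a library such as scikit-learn reports the same quantities through `classification_report`.

```python
from collections import Counter

LABELS = ["Ad", "Rant", "Irrelevant", "Valid"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Made-up labels purely for illustration; these are not project results.
y_true = ["Ad", "Ad", "Rant", "Valid", "Valid", "Irrelevant"]
y_pred = ["Ad", "Valid", "Rant", "Valid", "Valid", "Valid"]

for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print(f"{label:<10} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reporting the metrics per class rather than as a single accuracy figure shows whether a rarer category such as Ad or Irrelevant is being missed even when Valid reviews are classified well.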

Setup Instructions

1. Clone Repository

Clone the repo using:

git clone https://github.com/atharshlakshmi/techjam

2. Create Virtual Environment

Create and activate a virtual environment from the terminal:

python -m venv venv
source venv/bin/activate   # Mac/Linux
venv\Scripts\activate      # Windows

3. Install Dependencies

pip install -r requirements.txt

4. Install Data

The dataset is not committed to Git (protected by .gitignore).

Go to the UCSD Google Local Dataset page (https://mcauleylab.ucsd.edu/public_datasets/gdrive/googlelocal/).
Download the complete 'Other' reviews and metadata.
Add the downloaded files to your local repo under:

data/raw

5. Install Trained Model

The trained model is not committed to Git (protected by .gitignore).

Go to the Trained Model Google Drive folder (https://drive.google.com/drive/folders/1cqsuyhh5_O1Ei7SdYWW-wx-g_DE2MlEn).
Download and unzip the complete folder.
Add the downloaded folder to your local repo.

6. Set Up Environment Variables

Create a .env file in the project root with:

HF_TOKEN=yourhuggingfacetoken
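
The sketch below shows one way a script could read HF_TOKEN from that .env file using only the standard library. It is an illustration, not necessarily how the project's notebooks load the token; the helper name `load_env_file` is hypothetical, and a package such as python-dotenv would serve the same purpose.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Returns False if the file does not exist, True otherwise.
    """
    env_path = Path(path)
    if not env_path.exists():
        return False
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault keeps a value that is already exported in the shell.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
    return True


load_env_file()  # looks for .env in the current working directory
hf_token = os.environ.get("HF_TOKEN")
if hf_token is None:
    print("HF_TOKEN is not set; check the .env file in the project root.")
```

Keeping the token in .env rather than in a notebook cell avoids committing it to Git by accident, provided .env is listed in .gitignore.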

To Reproduce Results

Download the data as mentioned above.
Open pre_processing/process_data.ipynb to explore the data.
Label the data by running both labelling/hand_label_data.py and labelling/qwen_label_data.ipynb.
Extract features by running feature_engineering/extract_features.ipynb.
Train the model and obtain results by running models/logistic_regression.ipynb.
To classify a new set of data, clean and extract its features before inputting it into the model.
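
To make the feature-extraction step more concrete, here is a minimal sketch of the handcrafted signals named in the Project Overview: review length, all-caps ratio, and a heuristic relevancy score. It is an illustration under stated assumptions, not the code in feature_engineering/extract_features.ipynb. In particular, measuring length in words, the 50-word cap, the equal weighting, and the `sentiment` input (assumed to be a score in [0, 1] produced by an upstream model) are all hypothetical choices.

```python
def caps_ratio(text):
    """Fraction of alphabetic characters that are uppercase."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isupper() for ch in letters) / len(letters)


def extract_features(text, sentiment):
    """Build a small feature dict for one review.

    `sentiment` is an assumed score in [0, 1] supplied by the caller.
    """
    length = len(text.split())  # assumption: length measured in words
    caps = caps_ratio(text)
    # Hypothetical heuristic: longer reviews, higher sentiment, and a lower
    # caps ratio all push the score up. Each term lies in [0, 1].
    length_term = min(length, 50) / 50
    relevancy = (length_term + sentiment + (1.0 - caps)) / 3
    return {
        "review_length": length,
        "caps_ratio": caps,
        "relevancy_score": relevancy,
    }


calm = extract_features(
    "Friendly staff and the noodles were fresh, would visit again.", 0.9
)
shouty = extract_features("WORST PLACE EVER", 0.1)
print(calm)
print(shouty)
```

New reviews need the same features, computed the same way as the training data, before they are passed to the trained Logistic Regression model.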

Future Improvements

Fine-tune larger pre-trained models on more labelled reviews.
Add metadata features (e.g., GPS proximity, user history).
Deploy as a real-time API with explainability (XAI).
