Amazon Review Analyzer

A machine-learning project that classifies Amazon product reviews as real (human-written) or fake (AI-generated) based on their text content.

Project Overview

This project reads Amazon reviews from CSV files, cleans the text, engineers features, and trains three classifiers: a TF-IDF + logistic regression baseline, XGBoost, and a BERT model fine-tuned with LoRA. It then serves predictions through a Streamlit dashboard. The workflow covers data loading, feature engineering, model training and evaluation, and an interactive frontend for live inference.

File Structure

```
Amazon-Review-Analyzer-2/
├── data/            # Raw and processed review data files
├── model/           # Trained joblib models, metadata, and BERT LoRA adapter files
├── src/             # Python scripts for data processing, training, and evaluation
├── webapp/          # Frontend application for serving model predictions
├── pyproject.toml   # Project metadata and dependencies (used by uv)
├── .gitignore       # Files and directories excluded from version control
└── README.md        # Project documentation
```

Getting Started

Prerequisites

This project uses uv for Python project and environment management.

Install uv

macOS / Linux

```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Windows

```
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

After installation, restart your terminal so the uv command is available.

Set Up the Project

Clone the repository

```
git clone https://github.com/wijayaju/Amazon-Review-Analyzer-2.git
cd Amazon-Review-Analyzer-2
```

Create a virtual environment and install dependencies
```
uv sync
```
This reads pyproject.toml, creates a .venv virtual environment, and installs all listed dependencies.

Add new packages as needed
```
uv add <package-name>
```

Run scripts
```
uv run python src/<script>.py
```

Usage

Place the raw review data file in the data/ directory.

Run preprocessing to create the engineered dataset:

```
uv run python src/preprocess.py --input "data/<INPUT_CSV_PATH>.csv"
```

Example:

```
uv run python src/preprocess.py --input "data/fake-reviews.csv"
```

Output is written to data/preprocessed_reviews.csv.
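The sketch below shows one way a preprocessing step like this could work. It is not the project's `src/preprocess.py`. Only the `--input` flag and the output path come from this README. The `text` column name, the cleaning rules, and the engineered features (`char_count`, `word_count`, `avg_word_len`, `exclamation_count`, `uppercase_ratio`) are hypothetical stand-ins chosen for illustration.

```python
import argparse
import csv
import re

# Hypothetical column name; the output path matches the README, but the real
# dataset's text column may be named differently.
TEXT_COLUMN = "text"
OUTPUT_PATH = "data/preprocessed_reviews.csv"


def clean_text(text: str) -> str:
    """Strip HTML tags and URLs, collapse whitespace, and lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip().lower()


def engineer_features(raw: str) -> dict[str, float]:
    """Compute simple (hypothetical) surface features from the raw review text."""
    words = raw.split()
    letters = [c for c in raw if c.isalpha()]
    return {
        "char_count": len(raw),
        "word_count": len(words),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        "exclamation_count": raw.count("!"),
        "uppercase_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
    }


def preprocess(input_path: str, output_path: str = OUTPUT_PATH) -> int:
    """Add a cleaned-text column and feature columns to each row; return rows written."""
    with open(input_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    out_rows = []
    for row in rows:
        raw = row.get(TEXT_COLUMN) or ""
        out_rows.append({**row, "clean_text": clean_text(raw), **engineer_features(raw)})
    if out_rows:
        with open(output_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(out_rows[0]))
            writer.writeheader()
            writer.writerows(out_rows)
    return len(out_rows)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Clean reviews and engineer features.")
    parser.add_argument("--input", required=True, help="Path to the raw review CSV")
    return parser


def main(argv=None) -> None:
    """Parse --input, run preprocessing, and report how many rows were written."""
    args = build_parser().parse_args(argv)
    count = preprocess(args.input)
    print(f"Wrote {count} rows to {OUTPUT_PATH}")
```

In a real script, `main()` would be called from an `if __name__ == "__main__":` guard so that `uv run python src/preprocess.py --input ...` works from the command line.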

Use the scripts in src/ to train the baseline TF-IDF + logistic regression model, the XGBoost model, or the BERT LoRA model.
Trained artifacts are saved in model/ as joblib files, JSON metadata, and a bert_lora/ adapter directory.
Launch the web application from webapp/ to interact with the model through a browser.
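The following dependency-free sketch illustrates the baseline idea: TF-IDF features feeding a logistic regression trained with batch gradient descent. It is not the project's training script. The real code most likely uses a library implementation, and this README does not say which. The whitespace tokenizer, the label encoding (1 = fake), and the hyperparameters are all assumptions.

```python
import math
from collections import Counter

# Hypothetical label encoding: 1 = fake (AI-generated), 0 = real (human-written).


def tokenize(text: str) -> list[str]:
    """Whitespace tokenizer; a real pipeline would likely use something richer."""
    return text.lower().split()


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc: str, idf: dict[str, float]) -> dict[str, float]:
    """Term counts weighted by IDF and L2-normalized; unseen terms are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x: dict[str, float], w: dict[str, float], b: float) -> float:
    """P(label = 1 | x) for a sparse feature vector."""
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


def train_logreg(X, y, epochs=200, lr=0.5, l2=1e-3):
    """Batch gradient descent on mean log loss plus an L2 penalty on the weights."""
    w: dict[str, float] = {}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w: dict[str, float] = {}
        grad_b = 0.0
        for x, target in zip(X, y):
            err = predict_proba(x, w, b) - target  # d(log loss)/d(logit)
            grad_b += err
            for t, v in x.items():
                grad_w[t] = grad_w.get(t, 0.0) + err * v
        for t in set(w) | set(grad_w):
            w[t] = w.get(t, 0.0) - lr * (grad_w.get(t, 0.0) / n + l2 * w.get(t, 0.0))
        b -= lr * grad_b / n
    return w, b
```

The weights live in a sparse dict keyed by token, so only terms seen during training ever receive a weight. A review containing no known tokens falls back to the bias alone.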

Run the Streamlit App

Start an interactive UI where you can paste a review and choose a model (baseline, xgboost, or bert) for instant classification:

```
uv run streamlit run webapp/streamlit_app.py
```

The app predicts whether the review is AI-generated or human-written, displays model confidence, and shows extracted feature values for the current input.
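One way to turn a model's output into the label and confidence the app displays is sketched below. It assumes each model returns the probability that a review is AI-generated. The `predict_review` helper, the `Prediction` type, the model registry, and the 0.5 threshold are hypothetical and do not come from `webapp/streamlit_app.py`.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: each UI model name maps to a function returning P(AI-generated).
# Real models would be loaded from model/ rather than passed in as callables.
ModelFn = Callable[[str], float]


@dataclass
class Prediction:
    label: str
    confidence: float  # probability assigned to the predicted label


def predict_review(
    text: str, model_name: str, models: dict[str, ModelFn], threshold: float = 0.5
) -> Prediction:
    """Run the chosen model and convert its probability into a label plus confidence."""
    if model_name not in models:
        raise ValueError(f"Unknown model {model_name!r}; choose from {sorted(models)}")
    p_ai = models[model_name](text)
    if p_ai >= threshold:
        return Prediction("AI-generated", p_ai)
    return Prediction("human-written", 1.0 - p_ai)
```

Reporting the probability of the *predicted* class (rather than always P(AI-generated)) keeps the displayed confidence at or above 0.5 for any decision made at a 0.5 threshold.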

Responsible AI Use

I used GitHub Copilot to help draft and scaffold parts of this project. I am responsible for reviewing, testing, and revising any AI-generated output before treating it as final. AI assistance is used primarily to accelerate development, not to replace my own judgment.
