A machine-learning project that classifies Amazon product reviews as real or fake based on their text content.
This project reads Amazon reviews from CSV data, cleans the text, engineers features, trains multiple classifiers, and serves predictions through a Streamlit dashboard. The workflow covers data loading, feature engineering, model training/evaluation, and an interactive frontend for live inference.
Amazon-Review-Analyzer-2/
├── data/ # Raw and processed review data files
├── model/ # Trained joblib models, metadata, and BERT LoRA adapter files
├── src/ # Python scripts for data processing, training, and evaluation
├── webapp/ # Frontend application for serving model predictions
├── pyproject.toml # Project metadata and dependencies (used by uv)
├── .gitignore # Files and directories excluded from version control
└── README.md # Project documentation
This project uses uv for Python project and environment management.
macOS / Linux
    curl -LsSf https://astral.sh/uv/install.sh | sh
Windows
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
After installation, restart your terminal so the uv command is available.
- Clone the repository
    git clone https://github.com/wijayaju/Amazon-Review-Analyzer-2.git
    cd Amazon-Review-Analyzer-2
- Create a virtual environment and install dependencies
    uv sync
  This reads pyproject.toml, creates a .venv virtual environment, and installs all listed dependencies.
- Add new packages as needed
    uv add <package-name>
- Run scripts
    uv run python src/<script>.py
- Place the raw review data file in the data/ directory.
- Run preprocessing to create the engineered dataset:
    uv run python src/preprocess.py --input "data/<INPUT_CSV_PATH>.csv"
  Example:
    uv run python src/preprocess.py --input "data/fake-reviews.csv"
  Output is written to data/preprocessed_reviews.csv.
- Use the scripts in src/ to train the baseline TF-IDF + logistic regression model, the XGBoost model, or the BERT LoRA model.
- Trained artifacts are saved in model/ as joblib files, JSON metadata, and a bert_lora/ adapter directory.
- Launch the web application from webapp/ to interact with the model through a browser.
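The preprocessing step in the workflow above could look roughly like the sketch below. It is not the contents of src/preprocess.py; it only illustrates the general shape of cleaning review text and deriving a few features. The column name `text`, the helper names, and the specific features are all assumptions made for illustration. Only the `--input` flag and the output path data/preprocessed_reviews.csv come from this README.

```python
import argparse
import csv
import re
from pathlib import Path

# Assumption: the raw CSV stores the review body in a column named "text".
# This name is hypothetical; adjust it to match the real dataset.
TEXT_COLUMN = "text"
OUTPUT_PATH = Path("data/preprocessed_reviews.csv")


def clean_text(raw: str) -> str:
    """Strip HTML-like tags, collapse whitespace, and lowercase."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip().lower()


def extract_features(raw: str) -> dict:
    """Compute a few simple, illustrative features from the raw text."""
    letters = [c for c in raw if c.isalpha()]
    upper = sum(c.isupper() for c in letters)
    return {
        "char_count": len(raw),
        "word_count": len(raw.split()),
        "exclamation_count": raw.count("!"),
        "uppercase_ratio": round(upper / len(letters), 3) if letters else 0.0,
    }


def preprocess(input_path: Path, output_path: Path = OUTPUT_PATH) -> int:
    """Read reviews, add cleaned text and features, and write a new CSV."""
    with input_path.open(newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    out_rows = []
    for row in rows:
        raw = row.get(TEXT_COLUMN) or ""
        out_rows.append(
            {**row, "clean_text": clean_text(raw), **extract_features(raw)}
        )
    if not out_rows:
        return 0
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(out_rows[0].keys()))
        writer.writeheader()
        writer.writerows(out_rows)
    return len(out_rows)


def main(argv=None) -> None:
    """Parse the --input flag and run preprocessing."""
    parser = argparse.ArgumentParser(description="Preprocess raw reviews.")
    parser.add_argument("--input", required=True, type=Path)
    args = parser.parse_args(argv)
    print(f"Wrote {preprocess(args.input)} rows to {OUTPUT_PATH}")
```

Calling `main()` from an `if __name__ == "__main__":` guard would turn this into a command-line script. Keeping `clean_text` and `extract_features` as separate pure functions means the same logic could be reused at inference time, so the features shown for a pasted review match those used in training.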
Start an interactive UI where you can paste a review and choose a model (baseline, xgboost, or bert) for instant classification:
    uv run streamlit run webapp/streamlit_app.py
The app predicts whether the review is AI-generated or human-written, displays model confidence, and shows extracted feature values for the current input.
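To make the label and confidence output more concrete, here is a minimal sketch of how a baseline TF-IDF + logistic regression model might produce them with scikit-learn. It is an illustration under stated assumptions, not the project's actual training or app code: the toy reviews, the label encoding (1 = AI-generated, 0 = human-written), and the `classify` helper are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data for illustration only. Assumed encoding:
# 1 = AI-generated, 0 = human-written.
texts = [
    "This product is amazing and exceeded all of my expectations in every way",
    "Overall a wonderful product that I would highly recommend to everyone",
    "A truly excellent item that delivers outstanding value and quality",
    "broke after two weeks, kinda annoyed but the refund was quick",
    "fits my old laptop ok, the cable is a bit short though",
    "bought it for my dad, he says the buttons are too small",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF turns each review into a sparse term-weight vector;
# logistic regression then estimates class probabilities from it.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)


def classify(review: str) -> tuple[str, float]:
    """Return a readable label and the probability of the predicted class."""
    proba = model.predict_proba([review])[0]
    idx = int(proba.argmax())
    label = "AI-generated" if model.classes_[idx] == 1 else "human-written"
    return label, float(proba[idx])


label, confidence = classify("the strap snapped on day three, not great")
print(f"{label} ({confidence:.0%} confidence)")
```

Here "confidence" is simply the predicted class probability, which is one common convention; the app may compute it differently. A fitted pipeline like this can be persisted with `joblib.dump(model, path)` and reloaded with `joblib.load(path)`, which is consistent with the joblib artifacts stored in model/.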
I used GitHub Copilot to help draft and scaffold parts of this project. I am responsible for reviewing, testing, and revising any AI-generated output before treating it as final. AI assistance is used primarily to accelerate development, not to replace my own judgment.