This project performs sentiment analysis and topic modeling on review data, using the Google Cloud Natural Language API for sentiment and Vertex AI for TF-IDF and LDA.
- Project Presentation: Google Slides
Tip
For a quick overview of the project and results, start with the Google Slides presentation above.
Important
Ensure you have the following before starting:
- Python 3.8+
- Google Cloud account with:
  - Cloud Storage API enabled
  - Natural Language API enabled
  - Vertex AI API enabled
  - Service account with appropriate permissions
Create and activate a virtual environment:
# Create virtual environment
python -m venv venv
# Activate on Linux/Mac
source venv/bin/activate
# Activate on Windows
# venv\Scripts\activate
Install the dependencies:
pip install -r requirements.txt
Download your service account JSON key from Google Cloud Console:
- Go to IAM & Admin → Service Accounts
- Create or select a service account with the required permissions
- Keys tab → Add Key → Create new key → JSON format
- Save the downloaded file as service-account-key.json in the project root
  - This file is automatically gitignored for security
  - The notebook expects this exact filename and location
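As a rough illustration of the key-file step, the sketch below checks that service-account-key.json is present in the project root and points Google's client libraries at it through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable. The helper name use_service_account_key is hypothetical, and the notebook may load the key differently; treat this as one possible approach rather than the project's own code.

```python
import json
import os
from pathlib import Path

# Filename the README says the notebook expects in the project root.
KEY_FILENAME = "service-account-key.json"


def use_service_account_key(project_root="."):
    """Validate the key file and expose it to Google client libraries.

    Hypothetical helper: it only checks that the file exists and looks like
    a service account key, then sets GOOGLE_APPLICATION_CREDENTIALS.
    """
    key_path = (Path(project_root) / KEY_FILENAME).resolve()
    if not key_path.is_file():
        raise FileNotFoundError(f"Expected key file at {key_path}")

    with key_path.open(encoding="utf-8") as fh:
        info = json.load(fh)

    # Service account key files carry a "type" field with this value.
    if info.get("type") != "service_account":
        raise ValueError(f"{key_path} does not look like a service account key")

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_path)
    return key_path
```

Calling use_service_account_key() before creating any Google Cloud client should fail early with a clear message if the file is missing or misnamed.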
Note
Update the following variables in the notebook to match your Google Cloud project:
- bucket_name: Your GCS bucket name (currently ubc-bolt-case)
- project: Your Google Cloud project ID (currently sent-analysis-452609)
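If you would rather not edit the notebook each time, one option is to read these two values from environment variables and fall back to the current values. This is only a sketch: load_settings and the REVIEWS_BUCKET variable name are assumptions made for illustration and are not part of the notebook.

```python
import os

# Fallbacks are the values currently hard-coded in the notebook.
DEFAULT_BUCKET = "ubc-bolt-case"
DEFAULT_PROJECT = "sent-analysis-452609"


def load_settings(env=None):
    """Return bucket_name and project, preferring environment overrides.

    REVIEWS_BUCKET is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "bucket_name": env.get("REVIEWS_BUCKET", DEFAULT_BUCKET),
        "project": env.get("GOOGLE_CLOUD_PROJECT", DEFAULT_PROJECT),
    }


settings = load_settings()
bucket_name = settings["bucket_name"]
project = settings["project"]
```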
- Open reviews_notebook.ipynb in Jupyter or VS Code
- Run the cells sequentially:
  - Section 1: Sentiment Analysis - Computes sentiment scores for reviews
  - Section 2: Topic Modeling - Classifies reviews into categories
  - Section 3: Data Visualization - Creates plots and insights
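For readers unfamiliar with the TF-IDF weighting used in Section 2, here is a minimal pure-Python sketch of the idea. It is not the notebook's implementation, which runs on Vertex AI; the tf_idf function, the whitespace tokenizer, and the two sample reviews are made up for illustration, and the formula shown (term frequency times log of inverse document frequency, unsmoothed) is just one common variant.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Count how many documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up reviews, for illustration only.
scores = tf_idf(["great ride", "late ride"])
```

A term that appears in every review ("ride" here) gets a weight of zero, while terms that distinguish one review from the others ("great", "late") get positive weights. That is why TF-IDF is a common preprocessing step before grouping reviews into topics.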
- The .gitignore file is configured to exclude all *.json credential files
- Always use environment variables or local files excluded from git
- Rotate your keys immediately if they are accidentally exposed
Note
Data files are expected in your Google Cloud Storage bucket.
The notebook expects:
- Input: reviews.csv in your GCS bucket
- Output: reviews_final_model_results.csv saved to GCS
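The input and output locations can be written as gs:// URIs built from the bucket name and the two file names above. The sketch below assumes both files sit at the top level of the bucket, which the README does not state explicitly; gcs_uri is a hypothetical helper.

```python
INPUT_BLOB = "reviews.csv"
OUTPUT_BLOB = "reviews_final_model_results.csv"


def gcs_uri(bucket_name, blob_name):
    """Build a gs:// URI, tolerating stray slashes around either part."""
    return f"gs://{bucket_name.strip('/')}/{blob_name.lstrip('/')}"


# "ubc-bolt-case" is the bucket currently set in the notebook.
input_uri = gcs_uri("ubc-bolt-case", INPUT_BLOB)
output_uri = gcs_uri("ubc-bolt-case", OUTPUT_BLOB)
```

With the gcsfs package installed, pandas can read and write such URIs directly, for example pd.read_csv(input_uri), though the notebook may use the Cloud Storage client instead.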
See LICENSE file for details.
Contributors: Mikail Durrani, Jacky Zhong, and me (ofc)