Hieu Hoang (Calvin) calvinhoang203

👋 Hi! I'm Hieu (Calvin) Hoang.

I’m a data science enthusiast with a strong foundation in machine learning and AI and a passion for leveraging data to solve real-world problems.

🎓 Recent Graduate from UC Davis with a B.S. in Statistical Data Science with a Minor in Technology Management.
📊 Data Science Coordinator @ ASUCD Pantry, optimizing food inventory with predictive modeling.
🌍 Youth Advisory Council Member @ JFF, working to enhance career navigation tools for young adults.
💡 Currently learning about AI Agents and Generative AI to explore their potential in automation and decision-making.
🔍 Interested in machine learning, data visualization, and applied AI in healthcare, business, and technology.

🌟 Always open to connecting and collaborating—feel free to reach out! 🚀

🛠 Projects:

NYC Restaurant Health Inspection Prediction
- Inspiration:
  - Analyze 18 years of NYC restaurant health inspection data to identify what factors actually drive health code compliance and predict restaurant grades using machine learning.
  - Understand whether location, cuisine type, timing, or violations themselves are the strongest predictors of health inspection outcomes.
- What it does:
  - Analyzes 295,831+ inspection records spanning 2007-2026 from 30,627 unique restaurants across all 5 NYC boroughs.
  - Predicts restaurant health grades (A, B, or C) using a Random Forest classifier with 93.8% overall accuracy.
  - Features an interactive Streamlit dashboard with real-time data visualization, filtering capabilities, and grade prediction interface.
  - Automatically refreshes data daily from NYC Open Data API to ensure the dashboard always displays the latest inspection results.
  - Provides comprehensive analytics including grade distributions, violation patterns, borough comparisons, and cuisine type analysis.
  - Includes a prediction tool where users can input inspection details (violations, cuisine type, borough, date) to predict potential health grades.
- How we built it:
  - Processed and cleaned 295K+ raw inspection records using Pandas, aggregating multiple violation records into single inspections (reduced to 51,839 unique inspections).
  - Engineered 6 key features: total violations, critical violations, cuisine type, borough, month, and day of week.
  - Trained multiple models (Logistic Regression baseline, Random Forest) using scikit-learn, handling severe class imbalance (87% A grades).
  - Built comprehensive Streamlit dashboard with Plotly visualizations including pie charts, histograms, time series, scatter plots, and heatmaps.
  - Implemented automatic data refresh functionality that checks for updates daily and processes raw data into clean format.
  - Deployed interactive web application with filtering by borough, cuisine type, and date range, plus real-time model predictions.
- Key Outcomes:
  - Achieved 93.8% accuracy with Random Forest model, with 99% recall for A-grade restaurants.
  - Identified that critical violations (44%) and total violations (31%) account for 75% of model predictions, revealing that food safety violations are the primary driver of health grades.
  - Discovered that location, timing, and cuisine type have minimal impact (combined <20%) compared to actual violations.
  - Successfully processed and visualized 18 years of inspection data with interactive dashboards accessible to non-technical users.
  - Built production-ready application with error handling, data validation, and automatic data refresh capabilities.
- Technologies used:
Oprina: Conversational AI Avatar Assistant
- Inspiration:
  - Create a voice assistant with a lifelike avatar that handles email and calendar through natural conversation, designed for hands-free productivity with a smooth, reliable experience.
- What it does:
  - Provides a conversational AI avatar assistant that manages email and calendar tasks through voice commands.
  - Features a lifelike avatar interface for natural human-computer interaction.
  - Handles productivity tasks seamlessly without requiring manual input.
- How we built it:
  - Developed using React and TypeScript for the frontend interface.
  - Implemented FastAPI with Python for the backend services.
  - Integrated Google ADK, Gmail API, and Google Calendar API for email and calendar management.
  - Utilized Vertex AI and Gemini 2.0 Flash for advanced AI capabilities.
  - Incorporated HeyGen API for avatar generation and animation.
  - Used Supabase for database management and Google Cloud for deployment.
- Key Features:
  - Voice-activated email and calendar management
  - Lifelike AI avatar with natural conversation capabilities
  - Hands-free productivity workflow
  - Integration with Google services (Gmail, Calendar)
  - Real-time AI processing with Gemini 2.0 Flash
- Technologies used:
EpiAccess: Epidemic Forecasting & Healthcare Analysis Dashboard
- Inspiration:
  - Develop a comprehensive tool for analyzing infectious disease trends and understanding global healthcare access patterns to support educational research and emergency preparedness planning.
- What it does:
  - Analyzes 63,115 real epidemic records from COVID-19, SARS, and Monkeypox outbreaks across 222 countries.
  - Generates 6-month epidemic forecasts using both traditional exponential smoothing and PyTorch neural networks with confidence intervals.
  - Clusters 175 countries into 4 distinct healthcare access categories using K-means algorithm based on health expenditure patterns.
  - Provides interactive disease mapping with choropleth and bubble visualizations.
  - Generates AI-powered insights in plain English with confidence scoring and trend analysis.
  - Features "what-if" scenario planning showing how historical outbreaks would unfold in 2025.
- How we built it:
  - Built comprehensive data processing pipeline using Pandas to unify datasets from multiple sources (Kaggle, World Bank).
  - Implemented dual forecasting system: exponential smoothing for transparency and PyTorch neural networks for complex pattern recognition.
  - Developed healthcare access clustering using scikit-learn K-means with 3-year averaging (2020-2022) for pandemic stability.
  - Created interactive Streamlit dashboard with three main components: Disease Trends, Disease Map, and Healthcare Access Analysis.
  - Integrated Plotly for dynamic visualizations and implemented statistical analysis with correlation metrics and efficiency ratios.
- Key Outcomes:
  - Successfully processed and analyzed 63,000+ epidemic records with real-time interactive visualizations.
  - Achieved reliable trend analysis for educational purposes with clear confidence scoring and limitation transparency.
  - Identified 4 distinct global healthcare access patterns: High Access-Advanced Economy, Medium-High Access-Developing, High Priority-Limited Resources, and Low Access-Resource Constrained.
- Technologies used:
BrainBoost: Academic Success Coach
- Inspiration:
  - Provide students with a personalized, data-driven “coach” to track daily habits and predict academic performance.
- What it does:
  - Allows users to input daily metrics—study hours, sleep hours, social activities, physical activity, extracurriculars, and screen time.
  - Predicts letter-grade category and stress level using a Gradient Boosting model (0.9087 overall accuracy) trained on 2,000+ student records.
  - Visualizes progress over time in an interactive Streamlit dashboard, showing habit history alongside predicted outcomes.
  - Generates tailored recommendations in three categories—“Study Strategy,” “Wellness,” and “Balance”—based on predicted grade gaps and stress levels.
  - Includes a simulation tab where users adjust habit sliders to see potential effects on predicted GPA and stress.
- Key Outcomes:
  - Gradient Boosting model achieved 0.8175 letter-grade accuracy and 1.0000 stress-level accuracy, for an overall 0.9087 accuracy.
  - Empowered students to identify habit changes likely to improve GPA trajectories and manage stress effectively.
- How we built it:
  - Preprocessed the Student Lifestyle Dataset (2,000 records) in Pandas; engineered features such as Study–Sleep interaction, Social–Study ratio, Total Activity, Study Efficiency, and Life Balance.
  - Trained multiple classifiers (Logistic Regression, Random Forest, XGBoost, Decision Tree, Gradient Boosting) in scikit-learn and saved the best-performing pipeline in stacked_multioutput_predictor.pkl.
  - Developed a Streamlit app (app.py) to load the model and a StandardScaler (scaler.pkl), capture user inputs, perform real-time feature engineering, and display predictions.
  - Built interactive tabs:
    1. Input Habits: Numeric inputs for six daily activities, interactive time-allocation progress bar, and “Critical” warnings for unrealistic inputs.
    2. Progress: Displays latest predicted grade, stress level, time-to-graduation estimate, plus a line chart and table of habit history.
    3. Recommendations: Provides personalized tips for improving study habits, wellness, and work-life balance, and includes interactive sliders so users can adjust daily habits and immediately see how those changes might impact their predicted GPA and stress levels.
- Technologies used:
SHIPSmart
- Inspiration:
  - Help UC SHIP students avoid surprise medical bills by estimating healthcare costs up front.
- What it does:
  - Provides a real-time cost estimation and claims automation system.
  - Allows users to input plan details and claim data to calculate expected reimbursements.
  - Automates rebate processing to expedite refunds.
- How we built it:
  - Co-developed during HackDavis 2025 with SwiftUI on iOS.
  - Integrated Python back-end logic and the Cerebras API for machine learning calculations.
  - Leveraged OpenAI/Gemini and SQL to process and analyze insurance data in real time.
- Technologies used:
Aggie Reminder
- Inspiration:
  - Enhance volunteer management and communication at Aggie House.
- What it does:
  - Provides an admin portal to monitor volunteer work hours.
  - Sends automated email reminders via the SendGrid API.
  - Implements an automatic reminder feature using JavaScript in a Google Sheets App Script.
- How we built it:
  - Developed with Node.js for server-side logic.
  - Built with HTML, CSS, and JavaScript for a responsive frontend.
- Technologies used:
Pantry Tracking Website
- Description:
  - A web-based dashboard for tracking pantry inventory in real-time.
- How it works:
  - Built with Python, allowing users to input, update, and visualize inventory data.
  - Efficiently manages distributed products and remaining stock.
- Technologies used:
Exploring the Impact of Stroke, Heart Disease, and Diabetes on Mobility Challenges
- Project Overview:
  - Uses the BRFSS 2015 dataset to analyze and predict heart disease indicators.
  - Leverages health metrics such as BMI, smoking habits, physical activity, and healthcare access.
- Inspiration:
  - Motivated by the alarming prevalence of heart disease and the need for early intervention.
- Dataset Overview:
  - Based on the Heart Disease Health Indicators from the 2015 BRFSS survey.
  - Consists of 22 columns covering health metrics, demographics, and lifestyle factors.
- Analysis Details:
  - Objectives: Identify key predictors of heart disease, build and evaluate predictive models, and provide actionable insights.
  - Methods: Exploratory Data Analysis, Feature Engineering, and Model Building using algorithms like Logistic Regression and Random Forest.
  - Results & Learnings: Highlighted significant predictors (e.g., HighBP, HighChol) and gained insights into lifestyle impacts on heart disease risk.
- Technologies used:
Analysis-of-Amazon-Sales-Trend
- Project Overview:
  - Conducts an in-depth analysis of customer behavior using the Amazon Sales Dataset.
  - Focuses on product review categories, review lengths, and their impact on product engagement.
- Key Questions Explored:
  - What information does the dataset provide?
  - Which products are top-rated based on the number of ratings?
  - Is there a correlation between ratings count and average product rating?
  - Which products have the most discounted prices, and how do discounts relate to review counts?
  - What are the top products by click-through rates and by category?
  - How do review characteristics (e.g., length) correlate with product ratings?
- Technologies used:
Marvel-Universe-Explorer
- Project Overview:
  - A fun, interactive web project that dives into the Marvel Universe.
  - Retrieves data from the Marvel API to showcase characters, comics, and creators.
- Inspiration:
  - Sparked by a childhood fascination with Marvel heroes and their incredible stories.
- Features:
  - Home: Introductory section guiding users through the site.
  - Marvel Characters Gallery: Displays characters with images and descriptions.
  - Marvel Comics Gallery: Lists comics with cover images, titles, and issue numbers.
- Technologies used:
MealBuddy
- Overview:
  - An AI-powered meal planning app designed to help users track ingredients, generate personalized meal suggestions, and monitor nutritional intake.
- Features (In Progress):
  - Ingredient Tracking: Search, add, edit, and delete ingredients with nutritional breakdown (calories, protein, fats, water, sugar).
  - AI-Powered Chatbot: Provides recipe suggestions using Google Gemini AI based on user-provided ingredients, with integrated YouTube video links for cooking instructions.
  - Dynamic Dashboards: Displays nutritional summaries with circular trackers and pie charts for calories, water, protein, carbs, and fats.
  - Profile Management: Manage user data (name, age, gender, height, weight) with authentication through Firebase and Google Sign-In, including password reset and logout functionality.
  - Searchable Fridge Inventory: Filter and manage stored ingredients with real-time updates using Firestore snapshot listeners.
  - Themed UI: Automatically adapts to system light/dark themes using a custom color scheme.
- Tech Stack:
  - Frontend: React Native, HTML, CSS, JavaScript, TypeScript.
  - Backend: Firebase for authentication and database management, Google Gemini API for AI integration.
  - AI Integration: Google Gemini API for personalized meal recommendations and YouTube Data API for video retrieval.
- Technologies used:

💻 Skills

📊 GitHub Stats:

📫 Connect with Me:

GitHub: calvinhoang203
LinkedIn: Hieu Hoang

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hieu Hoang (Calvin) calvinhoang203

Achievements

Achievements

Block or report calvinhoang203

👋 Hi! I'm Hieu (Calvin) Hoang.

🛠 Projects:

💻 Skills

📊 GitHub Stats:

📫 Connect with Me:

Pinned Loading

Uh oh!