I’m a data science enthusiast with a strong foundation in machine learning and AI and a passion for leveraging data to solve real-world problems.
- 🎓 Recent graduate of UC Davis with a B.S. in Statistical Data Science and a minor in Technology Management.
- 📊 Data Science Coordinator @ ASUCD Pantry, optimizing food inventory with predictive modeling.
- 🌍 Youth Advisory Council Member @ JFF, working to enhance career navigation tools for young adults.
- 💡 Currently learning about AI Agents and Generative AI to explore their potential in automation and decision-making.
- 🔍 Interested in machine learning, data visualization, and applied AI in healthcare, business, and technology.
🌟 Always open to connecting and collaborating—feel free to reach out! 🚀
NYC Restaurant Health Inspection Prediction
- Inspiration:
- Analyze 18 years of NYC restaurant health inspection data to identify what factors actually drive health code compliance and predict restaurant grades using machine learning.
- Understand whether location, cuisine type, timing, or violations themselves are the strongest predictors of health inspection outcomes.
- What it does:
- Analyzes 295,831+ inspection records spanning 2007-2026 from 30,627 unique restaurants across all 5 NYC boroughs.
- Predicts restaurant health grades (A, B, or C) using a Random Forest classifier with 93.8% overall accuracy.
- Features an interactive Streamlit dashboard with real-time data visualization, filtering capabilities, and grade prediction interface.
- Automatically refreshes data daily from NYC Open Data API to ensure the dashboard always displays the latest inspection results.
- Provides comprehensive analytics including grade distributions, violation patterns, borough comparisons, and cuisine type analysis.
- Includes a prediction tool where users can input inspection details (violations, cuisine type, borough, date) to predict potential health grades.
- How we built it:
- Processed and cleaned 295K+ raw inspection records using Pandas, aggregating multiple violation records into single inspections (reduced to 51,839 unique inspections).
- Engineered 6 key features: total violations, critical violations, cuisine type, borough, month, and day of week.
- Trained multiple models (Logistic Regression baseline, Random Forest) using scikit-learn, handling severe class imbalance (87% A grades).
- Built comprehensive Streamlit dashboard with Plotly visualizations including pie charts, histograms, time series, scatter plots, and heatmaps.
- Implemented automatic data refresh functionality that checks for updates daily and processes raw data into clean format.
- Deployed interactive web application with filtering by borough, cuisine type, and date range, plus real-time model predictions.
- Key Outcomes:
- Achieved 93.8% accuracy with Random Forest model, with 99% recall for A-grade restaurants.
- Identified that critical violations (44%) and total violations (31%) together account for 75% of the model's feature importance, indicating that food safety violations are the primary driver of health grades.
- Discovered that location, timing, and cuisine type have minimal impact (combined <20%) compared to actual violations.
- Successfully processed and visualized 18 years of inspection data with interactive dashboards accessible to non-technical users.
- Built production-ready application with error handling, data validation, and automatic data refresh capabilities.
- Technologies used:
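The aggregation and feature-engineering steps listed under "How we built it" can be sketched roughly as follows. This is a standard-library illustration, not the project's Pandas code: the field names (`camis`, `inspection_date`, `critical`, `cuisine`, `boro`) and the sample rows are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Toy rows: one row per violation, as in the raw data (values are made up).
rows = [
    {"camis": "1001", "inspection_date": date(2024, 3, 4), "critical": True,
     "cuisine": "Pizza", "boro": "Queens"},
    {"camis": "1001", "inspection_date": date(2024, 3, 4), "critical": False,
     "cuisine": "Pizza", "boro": "Queens"},
    {"camis": "1001", "inspection_date": date(2024, 9, 10), "critical": False,
     "cuisine": "Pizza", "boro": "Queens"},
    {"camis": "2002", "inspection_date": date(2024, 3, 9), "critical": True,
     "cuisine": "Thai", "boro": "Bronx"},
]


def aggregate_inspections(violation_rows):
    """Collapse violation-level rows into one feature record per inspection."""
    grouped = defaultdict(list)
    for row in violation_rows:
        # Assumption: an inspection is identified by (restaurant id, date).
        grouped[(row["camis"], row["inspection_date"])].append(row)

    inspections = []
    for (camis, when), group in sorted(grouped.items()):
        inspections.append({
            "camis": camis,
            "total_violations": len(group),
            "critical_violations": sum(r["critical"] for r in group),
            "cuisine": group[0]["cuisine"],
            "boro": group[0]["boro"],
            "month": when.month,
            "day_of_week": when.weekday(),  # Monday = 0
        })
    return inspections


inspections = aggregate_inspections(rows)
```

Each output record carries the six features named above, so the four violation rows collapse into three inspections.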
Oprina: Conversational AI Avatar Assistant
- Inspiration:
- Create a voice assistant with a lifelike avatar that handles email and calendar through natural conversation, designed for hands-free productivity with a smooth, reliable experience.
- What it does:
- Provides a conversational AI avatar assistant that manages email and calendar tasks through voice commands.
- Features a lifelike avatar interface for natural human-computer interaction.
- Handles productivity tasks seamlessly without requiring manual input.
- How we built it:
- Developed using React and TypeScript for the frontend interface.
- Implemented FastAPI with Python for the backend services.
- Integrated Google ADK, Gmail API, and Google Calendar API for email and calendar management.
- Utilized Vertex AI and Gemini 2.0 Flash for advanced AI capabilities.
- Incorporated HeyGen API for avatar generation and animation.
- Used Supabase for database management and Google Cloud for deployment.
- Key Features:
- Voice-activated email and calendar management
- Lifelike AI avatar with natural conversation capabilities
- Hands-free productivity workflow
- Integration with Google services (Gmail, Calendar)
- Real-time AI processing with Gemini 2.0 Flash
- Technologies used:
EpiAccess: Epidemic Forecasting & Healthcare Analysis Dashboard
- Inspiration:
- Develop a comprehensive tool for analyzing infectious disease trends and understanding global healthcare access patterns to support educational research and emergency preparedness planning.
- What it does:
- Analyzes 63,115 real epidemic records from COVID-19, SARS, and Monkeypox outbreaks across 222 countries.
- Generates 6-month epidemic forecasts using both traditional exponential smoothing and PyTorch neural networks with confidence intervals.
- Clusters 175 countries into 4 distinct healthcare access categories using K-means algorithm based on health expenditure patterns.
- Provides interactive disease mapping with choropleth and bubble visualizations.
- Generates AI-powered insights in plain English with confidence scoring and trend analysis.
- Features "what-if" scenario planning showing how historical outbreaks would unfold in 2025.
- How we built it:
- Built comprehensive data processing pipeline using Pandas to unify datasets from multiple sources (Kaggle, World Bank).
- Implemented dual forecasting system: exponential smoothing for transparency and PyTorch neural networks for complex pattern recognition.
- Developed healthcare access clustering using scikit-learn K-means on 3-year averages (2020-2022), which smooths out pandemic-era year-to-year swings.
- Created interactive Streamlit dashboard with three main components: Disease Trends, Disease Map, and Healthcare Access Analysis.
- Integrated Plotly for dynamic visualizations and implemented statistical analysis with correlation metrics and efficiency ratios.
- Key Outcomes:
- Successfully processed and analyzed 63,000+ epidemic records with real-time interactive visualizations.
- Achieved reliable trend analysis for educational purposes with clear confidence scoring and limitation transparency.
- Identified 4 distinct global healthcare access patterns: High Access-Advanced Economy, Medium-High Access-Developing, High Priority-Limited Resources, and Low Access-Resource Constrained.
- Technologies used:
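For readers unfamiliar with the "traditional" half of the dual forecasting system, here is a minimal sketch of simple exponential smoothing in plain Python. The write-up does not say which smoothing variant or parameters the dashboard uses, so the function name, `alpha`, and the sample counts are all assumptions, and confidence intervals are left out.

```python
def exponential_smoothing_forecast(series, alpha=0.5, horizon=6):
    """Simple exponential smoothing with a flat forecast.

    level_t = alpha * x_t + (1 - alpha) * level_{t-1}
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level

    # Simple smoothing has no trend term, so every future step repeats the level.
    return [level] * horizon


monthly_cases = [120, 150, 170, 160, 200, 240]  # made-up monthly counts
forecast = exponential_smoothing_forecast(monthly_cases, alpha=0.5, horizon=6)
```

A higher `alpha` weights recent months more heavily. The appeal of this method is transparency, since each forecast can be traced back to a single weighted average.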
BrainBoost: Academic Success Coach
- Inspiration:
- Provide students with a personalized, data-driven “coach” to track daily habits and predict academic performance.
- What it does:
- Allows users to input daily metrics—study hours, sleep hours, social activities, physical activity, extracurriculars, and screen time.
- Predicts letter-grade category and stress level using a Gradient Boosting model (0.9087 overall accuracy) trained on the 2,000-record Student Lifestyle Dataset.
- Visualizes progress over time in an interactive Streamlit dashboard, showing habit history alongside predicted outcomes.
- Generates tailored recommendations in three categories—“Study Strategy,” “Wellness,” and “Balance”—based on predicted grade gaps and stress levels.
- Includes a simulation tab where users adjust habit sliders to see potential effects on predicted GPA and stress.
- Key Outcomes:
- Gradient Boosting model achieved 0.8175 letter-grade accuracy and 1.0000 stress-level accuracy, for an overall 0.9087 accuracy.
- Empowered students to identify habit changes likely to improve GPA trajectories and manage stress effectively.
- How we built it:
- Preprocessed the Student Lifestyle Dataset (2,000 records) in Pandas; engineered features such as Study–Sleep interaction, Social–Study ratio, Total Activity, Study Efficiency, and Life Balance.
- Trained multiple classifiers (Logistic Regression, Random Forest, XGBoost, Decision Tree, Gradient Boosting) using scikit-learn and XGBoost, and saved the best-performing pipeline in stacked_multioutput_predictor.pkl.
- Developed a Streamlit app (app.py) to load the model and a StandardScaler (scaler.pkl), capture user inputs, perform real-time feature engineering, and display predictions.
- Built interactive tabs:
- Input Habits: Numeric inputs for six daily activities, interactive time-allocation progress bar, and “Critical” warnings for unrealistic inputs.
- Progress: Displays latest predicted grade, stress level, time-to-graduation estimate, plus a line chart and table of habit history.
- Recommendations: Provides personalized tips for improving study habits, wellness, and work-life balance, and includes interactive sliders so users can adjust daily habits and immediately see how those changes might impact their predicted GPA and stress levels.
- Technologies used:
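The overall accuracy reported under "Key Outcomes" is consistent with an unweighted mean of the two per-target accuracies: (0.8175 + 1.0000) / 2 = 0.90875. Assuming that is how it was computed, the calculation looks roughly like this; the helper names and toy labels are illustrative, not taken from the project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def overall_accuracy(per_target_accuracies):
    """Unweighted mean across targets (assumed definition of 'overall')."""
    return sum(per_target_accuracies) / len(per_target_accuracies)


# Toy labels for the two targets (not the project's data).
grade_true, grade_pred = ["A", "B", "C", "B"], ["A", "B", "B", "B"]
stress_true = ["Low", "High", "Moderate", "High"]
stress_pred = ["Low", "High", "Moderate", "High"]

toy_overall = overall_accuracy([
    accuracy(grade_true, grade_pred),
    accuracy(stress_true, stress_pred),
])

# The same averaging applied to the two figures reported above.
reported_overall = overall_accuracy([0.8175, 1.0000])
```

Because the mean treats both targets equally, a perfect score on one target can hide weaker performance on the other, so the per-target numbers are worth reading alongside the overall figure.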
- Inspiration:
- Help UC SHIP students avoid surprise medical bills by estimating healthcare costs up front.
- What it does:
- Provides a real-time cost estimation and claims automation system.
- Allows users to input plan details and claim data to calculate expected reimbursements.
- Automates rebate processing to expedite refunds.
- How we built it:
- Co-developed during HackDavis 2025 with SwiftUI on iOS.
- Integrated Python back-end logic and the Cerebras API for machine learning calculations.
- Leveraged OpenAI/Gemini and SQL to process and analyze insurance data in real time.
- Technologies used:
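To make the cost-estimation idea concrete, here is a small sketch of how an expected patient share might be computed from plan details and a billed amount. The plan terms below are hypothetical placeholders, not actual UC SHIP figures, and the formula is a simplified deductible/coinsurance/out-of-pocket-cap model rather than the app's real logic.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical plan terms; not actual UC SHIP figures."""
    deductible: float         # amount the patient pays before coverage starts
    coinsurance: float        # patient's share after the deductible, e.g. 0.25
    out_of_pocket_max: float  # yearly cap on patient spending


def estimate_patient_cost(plan, billed_amount, paid_so_far=0.0):
    """Return (patient_pays, plan_pays) for a single claim.

    Simplification: `paid_so_far` counts toward both the deductible and the cap.
    """
    remaining_cap = max(plan.out_of_pocket_max - paid_so_far, 0.0)
    remaining_deductible = max(plan.deductible - paid_so_far, 0.0)

    deductible_part = min(billed_amount, remaining_deductible)
    coinsurance_part = (billed_amount - deductible_part) * plan.coinsurance

    patient_pays = min(deductible_part + coinsurance_part, remaining_cap)
    return patient_pays, billed_amount - patient_pays


plan = Plan(deductible=200.0, coinsurance=0.25, out_of_pocket_max=3000.0)
patient, insurer = estimate_patient_cost(plan, billed_amount=1000.0)
```

With these placeholder terms, a 1,000 bill splits into the 200 deductible plus 25% of the remaining 800, so the patient share is 400 and the plan covers 600.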
- Inspiration:
- Enhance volunteer management and communication at Aggie House.
- What it does:
- Provides an admin portal to monitor volunteer work hours.
- Sends automated email reminders via the SendGrid API.
- Implements an automatic reminder feature using JavaScript in Google Apps Script attached to a Google Sheet.
- How we built it:
- Developed with Node.js for server-side logic.
- Built with HTML, CSS, and JavaScript for a responsive frontend.
- Technologies used:
Exploring the Impact of Stroke, Heart Disease, and Diabetes on Mobility Challenges
- Project Overview:
- Uses the BRFSS 2015 dataset to analyze and predict heart disease indicators.
- Leverages health metrics such as BMI, smoking habits, physical activity, and healthcare access.
- Inspiration:
- Motivated by the alarming prevalence of heart disease and the need for early intervention.
- Dataset Overview:
- Based on the Heart Disease Health Indicators from the 2015 BRFSS survey.
- Consists of 22 columns covering health metrics, demographics, and lifestyle factors.
- Analysis Details:
- Objectives: Identify key predictors of heart disease, build and evaluate predictive models, and provide actionable insights.
- Methods: Exploratory Data Analysis, Feature Engineering, and Model Building using algorithms like Logistic Regression and Random Forest.
- Results & Learnings: Highlighted significant predictors (e.g., HighBP, HighChol) and gained insights into lifestyle impacts on heart disease risk.
- Technologies used:
Analysis-of-Amazon-Sales-Trend
- Project Overview:
- Conducts an in-depth analysis of customer behavior using the Amazon Sales Dataset.
- Focuses on product review categories, review lengths, and their impact on product engagement.
- Key Questions Explored:
- What information does the dataset provide?
- Which products are top-rated based on the number of ratings?
- Is there a correlation between ratings count and average product rating?
- Which products have the most discounted prices, and how do discounts relate to review counts?
- What are the top products by click-through rates and by category?
- How do review characteristics (e.g., length) correlate with product ratings?
- Technologies used:
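The question about ratings count versus average rating comes down to a correlation coefficient. The sketch below computes Pearson's r in plain Python. The sample values are made up and the function name is illustrative, so it shows the calculation only, not the project's findings.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


rating_counts = [120, 4500, 980, 15000, 60]  # made-up values
avg_ratings = [3.9, 4.3, 4.1, 4.4, 3.6]      # made-up values
r = pearson_r(rating_counts, avg_ratings)
```

Rating counts are usually heavily skewed, so in practice it can help to compare against a rank-based measure or log-transform the counts before drawing conclusions from r.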
- Project Overview:
- A fun, interactive web project that dives into the Marvel Universe.
- Retrieves data from the Marvel API to showcase characters, comics, and creators.
- Inspiration:
- Sparked by a childhood fascination with Marvel heroes and their incredible stories.
- Features:
- Home: Introductory section guiding users through the site.
- Marvel Characters Gallery: Displays characters with images and descriptions.
- Marvel Comics Gallery: Lists comics with cover images, titles, and issue numbers.
- Technologies used:
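As a rough illustration of retrieving data from the Marvel API, the sketch below builds a signed request URL for the characters endpoint. Per the Marvel developer documentation, server-side requests carry a `ts` value, the public `apikey`, and an MD5 `hash` of `ts + privateKey + publicKey`. Whether this project signs its requests that way is not stated here; the helper name and placeholder keys are assumptions, and no network call is made.

```python
import hashlib
import time
from urllib.parse import urlencode

BASE_URL = "https://gateway.marvel.com/v1/public"


def build_marvel_url(resource, public_key, private_key, ts=None, **params):
    """Build a signed request URL; no network call is made here."""
    ts = str(ts if ts is not None else int(time.time()))
    # Server-side requests are signed with md5(ts + privateKey + publicKey).
    digest = hashlib.md5((ts + private_key + public_key).encode("utf-8")).hexdigest()
    query = {"ts": ts, "apikey": public_key, "hash": digest, **params}
    return f"{BASE_URL}/{resource}?{urlencode(query)}"


# Placeholder keys for illustration only.
url = build_marvel_url("characters", "PUBLIC_KEY", "PRIVATE_KEY", ts=1, limit=20)
```

The private key should stay on a server. Requests made directly from a browser generally rely on the public key together with an authorized referrer instead of this hash.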
- Overview:
- An AI-powered meal planning app designed to help users track ingredients, generate personalized meal suggestions, and monitor nutritional intake.
- Features (In Progress):
- Ingredient Tracking: Search, add, edit, and delete ingredients with nutritional breakdown (calories, protein, fats, water, sugar).
- AI-Powered Chatbot: Provides recipe suggestions using Google Gemini AI based on user-provided ingredients, with integrated YouTube video links for cooking instructions.
- Dynamic Dashboards: Displays nutritional summaries with circular trackers and pie charts for calories, water, protein, carbs, and fats.
- Profile Management: Manage user data (name, age, gender, height, weight) with authentication through Firebase and Google Sign-In, including password reset and logout functionality.
- Searchable Fridge Inventory: Filter and manage stored ingredients with real-time updates using Firestore snapshot listeners.
- Themed UI: Automatically adapts to system light/dark themes using a custom color scheme.
- Tech Stack:
- Frontend: React Native, HTML, CSS, JavaScript, TypeScript.
- Backend: Firebase for authentication and database management, Google Gemini API for AI integration.
- AI Integration: Google Gemini API for personalized meal recommendations and YouTube Data API for video retrieval.
- Technologies used:
- GitHub: calvinhoang203
- LinkedIn: Hieu Hoang


