calvinhoang203/README.md

👋 Hi! I'm Hieu (Calvin) Hoang.

I’m a data science enthusiast with a strong foundation in machine learning and AI and a passion for leveraging data to solve real-world problems.

  • 🎓 Recent graduate of UC Davis with a B.S. in Statistical Data Science and a minor in Technology Management.
  • 📊 Data Science Coordinator @ ASUCD Pantry, optimizing food inventory with predictive modeling.
  • 🌍 Youth Advisory Council Member @ JFF, working to enhance career navigation tools for young adults.
  • 💡 Currently learning about AI Agents and Generative AI to explore their potential in automation and decision-making.
  • 🔍 Interested in machine learning, data visualization, and applied AI in healthcare, business, and technology.

🌟 Always open to connecting and collaborating—feel free to reach out! 🚀


🛠 Projects:

  • NYC Restaurant Health Inspection Prediction

    • Inspiration:
      • Analyze 18 years of NYC restaurant health inspection data to identify which factors actually drive health code compliance, and predict restaurant grades using machine learning.
      • Understand whether location, cuisine type, timing, or violations themselves are the strongest predictors of health inspection outcomes.
    • What it does:
      • Analyzes 295,831+ inspection records spanning 2007-2026 from 30,627 unique restaurants across all 5 NYC boroughs.
      • Predicts restaurant health grades (A, B, or C) using a Random Forest classifier with 93.8% overall accuracy.
      • Features an interactive Streamlit dashboard with real-time data visualization, filtering capabilities, and grade prediction interface.
      • Automatically refreshes data daily from NYC Open Data API to ensure the dashboard always displays the latest inspection results.
      • Provides comprehensive analytics including grade distributions, violation patterns, borough comparisons, and cuisine type analysis.
      • Includes a prediction tool where users can input inspection details (violations, cuisine type, borough, date) to predict potential health grades.
    • How we built it:
      • Processed and cleaned 295K+ raw inspection records using Pandas, aggregating multiple violation records into single inspections (reduced to 51,839 unique inspections).
      • Engineered 6 key features: total violations, critical violations, cuisine type, borough, month, and day of week.
      • Trained multiple models (Logistic Regression baseline, Random Forest) using scikit-learn, handling severe class imbalance (87% A grades).
      • Built comprehensive Streamlit dashboard with Plotly visualizations including pie charts, histograms, time series, scatter plots, and heatmaps.
      • Implemented automatic data refresh functionality that checks for updates daily and processes raw data into clean format.
      • Deployed interactive web application with filtering by borough, cuisine type, and date range, plus real-time model predictions.
    • Key Outcomes:
      • Achieved 93.8% accuracy with Random Forest model, with 99% recall for A-grade restaurants.
      • Identified that critical violations (44%) and total violations (31%) together account for 75% of the model's feature importance, indicating that food safety violations are the primary driver of health grades.
      • Discovered that location, timing, and cuisine type have minimal impact (combined <20%) compared to actual violations.
      • Successfully processed and visualized 18 years of inspection data with interactive dashboards accessible to non-technical users.
      • Built production-ready application with error handling, data validation, and automatic data refresh capabilities.
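
The aggregation and feature-engineering steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual Pandas pipeline: the field names (`restaurant_id`, `inspection_date`, `critical`, and so on) and the sample rows are hypothetical and do not reflect the NYC Open Data schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical violation-level rows: one row per violation, several per inspection.
# Field names and values are illustrative only.
rows = [
    {"restaurant_id": "R1", "inspection_date": date(2024, 3, 4),
     "borough": "Queens", "cuisine": "Pizza", "critical": True},
    {"restaurant_id": "R1", "inspection_date": date(2024, 3, 4),
     "borough": "Queens", "cuisine": "Pizza", "critical": False},
    {"restaurant_id": "R2", "inspection_date": date(2024, 7, 20),
     "borough": "Bronx", "cuisine": "Thai", "critical": True},
]


def aggregate_inspections(violation_rows):
    """Collapse violation rows into one feature record per (restaurant, date)."""
    grouped = defaultdict(list)
    for row in violation_rows:
        grouped[(row["restaurant_id"], row["inspection_date"])].append(row)

    inspections = []
    for (restaurant_id, inspected_on), group in grouped.items():
        first = group[0]
        inspections.append({
            "restaurant_id": restaurant_id,
            # The six features listed in the README:
            "total_violations": len(group),
            "critical_violations": sum(1 for r in group if r["critical"]),
            "cuisine": first["cuisine"],
            "borough": first["borough"],
            "month": inspected_on.month,
            "day_of_week": inspected_on.weekday(),  # Monday == 0
        })
    return inspections


inspections = aggregate_inspections(rows)
for record in inspections:
    print(record)
```

Grouping on the (restaurant, date) pair is an assumption about what identifies a single inspection; the real dataset may need a different key.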

  • Oprina: Conversational AI Avatar Assistant

    • Inspiration:
      • Create a voice assistant with a lifelike avatar that handles email and calendar through natural conversation, designed for hands-free productivity with a smooth, reliable experience.
    • What it does:
      • Provides a conversational AI avatar assistant that manages email and calendar tasks through voice commands.
      • Features a lifelike avatar interface for natural human-computer interaction.
      • Handles productivity tasks seamlessly without requiring manual input.
    • How we built it:
      • Developed using React and TypeScript for the frontend interface.
      • Implemented FastAPI with Python for the backend services.
      • Integrated Google ADK, Gmail API, and Google Calendar API for email and calendar management.
      • Utilized Vertex AI and Gemini 2.0 Flash for advanced AI capabilities.
      • Incorporated HeyGen API for avatar generation and animation.
      • Used Supabase for database management and Google Cloud for deployment.
    • Key Features:
      • Voice-activated email and calendar management
      • Lifelike AI avatar with natural conversation capabilities
      • Hands-free productivity workflow
      • Integration with Google services (Gmail, Calendar)
      • Real-time AI processing with Gemini 2.0 Flash

  • EpiAccess: Epidemic Forecasting & Healthcare Analysis Dashboard

    • Inspiration:
      • Develop a comprehensive tool for analyzing infectious disease trends and understanding global healthcare access patterns to support educational research and emergency preparedness planning.
    • What it does:
      • Analyzes 63,115 real epidemic records from COVID-19, SARS, and Monkeypox outbreaks across 222 countries.
      • Generates 6-month epidemic forecasts using both traditional exponential smoothing and PyTorch neural networks with confidence intervals.
      • Clusters 175 countries into 4 distinct healthcare access categories using K-means algorithm based on health expenditure patterns.
      • Provides interactive disease mapping with choropleth and bubble visualizations.
      • Generates AI-powered insights in plain English with confidence scoring and trend analysis.
      • Features "what-if" scenario planning showing how historical outbreaks would unfold in 2025.
    • How we built it:
      • Built comprehensive data processing pipeline using Pandas to unify datasets from multiple sources (Kaggle, World Bank).
      • Implemented dual forecasting system: exponential smoothing for transparency and PyTorch neural networks for complex pattern recognition.
      • Developed healthcare access clustering using scikit-learn K-means, averaging health expenditure over three years (2020-2022) to smooth out pandemic-era volatility.
      • Created interactive Streamlit dashboard with three main components: Disease Trends, Disease Map, and Healthcare Access Analysis.
      • Integrated Plotly for dynamic visualizations and implemented statistical analysis with correlation metrics and efficiency ratios.
    • Key Outcomes:
      • Successfully processed and analyzed 63,000+ epidemic records with real-time interactive visualizations.
      • Achieved reliable trend analysis for educational purposes with clear confidence scoring and limitation transparency.
      • Identified 4 distinct global healthcare access patterns: High Access-Advanced Economy, Medium-High Access-Developing, High Priority-Limited Resources, and Low Access-Resource Constrained.
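
As a rough illustration of the "traditional exponential smoothing" half of the forecasting system, the sketch below implements simple (level-only) exponential smoothing in plain Python. It is an assumption that the project used this exact variant; a trend-aware method such as Holt's would produce a sloped rather than flat forecast. The function names, the `alpha` value, and the sample series are hypothetical, and confidence intervals are left out.

```python
def smoothed_level(series, alpha):
    """Return the final smoothed level of a series.

    Each step blends the new observation with the previous level:
        level = alpha * value + (1 - alpha) * level
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def forecast(series, alpha, horizon=6):
    """Simple exponential smoothing projects the last level forward unchanged."""
    return [smoothed_level(series, alpha)] * horizon


# Hypothetical monthly case counts, for illustration only.
monthly_cases = [10, 20, 30]
six_month_forecast = forecast(monthly_cases, alpha=0.5)
print(six_month_forecast)
```

A higher `alpha` weights recent months more heavily, which reacts faster to an outbreak but is also noisier.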

  • BrainBoost: Academic Success Coach

    • Inspiration:
      • Provide students with a personalized, data-driven “coach” to track daily habits and predict academic performance.
    • What it does:
      • Allows users to input daily metrics—study hours, sleep hours, social activities, physical activity, extracurriculars, and screen time.
      • Predicts letter-grade category and stress level using a Gradient Boosting model (0.9087 overall accuracy) trained on 2,000+ student records.
      • Visualizes progress over time in an interactive Streamlit dashboard, showing habit history alongside predicted outcomes.
      • Generates tailored recommendations in three categories—“Study Strategy,” “Wellness,” and “Balance”—based on predicted grade gaps and stress levels.
      • Includes a simulation tab where users adjust habit sliders to see potential effects on predicted GPA and stress.
    • Key Outcomes:
      • Gradient Boosting model achieved 0.8175 letter-grade accuracy and 1.0000 stress-level accuracy, for an overall 0.9087 accuracy.
      • Empowered students to identify habit changes likely to improve GPA trajectories and manage stress effectively.
    • How we built it:
      • Preprocessed the Student Lifestyle Dataset (2,000 records) in Pandas; engineered features such as Study–Sleep interaction, Social–Study ratio, Total Activity, Study Efficiency, and Life Balance.
      • Trained multiple classifiers (Logistic Regression, Random Forest, XGBoost, Decision Tree, Gradient Boosting) in scikit-learn and saved the best-performing pipeline in stacked_multioutput_predictor.pkl.
      • Developed a Streamlit app (app.py) to load the model and a StandardScaler (scaler.pkl), capture user inputs, perform real-time feature engineering, and display predictions.
      • Built interactive tabs:
        1. Input Habits: Numeric inputs for six daily activities, interactive time-allocation progress bar, and “Critical” warnings for unrealistic inputs.
        2. Progress: Displays latest predicted grade, stress level, time-to-graduation estimate, plus a line chart and table of habit history.
        3. Recommendations: Provides personalized tips for improving study habits, wellness, and work-life balance, and includes interactive sliders so users can adjust daily habits and immediately see how those changes might impact their predicted GPA and stress levels.
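
The real-time feature engineering and input validation described above might look something like the sketch below. The formulas are assumptions made for illustration: the README names the engineered features but does not define them, so only three are sketched here, and the 24-hour check is a guess at what triggers a "Critical" warning.

```python
# Hypothetical daily habit inputs, in hours.
habits = {
    "study_hours": 6,
    "sleep_hours": 8,
    "social_hours": 2,
    "physical_activity_hours": 1,
    "extracurricular_hours": 1,
    "screen_time_hours": 3,
}


def engineer_features(h):
    """Derive interaction-style features from raw habit hours (assumed formulas)."""
    study = h["study_hours"]
    social = h["social_hours"]
    return {
        "study_sleep_interaction": study * h["sleep_hours"],
        # Guard against division by zero when no study time is logged.
        "social_study_ratio": social / study if study > 0 else 0.0,
        "total_activity": (social
                           + h["physical_activity_hours"]
                           + h["extracurricular_hours"]),
    }


def check_time_allocation(h, hours_in_day=24):
    """Return total logged hours and whether they exceed a single day."""
    total = sum(h.values())
    return total, total > hours_in_day


features = engineer_features(habits)
total_hours, is_unrealistic = check_time_allocation(habits)
print(features, total_hours, is_unrealistic)
```

Applying the same function at training time and inside the app keeps the model's inputs consistent between the two.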

  • SHIPSmart

    • Inspiration:
      • Help UC SHIP students avoid surprise medical bills by estimating healthcare costs up front.
    • What it does:
      • Provides a real-time cost estimation and claims automation system.
      • Allows users to input plan details and claim data to calculate expected reimbursements.
      • Automates rebate processing to expedite refunds.
    • How we built it:
      • Co-developed during HackDavis 2025 with SwiftUI on iOS.
      • Integrated Python back-end logic and the Cerebras API for machine learning calculations.
      • Leveraged OpenAI/Gemini and SQL to process and analyze insurance data in real time.
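
To make the reimbursement calculation concrete, here is a minimal sketch of how a bill could be split between patient and plan. Everything in it is an assumption: the function name, the deductible-then-coinsurance model, and the numbers are illustrative and are not actual UC SHIP terms. Real plans also involve copays, network rules, and out-of-pocket maximums, which this ignores.

```python
def estimate_claim(billed_cents, deductible_remaining_cents, coinsurance_pct):
    """Split a bill into patient and plan shares.

    All amounts are integer cents to avoid floating-point rounding.
    coinsurance_pct is the patient's share (0-100) after the deductible.
    """
    if billed_cents < 0 or deductible_remaining_cents < 0:
        raise ValueError("amounts must be non-negative")
    if not 0 <= coinsurance_pct <= 100:
        raise ValueError("coinsurance_pct must be between 0 and 100")

    # The patient first pays whatever is left of the deductible.
    deductible_paid = min(billed_cents, deductible_remaining_cents)
    after_deductible = billed_cents - deductible_paid
    # Then the patient pays a percentage of the remainder.
    patient_coinsurance = after_deductible * coinsurance_pct // 100
    patient_owes = deductible_paid + patient_coinsurance
    return {
        "patient_owes": patient_owes,
        "plan_pays": billed_cents - patient_owes,
    }


# Hypothetical example: $1,000 bill, $200 deductible left, 20% coinsurance.
estimate = estimate_claim(100_000, 20_000, 20)
print(estimate)
```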

  • Aggie Reminder

    • Inspiration:
      • Enhance volunteer management and communication at Aggie House.
    • What it does:
      • Provides an admin portal to monitor volunteer work hours.
      • Sends automated email reminders via the SendGrid API.
      • Implements an automatic reminder feature in JavaScript using Google Apps Script attached to a Google Sheet.
    • How we built it:
      • Developed with Node.js for server-side logic.
      • Built with HTML, CSS, and JavaScript for a responsive frontend.

  • Pantry Tracking Website

    • Description:
      • A web-based dashboard for tracking pantry inventory in real-time.
    • How it works:
      • Built with Python, allowing users to input, update, and visualize inventory data.
      • Efficiently manages distributed products and remaining stock.

  • Exploring the Impact of Stroke, Heart Disease, and Diabetes on Mobility Challenges

    • Project Overview:
      • Uses the BRFSS 2015 dataset to analyze and predict heart disease indicators.
      • Leverages health metrics such as BMI, smoking habits, physical activity, and healthcare access.
    • Inspiration:
      • Motivated by the alarming prevalence of heart disease and the need for early intervention.
    • Dataset Overview:
      • Based on the Heart Disease Health Indicators from the 2015 BRFSS survey.
      • Consists of 22 columns covering health metrics, demographics, and lifestyle factors.
    • Analysis Details:
      • Objectives: Identify key predictors of heart disease, build and evaluate predictive models, and provide actionable insights.
      • Methods: Exploratory Data Analysis, Feature Engineering, and Model Building using algorithms like Logistic Regression and Random Forest.
      • Results & Learnings: Highlighted significant predictors (e.g., HighBP, HighChol) and gained insights into lifestyle impacts on heart disease risk.

  • Analysis-of-Amazon-Sales-Trend

    • Project Overview:
      • Conducts an in-depth analysis of customer behavior using the Amazon Sales Dataset.
      • Focuses on product review categories, review lengths, and their impact on product engagement.
    • Key Questions Explored:
      • What information does the dataset provide?
      • Which products are top-rated based on the number of ratings?
      • Is there a correlation between ratings count and average product rating?
      • Which products have the most discounted prices, and how do discounts relate to review counts?
      • What are the top products by click-through rates and by category?
      • How do review characteristics (e.g., length) correlate with product ratings?
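
Two of the questions above, discount size and the correlation between ratings count and average rating, reduce to short calculations. The sketch below shows them in plain Python under stated assumptions: the field names and the three sample products are hypothetical, not taken from the Amazon Sales Dataset, and the analysis itself may have used a different correlation measure.

```python
import math

# Hypothetical product rows, for illustration only.
products = [
    {"name": "A", "actual_price": 1000, "discounted_price": 600,
     "rating": 4.5, "rating_count": 1200},
    {"name": "B", "actual_price": 500, "discounted_price": 450,
     "rating": 4.0, "rating_count": 300},
    {"name": "C", "actual_price": 800, "discounted_price": 400,
     "rating": 3.8, "rating_count": 50},
]


def discount_pct(product):
    """Percentage discount relative to the listed price."""
    actual = product["actual_price"]
    return 100 * (actual - product["discounted_price"]) / actual


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


most_discounted = max(products, key=discount_pct)
r = pearson([p["rating_count"] for p in products],
            [p["rating"] for p in products])
print(most_discounted["name"], round(r, 3))
```

Pearson correlation only captures linear association; with heavily skewed rating counts, a rank-based measure may be more informative.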

  • Marvel-Universe-Explorer

    • Project Overview:
      • A fun, interactive web project that dives into the Marvel Universe.
      • Retrieves data from the Marvel API to showcase characters, comics, and creators.
    • Inspiration:
      • Sparked by a childhood fascination with Marvel heroes and their incredible stories.
    • Features:
      • Home: Introductory section guiding users through the site.
      • Marvel Characters Gallery: Displays characters with images and descriptions.
      • Marvel Comics Gallery: Lists comics with cover images, titles, and issue numbers.
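
For readers curious how data is retrieved from the Marvel API, the sketch below builds a signed request URL for server-side use, where each call carries a timestamp, the public key, and an MD5 hash of timestamp + private key + public key. It only constructs the URL and makes no network call. The helper name and the placeholder keys are hypothetical, and whether this project signs requests this way or calls the API from the browser is an assumption.

```python
import hashlib
from urllib.parse import urlencode

BASE_URL = "https://gateway.marvel.com/v1/public"


def build_marvel_url(resource, public_key, private_key, ts, **params):
    """Build a signed Marvel API URL (hypothetical helper).

    The hash is md5(ts + private_key + public_key), as required for
    server-side requests.
    """
    digest = hashlib.md5(f"{ts}{private_key}{public_key}".encode()).hexdigest()
    query = {"ts": ts, "apikey": public_key, "hash": digest, **params}
    return f"{BASE_URL}/{resource}?{urlencode(query)}"


# Placeholder keys for illustration; real keys come from a developer account.
url = build_marvel_url("characters", "1234", "abcd", "1", limit=20)
print(url)
```

The private key should stay on the server and never be shipped in frontend code.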

  • MealBuddy

    • Overview:
      • An AI-powered meal planning app designed to help users track ingredients, generate personalized meal suggestions, and monitor nutritional intake.
    • Features (In Progress):
      • Ingredient Tracking: Search, add, edit, and delete ingredients with nutritional breakdown (calories, protein, fats, water, sugar).
      • AI-Powered Chatbot: Provides recipe suggestions using Google Gemini AI based on user-provided ingredients, with integrated YouTube video links for cooking instructions.
      • Dynamic Dashboards: Displays nutritional summaries with circular trackers and pie charts for calories, water, protein, carbs, and fats.
      • Profile Management: Manage user data (name, age, gender, height, weight) with authentication through Firebase and Google Sign-In, including password reset and logout functionality.
      • Searchable Fridge Inventory: Filter and manage stored ingredients with real-time updates using Firestore snapshot listeners.
      • Themed UI: Automatically adapts to system light/dark themes using a custom color scheme.
    • Tech Stack:
      • Frontend: React Native, HTML, CSS, JavaScript, TypeScript.
      • Backend: Firebase for authentication and database management, Google Gemini API for AI integration.
      • AI Integration: Google Gemini API for personalized meal recommendations and YouTube Data API for video retrieval.


