Inspiration
Environmental health risk is not distributed equally. In Southern California, communities in LA and Orange County face vastly different exposures to air pollution, toxic releases, and chronic disease — often split street by street across ZIP codes. Yet this data is buried in dense government spreadsheets that most residents never see. We built Healthy Home Audit to make that invisible risk visible: an interactive tool that lets anyone type in their ZIP code and immediately understand the environmental health burden of where they live.
What it does
Healthy Home Audit is an interactive environmental health risk map for the Los Angeles and Orange County metro area. Users can:
- Search any ZIP code to get a full environmental health profile — CES 4.0 score, asthma rate, cardiovascular disease risk, toxic air releases, PM2.5, traffic density, low birth weight rate, poverty and education percentiles, and demographic breakdown
- Explore an interactive map with color-coded markers for 700+ ZIP codes across Southern California, ranging from low to extreme environmental burden
- Switch to a heatmap view that visualizes six health indicators as a continuous risk surface across the region — four powered by our own trained ML models, two from raw CalEnviroScreen data
- Discover the most vulnerable communities with the Top 10 Most Vulnerable feature, ranking LA and OC ZIP codes by a composite Health Vulnerability Index (HVI) that weighs health outcomes, pollution burden, and social determinants
- Get real-time predictions for ZIP codes outside our primary dataset: our backend XGBoost models estimate asthma and cardiovascular risk using environmental features from the nearest tracked ZIP, so no area is left as an unknown
How we built it
Frontend: React + Vite, react-leaflet for the interactive map, CSS Modules for styling. The map renders custom SVG markers for 700+ ZIPs, color-coded by CES score, with a heatmap layer using concentric circle overlays that blend into a continuous risk surface.
Backend: FastAPI serving a suite of trained models:
- XGBoost (Asthma): trained on CalEnviroScreen 4.0 environmental features to predict emergency department asthma visit rates
- XGBoost (Cardiovascular): trained on the same feature set to predict cardiovascular disease rates
- Health Vulnerability Index: a composite score combining health outcomes, pollution burden, and social determinants (poverty, education, linguistic isolation, housing burden, unemployment), normalized and weighted across all census tracts statewide
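One way to read the Health Vulnerability Index description is as a weighted sum of normalized indicators. The sketch below is illustrative only: the three indicator groups, the weights in `WEIGHTS`, and the choice of min-max scaling are all assumptions, since the actual weights and normalization are not stated here.

```python
from typing import Dict, List

# Hypothetical weights for illustration; the project's real weights are not stated.
WEIGHTS: Dict[str, float] = {"health": 0.4, "pollution": 0.3, "social": 0.3}


def min_max(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def vulnerability_index(tracts: List[Dict[str, float]]) -> List[float]:
    """Return one composite score in [0, 1] per tract."""
    # Normalize each indicator across all tracts so the units are comparable.
    normalized = {k: min_max([t[k] for t in tracts]) for k in WEIGHTS}
    total = sum(WEIGHTS.values())
    return [
        sum(WEIGHTS[k] * normalized[k][i] for k in WEIGHTS) / total
        for i in range(len(tracts))
    ]


# Made-up tract values, purely to exercise the function.
tracts = [
    {"health": 20.0, "pollution": 10.0, "social": 5.0},
    {"health": 60.0, "pollution": 30.0, "social": 45.0},
    {"health": 100.0, "pollution": 50.0, "social": 25.0},
]
scores = vulnerability_index(tracts)
```

Because each indicator is scaled against every tract before weighting, a tract's score depends on the full statewide distribution, not just on its own raw values.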
Data: CalEnviroScreen 4.0 (OEHHA, 2021) — 8,000+ California census tracts aggregated to ZIP code level with population-weighted averaging. Melissa ZIP geocode database for coordinates across all 700+ Southern California ZIP codes. For ZIPs outside the CES dataset, predictions are generated via haversine nearest-neighbor lookup and model inference.
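The haversine nearest-neighbor lookup can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's code: the function names, the `tracked` dictionary, and the rough coordinates are assumptions made for the example. In the pipeline described above, the returned ZIP's environmental features would then be passed to the trained models.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_tracked_zip(query: Coord, tracked: Dict[str, Coord]) -> str:
    """Return the tracked ZIP whose coordinates are closest to the query point."""
    return min(tracked, key=lambda z: haversine_km(query, tracked[z]))


# Rough, illustrative coordinates only (not taken from the geocode database).
tracked = {
    "90012": (34.061, -118.239),  # downtown Los Angeles
    "92701": (33.748, -117.858),  # Santa Ana
    "90802": (33.750, -118.210),  # Long Beach
}
nearest = nearest_tracked_zip((34.150, -118.140), tracked)
```

A linear scan like this is fine for a few hundred ZIPs; a spatial index would only matter at a much larger scale.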
Challenges we ran into
Getting the ML backend and React frontend to communicate cleanly required solving CORS, data type coercion issues (CalEnviroScreen stores some values as ' NA ' strings), and column name whitespace in the source CSV. Rendering 700+ map markers performantly while keeping the heatmap smooth required rethinking our approach — we abandoned the leaflet.heat plugin entirely and rebuilt the heatmap with declarative React Circle components. Aggregating census tract data to ZIP code level required careful population-weighted averaging to avoid misrepresenting densely vs. sparsely populated tracts.
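The data-cleaning and aggregation steps can be illustrated together. The sketch below uses only the standard library and a tiny made-up CSV; the column names, the sample values, and the assumption that each tract row already carries a ZIP label are hypothetical. It strips whitespace from header names, treats `' NA '` cells as missing, and computes a population-weighted average per ZIP.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Optional

# Tiny stand-in for the source CSV: padded column names and a ' NA ' cell.
RAW = (
    "ZIP, Asthma , Population\n"
    "90001,80.0,4000\n"
    "90001,30.0,1000\n"
    "90002, NA ,2000\n"
    "90002,50.0,3000\n"
)


def to_float(cell: str) -> Optional[float]:
    """Coerce a raw cell to float, treating blank or 'NA' cells as missing."""
    text = cell.strip()
    if text == "" or text.upper() == "NA":
        return None
    return float(text)


def weighted_by_zip(raw_csv: str, value_col: str, weight_col: str) -> Dict[str, float]:
    """Population-weighted mean of value_col for each ZIP."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    # Strip whitespace from header names before any lookups.
    reader.fieldnames = [name.strip() for name in reader.fieldnames or []]
    num: Dict[str, float] = defaultdict(float)  # sum of value * weight per ZIP
    den: Dict[str, float] = defaultdict(float)  # sum of weight per ZIP
    for row in reader:
        value = to_float(row[value_col])
        weight = to_float(row[weight_col])
        if value is None or weight is None or weight <= 0:
            continue  # skip tracts with missing data instead of counting them as 0
        zip_code = row["ZIP"].strip()
        num[zip_code] += value * weight
        den[zip_code] += weight
    return {z: num[z] / den[z] for z in num}


asthma_by_zip = weighted_by_zip(RAW, "Asthma", "Population")
```

In this toy data the weighted value for 90001 is 70.0, while a plain mean of the two tracts would give 55.0, which is the kind of distortion between dense and sparse tracts that weighting avoids.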
Accomplishments that we're proud of
- A fully functional real-time risk prediction pipeline for any ZIP code in Southern California
- A heatmap that covers the entire region including areas with no direct CES data, estimated via our trained models
- The Health Vulnerability Index composite score, which captures cumulative disadvantage beyond just pollution metrics
- A clean, accessible UI that surfaces complex epidemiological data without requiring any technical background to understand
What we learned
Environmental health data is messy, inconsistent, and geographically misaligned — ZIP codes don't map cleanly to census tracts, and raw government datasets are rarely analysis-ready. We learned how to navigate those gaps with population-weighted aggregation and nearest-neighbor estimation. We also learned that the most important design challenge wasn't the ML — it was making the data legible and actionable for a general audience.
What's next for Healthy Home Audit
- Expand beyond LA/OC to all of California
- Add time-series views to show how risk has changed across CES versions (2.0 → 4.0)
- Integrate real-time AQI data to layer live air quality on top of historical risk
- Add a "compare two ZIP codes" feature for side-by-side analysis
- Build a mobile-responsive layout for on-the-go neighborhood research
Built With
- fastapi
- matplotlib
- numpy
- pandas
- python
- react
- scikit-learn
- xgboost