Inspiration

Santa Barbara County is in a housing deadlock. While policy discussions often focus on "new units," we noticed a gap between housing projects being approved and people actually moving in. We wanted a clear way to visualize the data and build a diagnostic tool that identifies exactly where the pipeline is leaking—turning messy Annual Progress Reports into a clear roadmap for housing equity. It also provides an AI-driven, data-backed tool that helps current and prospective renters and buyers find the best places to live in Santa Barbara County.

What it does

  • Interactive Choropleth Maps: A React-based choropleth map of 28 SBC jurisdictions (including CDPs like UCSB and Isla Vista) visualizing Housing Friction (how easy it is to build new residences), Income Mismatch (the disparity between high-income and low-income housing production), Cost Burden (the share of an area's residents burdened by housing costs), and Availability (owner and renter housing availability across areas).
  • Predictive Modeling: A Random Forest model that forecasts rent overburden based on income-stratified housing production.
  • AI Advisor: A RAG-powered Llama-3.3 assistant that matches workers to viable communities based on real-time workforce and affordability data.
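The advisor's retrieval step can be sketched as ranking jurisdictions by affordability for a given worker before handing the top matches to the LLM as context. This is a minimal illustration with made-up jurisdiction data and the standard 30%-of-income cost-burden threshold; the real pipeline uses live workforce and affordability data.

```python
# Hypothetical jurisdiction stats for illustration only (not real figures).
JURISDICTIONS = [
    {"name": "Santa Maria", "median_rent": 2100},
    {"name": "Goleta", "median_rent": 2900},
    {"name": "Lompoc", "median_rent": 1800},
]

def rent_burden(monthly_income: float, median_rent: float) -> float:
    """Share of monthly income spent on rent; above 0.30 is 'cost-burdened'."""
    return median_rent / monthly_income

def rank_communities(monthly_income: float, jurisdictions=JURISDICTIONS):
    """Sort jurisdictions by rent burden for this income, flagging burdened ones."""
    ranked = sorted(
        jurisdictions,
        key=lambda j: rent_burden(monthly_income, j["median_rent"]),
    )
    return [
        {
            **j,
            "burden": round(rent_burden(monthly_income, j["median_rent"]), 2),
            "cost_burdened": rent_burden(monthly_income, j["median_rent"]) > 0.30,
        }
        for j in ranked
    ]
```

The ranked list (rather than raw tables) is what would be injected into the LLM prompt, keeping the model's recommendations grounded in the data.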

How we built it

  • Data Engineering: We used Node.js scripts to download and filter US Census TIGER/Line shapefiles and R and Python to clean 9,000+ rows of HCD Annual Progress Report data.
  • Analysis & Modeling: We used R and Python to train six different regression models, ultimately selecting a Random Forest engine for its superior RMSE performance, which we exported via ONNX for production.
  • Frontend: A high-performance React/Vite dashboard using react-simple-maps for SVG-based geographic rendering and D3 for color scaling.
  • Backend: A FastAPI server that bridges the ONNX model and a Groq-powered RAG pipeline for localized AI recommendations.
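The APR cleaning step can be sketched as follows. Field names here are illustrative, not the real HCD schema: the idea is normalizing inconsistent jurisdiction names and dropping rows with missing permit counts before analysis.

```python
import csv
import io

# Illustrative raw APR rows with the kinds of inconsistencies we had to clean:
# stray whitespace, inconsistent casing, and missing permit counts.
RAW = """jurisdiction,year,permits_issued
 Goleta ,2021,120
GOLETA,2022,
Santa Barbara,2021,340
"""

def clean_apr_rows(text: str):
    """Normalize jurisdiction names and drop rows missing permit counts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["jurisdiction"].strip().title()
        permits = row["permits_issued"].strip()
        if not permits:  # missing permit count: row is unusable downstream
            continue
        rows.append({
            "jurisdiction": name,
            "year": int(row["year"]),
            "permits_issued": int(permits),
        })
    return rows
```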

Challenges we ran into

The biggest hurdle was "data lag." Because completions (COs) can arrive for permits issued years prior, our friction formula initially produced artifacts (like Goleta having "negative" friction). We had to implement strict statistical guards—excluding low-volume jurisdictions and cross-year artifacts—to ensure our metrics remained responsible and accurate for policy use. Time constraints also meant we couldn't make full use of our dataset and API, or train the model for as long as we would have liked. On top of that, the period our dataset covers contains a major anomaly: the pandemic. The model therefore had to contend with confounding factors such as the Federal Reserve's efforts to stabilize the economy, mortgage-rate intervention being one example.
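The guards described above can be sketched like this. The friction formula here (share of permitted units not yet completed) and the volume cutoff are illustrative stand-ins, but the guards are the same idea: skip jurisdictions with too few permits to measure reliably, and clamp negative values caused by cross-year CO lag.

```python
MIN_PERMITS = 25  # hypothetical low-volume cutoff, not our actual threshold

def guarded_friction(permits: int, completions: int):
    """Friction guarded against low-volume noise and cross-year lag artifacts.

    Returns None when the jurisdiction has too few permits for the metric to
    be reliable; otherwise clamps at zero so completions arriving for permits
    issued in earlier years can't produce "negative" friction.
    """
    if permits < MIN_PERMITS:
        return None
    raw = (permits - completions) / permits
    return max(raw, 0.0)
```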

Accomplishments that we're proud of

  • The Friction Metric: Creating a mathematically sound way to quantify "red tape" and structural barriers across the county.
  • 30-Jurisdiction Coverage: Successfully mapping and analyzing not just major cities, but critical CDPs like UCSB and Vandenberg SFB.
  • End-to-End Pipeline: Seamlessly moving data from a complex R and Python statistical analysis to a live, web-based ONNX inference engine.

What we learned

We learned that housing data is incredibly non-linear. A simple "more houses = lower rent" model doesn't capture the reality of Santa Barbara; you have to account for income-stratified production. We also gained deep experience in geographic data normalization and the Model Context Protocol for connecting AI to live datasets.
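A toy illustration (with made-up numbers) of why income-stratified production matters: two jurisdictions can add the same total number of units while producing very different shares affordable to low-income households, which is exactly what an aggregate "more houses" feature would miss.

```python
def affordable_share(production_by_band: dict) -> float:
    """Fraction of new units in the very-low and low income bands."""
    total = sum(production_by_band.values())
    low = production_by_band.get("very_low", 0) + production_by_band.get("low", 0)
    return low / total if total else 0.0

# Hypothetical production counts by RHNA-style income band.
a = {"very_low": 10, "low": 20, "moderate": 30, "above_moderate": 140}
b = {"very_low": 60, "low": 60, "moderate": 40, "above_moderate": 40}
# Same total (200 units each), very different affordability profiles.
```

Feeding band-level counts like these into the Random Forest, rather than a single total, is what lets the model capture this non-linearity.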

What's next for SBC Housing Explorer

We want to move from cumulative data to time-series cohort tracking, allowing users to see how friction rates change year-over-year. We also hope to integrate live transit data to calculate a "True Cost of Living" metric that combines rent burden with commuting expenses.
