Inspiration

Where you live shapes who you are, but what if it worked the other way around? We wanted to explore whether a person's lifestyle, values, and habits could predict where they'd naturally fit within a community. Rancho Santa Margarita (RSM), with its distinct neighborhoods and rich Melissa dataset, gave us the perfect testing ground to ask: can data find your people for you?

What It Does

Community Matcher takes a user's personal, financial, and behavioral profile and predicts the residential community within Rancho Santa Margarita, CA where they'd best fit in. By training on lifestyle predictors spanning sociodemographics, hobbies, values, and financial habits, the model outputs a predicted latitude and longitude that maps to a real neighborhood. Think of it as a "Lifestyle Fingerprint" that places you on a map.
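One way to turn a predicted coordinate into a named neighborhood is a nearest-centroid lookup. The sketch below assumes each community can be summarized by a single centroid; the community names, the coordinates, and the `nearest_community` helper are hypothetical placeholders for illustration, not values from our dataset or our exact mapping step.

```python
import math

# Hypothetical community centroids as (latitude, longitude).
# Placeholder values for illustration only, not real neighborhood data.
CENTROIDS = {
    "Community A": (33.640, -117.600),
    "Community B": (33.650, -117.590),
    "Community C": (33.630, -117.610),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_community(lat, lon, centroids=CENTROIDS):
    """Return the name of the centroid closest to the predicted point."""
    return min(centroids, key=lambda name: haversine_km(lat, lon, *centroids[name]))


# A predicted point that sits next to the second placeholder centroid.
print(nearest_community(33.649, -117.591))
```

A centroid lookup is the simplest choice; neighborhoods with irregular shapes would be better served by a point-in-polygon test against real boundary data.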

How We Built It

We worked with a residential dataset from Melissa, masking geographic identifiers to prevent data leakage and force the model to learn purely from lifestyle signals. Our feature set covered five broad categories: sociodemographics, financial profile, values and beliefs, interests and hobbies, and outdoor and fitness behavior. We experimented with three models using Python, scikit-learn, pandas, and NumPy: Linear Regression as a baseline, Ridge Regression to handle multicollinearity across our many binary features, and a Random Forest Regressor for capturing non-linear relationships. We also built a PyTorch neural network to explore whether a deeper architecture could squeeze more signal out of the data.
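The three-model comparison can be sketched roughly as follows. This is a minimal illustration on synthetic data, not our actual pipeline: the 20 random binary features, the coefficients used to generate the targets, the 75/25 split, and the hyperparameters (`alpha=1.0`, `n_estimators=100`) are all assumptions chosen to keep the example self-contained.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the lifestyle features: 500 people, 20 binary flags.
X = rng.integers(0, 2, size=(500, 20)).astype(float)

# Synthetic (latitude, longitude) targets: a weak linear effect, a weak
# interaction effect, and noise. The coefficients are arbitrary assumptions.
signal = 0.002 * X[:, 0] + 0.003 * (X[:, 1] * X[:, 2])
Y = np.column_stack([
    33.64 + signal + rng.normal(0, 0.005, 500),
    -117.60 - signal + rng.normal(0, 0.005, 500),
])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# All three estimators accept a two-column target directly.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, Y_train)
    # r2_score averages R^2 uniformly over the two output columns by default.
    scores[name] = r2_score(Y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair. The numbers this prints describe the synthetic data only and say nothing about the real dataset.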

Challenges We Ran Into

Predicting geography from lifestyle alone is a hard problem. Neighborhoods don't sort themselves neatly by hobby or net worth, and the overlap between communities meant our target variable was inherently noisy. The dataset's many binary features also created sparsity challenges, and preventing geographic leakage while still giving the model enough signal to learn from required careful feature engineering. Getting the neural network to converge meaningfully on such a structured, tabular dataset took significant tuning as well.

Accomplishments That We're Proud Of

Our Random Forest Regressor outperformed both linear baselines and achieved an R² of 0.092. Given the subjective and non-deterministic nature of where people choose to live, we consider that a meaningful result. We were also excited to see the model surface interpretable top features: AutoWork, HuntingShooting, and HomeImprovementDIY were the strongest predictors of neighborhood placement. These aren't arbitrary: they paint a coherent picture of the hands-on, outdoors-oriented communities within RSM.
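For readers less familiar with the metric, R² compares a model's squared error against a baseline that always predicts the mean of the target. A score of 0 matches that baseline and a score of 1 is a perfect fit, so 0.092 corresponds to roughly 9% less squared error than always guessing the average location. The sketch below computes the standard single-output definition on toy numbers; the helper name `r_squared` and the sample values are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy target values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]

# Always predicting the mean (2.5) gives exactly the baseline score.
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0

# Predicting every value exactly gives the maximum score.
print(r_squared(y_true, y_true))  # 1.0
```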

What We Learned

Lifestyle data carries a small but real geographic signal, even when you strip away all the obvious location identifiers. We also learned that model interpretability matters as much as accuracy here: knowing which features drive predictions is what makes this tool actionable for real-world use cases. On the technical side, we gained hands-on experience comparing classical ML approaches against a neural network on tabular data, and saw firsthand why tree-based models often hold their own against deep learning in structured settings.

What's Next for Community Matcher

The RSM prototype is just the proof of concept. The broader vision is an AI tool that works at any geographic scale, from matching students to college campuses based on academic and social profiles, to helping door-to-door sales teams identify neighborhoods where their ideal customer's Lifestyle Fingerprint clusters. Next steps include expanding the dataset to multiple cities, building an interactive front-end where users input their profile and see their predicted community on a map, and refining the model with richer feature sets and better-calibrated neural network architectures.
