Inspiration
Crime affects lives, communities, and cities, but what if we could anticipate it before it peaks? With more than a decade of district-wise crime data (2001 to 2014) available, we set out to turn raw numbers into insights that help policy-makers, researchers, and law enforcement act smarter, not just harder.
What it does
EDA Patrol analyzes crime trends across India from 2001 to 2014 to:
- Reveal state- and district-level crime patterns
- Predict future crime trends in key cities
- Classify high-crime vs. low-crime districts using machine learning
- Cluster districts with similar crime behavior
- Generate a Crime Risk Index to identify vulnerable regions

By combining visuals, models, and metrics, it turns crime data into a blueprint for action.
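The trend-prediction and clustering steps can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the project's actual code: it assumes a city's yearly totals arrive as a pandas Series indexed by year, and that each district is described by a numeric feature table. `forecast_total` and `cluster_districts` are hypothetical helper names, and the straight-line fit is simply the most basic reading of "linear regression" for trend prediction.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler


def forecast_total(yearly: pd.Series, future_year: int) -> float:
    """Hypothetical helper: fit total ~ year by least squares and extrapolate."""
    X = yearly.index.to_numpy().reshape(-1, 1)  # years as a single feature column
    y = yearly.to_numpy()
    model = LinearRegression().fit(X, y)
    return float(model.predict(np.array([[future_year]]))[0])


def cluster_districts(features: pd.DataFrame, k: int = 3, seed: int = 0) -> pd.Series:
    """Hypothetical helper: standardize per-district features, then run K-Means."""
    # Scaling keeps high-volume categories (e.g. theft) from dominating distances.
    scaled = StandardScaler().fit_transform(features)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)
    return pd.Series(labels, index=features.index, name="cluster")
```

Standardizing before K-Means matters because the algorithm uses Euclidean distance, so unscaled count columns with large ranges would otherwise decide the clusters on their own.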
How we built it
We used Python with:
- Pandas and NumPy for cleaning and aggregation
- Seaborn and Matplotlib for visualizations
- Scikit-learn for clustering (K-Means), regression, and classification
- Custom logic to engineer features and to proxy missing attributes such as an urban/rural classification

We organized the work into modular notebooks and scripts to keep the workflow flexible and testable.
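A cleaning and aggregation pass with Pandas might look like the sketch below. It is an illustration, not the project's pipeline: the column names (`state`, `district`, `year`), the `ALIASES` table, and the choice to fill missing counts with zero are all assumptions. Interpolation or dropping incomplete rows are equally defensible alternatives for missing values.

```python
import re

import pandas as pd

# Hypothetical alias table mapping spelling variants to one canonical name.
ALIASES = {"BANGALORE": "BENGALURU"}


def normalize_district(name: str) -> str:
    """Uppercase, replace non-alphanumerics with spaces, collapse whitespace, apply aliases."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", str(name).upper())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return ALIASES.get(cleaned, cleaned)


def clean_crime_table(df: pd.DataFrame, count_cols: list[str]) -> pd.DataFrame:
    """Normalize district names, fill missing counts, and merge resulting duplicates."""
    out = df.copy()
    out["district"] = out["district"].map(normalize_district)
    # Assumption: a missing count means zero reported cases.
    out[count_cols] = out[count_cols].fillna(0).astype(int)
    # Rows that differed only in spelling now share a key; sum them into one row.
    return out.groupby(["state", "district", "year"], as_index=False)[count_cols].sum()
```

Summing after normalization is what makes the name fix pay off: two spellings of the same district collapse into a single row instead of splitting its counts.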
Challenges we ran into
- The crime data was messy: non-standardized district names, missing values, and inconsistent formatting.
- There was no direct population figure or urban/rural flag, so we had to derive proxies.
- Making models both accurate and interpretable required trade-offs between complexity and clarity.

Accomplishments that we're proud of
- Built a complete ML pipeline on real-world Indian crime data
- Created a risk scoring system to rank districts by severity
- Generated visualizations that are easy to understand, even for non-technical users
- Predicted future crime trends from historical data
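One common way to build a risk score for ranking districts is a weighted sum of min-max normalized crime counts. The sketch below shows that technique; it is not necessarily how EDA Patrol computes its Crime Risk Index. The category columns and the `WEIGHTS` values are hypothetical placeholders that a real index would need to justify.

```python
import pandas as pd

# Hypothetical severity weights per crime category (illustrative only).
WEIGHTS = {"murder": 0.5, "robbery": 0.3, "theft": 0.2}


def crime_risk_index(df: pd.DataFrame, weights: dict[str, float] = WEIGHTS) -> pd.Series:
    """Weighted average of min-max normalized category counts, scaled to 0-100."""
    cols = list(weights)
    lo, hi = df[cols].min(), df[cols].max()
    # Avoid dividing by zero when a category has the same count everywhere.
    span = (hi - lo).replace(0, 1)
    normalized = (df[cols] - lo) / span  # each column now lies in [0, 1]
    w = pd.Series(weights)
    score = normalized.mul(w, axis=1).sum(axis=1) / w.sum()
    return (score * 100).round(2)
```

Usage: with districts as the DataFrame index, `crime_risk_index(districts).sort_values(ascending=False)` lists the highest-risk districts first. Min-max scaling keeps categories with very different volumes on a comparable 0 to 1 range before weighting.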
What we learned
- How to handle and model imbalanced, real-world datasets
- The importance of domain knowledge in selecting features and interpreting results
- That even simple models can provide powerful insights when backed by clean data
- How storytelling with data makes insights more impactful
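Labeling only the top slice of districts as "high-crime" produces an imbalanced target. The sketch below shows two standard scikit-learn remedies, a stratified split and `class_weight="balanced"` on a random forest. The 75th-percentile threshold and both helper names are hypothetical; the source does not state how the project defined its classes.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def label_high_crime(totals: pd.Series, quantile: float = 0.75) -> pd.Series:
    """Hypothetical rule: 1 if a district's total exceeds the quantile, else 0."""
    return (totals > totals.quantile(quantile)).astype(int)


def train_high_crime_classifier(X: pd.DataFrame, y: pd.Series, seed: int = 0):
    """Fit a class-weighted random forest and return it with its test-set F1."""
    # Stratifying keeps the minority class's share equal in train and test splits.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # "balanced" weights each class inversely to its frequency in y_tr.
    clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=seed)
    clf.fit(X_tr, y_tr)
    # F1 on the positive class is more informative than accuracy when classes are skewed.
    return clf, f1_score(y_te, clf.predict(X_te))
```

With a roughly 1:3 split, a model that always predicts "low-crime" already reaches about 75% accuracy, which is why the sketch reports F1 instead.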
What's next for EDA Patrol
- Integrate population and socio-economic data for per-capita crime rates and deeper insights
- Upgrade the time-series models with seasonality and auto-regressive features
- Add geospatial visualizations using maps (GeoPandas/Folium)
- Launch an interactive dashboard for real-time filtering by year, state, and district
- Open-source the tool and invite collaboration from data scientists and urban planners
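Two of these next steps can be prototyped in Pandas: lagged values as auto-regressive inputs, and per-capita rates once population data is joined in. This is a sketch assuming one row per district per year with a `total` column and a separate population table keyed by `district` and `year`; both tables and both helper names are hypothetical. With yearly rows, within-year seasonality cannot be modeled, so that upgrade would need finer-grained data.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame, col: str = "total", lags: tuple[int, ...] = (1, 2)) -> pd.DataFrame:
    """Add previous-year values of `col` per district as auto-regressive features."""
    out = df.sort_values(["district", "year"]).copy()
    for k in lags:
        # shift(k) within each district pulls the value from k rows (years) earlier.
        out[f"{col}_lag{k}"] = out.groupby("district")[col].shift(k)
    # Early years lack a full history, so their lag columns are NaN; drop them.
    return out.dropna(subset=[f"{col}_lag{k}" for k in lags])


def per_100k(crimes: pd.DataFrame, population: pd.DataFrame, col: str = "total") -> pd.DataFrame:
    """Left-join a hypothetical population table and compute cases per 100,000 people."""
    merged = crimes.merge(population, on=["district", "year"], how="left", validate="many_to_one")
    # Districts without a population match get NaN rather than a misleading rate.
    merged[f"{col}_per_100k"] = merged[col] * 100_000 / merged["population"]
    return merged
```

The `validate="many_to_one"` check makes the join fail loudly if the population table contains duplicate district-year keys, which would otherwise silently duplicate crime rows.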
Built With
- k-means
- linear-regression
- numpy
- pandas
- python
- random-forest-classifier
- scikit-learn