Inspiration

The idea for this project stemmed from growing concerns around public safety and the need for actionable insights into crime trends across India. We aimed to go beyond static crime figures and instead build a dynamic, interactive dashboard backed by analytical models to reveal hidden patterns in historical crime data from 2001 to 2014.

What it does

EDA_Patrol is an interactive crime analysis tool designed to uncover trends, patterns, and risks in India's district-wise crime data from 2001 to 2014. The platform provides the following capabilities:

Enables users to explore and filter crime data by year, state, and district through a live interactive dashboard.

Identifies the states and districts with the highest total Indian Penal Code (IPC) crime counts.

Visualizes crime hotspots across India using a geospatial choropleth map.

Applies clustering algorithms to group districts by similar crime patterns.

Calculates a Crime Risk Index to rank districts based on overall threat levels.

Classifies districts as high-crime or low-crime using supervised machine learning models.

Forecasts future crime trends using polynomial regression and time-series analysis.

Analyzes how crime patterns differ between urban and rural regions, and investigates correlations between major crime categories such as murder and theft.
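To make the idea of a Crime Risk Index more concrete, here is a minimal sketch of one way such a score could be computed: min-max scale each crime category across districts, then take a weighted average. This is an illustration only and not necessarily the formula used in the project. The function name `crime_risk_index`, the category weights, and the district figures are all made up for the example.

```python
def crime_risk_index(district_counts, weights):
    """Return {district: score in [0, 1]} as a weighted average of
    min-max scaled crime counts (hypothetical scoring scheme)."""
    categories = list(weights)
    total_weight = sum(weights.values())
    # Smallest and largest count per category across all districts
    lo = {c: min(d[c] for d in district_counts.values()) for c in categories}
    hi = {c: max(d[c] for d in district_counts.values()) for c in categories}

    scores = {}
    for name, counts in district_counts.items():
        score = 0.0
        for c in categories:
            span = hi[c] - lo[c]
            # A category with identical counts everywhere contributes nothing
            scaled = (counts[c] - lo[c]) / span if span else 0.0
            score += weights[c] * scaled
        scores[name] = score / total_weight
    return scores


# Made-up counts, used only to show the mechanics
sample = {
    "District A": {"murder": 10, "theft": 200},
    "District B": {"murder": 40, "theft": 500},
    "District C": {"murder": 25, "theft": 350},
}
weights = {"murder": 3.0, "theft": 1.0}  # assumed severity weights

scores = crime_risk_index(sample, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Scaling before weighting keeps high-volume categories such as theft from swamping rarer but more severe ones such as murder.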

How we built it

Data Preprocessing: Cleaned and normalized district-level crime data from official sources.

Exploratory Data Analysis (EDA): Used pandas, matplotlib, and seaborn to uncover trends.

ML Modeling: Used KMeans for clustering, Logistic Regression for classification, and Polynomial Regression/ARIMA for forecasting.

Dashboard: Built using Streamlit and exposed through an ngrok tunnel for live interaction.

Geospatial Mapping: Integrated GeoJSON boundaries with geopandas for India map overlays.
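The clustering step listed above can be sketched as follows. This is a minimal example that assumes scikit-learn is the library behind the KMeans model (the write-up does not name it); the feature columns, district labels, and numbers are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-district features: [murder, theft, total IPC crimes]
districts = ["D1", "D2", "D3", "D4", "D5", "D6"]
X = np.array([
    [5, 100, 900],
    [7, 120, 1000],
    [6, 110, 950],
    [60, 1500, 12000],
    [55, 1400, 11000],
    [65, 1600, 12500],
], dtype=float)

# Standardize first so the large "total" column does not dominate distances
X_scaled = StandardScaler().fit_transform(X)

# Two clusters and a fixed seed are assumptions made for reproducibility
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

clusters = dict(zip(districts, labels))
print(clusters)
```

The cluster numbers themselves are arbitrary; only the grouping of districts is meaningful.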

Challenges we ran into

Data Cleaning: The dataset included redundant rows (like 'TOTAL') and inconsistent district/state names requiring extensive normalization.

GeoJSON Matching: Aligning crime data with geospatial boundaries was challenging due to spelling and naming mismatches.

Model Generalization: Forecasting crime accurately is difficult because the dataset offers only annual counts (14 yearly observations per district, with no monthly granularity) and limited feature diversity.

Deployment: Serving an interactive dashboard securely through ngrok came with constraints (e.g., tunnel limits and auth tokens).
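The data cleaning and GeoJSON matching issues described above can be sketched with the standard library alone. This is a hedged illustration, not the project's actual pipeline: the column names (`state`, `district`, `total_ipc`), the helper names, the sample rows, and the 0.8 similarity cutoff are all assumptions.

```python
import difflib


def normalize(name):
    """Uppercase, trim, and collapse internal whitespace."""
    return " ".join(name.upper().split())


def clean_rows(rows):
    """Drop aggregate 'TOTAL' rows and normalize state/district names."""
    cleaned = []
    for row in rows:
        district = normalize(row["district"])
        if district == "TOTAL":
            continue  # aggregate row, would double count the state
        cleaned.append({**row, "state": normalize(row["state"]), "district": district})
    return cleaned


def match_to_boundaries(district, boundary_names, cutoff=0.8):
    """Return the closest boundary name, or None if nothing is similar enough."""
    hits = difflib.get_close_matches(
        normalize(district), boundary_names, n=1, cutoff=cutoff
    )
    return hits[0] if hits else None


# Made-up rows and boundary names that mimic a spelling mismatch
rows = [
    {"state": "Andhra Pradesh", "district": "Anantapur ", "total_ipc": 100},
    {"state": "Andhra Pradesh", "district": "Total", "total_ipc": 100},
]
boundary_names = ["ANANTHAPUR", "CHITTOOR"]

cleaned = clean_rows(rows)
matched = match_to_boundaries(cleaned[0]["district"], boundary_names)
print(cleaned, matched)
```

Fuzzy matching can pair the wrong districts when names are short or similar, so unmatched and low-similarity names are best reviewed by hand.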

Accomplishments that we're proud of

Developed an end-to-end crime analysis platform that combines statistical rigor, machine learning, and interactive visualizations.

Built a Streamlit dashboard that enables interactive exploration of crime data by year, state, and district.

Created a geospatial choropleth map that visualizes crime hotspots across India using GeoPandas and matplotlib.

Applied clustering algorithms (K-Means) to group districts by crime patterns and built a Crime Risk Index to rank regions by severity.

Used linear and polynomial regression, together with ARIMA time-series models, to forecast future IPC crime trends with strong predictive accuracy.

Successfully classified high-crime vs. low-crime districts using supervised machine learning with over 94% accuracy.

Found notable relationships in the data, such as higher crime intensity in urban districts and a positive correlation between murder and theft counts.

Maintained data transparency and interpretability throughout the process to ensure insights are actionable and understandable.
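The polynomial-regression forecasting mentioned in this list can be sketched with NumPy. This is a minimal example under stated assumptions: the yearly totals below are synthetic (generated from a known quadratic so the result is easy to check), and the degree-2 fit is an arbitrary choice, not necessarily the one used in the project.

```python
import numpy as np

years = np.arange(2001, 2015)  # 14 annual observations, 2001 through 2014
t = years - years[0]           # years since 2001, keeps the fit well conditioned

# Synthetic totals following a known quadratic, for illustration only
totals = 1000.0 + 50.0 * t + 2.0 * t**2

# Fit a degree-2 polynomial to the annual totals
coeffs = np.polyfit(t, totals, deg=2)
model = np.poly1d(coeffs)

# Extrapolate two years past the end of the data
future_years = np.array([2015, 2016])
forecast = model(future_years - years[0])
print(dict(zip(future_years.tolist(), forecast.round(1).tolist())))
```

With only 14 points per series, higher polynomial degrees tend to overfit and extrapolate poorly, so any forecast of this kind should be read as a rough trend rather than a precise prediction.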

What we learned

Through this project, we learned to:

Handle large-scale real-world datasets with missing values and inconsistencies.

Apply both supervised and unsupervised machine learning models effectively.

Utilize geospatial data (GeoJSON) and visualize it with libraries like geopandas and matplotlib.

Build user-friendly dashboards with Streamlit and share them through ngrok tunnels.

What's next for EDA_Patrol

Add monthly and seasonal crime data to allow more granular temporal analysis and understand seasonal crime surges.

Drill down to street-level crime mapping in select metropolitan districts using local law enforcement datasets.

Incorporate demographic and socioeconomic data (e.g., literacy, income, employment) to uncover deeper causal factors.

Enhance models with deep learning approaches for anomaly detection and crime prediction at a finer level.

Deploy the dashboard publicly on platforms like Streamlit Cloud or AWS for broader accessibility and civic engagement.

Build a live crime data integration pipeline from government APIs (if available) to update the dashboard in near real-time.

Collaborate with policy researchers and local authorities to translate data-driven insights into community safety strategies.
