Authors: Farhan Sadeek, Jalen Francis, Jayson Clark
Institution: The Ohio State University
Competition: IM²C 2025 (Intercollegiate Math Modeling Challenge)
This repository contains a complete computational framework for:
- Objectively identifying Communities of Interest (COIs) using demographic, socioeconomic, and commuting data
- Quantifying community fragmentation using information-theoretic metrics
- Evaluating redistricting fairness through ensemble comparison against neutral maps
Applied to North Carolina's 2023 congressional map (SB 757), our analysis reveals systematic community splitting: 5 of 30 identified communities have CSI scores above 0.90 (near-maximum fragmentation), and the enacted map splits communities more severely than 89% of algorithmically generated neutral alternatives.
| Metric | Value |
|---|---|
| Communities Identified | 30 COIs across 2,672 census tracts |
| Most Fragmented COI | Community 18 (CSI = 0.9999, population 123,786) |
| Severely Split (CSI > 0.90) | 5 communities |
| SB 757 Mean CSI | 0.5050 |
| Neutral Maps Mean CSI | 0.4351 (14% lower) |
| Percentile Rank | 89th (worse than 89% of neutral maps) |
.
├── README.md # This file
├── requirements.txt # Python dependencies
├── script.py # Main COI identification & evaluation
├── gerrychain_analysis.py # Ensemble comparison analysis
├── report.tex # Full LaTeX report source
├── data/ # Data files (downloaded by scripts)
├── results/ # Generated visualizations & results
│ ├── coi_map.png
│ ├── coi_splits_map.png
│ ├── ensemble_comparison.png
│ └── coi_results_complete.csv
└── .gitignore # Git ignore patterns
- Data Sources:
  - 2020 Decennial Census (P.L. 94-171)
  - American Community Survey 5-Year Estimates (2016-2020)
  - LEHD Origin-Destination Employment Statistics (LODES)
- Features (9 dimensions):
  - Demographics: Hispanic %, NH White %, NH Black %, NH Asian %
  - Socioeconomics: Median income, homeownership rate, bachelor's degree %
  - Commuting: Jobs in tract, workers in tract
- Algorithm: Spatial 'K'luster Analysis by Tree Edge Removal (SKATER)
  - Ensures spatial contiguity
  - Minimizes within-cluster variance
  - k = 30 communities
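The idea behind SKATER can be illustrated with a small, dependency-free sketch: build a minimum spanning tree over the contiguity graph, then repeatedly cut the tree edge whose removal leaves the lowest total within-cluster variance. This is a simplified toy, not the `spopt` implementation the project relies on; the function names, the six-tract path graph, and the one-dimensional feature values are all hypothetical.

```python
def sse(cluster, feats):
    """Within-cluster sum of squared deviations from the cluster mean."""
    dims = len(feats[next(iter(cluster))])
    total = 0.0
    for d in range(dims):
        vals = [feats[n][d] for n in cluster]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total


def minimum_spanning_tree(nodes, edges, feats):
    """Kruskal's algorithm; edge cost is squared feature distance."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def dist(edge):
        a, b = edge
        return sum((x - y) ** 2 for x, y in zip(feats[a], feats[b]))

    tree = []
    for a, b in sorted(edges, key=dist):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b))
    return tree


def components(nodes, tree):
    """Connected components of the forest left after edge removals."""
    adj = {n: set() for n in nodes}
    for a, b in tree:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def skater_toy(nodes, edges, feats, k):
    """Greedy tree-edge removal until k contiguous clusters remain.

    Assumes a connected contiguity graph and k <= len(nodes).
    """
    tree = minimum_spanning_tree(nodes, edges, feats)
    while len(components(nodes, tree)) < k:
        def cost_without(edge):
            remaining = [e for e in tree if e != edge]
            return sum(sse(c, feats) for c in components(nodes, remaining))
        tree.remove(min(tree, key=cost_without))
    return components(nodes, tree)


# Hypothetical data: six tracts in a row, with one standardized feature each.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
feats = {0: [0.0], 1: [0.1], 2: [0.2], 3: [5.0], 4: [5.1], 5: [5.2]}
clusters = skater_toy(nodes, edges, feats, k=2)
print(clusters)
```

Because clusters are always subtrees of the spanning tree, every cluster is contiguous by construction, which is the property the bullet list above relies on.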
- Formula: Pielou's Evenness Index adapted to redistricting

  CSI = H / H_max = -Σ(p_i log₂ p_i) / log₂(N)

  where p_i = proportion of COI population in district i
- Interpretation:
  - CSI = 0: Community intact (not split)
  - CSI ≈ 1: Community maximally fragmented
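As a minimal sketch of the formula above, the function below computes CSI from a mapping of district to COI population. It assumes N is the number of districts that contain part of the COI and returns 0 for an unsplit community (where log₂(N) would be 0); both choices, along with the function name and the sample populations, are assumptions and may differ from what `script.py` does.

```python
import math


def csi(district_pops):
    """Community Splitting Index: normalized Shannon entropy of a COI's
    population shares across districts.

    district_pops maps a district label to the COI population inside it.
    Assumption: N counts only districts holding part of the COI.
    """
    total = sum(district_pops.values())
    if total <= 0:
        raise ValueError("COI population must be positive")
    shares = [p / total for p in district_pops.values() if p > 0]
    n = len(shares)
    if n <= 1:
        return 0.0  # community intact, so no fragmentation
    entropy = -sum(p * math.log2(p) for p in shares)
    return entropy / math.log2(n)


# Hypothetical populations, for illustration only.
intact = csi({"D1": 1000})
even_split = csi({"D1": 500, "D2": 500})
uneven_split = csi({"D1": 900, "D2": 100})
print(intact, even_split, round(uneven_split, 3))
```

An even split across two districts gives the maximum score of 1, while a 90/10 split scores well below that, which matches the interpretation that CSI rewards keeping most of a community together.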
- Algorithm: Recombination (ReCom) Markov chain
- Constraints: Population equality (±5%), district contiguity
- Sample Size: 100 neutral maps
- Result: SB 757 at 89th percentile of CSI distribution
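The percentile in the result above can be read as the share of neutral maps whose mean CSI falls below the enacted map's. A minimal sketch, assuming a strict "less than" comparison (the handling of ties is an assumption) and using hypothetical ensemble values rather than the project's actual output:

```python
def percentile_rank(value, samples):
    """Percentage of samples strictly below value."""
    below = sum(1 for s in samples if s < value)
    return 100.0 * below / len(samples)


# Hypothetical mean-CSI scores for five neutral maps (illustration only).
ensemble_mean_csi = [0.40, 0.42, 0.45, 0.47, 0.52]
enacted_mean_csi = 0.50

rank = percentile_rank(enacted_mean_csi, ensemble_mean_csi)
print(rank)  # 80.0
```

With a 100-map ensemble, a rank of 89 would mean 89 of the neutral maps split communities less than the enacted map.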
- Python 3.10 or higher (required for GerryChain)
- ~2GB disk space for data files
- Clone the repository:

      git clone https://github.com/jalenfran/immc_2025.git
      cd immc_2025

- Create virtual environment:

      python3.12 -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

`python script.py`

Runtime: 5-10 minutes

Outputs:
- `coi_map.png` - Map of 30 identified communities
- `coi_splits_map.png` - COIs overlaid with SB 757 districts
- `coi_results_complete.csv` - CSI scores for all 30 communities

`python gerrychain_analysis.py`

Runtime: 10-30 minutes (generates 10,000 MCMC steps)

Outputs:
- `ensemble_comparison.png` - Histogram comparing SB 757 to neutral maps
- `ensemble_csi_scores.csv` - CSI scores for all 100 ensemble maps
- Statistical significance results (percentile rank, z-score)

Note: Requires Python 3.10+ due to GerryChain type hint syntax.
All data is automatically downloaded by the scripts from public sources:
- Geographic Boundaries
- Demographic Data
- Commuting Data
- Geospatial: `geopandas`, `pyogrio`, `shapely`
- Data Access: `cenpy` (Census API wrapper)
- Spatial Analysis: `libpysal`, `spopt` (SKATER implementation)
- Ensemble Generation: `gerrychain` (ReCom MCMC)
- Visualization: `matplotlib`

See requirements.txt for complete list with version constraints.
The analysis identified 30 spatially contiguous communities across North Carolina, ranging in population from 18,728 (rural) to 2,427,537 (metro) residents.
| COI ID | Population | CSI Score | Interpretation |
|---|---|---|---|
| 18 | 123,786 | 0.9999 | Near-maximally split |
| 8 | 65,015 | 0.9867 | Near-maximally split |
| 22 | 27,328 | 0.9721 | Severely split |
| 27 | 73,721 | 0.9328 | Severely split |
| 5 | 295,848 | 0.9195 | Severely split |
- SB 757 Mean CSI: 0.5050
- Ensemble Mean CSI: 0.4351 (σ = 0.0563)
- Percentile: 89% (SB 757 worse than 89% of neutral maps)
- Z-score: 1.24
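The z-score and the "14% lower" figure follow directly from the reported means and standard deviation. A short arithmetic check using only the numbers listed above (the variable names are illustrative):

```python
# Values as reported in this README.
enacted_mean = 0.5050    # SB 757 mean CSI
ensemble_mean = 0.4351   # mean CSI across neutral maps
ensemble_std = 0.0563    # standard deviation across neutral maps

# How many ensemble standard deviations the enacted map sits above the mean.
z_score = (enacted_mean - ensemble_mean) / ensemble_std

# How much lower the neutral-map mean is, relative to the enacted map.
relative_gap = (enacted_mean - ensemble_mean) / enacted_mean

print(f"z = {z_score:.2f}")              # z = 1.24
print(f"neutral maps are {relative_gap:.0%} lower")  # 14%
```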
All analysis is fully reproducible:
- Scripts automatically download data from public sources
- Random seed fixed where applicable (SKATER itself is deterministic)
- Complete dependency versions specified in `requirements.txt`
- All code commented and documented

If you use this methodology in your work, please cite:
Sadeek, F., Francis, J., & Clark, J. (2025). Beyond the Line: A Data-Driven Framework
for Quantifying Community and Evaluating Redistricting Fairness in North Carolina.
IM²C 2025. https://github.com/jalenfran/immc_2025
MIT License - See LICENSE file for details.
For questions or collaboration:
- Farhan Sadeek: sadeek.1@osu.edu
- Jalen Francis: francis.628@osu.edu
- Jayson Clark: clark.4101@osu.edu
- Data: U.S. Census Bureau, LEHD Program
- Software: PySAL, GerryChain, GeoPandas communities
- Methodology: Assunção et al. (2006), Pielou (1966), DeFord et al. (2021)