Beyond the Line: Data-Driven Community Analysis for NC Redistricting

Authors: Farhan Sadeek, Jalen Francis, Jayson Clark
Institution: The Ohio State University
Competition: IM²C 2025 (Intercollegiate Math Modeling Challenge)

Overview

This repository contains a complete computational framework for:

  1. Objectively identifying Communities of Interest (COIs) using demographic, socioeconomic, and commuting data
  2. Quantifying community fragmentation using information-theoretic metrics
  3. Evaluating redistricting fairness through ensemble comparison against neutral maps

Applied to North Carolina's 2023 congressional map (SB 757), our analysis reveals systematic community splitting: 5 of 30 identified communities have CSI scores above 0.90 (near-maximum fragmentation), and the enacted map's mean CSI is higher than that of 89% of algorithmically generated neutral alternatives.

Key Results

| Metric | Value |
|---|---|
| Communities Identified | 30 COIs across 2,672 census tracts |
| Most Fragmented COI | Community 18 (CSI = 0.9999, population 123,786) |
| Severely Split (CSI > 0.90) | 5 communities |
| SB 757 Mean CSI | 0.5050 |
| Neutral Maps Mean CSI | 0.4351 (14% lower) |
| Percentile Rank | 89th (worse than 89% of neutral maps) |

Repository Structure

.
├── README.md                      # This file
├── requirements.txt               # Python dependencies
├── script.py                      # Main COI identification & evaluation
├── gerrychain_analysis.py         # Ensemble comparison analysis
├── report.tex                     # Full LaTeX report source
├── data/                          # Data files (downloaded by scripts)
├── results/                       # Generated visualizations & results
│   ├── coi_map.png
│   ├── coi_splits_map.png
│   ├── ensemble_comparison.png
│   └── coi_results_complete.csv
└── .gitignore                     # Git ignore patterns

Methodology Summary

1. Community Identification (SKATER Algorithm)

  • Data Sources:
    • 2020 Decennial Census (P.L. 94-171)
    • American Community Survey 5-Year Estimates (2016-2020)
    • LEHD Origin-Destination Employment Statistics (LODES)
  • Features (9 dimensions):
    • Demographics: Hispanic %, NH White %, NH Black %, NH Asian %
    • Socioeconomics: Median income, homeownership rate, bachelor's degree %
    • Commuting: Jobs in tract, workers in tract
  • Algorithm: Spatial 'K'luster Analysis by Tree Edge Removal (SKATER)
    • Ensures spatial contiguity
    • Minimizes within-cluster variance
    • k = 30 communities
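
To make the tree-edge-removal idea concrete, here is a minimal pure-Python sketch of the SKATER principle on six hypothetical tracts. It is not the spopt implementation listed under Key Dependencies, and the feature values, the contiguity edges, and the function names (ssd, minimum_spanning_tree, components, skater_toy) are all illustrative assumptions. The sketch builds a minimum spanning tree over contiguity edges weighted by feature dissimilarity, then greedily removes the tree edge that leaves the lowest total within-cluster variance until k clusters remain.

```python
def ssd(members, features):
    """Sum of squared deviations from the cluster mean, over all features."""
    dims = len(features[members[0]])
    means = [sum(features[m][d] for m in members) / len(members) for d in range(dims)]
    return sum((features[m][d] - means[d]) ** 2 for m in members for d in range(dims))


def minimum_spanning_tree(n, edges, features):
    """Kruskal's algorithm on contiguity edges weighted by squared feature distance."""
    def cost(edge):
        a, b = edge
        return sum((x - y) ** 2 for x, y in zip(features[a], features[b]))

    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for a, b in sorted(edges, key=cost):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # skip edges that would close a cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


def components(n, tree):
    """Connected components of the forest defined by the remaining tree edges."""
    adjacency = {i: [] for i in range(n)}
    for a, b in tree:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(component))
    return result


def skater_toy(features, edges, k):
    """Greedy edge removal: each cut minimizes total within-cluster SSD."""
    n = len(features)
    tree = minimum_spanning_tree(n, edges, features)
    while len(components(n, tree)) < k:
        best = min(
            tree,
            key=lambda e: sum(
                ssd(c, features) for c in components(n, [t for t in tree if t != e])
            ),
        )
        tree.remove(best)
    return components(n, tree)


# Hypothetical standardized features for six tracts (two dimensions only).
toy_features = [(0.10, 0.20), (0.15, 0.25), (0.12, 0.22),
                (0.80, 0.90), (0.85, 0.95), (0.82, 0.88)]
# Hypothetical contiguity: tracts 0-2 touch each other, then a chain 2-3-4-5.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5)]
clusters = skater_toy(toy_features, toy_edges, k=2)
```

Because every cut removes an edge of a spanning tree built from contiguity edges only, each resulting cluster stays spatially connected, which is the property the bullet list above relies on. The real analysis uses 9 features and k = 30 and would also need feature scaling, which this toy skips.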

2. Fragmentation Metric (Community Splitting Index)

  • Formula: Pielou's Evenness Index adapted to redistricting
    CSI = H / H_max = -Σ_i (p_i log₂ p_i) / log₂(N)
    
    where p_i = proportion of the COI's population in district i and N = number of districts in the sum
  • Interpretation:
    • CSI = 0: Community intact (not split)
    • CSI ≈ 1: Community maximally fragmented
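
The formula can be sketched in a few lines of Python. This is an illustration rather than the code in script.py: the function name community_splitting_index is hypothetical, and it assumes that N counts only the districts holding a nonzero share of the community and that an intact community is assigned CSI = 0 (where log₂(1) = 0 would otherwise cause a division by zero).

```python
import math


def community_splitting_index(district_populations):
    """Pielou-style evenness of one COI's population across districts.

    district_populations: the COI's population in each district it overlaps.
    Assumption: N is the number of districts with a nonzero share.
    """
    shares = [p for p in district_populations if p > 0]
    total = sum(shares)
    n = len(shares)
    if n <= 1:
        return 0.0  # intact community: H = 0 and H_max = log2(1) = 0
    entropy = -sum((p / total) * math.log2(p / total) for p in shares)
    return entropy / math.log2(n)


intact = community_splitting_index([50_000])             # one district only
even_two = community_splitting_index([25_000, 25_000])   # 50/50 split
uneven_two = community_splitting_index([45_000, 5_000])  # 90/10 split
```

Under these assumptions the intact case gives 0, the 50/50 split gives 1, and the 90/10 split gives roughly 0.47, so the index rewards keeping most of a community in a single district even when it is technically split.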

3. Ensemble Comparison (ReCom MCMC)

  • Algorithm: Recombination (ReCom) Markov chain
  • Constraints: Population equality (±5%), district contiguity
  • Sample Size: 100 neutral maps
  • Result: SB 757 at 89th percentile of CSI distribution
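
As a rough illustration of how a percentile rank like this can be read off an ensemble, the sketch below counts the share of ensemble maps whose mean CSI falls strictly below the enacted map's. The function name percentile_rank and the six ensemble values are hypothetical, and whether gerrychain_analysis.py uses a strict or non-strict comparison is an assumption.

```python
def percentile_rank(ensemble_scores, enacted_score):
    """Percentage of ensemble maps with a mean CSI strictly below the enacted map's."""
    below = sum(1 for score in ensemble_scores if score < enacted_score)
    return 100.0 * below / len(ensemble_scores)


# Hypothetical mean-CSI values for a six-map toy ensemble (not real results).
toy_ensemble = [0.38, 0.41, 0.43, 0.44, 0.47, 0.52]
rank = percentile_rank(toy_ensemble, enacted_score=0.5050)
```

In this toy ensemble five of the six maps score below 0.5050, so the enacted map sits at about the 83rd percentile. With 100 sampled maps, the same count yields the percentile directly.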

Installation & Requirements

Prerequisites

  • Python 3.10 or higher (required for GerryChain)
  • ~2GB disk space for data files

Setup

  1. Clone the repository:

    git clone https://github.com/jalenfran/immc_2025.git
    cd immc_2025
  2. Create virtual environment:

    python3.12 -m venv venv  # any Python 3.10+ interpreter works
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt

Usage

Main Analysis (COI Identification & Evaluation)

python script.py

Runtime: 5-10 minutes
Outputs:

  • coi_map.png - Map of 30 identified communities
  • coi_splits_map.png - COIs overlaid with SB 757 districts
  • coi_results_complete.csv - CSI scores for all 30 communities

Ensemble Analysis (Optional)

python gerrychain_analysis.py

Runtime: 10-30 minutes (runs 10,000 MCMC steps)
Outputs:

  • ensemble_comparison.png - Histogram comparing SB 757 to neutral maps
  • ensemble_csi_scores.csv - CSI scores for all 100 ensemble maps
  • Statistical significance results (percentile rank, z-score)

Note: Requires Python 3.10+ due to GerryChain type hint syntax.

Data Sources

All data is automatically downloaded by the scripts from public sources:

  1. Geographic Boundaries:

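  2. Demographic Data: 2020 Decennial Census (P.L. 94-171) and American Community Survey 5-Year Estimates (2016-2020)

  3. Commuting Data: LEHD Origin-Destination Employment Statistics (LODES)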

Key Dependencies

  • Geospatial: geopandas, pyogrio, shapely
  • Data Access: cenpy (Census API wrapper)
  • Spatial Analysis: libpysal, spopt (SKATER implementation)
  • Ensemble Generation: gerrychain (ReCom MCMC)
  • Visualization: matplotlib

See requirements.txt for complete list with version constraints.

Results

Identified Communities

The analysis identifies 30 spatially contiguous communities across North Carolina, ranging in population from 18,728 (rural) to 2,427,537 (metro) residents.

Top 5 Most Split Communities

| COI ID | Population | CSI Score | Interpretation |
|---|---|---|---|
| 18 | 123,786 | 0.9999 | Near-maximally split |
| 8 | 65,015 | 0.9867 | Near-maximally split |
| 22 | 27,328 | 0.9721 | Severely split |
| 27 | 73,721 | 0.9328 | Severely split |
| 5 | 295,848 | 0.9195 | Severely split |

Ensemble Comparison

  • SB 757 Mean CSI: 0.5050
  • Ensemble Mean CSI: 0.4351 (σ = 0.0563)
  • Percentile: 89% (SB 757 worse than 89% of neutral maps)
  • Z-score: 1.24
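
The z-score and the "14% lower" figure in Key Results follow directly from the three numbers above. The short check below reproduces that arithmetic; the variable names are illustrative, and the relative gap is taken with respect to the SB 757 mean, which is an assumption about how the 14% was computed.

```python
enacted_mean_csi = 0.5050   # SB 757
ensemble_mean_csi = 0.4351  # mean over neutral maps
ensemble_std = 0.0563       # standard deviation over neutral maps

# Standardized distance of the enacted map from the ensemble mean.
z_score = (enacted_mean_csi - ensemble_mean_csi) / ensemble_std

# Gap expressed relative to the enacted map's mean CSI.
relative_gap = (enacted_mean_csi - ensemble_mean_csi) / enacted_mean_csi

print(f"z = {z_score:.2f}")
print(f"neutral mean is {relative_gap:.0%} lower")
```

This prints z = 1.24 and a 14% gap, matching the reported values.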

Reproducibility

All analysis is fully reproducible:

  • Scripts automatically download data from public sources
  • Random seed fixed where applicable (SKATER itself is deterministic)
  • Complete dependency versions specified in requirements.txt
  • All code commented and documented

Citation

If you use this methodology in your work, please cite:

Sadeek, F., Francis, J., & Clark, J. (2025). Beyond the Line: A Data-Driven Framework 
for Quantifying Community and Evaluating Redistricting Fairness in North Carolina. 
IM²C 2025. https://github.com/jalenfran/immc_2025

License

MIT License - See LICENSE file for details.

Contact

For questions or collaboration:

Acknowledgments

  • Data: U.S. Census Bureau, LEHD Program
  • Software: PySAL, GerryChain, GeoPandas communities
  • Methodology: Assunção et al. (2006), Pielou (1966), DeFord et al. (2021)
