🔬 Forensic Biological Evidence Analyzer

BACSA Hacks 2026 — University of Toronto

Turning biological crime scene evidence into ranked suspect lists using ML — with confidence scores that model uncertainty, not just answers.

📌 About

The Forensic Biological Evidence Analyzer is a machine learning system that takes biological evidence collected from a crime scene and ranks 30,000 suspects by match confidence. It models uncertainty through transparent confidence scores rather than claiming one correct answer — directly mirroring how real forensic science works.

Built in 24 hours for BACSA Hacks 2026 — Closed Challenge: Forensics Support System.

🎯 Features

🧬 6 DNA STR markers — the same marker type (short tandem repeats) used in real forensic labs
🩸 19 biological features — blood type, hair, eyes, fingerprint, height + DNA
🤖 Gradient Boosting ML model — 96.2% accuracy on a synthetic dataset of 30,000 records
📊 Confidence scoring — combines ML probability + DNA overlap + physical match
📝 Written reasoning — explains WHY each suspect ranked high
📈 5-page Streamlit UI — interactive, dark-themed, production-grade
⬇️ Export results — download full ranked CSV of all 30,000 suspects
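Ranking and exporting could look something like the following standard-library sketch. It assumes a final score (0 to 100) has already been computed for every suspect. The function names `rank_suspects` and `to_csv` and the three-column layout are hypothetical and are not taken from `app.py`.

```python
import csv
import io


def rank_suspects(scores):
    """Sort (suspect_id, score) pairs from highest to lowest score and number them.

    Ties are broken by suspect_id so the ranking is deterministic.
    """
    ordered = sorted(scores, key=lambda item: (-item[1], item[0]))
    return [
        {"rank": position, "suspect_id": suspect_id, "confidence": round(score, 2)}
        for position, (suspect_id, score) in enumerate(ordered, start=1)
    ]


def to_csv(rows) -> str:
    """Serialize ranked rows to a CSV string (header row first)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["rank", "suspect_id", "confidence"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = rank_suspects([("S00002", 41.0), ("S00001", 79.5), ("S00003", 12.25)])
print(to_csv(rows))
```

Returning a string rather than writing to disk keeps the function reusable: in a Streamlit page, the string could be passed to `st.download_button(label="Download CSV", data=..., file_name="ranked_suspects.csv", mime="text/csv")`.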

🖥️ App Pages

Page	Description
🏠 Home	Project overview and stats dashboard
🧬 Enter Evidence	Input crime scene biological evidence manually or via CSV upload
📂 Suspect Database	Browse and filter all 30,000 suspects
📊 Analysis & Results	Ranked suspect list with confidence scores and reasoning
📈 Visualizations	Feature importance, heatmaps, and database distribution charts

🧠 How the Confidence Score Works

Each suspect receives a final confidence score between 0% and 100%, calculated as a weighted sum of three components, each expressed as a percentage:

Final Score = (0.5 × ML Model Probability)
            + (0.3 × DNA STR Allele Overlap %)
            + (0.2 × Physical Trait Match %)

Score	Label
≥ 70%	🔴 HIGH
40% to < 70%	🟠 MEDIUM
< 40%	🔵 LOW
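The formula and bands above translate directly into code. This is a minimal sketch assuming all three inputs are already on a 0 to 100 scale (so a model probability of 0.90 is passed as 90.0); the names `final_score` and `confidence_label` are hypothetical, not the app's own functions.

```python
# Weights from the confidence formula above; helper names are hypothetical.
ML_WEIGHT, DNA_WEIGHT, PHYSICAL_WEIGHT = 0.5, 0.3, 0.2


def final_score(ml_pct: float, dna_pct: float, physical_pct: float) -> float:
    """Weighted sum of three components, each on a 0-100 scale."""
    for name, value in (("ml_pct", ml_pct), ("dna_pct", dna_pct), ("physical_pct", physical_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return ML_WEIGHT * ml_pct + DNA_WEIGHT * dna_pct + PHYSICAL_WEIGHT * physical_pct


def confidence_label(score: float) -> str:
    """Map a 0-100 score to the HIGH / MEDIUM / LOW bands in the table above."""
    if score >= 70.0:
        return "HIGH"
    if score >= 40.0:
        return "MEDIUM"
    return "LOW"


# A scikit-learn predict_proba value of 0.90 becomes 90.0 on this scale.
score = final_score(ml_pct=0.90 * 100, dna_pct=75.0, physical_pct=60.0)
print(round(score, 1), confidence_label(score))  # 79.5 HIGH
```

Because the weights sum to 1.0, the final score stays on the same 0 to 100 scale as its inputs, which is what lets the fixed 40% and 70% thresholds apply.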

🛠️ Tech Stack

Tool	Purpose
Python	Core language
Pandas & NumPy	Data generation and feature engineering
Scikit-learn	Gradient Boosting Classifier
Streamlit	Interactive 5-page web app
Seaborn & Matplotlib	Visualizations

🚀 Getting Started

1. Clone the repo

git clone https://github.com/yourusername/forensic-analyzer.git
cd forensic-analyzer

2. Install dependencies

pip install -r requirements.txt

or install the packages directly:

pip install scikit-learn pandas numpy matplotlib seaborn streamlit

3. Generate the dataset

python generate_dataset.py

Creates:

data/suspects.csv — 30,000 suspect records
data/crime_scene_evidence.csv — sample crime scene evidence
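To show what one synthetic record looks like, here is a minimal standard-library sketch of a generator that follows the columns in the Dataset Features table. It is not the project's `generate_dataset.py`. The uniform sampling, the allele range of 5 to 30 repeats, and the helper names `make_suspect` and `random_str_pair` are all assumptions.

```python
import random

# Category values taken from the Dataset Features table.
BLOOD_TYPES = ["A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"]
HAIR_COLORS = ["Black", "Brown", "Blonde", "Red", "Gray", "White"]
EYE_COLORS = ["Brown", "Blue", "Green", "Hazel", "Gray", "Amber"]
FINGERPRINT_CLASSES = ["Loop", "Whorl", "Arch", "Tented Arch"]


def random_str_pair(rng: random.Random, low: int = 5, high: int = 30) -> str:
    """Return a hypothetical STR allele pair like "12,18" (uniform, not population-based)."""
    first, second = sorted(rng.randint(low, high) for _ in range(2))
    return f"{first},{second}"


def make_suspect(index: int, rng: random.Random) -> dict:
    """Build one synthetic suspect record; ranges follow the Dataset Features table."""
    record = {
        "suspect_id": f"S{index:05d}",  # S00001 ... S30000
        "blood_type": rng.choice(BLOOD_TYPES),
        "hair_color": rng.choice(HAIR_COLORS),
        "eye_color": rng.choice(EYE_COLORS),
        "fingerprint_class": rng.choice(FINGERPRINT_CLASSES),
        "height_cm": rng.randint(150, 200),  # randint is inclusive on both ends
        "age": rng.randint(18, 65),
        "prior_record": rng.randint(0, 1),
    }
    for marker in range(1, 7):
        record[f"dna_marker_{marker}"] = random_str_pair(rng)
    return record


rng = random.Random(42)  # fixed seed so runs are reproducible
suspects = [make_suspect(i, rng) for i in range(1, 30_001)]
print(suspects[0]["suspect_id"], suspects[-1]["suspect_id"])  # S00001 S30000
```

The list of dicts could then be written out with pandas, for example `pd.DataFrame(suspects).to_csv("data/suspects.csv", index=False)`. Real STR alleles follow population frequency tables, so uniform sampling is only a placeholder.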

4. Train the model

python train_model.py

Creates:

models/xgb_model.pkl — trained scikit-learn Gradient Boosting model (despite the xgb_ prefix, not an XGBoost model)
models/metadata.pkl — encoding metadata
models/feature_importance.csv — feature weights

5. Run the app

streamlit run app.py

📁 Project Structure

forensic-analyzer/
│
├── app.py                    # Main Streamlit application
├── generate_dataset.py       # Synthetic dataset generator (30,000 records)
├── train_model.py            # Model training script
│
├── data/
│   ├── suspects.csv          # 30,000 suspect records (17 columns)
│   └── crime_scene_evidence.csv  # Sample crime scene evidence
│
└── models/
    ├── xgb_model.pkl         # Trained Gradient Boosting model
    ├── metadata.pkl          # Feature encoding metadata
    └── feature_importance.csv # Feature importance scores

📊 Dataset Features

Feature	Type	Description
suspect_id	String	Unique ID (S00001–S30000)
blood_type	Categorical	A+, A-, B+, B-, AB+, AB-, O+, O-
hair_color	Categorical	Black, Brown, Blonde, Red, Gray, White
eye_color	Categorical	Brown, Blue, Green, Hazel, Gray, Amber
fingerprint_class	Categorical	Loop, Whorl, Arch, Tented Arch
height_cm	Integer	150–200 cm
age	Integer	18–65 years
prior_record	Binary	0 = No, 1 = Yes
dna_marker_1–6	String	STR allele pairs (e.g. "12,18")
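The two non-ML components of the confidence score can be computed directly from these columns. The sketch below is one plausible definition, not necessarily the one in `app.py`. It assumes the columns are named `dna_marker_1` through `dna_marker_6`, counts shared alleles as a multiset intersection (so a homozygous "12,12" counts 12 twice), and treats height as matching within a hypothetical 5 cm tolerance.

```python
from collections import Counter

# Assumed column names, expanding "dna_marker_1–6" from the table above.
MARKERS = [f"dna_marker_{i}" for i in range(1, 7)]
CATEGORICAL_TRAITS = ["blood_type", "hair_color", "eye_color", "fingerprint_class"]


def parse_alleles(pair: str) -> Counter:
    """Turn an allele pair such as "12,18" into a multiset of repeat counts."""
    return Counter(int(allele) for allele in pair.split(","))


def dna_overlap_pct(evidence: dict, suspect: dict) -> float:
    """Share of evidence alleles also present in the suspect, as a percentage."""
    shared = total = 0
    for marker in MARKERS:
        if not evidence.get(marker):  # skip markers missing from the evidence
            continue
        ev_alleles = parse_alleles(evidence[marker])
        sus_alleles = parse_alleles(suspect[marker])
        shared += sum((ev_alleles & sus_alleles).values())  # multiset intersection
        total += sum(ev_alleles.values())
    return 100.0 * shared / total if total else 0.0


def physical_match_pct(evidence: dict, suspect: dict, height_tol_cm: int = 5) -> float:
    """Percentage of physical traits that agree; height counts within a tolerance."""
    checks = [evidence[trait] == suspect[trait] for trait in CATEGORICAL_TRAITS]
    checks.append(abs(evidence["height_cm"] - suspect["height_cm"]) <= height_tol_cm)
    return 100.0 * sum(checks) / len(checks)


evidence = {m: "12,18" for m in MARKERS}
evidence.update(blood_type="O+", hair_color="Brown", eye_color="Blue",
                fingerprint_class="Loop", height_cm=178)
suspect = {m: "9,10" for m in MARKERS}
suspect.update(dna_marker_1="12,18", dna_marker_2="12,20", blood_type="O+",
               hair_color="Brown", eye_color="Green", fingerprint_class="Loop", height_cm=181)
print(dna_overlap_pct(evidence, suspect), physical_match_pct(evidence, suspect))  # 25.0 80.0
```

In the example, 3 of the 12 evidence alleles are shared (two at marker 1, one at marker 2), and 4 of the 5 physical checks agree (only eye color differs).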

🏆 Judging Criteria Addressed

"Success is measured not by finding one correct answer, but by how well teams model uncertainty, justify assumptions, and reason their forensic interpretations."

✅ Ranked suspect list — all 30,000 suspects ranked by confidence
✅ Confidence scores — multi-component scoring with uncertainty modeling
✅ Reasoning — written explanation for every suspect's ranking
✅ Assumption justification — feature importance shows what the model weighted
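A template-based explanation is one simple way to produce per-suspect reasoning. The sketch below is an assumption, not the app's actual wording: `explain_ranking` is a hypothetical helper that reuses the 0.5 / 0.3 / 0.2 weights from the confidence formula and names the component that contributed the most points.

```python
def explain_ranking(suspect_id: str, ml_pct: float, dna_pct: float,
                    physical_pct: float, matched_traits: list) -> str:
    """Build a short, human-readable justification for one suspect's score."""
    # Points each component adds to the final score (same weights as the formula).
    contributions = {
        "ML model probability": 0.5 * ml_pct,
        "DNA STR allele overlap": 0.3 * dna_pct,
        "physical trait match": 0.2 * physical_pct,
    }
    score = sum(contributions.values())
    top = max(contributions, key=contributions.get)
    parts = [
        f"{suspect_id} scored {score:.1f}%.",
        f"Largest contribution: {top} ({contributions[top]:.1f} points).",
    ]
    if matched_traits:
        parts.append("Matching traits: " + ", ".join(matched_traits) + ".")
    else:
        parts.append("No physical traits matched the evidence.")
    return " ".join(parts)


print(explain_ranking("S00001", 90.0, 75.0, 60.0, ["blood_type", "fingerprint_class"]))
```

Breaking the score into per-component points makes the explanation traceable back to the formula, so a reviewer can see whether a high ranking came from the model, the DNA, or the physical traits.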

👨‍💻 Built By

Built with ❤️ for BACSA Hacks 2026 — Biotech and Computer Science Association, University of Toronto.

📄 License

MIT License — free to use, modify, and distribute.
