GeneAI

Pharmacogenomics Drug Risk Analysis
Gene profile + drug → personalized risk score + CPIC clinical recommendation

Problem

1.3 million emergency room admissions per year in the U.S. are caused by adverse drug reactions. Most drugs are prescribed based on symptoms alone — not genetics. Patients with certain gene variants metabolize drugs too quickly, too slowly, or not at all, leading to toxicity or treatment failure.

CPIC pharmacogenomic guidelines exist, but they're buried in academic tables clinicians don't have time to look up mid-appointment.

Solution

GeneAI is an API + UI that takes a patient's gene profile and a drug name, and returns:

Per-gene activity scores from a trained Set Transformer model
CPIC clinical recommendation text (e.g., "Avoid codeine use" or "Use standard dosing")
Plain-English explanation suitable for patients
Supports both structured JSON and natural language input

Architecture

Two prediction paths:

Natural Language — User query → OpenAI parses to structured JSON → model pipeline
Structured — Direct JSON {"genes": [...], "drug": "..."} → model pipeline

Model pipeline:

Gene embeddings via ESM-2 protein language model
Drug embeddings via Morgan fingerprints from SMILES
Target flags from DrugBank drug-gene interactions
Cross-Attention Set Transformer fuses multi-gene + drug features → per-gene risk scores

Post-prediction, CPIC recommendation text is looked up and optionally enriched by GPT-4o-mini.

Tech Stack

Component	Technology
API	FastAPI + Uvicorn
Model	PyTorch (Set Transformer)
Gene Embeddings	ESM-2 (Meta protein language model)
Drug Embeddings	RDKit Morgan fingerprints from SMILES
NLP	OpenAI gpt-4o-mini
Data	CPIC database, DrugBank, PubChem
Frontend	React + Three.js (3D DNA helix)

Project Structure

├── api/                 # FastAPI backend
│   ├── routes/          # health, drugs, genes, predict, explain
│   ├── services/        # data_service, model_service, openai_service
│   ├── models/          # request/response schemas
│   └── static/          # docs + demo HTML pages
├── model/               # Set Transformer model
│   ├── model_api.py     # PharmaRiskModel interface
│   ├── model.py         # PharmaSetTransformer architecture
│   ├── set_transformer.pt
│   └── embeddings/      # gene_embeddings.pkl, drug_embeddings.pkl, target_flags.pkl
├── pipeline/            # Data pipeline + processed CPIC data
├── frontend/            # React + Three.js UI (GeneAI app)
└── requirements.txt     # Python dependencies

Live

geneai.tech — UI live geneai.tech/docs — Documentation geneai.tech/demo — Interactive demo

Quickstart

# Clone
git clone https://github.com/Topupchips/HackIllinoisWinningIdea.git
cd HackIllinoisWinningIdea

# Install Python dependencies
pip install -r requirements.txt

# (Optional) Set OpenAI key for natural language + explain
export OPENAI_API_KEY=sk-...

# Run API
python -m uvicorn api.main:app --reload --port 8000

# Run frontend (separate terminal)
cd frontend && npm install && npm start

http://localhost:3000 — GeneAI React app
http://localhost:8000/docs — Documentation
http://localhost:8000/docs/api — Interactive API explorer
http://localhost:8000/demo — Live endpoint demo

API Endpoints

Method	Path	Description
`GET`	`/health`	Health check — model/data status
`GET`	`/validate`	Validate loaded data, list available genes
`GET`	`/drugs`	List all drugs (paginated, searchable)
`GET`	`/drugs/{drug_id}`	Get drug details + SMILES
`GET`	`/genes`	List all genes (paginated, searchable)
`GET`	`/genes/{symbol}`	Gene details + recommendation count
`GET`	`/genes/{symbol}/alleles`	All alleles for a gene
`POST`	`/predict`	Predict risk from structured gene/drug input
`POST`	`/predict/natural`	Predict risk from natural language
`POST`	`/explain`	Generate plain-English explanation

All list endpoints support ?search=, ?page=, ?limit= and return fuzzy "did you mean" suggestions on no results.

Curl Examples

Health check:

curl http://localhost:8000/health

{"status": "healthy", "model_loaded": true, "data_loaded": true, "drug_count": 323, "gene_count": 17}

Predict risk (structured):

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"genes": [{"name": "CYP2D6", "phenotype": "Poor Metabolizer"}], "drug": "codeine"}'

[
  {
    "gene": "CYP2D6",
    "activity_level": 0.0,
    "medicine": "codeine",
    "text": "Avoid codeine use because of possibility of diminished analgesia."
  }
]

Predict risk (natural language):

curl -X POST http://localhost:8000/predict/natural \
  -H "Content-Type: application/json" \
  -d '{"query": "I am a CYP2D6 poor metabolizer taking codeine"}'

Explain risk:

curl -X POST http://localhost:8000/explain \
  -H "Content-Type: application/json" \
  -d '{"drug": "codeine", "risk_score": 8.5, "gene_contributions": {"CYP2D6": 0.85}}'

Data Sources

Source	License	Description
CPIC	CC0 (public domain)	Clinical pharmacogenomics guidelines (Stanford/NIH)
PharmGKB	CC BY-SA 4.0	Pharmacogenomics knowledge base
PubChem	Public domain	Drug SMILES / molecular structures
DrugBank	CC BY-NC 4.0	Drug-gene target interactions
ESM-2	MIT	Protein language model for gene embeddings

AI Disclosure

Required by HackIllinois rules.

Tool	Usage
OpenAI gpt-4o-mini	Runtime: parses natural language input, generates patient-facing explanations
ESM-2 (Meta)	Gene sequence embeddings used as model features
Claude Code (Anthropic)	Development: code scaffolding, data pipeline scripts, API boilerplate

What we built ourselves:

Data extraction and cleaning pipeline (CPIC SQL, DrugBank XML, PubChem API)
Risk score engineering and labeling logic
Model architecture design (Cross-Attention Set Transformer)
Training pipeline and feature engineering
API design, endpoint logic, and service layer
React + Three.js frontend

Team

License

MIT

Built for HackIllinois 2026

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
api		api
frontend		frontend
model		model
pipeline		pipeline
Procfile		Procfile
README.md		README.md
aedify.yaml		aedify.yaml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GeneAI

Problem

Solution

Architecture

Tech Stack

Project Structure

Live

Quickstart

API Endpoints

Curl Examples

Data Sources

AI Disclosure

Team

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

GeneAI

Problem

Solution

Architecture

Tech Stack

Project Structure

Live

Quickstart

API Endpoints

Curl Examples

Data Sources

AI Disclosure

Team

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages