ToxIQ


Inspiration

Bringing a single drug to market takes over $2 billion and 12 years, and the majority of that cost comes from late-stage failures. Promising compounds make it through early lab testing only to harm or fail real patients in clinical trials. Worse, millions of animals are sacrificed annually in preclinical testing that still doesn't reliably predict human outcomes.

The story that haunted us most was thalidomide, a drug prescribed to pregnant women in the late 1950s and early 1960s as a safe sedative. Thousands of children were born with severe limb deformities. There was no tool to predict it. We asked ourselves: what if there had been?

That question became ToxIQ.

What it does

ToxIQ is an AI-powered pharmacokinetic (PK) simulation and toxicity prediction platform. A researcher inputs a drug's molecular structure (SMILES string) or selects a known compound, sets a patient profile — age, weight, renal function, hepatic function — and ToxIQ does three things:

1. Predicts PK parameters: absorption rate ka, clearance CL, volume of distribution Vd, bioavailability F, and half-life t½ = (0.693 × Vd) / CL, using an XGBoost model trained on molecular features extracted with RDKit.

2. Simulates drug concentration over time using a one-compartment pharmacokinetic model solved with SciPy's solve_ivp:

   dC/dt = (F × ka × D / Vd) × e^(−ka × t) − (CL / Vd) × C

   Here C is the plasma concentration and D is the administered dose. The output is a full concentration-time curve plotted against the drug's therapeutic window, showing exactly when a drug enters, stays within, or exceeds safe levels.

3. Generates a plain-language safety summary using Gemini AI, so any stakeholder (researcher, investor, or regulator) can immediately understand the safety profile without interpreting raw numbers.
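
The simulation step above can be sketched in a few lines. This is a minimal illustration rather than ToxIQ's actual code: the function names, the dose, the PK parameter values, and the therapeutic window are all assumptions chosen for the example, with concentrations in mg/L and time in hours.

```python
import numpy as np
from scipy.integrate import solve_ivp


def half_life_h(CL, Vd):
    """Half-life in hours from clearance (L/h) and volume of distribution (L)."""
    return 0.693 * Vd / CL


def simulate_one_compartment(dose_mg, F, ka, CL, Vd, t_end_h=24.0, n_points=241):
    """Integrate dC/dt = (F*ka*D/Vd)*exp(-ka*t) - (CL/Vd)*C with C(0) = 0."""
    ke = CL / Vd  # elimination rate constant (1/h)

    def rhs(t, y):
        absorption = (F * ka * dose_mg / Vd) * np.exp(-ka * t)
        return [absorption - ke * y[0]]

    t_eval = np.linspace(0.0, t_end_h, n_points)
    sol = solve_ivp(rhs, (0.0, t_end_h), [0.0], t_eval=t_eval,
                    rtol=1e-8, atol=1e-10)
    return sol.t, sol.y[0]


# Illustrative inputs only; these are NOT ToxIQ predictions for any real drug.
params = dict(dose_mg=500.0, F=0.8, ka=1.5, CL=5.0, Vd=40.0)
t, conc = simulate_one_compartment(**params)

# Hypothetical therapeutic window in mg/L, used to estimate time spent in range.
window_low, window_high = 2.0, 8.0
in_window = (conc >= window_low) & (conc <= window_high)
hours_in_window = float(in_window.sum()) * (t[1] - t[0])

print(f"half-life: {half_life_h(params['CL'], params['Vd']):.2f} h")
print(f"peak concentration: {conc.max():.2f} mg/L")
print(f"approx. hours inside window: {hours_in_window:.1f}")
```

Because this ODE has a known closed-form solution when ka differs from CL/Vd, the numerical curve can be checked against it, which is a cheap sanity test before trusting the solver on more complex models.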

How we built it

Layer	Technology
Frontend	Next.js, React, CSS, Chart.js, Motion
Backend	FastAPI (Python), Pydantic
ML & Features	RDKit, scikit-learn, XGBoost
Toxicity Modeling	admet-ai (with heuristic fallback)
PK Simulation	SciPy `solve_ivp`
AI Summary	Google Gemini API
Deployment	Vercel (frontend), Railway (backend)

The backend exposes four core endpoints: /predict, /simulate, /compare, and /summary. The feature pipeline extracts molecular descriptors (molecular weight, LogP, TPSA, H-bond donors/acceptors) with RDKit and combines them with patient parameters as inputs to the model that predicts PK outputs. The simulation solves the ODE and returns chart-ready JSON directly consumable by Chart.js.
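
To make "chart-ready JSON" concrete, here is a hedged sketch of how a simulated curve could be shaped into the labels/datasets structure that a Chart.js line chart takes as its data. The function name `to_chartjs_payload`, the dataset labels, and the rounding choices are hypothetical and not taken from the ToxIQ backend.

```python
import json


def to_chartjs_payload(times_h, conc_mg_per_l, window_low, window_high):
    """Shape a concentration-time curve into a Chart.js-style data object.

    Hypothetical helper: the therapeutic window is drawn as two flat
    datasets so the frontend needs no extra logic to render the bounds.
    """
    if len(times_h) != len(conc_mg_per_l):
        raise ValueError("times and concentrations must have the same length")
    n = len(times_h)
    return {
        "labels": [round(float(t), 2) for t in times_h],
        "datasets": [
            {"label": "Concentration (mg/L)",
             "data": [round(float(c), 4) for c in conc_mg_per_l]},
            {"label": "Therapeutic window (low)", "data": [window_low] * n},
            {"label": "Therapeutic window (high)", "data": [window_high] * n},
        ],
    }


# Tiny illustrative curve; a real response would carry the full simulation.
payload = to_chartjs_payload([0.0, 0.5, 1.0], [0.0, 3.21, 5.0], 2.0, 8.0)
print(json.dumps(payload, indent=2))
```

Converting values with `float(...)` before rounding keeps the payload serializable even if the inputs are NumPy scalars coming straight out of the solver.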

Challenges we ran into

RDKit packaging on Python 3.11+ has quirks — getting it to install cleanly inside a Railway container required switching to rdkit-pypi and pinning dependency versions carefully.

Calibrating the synthetic training dataset was harder than expected. Without real labeled PK data readily available, we had to generate plausible synthetic data with physiologically grounded ranges and validate predictions against known drugs like aspirin, warfarin, and metformin.
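
A rough sketch of the synthetic-data idea, under loud assumptions: the sampling ranges below are placeholders picked for illustration, not the ranges we used, and `within_tolerance` is a hypothetical helper for comparing a prediction with a reference value supplied by the caller from the literature.

```python
import random

# ASSUMED ranges for illustration only; tune against real pharmacology sources.
RANGES = {
    "ka": (0.1, 3.0),    # absorption rate, 1/h
    "CL": (0.5, 50.0),   # clearance, L/h
    "Vd": (5.0, 500.0),  # volume of distribution, L
    "F": (0.05, 1.0),    # bioavailability, fraction
}


def sample_record(rng):
    """Draw one synthetic PK record and derive its half-life."""
    rec = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    rec["t_half"] = 0.693 * rec["Vd"] / rec["CL"]  # same formula as above
    return rec


def make_dataset(n, seed=0):
    """Generate n records reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [sample_record(rng) for _ in range(n)]


def within_tolerance(predicted, reference, rel_tol=0.3):
    """True if a prediction is within rel_tol of a caller-supplied reference."""
    return abs(predicted - reference) <= rel_tol * abs(reference)


dataset = make_dataset(1000, seed=42)
print(len(dataset), "records; first:", dataset[0])
```

Uniform sampling is the simplest choice here. Since clearance and volume of distribution span orders of magnitude, sampling them on a log scale would likely be a better fit for a real training set.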

Making Gemini summaries medically useful took significant prompt engineering. Generic prompts returned boilerplate text. We iterated until summaries consistently referenced the actual predicted PK numbers and communicated risk in plain, actionable language.
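
One way to get summaries that cite the predicted numbers is to build the prompt from those numbers directly. The sketch below shows only prompt construction and makes no API call; the function name, the wording, and the example values are hypothetical and are not the prompt we shipped.

```python
def build_summary_prompt(drug_name, pk, c_max, window_low, window_high):
    """Compose a prompt that lists every PK value the summary must quote.

    pk is a dict with keys "ka" (1/h), "CL" (L/h), "Vd" (L), and "F".
    """
    t_half = 0.693 * pk["Vd"] / pk["CL"]
    if c_max > window_high:
        status = "exceeds the upper bound of"
    elif c_max < window_low:
        status = "stays below the lower bound of"
    else:
        status = "peaks within"
    lines = [
        f"Summarize a pharmacokinetic simulation of {drug_name} for a non-expert.",
        "Use only the numbers listed below and quote each one at least once.",
        f"- Absorption rate ka: {pk['ka']:.2f} 1/h",
        f"- Clearance CL: {pk['CL']:.2f} L/h",
        f"- Volume of distribution Vd: {pk['Vd']:.1f} L",
        f"- Bioavailability F: {pk['F']:.2f}",
        f"- Half-life: {t_half:.2f} h",
        f"- Peak concentration: {c_max:.2f} mg/L, which {status} the "
        f"therapeutic window of {window_low:.1f} to {window_high:.1f} mg/L",
        "Write three plain-language sentences and state the main risk first.",
    ]
    return "\n".join(lines)


# Illustrative values only; not predictions for any real compound.
prompt = build_summary_prompt(
    "example compound",
    {"ka": 1.5, "CL": 5.0, "Vd": 40.0, "F": 0.8},
    c_max=7.98, window_low=2.0, window_high=8.0,
)
print(prompt)
```

Stating the window comparison in the prompt itself means the language model only has to phrase the conclusion rather than work it out from the raw values.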

Accomplishments that we're proud of

We built a working end-to-end pipeline — from raw SMILES string to a rendered concentration-time curve and a plain-language safety report — in a single hackathon weekend. The PK simulation produces physiologically plausible curves that align with published data for known drugs. We're especially proud that ToxIQ surfaces complex pharmacokinetic science in a way that a non-expert can act on immediately.

What we learned

Real pharmacokinetic simulation is mathematically accessible: a one-compartment ODE model captures the core absorption and elimination behavior of many small-molecule drugs well enough for early screening.
A compelling scientific narrative is as important as the code. The thalidomide story gives ToxIQ an emotional anchor that raw ML metrics never could.
Combining molecular ML with differential equation simulation creates something more powerful than either approach alone — the model predicts what the parameters are, and SciPy shows what that means for a real patient over time.

What's next for ToxIQ

Integrate real PK datasets (PK-DB, DrugBank) to train a production-grade model
Add multi-compartment PK modeling for more complex drug behaviors
Expand the compare mode to support full patient cohort simulations
Partner with pharmacology researchers to validate predictions against clinical trial data