RankGen

Every week, a family receives a genetic report with variants of uncertain significance—and then they wait. They wait months or years while clinicians manually sift through databases, searching for clues. During that wait, symptoms may progress, treatment opportunities may be missed, and the emotional toll of not knowing compounds the physical burden of disease.

Clinicians today face an impossible task. When they receive a list of VUSs, they must manually check population databases, conservation scores, protein structures, and phenotype matches—for each variant, one by one. This takes hours, and most VUSs never get investigated at all.

We built RankGen to change that. We wanted to turn uncertainty into action—to give clinicians a tool that doesn't just list variants, but tells them which ones matter most and what to do next.

RankGen is an automated variant prioritization engine that takes a patient's VUSs and runs them through four evidence-based filters: Population Frequency, Evolutionary Conservation, 3D Structural Context, and Phenotype Matching. But RankGen doesn't stop at a ranked list. For each top-priority variant, we visualize it in its 3D protein structure using iCn3D, color-coded by predicted impact; search the literature for prior functional assays on the variant or its gene; and recommend a specific next-step experiment based on structural context. The output is a testable hypothesis: not just "this variant is interesting," but "here is the experiment to run next."

We built RankGen in just over 24 hours during the hackathon using open-source tools and publicly available datasets.

For the frontend, we used iCn3D (NCBI) for interactive 3D protein structure visualization, Streamlit for rapid prototyping of the web interface, and Plotly for interactive data visualizations. For the backend, we used Python (Flask API) for the prioritization engine, Pandas/NumPy for data processing, and Scikit-learn for weighted scoring algorithms.

For data sources, we used the gnomAD API for population frequency filtering, the Ensembl REST API for PhyloP conservation scores, AlphaFold DB for protein structures (AF-P04637 for TP53, etc.), ClinVar FTP for variant classifications and pathogenicity labels, the HPO API for phenotype-gene associations, and PubMed E-utilities for literature search.

We built a modular pipeline in which each filter operates independently and the per-filter scores are then combined by weighted averaging. The iCn3D viewer is embedded directly in the results page, with JavaScript event handlers linking the variant table to the 3D structure.
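To make the weighted-averaging step concrete, here is a minimal sketch of how independent filter scores could be combined into one priority score. The weight values, the 0-to-1 score scale, and the names `WEIGHTS`, `combine_scores`, and `rank_variants` are illustrative assumptions, not RankGen's actual configuration.

```python
# Hypothetical weights: the write-up does not state the real values.
WEIGHTS = {
    "frequency": 0.3,
    "conservation": 0.3,
    "structure": 0.2,
    "phenotype": 0.2,
}


def combine_scores(scores, weights=WEIGHTS):
    """Weighted average of per-filter scores (assumed to lie in [0, 1]).

    Filters that returned no score (None) are skipped and the remaining
    weights are renormalized, so a missing filter does not drag the
    variant toward zero.
    """
    available = {
        name: value
        for name, value in scores.items()
        if name in weights and value is not None
    }
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[name] * value for name, value in available.items())
    return weighted_sum / total_weight


def rank_variants(variants, weights=WEIGHTS):
    """Return variants sorted from highest to lowest combined score."""
    return sorted(
        variants,
        key=lambda variant: combine_scores(variant["scores"], weights),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up example inputs, for illustration only.
    example = [
        {"id": "variant_A", "scores": {"frequency": 0.2, "conservation": 0.1,
                                       "structure": None, "phenotype": 0.3}},
        {"id": "variant_B", "scores": {"frequency": 0.9, "conservation": 0.8,
                                       "structure": 1.0, "phenotype": 0.5}},
    ]
    for variant in rank_variants(example):
        print(variant["id"], round(combine_scores(variant["scores"]), 3))
```

Renormalizing over the filters that actually produced a score is one possible way to handle variants with no structure or phenotype data; penalizing missing evidence instead would be an equally defensible choice.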

Determining whether a variant lies in a "binding site," the "buried core," or on the "surface" is not trivial. We initially tried simple distance-based rules, but found that binding sites are often discontinuous in 3D space. We solved this by using precomputed SIFTS annotations where available, implementing residue depth calculations (Bio.PDB) to distinguish buried from surface residues, and creating a rule-based system: if a residue is within 5 Å of a known ligand in any PDB structure, it is classified as "binding site".
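The rule-based classification described above could look roughly like the following sketch. It works on plain coordinate tuples so that it runs without any structure-parsing library. The 5 Å ligand cutoff comes from the rule above; the 4 Å depth threshold for "buried" and the names `classify_residue` and `min_distance` are assumptions made for illustration.

```python
import math

BINDING_CUTOFF_A = 5.0  # ligand distance cutoff from the rule described above
BURIED_DEPTH_A = 4.0    # hypothetical depth threshold, not from the write-up


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between two lists of (x, y, z) atoms."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def classify_residue(residue_atoms, ligand_atoms_by_structure, residue_depth):
    """Classify a residue as 'binding site', 'buried core', or 'surface'.

    residue_atoms: list of (x, y, z) coordinates for the residue.
    ligand_atoms_by_structure: one list of ligand atom coordinates per
        structure, assumed to share a frame with the residue.
    residue_depth: precomputed depth in angstroms (for example from a
        residue depth calculation), supplied by the caller.
    """
    # The binding-site rule wins if any structure places a ligand nearby.
    for ligand_atoms in ligand_atoms_by_structure:
        if ligand_atoms and min_distance(residue_atoms, ligand_atoms) <= BINDING_CUTOFF_A:
            return "binding site"
    # Otherwise fall back to depth to separate buried from exposed residues.
    if residue_depth >= BURIED_DEPTH_A:
        return "buried core"
    return "surface"
```

Checking the ligand rule first means a deeply buried residue that contacts a ligand is still reported as a binding site, which matches the priority implied by the rule above.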

No other tool we found combines population frequency, conservation, 3D structure, and phenotype matching in a single automated workflow with experiment recommendations. We built something genuinely novel. Moving from "prioritization" to "prescription" was our boldest idea. We're proud that our recommendations are grounded in the variant's structural context—not just generic advice, but specific, mechanistically appropriate assays. We started Friday night with an idea and a pile of APIs. By Sunday afternoon, we had a working prototype with 3D visualization, literature search, and experiment recommendations. The hackathon constraints pushed us to be creative and efficient.

This hackathon taught us as much about the reality of clinical genomics as it did about coding. On the technical side, we learned that APIs are powerful but fragile—gnomAD and Ensembl have rate limits that forced us to implement intelligent caching and batch querying, making our tool 100x faster than our first naive implementation. Most surprisingly, we learned that 3D visualization is easier than we thought—iCn3D does the heavy lifting, letting us focus on interpretation rather than rendering.
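As an illustration of the caching and batch-querying idea, the sketch below only fetches keys that are not already cached and groups them into fixed-size batches. The function name `cached_batch_lookup`, the batch size, and the injected `fetch_batch` callable (which stands in for a real API client) are all hypothetical; the write-up does not describe the actual implementation.

```python
def cached_batch_lookup(keys, cache, fetch_batch, batch_size=50):
    """Look up keys, hitting the remote source only for uncached ones.

    keys: iterable of hashable identifiers (e.g. variant IDs).
    cache: dict used as the cache; it is updated in place.
    fetch_batch: hypothetical callable taking a list of keys and
        returning a dict {key: result} for the ones it found.
    batch_size: assumed maximum number of keys per request.
    """
    keys = list(keys)
    # dict.fromkeys removes duplicates while preserving order.
    missing = [key for key in dict.fromkeys(keys) if key not in cache]
    for start in range(0, len(missing), batch_size):
        chunk = missing[start:start + batch_size]
        results = fetch_batch(chunk)
        for key in chunk:
            # Keys the source did not return are cached as None so that
            # they are not requested again.
            cache[key] = results.get(key)
    return {key: cache[key] for key in keys}
```

Passing the fetch function in as an argument keeps the sketch independent of any particular API and makes it easy to test with a stub.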

Our immediate next step is to expand beyond missense variants to support in-frame indels, which can be partially modeled using existing structures, and eventually splice variants, which will require predicting structural impact from exon skipping patterns. Looking further ahead, we envision EMR integration that automatically runs RankGen when new VUSs are reported, pushing alerts to clinicians when evidence emerges.

Built With

clinvar
ensembl
gnomad
icn3d
javascript
numpy
openaiapi
pandas
pbo
plotly
pubmed
python
streamlit

Updates

Anthony Shen started this project — Mar 01, 2026 01:02 PM EST
