Inspiration

In rare disease genomics, variants are often treated as local events: a residue changes, a score is assigned, and a label is produced.

But proteins are three-dimensional physical systems. Residues interact through spatial contacts, and functional effects propagate across structure.

In KBG syndrome (ANKRD11), we observed substantial genotype–phenotype heterogeneity, even among variants of a similar class. This inspired a shift in perspective:

What if a mutation is not just a point change, but a perturbation introduced into a structural network?


What We Built

We developed a structure-aware graph framework to model how variant effects spread inside proteins.

1. Protein as a Contact Graph

From the AlphaFold-predicted structure:

  • Residue → node
  • Spatial contact → edge
  • pLDDT → structural region (core, semi-core, disorder)

Edges are defined by:

$$ \|x_i - x_j\|_2 < d_{\text{cutoff}} $$
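A minimal sketch of this graph construction is given here, not the project's actual code. It assumes one coordinate per residue (for example the Cα atom), a hypothetical 8 Å cutoff, and hypothetical pLDDT thresholds of 90 and 70 for the core and semi-core regions; none of these values are stated in the writeup. The function names `contact_adjacency` and `plddt_region` are illustrative only.

```python
import numpy as np


def contact_adjacency(coords, d_cutoff=8.0):
    """Boolean adjacency matrix: True where two distinct residues are closer than d_cutoff.

    coords: (n_residues, 3) array, one point per residue (assumed: C-alpha atoms).
    d_cutoff: distance threshold in angstroms (8.0 is an assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = dist < d_cutoff
    np.fill_diagonal(adj, False)  # no self-contacts
    return adj


def plddt_region(plddt, core_min=90.0, semi_core_min=70.0):
    """Bin per-residue pLDDT into region labels (thresholds are assumptions)."""
    plddt = np.asarray(plddt, dtype=float)
    return np.where(
        plddt >= core_min,
        "core",
        np.where(plddt >= semi_core_min, "semi-core", "disorder"),
    )


# Toy example: four residues on a line, the last one far from the rest.
toy_coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [30.0, 0.0, 0.0]]
toy_plddt = [95.0, 82.0, 71.0, 40.0]

adj = contact_adjacency(toy_coords)
regions = plddt_region(toy_plddt)
```

For a real structure, the coordinates and per-residue pLDDT values would be read from the AlphaFold model file instead of the toy lists used here.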


2. Variant as Network Perturbation

For a mutation at residue \( s \), we apply Random Walk with Restart (RWR):

$$ p_{t+1} = (1-\alpha) W p_t + \alpha p_0 $$

This produces a steady-state distribution \( p^* \), which we interpret as the variant’s structural fingerprint.

We summarize impact across regions (core, semi-core, disorder, communication backbone).
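The RWR update and the per-region summary can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: \( W \) is taken to be the column-normalized contact matrix, \( p_0 \) places all probability on the mutated residue, the restart probability is set to a hypothetical 0.3, and region impact is computed as the sum of steady-state probability over the residues in each region. The function names are illustrative.

```python
import numpy as np


def column_normalize(adj):
    """Turn an adjacency matrix into a column-stochastic transition matrix W."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    # Isolated residues have an all-zero column; avoid dividing by zero.
    # Their column stays zero, so probability entering them is not propagated.
    col_sums[col_sums == 0] = 1.0
    return adj / col_sums


def random_walk_with_restart(adj, seed, alpha=0.3, tol=1e-12, max_iter=10_000):
    """Iterate p_{t+1} = (1 - alpha) W p_t + alpha p_0 until the L1 change is below tol.

    seed: index of the mutated residue; alpha: restart probability (assumed value).
    """
    W = column_normalize(adj)
    p0 = np.zeros(W.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - alpha) * (W @ p) + alpha * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def region_impact(p_star, regions):
    """Sum steady-state probability over the residues in each region label."""
    regions = np.asarray(regions)
    return {str(r): float(p_star[regions == r].sum()) for r in np.unique(regions)}


# Toy example: a 4-residue chain with the variant at residue 0.
toy_adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
)
toy_regions = ["core", "core", "semi-core", "disorder"]

p_star = random_walk_with_restart(toy_adj, seed=0)
impact = region_impact(p_star, toy_regions)
```

The fixed point of the iteration also has the closed form \( p^* = \alpha \, (I - (1-\alpha) W)^{-1} p_0 \), which is a convenient check on small graphs.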


3. Linking Structure to Phenotype

Using a curated KBG cohort, we model phenotype association:

$$ \text{logit}(P(\text{HPO})) = \beta_0 + \beta_1 \log(\text{SemiCoreImpact}) + \beta_2 \text{TruncFrac} $$

This connects structural propagation patterns to phenotype variability, even in a small rare-disease dataset.
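To make the model concrete, here is a small sketch of fitting it by Newton-Raphson in plain NumPy. It is not the project's code: the project lists statsmodels, whose `Logit` class would be the more usual choice, and NumPy is used here only to keep the example dependency-light. The data are synthetic and generated inside the block purely to exercise the code; the coefficients in `true_beta`, the tiny ridge term, and all function names are assumptions for illustration. Note that the log transform requires SemiCoreImpact to be strictly positive.

```python
import numpy as np


def design_matrix(semi_core_impact, trunc_frac):
    """Columns: intercept, log(SemiCoreImpact), TruncFrac."""
    semi = np.asarray(semi_core_impact, dtype=float)
    trunc = np.asarray(trunc_frac, dtype=float)
    if np.any(semi <= 0):
        raise ValueError("SemiCoreImpact must be strictly positive to take its log")
    return np.column_stack([np.ones_like(semi), np.log(semi), trunc])


def predict_proba(X, beta):
    """Inverse logit of the linear predictor X @ beta."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


def fit_logistic(X, y, ridge=1e-6, tol=1e-8, max_iter=100):
    """Newton-Raphson fit of a logistic regression.

    ridge: tiny L2 penalty on all coefficients (an assumption, added only to
    keep the Hessian invertible on small or nearly separable data).
    """
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = predict_proba(X, beta)
        grad = X.T @ (y - p) - ridge * beta
        hess = (X.T * (p * (1.0 - p))) @ X + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic data only, drawn from hypothetical coefficients (not cohort results).
rng = np.random.default_rng(0)
n = 4000
semi = rng.lognormal(mean=-3.0, sigma=1.0, size=n)
trunc = rng.uniform(0.0, 1.0, size=n)
X = design_matrix(semi, trunc)
true_beta = np.array([2.5, 1.0, 1.5])
y = (rng.random(n) < predict_proba(X, true_beta)).astype(float)

beta_hat = fit_logistic(X, y)
```

With a cohort as small as a typical rare-disease dataset, the estimates would be far less stable than in this synthetic example, which is one reason to keep the number of predictors low.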


What We Learned

  • Structural organization matters beyond sequence position.
  • Diffusion-based approximations can capture meaningful mechanistic signals.
  • Rare disease interpretation benefits from interpretable, mechanism-aware features.

Modeling how effects move can reveal patterns that classification alone cannot.


Challenges

  • Structural uncertainty: AlphaFold confidence varies across ANKRD11; we stratified by pLDDT.
  • Small cohort size: Required conservative modeling and interpretable features.
  • Approximation limits: RWR is not a full physical simulation, but a scalable abstraction.

Built With

  • alphafold
  • biopython
  • gradio
  • matplotlib
  • networkx
  • python
  • scipy
  • statsmodels