Diageno

Inspiration

This project was inspired by listening to rare disease patient panels and learning about their diagnostic journeys. Many individuals described years of uncertainty, multiple specialist referrals, fragmented data, and delayed recognition of key clinical features. A consistent theme was the profound impact of early diagnosis—not only on medical outcomes, but also on psychological well-being and financial burden. These discussions emphasized the need for improved awareness, structured data consolidation, and computational tools that support more efficient diagnostic reasoning.

In parallel, the 3Blue1Brown video on solving Wordle optimally using entropy influenced the architectural foundation of our system. Its central concept, that each step should maximize expected information gain, directly shaped our approach to narrowing diagnostic uncertainty in rare disease evaluation.

What Does Diageno Do

Our system analyzes structured phenotypic and genomic information to identify the most probable diagnostic domain within the rare disease landscape. Rather than simply producing a ranked list of diseases, the model highlights where the remaining diagnostic uncertainty is concentrated, so that downstream evaluation can focus there. By reducing the diagnostic search space, the system supports more efficient clinical reasoning.

How We Built It

We built the system on standardized biomedical ontologies and structured case datasets: the Human Phenotype Ontology (HPO) for phenotype encoding, and the Mondo Disease Ontology (MONDO) and Orphanet for disease harmonization. Case-level data were stored in a columnar format (Parquet) to enable efficient computational analysis. The architecture combines semantic phenotype similarity scoring, ontology-based mapping, and entropy-driven ranking to prioritize the distinctions that are most informative across candidate diseases.
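
To make the similarity scoring concrete, here is a minimal sketch of one common ontology-aware measure: Jaccard overlap between ancestor-closed term sets. This is an illustrative assumption, not necessarily the measure Diageno uses. The toy ontology, its term labels, and the function names are all hypothetical placeholders rather than real HPO identifiers.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: term -> set of direct parent terms.
# Labels are placeholders, not real HPO identifiers.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "neuro": {"root"},
    "seizure": {"neuro"},
    "ataxia": {"neuro"},
    "cardiac": {"root"},
    "arrhythmia": {"cardiac"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def closure(terms: Iterable[str]) -> Set[str]:
    """Expand a set of terms with every ancestor of every term."""
    expanded: Set[str] = set()
    for term in terms:
        expanded |= ancestors(term)
    return expanded


def similarity(patient_terms: Iterable[str], disease_terms: Iterable[str]) -> float:
    """Jaccard overlap of the two ancestor-closed term sets (0.0 to 1.0)."""
    a, b = closure(patient_terms), closure(disease_terms)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Sibling terms share their parent and the root, so they score higher
    # than terms from unrelated branches, which share only the root.
    print(similarity({"seizure"}, {"ataxia"}))
    print(similarity({"seizure"}, {"arrhythmia"}))
```

Expanding each term to its ancestors lets two profiles receive partial credit when they share a broader category without matching on the exact term, which matters when phenotypes are recorded at different levels of detail.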

Finding Individual Patient Data and Recommending Next Steps

The system integrates de-identified structured case data to model phenotype–disease associations. For a given patient profile, it evaluates which phenotypic features most reduce diagnostic ambiguity. Instead of prescribing treatment, the system identifies high-yield domains or targeted genomic tests that would most effectively refine the differential diagnosis.

Accurate Diagnosis Through Information Gain

We frame diagnostic reasoning as an information optimization problem. Each additional phenotype or genomic result is evaluated based on its expected reduction in uncertainty. By computing entropy across the candidate disease distribution, the system prioritizes features that maximize discriminatory power within the rare disease space.
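
The idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's actual implementation. It assumes each candidate feature is binary (present or absent) and that we have an estimate of P(feature present | disease) for every candidate. The disease labels, feature names, and probabilities are invented for the example.

```python
import math
from typing import Dict, List, Tuple

Distribution = Dict[str, float]


def entropy(dist: Distribution) -> float:
    """Shannon entropy in bits of a disease probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def posterior(prior: Distribution, likelihood: Distribution, present: bool) -> Distribution:
    """Bayes update after observing a binary feature as present or absent.

    likelihood[d] is the assumed P(feature present | disease d).
    """
    unnormalized = {
        d: prior[d] * (likelihood[d] if present else 1.0 - likelihood[d])
        for d in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return {d: v / total for d, v in unnormalized.items()}


def expected_information_gain(prior: Distribution, likelihood: Distribution) -> float:
    """Expected entropy reduction (bits) from asking about one feature."""
    p_present = sum(prior[d] * likelihood[d] for d in prior)
    expected_entropy_after = 0.0
    for present, p_answer in ((True, p_present), (False, 1.0 - p_present)):
        if p_answer > 0:
            expected_entropy_after += p_answer * entropy(posterior(prior, likelihood, present))
    return entropy(prior) - expected_entropy_after


def rank_features(
    prior: Distribution, feature_likelihoods: Dict[str, Distribution]
) -> List[Tuple[str, float]]:
    """Sort candidate features by expected information gain, best first."""
    gains = {
        name: expected_information_gain(prior, lik)
        for name, lik in feature_likelihoods.items()
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented example: four equally likely candidate diseases.
    prior = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
    features = {
        "splits_evenly": {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0},
        "only_in_A": {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0},
        "uninformative": {"A": 0.5, "B": 0.5, "C": 0.5, "D": 0.5},
    }
    for name, gain in rank_features(prior, features):
        print(f"{name}: {gain:.3f} bits")
```

In this toy setup the feature that splits the candidates evenly ranks first, and a feature that is equally likely under every disease contributes nothing, which mirrors the Wordle intuition that the best question is the one whose answer is hardest to predict.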

Entropy Calculations and the Rare Disease Landscape

The rare disease landscape is characterized by phenotypic heterogeneity and sparse data. We apply entropy calculations to quantify uncertainty across possible diagnoses. This enables systematic prioritization of the features that most efficiently partition the disease space, analogous to the entropy-based guessing strategy for Wordle that inspired the project.
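
For reference, the standard quantities involved can be written as follows. The notation is ours and is an assumption for illustration: D is the candidate diagnosis, p(d) is the current probability of disease d, and f is a binary feature that is either present (1) or absent (0).

```latex
H(D) = -\sum_{d} p(d)\,\log_2 p(d)

\mathrm{IG}(f) = H(D) - \sum_{v \in \{0,1\}} P(f = v)\; H(D \mid f = v)
```

As a worked example, with 8 equally likely candidates H(D) = log2(8) = 3 bits. A feature present in exactly 4 of them leaves 4 equally likely candidates whichever way it is answered, so the remaining entropy is 2 bits and the gain is 1 bit. A feature present in only 1 of the 8 gains about 0.54 bits, since 3 - (7/8) log2(7) ≈ 0.54, so the even split is the more informative question.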

Incorporating a Temporal Component

Recognizing that phenotypes evolve over time, we hope to incorporate a temporal dimension into the model. Onset age, progression patterns, and longitudinal phenotype events would be encoded to refine disease likelihood dynamically. This temporal integration could improve alignment with real-world clinical trajectories and enhance diagnostic specificity.
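
Since this component is not built yet, the following is only a rough sketch of one shape it could take. It assumes each disease has a typical onset-age window per phenotype and that a mismatch is penalized with an exponential decay. The class, the function names, the windows, and the decay scale are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

OnsetWindow = Tuple[float, float]  # (earliest, latest) typical onset age in years


@dataclass(frozen=True)
class PhenotypeEvent:
    """One longitudinal observation: a phenotype and the age it appeared."""
    term: str
    onset_age_years: float


def onset_weight(age: float, window: OnsetWindow, scale: float = 2.0) -> float:
    """1.0 inside the typical window, decaying exponentially outside it."""
    low, high = window
    if low <= age <= high:
        return 1.0
    distance = low - age if age < low else age - high
    return math.exp(-distance / scale)


def reweight_by_onset(
    prior: Dict[str, float],
    windows: Dict[str, Dict[str, OnsetWindow]],
    events: List[PhenotypeEvent],
) -> Dict[str, float]:
    """Scale each disease probability by how well observed onsets fit it."""
    scores: Dict[str, float] = {}
    for disease, probability in prior.items():
        weight = 1.0
        for event in events:
            window = windows.get(disease, {}).get(event.term)
            if window is not None:  # no window known: leave weight unchanged
                weight *= onset_weight(event.onset_age_years, window)
        scores[disease] = probability * weight
    total = sum(scores.values())
    if total == 0:
        return dict(prior)
    return {disease: score / total for disease, score in scores.items()}


if __name__ == "__main__":
    # Invented example: the same phenotype with early vs. late typical onset.
    prior = {"early_onset_disease": 0.5, "late_onset_disease": 0.5}
    windows = {
        "early_onset_disease": {"seizure": (0.0, 2.0)},
        "late_onset_disease": {"seizure": (10.0, 20.0)},
    }
    events = [PhenotypeEvent(term="seizure", onset_age_years=1.0)]
    print(reweight_by_onset(prior, windows, events))
```

The reweighted distribution could then feed into the same entropy calculations, so that timing information narrows the candidate set before the next feature is chosen.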