  • Review Article

From systems to structure — using genetic data to model protein structures

Abstract

Understanding the effects of genetic variation is a fundamental problem in biology that requires methods to analyse both physical and functional consequences of sequence changes at systems-wide and mechanistic scales. To achieve a systems view, protein interaction networks map which proteins physically interact, while genetic interaction networks inform on the phenotypic consequences of perturbing these protein interactions. Until recently, understanding the molecular mechanisms that underlie these interactions often required biophysical methods to determine the structures of the proteins involved. The past decade has seen the emergence of new approaches based on coevolution, deep mutational scanning and genome-scale genetic or chemical–genetic interaction mapping that enable modelling of the structures of individual proteins or protein complexes. Here, we review the emerging use of large-scale genetic datasets and deep learning approaches to model protein structures and their interactions, and discuss the integration of structural data from different sources.

Fig. 1: Readouts, scale and resolution.
Fig. 2: Structural modelling of proteins and their complexes using coevolution.
Fig. 3: Mapping of genetic and chemical–genetic interactions.
Fig. 4: Structural modelling of proteins and their complexes using genetic and chemical–genetic interactions.
Fig. 5: Structural characterization of host–pathogen interaction networks.

References

  1. Sharan, R., Ulitsky, I. & Shamir, R. Network-based prediction of protein function. Mol. Syst. Biol. 3, 88 (2007).

    Article  PubMed  PubMed Central  Google Scholar 

  2. Barabasi, A. L. Scale-free networks: a decade and beyond. Science 325, 412–413 (2009).

    Article  CAS  PubMed  Google Scholar 

  3. Swaney, D. L. et al. A protein network map of head and neck cancer reveals PIK3CA mutant drug sensitivity. Science 374, eabf2911 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Kim, M. et al. A protein interaction landscape of breast cancer. Science 374, eabf3066 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Zheng, F. et al. Interpretation of cancer mutations using a multiscale map of protein systems. Science 374, eabf3067 (2021).

    Article  CAS  PubMed  Google Scholar 

  6. Krogan, N. J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006).

    Article  CAS  PubMed  Google Scholar 

  7. Gavin, A. C. et al. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636 (2006).

    Article  CAS  PubMed  Google Scholar 

  8. Yu, H. et al. High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Havugimana, P. C. et al. A census of human soluble protein complexes. Cell 150, 1068–1081 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Shi, Y. A glimpse of structural biology through X-ray crystallography. Cell 159, 995–1014 (2014).

    Article  CAS  PubMed  Google Scholar 

  11. Henderson, R. Realizing the potential of electron cryo-microscopy. Q. Rev. Biophys. 37, 3–13 (2004).

    Article  CAS  PubMed  Google Scholar 

  12. Wuthrich, K. The way to NMR structures of proteins. Nat. Struct. Biol. 8, 923–925 (2001).

    Article  CAS  PubMed  Google Scholar 

  13. Phillips, P. C. Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems. Nat. Rev. Genet. 9, 855–867 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Collins, S. R. et al. Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map. Nature 446, 806–810 (2007).

    Article  CAS  PubMed  Google Scholar 

  15. Tong, A. H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368 (2001).

    Article  CAS  PubMed  Google Scholar 

  16. Dobson, C. M. Biophysical techniques in structural biology. Annu. Rev. Biochem. 88, 25–33 (2019).

    Article  CAS  PubMed  Google Scholar 

  17. Murata, K. & Wolf, M. Cryo-electron microscopy for structural analysis of dynamic biological macromolecules. Biochim. Biophys. Acta Gen. Subj. 1862, 324–334 (2018).

    Article  CAS  PubMed  Google Scholar 

  18. Huang, C. & Kalodimos, C. G. Structures of large protein complexes determined by nuclear magnetic resonance spectroscopy. Annu. Rev. Biophys. 46, 317–336 (2017).

    Article  CAS  PubMed  Google Scholar 

  19. Wall, M. E., Wolff, A. M. & Fraser, J. S. Bringing diffuse X-ray scattering into focus. Curr. Opin. Struct. Biol. 50, 109–116 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Altschuh, D., Lesk, A. M., Bloomer, A. C. & Klug, A. Correlation of co-ordinated amino acid substitutions with function in viruses related to tobacco mosaic virus. J. Mol. Biol. 193, 693–707 (1987).

    Article  CAS  PubMed  Google Scholar 

  21. Gobel, U., Sander, C., Schneider, R. & Valencia, A. Correlated mutations and residue contacts in proteins. Proteins 18, 309–317 (1994).

    Article  CAS  PubMed  Google Scholar 

  22. Neher, E. How frequent are correlated changes in families of protein sequences? Proc. Natl Acad. Sci. USA 91, 98–102 (1994).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Taylor, W. R. & Hatrick, K. Compensating changes in protein multiple sequence alignments. Protein Eng. 7, 341–348 (1994).

    Article  CAS  PubMed  Google Scholar 

  24. Shindyalov, I. N., Kolchanov, N. A. & Sander, C. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng. 7, 349–358 (1994).

    Article  CAS  PubMed  Google Scholar 

  25. Thomas, D. J., Casari, G. & Sander, C. The prediction of protein contacts from multiple sequence alignments. Protein Eng. 9, 941–948 (1996).

    Article  CAS  PubMed  Google Scholar 

  26. Dunn, S. D., Wahl, L. M. & Gloor, G. B. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics 24, 333–340 (2008).

    Article  CAS  PubMed  Google Scholar 

  27. Fodor, A. A. & Aldrich, R. W. Influence of conservation on calculations of amino acid covariance in multiple sequence alignments. Proteins 56, 211–221 (2004).

    Article  CAS  PubMed  Google Scholar 

  28. Marks, D. S., Hopf, T. A. & Sander, C. Protein structure prediction from sequence variation. Nat. Biotechnol. 30, 1072–1080 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Thomas, J., Ramakrishnan, N. & Bailey-Kellogg, C. Graphical models of residue coupling in protein families. IEEE/ACM Trans. Comput. Biol. Bioinform 5, 183–197 (2008).

    Article  CAS  PubMed  Google Scholar 

  30. Balakrishnan, S., Kamisetty, H., Carbonell, J. G., Lee, S. I. & Langmead, C. J. Learning generative models for protein fold families. Proteins 79, 1061–1078 (2011).

    Article  CAS  PubMed  Google Scholar 

  31. Burger, L. & van Nimwegen, E. Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput. Biol. 6, e1000633 (2010).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  32. Weigt, M., White, R. A., Szurmant, H., Hoch, J. A. & Hwa, T. Identification of direct residue contacts in protein-protein interaction by message passing. Proc. Natl Acad. Sci. USA 106, 67–72 (2009).

    Article  CAS  PubMed  Google Scholar 

  33. Jones, D. T., Buchan, D. W., Cozzetto, D. & Pontil, M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28, 184–190 (2012).

    Article  CAS  PubMed  Google Scholar 

  34. UniProt, C. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 49, D480–D489 (2021).

    Article  CAS  Google Scholar 

  35. Marks, D. S. et al. Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 6, e28766 (2011). This study describes the first application of protein structure modelling using spatial restraints derived from coevolution data.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Hopf, T. A. et al. Three-dimensional structures of membrane proteins from genomic sequencing. Cell 149, 1607–1621 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Sulkowska, J. I., Morcos, F., Weigt, M., Hwa, T. & Onuchic, J. N. Genomics-aided structure prediction. Proc. Natl Acad. Sci. USA 109, 10340–10345 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Nugent, T. & Jones, D. T. Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis. Proc. Natl Acad. Sci. USA 109, E1540–E1547 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Kamisetty, H., Ovchinnikov, S. & Baker, D. Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc. Natl Acad. Sci. USA 110, 15674–15679 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Hopf, T. A. et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, e03430 (2014).

    Article  PubMed Central  Google Scholar 

  41. Ovchinnikov, S., Kamisetty, H. & Baker, D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. eLife 3, e02030 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  42. Bitbol, A. F., Dwyer, R. S., Colwell, L. J. & Wingreen, N. S. Inferring interaction partners from protein sequences. Proc. Natl Acad. Sci. USA 113, 12180–12185 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Pazos, F., Helmer-Citterich, M., Ausiello, G. & Valencia, A. Correlated mutations contain information about protein-protein interaction. J. Mol. Biol. 271, 511–523 (1997).

    Article  CAS  PubMed  Google Scholar 

  44. Baldassi, C. et al. Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners. PLoS ONE 9, e92721 (2014).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  45. Cong, Q., Anishchenko, I., Ovchinnikov, S. & Baker, D. Protein interaction networks revealed by proteome coevolution. Science 365, 185–189 (2019). This study represents a major expansion of the utility of coevolution by applying it to predict PPIs on a proteome-wide scale in E. coli and M. tuberculosis.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Stiffler, M. A. et al. Protein structure from experimental evolution. Cell Syst. 10, 15–24 e15 (2020).

    Article  CAS  PubMed  Google Scholar 

  47. Ekeberg, M., Lovkvist, C., Lan, Y., Weigt, M. & Aurell, E. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys. Rev. E Stat. Nonlin Soft Matter Phys. 87, 012707 (2013).

    Article  PubMed  CAS  Google Scholar 

  48. Ovchinnikov, S. et al. Protein structure determination using metagenome sequence data. Science 355, 294–298 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13, e1005324 (2017).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  50. Zeng, H. et al. ComplexContact: a web server for inter-protein contact prediction using deep learning. Nucleic Acids Res. 46, W432–W437 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Jones, D. T. & Kandathil, S. M. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics 34, 3308–3315 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). This deep learning approach allows for efficient prediction of protein structures at near experimental accuracy.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Burley, S. K. et al. RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 49, D437–D451 (2021).

    Article  CAS  PubMed  Google Scholar 

  54. Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).

    Article  CAS  PubMed  Google Scholar 

  55. Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Akdel, M. et al. A structural biology community assessment of AlphaFold 2 applications. Preprint at bioRxiv https://doi.org/10.1101/2021.09.26.461876 (2021).

    Article  Google Scholar 

  57. Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein-protein interactions using AlphaFold2 and extended multiple-sequence alignments. Preprint at bioRxiv https://doi.org/10.1101/2021.09.15.460468 (2021).

    Article  Google Scholar 

  58. Ghani, U. et al. Improved docking of protein models by a combination of Alphafold2 and ClusPro. Preprint at bioRxiv https://doi.org/10.1101/2021.09.07.459290 (2021).

    Article  Google Scholar 

  59. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  60. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021). This deep learning approach allows for efficient prediction of protein structures at near experimental accuracy.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Humphreys, I. R. et al. Computed structures of core eukaryotic protein complexes. Science https://doi.org/10.1126/science.abm4805 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  62. Gupta, M. et al. CryoEM and AI reveal a structure of SARS-CoV-2 Nsp2, a multifunctional protein involved in key host processes. Preprint at bioRxiv https://doi.org/10.1101/2021.05.10.443524 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  63. Beltrao, P., Cagney, G. & Krogan, N. J. Quantitative genetic interactions reveal biological modularity. Cell 141, 739–745 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Boone, C., Bussey, H. & Andrews, B. J. Exploring genetic interactions and networks with yeast. Nat. Rev. Genet. 8, 437–449 (2007).

    Article  CAS  PubMed  Google Scholar 

  65. Pan, X. et al. A robust toolkit for functional profiling of the yeast genome. Mol. Cell 16, 487–496 (2004).

    Article  CAS  PubMed  Google Scholar 

  66. Collins, S. R., Schuldiner, M., Krogan, N. J. & Weissman, J. S. A strategy for extracting and analyzing large-scale quantitative epistatic interaction data. Genome Biol. 7, R63 (2006).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  67. Schuldiner, M., Collins, S. R., Weissman, J. S. & Krogan, N. J. Quantitative genetic analysis in Saccharomyces cerevisiae using epistatic miniarray profiles (E-MAPs) and its application to chromatin functions. Methods 40, 344–352 (2006).

    Article  CAS  PubMed  Google Scholar 

  68. Costanzo, M. et al. A global genetic interaction network maps a wiring diagram of cellular function. Science 353, aaf1420 (2016).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  69. Costanzo, M. et al. The genetic landscape of a cell. Science 327, 425–431 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Fiedler, D. et al. Functional organization of the S. cerevisiae phosphorylation network. Cell 136, 952–963 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Kapitzky, L. et al. Cross-species chemogenomic profiling reveals evolutionarily conserved drug mode of action. Mol. Syst. Biol. 6, 451 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Nichols, R. J. et al. Phenotypic landscape of a bacterial cell. Cell 144, 143–156 (2011).

    Article  CAS  PubMed  Google Scholar 

  73. Chang, M., Bellaoui, M., Boone, C. & Brown, G. W. A genome-wide screen for methyl methanesulfonate-sensitive mutants reveals genes required for S phase progression in the presence of DNA damage. Proc. Natl Acad. Sci. USA 99, 16934–16939 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Hillenmeyer, M. E. et al. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 320, 362–365 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Butland, G. et al. eSGA: E. coli synthetic genetic array analysis. Nat. Methods 5, 789–795 (2008).

    Article  CAS  PubMed  Google Scholar 

  76. Typas, A. et al. High-throughput, quantitative analyses of genetic interactions in E. coli. Nat. Methods 5, 781–787 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  77. Lehner, B., Crombie, C., Tischler, J., Fortunato, A. & Fraser, A. G. Systematic mapping of genetic interactions in Caenorhabditis elegans identifies common modifiers of diverse signaling pathways. Nat. Genet. 38, 896–903 (2006).

    Article  CAS  PubMed  Google Scholar 

  78. Roguev, A. et al. Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast. Science 322, 405–410 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Horn, T. et al. Mapping of signaling networks through synthetic genetic interaction analysis by RNAi. Nat. Methods 8, 341–346 (2011).

    Article  CAS  PubMed  Google Scholar 

  80. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Du, D. et al. Genetic interaction mapping in mammalian cells using CRISPR interference. Nat. Methods 14, 577–580 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Shen, J. P. et al. Combinatorial CRISPR–Cas9 screens for de novo mapping of genetic interactions. Nat. Methods 14, 573–576 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Roguev, A. et al. Quantitative genetic-interaction mapping in mammalian cells. Nat. Methods 10, 432–437 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Laufer, C., Fischer, B., Billmann, M., Huber, W. & Boutros, M. Mapping genetic interactions in human cancer cells with RNAi and multiparametric phenotyping. Nat. Methods 10, 427–431 (2013).

    Article  CAS  PubMed  Google Scholar 

  85. Bassik, M. C. et al. A systematic mammalian genetic interaction map reveals pathways underlying ricin susceptibility. Cell 152, 909–922 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Haarer, B., Viggiano, S., Hibbs, M. A., Troyanskaya, O. G. & Amberg, D. C. Modeling complex genetic interactions in a simple eukaryotic genome: actin displays a rich spectrum of complex haploinsufficiencies. Genes Dev. 21, 148–159 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Ryan, C. J. et al. High-resolution network biology: connecting sequence with function. Nat. Rev. Genet. 14, 865–879 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Zhang, Z., Shibahara, K. & Stillman, B. PCNA connects DNA replication to epigenetic inheritance in yeast. Nature 408, 221–225 (2000).

    Article  CAS  PubMed  Google Scholar 

  89. Braberg, H. et al. From structure to systems: high-resolution, quantitative genetic analysis of RNA polymerase II. Cell 154, 775–788 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Braberg, H., Moehle, E. A., Shales, M., Guthrie, C. & Krogan, N. J. Genetic interaction analysis of point mutations enables interrogation of gene function at a residue-level resolution: exploring the applications of high-resolution genetic interaction mapping of point mutations. Bioessays 36, 706–713 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  91. Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Melamed, D., Young, D. L., Gamble, C. E., Miller, C. R. & Fields, S. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA 19, 1537–1551 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Olson, C. A., Wu, N. C. & Sun, R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24, 2643–2651 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Sahoo, A., Khare, S., Devanarayanan, S., Jain, P. C. & Varadarajan, R. Residue proximity information and protein model discrimination using saturation-suppressor mutagenesis. eLife 4, e09532 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  95. Perica, T. et al. Systems-level effects of allosteric perturbations to a model molecular switch. Nature 599, 152–157 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Rollins, N. J. et al. Inferring protein 3D structure from deep mutation scans. Nat. Genet. 51, 1170–1176 (2019). This study describes the use of deep mutational scanning to generate restraints for determining the structures of small proteins or domains.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Schmiedel, J. M. & Lehner, B. Determining protein structures using deep mutagenesis. Nat. Genet. 51, 1177–1186 (2019). This study describes the use of deep mutational scanning to generate restraints for determining the structures of small proteins or domains.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Eccleston, R. C., Pollock, D. D. & Goldstein, R. A. Selection for cooperativity causes epistasis predominately between native contacts and enables epistasis-based structure reconstruction. Proc. Natl Acad. Sci. USA 118, e2010057 (2021).

    Article  CAS  Google Scholar 

  99. Araya, C. L. et al. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc. Natl Acad. Sci. USA 109, 16858–16863 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Diss, G. & Lehner, B. The genetic landscape of a physical interaction. eLife 7, e32472 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  101. Kobori, S. & Yokobayashi, Y. High-throughput mutational analysis of a twister ribozyme. Angew. Chem. Int. Ed. Engl. 55, 10354–10357 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  102. Newberry, R. W., Leong, J. T., Chow, E. D., Kampmann, M. & DeGrado, W. F. Deep mutational scanning reveals the structural basis for alpha-synuclein activity. Nat. Chem. Biol. 16, 653–659 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  103. Bolognesi, B. et al. The mutational landscape of a prion-like domain. Nat. Commun. 10, 4162 (2019).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  104. Braberg, H. et al. Genetic interaction mapping informs integrative structure determination of protein complexes. Science 370, eaaz4910 (2020). This study describes the modelling of protein complex structures, using restraints derived from genome-scale genetic interaction data and chemical–genetic interaction data.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  105. Rout, M. P. & Sali, A. Principles for integrative structural biology studies. Cell 177, 1384–1403 (2019). This publication describes integrative structural biology, which serves as a crucial tool for integrating different types of dataset for the structural modelling of protein complexes.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Shiver, A. L. et al. Chemical-genetic interrogation of RNA polymerase mutants reveals structure-function relationships and physiological tradeoffs. Mol. Cell 81, 2201–2215 e2209 (2021).

    Article  CAS  PubMed  Google Scholar 

  107. Hockenberry, A. J. & Wilke, C. O. Evolutionary couplings detect side-chain interactions. PeerJ 7, e7280 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  108. Roy, K. R. et al. Multiplexed precision genome editing with trackable genomic barcodes in yeast. Nat. Biotechnol. 36, 512–520 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Collins, S. R. et al. Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae. Mol. Cell Proteom. 6, 439–450 (2007).

    Article  CAS  Google Scholar 

  110. Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019). This CRISPR–Cas9-based genome editing approach allows for all base-to-base conversions, insertions or deletions, without the need of a double-stranded break or donor DNA, and with lower off-target activity than Cas9 nuclease.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. Ma, L. et al. CRISPR-Cas9-mediated saturated mutagenesis screen predicts clinical drug resistance with improved accuracy. Proc. Natl Acad. Sci. USA 114, 11751–11756 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR-Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).

    Article  CAS  PubMed  Google Scholar 

  113. Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  114. Erwood, S. et al. Saturation variant interpretation using CRISPR prime editing. Preprint at bioRxiv https://doi.org/10.1101/2021.05.11.443710 (2021).

    Article  Google Scholar 

  115. McGuffee, S. R. & Elcock, A. H. Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm. PLoS Comput. Biol. 6, e1000694 (2010).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  116. Singla, J. et al. Opportunities and challenges in building a spatiotemporal multi-scale model of the human pancreatic β cell. Cell 173, 11–19 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Takamori, S. et al. Molecular anatomy of a trafficking organelle. Cell 127, 831–846 (2006).

    Article  CAS  PubMed  Google Scholar 

  118. Thul, P. J. et al. A subcellular map of the human proteome. Science 356, eaal3321 (2017).

    Article  PubMed  CAS  Google Scholar 

  119. Wilhelm, B. G. et al. Composition of isolated synaptic boutons reveals the amounts of vesicle trafficking proteins. Science 344, 1023–1028 (2014).

    Article  CAS  PubMed  Google Scholar 

  120. Eckhardt, M., Hultquist, J. F., Kaake, R. M., Huttenhain, R. & Krogan, N. J. A systems approach to infectious disease. Nat. Rev. Genet. 21, 339–354 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  121. Gordon, D. E. et al. Comparative host-coronavirus protein interaction networks reveal pan-viral disease mechanisms. Science 370, eabe9403 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Gordon, D. E. et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature 583, 459–468 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Ramage, H. R. et al. A combined proteomics/genomics approach links hepatitis C virus infection with nonsense-mediated mRNA decay. Mol. Cell 57, 329–340 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  124. Jager, S. et al. Global landscape of HIV-human protein complexes. Nature 481, 365–370 (2011).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  125. Gordon, D. E. et al. A quantitative genetic interaction map of HIV infection. Mol. Cell 78, 197–209.e197 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  126. Tenthorey, J. L., Young, C., Sodeinde, A., Emerman, M. & Malik, H. S. Mutational resilience of antiviral restriction favors primate TRIM5alpha in host-virus evolutionary arms races. eLife 9, e59988 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  127. Starr, T. N. et al. Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell 182, 1295–1310 e1220 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  128. Greaney, A. J. et al. Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition. Cell Host Microbe 29, 44–57 e49 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  129. Gong, L. I., Suchard, M. A. & Bloom, J. D. Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2, e00631 (2013).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  130. Wong, A. H. M. et al. Receptor-binding loops in alphacoronavirus adaptation and evolution. Nat. Commun. 8, 1735 (2017).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  131. Sali, A. From integrative structural biology to cell biology. J. Biol. Chem. 296, 100743 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Kim, S. J. et al. Integrative structure and functional anatomy of a nuclear pore complex. Nature 555, 475–482 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Lasker, K. et al. Molecular architecture of the 26S proteasome holocomplex determined by an integrative approach. Proc. Natl Acad. Sci. USA 109, 1380–1387 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  134. Gutierrez, C. et al. Structural dynamics of the human COP9 signalosome revealed by cross-linking mass spectrometry and integrative modeling. Proc. Natl Acad. Sci. USA 117, 4088–4098 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  135. Kwon, Y. et al. Structural basis of CD4 downregulation by HIV-1 Nef. Nat. Struct. Mol. Biol. 27, 822–828 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  136. Luo, J. et al. Architecture of the human and yeast general transcription and DNA repair factor TFIIH. Mol. Cell 59, 794–806 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  137. Wang, S., Li, W., Liu, S. & Xu, J. RaptorX-Property: a web server for protein structure property prediction. Nucleic Acids Res. 44, W430–W435 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  138. Fernandez-de-Cossio-Diaz, J., Uguzzoni, G. & Pagnani, A. Unsupervised inference of protein fitness landscape from deep mutational scan. Mol. Biol. Evol. 38, 318–328 (2021).

  139. Schaarschmidt, J., Monastyrskyy, B., Kryshtafovych, A. & Bonvin, A. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 86 (Suppl. 1), 51–66 (2018).

  140. Viswanath, S. & Sali, A. Optimizing model representation for integrative structure determination of macromolecular assemblies. Proc. Natl Acad. Sci. USA 116, 540–545 (2019).

  141. Saltzberg, D. J. et al. Using Integrative Modeling Platform to compute, validate, and archive a model of a protein complex structure. Protein Sci. 30, 250–261 (2021).

  142. Viswanath, S., Chemmama, I. E., Cimermancic, P. & Sali, A. Assessing exhaustiveness of stochastic sampling for integrative modeling of macromolecular structures. Biophys. J. 113, 2344–2353 (2017).

  143. Russel, D. et al. Putting the pieces together: integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol. 10, e1001244 (2012).

Acknowledgements

The authors thank P. Beltrao and R. B. Babu for helpful discussion and comments on the manuscript. This research was funded by grants from the National Institutes of Health (NIH) (U54CA209891, U54NS100717, 1U01MH115747, U19 AI135990, U19AI135972, and P50AI150476 to N.J.K.; R01GM083960 and P41GM109824 to A.S.). This work was supported by the Defense Advanced Research Projects Agency (DARPA) under Cooperative Agreements HR00111920020 and HR00112020029 to N.J.K. The views, opinions and/or findings contained in this material are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the US Government.

Author information

Contributions

The authors contributed equally to all aspects of the article.

Corresponding author

Correspondence to Nevan J. Krogan.

Ethics declarations

Competing interests

The Krogan Laboratory has received research support from Vir Biotechnology and F. Hoffmann-La Roche. N.J.K. has consulting agreements with the Icahn School of Medicine at Mount Sinai, New York, Maze Therapeutics and Interline Therapeutics. N.J.K. is a shareholder in Tenaya Therapeutics, Maze Therapeutics and Interline Therapeutics, and a financially compensated Scientific Advisory Board Member for GEn1E Lifesciences, Inc. The other authors declare no competing interests.

Additional information

Peer review information

Nature Reviews Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Multiple sequence alignment

An alignment of the sequences from multiple proteins. The multiple sequence alignment defines how the residue positions in each protein relate to those of the other proteins.

Protein family

A group of evolutionarily related proteins. The members of a protein family will typically have similar sequences and/or structures and related functions.

Orthologues

Evolutionarily related genes in different species. The proteins encoded by orthologous genes are typically responsible for the same function in the respective organisms.

Paralogues

Genes with similar sequences that originated via a duplication event within a genome. Paralogues belong to the same species and their encoded proteins are typically not involved in the same function.

Neural network

A class of machine learning models that is loosely inspired by the organization of neurons in the human brain and is central to deep learning algorithms.

Homology modelling

A method for determining the structure of a protein on the basis of sequence similarity with another protein of known structure by satisfying spatial restraints.

Subunits

Single proteins in the context of a protein complex.

Knockdowns

Genes whose expression has been reduced.

Complex haploinsufficiencies

Negative genetic interactions observed in cells that are hemizygous for two different genes. The phenotype of the two hemizygous loci combined is more severe than expected if the genes were unrelated.

Hemizygous

A diploid cell is hemizygous for a gene if it harbours only one functional allele of the gene.

Allostery

A process whereby an active site in a protein (enzyme) is regulated by the binding of a molecule to a different site (typically distal in space).

Knockouts

Genes that have been inactivated (for example, deleted).

About this article

Cite this article

Braberg, H., Echeverria, I., Kaake, R.M. et al. From systems to structure — using genetic data to model protein structures. Nat Rev Genet 23, 342–354 (2022). https://doi.org/10.1038/s41576-021-00441-w

  • DOI: https://doi.org/10.1038/s41576-021-00441-w
