bioRxiv
New Results

Structure-Based Protein Function Prediction using Graph Convolutional Networks

Vladimir Gligorijevic, P. Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Daniel Berenberg, Tommi Vatanen, Chris Chandler, Bryn C. Taylor, Ian M. Fisk, Hera Vlamakis, Ramnik J. Xavier, Rob Knight, Kyunghyun Cho, Richard Bonneau
doi: https://doi.org/10.1101/786236
Vladimir Gligorijevic
1 Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
  • For correspondence: vgligorijevic{at}flatironinstitute.org rb133{at}nyu.edu
P. Douglas Renfrew
1 Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
Tomasz Kosciolek
2 Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
3 Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
Julia Koehler Leman
1 Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
Daniel Berenberg
1 Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
Tommi Vatanen
5 Broad Institute of MIT and Harvard, Cambridge, MA, USA
9 The Liggins Institute, University of Auckland, Auckland, New Zealand
Chris Chandler
1 Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
Bryn C. Taylor
15 Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, USA
Ian M. Fisk
10 Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA
Hera Vlamakis
5 Broad Institute of MIT and Harvard, Cambridge, MA, USA
Ramnik J. Xavier
5 Broad Institute of MIT and Harvard, Cambridge, MA, USA
6 Center for Computational and Integrative Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
7 Gastrointestinal Unit and Center for the Study of Inflammatory Bowel Disease, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
Rob Knight
2 Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
11 Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
12 Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Kyunghyun Cho
4 Facebook AI Research
13 CIFAR Azrieli Global Scholar
14 Center for Data Science, New York University, New York, NY, USA
Richard Bonneau
1 Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
14 Center for Data Science, New York University, New York, NY, USA

Abstract

The large number of available sequences and the diversity of protein functions challenge current experimental and computational approaches to determining and predicting protein function. We present a deep learning Graph Convolutional Network (GCN) for predicting protein functions and concurrently identifying functionally important residues. This model is initially trained using experimentally determined structures from the Protein Data Bank (PDB) but has significant denoising capability, with only a minor drop in performance observed when structure predictions are used. We take advantage of this denoising property to train the model on > 200,000 protein structures, including many homology-predicted structures, greatly expanding the reach and applications of the method. Our model learns general structure-function relationships, as shown by its robust function predictions for proteins with ≤ 40% sequence identity to the training set. We show that our GCN architecture predicts functions more accurately than Convolutional Neural Networks trained on sequence data alone and than previous competing methods. Using class activation mapping, we automatically identify the structural regions, at the residue level, that lead to each function prediction for every confidently predicted protein, advancing site-specific function prediction. We use our method to annotate PDB and SWISS-MODEL proteins, making several new confident function predictions spanning both fold and function classifications.
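
To make the two central ideas of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a protein is given as a binary residue contact map plus per-residue feature vectors, applies a single graph convolution with the commonly used symmetric normalization, sums the residue embeddings, and scores each function with a linear layer. Under those assumptions the per-residue class activation map is simply each residue's contribution to a function's score. All names, shapes, and weights below are hypothetical and chosen only for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def normalize_adjacency(contacts):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 contact map."""
    n = len(contacts)
    a_hat = [[float(contacts[i][j]) + (1.0 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    # Self-loops guarantee every degree is at least 1.
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def predict_with_cam(contacts, features, w_gcn, w_out):
    """Return (per-function probabilities, per-residue activation map).

    cam[i][k] is residue i's contribution to the score of function k.
    Because the readout is a sum over residues followed by a linear
    layer, each column of cam sums exactly to that function's logit.
    """
    a_norm = normalize_adjacency(contacts)
    hidden = gcn_layer(a_norm, features, w_gcn)   # residues x hidden
    cam = matmul(hidden, w_out)                   # residues x functions
    logits = [sum(col) for col in zip(*cam)]      # sum pooling
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return probs, cam


# Toy input (hypothetical): a 4-residue chain, 2 residue types,
# 3 hidden units, 2 candidate functions.
contacts = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]
features = [[1, 0], [0, 1], [1, 0], [0, 1]]
w_gcn = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
w_out = [[1.0, -1.0], [0.5, 0.2], [-0.3, 0.7]]

probs, cam = predict_with_cam(contacts, features, w_gcn, w_out)
for k, p in enumerate(probs):
    top = max(range(len(cam)), key=lambda i: cam[i][k])
    print(f"function {k}: p={p:.3f}, most influential residue={top}")
```

Sum pooling is used here because it makes the activation map an exact decomposition of each logit; a trained model with several layers and a nonlinear readout would need a gradient-weighted variant instead.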

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • New results on Swiss-Model structures added. New figures with class activation map added. Figures 2, 3, and 4 revised. Author list updated. Supplementary material updated.

  • 1 |GO| and |EC| denote the number of GO terms and EC numbers in the set, respectively.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted June 10, 2020.
bioRxiv 786236; doi: https://doi.org/10.1101/786236

Subject Area

  • Bioinformatics