I was happy to contribute to the collaborative manuscript Ten Quick Tips for Deep Learning in Biology arxiv.org/abs/2105.14372 1/
Anthony Gitter
Computational biologist; Associate Prof. at University of Wisconsin-Madison; Jeanne M. Rowe Chair at Morgridge Institute
Joined April 2015
- My promotion to associate professor with tenure has been approved! There are still a few more steps to make it official, but I'm too excited to wait to share the news. 1/
- Our manuscript "Biophysics-based protein language models for protein engineering" with @romerolab1 is now on bioRxiv. We present Mutational Effect Transfer Learning (METL), a protein language model trained on biophysical simulations, and showcase it for protein engineering. 1/
- I'm sharing some first impressions of the ESM3 paper in this thread. The model and generative programming results look great, and I may come back to those later. 1/
  Quoted post: We have trained ESM3 and we're excited to introduce EvolutionaryScale. ESM3 is a generative language model for programming biology. In experiments, we found ESM3 can simulate 500M years of evolution to generate new fluorescent proteins. Read more: evolutionaryscale.ai/blog/esm3-rele…
- The February issue of @NatureBiotech is a focus on protein engineering. There are so many great news & views, primers, and reviews. 1/
- Our paper "Neural networks to learn protein sequence–function relationships from deep mutational scanning data" with @romerolab1 has been published in @PNASNews doi.org/10.1073/pnas.2… 1/n
- Our paper "Open collaborative writing with Manubot" is now available at @PLOSCompBiol. #Manubot uses Markdown + GitHub for continuous publication. doi.org/10.1371/journa… 1/8
- Our commentary "A renewed call for open artificial intelligence in biomedicine" is now available as a preprint. We call for sharing training data, code, and model weights in biomedical artificial intelligence research. 1/
- Protein design is my new LLM vibe check for biology. Here is a snippet of Llama 3.1 generating a green fluorescent protein. The AlphaFold3-predicted structure of "llamaGFP" is shown below with the full sequence as alt text. It has only 30% sequence identity with wild type avGFP!
- This is a great overview of machine learning written for a biological audience. It covers not only different algorithms but also data leakage, evaluating articles that use machine learning, etc. nature.com/articles/s4158…
- Protein Mutational Effect Predictor (ProMEP) uses self-supervised training on 160 million AlphaFold2-predicted structures with an SE(3)-Transformer. Then it can perform zero shot mutation effect prediction and was tested by engineering TnpB. doi.org/10.21203/rs.3.…
- Our review "Opportunities and obstacles for deep learning in biology and medicine" has been published at Journal of the Royal Society Interface. Now only 47 pages with the journal formatting! @GreeneScientist rsif.royalsocietypublishing.org/content/15/141…
  Quoted post: Happy to be part of the epic review "Opportunities and obstacles for deep learning in biology and medicine". At 123 pages and 552 refs, this encyclopedic review summarizes much of deep learning in biomedicine to date. Led by @anthonygitter @GreeneScientist doi.org/10.1101/142760
- Scientific Large Language Models: A Survey on Biological & Chemical Domains arxiv.org/abs/2401.14656 75 pages covering large language models for scientific text, proteins, genomes, molecules, and multi-modal inputs. Figure 3 here gives an idea of how grand in scope it is.
- The 310.ai Molecule Programming Model version 4 loves alpha helices. I scrolled through all 1,053 generated proteins in their repo and saw a small fraction of beta strands.
  Quoted post: First text2protein AI model, compressing billions of years of life. 800+ novel, functional and foldable proteins are discovered by researchers. Whitepaper and repo bit.ly/310paper