<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>csvoss.com</title>
    <description></description>
    <link>https://csvoss.com//</link>
    <atom:link href="https://csvoss.com//feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Fri, 05 Dec 2025 23:53:46 +0000</pubDate>
    <lastBuildDate>Fri, 05 Dec 2025 23:53:46 +0000</lastBuildDate>
    <generator>Jekyll v3.10.0</generator>
    
      
      <item>
        <title>A Mechanist&apos;s Guide to the Coronavirus Genome</title>
        <description>&lt;p&gt;Hello and welcome to my Coronavirus Genome Walkthrough.&lt;/p&gt;

&lt;p&gt;(Hoping someone comes out with that Vaccine Speedrun soon. This boss battle is really shaping up to be an intense one and we’ll need all the artifacts we can get.)&lt;/p&gt;

&lt;p&gt;Here, I aim to provide a &lt;em&gt;mechanistic explanation&lt;/em&gt; of the SARS-CoV-2 genome’s syntax and semantics. Let’s investigate what the SARS-CoV-2 viral genome actually does, reading through it the way a compiler reads code: from nucleotides to amino acids all the way to proteins. From the four nucleotides up to the completed protein-coated virus, what is a virus like this actually made of on the concrete, physical level?&lt;/p&gt;

&lt;h3 id=&quot;understanding-a-full-system&quot;&gt;Understanding a Full System&lt;/h3&gt;

&lt;p&gt;The underlying purpose of this essay is less about the coronavirus &lt;em&gt;per se&lt;/em&gt; and more about how having a small—but functionally complete—piece of viral RNA to analyze gives me a unique opportunity to try to understand a complete self-replicating machine from scratch. I would not have the fortitude to attempt this by hand with the full human genome, for example. The coronavirus genome, however, like the &lt;a href=&quot;http://openworm.org/&quot;&gt;nematode genome&lt;/a&gt;, is small enough that we stand a chance of building a complete understanding. The task is perhaps akin to &lt;a href=&quot;https://distill.pub/2018/building-blocks/&quot;&gt;interpretability&lt;/a&gt;, but for biological systems instead of artificial neural networks.&lt;/p&gt;

&lt;p&gt;As a consequence, this essay is not intended to produce epidemiological conclusions; there are plenty of other sources for that! This essay is about fully understanding a biological system at the chemical and physical level.&lt;/p&gt;

&lt;h3 id=&quot;play-curiosity-and-mechanical-understanding&quot;&gt;Play, Curiosity, and Mechanical Understanding&lt;/h3&gt;

&lt;p&gt;Throughout this essay, I follow my curiosity in the style of &lt;a href=&quot;https://en.wikipedia.org/wiki/Serious_play&quot;&gt;serious play&lt;/a&gt;: if I &lt;a href=&quot;https://www.readthesequences.com/Noticing-Confusion-Sequence&quot;&gt;notice I’m confused&lt;/a&gt; about something, I look into it and explore it until I’m satisfied that I now understand, and that my understanding is &lt;em&gt;a &lt;a href=&quot;https://plato.stanford.edu/entries/science-mechanisms/#ConMec&quot;&gt;mechanical&lt;/a&gt; understanding&lt;/em&gt;. Things are made of stuff! It turns out that we can understand that stuff!&lt;/p&gt;

&lt;p&gt;I may skip over some details that were not confusing to me during my own research, but your journey need not be the same as mine. If you’re confused about something while reading this essay, I encourage you to go and look it up! &lt;a href=&quot;http://agentyduck.blogspot.com/2015/06/the-art-of-noticing.html&quot;&gt;Notice&lt;/a&gt; when your curiosity arises; that’s the meditation. It’s always possible to discover the &lt;a href=&quot;http://samoburja.com/how-to-find-the-frontier-of-knowledge/&quot;&gt;frontier of your own knowledge&lt;/a&gt; and to expand it.&lt;/p&gt;

&lt;p&gt;This all, at least, has been my intention as I set out to create this piece! As Ken Liu said of his philosophy while translating The Three-Body Problem, “I may not have succeeded, but these were the standards I had in mind as I set about my task.”&lt;/p&gt;

&lt;p&gt;Part 1, here, covers just the genome and its translation to proteins. I hope to also write a Part 2 which would cover the structure and function of those proteins, their protein-protein interactions, and the full viral life cycle.&lt;/p&gt;

&lt;!--Finally, as you may already be able to tell, this essay also serves as a philosophical manifesto-by-example of how to think concretely about problems in biology. Along the way, I give some of my thoughts about the role of thermodynamics in molecular biology, legibility in complex systems, pedagogy, and the future of computational modeling.--&gt;

&lt;p&gt;Let’s get started.&lt;/p&gt;

&lt;h1 id=&quot;viruses&quot;&gt;Viruses&lt;/h1&gt;

&lt;p&gt;As a reminder, SARS-CoV-2 is a &lt;em&gt;positive-sense single-stranded RNA virus&lt;/em&gt;.&lt;/p&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;https://upload.wikimedia.org/wikipedia/commons/3/3b/HCV_EM_picture_2.png&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Positive-sense_single-stranded_RNA_virus&quot;&gt;Positive-sense single-stranded RNA virus - Wikipedia&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;en.wikipedia.org&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;What does this mean we can expect?&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;em&gt;Single-stranded&lt;/em&gt;: Its genome is a single strand of &lt;a href=&quot;https://en.wikipedia.org/wiki/RNA&quot;&gt;RNA&lt;/a&gt; (ssRNA).&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Positive-sense&lt;/em&gt;: That single strand of RNA can be immediately translated into protein by the ribosomes of the cell it infects.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From this we can also infer that one of the proteins the virus encodes must be &lt;em&gt;RNA-dependent RNA polymerase&lt;/em&gt; (RdRP), a protein which synthesizes new RNA given an RNA template. That’s right: RNA → RNA. However, according to the &lt;a href=&quot;https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology&quot;&gt;central dogma of molecular biology&lt;/a&gt;, isn’t RNA → RNA an unconscionable heresy? Correspondingly, our own cells do not make RdRP! All known positive-sense ssRNA viruses therefore &lt;em&gt;must encode&lt;/em&gt; RdRP in order to successfully commit this heresy.&lt;/p&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;/images/rdrp-white.png&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/RNA-dependent_RNA_polymerase&quot;&gt;RNA-dependent RNA polymerase - Wikipedia&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;en.wikipedia.org&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;…Wait a minute, the phrase “positive-sense ssRNA virus” implies the existence of &lt;em&gt;negative-sense&lt;/em&gt; viruses. If those don’t encode their proteins directly, how can they possibly work?&lt;/p&gt;

&lt;h2 id=&quot;positive-sense-and-negative-sense&quot;&gt;Positive sense and negative sense&lt;/h2&gt;

&lt;p&gt;Negative-sense ssRNA viruses also exist! Influenza, Ebola, and measles are examples.&lt;/p&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Ebola_Virus_TEM_PHIL_1832_lores.jpg/1200px-Ebola_Virus_TEM_PHIL_1832_lores.jpg&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Negative-sense_single-stranded_RNA_virus&quot;&gt;Negative-sense single-stranded RNA virus - Wikipedia&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;en.wikipedia.org&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The inner contents of &lt;em&gt;negative-sense&lt;/em&gt; ssRNA viruses consist not of a bare RNA genome but of a &lt;em&gt;ribonucleoprotein&lt;/em&gt;, which incorporates both an RNA genome and a cohort of viral proteins capable of replicating RNA. A negative-sense genome is the complement of the viral mRNA, so the host’s ribosomes cannot translate it directly; it must first be copied into positive-sense RNA, and the host cell has no enzyme for that job. That is why, unlike positive-sense ssRNA viruses, negative-sense ssRNA viruses must travel with a working copy of their RNA-replicating proteins. This ribonucleoprotein has enzymatic activity!&lt;/p&gt;
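&lt;p&gt;To make “sense” concrete: the two senses are reverse complements of each other, since A pairs with U, G pairs with C, and paired strands run antiparallel. Here is a minimal Python sketch of that relationship, using nucleotides 266 to 280 of Wuhan-Hu-1 (the start of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt; gene) as a positive-sense example. The function name is my own, not from any library.&lt;/p&gt;

```python
# Watson-Crick pairing for RNA: A-U and G-C.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the complementary strand, written 5' to 3' like the input."""
    # Complement each base, then reverse, because paired strands are antiparallel.
    return "".join(PAIR[nt] for nt in reversed(rna))


# Positive sense: nucleotides 266-280 of Wuhan-Hu-1, the start of Orf1ab.
plus = "AUGGAGAGCCUUGUC"
minus = reverse_complement(plus)

print(minus)                              # GACAAGGCUCUCCAU
print(reverse_complement(minus) == plus)  # True: complementing twice is a round trip
```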

&lt;h2 id=&quot;rdrp-as-drug-target&quot;&gt;RdRP as drug target&lt;/h2&gt;

&lt;p&gt;Since RdRP has (as far as I know) no legitimate purpose in human cells and is not encoded by them, might it offer a potential target for novel antiviral drugs?&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/24102407/&quot;&gt;Velkov et al. 2014&lt;/a&gt; explores RdRP as a drug target for antivirals against the &lt;a href=&quot;https://en.wikipedia.org/wiki/Henipavirus&quot;&gt;Hendra virus&lt;/a&gt;, a negative-sense ssRNA virus, though I am unable to find the full text.&lt;/p&gt;

&lt;!-- &lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;https://cdn.ncbi.nlm.nih.gov/pubmed/persistent/pubmed-meta-image.png&quot;&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/24102407/&quot;&gt;The RNA-dependent-RNA Polymerase, an Emerging Antiviral Drug Target for the Hendra Virus - PubMed&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;Australia is facing a major national medical challenge with the emergence of the Hendra virus (HeV) as a medically and economically important pathogen of humans and animals. Clinical symptoms of human HeV infection can include fever, hypotension, dizziness, encephalitis, respiratory haemorrhage and …&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;pubmed.ncbi.nlm.nih.gov&lt;/div&gt;&lt;/div&gt; --&gt;

&lt;blockquote&gt;
  &lt;p&gt;This review examines the current knowledge based on the multi-domain architecture of the Hendra RdRP and highlights which essential domain functions represent tangible targets for drug development against this deadly disease.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There must be some reason that developing antivirals against this protein is technically (or socially) complicated, or I’d have expected us to do it by now – there are a lot of RNA viruses that this drug target could theoretically hit. Flagging this discrepancy for further research.&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;h1 id=&quot;the-full-genome&quot;&gt;The full genome&lt;/h1&gt;

&lt;p&gt;Back to SARS-CoV-2! First, let’s get us a genome. Obviously this virus has seen some mutations as it’s spread around, as you can explore at &lt;a href=&quot;https://nextstrain.org/ncov/global&quot;&gt;Nextstrain&lt;/a&gt;, so we’ve technically got choices as to which version to analyze. For this essay I’ll just stick to analyzing &lt;em&gt;one&lt;/em&gt; version of the genome: Wuhan-Hu-1.&lt;/p&gt;

&lt;p&gt;As a reminder, each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;G&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; in a genome is one of the four &lt;a href=&quot;https://en.wikipedia.org/wiki/Nucleotide&quot;&gt;nucleotides&lt;/a&gt;: &lt;a href=&quot;https://en.wikipedia.org/wiki/Adenine&quot;&gt;adenine&lt;/a&gt;, &lt;a href=&quot;https://en.wikipedia.org/wiki/Guanine&quot;&gt;guanine&lt;/a&gt;, &lt;a href=&quot;https://en.wikipedia.org/wiki/Cytosine&quot;&gt;cytosine&lt;/a&gt;, and &lt;a href=&quot;https://en.wikipedia.org/wiki/Thymine&quot;&gt;thymine&lt;/a&gt;. There are actually &lt;a href=&quot;https://www.scripps.edu/romesberg/publications.html&quot;&gt;plenty of ways to engineer&lt;/a&gt; different &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/22850726/&quot;&gt;unnatural base pair systems&lt;/a&gt; by adding &lt;a href=&quot;https://science.sciencemag.org/content/363/6429/884&quot;&gt;artificial nucleotides&lt;/a&gt;, and these can even be integrated into &lt;a href=&quot;https://www.pnas.org/content/98/9/4922&quot;&gt;transcription&lt;/a&gt; and &lt;a href=&quot;https://www.nature.com/articles/nature24659&quot;&gt;translation&lt;/a&gt;, but &lt;a href=&quot;https://carlbrannen.wordpress.com/2007/06/13/why-does-dna-only-use-4-nucleotides/&quot;&gt;for&lt;/a&gt; &lt;a href=&quot;https://dreamerbiologist.wordpress.com/2013/02/16/why-did-nature-settle-on-just-four-nucleotides/&quot;&gt;whatever&lt;/a&gt; &lt;a href=&quot;https://www.pnas.org/content/114/32/E6476&quot;&gt;reason&lt;/a&gt;, these four &lt;a href=&quot;https://www.nature.com/articles/s41467-018-07389-2&quot;&gt;and not others&lt;/a&gt; are &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3331698/&quot;&gt;what life ultimately ended 
up with&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/nucleotides.png&quot; style=&quot;max-height: 600px&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;The four nucleotides in DNA.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;The genome of Wuhan-Hu-1 is available from &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3&quot;&gt;NCBI GenBank&lt;/a&gt;. Since SARS-CoV-2 is an RNA virus, each &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; in this string technically represents a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;U&lt;/code&gt;, for &lt;a href=&quot;https://en.wikipedia.org/wiki/Uracil&quot;&gt;uracil&lt;/a&gt;, RNA’s information-equivalent of thymine. The genome sequence is therefore:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1     AUUAAAGGUU UAUACCUUCC CAGGUAACAA ACCAACCAAC UUUCGAUCUC UUGUAGAUCU
61    GUUCUCUAAA CGAACUUUAA AAUCUGUGUG GCUGUCACUC GGCUGCAUGC UUAGUGCACU
121   CACGCAGUAU AAUUAAUAAC UAAUUACUGU CGUUGACAGG ACACGAGUAA CUCGUCUAUC

...

29761 ACAGUGAACA AUGCUAGGGA GAGCUGCCUA UAUGGAAGAG CCCUAAUGUG UAAAAUUAAU
29821 UUUAGUAGUG CUAUCCCCAU GUGAUUUUAA UAGCUUCUUA GGAGAAUGAC AAAAAAAAAA
29881 AAAAAAAAAA AAAAAAAAAA AAA
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;a target=&quot;blank&quot; class=&quot;cta-button&quot; style=&quot;position: relative;&quot; href=&quot;https://benchling.com/s/seq-28k9llmwnY475iv7ogwF/edit&quot;&gt;Follow along with the genome »&lt;/a&gt;&lt;/p&gt;
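&lt;p&gt;If you download the record yourself, for example as FASTA from the GenBank page linked above, the sequence arrives spelled with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt;. Below is a minimal sketch of reading a single-record FASTA string and swapping in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;U&lt;/code&gt;. The record is an illustrative stand-in holding only the first 60 nucleotides under a made-up header, and the helper names are hypothetical.&lt;/p&gt;

```python
def parse_fasta(text: str) -> tuple[str, str]:
    """Split a single-record FASTA string into (header, sequence)."""
    lines = text.strip().splitlines()
    header = lines[0].lstrip(">")
    # Sequence lines are wrapped; join them back into one string.
    sequence = "".join(line.strip() for line in lines[1:])
    return header, sequence


def dna_letters_to_rna(seq: str) -> str:
    """GenBank writes this RNA genome with T; swap in U (uracil)."""
    return seq.upper().replace("T", "U")


# Illustrative record: only the first 60 nucleotides, split over two lines.
record = """>MN908947.3 (excerpt)
ATTAAAGGTTTATACCTTCCCAGGTAACAA
ACCAACCAACTTTCGATCTCTTGTAGATCT
"""

header, dna = parse_fasta(record)
rna = dna_letters_to_rna(dna)
print(header)    # MN908947.3 (excerpt)
print(len(rna))  # 60
print(rna[:10])  # AUUAAAGGUU
```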

&lt;p&gt;That’s 29,903 nucleotides. Since there are only four possible nucleotides, each one carries at most log₂(4) = 2 bits of information, so the virus’s genome requires only about 7.5 kilobytes to store (29,903 × 2 = 59,806 bits, or about 7,476 bytes). That’s roughly as much data, byte for byte, as there are characters in this essay up to this point!&lt;/p&gt;
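&lt;p&gt;The arithmetic, plus a toy demonstration that 2 bits per nucleotide really is achievable: the sketch below packs four nucleotides into each byte. The A=0, C=1, G=2, U=3 encoding is my own arbitrary choice, not a standard file format.&lt;/p&gt;

```python
import math

GENOME_LENGTH = 29_903  # nucleotides in Wuhan-Hu-1 (MN908947.3)

# Four possible symbols carry at most log2(4) = 2 bits each.
BITS_PER_NT = math.log2(4)
print(GENOME_LENGTH * BITS_PER_NT / 8)  # 7475.75 bytes, about 7.5 kB

# Toy encoding (an assumption, not a standard format): 2 bits per base.
CODE = {"A": 0, "C": 1, "G": 2, "U": 3}


def pack(seq: str) -> bytes:
    """Pack an RNA string into bytes, four bases per byte, first base in the high bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for nt in chunk:
            byte = byte * 4 + CODE[nt]  # shift left by 2 bits, then add the new base
        byte *= 4 ** (4 - len(chunk))   # left-align a final partial chunk
        out.append(byte)
    return bytes(out)


first_60 = "AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGUAGAUCU"
print(len(first_60), len(pack(first_60)))  # 60 15
print(math.ceil(GENOME_LENGTH / 4))        # 7476 bytes for the whole genome
```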

&lt;!-- &lt;img src=&quot;https://s3-us-west-2.amazonaws.com/courses-images/wp-content/uploads/sites/110/2016/05/02212445/Figure_03_05_03.png&quot;&gt; --&gt;

&lt;p&gt;Lay out those 29,903 nucleobases along a ribose-phosphate backbone, reading them left to right &lt;a href=&quot;https://en.wikipedia.org/wiki/Directionality_(molecular_biology)&quot;&gt;from the 5’ end to the 3’ end&lt;/a&gt;, and bam – if that single molecule* were teleported into a cell, that’s 100% chemically sufficient** to infect a person with the plague du jour.&lt;/p&gt;

&lt;p&gt;*plus the 5’ cap, discussed below&lt;/p&gt;

&lt;p&gt;**modulo viral load effects??&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/polynucleotide.png&quot; style=&quot;max-height: 500px&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;How to interpret the Wuhan-Hu-1 genome as a complete molecule.&lt;/small&gt;&lt;/p&gt;

&lt;h2 id=&quot;poly-a-tail&quot;&gt;Poly-A tail&lt;/h2&gt;

&lt;p&gt;First question, and perhaps the most obvious one to the naked eye – what’s with all the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AAAAA&lt;/code&gt; at the end of the viral genome?&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;29821 ...                                                ... AAAAAAAAAA
29881 AAAAAAAAAA AAAAAAAAAA AAA
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;a target=&quot;blank&quot; class=&quot;cta-button&quot; style=&quot;position: relative;&quot; href=&quot;https://benchling.com/s/seq-28k9llmwnY475iv7ogwF/edit&quot;&gt;Follow along with the genome »&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s… yelling at us? Is it… suffering? Should we &lt;a href=&quot;https://reducing-suffering.org/is-there-suffering-in-fundamental-physics/&quot;&gt;help&lt;/a&gt;?&lt;/p&gt;

&lt;p&gt;Simple: It’s a &lt;a href=&quot;https://bioinformatics.stackexchange.com/questions/11227/why-does-the-sars-cov2-coronavirus-genome-end-in-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa&quot;&gt;3’ poly-A tail&lt;/a&gt;! This &lt;a href=&quot;https://en.wikipedia.org/wiki/Polyadenylation&quot;&gt;long tail of adenosine monomers&lt;/a&gt; is extremely common in both our own cells and in RNA viruses.&lt;/p&gt;

&lt;p&gt;Our own messenger RNA (mRNA) gets a poly-A tail when it’s freshly produced in the nucleus; the tail slows the mRNA’s degradation by the cell, allowing it to last long enough to be translated into protein. Naturally, if you’re a positive-sense RNA virus, you’re also going to want to last long enough to be translated into protein – so you need the same feature yourself.&lt;/p&gt;

&lt;p&gt;Genome 0.11% explained. So far so good!&lt;/p&gt;
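&lt;p&gt;Where does 0.11% come from? The tail is 33 nucleotides long, and 33 / 29,903 ≈ 0.11%. A quick Python check against the last two lines of the listing above (the helper name is my own):&lt;/p&gt;

```python
GENOME_LENGTH = 29_903

# The last 83 nucleotides (positions 29821-29903), copied from the listing.
TAIL_REGION = (
    "UUUAGUAGUG CUAUCCCCAU GUGAUUUUAA UAGCUUCUUA GGAGAAUGAC AAAAAAAAAA"
    " AAAAAAAAAA AAAAAAAAAA AAA"
).replace(" ", "")


def poly_a_length(seq: str) -> int:
    """Length of the run of A's at the 3' end of seq."""
    return len(seq) - len(seq.rstrip("A"))


tail = poly_a_length(TAIL_REGION)
print(len(TAIL_REGION))                      # 83
print(tail)                                  # 33
print(f"{100 * tail / GENOME_LENGTH:.2f}%")  # 0.11%
```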

&lt;h2 id=&quot;5-cap&quot;&gt;5’ cap&lt;/h2&gt;

&lt;p&gt;While we’re discussing chemical features of mRNA, note that the viral genome presumably must also have a &lt;a href=&quot;https://en.wikipedia.org/wiki/Five-prime_cap&quot;&gt;5’ cap&lt;/a&gt; – an extra &lt;a href=&quot;https://en.wikipedia.org/wiki/7-Methylguanosine&quot;&gt;7-methylguanosine&lt;/a&gt; at the 5’ end of its RNA strand – just like mRNAs do.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/5primecap.png&quot; style=&quot;max-height: 400px&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;A 5&apos; cap, consisting of a 7-methylguanosine as well as methylation of the first two ribose sugars.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;The cap is not directly shown in the viral genome sequence or mentioned in NCBI GenBank, but it is referenced in multiple papers discussing coronaviral genomes:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Since 2003, the outbreak of severe acute respiratory syndrome coronavirus has drawn increased attention and stimulated numerous studies on the molecular virology of coronaviruses. Here, we review the current understanding of the mechanisms adopted by coronaviruses to produce the 5′-cap structure and methylation modification of viral genomic RNAs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img style=&quot;display: none&quot; class=&quot;unfurl-embed-card-feature-image not-pretty&quot; src=&quot;https://media.springernature.com/w110/springer-static/cover/journal/12250.jpg&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://link.springer.com/article/10.1007/s12250-016-3726-4&quot;&gt;Molecular mechanisms of coronavirus RNA capping and methylation&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;The 5′-cap structures of eukaryotic mRNAs are important for RNA stability, pre-mRNA splicing, mRNA export, and protein translation. Many viruses have evolved mechanisms for generating their own cap structures with methylation at the N7 position of the capped guanine and the ribose 2′-O position of the first nucleotide, which help viral RNAs escape recognition by the host innate immune system. The RNA genomes of coronavirus were identified to have 5′-caps in the early 1980s. However, for decades the RNA capping mechanisms of coronaviruses remained unknown. Since 2003, the outbreak of severe acute respiratory syndrome coronavirus has drawn increased attention and stimulated numerous studies on the molecular virology of coronaviruses. Here, we review the current understanding of the mechanisms adopted by coronaviruses to produce the 5′-cap structure and methylation modification of viral genomic RNAs.&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;link.springer.com&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;Coronaviruses possess a cap structure at the 5′ ends of viral genomic RNA and subgenomic RNAs, which is generated through consecutive methylations by virally encoded guanine-N7-methyltransferase (N7-MTase) and 2′-O-methyltransferase (2′-O-MTase). The coronaviral N7-MTase is unique for its physical linkage with an exoribonuclease (ExoN) harbored in nonstructural protein 14 (nsp14) of coronaviruses.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img style=&quot;display: none&quot; class=&quot;unfurl-embed-card-feature-image not-pretty&quot; src=&quot;https://www.ncbi.nlm.nih.gov/corehtml/pmc/pmcgifs/pmc-logo-share.png&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3648086/&quot;&gt;Structure-Function Analysis of Severe Acute Respiratory Syndrome Coronavirus RNA Cap Guanine-N7-Methyltransferase&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;Coronaviruses possess a cap structure at the 5′ ends of viral genomic RNA and subgenomic RNAs, which is generated through consecutive methylations by virally encoded guanine-N7-methyltransferase (N7-MTase) and 2′-O-methyltransferase (2′-O-MTase). ...&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;www.ncbi.nlm.nih.gov&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;Here, we have reconstituted complete SARS-CoV mRNA cap methylation &lt;em&gt;in vitro&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img style=&quot;display: none&quot; class=&quot;unfurl-embed-card-feature-image not-pretty&quot; src=&quot;https://www.ncbi.nlm.nih.gov/corehtml/pmc/pmcgifs/pmc-logo-share.png&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2858705/&quot;&gt;&lt;i&gt;In Vitro&lt;/i&gt; Reconstitution of SARS-Coronavirus mRNA Cap Methylation&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;SARS-coronavirus (SARS-CoV) genome expression depends on the synthesis of a set of mRNAs, which presumably are capped at their 5′ end and direct the synthesis of all viral proteins in the infected cell. Sixteen viral non-structural proteins (nsp1 ...&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;www.ncbi.nlm.nih.gov&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Like the poly-A tail, the 5’ cap helps the genome to be recognized and translated by ribosomes rather than destroyed by the cell’s immune response.&lt;/p&gt;

&lt;p&gt;How does the virus even ensure that it receives a 5’ cap and a poly-A tail, not to mention its outer coat? Hopefully these questions will be resolved by our review of its genes… let’s move on to look at those!&lt;/p&gt;

&lt;h1 id=&quot;translation&quot;&gt;Translation&lt;/h1&gt;

&lt;p&gt;Per the “Features” section of the genome, again from &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3&quot;&gt;NCBI GenBank&lt;/a&gt;, here are the identifiable genes in this genome, in order:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1791269089&quot;&gt;orf1ab polyprotein&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1791269090&quot;&gt;surface glycoprotein&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf3a&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1791269091&quot;&gt;orf3a protein&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;E&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1791269092&quot;&gt;envelope protein&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;M&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1791269093&quot;&gt;membrane glycoprotein&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf6&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1791269094&quot;&gt;orf6 protein&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf7a&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1791269095&quot;&gt;orf7a protein&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf8&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1791269096&quot;&gt;orf8 protein&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1798172432&quot;&gt;nucleocapsid phosphoprotein&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf10&lt;/code&gt; (for &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/protein/1798172433&quot;&gt;orf10 protein&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s understand how these genes get translated into proteins.
&lt;!-- Let&apos;s go through them one by one. By the end, my hope is to understand how they fit together, what they each do, and which ones form which components of that classic spiky coronavirus image we see everywhere. --&gt;&lt;/p&gt;

&lt;h2 id=&quot;translation-of-orf1ab&quot;&gt;Translation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;This is the first gene in the genome and it is also by far the &lt;em&gt;longest&lt;/em&gt;, weighing in at 7,096 amino acids:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1    MESLVPGFNE KTHVQLSLPV LQVRDVLVRG FGDSVEEVLS EARQHLKDGT CGLVEVEKGV
61   LPQLEQPYVF IKRSDARTAP HGHVMVELVA ELEGIQYGRS GETLGVLVPH VGEIPVAYRK
121  VLLRKNGNKG AGGHSYGADL KSFDLGDELG TDPYEDFQEN WNTKHSSGVT RELMRELNGG

...

6961 LGGSVAIKIT EHSWNADLYK LMGHFAWWTA FVTNVNASSS EAFLIGCNYL GKPREQIDGY
7021 VMHANYIFWR NTNPIQLSSY SLFDMSKFPL KLRGTAVMSL KEGQINDMIL SLLSKGRLII
7081 RENNRVVISS DVLVNN
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;These letters are &lt;a href=&quot;https://en.wikipedia.org/wiki/Amino_acid#Table_of_standard_amino_acid_abbreviations_and_properties&quot;&gt;single-letter amino acid abbreviations&lt;/a&gt;.&lt;/p&gt;
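&lt;p&gt;The mapping from codons to these letters is the standard genetic code. As a sanity check, here is a minimal Python sketch that translates the first 55 nucleotides of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt; (genome positions 266 to 320, copied from the GenBank listing further down) and recovers the start of the protein above. The 64-letter string is the standard codon table laid out in U, C, A, G order; the helper names are my own.&lt;/p&gt;

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in U, C, A, G order
# (first base outermost). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate RNA codon by codon from its first base, stopping at a stop codon."""
    protein = []
    # Step in threes; a trailing partial codon is ignored.
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Genome positions 266-320 (spaced as in the GenBank listing).
orf1ab_start = "AUGGA GAGCCUUGUC CCUGGUUUCA ACGAGAAAAC ACACGUCCAA CUCAGUUUGC".replace(" ", "")
print(translate(orf1ab_start))  # MESLVPGFNEKTHVQLSL
```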

&lt;p&gt;It is quite long: this virus has 10 genes, and this single gene represents 71.2% of the viral genome.&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; More on this polypeptide’s structure and function later, but first: how do the underlying nucleotides of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt; gene produce these particular amino acids?&lt;/p&gt;

&lt;h3 id=&quot;a-thermodynamic-surprise-ribosomal-frameshift&quot;&gt;A thermodynamic surprise: Ribosomal frameshift&lt;/h3&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt; gene spans the range from nucleotide 266 to nucleotide 21,555, &lt;a href=&quot;https://stackoverflow.com/questions/39010041/what-is-the-meaning-of-exclusive-and-inclusive-when-describing-number-ranges&quot;&gt;inclusive&lt;/a&gt;. Nucleotides in this GenBank data are unfortunately 1-indexed, not &lt;a href=&quot;https://en.wikipedia.org/wiki/Zero-based_numbering&quot;&gt;0-indexed&lt;/a&gt;.&lt;/p&gt;
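&lt;p&gt;For anyone following along in code, here is a minimal Python sketch of the conversion, assuming the genome is held in an ordinary 0-indexed string: a 1-indexed inclusive range &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;start..end&lt;/code&gt; becomes the slice &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;start - 1 : end&lt;/code&gt;. It also reproduces the 71.2% figure above. The helper name is hypothetical.&lt;/p&gt;

```python
GENOME_LENGTH = 29_903  # Wuhan-Hu-1 (MN908947.3), poly-A tail included


def genbank_to_slice(start: int, end: int) -> slice:
    """Convert a 1-indexed, inclusive GenBank range start..end to a Python slice.

    Python slices are 0-indexed and exclude their stop, so only the start shifts.
    """
    return slice(start - 1, end)


orf1ab = genbank_to_slice(266, 21555)
length = orf1ab.stop - orf1ab.start
print(orf1ab)                                  # slice(265, 21555, None)
print(length)                                  # 21290
print(f"{100 * length / GENOME_LENGTH:.1f}%")  # 71.2%
```

&lt;p&gt;With the full genome loaded as a string named, say, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;genome&lt;/code&gt;, the expression &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;genome[orf1ab]&lt;/code&gt; would pull out the gene region.&lt;/p&gt;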

&lt;p&gt;We can see at nucleotide 266 the signature &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AUG&lt;/code&gt; of a start codon, and at nucleotide 21,553 the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UAA&lt;/code&gt; of an &lt;a href=&quot;https://en.wikipedia.org/wiki/Stop_codon&quot;&gt;ochre stop codon&lt;/a&gt;. So far so good!&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;241   ...                     ...AUGGA GAGCCUUGUC CCUGGUUUCA ACGAGAAAAC
301   ACACGUCCAA CUCAGUUUGC CUGUUUUACA GGUUCGCGAC GUGCUCGUAC GUGGCUUUGG
361   AGACUCCGUG GAGGAGGUCU UAUCAGAGGC ACGUCAACAU CUUAAAGAUG GCACUUGUGG

...

21481 CUUAGUAAAG GUAGACUUAU AAUUAGAGAA AACAACAGAG UUGUUAUUUC UAGUGAUGUU
21541 CUUGUUAACA ACUAA...                                           ...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;a target=&quot;blank&quot; class=&quot;cta-button&quot; style=&quot;position: relative;&quot; href=&quot;https://benchling.com/s/seq-28k9llmwnY475iv7ogwF/edit&quot;&gt;Follow along with the genome »&lt;/a&gt;&lt;/p&gt;
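&lt;p&gt;The same start-and-stop check in a few lines of Python, using only the two stretches of sequence shown above. This is a sketch; the helper name and the position arithmetic are mine.&lt;/p&gt;

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codon_at(fragment: str, fragment_start: int, position: int) -> str:
    """Return the codon beginning at 1-indexed genome position 'position'.

    'fragment' is a stretch of the genome whose first base sits at the
    1-indexed genome position 'fragment_start'.
    """
    offset = position - fragment_start
    return fragment[offset:offset + 3]


# Stretches copied from the listing above, spaces removed.
start_region = "AUGGAGAGCCUUGUC"  # genome positions 266-280
end_region = "CUUGUUAACAACUAA"    # genome positions 21541-21555

print(codon_at(start_region, 266, 266))  # AUG
stop = codon_at(end_region, 21541, 21553)
print(stop, stop in STOP_CODONS)         # UAA True
```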

&lt;p&gt;However, confusingly, the length of this coding region is 21,555 - 265 = &lt;u&gt;21,290&lt;/u&gt;, which is not divisible by 3. Usually, 3 nucleotides = 1 amino acid, so a gene’s length is typically divisible by 3. What’s going on?&lt;/p&gt;

&lt;p&gt;Note that in the GenBank data the gene is tagged &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ribosomal_slippage&lt;/code&gt;. Also note that in GenBank the gene’s region is notated as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;join(266..13468,13468..21555)&lt;/code&gt; instead of just &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;266..21555&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After some research, the answer here is that nucleotide 13,468 is actually used &lt;em&gt;twice&lt;/em&gt;, thanks to a &lt;em&gt;-1 ribosomal frameshift&lt;/em&gt;, a fascinating thermodynamic-biochemical quirk of certain viral genomes!&lt;/p&gt;

&lt;p&gt;Per &lt;a href=&quot;https://viralzone.expasy.org/860&quot;&gt;this article&lt;/a&gt; on ribosomal frameshifting in viruses:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Programmed ribosomal frameshifting  is an alternate mechanism of translation to merge proteins encoded by two overlapping open reading frames. The frameshift occurs at low frequency and consists of ribosomes slipping by one base in either the 5’(-1) or 3’(+1) directions during translation. Some viruses contains both a +1 and a -1 ribosomal frameshift. […]&lt;/p&gt;

  &lt;p&gt;All cis-acting frameshift signals encoded in mRNAs are minimally composed of two functional elements: a &lt;strong&gt;heptanucleotide “slippery sequence”&lt;/strong&gt; conforming to the general form &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XXXYYYZ&lt;/code&gt;, followed by an RNA structural element, usually an H-type RNA pseudoknot, positioned an optimal number of nucleotides (5 to 9) downstream.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If we look around nucleotide 13,468, we do in fact find the responsible heptanucleotide “slippery sequence”: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUUAAAC&lt;/code&gt;, at nucleotides 13,462 through 13,468. That final &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; is nucleotide 13,468, and it ends up being read twice by the ribosome.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;13441 GUCAGCUGAU GCACAAUCGU UUUUAAACGG GUUUGCGGUG UAAGUGCAGC CCGUCUUACA
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;a target=&quot;blank&quot; class=&quot;cta-button&quot; style=&quot;position: relative;&quot; href=&quot;https://benchling.com/s/seq-28k9llmwnY475iv7ogwF/edit&quot;&gt;Follow along with the genome »&lt;/a&gt;&lt;/p&gt;
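&lt;p&gt;To make the double-read nucleotide concrete, here is a sketch that reads codons from the 13,441 line of the listing above in two ways: staying in the original (0) frame, and resuming one base back at 13,468, as the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;join(266..13468,13468..21555)&lt;/code&gt; annotation describes. It only models which bases end up in which codon, not the thermodynamics; the helper &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;codons_from&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
# The 13,441..13,500 line of the listing above (1-based GenBank coordinates).
EXCERPT_START = 13441
EXCERPT = "GUCAGCUGAUGCACAAUCGUUUUUAAACGGGUUUGCGGUGUAAGUGCAGCCCGUCUUACA"
ORF_START = 266                      # the Orf1ab AUG fixes the 0 frame
STOPS = {"UAA", "UAG", "UGA"}


def codons_from(pos: int, count: int) -> list[tuple[int, str]]:
    """Read `count` consecutive codons starting at 1-based genome position `pos`."""
    codons = []
    for k in range(count):
        p = pos + 3 * k
        i = p - EXCERPT_START        # 0-based index into EXCERPT
        codons.append((p, EXCERPT[i:i + 3]))
    return codons


assert (13460 - ORF_START) % 3 == 0  # 13,460 starts a codon in the 0 frame

# No slip: keep reading in the 0 frame.
zero_frame = codons_from(13460, 8)
# -1 slip: after UUU UUA AAC, reading resumes one base back, at 13,468,
# so the C at 13,468 is read twice (end of AAC, start of CGG).
shifted = codons_from(13460, 3) + codons_from(13468, 5)

first_stop = next(p for p, c in zero_frame if c in STOPS)
print(" ".join(c for _, c in zero_frame))   # UUU UUA AAC GGG UUU GCG GUG UAA
print(" ".join(c for _, c in shifted))      # UUU UUA AAC CGG GUU UGC GGU GUA
print(first_stop)                           # 13481
```

&lt;p&gt;Note that the unshifted frame runs into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UAA&lt;/code&gt; stop codon at 13,481, just a few codons downstream, while the shifted frame keeps going.&lt;/p&gt;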

&lt;p&gt;This frameshift gives us a total length of &lt;u&gt;21,291&lt;/u&gt; nucleotides. Subtract 3 for the stop codon and then divide by 3, and we get a number which matches the reported protein sequence’s length: 7,096 amino acids. Hooray!&lt;/p&gt;
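&lt;p&gt;The same arithmetic can be done directly on the GenBank location strings. This minimal sketch only reads plain &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;start..end&lt;/code&gt; ranges, optionally wrapped in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;join(...)&lt;/code&gt;. It is not a general GenBank location parser: real locations can also use operators such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;complement(...)&lt;/code&gt;, which it does not interpret.&lt;/p&gt;

```python
import re


def location_length(location: str) -> int:
    """Total nucleotides in a simple GenBank location like 'join(a..b,c..d)'.

    Each 'start..end' range is 1-based and inclusive, so it covers
    end - start + 1 nucleotides. The shared 13468 in the joined location is
    deliberately counted twice, because the ribosome reads it twice.
    """
    ranges = re.findall(r"(\d+)\.\.(\d+)", location)
    return sum(int(end) - int(start) + 1 for start, end in ranges)


naive = location_length("266..21555")
joined = location_length("join(266..13468,13468..21555)")
print(naive, naive % 3)        # 21290 2
print(joined, joined % 3)      # 21291 0
print((joined - 3) // 3)       # 7096
```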

&lt;p&gt;So, the math checks out. We now know what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ribosomal_slippage&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;join(266..13468,13468..21555)&lt;/code&gt; mean, and we know how these 21,290 nucleotides become 7,096 amino acids. However, I still have two questions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;em&gt;What??&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;How does ribosomal frameshifting even work??&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;ribosomal-frameshifting-at-the-molecular-level&quot;&gt;Ribosomal frameshifting at the molecular level&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Thermodynamic control of -1 programmed ribosomal frameshifting&lt;/em&gt; (&lt;a href=&quot;https://www.nature.com/articles/s41467-019-12648-x&quot;&gt;Bock et al. 2019&lt;/a&gt;) explores how ribosomal frameshifting happens by performing free-energy molecular dynamics simulations. This paper also explains the structure and function of that heptanucleotide slippery sequence, stating:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Spontaneous ribosome slippage is a rare event that occurs, on average, once in 10&lt;sup&gt;4&lt;/sup&gt;–10&lt;sup&gt;5&lt;/sup&gt; codons. This low spontaneous frameshifting increases dramatically on particular mRNAs that contain sequences for programmed ribosomal frameshifting (PRF). PRF &lt;strong&gt;requires a slippery sequence&lt;/strong&gt;, which usually comprises a X XXY YYZ heptamer, where XXX and YYY are triplets of identical bases and Z is any nucleotide, which allows for &lt;strong&gt;cognate pairing&lt;/strong&gt; of the P-site and A-site tRNAs in the 0-frame and −1-frame. The nature of the tRNAs bound to the slippery site codons is critical, including the modifications of nucleotides in the anticodon loop (i.e., at positions 34 and 37 of the tRNA).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;https://media.springernature.com/m685/springer-static/image/art%3A10.1038%2Fs41467-019-12648-x/MediaObjects/41467_2019_12648_Fig1_HTML.png&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://www.nature.com/articles/s41467-019-12648-x&quot;&gt;Thermodynamic control of −1 programmed ribosomal frameshifting&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;Programmed ribosomal frameshifting (PRF) is an alternative translation strategy that causes controlled slippage of the ribosome along the mRNA, changing the sequence of the synthesized protein. Here the authors provide a thermodynamic framework that explains how mRNA sequence determines the efficiency of frameshifting.&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;www.nature.com&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This paper goes on to analyze several heptanucleotide slippery sequences, drawing examples from the &lt;em&gt;E. coli&lt;/em&gt; &lt;a href=&quot;https://biocyc.org/gene?orgid=ECOLI&amp;amp;id=EG10245&quot;&gt;&lt;em&gt;dnaX&lt;/em&gt;&lt;/a&gt; gene and explaining their thermodynamic characteristics. Per their breakdown, each example sequence depends on one or more of the following &lt;span id=&quot;wobble-pairings&quot;&gt;wobble pairings&lt;/span&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;The U·G wobble pair.&lt;/strong&gt; Per &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1083677&quot;&gt;Varani and McClain 2000&lt;/a&gt;, the U·G wobble pair “has comparable thermodynamic stability to Watson–Crick base pairs and is nearly isomorphic to them.”&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;A·A and U·U mismatches.&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;G·S and A·S pairs.&lt;/strong&gt; Per Bock et al. 2019, &lt;em&gt;E. coli&lt;/em&gt; “has a single tRNA&lt;sup&gt;Lys&lt;/sup&gt; isoacceptor (anticodon &lt;sup&gt;3’&lt;/sup&gt;UUS&lt;sup&gt;5’&lt;/sup&gt;) for decoding the two Lys codons, AAG and AAA,” where “S denotes the modified nucleotide &lt;a href=&quot;https://pubchem.ncbi.nlm.nih.gov/compound/5-Methylaminomethyl-2-thiouridine&quot;&gt;mnm5s2U&lt;/a&gt;.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, one of the slippery sequences explained in Bock et al. is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUUUAAG&lt;/code&gt;. When the ribosome reads this sequence, it initially parses it as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;..U&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUU&lt;/code&gt;&lt;sup&gt;Phe&lt;/sup&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AAG&lt;/code&gt;&lt;sup&gt;Lys&lt;/sup&gt;, but then slips back by one nucleotide into the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;...&lt;/code&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUU&lt;/code&gt;&lt;sup&gt;Phe&lt;/sup&gt; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UAA&lt;/code&gt;&lt;sup&gt;Lys&lt;/sup&gt; reading frame. Although &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UAA&lt;/code&gt; is normally read as a stop codon, here the tRNA&lt;sup&gt;Lys&lt;/sup&gt; stays paired to it via the combination of a U·U mismatch and an A·S pair.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/frameshift.png&quot; style=&quot;max-height: 400px&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
The UUUUAAG slippery sequence.
&lt;/small&gt;&lt;/p&gt;
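&lt;p&gt;The pairing bookkeeping in this example can be spelled out with a small sketch. It lines up a codon (read 5’→3’) against an anticodon written 3’→5’ and labels each position with the categories listed above. It is a lookup table only, with no thermodynamics, and the function names are hypothetical:&lt;/p&gt;

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(codon_base: str, anticodon_base: str) -> str:
    """Classify one codon base against the anticodon base facing it.

    'S' stands for the modified uridine mnm5s2U, as in Bock et al. 2019.
    """
    if anticodon_base == "S":
        return f"{codon_base}·S"
    if (codon_base, anticodon_base) in WATSON_CRICK:
        return "Watson-Crick"
    if (codon_base, anticodon_base) in WOBBLE:
        return "U·G wobble"
    return f"{codon_base}·{anticodon_base} mismatch"


def pairing(codon: str, anticodon_3to5: str) -> list[str]:
    """Pair a codon read 5'->3' with an anticodon written 3'->5'."""
    return [pair_type(c, a) for c, a in zip(codon, anticodon_3to5)]


SLIPPERY = "UUUUAAG"
TRNA_LYS = "UUS"                 # E. coli tRNA-Lys anticodon, 3'-UUS-5'

a_site_zero = SLIPPERY[4:7]      # AAG in the 0 frame
a_site_shifted = SLIPPERY[3:6]   # UAA in the -1 frame
print(a_site_zero, pairing(a_site_zero, TRNA_LYS))
print(a_site_shifted, pairing(a_site_shifted, TRNA_LYS))
```

&lt;p&gt;In both frames the P-site codon is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUU&lt;/code&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SLIPPERY[1:4]&lt;/code&gt; versus &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SLIPPERY[0:3]&lt;/code&gt;), so its pairing with the tRNA&lt;sup&gt;Phe&lt;/sup&gt; is unchanged by the slip.&lt;/p&gt;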

&lt;p&gt;Unfortunately, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUUUAAG&lt;/code&gt; isn’t the sequence we’re interested in if we want to understand SARS-CoV-2! We need our &lt;em&gt;particular&lt;/em&gt; heptanucleotide slippery sequence of interest, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUUAAAC&lt;/code&gt;, and despite this paper’s thoroughness and usefulness, none of its examples involve it. How can we be sure that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUUAAAC&lt;/code&gt; has the thermodynamic properties that it needs in order for SARS-CoV-2 to be able to produce a protein here?&lt;/p&gt;

&lt;p&gt;After &lt;a href=&quot;https://twitter.com/csvoss/status/1271931010335125504?s=20&quot;&gt;some investigation&lt;/a&gt;, I finally stumbled upon &lt;em&gt;Mutational Analysis of the “Slippery-sequence” Component of a Coronavirus Ribosomal Frameshifting Signal&lt;/em&gt; (&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/002228369290901U&quot;&gt;Brierley, Jenner, and Inglis 1992&lt;/a&gt;), a research paper which covers exactly this same heptanucleotide sequence (and in the context of coronaviruses, too), and even gives a helpful diagram!&lt;/p&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img style=&quot;display: none&quot; class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;https://ars.els-cdn.com/content/image/S00222836.gif&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/002228369290901U&quot;&gt;Mutational analysis of the “slippery-sequence” component of a coronavirus ribosomal frameshifting signal&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;The ribosomal frameshift signal in the genomic RNA of the coronavirus IBV is composed of two elements, a heptanucleotide “slippery-sequence” and a dow…&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;www.sciencedirect.com&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/brierley_transparent.png&quot; style=&quot;max-height: 600px&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
The UUUAAAC slippery sequence.
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;The paper performs some experiments and confirms how the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUUAAAC&lt;/code&gt; slippery sequence works:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;First, a tRNA&lt;sup&gt;Leu&lt;/sup&gt; and a tRNA&lt;sup&gt;Asn&lt;/sup&gt; bind to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUA&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AAC&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;After a -1 frameshift, those two tRNAs are now wobble-paired to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUU&lt;/code&gt; (with a U·U mismatch) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AAA&lt;/code&gt; (with an A·G mismatch&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;).&lt;/li&gt;
  &lt;li&gt;Translation then proceeds as normal in the new frame, with the next codon, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CGG&lt;/code&gt;, decoded by a tRNA&lt;sup&gt;Arg&lt;/sup&gt;.&lt;/li&gt;
&lt;/ol&gt;
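&lt;p&gt;The same bookkeeping for the SARS-CoV-2 sequence can be sketched as follows. The anticodons are an assumption: I use the unmodified anticodons that pair perfectly with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UUA&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AAC&lt;/code&gt;, ignoring the base modifications that real tRNAs carry.&lt;/p&gt;

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def describe(codon: str, anticodon_3to5: str) -> list[str]:
    """Label each codon position (5'->3') against the anticodon base facing it."""
    labels = []
    for c, a in zip(codon, anticodon_3to5):
        if (c, a) in WATSON_CRICK:
            labels.append("Watson-Crick")
        elif (c, a) in WOBBLE:
            labels.append("U·G wobble")
        else:
            labels.append(f"{c}·{a} mismatch")
    return labels


SLIPPERY = "UUUAAAC"                      # nucleotides 13,462..13,468
# Assumed, unmodified anticodons written 3'->5' (real tRNAs carry modifications).
TRNAS = {
    "Leu": ("AAU", SLIPPERY[1:4], SLIPPERY[0:3]),   # UUA in 0 frame, UUU in -1 frame
    "Asn": ("UUG", SLIPPERY[4:7], SLIPPERY[3:6]),   # AAC in 0 frame, AAA in -1 frame
}

for name, (anticodon, zero, shifted) in TRNAS.items():
    print(f"tRNA-{name}: {zero} {describe(zero, anticodon)}")
    print(f"   after -1 slip: {shifted} {describe(shifted, anticodon)}")
```

&lt;p&gt;Each tRNA keeps two Watson–Crick pairs after the slip and picks up a single mismatch at the third codon position, matching steps 1 and 2 above.&lt;/p&gt;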

&lt;p&gt;I’m still a little weirded out by ribosomal frameshift, but satisfied.&lt;/p&gt;

&lt;!-- (By the way, you might notice that the above diagram depicts the ribosome as having a [P-site](https://en.wikipedia.org/wiki/P-site) and an [A-site](https://en.wikipedia.org/wiki/A-site), but no [E-site](https://en.wikipedia.org/wiki/E-site)! Note the paper is from 1992. The E-site was discovered recently enough that my own high school biology teacher had stories of having to change the curriculum when the new textbooks came in and, surprise surprise, the textbook started showing ribosomes as having an E-site. It appears that the universality of the E-site across all domains of life [was only confirmed](https://www.researchgate.net/publication/27268735_Features_and_functions_of_the_ribosomal_E_site) in the mid-1990s, although I&apos;m unable to find a fully conclusive source for this.) --&gt;

&lt;h3 id=&quot;partial-translation-of-orf1ab&quot;&gt;Partial translation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;One last detail while we’re discussing the gene &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt;. Numerous papers I’ve read mention that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt; actually produces two protein products: its complete product (named &lt;em&gt;pp1ab&lt;/em&gt;), and a partial translation (named &lt;em&gt;pp1a&lt;/em&gt;). The partial product comes from ribosomes that do not undergo the frameshift: they keep reading in the original frame and soon hit a stop codon, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UAA&lt;/code&gt; at nucleotides 13,481 through 13,483 (visible in the listing above). That first part of the sequence, up to this stop codon, can be called the gene &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1a&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, from &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2637536/&quot;&gt;Graham et al. 2008&lt;/a&gt; on the SARS coronavirus:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Translation of ORF1a results in a theoretical polyprotein of ∼500 kDa, while translation of ORF1ab results an ∼800 kDa polyprotein.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;The ORF 1a and 1ab polyproteins are not detected during infection, since they are most likely processed co- and post-translationally into intermediate and mature proteins by proteinase activities in the nascent polyproteins.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both of these genes produce &lt;em&gt;polyproteins&lt;/em&gt; that actually get &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/gene/1489680&quot;&gt;chopped up into smaller proteins&lt;/a&gt; before going on to carry out their function, so the early termination that yields &lt;em&gt;pp1a&lt;/em&gt; ends up being of little consequence, except inasmuch as it reduces how much &lt;em&gt;pp1ab&lt;/em&gt; gets translated.&lt;/p&gt;
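&lt;p&gt;As a rough sanity check on the sizes quoted above, here is a sketch that turns coordinates into protein lengths and approximate masses. It rests on two assumptions: that pp1a ends at the 0-frame &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UAA&lt;/code&gt; at nucleotides 13,481 through 13,483 visible in the slippery-sequence listing, and that an average amino-acid residue weighs about 110 Da, which is only a rule of thumb.&lt;/p&gt;

```python
# 1-based, inclusive GenBank-style coordinates.
ORF1A = [(266, 13483)]                    # 0 frame until the UAA at 13,481..13,483
ORF1AB = [(266, 13468), (13468, 21555)]   # join(266..13468,13468..21555)
AVG_RESIDUE_DA = 110                      # assumed average residue mass (rule of thumb)


def protein_length(segments: list[tuple[int, int]]) -> int:
    """Amino acids encoded by the joined segments, excluding the stop codon."""
    nucleotides = sum(end - start + 1 for start, end in segments)
    if nucleotides % 3:
        raise ValueError(f"{nucleotides} nt is not a whole number of codons")
    return nucleotides // 3 - 1


for name, segments in (("pp1a", ORF1A), ("pp1ab", ORF1AB)):
    aa = protein_length(segments)
    print(f"{name}: {aa} aa, roughly {aa * AVG_RESIDUE_DA / 1000:.0f} kDa")
```

&lt;p&gt;This gives 4,405 and 7,096 amino acids, or very roughly 485 and 781 kDa, in the same ballpark as the ∼500 kDa and ∼800 kDa quoted above.&lt;/p&gt;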

&lt;h2 id=&quot;translation-of-all-the-other-genes&quot;&gt;Translation of all the other genes&lt;/h2&gt;

&lt;p&gt;This concludes our analysis of the translation of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt; gene. Genome 71.20% explained so far!&lt;/p&gt;

&lt;p&gt;In comparison, the remaining nine genes are fairly uneventful. They all start with a start codon (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AUG&lt;/code&gt;), end with a stop codon (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UAA&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UGA&lt;/code&gt;, or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;UAG&lt;/code&gt;), and don’t try to do anything tricky in between.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Gene&lt;/th&gt;
      &lt;th&gt;Start Nucleotide&lt;/th&gt;
      &lt;th&gt;End Nucleotide&lt;/th&gt;
      &lt;th&gt;Gene Length (nt)&lt;/th&gt;
      &lt;th&gt;Polypeptide Length (aa)&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;S&lt;/code&gt;   &lt;/td&gt;
      &lt;td&gt;21563&lt;/td&gt;
      &lt;td&gt;25384&lt;/td&gt;
      &lt;td&gt;3822&lt;/td&gt;
      &lt;td&gt;1273&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf3a&lt;/code&gt;   &lt;/td&gt;
      &lt;td&gt;25393&lt;/td&gt;
      &lt;td&gt;26220&lt;/td&gt;
      &lt;td&gt;828&lt;/td&gt;
      &lt;td&gt;275&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;E&lt;/code&gt;   &lt;/td&gt;
      &lt;td&gt;26245&lt;/td&gt;
      &lt;td&gt;26472&lt;/td&gt;
      &lt;td&gt;228&lt;/td&gt;
      &lt;td&gt;75&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;M&lt;/code&gt;   &lt;/td&gt;
      &lt;td&gt;26523&lt;/td&gt;
      &lt;td&gt;27191&lt;/td&gt;
      &lt;td&gt;669&lt;/td&gt;
      &lt;td&gt;222&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf6&lt;/code&gt;   &lt;/td&gt;
      &lt;td&gt;27202&lt;/td&gt;
      &lt;td&gt;27387&lt;/td&gt;
      &lt;td&gt;186&lt;/td&gt;
      &lt;td&gt;61&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf7a&lt;/code&gt;   &lt;/td&gt;
      &lt;td&gt;27394&lt;/td&gt;
      &lt;td&gt;27759&lt;/td&gt;
      &lt;td&gt;366&lt;/td&gt;
      &lt;td&gt;121&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf8&lt;/code&gt;   &lt;/td&gt;
      &lt;td&gt;27894&lt;/td&gt;
      &lt;td&gt;28259&lt;/td&gt;
      &lt;td&gt;366&lt;/td&gt;
      &lt;td&gt;121&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;   &lt;/td&gt;
      &lt;td&gt;28274&lt;/td&gt;
      &lt;td&gt;29533&lt;/td&gt;
      &lt;td&gt;1260&lt;/td&gt;
      &lt;td&gt;419&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf10&lt;/code&gt;   &lt;/td&gt;
      &lt;td&gt;29558&lt;/td&gt;
      &lt;td&gt;29674&lt;/td&gt;
      &lt;td&gt;117&lt;/td&gt;
      &lt;td&gt;38&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;
By their powers combined, that explains 97.42% of the genome. If you take a look at &lt;a href=&quot;https://benchling.com/s/seq-28k9llmwnY475iv7ogwF/edit&quot;&gt;the rest&lt;/a&gt;, you’ll see that there isn’t much left unaccounted for. Add the two untranslated regions, the 265-nucleotide 5’ UTR and the 229-nucleotide 3’ UTR (which includes the poly-A tail), and that covers 99.07% of the genome! The remaining 277 nucleotides are scattered in the gaps between the ten genes.&lt;/p&gt;
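&lt;p&gt;Here is a sketch of that bookkeeping, using the coordinates from the table above plus &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Orf1ab&lt;/code&gt; and the two UTRs. Counting covered positions as a set, rather than summing lengths, keeps the result correct even if two annotated regions happened to overlap.&lt;/p&gt;

```python
GENOME_LENGTH = 29_903

# (start, end) for each gene, 1-based and inclusive, from the GenBank annotation.
GENES = {
    "Orf1ab": (266, 21555),
    "S": (21563, 25384),
    "Orf3a": (25393, 26220),
    "E": (26245, 26472),
    "M": (26523, 27191),
    "Orf6": (27202, 27387),
    "Orf7a": (27394, 27759),
    "Orf8": (27894, 28259),
    "N": (28274, 29533),
    "Orf10": (29558, 29674),
}
UTRS = {"5' UTR": (1, 265), "3' UTR": (29675, GENOME_LENGTH)}


def covered(regions: dict[str, tuple[int, int]]) -> set[int]:
    """Set of genome positions covered by any region (overlaps counted once)."""
    return {p for start, end in regions.values() for p in range(start, end + 1)}


orf1ab = covered({"Orf1ab": GENES["Orf1ab"]})
genes = covered(GENES)
everything = genes | covered(UTRS)

print(f"Orf1ab alone: {len(orf1ab) / GENOME_LENGTH:.2%}")          # 71.20%
print(f"all genes:    {len(genes) / GENOME_LENGTH:.2%}")           # 97.42%
print(f"genes + UTRs: {len(everything) / GENOME_LENGTH:.2%}")      # 99.07%
print(f"unexplained:  {GENOME_LENGTH - len(everything)} nt")       # 277 nt
```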

&lt;p&gt;I now feel basically confident that I know what the nucleotides get translated to!&lt;/p&gt;

&lt;h2 id=&quot;secondary-structure&quot;&gt;Secondary structure&lt;/h2&gt;

&lt;p&gt;No discussion of the structure and function of a long, single-stranded piece of RNA is complete without a discussion on secondary structure.&lt;/p&gt;

&lt;p&gt;Yes, RNA has secondary structure too: it’s not just for proteins! Just like double-helical DNA binds one strand to another, a single strand of RNA can bind to itself when regions have sufficiently complementary nucleotides, forming stems, hairpin loops, and yet more complex 3D structures.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/secondary-structures.png&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;pseudoknot&lt;/em&gt; may sound familiar – it’s mentioned back in our discussion on ribosomal frameshifting:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;All cis-acting frameshift signals encoded in mRNAs are minimally composed of two functional elements: a heptanucleotide “slippery sequence” conforming to the general form &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;XXXYYYZ&lt;/code&gt;, followed by an RNA structural element, &lt;strong&gt;usually an H-type RNA pseudoknot,&lt;/strong&gt; positioned an optimal number of nucleotides (5 to 9) downstream.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;One of the more complex examples of RNA secondary structure, the pseudoknot was first discovered in the &lt;a href=&quot;https://en.wikipedia.org/wiki/Turnip_yellow_mosaic_virus&quot;&gt;turnip yellow mosaic virus&lt;/a&gt;, which is itself another single-stranded positive-sense RNA virus just like the coronavirus.&lt;/p&gt;

&lt;div class=&quot;unfurl-embed-card&quot;&gt;&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Pseudoknot.svg/1200px-Pseudoknot.svg.png&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Pseudoknot&quot;&gt;Pseudoknot - Wikipedia&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;en.wikipedia.org&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;What’s an &lt;em&gt;H-type RNA pseudoknot&lt;/em&gt;, though? According to &lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2661829/&quot;&gt;Cao and Chen 2009&lt;/a&gt;,&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;An H-type pseudoknot is formed by base-pairing between a hairpin loop and the single-stranded region of the hairpin. The structure consists of two &lt;strong&gt;helix stems&lt;/strong&gt; and &lt;strong&gt;two loops&lt;/strong&gt; as well as a possible &lt;strong&gt;third loop/junction&lt;/strong&gt; that connects the two helix stems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Perhaps we can determine the secondary structure of the SARS-CoV-2 genome, or if not, at least determine the secondary structure for the pseudoknot, as that part serves an important regulatory function.&lt;/p&gt;

&lt;p&gt;As we know from our study of ribosomal frameshifting, the pseudoknot should occur around 5 to 9 nucleotides downstream of the slippery sequence. Here’s a snippet of the surrounding genome sequence again:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;13441                 ...GU UUUUAAACGG GUUUGCGGUG UAAGUGCAGC CCGUCUUACA
13501 CCGUGCGGCA CAGGCACUAG UACUGAUGUC GUAUACAGGG CUUUUG...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;a target=&quot;blank&quot; class=&quot;cta-button&quot; style=&quot;position: relative;&quot; href=&quot;https://benchling.com/s/seq-28k9llmwnY475iv7ogwF/edit&quot;&gt;Follow along with the genome »&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For RNAs shorter than 4000 nucleotides, there exist some online tools such as &lt;a href=&quot;http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAfold.cgi&quot;&gt;RNAfold&lt;/a&gt; that can make predictions about the secondary structure of arbitrary RNAs; however, I’m unable to find any that can handle our 29,903-nucleotide minor behemoth of an RNA. So, that leaves me with searching through the existing literature to see what results exist for this particular big long RNA.&lt;/p&gt;
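&lt;p&gt;RNAfold predicts a minimum-free-energy structure using a detailed thermodynamic model. As a much cruder illustration of the underlying idea of a strand folding back on itself, here is a sketch of the classic Nussinov algorithm, which simply maximizes the number of nested Watson–Crick and G·U pairs by dynamic programming. It is a swapped-in teaching example, not what RNAfold does, but it shares two relevant properties: it cannot represent pseudoknots, and its cubic running time and quadratic memory hint at why folding a whole 29,903-nucleotide genome in one go is much harder than folding a short RNA. The minimum hairpin loop of 3 nucleotides is an assumed parameter.&lt;/p&gt;

```python
def nussinov(seq: str, min_loop: int = 3) -> str:
    """Return a dot-bracket structure with the most nested base pairs.

    Toy model: counts pairs instead of minimizing free energy, never forms
    pseudoknots, and takes O(n^3) time and O(n^2) memory.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] = max number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for length in range(min_loop + 2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            score = best[i][j - 1]                      # case 1: j stays unpaired
            for k in range(i, j - min_loop):            # case 2: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j or best[i][j] == 0:
            continue
        if best[i][j] == best[i][j - 1]:                # j can stay unpaired
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):                # find j's partner
            if (seq[k], seq[j]) in allowed:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.extend([(i, k - 1), (k + 1, j - 1)])
                    break
    return "".join(structure)


print(nussinov("GAAAC"))        # (...)
print(nussinov("GGGAAAUCCC"))
```

&lt;p&gt;For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nussinov(&quot;GAAAC&quot;)&lt;/code&gt; returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(...)&lt;/code&gt;: the G and C pair around a three-nucleotide loop.&lt;/p&gt;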

&lt;p&gt;Luckily, &lt;em&gt;RNA Genome Conservation and Secondary Structure in SARS-CoV-2 and SARS-Related Viruses&lt;/em&gt; (&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7217285/&quot;&gt;Rangan et al. 2020&lt;/a&gt;) gives an overview of some of the virus’s secondary structures, including the pseudoknot, as well as a thoughtful analysis of which structural elements have remained conserved as viruses in this family have evolved!&lt;/p&gt;

&lt;div class=&quot;unfurl-embed-card&quot;&gt;&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img style=&quot;display: none&quot; class=&quot;unfurl-embed-card-feature-image not-pretty&quot; src=&quot;https://www.ncbi.nlm.nih.gov/corehtml/pmc/pmcgifs/pmc-logo-share.png&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7217285/
&quot;&gt;RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;As the COVID-19 outbreak spreads, there is a growing need for a compilation of conserved RNA genome regions in the SARS-CoV-2 virus along with their structural propensities to guide development of antivirals and diagnostics. Using sequence alignments ...&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;www.ncbi.nlm.nih.gov&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As depicted in that paper’s figure 4, here is how the region near the slippery sequence winds itself into an H-type pseudoknot:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/pseudoknot.png&quot; style=&quot;height: 600px&quot; /&gt;&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;small&gt;Starting 6 nucleotides after the heptanucleotide slippery sequence, nucleotides 13474 through 13542 together form a pseudoknot. (Here drawn schematically in 2D – in real life they wouldn&apos;t be so stretched out!)&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;Note the presence here of a few U·G wobble pairs, as mentioned in our &lt;a href=&quot;#wobble-pairings&quot;&gt;earlier discussion&lt;/a&gt; on wobble pairings!&lt;/p&gt;

&lt;p&gt;We can view this pseudoknot in 3D by converting the above secondary structure into &lt;a href=&quot;https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/rna_structure_notations.html&quot;&gt;dot-bracket notation&lt;/a&gt;, passing it into &lt;a href=&quot;http://rnacomposer.cs.put.poznan.pl/&quot;&gt;RNAComposer&lt;/a&gt; in order to extract a predicted 3D folding as &lt;a href=&quot;/assets/pseudoknot.pdb&quot;&gt;a PDB file&lt;/a&gt;, and finally passing &lt;em&gt;that&lt;/em&gt; into &lt;a href=&quot;http://web.x3dna.org/&quot;&gt;Web 3DNA&lt;/a&gt; to render a 3D image.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;GUUUUUAAACGGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUG
...............(((((((((((...[[[[[[[)))))))))))[[[[[[[[.........]]].]]]]]...]].]]]]]....
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p align=&quot;center&quot;&gt;&lt;small&gt;The pseudoknot in dot-bracket notation.&lt;/small&gt;&lt;/p&gt;
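&lt;p&gt;Because the two stems of a pseudoknot cross each other, dot-bracket notation needs a second bracket type: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;()&lt;/code&gt; for one stem and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[]&lt;/code&gt; for the other. Here is a sketch that parses the string above with one stack per bracket type, checks that every pair is Watson–Crick or G·U, and maps the pairs back to genome coordinates. It assumes the sequence starts at nucleotide 13,459 (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;...GU&lt;/code&gt; in the earlier listing); the function name is hypothetical.&lt;/p&gt;

```python
SEQUENCE = "GUUUUUAAACGGGUUUGCGGUGUAAGUGCAGCCCGUCUUACACCGUGCGGCACAGGCACUAGUACUGAUGUCGUAUACAGGGCUUUUG"
STRUCTURE = "...............(((((((((((...[[[[[[[)))))))))))[[[[[[[[.........]]].]]]]]...]].]]]]]...."
FIRST_POSITION = 13459   # assumed genome coordinate of SEQUENCE[0]
SLIPPERY_END = 13468     # the final C of UUUAAAC
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted 0-based (i, j) pairs; '()' and '[]' use separate stacks."""
    openers = {"(": ")", "[": "]"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {b: [] for b in openers}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(j)
        elif ch in closers:
            i = stacks[closers[ch]].pop()   # IndexError on an unmatched closer
            pairs.append((i, j))
    if any(stacks.values()):
        raise ValueError("unbalanced dot-bracket string")
    return sorted(pairs)


pairs = parse_dot_bracket(STRUCTURE)
wobbles = [(i, j) for i, j in pairs if (SEQUENCE[i], SEQUENCE[j]) in WOBBLE]
bad = [(i, j) for i, j in pairs if (SEQUENCE[i], SEQUENCE[j]) not in CANONICAL | WOBBLE]

first = FIRST_POSITION + min(i for i, _ in pairs)
last = FIRST_POSITION + max(j for _, j in pairs)
print(len(pairs), "pairs,", len(wobbles), "G·U wobbles,", len(bad), "invalid")
print(f"paired region {first}..{last}, spacer of {first - SLIPPERY_END - 1} nt")
```

&lt;p&gt;This reports 26 pairs, three of them G·U wobbles and none invalid, spanning nucleotides 13,474 through 13,542, with a 5-nucleotide spacer after the slippery sequence, within the 5 to 9 quoted earlier.&lt;/p&gt;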

&lt;p&gt;&lt;img src=&quot;/assets/Pseudoknot_3D.png&quot; style=&quot;height: 600px&quot; /&gt;&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;small&gt;The pseudoknot, rendered in 3D.&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;For more information on secondary structure and how it can affect the life cycle and transmissibility of RNA viruses, as well as details about how secondary structure can be elucidated, I also recommend:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;em&gt;RNA Structure—A Neglected Puppet Master for the Evolution of Virus and Host Immunity&lt;/em&gt; (&lt;a href=&quot;https://www.frontiersin.org/articles/10.3389/fimmu.2018.02097/full&quot;&gt;Smyth et al. 2018&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Viral RNAs Are Unusually Compact&lt;/em&gt; (&lt;a href=&quot;https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0105875&quot;&gt;Gopal et al. 2014&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Visualizing the global secondary structure of a viral RNA genome with cryo-electron microscopy&lt;/em&gt; (&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4408795/&quot;&gt;Garmann et al. 2015&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;The influence of viral RNA secondary structure on interactions with innate host cell defences&lt;/em&gt; (&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3950689/&quot;&gt;Witteveldt et al. 2014&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;shoutouts&quot;&gt;Shoutouts&lt;/h2&gt;

&lt;p&gt;Thanks to &lt;a href=&quot;https://twitter.com/lauramvaughan&quot;&gt;Laura Vaughan&lt;/a&gt; for a thoughtful review of this piece and for help with RNA 3D structure visualization!&lt;/p&gt;

&lt;p&gt;Shoutout also to &lt;a href=&quot;https://nabeelqu.co/&quot;&gt;Nabeel Qureshi&lt;/a&gt; who has worked on a &lt;a href=&quot;https://twitter.com/nabeelqu/status/1271932798119612419?s=20&quot;&gt;similar endeavor&lt;/a&gt;, writing up &lt;a href=&quot;https://github.com/nqureshi/sars-cov-2/blob/master/SARS-Cov-2.ipynb&quot;&gt;an executable iPython notebook&lt;/a&gt; for understanding the full coronavirus genome.&lt;/p&gt;

&lt;p&gt;As fate would have it, I know &lt;a href=&quot;https://twitter.com/ramyarangan&quot;&gt;Ramya Rangan&lt;/a&gt; from our time as students of &lt;a href=&quot;https://twitter.com/jeanqasaur&quot;&gt;Jean Yang&lt;/a&gt;! It was really exciting to organically stumble upon a former colleague’s recent work, and especially to have so many of my questions about coronaviral secondary structure just answered immediately by that work.&lt;/p&gt;

&lt;!-- ## Next time: Protein is great!

That&apos;s all for now! Stay tuned for part 2, _A Mechanist&apos;s Guide to the Coronavirus Proteome_.

&lt;a href=&quot;https://www.gunnerkrigg.com/?p=75&quot;&gt;&lt;img src=&quot;/assets/proteins_yay.png&quot; style=&quot;height: 211px;&quot;/&gt;&lt;/a&gt;
 --&gt;

&lt;p&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
&lt;h4 id=&quot;footnotes&quot;&gt;Footnotes&lt;/h4&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Update: &lt;a href=&quot;https://rationalconspiracy.com/&quot;&gt;Alyssa Vance&lt;/a&gt; notes to me that targeting RdRP is actually the &lt;a href=&quot;https://en.m.wikipedia.org/wiki/Remdesivir#Mechanism_of_action&quot;&gt;mechanism of action of remdesivir&lt;/a&gt;! Nabeel Qureshi &lt;a href=&quot;https://twitter.com/nabeelqu/status/1295468129116717059?s=20&quot;&gt;adds&lt;/a&gt; that this is a property of &lt;a href=&quot;https://en.wikipedia.org/wiki/Favipiravir&quot;&gt;favipiravir&lt;/a&gt; as well. From Wikipedia: &lt;em&gt;“As an adenosine nucleoside triphosphate analog, the active metabolite of remdesivir interferes with the action of viral RNA-dependent RNA polymerase and evades proofreading by viral exoribonuclease, causing a decrease in viral RNA production.”&lt;/em&gt; &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;A previous version of this essay claimed that all of orf1ab encodes for RdRP because I had seen that &lt;a href=&quot;https://www.uniprot.org/uniprot/A7J8L2&quot;&gt;this protein product&lt;/a&gt; consumes monomers of RNA and catalyzes their polymerization. This is untrue! Nabeel Qureshi has &lt;a href=&quot;https://twitter.com/nabeelqu/status/1296244985260539904?s=20&quot;&gt;pointed out&lt;/a&gt; to me that RdRP is only a small fraction of the orf1ab polyprotein. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;You might note that “A·G mismatch” is not among the canonical &lt;a href=&quot;#wobble-pairings&quot;&gt;wobble pairing&lt;/a&gt; types that we had already discussed. I’m therefore a little skeptical of it; can it really be so thermodynamically favorable, especially given how A and G are such big unwieldy &lt;a href=&quot;https://en.wikipedia.org/wiki/Purine&quot;&gt;purines&lt;/a&gt;? Some quick research reveals that it’s at least &lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/12893083/&quot;&gt;possible&lt;/a&gt;, if uncommon. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <pubDate>Sun, 16 Aug 2020 00:00:00 +0000</pubDate>
        <link>https://csvoss.com//a-mechanists-guide-to-the-coronavirus-genome</link>
        <guid isPermaLink="true">https://csvoss.com//a-mechanists-guide-to-the-coronavirus-genome</guid>
        
        
        <category>projects</category>
        
      </item>
      
    
      
      <item>
        <title>What is that scarf?</title>
        <description>&lt;p&gt;If you meet me in person, you might notice that I’m carrying around one of these knitting projects. Both projects bear patterns that are imbued with deeper mathematical stories.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/1.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;rule-110-scarf&quot;&gt;Rule 110 Scarf&lt;/h2&gt;

&lt;p&gt;My first knitting project is a scarf that depicts Wolfram’s &lt;a href=&quot;http://mathworld.wolfram.com/Rule110.html&quot;&gt;Rule 110&lt;/a&gt; elementary cellular automaton.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/2.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;hang-on-whats-an-elementary-cellular-automaton&quot;&gt;Hang on, what’s an elementary cellular automaton?&lt;/h3&gt;

&lt;p&gt;Maybe you’ve heard of &lt;a href=&quot;https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life&quot;&gt;Conway’s Game of Life&lt;/a&gt;? Conway’s Game of Life is a &lt;em&gt;two-dimensional&lt;/em&gt; cellular automaton; it features a grid of pixels, and the grid evolves according to a particular ruleset, timestep by timestep.&lt;/p&gt;

&lt;p&gt;Wolfram’s elementary cellular automata are similar, but they’re &lt;em&gt;one-dimensional;&lt;/em&gt; a single line of pixels evolves according to a particular ruleset, timestep by timestep. Typically, the time dimension is shown going downwards, as in the above depiction of rule 110.&lt;/p&gt;

&lt;p&gt;Furthermore, elementary cellular automata are constrained so that each pixel in the next generation can only depend on the &lt;em&gt;three pixels&lt;/em&gt; immediately above it.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/3.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
(&lt;a href=&quot;https://commons.wikimedia.org/w/index.php?curid=74536256&quot;&gt;image credit&lt;/a&gt;)
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;There are many ways to create rules for elementary cellular automata. Each rule is uniquely defined by the eight output pixels it prescribes, one descendant for each of the eight possible three-pixel combinations of ancestor pixels. Therefore, there are 2^8 = 256 possible elementary cellular automaton rules, of which rule 110 is merely one. A rule’s number comes from reading its eight outputs, for the neighborhoods 111, 110, 101, and so on down to 000, as a binary number: 110 is 01101110 in binary.&lt;/p&gt;
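
&lt;p&gt;To make the numbering concrete, here is a minimal Python sketch of one generation of an elementary cellular automaton. It treats bit &lt;em&gt;n&lt;/em&gt; of the rule number as the descendant of the neighborhood whose three pixels, read as a binary number, equal &lt;em&gt;n&lt;/em&gt;. The function names &lt;code&gt;step&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt; are my own, and I’ve assumed wraparound edges (the leftmost and rightmost pixels count as neighbors), which is just one possible choice of boundary condition.&lt;/p&gt;

```python
def step(row: list[int], rule: int) -> list[int]:
    """Advance one generation of an elementary cellular automaton.

    Assumes wraparound edges: the leftmost and rightmost cells are neighbors.
    """
    width = len(row)
    new_row = []
    for i in range(width):
        # row[i - 1] wraps to the last cell when i is 0.
        left, center, right = row[i - 1], row[i], row[(i + 1) % width]
        # Read the three-pixel neighborhood as a number from 0 (000) to 7 (111).
        neighborhood = 4 * left + 2 * center + right
        # Bit number `neighborhood` of the rule number is the descendant pixel.
        new_row.append((rule >> neighborhood) % 2)
    return new_row


def run(start: list[int], rule: int, generations: int) -> list[list[int]]:
    """Return the start row followed by `generations` further rows."""
    rows = [start]
    for _ in range(generations):
        rows.append(step(rows[-1], rule))
    return rows


if __name__ == "__main__":
    width = 31
    start = [0] * width
    start[-1] = 1  # a single black pixel at the right edge
    for row in run(start, 110, 15):
        print("".join("#" if cell else "." for cell in row))
```

&lt;p&gt;Running it prints 16 rows of Rule 110 growing leftward from a single pixel; swapping 110 for any other number from 0 to 255 gives the corresponding rule.&lt;/p&gt;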

&lt;h3 id=&quot;why-choose-rule-110&quot;&gt;Why choose Rule 110?&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;/assets/scarf/4.jpg&quot; alt=&quot;Rule 110 after 250 iterations&quot; /&gt;&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
Rule 110 after 250 iterations
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;It just so happens that Rule 110 is &lt;a href=&quot;https://en.wikipedia.org/wiki/Turing_completeness&quot;&gt;Turing-complete&lt;/a&gt;, as proven in &lt;a href=&quot;http://wpmedia.wolfram.com/uploads/sites/13/2018/02/15-1-1.pdf&quot;&gt;Universality in Elementary Cellular Automata&lt;/a&gt; (Cook 2004)!&lt;/p&gt;

&lt;p&gt;(Conway’s Game of Life, incidentally, is also Turing-complete, as shown by such lovely constructions as a &lt;a href=&quot;https://www.youtube.com/watch?v=My8AsV7bA94&quot;&gt;Universal Turing Machine&lt;/a&gt; in Life or even &lt;a href=&quot;https://www.youtube.com/watch?v=xP5-iIeKXE8&quot;&gt;Life in Life&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;The proof is a cute little construction. First, it identifies several &lt;em&gt;gliders&lt;/em&gt; that move predictably through a specially defined “ether” (a repeating background pattern), and then it classifies the ways in which the gliders can collide and interact depending on their spacing at the time of their collision.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/5.png&quot; alt=&quot;Three gliders&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
Three gliders (&lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Ca110-structures2.png&quot;&gt;image credit&lt;/a&gt;)
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/6.png&quot; alt=&quot;Two types of collision&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
Two types of collision (&lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Ca110-interaction2.png&quot;&gt;image credit&lt;/a&gt;)
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;By carefully controlling how the gliders are initialized and set up to collide, it is possible to emulate &lt;a href=&quot;https://en.wikipedia.org/wiki/Tag_system&quot;&gt;cyclic tag systems&lt;/a&gt;, a formalism that Cook shows can in turn emulate ordinary tag systems, which were already known to be Turing-complete.&lt;/p&gt;
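
&lt;p&gt;For a feel of what Rule 110 ends up emulating, here is a minimal Python sketch of a cyclic tag system, written from the usual definition: a list of binary productions and a binary word; at each step the leftmost symbol is deleted, the current production is appended if that symbol was 1, and the current production advances cyclically. The function name and the example productions are my own illustrative choices, and the sketch does not attempt the much harder part, encoding all of this as gliders in Rule 110.&lt;/p&gt;

```python
def cyclic_tag(productions: list[str], word: str, max_steps: int) -> list[str]:
    """Run a cyclic tag system and return the word after each step.

    At each step, delete the leftmost symbol of the word; if it was "1",
    append the current production. The current production then advances
    cyclically. The system halts when the word becomes empty.
    """
    history = [word]
    for step in range(max_steps):
        if not word:
            break  # halted: nothing left to read
        production = productions[step % len(productions)]
        head, word = word[0], word[1:]
        if head == "1":
            word += production
        history.append(word)
    return history


if __name__ == "__main__":
    for word in cyclic_tag(["011", "10", "101"], "1", 6):
        print(word)
```

&lt;p&gt;With the productions 011, 10 and 101, the word 1 steps through 011, 11, 1101, 101011, 0101110 and 101110.&lt;/p&gt;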

&lt;p&gt;This particular knitting project is close to my heart, as Rule 110’s Turing-completeness gives it a certain numinous beauty and mystery. Why, of Wolfram’s 256 elementary cellular automata, should it be the case that there is &lt;em&gt;even one&lt;/em&gt; that is Turing-complete? Amidst the unruly chaos of Rule 30’s &lt;a href=&quot;http://mathworld.wolfram.com/Rule30.html&quot;&gt;pseudorandom number generation&lt;/a&gt; and the ordered austerity of Rule 90’s &lt;a href=&quot;http://mathworld.wolfram.com/Rule90.html&quot;&gt;perfect Sierpinski triangles&lt;/a&gt;, what deep mathematical forces conspired to give us this – this simple rule which, by itself alone, has &lt;em&gt;just&lt;/em&gt; enough complex behavior to carry out computations? It balances on the knife’s edge between the dynamic and the static, exhibiting a kind of critical behavior of the sort that can also be seen in discussions of &lt;a href=&quot;https://arxiv.org/pdf/1707.05952.pdf&quot;&gt;neuroscience&lt;/a&gt; and the physics of &lt;a href=&quot;https://en.wikipedia.org/wiki/Phase_transition#Critical_exponents_and_universality_classes&quot;&gt;phase transitions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/7.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
Rule 30, sometimes used as a pseudorandom number generator
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/8.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
Rule 90, which generates Sierpinski triangles when initialized with a single pixel
&lt;/small&gt;&lt;/p&gt;
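
&lt;p&gt;The Rule 90 caption can be checked directly. Rule 90 sets each new pixel to the XOR of the two pixels diagonally above it, so starting from a single pixel, row &lt;em&gt;t&lt;/em&gt; reproduces row &lt;em&gt;t&lt;/em&gt; of Pascal’s triangle modulo 2, which is where the Sierpinski pattern comes from. This minimal Python sketch compares the two; the function names are my own, and it treats cells beyond the edges as 0, with the row made wide enough that the triangle never reaches them.&lt;/p&gt;

```python
from math import comb


def step(row: list[int], rule: int) -> list[int]:
    """One generation of an elementary CA; cells beyond either edge count as 0."""
    padded = [0] + row + [0]
    return [
        (rule >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) % 2
        for i in range(1, len(padded) - 1)
    ]


def rule90_from_single_pixel(generations: int) -> list[list[int]]:
    """Rows 0..generations of Rule 90 started from one pixel in the middle."""
    # Wide enough that the growing triangle never touches an edge.
    width = 2 * generations + 1
    row = [0] * width
    row[generations] = 1
    rows = [row]
    for _ in range(generations):
        row = step(row, 90)
        rows.append(row)
    return rows


def pascal_mod2_row(t: int, generations: int) -> list[int]:
    """Row t of Pascal's triangle mod 2, centered, with a gap between entries."""
    expected = [0] * (2 * generations + 1)
    for j in range(t + 1):
        expected[generations - t + 2 * j] = comb(t, j) % 2
    return expected


if __name__ == "__main__":
    for row in rule90_from_single_pixel(15):
        print("".join("#" if cell else " " for cell in row))
```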

&lt;p&gt;The scarf also serves for me as a physical symbol of the following cluster of philosophical problems: What ethics govern the simulation of minds? If I were to create a scarf with a simulation of a mind in a video game environment (such as the one in &lt;a href=&quot;https://store.steampowered.com/app/257510/The_Talos_Principle/&quot;&gt;The Talos Principle&lt;/a&gt;), is it at all meaningful – ethically – for me to sit there and knit out the entire scarf so that the simulation’s output is legible to us, or is it equally meaningful for me to merely cast on the start state of the scarf and note how I might allow the computation to proceed if I &lt;em&gt;were&lt;/em&gt; to knit out the entire scarf? If the two situations are the same, is it also equally meaningful for me to merely &lt;em&gt;think&lt;/em&gt; about creating the scarf, designing its start state in my head, but never casting it onto physical yarn? And if the situations along this spectrum are not all the same, then what makes them different?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/9.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I used YouTube videos to learn the technique of double-knitting so that I could construct a scarf with a two-color pixelated pattern. Due to the double-knitting constraint, the pixels on one side of the scarf are inverted on the other side (and, viewed from the back, the pattern is also mirrored left to right), so &lt;em&gt;technically&lt;/em&gt; it’s a scarf bearing Rule 110 on one side and &lt;a href=&quot;http://atlas.wolfram.com/01/01/193/&quot;&gt;Rule 193&lt;/a&gt; on the other.&lt;/p&gt;

&lt;h2 id=&quot;rule-105-scarf&quot;&gt;Rule 105 Scarf&lt;/h2&gt;

&lt;p&gt;Unlike the rule 110 scarf, this knitting project is my current work in progress.&lt;/p&gt;

&lt;p&gt;The premise: Let’s say you want to create a &lt;a href=&quot;https://en.wikipedia.org/wiki/M%C3%B6bius_strip&quot;&gt;Möbius strip&lt;/a&gt; scarf… but you &lt;em&gt;also&lt;/em&gt; want it to be a cellular automaton.&lt;/p&gt;

&lt;p&gt;Simply diving into knitting any old cellular automaton won’t do, because to make it a Möbius strip, you must flip it around and join it to itself at the end of the process. For example, I clearly cannot do this with my rule 110 scarf: as shown, the final row does not proceed smoothly into the starting row, as would be required!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/10.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;What’s more, even deeper than this requirement about the final row’s behavior is the requirement that the whole &lt;em&gt;cellular automaton rule&lt;/em&gt; must continue to be the same, &lt;em&gt;even when we flip the scarf around&lt;/em&gt;. As we can see in the above example, leftward-leaning white triangles on blue (Rule 110) is a completely different pattern texture from rightward-leaning blue triangles on white (Rule 193).&lt;/p&gt;

&lt;p&gt;This imposes the following two constraints on any cellular automaton we choose to Möbiusify:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The rule must be &lt;em&gt;left-right symmetric.&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;The rule must be &lt;em&gt;zero-one symmetric&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Alternatively put, using Wolfram’s terminology, the rule must be equal to its &lt;em&gt;mirrored complement&lt;/em&gt;.&lt;/p&gt;
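
&lt;p&gt;Both symmetries can be computed directly from a rule number. In this minimal Python sketch (the helper names are my own), mirroring swaps the left and right pixels of every neighborhood, and complementing swaps 0 and 1 in both the neighborhood and the output. Checking every rule from 0 to 255 then lists the rules that equal their own mirrored complement.&lt;/p&gt;

```python
def rule_table(rule: int) -> list[int]:
    """Outputs for neighborhoods 000 (index 0) through 111 (index 7)."""
    return [(rule >> n) % 2 for n in range(8)]


def rule_number(table: list[int]) -> int:
    """Inverse of rule_table: pack eight outputs back into a rule number."""
    return sum(bit * 2 ** n for n, bit in enumerate(table))


def mirror_neighborhood(n: int) -> int:
    """Swap the left and right bits of a three-bit neighborhood (e.g. 001 -> 100)."""
    left, center, right = (n >> 2) % 2, (n >> 1) % 2, n % 2
    return 4 * right + 2 * center + left


def mirror(rule: int) -> int:
    """Left-right reflection of a rule."""
    table = rule_table(rule)
    return rule_number([table[mirror_neighborhood(n)] for n in range(8)])


def complement(rule: int) -> int:
    """Zero-one swap: flip the inputs (n becomes 7 - n) and flip the output."""
    table = rule_table(rule)
    return rule_number([1 - table[7 - n] for n in range(8)])


def mirrored_complement(rule: int) -> int:
    return complement(mirror(rule))


# Rules that look the same after flipping the scarf over.
mobius_rules = [r for r in range(256) if mirrored_complement(r) == r]

if __name__ == "__main__":
    print(mirror(110), complement(110), mirrored_complement(110))
    print(mobius_rules)
```

&lt;p&gt;With these definitions, &lt;code&gt;mirrored_complement(110)&lt;/code&gt; is 193, matching the two faces of the double-knitted Rule 110 scarf, while 51, 105 and 150 each map to themselves. &lt;code&gt;mobius_rules&lt;/code&gt; has 16 entries: the symmetry pairs the eight neighborhoods into four pairs, and each pair leaves one free output bit.&lt;/p&gt;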

&lt;p&gt;This requirement limits us to a small handful of valid rules (16 of the 256), many of which are so simple as to be blindingly, boringly unaesthetic. (&lt;a href=&quot;http://atlas.wolfram.com/01/01/51/&quot;&gt;Rule 51&lt;/a&gt; is one example of a blindingly boring rule, not that I have any problem with stripes.)&lt;/p&gt;

&lt;p&gt;I decided to choose &lt;a href=&quot;http://atlas.wolfram.com/01/01/105/&quot;&gt;Rule 105&lt;/a&gt; for its aesthetics. &lt;a href=&quot;http://mathworld.wolfram.com/Rule150.html&quot;&gt;Rule 150&lt;/a&gt; is another nice one that I considered.&lt;/p&gt;

&lt;p&gt;I also needed to choose how to deal with the boundary conditions. I decided to treat the edges as if the scarf wraps around on itself. In a certain sense, this makes it perhaps more like a &lt;a href=&quot;https://en.wikipedia.org/wiki/Klein_bottle&quot;&gt;Klein bottle&lt;/a&gt; than a Möbius strip, spiritually if not literally.&lt;/p&gt;

&lt;p&gt;Given all of the above parameters, the only thing left to do was to find a starting state that produces a pattern that loops back on itself in an interesting, pretty way. Most starting states repeat after only a small number of timesteps – boring! I wanted to maximize the cycle length of my scarf, for maximum aesthetic appeal.&lt;/p&gt;

&lt;p&gt;I sent over all of the above parameters to &lt;a href=&quot;https://codegolf.stackexchange.com/users/39242/anders-kaseorg&quot;&gt;Anders&lt;/a&gt; as a small puzzle, and three lines of Mathematica later, received a characterization of the complete 22-dimensional optimal solution space on scarves of 38 columns, from which I chose the following start state. Using this start state, the pixels in row 511 are exactly what they need to be for row 512 to be a left-right-flipped and zero-one-flipped copy of row 1 – the mirrored complement!&lt;/p&gt;
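
&lt;p&gt;The Möbius condition can also be phrased as a search. This minimal Python sketch (the function names are my own) uses wraparound edges as described above, evolves a start row, and reports the first generation at which the row equals the start row’s mirrored complement. I don’t have the 38-column start state written out here, so I haven’t run it on the scarf itself; for that start state, the answer should be 511, since row 512 comes 511 generations after row 1.&lt;/p&gt;

```python
def step(row: list[int], rule: int) -> list[int]:
    """One generation of an elementary CA with wraparound edges."""
    width = len(row)
    return [
        (rule >> (4 * row[i - 1] + 2 * row[i] + row[(i + 1) % width])) % 2
        for i in range(width)
    ]


def mirrored_complement_row(row: list[int]) -> list[int]:
    """Flip a row left to right and swap its 0s and 1s."""
    return [1 - cell for cell in reversed(row)]


def steps_until_mirrored_complement(start: list[int], rule: int, max_steps: int):
    """Smallest k such that, k generations after `start`, the row is the
    mirrored complement of `start`; None if that doesn't happen within
    `max_steps` generations."""
    target = mirrored_complement_row(start)
    row = start
    for k in range(1, max_steps + 1):
        row = step(row, rule)
        if row == target:
            return k
    return None
```

&lt;p&gt;The check only makes sense for a rule like Rule 105 that is its own mirrored complement: stepping the mirrored complement of a row then gives the mirrored complement of the stepped row, so once row 512 matches, every later row is the flipped copy of the row 511 generations before it, and the strip closes up with its twist.&lt;/p&gt;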

&lt;p&gt;&lt;img src=&quot;/assets/scarf/11.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
Row 1 and the first several rows.
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/12.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
Row 512 is the mirrored complement of row 1.
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;I’m practicing the &lt;em&gt;provisional cast-on&lt;/em&gt; knitting technique in order to make sure that once I reach row 511, I will be able to seamlessly knit the scarf to itself with nobody the wiser about where the scarf ever began or ended. That’s the red yarn in the picture below.&lt;/p&gt;

&lt;p&gt;Right now I’m around 30 rows in. It’s going to be a long one!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/13.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;I was inspired to start knitting by some friends who use knitting as a way to keep their hands busy while still attending to verbal inputs, such as having conversations or listening to others. The knitting projects themselves were inspired by the work of Fabienne Serriere (&lt;a href=&quot;https://knityak.com/&quot;&gt;KnitYak&lt;/a&gt;), who creates scarves with &lt;a href=&quot;http://mathworld.wolfram.com/ElementaryCellularAutomaton.html&quot;&gt;cellular automata&lt;/a&gt; and other beautiful mathematical patterns using her industrial knitting machine. (She’s even got some of &lt;a href=&quot;https://knityak.com/products/rule-110-scarf-120-elementary-cellular-automata-knit-second?variant=36740646866&quot;&gt;Rule 110&lt;/a&gt; and &lt;a href=&quot;https://knityak.com/products/rule-105-wrap-21-elementary-cellular-automata-knit-second?variant=36578454930&quot;&gt;Rule 105&lt;/a&gt;!)&lt;/p&gt;

&lt;h2 id=&quot;future-work&quot;&gt;Future work&lt;/h2&gt;

&lt;p&gt;Once I finish the Möbius strip scarf, the ultimate knitting project for me would be to create &lt;em&gt;another&lt;/em&gt; Rule 110 project and have it actually encode a real Turing machine this time, unpacking the lessons from Cook’s proof of Rule 110’s Turing completeness in order to do so. I would pick a small Turing machine, convert it to a cyclic tag system using &lt;a href=&quot;https://link.springer.com/chapter/10.1007%2F11786986_13&quot;&gt;the polynomial reduction&lt;/a&gt; of Neary and Woods (2006), and then convert that into a Rule 110 start state.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://arxiv.org/pdf/0906.3248.pdf&quot;&gt;A Concrete View of Rule 110 Computation&lt;/a&gt; (Cook 2009) demonstrates by example how to perform these reductions in order to compile a Turing machine into a Rule 110 cellular automaton. The resulting pattern would be wide enough that it would need to be a blanket instead of a scarf!&lt;/p&gt;

&lt;p&gt;If I wanted to &lt;em&gt;truly satisfy&lt;/em&gt; my artistic preferences over such a project, I’d make sure to pick a Turing machine that’s a nice meaningful one. It’s too bad that &lt;a href=&quot;https://www.scottaaronson.com/busybeaver.pdf&quot;&gt;A Relatively Small Turing Machine Whose Behavior Is Independent of Set Theory&lt;/a&gt; (Aaronson and Yedidia 2016) has 7,918 states, otherwise I would love to encode it!&lt;/p&gt;

&lt;p&gt;The best-known five-state &lt;a href=&quot;https://www.scottaaronson.com/writings/bignumbers.html&quot;&gt;Busy Beaver Turing machine&lt;/a&gt; might be a nice, simple Turing machine to use for such a project. The best candidate currently known for BB(5) takes 47,176,870 steps before finally halting.&lt;/p&gt;
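
&lt;p&gt;Before compiling any Turing machine into Rule 110, it helps to be able to run it directly. Here is a minimal Python sketch of a two-symbol Turing machine simulator on a blank tape. The transition tables are my own transcription: &lt;code&gt;BB2&lt;/code&gt; is the two-state busy beaver champion, and &lt;code&gt;BB5&lt;/code&gt; is the five-state candidate found by Marxen and Buntrock as it is usually written, with state letters that may not line up with the figure below.&lt;/p&gt;

```python
# Transition tables: (state, symbol read) -> (symbol to write, move, next state).
# Move is +1 for right and -1 for left; "H" is the halting state.
BB2 = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"), ("B", 1): (1, +1, "H"),
}

# Five-state candidate (Marxen and Buntrock), in the usual A-E labeling.
BB5 = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "C"),
    ("B", 0): (1, +1, "C"), ("B", 1): (1, +1, "B"),
    ("C", 0): (1, +1, "D"), ("C", 1): (0, -1, "E"),
    ("D", 0): (1, -1, "A"), ("D", 1): (1, -1, "D"),
    ("E", 0): (1, +1, "H"), ("E", 1): (0, -1, "A"),
}


def run_turing_machine(table, max_steps):
    """Run from a blank (all-0) tape in state A.

    Returns (steps taken, number of 1s left on the tape), counting the final
    transition into the halting state as a step.
    """
    tape = {}  # sparse tape: position -> symbol, missing cells read as 0
    head, state, steps = 0, "A", 0
    while state != "H":
        if steps == max_steps:
            raise RuntimeError("did not halt within max_steps")
        write, move, state = table[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        steps += 1
    return steps, sum(tape.values())
```

&lt;p&gt;&lt;code&gt;run_turing_machine(BB2, 100)&lt;/code&gt; returns &lt;code&gt;(6, 4)&lt;/code&gt;: six steps, leaving four 1s on the tape. Running &lt;code&gt;BB5&lt;/code&gt; with a budget of 50,000,000 steps should reproduce the 47,176,870-step figure above if my transcription is correct, though it takes a while in pure Python.&lt;/p&gt;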

&lt;p&gt;&lt;img src=&quot;/assets/scarf/14.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
State transitions for a candidate BB(5), as given in &lt;a href=&quot;https://pdfs.semanticscholar.org/6fa1/3c10697ee4f2815089f0c5f71bab7caf650a.pdf&quot;&gt;Attacking the Busy Beaver 5&lt;/a&gt;.
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/scarf/15.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
A space-time history of the activity of Rule 110, started at the top with a row of randomly set cells, from &lt;a href=&quot;http://wpmedia.wolfram.com/uploads/sites/13/2018/02/15-1-1.pdf&quot;&gt;Universality in Elementary Cellular Automata&lt;/a&gt;. Perhaps you see now what I mean by “blanket”.
&lt;/small&gt;&lt;/p&gt;
</description>
        <pubDate>Wed, 21 Aug 2019 00:00:00 +0000</pubDate>
        <link>https://csvoss.com//cellular-automaton-scarf</link>
        <guid isPermaLink="true">https://csvoss.com//cellular-automaton-scarf</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
      
      <item>
        <title>Reflections on Pilot’s Series B</title>
        <description>&lt;p&gt;Pilot, the startup I work at, &lt;a href=&quot;https://pilot.com/blog/introducing-pilot-tax/&quot;&gt;announced our Series B&lt;/a&gt; today.&lt;/p&gt;

&lt;p&gt;This is really exciting for me. I’ve been here for a little over a year and a half now. I joined right before &lt;a href=&quot;https://techcrunch.com/2018/03/14/pilot-raises-15m-to-bring-bookkeeping-into-the-modern-era/&quot;&gt;the Series A&lt;/a&gt;, and since then we’ve outgrown offices twice and grown to more than 4x the size we were when I joined. This funding round raises our valuation to $355M (!) and also includes a strategic investment from Stripe, aligning their interests with ours (!!).&lt;/p&gt;

&lt;p&gt;I joined Pilot in order to learn more stuff about engineering &lt;em&gt;and also&lt;/em&gt; to learn more stuff about organizational efficiency and effectiveness, and I’ve been really satisfied on both fronts. Right after I joined I sometimes described the decision to friends as “Well, this is the founders’ third startup together, and their first two were successful exits, so… [eyebrow wiggle suggesting probable future success].” Waseem, Jessica, and Jeff really do know what they’re doing, and I haven’t been disappointed.&lt;/p&gt;

&lt;h3 id=&quot;optimize-processes-not-just-tech&quot;&gt;Optimize processes, not just tech&lt;/h3&gt;

&lt;p&gt;At Pilot I’ve learned a bunch about the optimization of human-centered processes and why it’s important. As someone who came from a hard computer science background, I’ve only really absorbed this lesson during the last year or two, both from observing Pilot and from observing other highly functional institutions within my broader community.&lt;/p&gt;

&lt;p&gt;Pilot works on making bookkeeping less painful for humans to do, so that we can provide it at scale. This means not only that the engineering team’s automation tools have to be really good, but also that our human-centered processes for onboarding new team members and for spreading knowledge around have to be really good. As Catherine Olsson summarized my own description to me the other day, Pilot is “actually in the process optimization business.”&lt;/p&gt;

&lt;p&gt;Checklists, common knowledge generation, and open lines of interpersonal communication – these and other tools are all really important to designing well-functioning processes for organizations made out of people.&lt;/p&gt;

&lt;h3 id=&quot;aim-at-real-problems-and-propagate-incentives&quot;&gt;Aim at real problems, and propagate incentives&lt;/h3&gt;

&lt;p&gt;I’ve also learned that the powerful engine of process optimization absolutely &lt;em&gt;must&lt;/em&gt; be aimed at real problems in the world. I think this is a crucial puzzle piece that distinguishes startups that succeed from startups that don’t. I’ve liked getting to observe how the real-world problems faced by Pilot’s clients get propagated all the way down to the level of engineers’ day-to-day efforts.&lt;/p&gt;

&lt;p&gt;Our organization is actually set up such that the incentives are really well aligned throughout. Potential clients (the source of real-world problems) come in via the sales team. Delivering our service to our clients becomes the problem of our operations team, which is tasked with closing the books. We record metrics about what’s taking our operations team the most time, as well as free-form feedback, and then the product team distills that data into goals for engineering and design.&lt;/p&gt;

&lt;p&gt;The entire product process is &lt;a href=&quot;https://www.ribbonfarm.com/2010/07/26/a-big-little-idea-called-legibility/&quot;&gt;legible&lt;/a&gt; and &lt;a href=&quot;https://www.jefftk.com/p/responsible-transparency-consumption&quot;&gt;transparent&lt;/a&gt; to people on the engineering team, so I can audit what I’m working on and why, and question decisions at the appropriate level if I feel something’s off. Team members are also empowered to propose product directions directly.&lt;/p&gt;

&lt;p&gt;If anything, the one thing I’d worry about with our setup is a hypothetical world where the most stress to deliver our service to our clients is felt by the operations team, and where the engineering team doesn’t feel the fire as much as the operations team does. We have been addressing this by setting specific goals for the engineering team to meet in order to properly support the operations team, so I can trust that we will succeed inasmuch as we craft those goals carefully. And since we’re using data and metrics to motivate our choices, plus the management team sometimes ends up doing a little bookkeeping if we find ourselves under capacity, I can trust that the organization’s incentives are aligned to craft those goals well.&lt;/p&gt;

&lt;h3 id=&quot;culture-can-be-engineered&quot;&gt;Culture can be engineered&lt;/h3&gt;

&lt;p&gt;I’ve learned by example what it looks like for a community to deliberately engineer itself to have a good culture. On day 1, I was impressed when a member of the team pointed out something to fix in part of our infrastructure and Jessica responded with “Good callout,” encouraging honest feedback. Just last month, &lt;a href=&quot;https://twitter.com/haikuginger&quot;&gt;a member of the engineering team&lt;/a&gt; took on the task of writing up a doc about our code review standards, making implicit social norms explicit. From various other members of the team I’ve learned about &lt;a href=&quot;https://codeascraft.com/2012/05/22/blameless-postmortems/&quot;&gt;blameless retrospectives&lt;/a&gt;, about &lt;a href=&quot;https://businessethicsblog.com/2010/12/08/business-ethics-and-the-new-york-times-rule/&quot;&gt;the New York Times rule&lt;/a&gt;, and about &lt;a href=&quot;http://akaptur.com/blog/2017/06/03/two-kinds-of-feedback/&quot;&gt;two kinds of feedback&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Pilot also has some employees with roots in the Recurse Center, whose &lt;a href=&quot;https://www.recurse.com/social-rules&quot;&gt;social rules&lt;/a&gt; and &lt;a href=&quot;https://www.recurse.com/diversity&quot;&gt;attention to inclusiveness&lt;/a&gt; are another example of an organization achieving a well-engineered culture.&lt;/p&gt;

&lt;p&gt;While we don’t currently have an explicit set of company values, as an individual participant in company culture I can tell from everyday communications that we think of ourselves as valuing work-life balance, diversity, and open communication, that we take actions that promote those values, and that we talk about those values often, thereby elevating them to common shared culture.&lt;/p&gt;

&lt;h3 id=&quot;short-feedback-loops-can-be-engineered&quot;&gt;Short feedback loops can be engineered&lt;/h3&gt;

&lt;p&gt;Although I already knew that short feedback loops are important for getting good engineering velocity (write unit tests! use debugging tools!), one thing I’ve learned at Pilot is that it’s possible to &lt;em&gt;structurally engineer an organizational system&lt;/em&gt; such that it has short feedback loops built in. In particular, our engineering team writes software for our in-house users, which means I have the luxury of not needing to regularly set up A/B tests in order to infer what’s going on in users’ minds; instead, we can ask our users directly.&lt;/p&gt;

&lt;h3 id=&quot;processes-can-become-stale--change-them&quot;&gt;Processes can become stale – change them&lt;/h3&gt;

&lt;p&gt;I’ve also seen firsthand that as organizations scale, the old processes and ways of doing things don’t necessarily work anymore and you have to find new ones. I’ve been pleased that as we’ve been growing, the management team has been attentive to whether there are any growing pains showing up in our processes, and I’ve been pleased that we’ve been actively working on moving to newer ways of doing things that work better for our current situation, rather than sticking with old solutions because they’re the way things have always been done.&lt;/p&gt;

&lt;p&gt;Since I have joined, we have actively improved our systems for triaging issues, for tracking projects, and for conducting engineering team meetings, each time preventing ourselves from getting stuck in old ways that weren’t serving us as well anymore.&lt;/p&gt;

&lt;p&gt;This seems like a pretty hard category of problem in general, and I’m interested in learning more heuristics for how to notice it and address it. It seems that as organizations grow larger, oftentimes problems like this can slip through the cracks.&lt;/p&gt;

&lt;h3 id=&quot;conclusions&quot;&gt;Conclusions&lt;/h3&gt;

&lt;p&gt;There are countless other things I’ve learned that I could mention as well. I’ve learned a lot of stuff about good habits for frontend development in &lt;a href=&quot;https://vuejs.org/&quot;&gt;Vue.js&lt;/a&gt; from Pilot’s first engineer. I’ve learned what good management feels like as an employee from our engineering manager’s care and diligence. I’ve learned new things about computer security directly from the &lt;a href=&quot;https://github.com/glyph&quot;&gt;founder&lt;/a&gt; of the &lt;a href=&quot;https://github.com/twisted/twisted&quot;&gt;Twisted&lt;/a&gt; project, including how SSL certificate issuance works and how to do &lt;a href=&quot;https://en.wikipedia.org/wiki/Threat_model&quot;&gt;formal threat modeling&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Overall, I’ve been really interested in institutioncraft lately, and Pilot feels like a success case in institutioncraft from which I’ve learned a whole bunch. Of course, the things I’ve learned so far, like all things I learn, are a work in progress that is continued every day. I’m looking forward to building more stuff with these amazing people and to learning more as we grow together.&lt;/p&gt;
</description>
        <pubDate>Wed, 17 Apr 2019 00:00:00 +0000</pubDate>
        <link>https://csvoss.com//reflections-on-pilot-series-b</link>
        <guid isPermaLink="true">https://csvoss.com//reflections-on-pilot-series-b</guid>
        
        
        <category>posts</category>
        
      </item>
      
    
      
      <item>
        <title>Reversible computing, and a puzzle</title>
        <description>&lt;p&gt;If you’ve seen my other posts, you probably already know that I am a sucker for good visual notations. Some of my favorites include &lt;a href=&quot;/projects/2015/11/08/lambda-circuitry.html&quot;&gt;circuitry for lambda calculus&lt;/a&gt; and &lt;a href=&quot;/projects/2017/07/29/deconstruct-talk.html&quot;&gt;Feynman diagrams&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So when I heard about a graphical notation for linear algebra, I really wanted to learn how it works. I decided to learn enough about &lt;a href=&quot;https://graphicallinearalgebra.net/&quot;&gt;Graphical Linear Algebra&lt;/a&gt; to be able to use the notation to express and solve one of my favorite puzzles from quantum computing. I was curious whether graphical linear algebra would make the problem more intuitive, and I think the notation actually succeeds!&lt;/p&gt;

&lt;p&gt;In this post, I’ll pose the puzzle, along with some relevant background about logic gates in reversible computing. In a future post, I’ll give the puzzle’s solution, both with and without the aid of graphical linear algebra.&lt;/p&gt;

&lt;h1 id=&quot;reversible-computing&quot;&gt;Reversible computing&lt;/h1&gt;

&lt;p&gt;We usually think of quantum computing (as exemplified by &lt;a href=&quot;https://en.wikipedia.org/wiki/Quantum_circuit&quot;&gt;quantum circuits&lt;/a&gt;) as being more powerful than classical computing (as exemplified by the &lt;a href=&quot;https://en.wikipedia.org/wiki/Turing_machine&quot;&gt;Turing machine&lt;/a&gt;), because it operates over &lt;em&gt;qubits&lt;/em&gt; and qubits can exist in superposition, unlike normal bits. However, it’s also true that programs written for quantum computers must obey some constraints that programs written for Turing machines need &lt;em&gt;not&lt;/em&gt; obey! In particular, all operations in a quantum circuit must be &lt;em&gt;reversible&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;For example, the classical &lt;a href=&quot;https://en.wikipedia.org/wiki/XOR_gate&quot;&gt;XOR&lt;/a&gt; gate consumes two inputs and produces a value that’s true if and only if the two inputs are different. XOR is an example of an &lt;em&gt;irreversible&lt;/em&gt; operation; knowing the value of A XOR B does not give you enough information to derive what the two inputs had been beforehand, because each output value can come from two different pairs of inputs (0 XOR 0 and 1 XOR 1 both give 0, for instance). An XOR gate has no inverse.&lt;/p&gt;

&lt;p&gt;In contrast, the quantum equivalent of an XOR gate must produce at least one more output bit in order to preserve reversibility. Not only that, it must also use that output bit in a way that ensures that the two outputs could in principle be reversed in order to recover the original two inputs. The &lt;a href=&quot;https://en.wikipedia.org/wiki/Controlled_NOT_gate&quot;&gt;CNOT gate&lt;/a&gt; is one way of implementing these constraints in order to create a component that can be used to calculate XOR: it maps the inputs (X, Y) to the outputs (X, X XOR Y), so the second output carries the XOR while the first keeps enough information to undo it.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/cnot.png&quot; alt=&quot;X cnot Y&quot; width=&quot;200&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Unlike XOR, a CNOT gate has a logic gate that acts as its inverse (all reversible gates must). In fact, CNOT is its own inverse: applying it twice returns the original inputs.&lt;/p&gt;
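
&lt;p&gt;Here is a minimal Python sketch of the difference, treating bits as the integers 0 and 1 (the names are my own). Tabulating XOR shows that every output has two possible input pairs, while CNOT merely shuffles the four input pairs and undoes itself when applied twice.&lt;/p&gt;

```python
from itertools import product

# All four (x, y) input pairs: (0, 0), (0, 1), (1, 0), (1, 1).
PAIRS = list(product([0, 1], repeat=2))


def xor(x: int, y: int) -> int:
    return x ^ y


def cnot(x: int, y: int) -> tuple[int, int]:
    """Keep the control x; flip the target y exactly when x is 1."""
    return x, x ^ y


# Group XOR's inputs by output: each output has two preimages, so no inverse exists.
xor_preimages: dict[int, list[tuple[int, int]]] = {}
for x, y in PAIRS:
    xor_preimages.setdefault(xor(x, y), []).append((x, y))

# CNOT permutes the four pairs, and applying it twice restores the inputs.
cnot_is_permutation = sorted(cnot(x, y) for x, y in PAIRS) == PAIRS
cnot_is_self_inverse = all(cnot(*cnot(x, y)) == (x, y) for x, y in PAIRS)
```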

&lt;p&gt;The requirement for reversibility in quantum computing is, incidentally, an unavoidable consequence of the fact that the laws of physics require that &lt;em&gt;information be conserved&lt;/em&gt;. In quantum physics, this concept is known as &lt;a href=&quot;https://en.wikipedia.org/wiki/Unitarity_(physics)&quot;&gt;unitarity&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For more on this subject, I recommend Scott Aaronson’s excellent book &lt;em&gt;Quantum Computing Since Democritus&lt;/em&gt; for a tour of the theory behind both classical and quantum computing. I also recommend the paper &lt;a href=&quot;https://arxiv.org/abs/1504.05155&quot;&gt;The Classification of Reversible Bit Operations&lt;/a&gt; by Aaronson, Grier, and Schaeffer if you want to learn more about logic gates in reversible computing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tangent.&lt;/strong&gt; Real-world computers, like quantum computers (but unlike Turing machines), are actually built out of real physical stuff. So at the physical level they &lt;em&gt;also&lt;/em&gt; carry out only reversible operations, because they work under the same laws of physics as quantum computers. Hidden under all of our careful abstractions, there are still trash bits and there is still physical entropy at play. It’s just that we programmers typically elide those details when thinking about algorithm design.&lt;/p&gt;

&lt;p&gt;The details are still there, though; for example, it’s not possible to wipe a laptop’s hard drive, overwriting all of its data as 0s, without the entropy from that information being radiated out as some small amount of heat to the rest of the universe outside the laptop. This is a consequence of &lt;a href=&quot;https://en.wikipedia.org/wiki/Landauer%27s_principle&quot;&gt;Landauer’s principle&lt;/a&gt;, first described by Rolf Landauer in a 1961 &lt;a href=&quot;http://worrydream.com/refs/Landauer%20-%20Irreversibility%20and%20Heat%20Generation%20in%20the%20Computing%20Process.pdf&quot;&gt;paper&lt;/a&gt;, &lt;em&gt;Irreversibility and Heat Generation in the Computing Process&lt;/em&gt;. I would &lt;em&gt;love&lt;/em&gt; to know more about the implications of Landauer’s principle on the energy efficiency of various algorithms when reversibility and entropy are taken into account – what low-hanging fruit are algorithm designers missing out on? – but haven’t yet found any good source material that combines the physics concepts with the theoretical computer science.&lt;/p&gt;
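&lt;p&gt;Landauer’s principle puts a number on that heat: erasing one bit in surroundings at temperature &lt;em&gt;T&lt;/em&gt; dissipates at least &lt;em&gt;k&lt;/em&gt;&lt;sub&gt;B&lt;/sub&gt;&lt;em&gt;T&lt;/em&gt; ln 2. Here is a rough back-of-the-envelope calculation. It assumes room temperature (300 K) and a hypothetical 1 TB drive. It computes only the theoretical minimum; real hardware dissipates far more than this.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)
T = 300.0           # assumed room temperature, in kelvin

# Landauer bound: erasing one bit dissipates at least k_B * T * ln 2 joules.
joules_per_bit = K_B * T * math.log(2)
print(joules_per_bit)  # roughly 2.87e-21

# Hypothetical example: overwriting a 1 TB drive (8e12 bits) with zeros.
bits = 8 * 10**12
print(bits * joules_per_bit)  # roughly 2.3e-08 joules, a tiny lower bound
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;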

&lt;h1 id=&quot;the-puzzle&quot;&gt;The puzzle&lt;/h1&gt;

&lt;p&gt;Consider the &lt;a href=&quot;https://en.wikipedia.org/wiki/Fredkin_gate&quot;&gt;Fredkin gate&lt;/a&gt;, also known as CSWAP, a reversible gate which swaps bits 2 and 3 (below) only when bit 1 is true.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/cswap.png&quot; alt=&quot;CSWAP&quot; width=&quot;150&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Consider also the &lt;a href=&quot;https://en.wikipedia.org/wiki/Toffoli_gate&quot;&gt;Toffoli gate&lt;/a&gt;, also known as CCNOT, a reversible gate which inverts bit 3 if and only if bits 1 and 2 are both true.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/ccnot.png&quot; alt=&quot;CCNOT&quot; width=&quot;150&quot; /&gt;&lt;/p&gt;
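&lt;p&gt;As a sketch that is separate from the puzzle itself, both gates can be written as Python functions on 0/1 bits. The code then checks that each gate permutes the eight possible inputs and is its own inverse. The bit encoding and the function names are assumptions of this example.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from itertools import product

def fredkin(a, b, c):
    # CSWAP: swap b and c when the control bit a is 1.
    return (a, c, b) if a else (a, b, c)

def toffoli(a, b, c):
    # CCNOT: flip c when both a and b are 1 (for 0/1 bits, a * b is a AND b).
    return a, b, c ^ (a * b)

bits3 = list(product((0, 1), repeat=3))  # all 8 inputs, in sorted order

for gate in (fredkin, toffoli):
    outputs = [gate(*x) for x in bits3]
    is_permutation = sorted(outputs) == bits3  # 8 inputs map to 8 distinct outputs
    is_self_inverse = all(gate(*gate(*x)) == x for x in bits3)
    print(gate.__name__, is_permutation, is_self_inverse)
# prints: fredkin True True
#         toffoli True True
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;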

&lt;p&gt;Note that Toffoli gates are &lt;a href=&quot;https://en.wikipedia.org/wiki/Functional_completeness&quot;&gt;&lt;em&gt;complete&lt;/em&gt;&lt;/a&gt; gates in reversible computing. That is, given some extra ancilla bits fixed to constant values, it’s possible to form any other reversible logic operation using only some combination of Toffoli gates. This is the same sort of completeness that &lt;a href=&quot;https://en.wikipedia.org/wiki/NAND_gate&quot;&gt;NAND&lt;/a&gt; has for Boolean logic.&lt;/p&gt;

&lt;p&gt;(The Toffoli gate does not by itself form a complete gate set in quantum computing, but it becomes computationally universal when combined with just one other gate, the Hadamard gate. Here’s one &lt;a href=&quot;https://arxiv.org/pdf/quant-ph/0301040.pdf&quot;&gt;proof&lt;/a&gt; of that fact.)&lt;/p&gt;
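&lt;p&gt;To see where the classical completeness comes from, here is one standard construction. It is sketched in Python and assumes that ancilla bits can be fixed to a constant. Setting a Toffoli gate’s target to 1 makes it output the NAND of its two controls, and NAND alone is enough for all of Boolean logic. The helper names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nand&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;not_gate&lt;/code&gt; are illustrative.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from itertools import product

def toffoli(a, b, c):
    # CCNOT on 0/1 bits: flip c when a and b are both 1 (a * b is AND for bits).
    return a, b, c ^ (a * b)

def nand(a, b):
    # Fix the target to an ancilla bit of 1: it comes out as NOT (a AND b).
    # The two controls pass through unchanged as extra outputs, ignored here.
    _, _, out = toffoli(a, b, 1)
    return out

def not_gate(a):
    # Fix both controls to 1, so the target is always flipped.
    _, _, out = toffoli(1, 1, a)
    return out

for a, b in product((0, 1), repeat=2):
    print(a, b, nand(a, b))
# prints the NAND truth table: 0 0 1, 0 1 1, 1 0 1, 1 1 0 (one row per line)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Ignoring the pass-through control bits is exactly the kind of trash-bit bookkeeping that classical abstractions hide. A fully reversible circuit would keep them.&lt;/p&gt;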

&lt;p&gt;&lt;strong&gt;The puzzle:&lt;/strong&gt;  Construct a Fredkin gate using only Toffoli gates.&lt;/p&gt;
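&lt;p&gt;If you want to test a candidate answer mechanically, here is a small hypothetical harness. It does not contain a solution.&lt;/p&gt;

&lt;p&gt;It assumes a circuit is written as a list of Toffoli gates, each given as a triple of wire positions (control, control, target). Wires 0 to 2 carry the Fredkin gate’s three inputs, and any extra wires are ancilla bits set to constants. As a design choice of this sketch, it also insists that every ancilla ends up back at its starting value.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from itertools import product

def apply_toffoli(bits, c1, c2, target):
    # Flip bits[target] when bits[c1] and bits[c2] are both 1 (wires must be distinct).
    assert len({c1, c2, target}) == 3
    bits[target] ^= bits[c1] * bits[c2]

def fredkin(a, b, c):
    # Reference behavior: swap b and c when the control bit a is 1.
    return (a, c, b) if a else (a, b, c)

def implements_fredkin(circuit, ancillas=()):
    # circuit: list of (c1, c2, target) wire triples, applied left to right.
    for a, b, c in product((0, 1), repeat=3):
        bits = [a, b, c, *ancillas]
        for c1, c2, target in circuit:
            apply_toffoli(bits, c1, c2, target)
        # The three data wires must match Fredkin, and ancillas must be restored.
        if tuple(bits[:3]) != fredkin(a, b, c) or bits[3:] != list(ancillas):
            return False
    return True

# A single Toffoli gate is not a solution:
print(implements_fredkin([(0, 1, 2)]))  # False
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For example, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implements_fredkin(my_circuit, ancillas=(1,))&lt;/code&gt; would test a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;my_circuit&lt;/code&gt; that uses one extra wire, numbered 3, initialized to 1.&lt;/p&gt;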

&lt;!-- **Update:** [The solution](/projects/2019/09/21/graphical-reversible-gates.html). --&gt;

</description>
        <pubDate>Thu, 07 Feb 2019 00:00:00 +0000</pubDate>
        <link>https://csvoss.com//reversible-computing-puzzle</link>
        <guid isPermaLink="true">https://csvoss.com//reversible-computing-puzzle</guid>
        
        
        <category>projects</category>
        
      </item>
      
    
      
      <item>
        <title>&lt;i&gt;Programming Languages as Notations&lt;/i&gt;, Deconstruct 2017</title>
        <description>&lt;p&gt;Last April, I attended &lt;strike&gt;&lt;a href=&quot;https://twitter.com/hashtag/GaryConf?src=hash&quot;&gt;GaryConf&lt;/a&gt;&lt;/strike&gt; &lt;strike&gt;&lt;a href=&quot;https://www.destroyallsoftware.com/talks/wat&quot;&gt;WATCON&lt;/a&gt;&lt;/strike&gt; &lt;a href=&quot;http://deconstructconf.com/&quot;&gt;Deconstruct&lt;/a&gt; 2017, got to listen to some excellent speakers, and enjoyed the opportunity to give a talk of my own: &lt;i&gt;Programming Languages as Notations&lt;/i&gt;. &lt;a href=&quot;/assets/csvoss-deconstructconf-2017.pdf&quot;&gt;Here are the slides&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When I was deciding what to talk about, I had been reading a bunch about the history of notations in math and physics. It’s really fascinating how different people throughout history have designed different languages for representing fundamentally identical concepts – for example, we have different notations for arithmetic, different notations for solving problems in quantum electrodynamics, and different notations for manipulating vectors. Some thoughts on each of these I’d like to share:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Arithmetic&lt;/strong&gt;: Compare Western civilization’s arithmetic notation and algorithms (for addition, multiplication, division, and fractions) to the notation and algorithms used in &lt;a href=&quot;https://en.wikipedia.org/wiki/Ancient_Egyptian_mathematics&quot;&gt;Ancient Egyptian mathematics&lt;/a&gt;. I didn’t end up mentioning Ancient Egyptian arithmetic in my talk, but it’s neat. &lt;a href=&quot;https://blogs.scientificamerican.com/roots-of-unity/learn-to-count-like-an-egyptian/&quot;&gt;&lt;em&gt;Count Like An Egyptian&lt;/em&gt;&lt;/a&gt; by David Reimer is a fun resource for learning more.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Quantum electrodynamics&lt;/strong&gt;: Two techniques, brute-force algebra and Feynman diagrams, target the same kind of problem (quantum electrodynamics calculations). Feynman diagrams are an interesting innovation for two reasons: they create a new abstraction for dealing with those problems at a higher level, and they represent that abstraction in a beautifully visual way. Read David Kaiser’s article &lt;a href=&quot;http://web.mit.edu/dikaiser/www/FdsAmSci.pdf&quot;&gt;&lt;em&gt;Physics and Feynman’s Diagrams&lt;/em&gt;&lt;/a&gt; for some neat history here, or check out his book &lt;a href=&quot;http://web.mit.edu/dikaiser/www/DrawingTheoriesApart.html&quot;&gt;&lt;em&gt;Drawing Theories Apart&lt;/em&gt;&lt;/a&gt; for a more in-depth look. I love finding examples of mathematical notations that provide a new abstraction over the problems they solve, and I love highly visual notations, so Feynman diagrams really hit my aesthetic buttons.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Vector notation&lt;/strong&gt; is another example of a notation that creates a new abstraction: when we represent doubling some three-dimensional vector by writing down the notation 2&lt;strong&gt;x&lt;/strong&gt; instead of 〈2x₁, 2x₂, 2x₃〉, for example, the vector abstraction has saved us from something that’s much like code duplication, and in the process has reduced the number of opportunities we have to accidentally introduce an error. Also, vector notation provides a hilarious example of &lt;a href=&quot;https://xkcd.com/927/&quot;&gt;Standardization Wars&lt;/a&gt; happening over one hundred years ago. Florian Cajori in his work &lt;em&gt;A History of Mathematical Notation&lt;/em&gt; describes some of the vitriol that got thrown back and forth; some choice quotes from that book are in my slides. It’s nice to know (I suppose) that the modern impulse to fight over standards and their implementation details was shared by our ancestors as well.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
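&lt;p&gt;As one illustration of how different those arithmetic algorithms can be: Egyptian multiplication needs only doubling and adding. The Python sketch below is a modern reconstruction of the doubling-table method, not a transcription of any particular papyrus. It assumes nonnegative integers, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;egyptian_multiply&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;def egyptian_multiply(a, b):
    # Build the doubling table: row k pairs 2**k with a * 2**k,
    # using only doubling (adding a number to itself).
    rows = []
    power, doubled = 1, a
    for _ in range(b.bit_length()):
        rows.append((power, doubled))
        power, doubled = power + power, doubled + doubled

    # Walk the table from the largest row down, ticking off the rows whose
    # powers of two add up to b, and sum the matching doublings of a.
    total, remaining = 0, b
    for power, doubled in reversed(rows):
        # Going largest-first, remaining is always less than 2 * power here,
        # so remaining // power is 1 exactly when this row is needed.
        if remaining // power:
            remaining -= power
            total += doubled
    return total

print(egyptian_multiply(13, 21))  # 273
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;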

&lt;p&gt;Because notation had been on my mind, my talk centers around some parallels I see between mathematical notation design and programming language design.&lt;/p&gt;

&lt;p&gt;One of those parallels: I like it when I find new programming paradigms that introduce new abstractions for the problems that we as programmers often solve. Here are some things in this general concept-space that have recently piqued my interest:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://rise4fun.com/dafny&quot;&gt;&lt;strong&gt;Dafny&lt;/strong&gt;&lt;/a&gt; is a research programming language that has syntax built in for writing down a function’s preconditions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;requires&lt;/code&gt;) and postconditions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ensures&lt;/code&gt;). Dafny’s verifier then statically checks that every call satisfies the preconditions and that the function body guarantees the postconditions.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Computational biology&lt;/strong&gt; has a need for programming tools that model at many different levels of abstraction – from the molecular level (e.g. protein folding: tools – projects like &lt;a href=&quot;http://folding.stanford.edu/&quot;&gt;Folding@Home&lt;/a&gt; and &lt;a href=&quot;https://fold.it/portal/&quot;&gt;Foldit&lt;/a&gt;) to the cellular level (e.g. protein signalling networks: tools – programming languages like &lt;a href=&quot;http://dev.executableknowledge.org/&quot;&gt;Kappa&lt;/a&gt;) to the whole-organism level (tools – we need them!).&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://en.wikipedia.org/wiki/%CE%A0-calculus&quot;&gt;&lt;strong&gt;pi-calculus&lt;/strong&gt;&lt;/a&gt; is a model of computation in the same way that lambda calculus or Turing machines are models of computation, but is unusual in that it allows both &lt;em&gt;parallel composition&lt;/em&gt; and &lt;em&gt;sequential composition&lt;/em&gt; of code. Normally we write code sequentially, without the ability to specify when two operations or sequences of operations are independent and could very well have happened in parallel. Instead, the pi-calculus formalizes the ability to specify code as running sequentially or in parallel, opening up the possibility that the compiler could optimize code to run concurrently both more easily and with less thought required from the programmer. &lt;a href=&quot;http://www.cis.upenn.edu/~bcpierce/papers/pict/Html/Pict.html&quot;&gt;Pict&lt;/a&gt; is a concurrent programming language which is built upon the pi-calculus.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another of those parallels: I like it when I find programming tools that enable visual representation of code. For example, &lt;a href=&quot;http://grokcode.com/864/snakefooding-python-code-for-complexity-visualization/&quot;&gt;snakefood&lt;/a&gt; is pretty nifty. To give an example of what a &lt;em&gt;completely&lt;/em&gt; visual programming language could look like, I discuss a visual circuitry-like notation I designed for lambda calculus, giving examples that dive into lambda calculus and combinatory logic. &lt;a href=&quot;/projects/2015/11/08/lambda-circuitry.html&quot;&gt;This blog post&lt;/a&gt; has more detail on that project. I don’t claim that we should be using visual representations all the time – user interface design is tricky, and our text-based systems have a lot going for them – but I think there still exist areas in which to innovate visually.&lt;/p&gt;
</description>
        <pubDate>Sat, 29 Jul 2017 00:00:00 +0000</pubDate>
        <link>https://csvoss.com//programming-languages-as-notations</link>
        <guid isPermaLink="true">https://csvoss.com//programming-languages-as-notations</guid>
        
        
        <category>projects</category>
        
      </item>
      
    
      
      <item>
        <title>A circuit-like notation for lambda calculus</title>
        <description>&lt;p&gt;Lately, I’ve been playing around with inventing a visual writing system for lambda calculus.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Lambda_calculus&quot;&gt;Lambda calculus&lt;/a&gt; (λ-calculus) is a sort of proto-functional-programming, originally invented by Alonzo Church while he was trying to solve &lt;a href=&quot;http://en.wikipedia.org/wiki/Entscheidungsproblem&quot;&gt;the same problem&lt;/a&gt; that led Turing to invent Turing machines. It’s another way of reasoning about computation.&lt;/p&gt;

&lt;p&gt;Python’s lambda is an idea that was borrowed from λ-calculus. In Python, you can use a &lt;a href=&quot;https://docs.python.org/2/tutorial/controlflow.html#lambda-expressions&quot;&gt;lambda expression&lt;/a&gt; like the following in order to define a function that returns the square of a number:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;square = lambda x: x * x
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In λ-calculus, the idea is the same: we create a function by using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;λ&lt;/code&gt; to specify which arguments a function takes in, then we give an expression for the function’s return value. Pure lambda calculus doesn’t include operators of any sort – just functions being applied to other functions – so if we try to write a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;square&lt;/code&gt; function, we have to suppose that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;multiply&lt;/code&gt; is a function of two variables that has already been defined:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;square = λx. multiply x x
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;square&lt;/code&gt; function, once defined, can be applied to arguments and evaluated into something simpler.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;square 4 = (λx. multiply x x) 4
         = multiply 4 4
         = 16
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One of the cool things about lambda calculus is that we can represent most common programming abstractions using λ-calculus, even though it’s nothing but functions: numbers, arithmetic, booleans, lists, if statements, loops, recursion… the list goes on. Before I introduce the visual writing system I’ve been using, let’s take a detour and discuss how we can represent numbers and arithmetic using lambda calculus.&lt;/p&gt;

&lt;h2 id=&quot;church-numerals-in-lambda-calculus&quot;&gt;Church numerals, in lambda calculus&lt;/h2&gt;
&lt;p&gt;Alonzo Church figured out how to represent numbers as lambda functions; these numbers are referred to as Church numerals.&lt;/p&gt;

&lt;p&gt;We can represent any nonnegative integer as long as we have two things: (1) a value for &lt;strong&gt;zero&lt;/strong&gt;, and (2) a &lt;strong&gt;successor&lt;/strong&gt; function, which returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n + 1&lt;/code&gt; for any number &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt;. To represent numbers as functions, then, we require that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;z&lt;/code&gt; (zero) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; (successor) be passed in as arguments, and go from there. Each number is actually secretly a function of those two inputs.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;zero = λs. λz. z
one = λs. λz. s z
two = λs. λz. s (s z)
three = λs. λz. s (s (s z))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
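&lt;p&gt;Because Python’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lambda&lt;/code&gt; supports closures, these definitions can be transcribed almost symbol for symbol, curried one argument at a time. In this sketch, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_int&lt;/code&gt; is a hypothetical helper. It plugs in one concrete choice of successor (add one) and zero (the integer 0), so we can see which number each function encodes.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Church numerals as curried Python lambdas: each takes s, then z.
zero = lambda s: lambda z: z
one = lambda s: lambda z: s(z)
two = lambda s: lambda z: s(s(z))
three = lambda s: lambda z: s(s(s(z)))

def to_int(n):
    # Hypothetical decoding helper: successor adds one, zero is 0.
    return n(lambda k: k + 1)(0)

print([to_int(n) for n in (zero, one, two, three)])  # [0, 1, 2, 3]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Any other successor works too. For instance, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;two(lambda k: k * 10)(1)&lt;/code&gt; applies a multiply-by-ten function twice to 1 and gives 100.&lt;/p&gt;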

&lt;p&gt;The actual details of how zero and successor should be implemented are left as someone else’s problem; we can survive without them. All we care about is that our numbers do the right thing, given whatever zero and successor someone may provide.&lt;/p&gt;

&lt;p&gt;What about &lt;strong&gt;addition&lt;/strong&gt;? Addition is a function that takes in two numbers (let’s call them &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;), and produces a number representing their sum. To sum them together, we’ll want to produce a number that applies &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt;, the successor function, a total of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x + y&lt;/code&gt; times. For example, we could first apply it &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; times to the zero, then apply it &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; more times to that result.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;plus = λx. λy. (λs. λz. x s (y s z))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
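&lt;p&gt;Addition transcribes into Python just as directly. Decoding with a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_int&lt;/code&gt; helper only checks that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plus one one&lt;/code&gt; behaves like two on integers. It complements, rather than replaces, the symbolic proof that follows.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Church numerals and addition, transcribed from the lambda calculus definitions.
one = lambda s: lambda z: s(z)
two = lambda s: lambda z: s(s(z))
plus = lambda x: lambda y: lambda s: lambda z: x(s)(y(s)(z))

def to_int(n):
    # Hypothetical decoding helper: successor adds one, zero is 0.
    return n(lambda k: k + 1)(0)

print(to_int(plus(one)(one)))  # 2
print(to_int(plus(two)(one)))  # 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;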

&lt;p&gt;Let’s try proving that one plus one equals two. In λ-calculus, this proof looks like the following:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;one = λs. λz. s z
two = λs. λz. s (s z)

plus = λx. λy. (λs. λz. x s (y s z))

plus one one = (λx. λy. (λs. λz. x s (y s z))) one one
             = λs. λz. one s (one s z)
             = λs. λz. (λs. λz. s z) s (one s z)
             = λs. λz. s (one s z)
             = λs. λz. s ((λs. λz. s z) s z)
             = λs. λz. s (s z)
             = two
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(Long, but at least conciser than &lt;a href=&quot;http://en.wikipedia.org/wiki/Principia_Mathematica&quot;&gt;Bertrand Russell’s&lt;/a&gt;.)&lt;/p&gt;

&lt;h2 id=&quot;lambda-circuitry&quot;&gt;Lambda circuitry&lt;/h2&gt;

&lt;p&gt;There are a lot of lambdas, parentheses, and arguments being pushed around in that proof. Mentally matching up parentheses is annoying. Scope is especially annoying: which &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; am I looking at again in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;λs. λz. (λs. λz. s z) s (one s z)&lt;/code&gt;, the inner one or the outer one?&lt;/p&gt;

&lt;p&gt;A linear string of lambdas and parentheses is an ineffective way to provide intuition for the computations that are taking place. This problem isn’t unique to lambda calculus, either; consider trying to represent a binary tree using a linear string:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Node(2, Node(7, Leaf(2), Node(6, Leaf(5), Leaf(11))), Node(5, None, Node(9, Leaf(4), None)))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Unambiguous, but not very intuitive. Contrast that representation with the diagram we use when we’re trying to explain that same binary tree at a chalkboard, a more visual notation:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Binary_tree.svg/288px-Binary_tree.svg.png&quot; alt=&quot;Binary tree diagram, from Wikipedia&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image from &lt;a href=&quot;https://commons.wikimedia.org/wiki/File:Binary_tree.svg&quot;&gt;Wikipedia&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I remember programming constructs better when I can reason about them visually like this: when I imagine cutting an array in half for binary search, when I imagine pointers in a linked list being shuffled around to insert a new element, and when I imagine traversing up and down the branches of a binary tree.&lt;/p&gt;

&lt;p&gt;Why can’t lambda calculus get some visual intuitions, in the same way? Lambda calculus is a dance of variables flowing through and being manipulated by functions, and I want a writing system for lambda calculus that will visually display this dance. It shouldn’t look like strings of parentheses and symbols: it should create visual intuition.&lt;/p&gt;

&lt;p&gt;After some trial and error, here is the system I came up with. I aimed for something that would resemble circuitry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Values&lt;/strong&gt; flow along wires, where they may be passed in as arguments to functions or applied as functions themselves. Some are inputs, some are outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Functions&lt;/strong&gt; are represented as boxes which are applied to their inputs on one side and produce a single output on the other. The notation must indicate which function is applied; this may either be drawn within the box itself, or wired in to the middle of the box from some other value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Arguments&lt;/strong&gt; are represented as inputs, coming in from the right side of the diagram; these arguments might pass through functions, or they might be functions-to-apply themselves. If an argument has not been passed in yet, it’s an empty arrow beginning a wire; if an argument has been passed in, its value is attached to the wire. Arguments are always passed in from top to bottom, in order.&lt;/p&gt;

&lt;p&gt;As an example, here’s a function which takes in two functions, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;g&lt;/code&gt;, then a value &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;, and returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f (g x)&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-f-g-x.png&quot; alt=&quot;lambda f. lambda g. lambda x. f (g x)&quot; /&gt;&lt;/p&gt;

&lt;p&gt;As another example, here’s the M combinator &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;M = λx. x x&lt;/code&gt; (the “mockingbird” in &lt;a href=&quot;http://smile.amazon.com/gp/product/B00A1P096Y&quot;&gt;&lt;em&gt;To Mock a Mockingbird&lt;/em&gt;&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/m-combinator.png&quot; alt=&quot;lambda x. x x&quot; /&gt;&lt;/p&gt;
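&lt;p&gt;Both diagrams can be cross-checked against curried Python lambdas. In this sketch, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;inc&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;double&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;identity&lt;/code&gt; are hypothetical helpers chosen only to make the results visible.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# λf. λg. λx. f (g x), curried one argument at a time.
compose = lambda f: lambda g: lambda x: f(g(x))

inc = lambda n: n + 1
double = lambda n: n * 2
print(compose(inc)(double)(5))  # 11, that is, inc(double(5))

# M = λx. x x applies its argument to itself.
M = lambda x: x(x)
identity = lambda x: x
print(M(identity) is identity)  # True
# M(M) reduces to itself forever; in Python it ends in a RecursionError.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;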

&lt;h2 id=&quot;church-numerals-in-lambda-circuitry&quot;&gt;Church numerals, in lambda circuitry&lt;/h2&gt;

&lt;p&gt;Here’s the Church numeral &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;four = λs. λz. s (s (s (s z)))&lt;/code&gt;, drawn out in lambda circuitry:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-four.png&quot; alt=&quot;Four, in lambda circuitry&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Let’s take that proof from earlier that one plus one is two. What does it look like to draw that proof in lambda circuitry, instead?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-oneplusoneistwo.png&quot; alt=&quot;Proof that one plus one is two, in lambda circuitry&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;text-align: right;&quot;&gt;∎&lt;/p&gt;

&lt;p&gt;We could also consider &lt;strong&gt;multiplication&lt;/strong&gt;. A multiply function would take in two numbers, m and n, and compute a new number that is their product. In lambda calculus, we’d write:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;multiply = λm. λn. λs. λz. m (n s) z
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
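&lt;p&gt;The same definition runs as curried Python lambdas. Here &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n s&lt;/code&gt; is a function that applies &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; n times, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;m&lt;/code&gt; repeats that whole bundle m times. This is a sketch; &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;to_int&lt;/code&gt; is a hypothetical decoding helper.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Church numerals as curried Python lambdas.
two = lambda s: lambda z: s(s(z))
three = lambda s: lambda z: s(s(s(z)))

# multiply = λm. λn. λs. λz. m (n s) z
multiply = lambda m: lambda n: lambda s: lambda z: m(n(s))(z)

def to_int(n):
    # Hypothetical decoding helper: successor adds one, zero is 0.
    return n(lambda k: k + 1)(0)

print(to_int(multiply(two)(three)))  # 6
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;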

&lt;p&gt;In the notation of lambda circuitry, this looks like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-multiply.png&quot; alt=&quot;Multiplication function, in lambda circuitry&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Using this function, we can check that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;multiply 2 3&lt;/code&gt; evaluates to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;6&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-multiply-1.png&quot; alt=&quot;Multiply(2, 3), step 1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-multiply-2.png&quot; alt=&quot;Multiply(2, 3), step 2&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-multiply-3.png&quot; alt=&quot;Multiply(2, 3), step 3&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-multiply-4.png&quot; alt=&quot;Multiply(2, 3), step 4&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-multiply-5.png&quot; alt=&quot;Multiply(2, 3), step 5&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;text-align: right;&quot;&gt;∎&lt;/p&gt;

&lt;h2 id=&quot;sidenote-de-bruijn-indices&quot;&gt;Sidenote: De Bruijn indices&lt;/h2&gt;
&lt;p&gt;One of the nice things about lambda circuitry is that it completely removes the need for variable names.&lt;/p&gt;

&lt;p&gt;There’s another notation for lambda calculus that does this too: &lt;a href=&quot;https://en.wikipedia.org/wiki/De_Bruijn_index&quot;&gt;&lt;em&gt;De Bruijn indices&lt;/em&gt;&lt;/a&gt;. A lambda expression written with De Bruijn indices replaces each variable with a positive integer that identifies its binding λ by counting outward: 1 means the innermost enclosing λ, 2 the next one out, and so on. The smaller the integer, the more recently the argument it refers to was passed in.&lt;/p&gt;

&lt;p&gt;For example, the identity function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;λx. x&lt;/code&gt; may be written with De Bruijn indices like so:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;identity = λ 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The Church numeral for two, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;λs. λz. s (s z)&lt;/code&gt;, may be written like so:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;two = λ λ 2 (2 1)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The addition function, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;λx. λy. (λs. λz. x s (y s z))&lt;/code&gt;, may be written like so:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;plus = λ λ (λ λ 4 2 (3 2 1))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;An evaluation of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;plus one one&lt;/code&gt; looks like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;plus one one = (λ λ (λ λ 4 2 (3 2 1))) (λ λ 2 1) (λ λ 2 1)
             = (λ (λ λ (λ λ 2 1) 2 (3 2 1))) (λ λ 2 1)
             = λ λ (λ λ 2 1) 2 ((λ λ 2 1) 2 1)
             = λ λ (λ λ 2 1) 2 (2 1)
             = λ λ 2 (2 1)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One of the tricky things about writing a lambda calculus interpreter is getting the renaming rules right, so that substituting an argument never accidentally captures a different variable with the same name. De Bruijn indices are convenient because they remove the need for renaming altogether. Lambda circuitry is similar in spirit to De Bruijn indices in that it doesn’t require variable names at all. Instead, it indicates which variables are passed where by connecting values directly to an arrow indicating when they were passed in.&lt;/p&gt;
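&lt;p&gt;To make that concrete, here is a minimal Python sketch of a De Bruijn-index normalizer. It uses the standard shift-and-substitute formulation, adapted to the 1-based indices above. The term classes and helper names (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Var&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Lam&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;App&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;normalize&lt;/code&gt;, and so on) are hypothetical, and the code makes no attempt at efficiency. Nothing is ever renamed: substitution only adjusts integers.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from dataclasses import dataclass
from operator import gt

# Lambda terms with 1-based De Bruijn indices (no variable names anywhere).
@dataclass(frozen=True)
class Var:
    index: int  # 1 = bound by the innermost enclosing lambda

@dataclass(frozen=True)
class Lam:
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

def shift(t, d, cutoff=0):
    # Add d to every free index, i.e. every index pointing past the cutoff
    # lambdas we have descended through (gt is the greater-than test).
    if isinstance(t, Var):
        return Var(t.index + d) if gt(t.index, cutoff) else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    return App(shift(t.fn, d, cutoff), shift(t.arg, d, cutoff))

def subst(t, j, s):
    # Replace index j in t with s; s is shifted each time we go under a lambda.
    if isinstance(t, Var):
        return s if t.index == j else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, j + 1, shift(s, 1)))
    return App(subst(t.fn, j, s), subst(t.arg, j, s))

def beta(lam, arg):
    # (λ body) arg: put arg in for index 1, then drop the vanished binder.
    return shift(subst(lam.body, 1, shift(arg, 1)), -1)

def step(t):
    # One leftmost-outermost (normal-order) beta step, or None if t is normal.
    if isinstance(t, App):
        if isinstance(t.fn, Lam):
            return beta(t.fn, t.arg)
        fn = step(t.fn)
        if fn is not None:
            return App(fn, t.arg)
        arg = step(t.arg)
        return None if arg is None else App(t.fn, arg)
    if isinstance(t, Lam):
        body = step(t.body)
        return None if body is None else Lam(body)
    return None

def normalize(t):
    # Step until no redex remains (loops forever on terms with no normal form).
    while (nxt := step(t)) is not None:
        t = nxt
    return t

def lams(n, body):
    # Wrap body in n nested lambdas.
    for _ in range(n):
        body = Lam(body)
    return body

def apps(*terms):
    # Left-associated application: apps(a, b, c) is (a b) c.
    result = terms[0]
    for t in terms[1:]:
        result = App(result, t)
    return result

one = lams(2, apps(Var(2), Var(1)))                                 # λ λ 2 1
two = lams(2, apps(Var(2), apps(Var(2), Var(1))))                   # λ λ 2 (2 1)
plus = lams(4, apps(Var(4), Var(2), apps(Var(3), Var(2), Var(1))))  # λ λ (λ λ 4 2 (3 2 1))

print(normalize(apps(plus, one, one)) == two)  # True
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because it always reduces the leftmost-outermost redex first, this sketch takes a slightly different route than the hand evaluation above. It still arrives at the same normal form, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;λ λ 2 (2 1)&lt;/code&gt;.&lt;/p&gt;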

&lt;h2 id=&quot;argument-switching-function-in-lambda-circuitry&quot;&gt;Argument-switching function, in lambda circuitry&lt;/h2&gt;

&lt;p&gt;Here are a few more examples to demonstrate how the notation works in different situations. Let’s consider the “argument-switching” function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt;, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C f x y&lt;/code&gt; returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f y x&lt;/code&gt;. (This is actually the &lt;a href=&quot;https://en.wikipedia.org/wiki/B,C,K,W_system&quot;&gt;C combinator&lt;/a&gt;.)&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;C = λf. λx. λy. f y x
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/c-combinator.png&quot; alt=&quot;C combinator&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Suppose we try applying this to a silly function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f&lt;/code&gt; where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f x y&lt;/code&gt; discards &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; and just returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;. Then, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C f&lt;/code&gt; should switch around &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f&lt;/code&gt;’s arguments and create a function which returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt; instead. Let’s check:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;f = λx. λy. x

C f = (λf. λx. λy. f y x) f
    = λx. λy. f y x
    = λx. λy. (λx. λy. x) y x
    = λx. λy. y
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-f.png&quot; alt=&quot;f = lambda x. lambda y. x&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/c-combinator-of-f-1.png&quot; alt=&quot;C(f), step 1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/c-combinator-of-f-2.png&quot; alt=&quot;C(f), step 2&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/c-combinator-of-f-3.png&quot; alt=&quot;C(f), step 3&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;text-align: right;&quot;&gt;∎&lt;/p&gt;

&lt;p&gt;We could also try a function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;g&lt;/code&gt; where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;g x y&lt;/code&gt; returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x y&lt;/code&gt;. Then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C g x y&lt;/code&gt; should return &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y x&lt;/code&gt;. Let’s check:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;g = λx. λy. x y

C g x y = (λf. λx. λy. f y x) g x y
        = g y x
        = (λx. λy. x y) y x
        = y x
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&quot;/images/lambda-g.png&quot; alt=&quot;g = lambda x. lambda y. x(y)&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/c-combinator-of-g-1.png&quot; alt=&quot;C(g), step 1&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/c-combinator-of-g-2.png&quot; alt=&quot;C(g), step 2&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/c-combinator-of-g-3.png&quot; alt=&quot;C(g), step 3&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/c-combinator-of-g-4.png&quot; alt=&quot;C(g), step 4&quot; /&gt;&lt;/p&gt;

&lt;p style=&quot;text-align: right;&quot;&gt;∎&lt;/p&gt;
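&lt;p&gt;The same kind of spot check works for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;g&lt;/code&gt;. Since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C g x y&lt;/code&gt; reduces to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y x&lt;/code&gt;, the second argument must be callable. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;increment&lt;/code&gt; function here is an arbitrary choice made for illustration.&lt;/p&gt;

```python
# C = λf. λx. λy. f y x, written as curried Python lambdas.
C = lambda f: lambda x: lambda y: f(y)(x)

# g = λx. λy. x y: apply the first argument to the second.
g = lambda x: lambda y: x(y)

# An arbitrary one-argument function to play the role of y.
increment = lambda n: n + 1

# C g x y should reduce to y x, so this applies increment to 3.
print(C(g)(3)(increment))  # prints: 4
```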

&lt;p&gt;&lt;em&gt;Exercise&lt;/em&gt;: Show that applying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; twice reverses it. That is, show that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C (C f)&lt;/code&gt; returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f&lt;/code&gt;, for any &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f&lt;/code&gt;.
(Note that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C f&lt;/code&gt; is a function which takes in two arguments, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;y&lt;/code&gt;, and returns &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f y x&lt;/code&gt;. Applying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;C&lt;/code&gt; only to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;f&lt;/code&gt; like this is &lt;a href=&quot;http://en.wikipedia.org/wiki/Partial_application&quot;&gt;partial application&lt;/a&gt;.)&lt;/p&gt;

&lt;h2 id=&quot;prior-work&quot;&gt;Prior work&lt;/h2&gt;
&lt;p&gt;There are some other systems that give visual intuition to lambda calculus.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://dkeenan.com/Lambda/&quot;&gt;&lt;em&gt;To Dissect a Mockingbird&lt;/em&gt;&lt;/a&gt; describes a notation that is actually very similar to the one I’ve described, and demonstrates it on various problems from &lt;em&gt;To Mock a Mockingbird&lt;/em&gt;. I like the way this looks, especially how every function is enclosed by two halves of a circle which make it obvious how that function might be applied. My notation doesn’t have this feature, but requires drawing fewer enclosing boxes as a result.&lt;/p&gt;

&lt;p&gt;Visual Lambda (&lt;a href=&quot;https://code.google.com/p/visual-lambda/&quot;&gt;code&lt;/a&gt;, &lt;a href=&quot;http://bntr.planet.ee/lambda/visual_lambda_bubble_notation.gif&quot;&gt;basics&lt;/a&gt;, &lt;a href=&quot;http://bntr.planet.ee/lambda/work/visual_lambda.pdf&quot;&gt;paper&lt;/a&gt;) represents lambda expressions as colored bubbles, and provides an interface for manipulating them.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://worrydream.com/AlligatorEggs/&quot;&gt;Alligator Eggs&lt;/a&gt; is a description of a puzzle game based on lambda calculus, which also happens to provide a visual way of working with and evaluating lambda expressions.&lt;/p&gt;

&lt;p&gt;These last two don’t happen to satisfy the aesthetic that I personally was aiming for: they use color to represent variable names, whereas I wanted something that would be closer in spirit to De Bruijn indices, providing computational meaning by the careful placement of symbols or wires – but they are nifty nonetheless.&lt;/p&gt;

&lt;h2 id=&quot;further-reading&quot;&gt;Further reading&lt;/h2&gt;
&lt;p&gt;An explanation of some of the nittier, grittier details of lambda calculus can be found at &lt;a href=&quot;http://en.wikipedia.org/wiki/Lambda_calculus&quot;&gt;Wikipedia: Lambda calculus&lt;/a&gt; as well as in the textbook &lt;a href=&quot;http://www.cis.upenn.edu/~bcpierce/tapl/&quot;&gt;Types and Programming Languages&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://smile.amazon.com/gp/product/B00A1P096Y&quot;&gt;&lt;em&gt;To Mock a Mockingbird&lt;/em&gt;&lt;/a&gt; is a great puzzle book and an introduction to &lt;a href=&quot;https://en.wikipedia.org/wiki/SKI_combinator_calculus&quot;&gt;combinator calculus&lt;/a&gt;. I had a lot of fun reading it and writing out, in lambda circuitry notation, proofs of the answers to some of its problems.&lt;/p&gt;

&lt;h2 id=&quot;updates&quot;&gt;Updates&lt;/h2&gt;

&lt;h3 id=&quot;2020-08-18&quot;&gt;2020-08-18&lt;/h3&gt;

&lt;p&gt;Since writing this piece, I have discovered another notation, by John Tromp, which I consider the most breathtakingly beautiful of them all:&lt;/p&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;https://tromp.github.io/img/cl/primes.alt.gif&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://tromp.github.io/cl/diagrams.html&quot;&gt;Lambda Diagrams&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;tromp.github.io&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here it is used by Paul Crowley to describe Graham’s number:&lt;/p&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;https://mindsarentmagic.files.wordpress.com/2020/02/new-piskel-1.png.png&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://mindsarentmagic.org/2020/02/19/a-picture-of-grahams-number/&quot;&gt;A picture of Graham’s Number&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;One of the first posts I made on this blog was Lambda calculus and Graham’s number, which set out how to express the insanely large number known as Graham’s Number precisely and concise…&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;mindsarentmagic.org&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;unfurl-embed-info-media-default gallery-item-selectable&quot;&gt;&lt;img class=&quot;unfurl-embed-card-feature-image&quot; src=&quot;https://mindsarentmagic.files.wordpress.com/2020/02/l_1-2.png&quot; /&gt;&lt;div class=&quot;unfurl-embed-card-title unfurl-embed-card-title-default notranslate&quot;&gt;&lt;a href=&quot;https://mindsarentmagic.org/2020/02/24/some-more-numbers-as-lambda-calculus/&quot;&gt;Some more numbers as lambda calculus&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-description unfurl-embed-card-description-default notranslate&quot;&gt;&lt;div style=&quot;overflow: hidden; text-overflow: ellipsis; -webkit-box-orient: vertical; display: -webkit-box; -webkit-line-clamp: 2;&quot;&gt;On Reddit, u/spriteguard asks: Do you have any diagrams of smaller numbers for comparison? I’d love to see a whole sequence of these. I have work to put off, so I couldn’t resist the ch…&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;unfurl-embed-card-url notranslate&quot;&gt;mindsarentmagic.org&lt;/div&gt;&lt;/div&gt;
</description>
        <pubDate>Sun, 08 Nov 2015 00:00:00 +0000</pubDate>
        <link>https://csvoss.com//circuit-notation-lambda-calculus</link>
        <guid isPermaLink="true">https://csvoss.com//circuit-notation-lambda-calculus</guid>
        
        
        <category>projects</category>
        
      </item>
      
    
      
      <item>
        <title>Schrödinger&apos;s deploys no more: how we update translations</title>
        <description>&lt;p&gt;&lt;em&gt;Cross-posted from the &lt;a href=&quot;http://engineering.khanacademy.org/posts/translation-server.htm&quot;&gt;Khan Academy engineering blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you’re trying to bring the best learning experience to people around the world, it’s important to, well, think about the world.&lt;/p&gt;

&lt;p&gt;Khan Academy is translated into &lt;a href=&quot;http://es.khanacademy.org/&quot;&gt;Spanish&lt;/a&gt; and &lt;a href=&quot;https://tr.khanacademy.org/&quot;&gt;Turkish&lt;/a&gt; and &lt;a href=&quot;https://pl.khanacademy.org/&quot;&gt;Polish&lt;/a&gt; and more. This covers not only the site’s interface text but also its articles, exercises, and videos. Thanks to the efforts of translators, learners &lt;a href=&quot;http://international.khanacademy.org/&quot;&gt;around the world&lt;/a&gt; can use Khan Academy to learn in their language.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/khan/collage.png&quot; alt=&quot;&amp;quot;You can learn anything&amp;quot; in several languages&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Internationalization is important. Internationalization is also an engineering challenge: it requires infrastructure to mark which strings in a codebase need to exist in multiple different languages, to store and look up the translated versions of those strings, and to show the user a different website accordingly. Additionally, since our translations are crowdsourced, we need infrastructure to allow translators to translate strings, to show translators where their effort is most needed, and to show these translations once they’re ready. There are many moving parts.&lt;/p&gt;

&lt;p&gt;When I arrived at Khan Academy at the beginning of this summer, some of these moving parts in our internationalization infrastructure were responsible for most of the time our deploys took to finish. One of the things I accomplished during my internship was to banish this slowness from our deploys.&lt;/p&gt;

&lt;h1 id=&quot;the-problem&quot;&gt;The problem&lt;/h1&gt;

&lt;p&gt;Whenever we download the latest translation data from &lt;a href=&quot;http://crowdin.com/&quot;&gt;Crowdin&lt;/a&gt;, which hosts our crowdsourced translations, we rebuild the &lt;em&gt;translation files&lt;/em&gt; – files which the Khan Academy webapp can read in order to show translated webpages. The next time an engineer deploys a new version of the webapp, these new translation files are then deployed as well.&lt;/p&gt;

&lt;p&gt;Uploading files to Google App Engine, which hosts the Khan Academy website, is usually the slowest part of our deploys. The translation files are big, so they are a major contributor to that time. As a result, whenever the latest translations were downloaded and rebuilt, the next deploy would be quite a bit slower while it uploaded the changed files.&lt;/p&gt;

&lt;p&gt;Furthermore, translation files weren’t always rebuilt between one deploy and the next. That made it hard for an engineer to tell whether the deploy they were about to make would be hit with Translations Upload Duty. Some deploys took around 30 minutes, while others took around 75 minutes or more:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/khan/graph_before.png&quot; alt=&quot;Graph of deploy times, before translation server&quot; /&gt;&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
The previous state of affairs – 30-minute deploys punctuated by 75-minute deploys. There are a couple of lulls in this graph where deploys are consistently near 30 minutes: these reflect times when our download from Crowdin was not working.
&lt;/small&gt;&lt;/p&gt;

&lt;h1 id=&quot;the-fix&quot;&gt;The fix&lt;/h1&gt;

&lt;p&gt;We decided to rearrange the infrastructure around this so that instead of uploading translation files to Google App Engine (GAE) along with the rest of the webapp, we would upload the translation files to Google Cloud Storage (GCS) in a separate process and then modify the webapp to read the files from there.&lt;/p&gt;
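&lt;p&gt;Here is a minimal sketch of what the read side could look like. It assumes JSON-encoded translation files stored under a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;translations/{locale}.json&lt;/code&gt; path, and a hypothetical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fetch_blob&lt;/code&gt; function that stands in for the Cloud Storage client. The post does not describe the webapp’s actual file format or loader, so every name here is illustrative.&lt;/p&gt;

```python
import json
from typing import Callable, Dict

# Hypothetical signature: given a path inside the storage bucket, return the
# file's bytes. A real version would wrap a Cloud Storage client; injecting it
# keeps this sketch runnable without any cloud dependency.
FetchBlob = Callable[[str], bytes]


class TranslationStore:
    """Loads per-locale translation tables from a blob store and caches them."""

    def __init__(self, fetch_blob: FetchBlob) -> None:
        self._fetch_blob = fetch_blob
        self._tables: Dict[str, Dict[str, str]] = {}

    def table(self, locale: str) -> Dict[str, str]:
        # Only the first request for a locale goes to storage; later requests
        # on this instance are answered from memory.
        if locale not in self._tables:
            raw = self._fetch_blob(f"translations/{locale}.json")  # hypothetical layout
            self._tables[locale] = json.loads(raw)
        return self._tables[locale]

    def gettext(self, locale: str, msgid: str) -> str:
        # Fall back to the untranslated string when there is no translation yet.
        return self.table(locale).get(msgid, msgid)


# Usage with an in-memory dict standing in for the bucket:
fake_bucket = {"translations/es.json": b'{"Hello": "Hola"}'}
store = TranslationStore(fake_bucket.__getitem__)
print(store.gettext("es", "Hello"))    # prints: Hola
print(store.gettext("es", "Goodbye"))  # prints: Goodbye
```

&lt;p&gt;Caching keeps per-request cost low, but it also means an instance keeps serving whatever it loaded first. A real deployment would need some way to pick up newly uploaded files.&lt;/p&gt;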

&lt;p&gt;Implementing this required making a few different changes, and the changes had to be coordinated in such a way as to keep internationalized sites up and running throughout the entire process:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Upload the translation files to GCS whenever they’re updated.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change the webapp to read translations from GCS instead of from GAE.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Stop uploading the now-unnecessary translation files to GAE.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These steps alone are enough to make deploys faster, but we also wanted to make sure that this project wouldn’t break anything. So in practice, the steps looked something like this:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Upload the translation files to GCS whenever they’re updated. Measure everything. (How much will this cost? How fast will the upload be?)&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Change the webapp to read translations from GCS instead of from GAE. Measure everything. (How much slower is this than reading from disk? Will requests be slower?)&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Stop uploading the now-unnecessary translation files to GAE. Measure everything. (Do translated sites still work?)&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
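&lt;p&gt;“Measure everything” can start as simply as timing the same operation repeatedly and looking at the median and the worst case. Below is a small Python sketch. The two readers in the comments are hypothetical stand-ins for a disk read and a Cloud Storage read, not the code used in this project.&lt;/p&gt;

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(fn: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Call fn() `repeats` times and summarize its wall-clock latency in ms."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median resists one-off outliers; the max shows how bad the slow tail gets.
    return {"median_ms": statistics.median(samples), "max_ms": max(samples)}


# Hypothetical comparison (stand-ins, not this project's code):
#   time_calls(lambda: open("translations/es.json", "rb").read())
#   time_calls(lambda: fetch_blob("translations/es.json"))
print(time_calls(lambda: sorted(range(10_000)), repeats=5))
```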

&lt;p&gt;One thing I experimented with on this project was keeping a lab notebook of sorts in a Google Doc. I recorded everything I learned there: little commands that might be useful later, dependencies I had to install, and all of the measurements I ended up making. This was a good decision, and the habit and the notes turned out to be useful again and again.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/khan/science.jpg&quot; alt=&quot;Mythbusters&apos; Adam Savage: &amp;quot;Remember kids, the only difference between screwing around and science is writing it down.&amp;quot;&quot; /&gt;&lt;/p&gt;

&lt;h1 id=&quot;the-consequences&quot;&gt;The consequences&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Deploys are faster!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I deployed the last piece of this project on August 20, 2015. Deploy times have been more consistent since: the graph of deploy times is free of the spikes that previously indicated translation uploads.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/khan/graph_after.png&quot; alt=&quot;Graph of deploy times, after translation server&quot; /&gt;&lt;/p&gt;

&lt;p align=&quot;center&quot;&gt;&lt;small&gt;
Before and after; the change happened on 8/20.
&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Translations can be updated independently!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We no longer require an engineer to deploy new code in order to change the translations that appear on Khan Academy, because translations are updated by a separate job. This also opens up new possibilities. We could update each language independently of the others, and further shorten the time between when a translator makes a translation and when that translation shows up on the main site. Our internationalization efforts will be able to push forward even faster!&lt;/p&gt;
</description>
        <pubDate>Tue, 13 Oct 2015 00:00:00 +0000</pubDate>
        <link>https://csvoss.com//schrodingers-deploys-no-more</link>
        <guid isPermaLink="true">https://csvoss.com//schrodingers-deploys-no-more</guid>
        
        
        <category>projects</category>
        
      </item>
      
    
      
      <item>
        <title>Modeling Molecules with Recurrent Neural Networks</title>
        <description>&lt;p&gt;I enjoyed reading Andrej Karpathy’s &lt;a href=&quot;http://karpathy.github.io/2015/05/21/rnn-effectiveness/&quot;&gt;&lt;em&gt;The Unreasonable Effectiveness of Recurrent Neural Networks&lt;/em&gt;&lt;/a&gt; lately – it’s got some fascinating examples and some good explanations. I’ve been playing around with the &lt;a href=&quot;https://github.com/karpathy/char-rnn&quot;&gt;&lt;em&gt;char-rnn&lt;/em&gt;&lt;/a&gt; code from that post, and I want to share some of my experiments.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/rnn_meme.jpg&quot; alt=&quot;Meme: Recurrent neural networks - so hot right now&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;chemical-formulas-and-names&quot;&gt;Chemical formulas and names&lt;/h2&gt;

&lt;p&gt;First experiment: I trained &lt;em&gt;char-rnn&lt;/em&gt; on 3,892 real chemical compounds, both organic and inorganic, one formula and name per line, in randomized order.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;C6H12O6, β-D-galactose
C4H8O2, ethyl acetate
C7H8N4O2, xantheose
C2H4O3, glycolic acid
CH5N, methylamine
C7H8N2S, phenylthiocarbamide
C17H20N4O6, riboflavin
C8H10, ortho-xylene
WOF2, tungsten(VI) oxytetrafluoride
PoO3, polonium trioxide
CuSe, copper(II) selenide
ErI3, erbium triiodide
NaHCO3, sodium bicarbonate
CH4, methane
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After ten minutes of training, I had access to an excellent, excellent generator of completely fake chemistry jargon.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;C4H8N2O2, thiochlorophecylene
AlCl6, aluminium cyandate bromide
C2H4BrO, bromo-3,5-cylohexanol
CH2CH3COOH, coltium carbide
C3I7N3, 2-buumenidine
C2H2O2, ethenol
RgClO4, malobium hexaxide
C2H5NO2, vinyl chloride
Na(VO3)2, sodium metatitanatedioxide
HOB2, deiopyre fluoride
C2H3CH3NO2, iomone acetic acid
CoCl2, cobalt(II) carbide
C19H14O, methoprepane
SnO3, strontium monoxide
C9H11NO, 1,3-pytan
Nr2, gannisil fluoride
C6H12, hadequine
C2HTi, chlorobenzelymethane
RCCl3, thyll
H2Br2, bromofluorosante
C15H17Cl2N2O3, lyctosin
C5H8ClN, anthalum-3-carboxyblue
N2N2O4, nitrogun grupiodide
C9H7N, γ-pinolylionine
C10C18O2, heptane
C32H21N4, gipsatedrale
SiBr4, strium tetralicolipethylachloradehydrame
C2H4ClO6, 2-chlorobromoprenidine
C7H4O4, methyl fluoride
FeS, vanady(III) fluoride
III2, irophiorite
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Interestingly, the neural net is doing a good job of pairing organic-looking formulas with organic-looking names, and inorganic-looking formulas with inorganic-looking names. Other than that, it’s clearly not got enough data to have internalized the full periodic table of elements yet. I’m currently using this neural network to generate my &lt;a href=&quot;https://sipb.mit.edu/doc/zephyr/&quot;&gt;Zephyr&lt;/a&gt; signatures.&lt;/p&gt;
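&lt;p&gt;One way to measure the periodic-table gaps is to pull the element symbols out of each generated formula and check them against the real table. This is a rough sketch of my own, not part of char-rnn. It only handles plain formulas made of element symbols, counts, and parentheses.&lt;/p&gt;

```python
import re
from typing import List

# Standard symbols for elements 1 through 118.
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni
Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe
Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au
Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf
Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split())

# A plain formula: element-like symbols, counts, and parentheses only.
FORMULA_RE = re.compile(r"(?:[A-Z][a-z]?|\d+|[()])+")
SYMBOL_RE = re.compile(r"[A-Z][a-z]?")


def unknown_symbols(formula: str) -> List[str]:
    """Return the element-like symbols in `formula` that are not real elements."""
    if not FORMULA_RE.fullmatch(formula):
        raise ValueError(f"not a plain formula: {formula!r}")
    return [s for s in SYMBOL_RE.findall(formula) if s not in ELEMENTS]


for formula in ["NaHCO3", "Na(VO3)2", "Nr2", "RCCl3", "III2"]:
    print(formula, unknown_symbols(formula))
```

&lt;p&gt;Nr2 and RCCl3 get flagged because Nr and R are not elements. III2 passes, because I is iodine. This shows the check only validates symbols, not chemistry.&lt;/p&gt;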

&lt;h2 id=&quot;molecular-structures&quot;&gt;Molecular structures&lt;/h2&gt;

&lt;p&gt;One thing that really struck me about the &lt;em&gt;char-rnn&lt;/em&gt; neural network was its ability to &lt;em&gt;remember its position in a stack&lt;/em&gt;. Karpathy gives numerous examples of training the neural network on different sources, then sampling it to generate random output. Incredibly, when trained on Linux source code, on LaTeX, or on Markdown/XML from Wikipedia, the neural network creates &lt;em&gt;syntactically valid&lt;/em&gt; output. XML tags close. Parentheses match. This is well beyond what a Markov chain can do. A Markov chain only looks at a fixed window of preceding characters, so it cannot keep count of how many brackets are still open.&lt;/p&gt;

&lt;p&gt;While building &lt;a href=&quot;/projects/2013/01/31/carbonate.html&quot;&gt;Carbonate&lt;/a&gt;, I spent a lot of time working with &lt;a href=&quot;https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system&quot;&gt;SMILES&lt;/a&gt; (Simplified Molecular-Input Line-Entry System), writing code to parse SMILES into molecules and code to convert molecules into SMILES. SMILES is a formalized scheme for converting molecules &amp;ndash; which, by their nature, are nonlinear &amp;ndash; into strings, a linear data structure. Here are some examples:&lt;/p&gt;

&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right;&quot;&gt;CC(O)C, &lt;a href=&quot;https://en.wikipedia.org/wiki/Isopropyl_alcohol&quot;&gt;isopropanol&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;img class=&quot;alignleft&quot; style=&quot;margin: 20px;&quot; src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/2-Propanol.svg/242px-2-Propanol.svg.png&quot; alt=&quot;isopropanol&quot; width=&quot;66&quot; height=&quot;65&quot; /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right;&quot;&gt;C1CCCCC1, &lt;a href=&quot;https://en.wikipedia.org/wiki/Cyclohexane&quot;&gt;cyclohexane&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;img class=&quot;alignleft&quot; style=&quot;margin: 20px;&quot; src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/4/45/Cyclohexane-2D-skeletal.svg/209px-Cyclohexane-2D-skeletal.svg.png&quot; alt=&quot;cyclohexane&quot; width=&quot;47&quot; height=&quot;54&quot; /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td style=&quot;text-align:right;&quot;&gt;CC(=O)NCCC1=CNc2c1cc(OC)cc2, &lt;a href=&quot;https://en.wikipedia.org/wiki/Melatonin&quot;&gt;melatonin&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;img class=&quot;alignleft&quot; style=&quot;margin: 20px;&quot; src=&quot;https://upload.wikimedia.org/wikipedia/commons/thumb/1/14/Melatonin2.svg/320px-Melatonin2.svg.png&quot; alt=&quot;melatonin&quot; width=&quot;171&quot; height=&quot;113&quot; /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;SMILES follows a &lt;a href=&quot;http://www.opensmiles.org/spec/open-smiles-2-grammar.html&quot;&gt;formal grammar&lt;/a&gt;. It uses parentheses to branch off side-chains, as in CC(O)C for isopropanol. Rings are more complicated: to create the SMILES string for a ringed molecule, one must first cut some bonds so as to turn the molecule into a tree with no cycles; numerical indices are then used to link faraway atoms to each other in the resulting string. Chirality, cis/trans bonds, double or triple bonds, and aromaticity introduce yet further complications, each of which SMILES has ways of handling.&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A valid SMILES string requires parentheses to match.&lt;/li&gt;
&lt;li&gt;&lt;i&gt;char-rnn&lt;/i&gt; can create output with matching parentheses.&lt;/li&gt;
&lt;li&gt;???&lt;/li&gt;
&lt;li&gt;Profit!&lt;/li&gt;
&lt;/ol&gt;
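&lt;p&gt;The two bookkeeping rules described above are easy to check mechanically: parentheses must match for branches, and ring-closure labels must come in pairs. Below is a rough Python sketch. It is not a full SMILES parser, since the OpenSMILES grammar covers much more. It skips over bracket atoms such as [C@@H] without validating them, and it ignores valence and aromaticity.&lt;/p&gt;

```python
def check_branches_and_rings(smiles: str) -> bool:
    """Rough syntax check: balanced branches and paired ring-closure labels."""
    depth = 0           # number of '(' branches currently open
    open_rings = set()  # ring-closure labels seen an odd number of times
    chars = iter(smiles)
    for ch in chars:
        if ch == "[":
            # Skip a bracket atom like [C@@H] or [N+]; digits inside it are
            # charges, hydrogen counts, or isotopes, not ring labels.
            for inner in chars:
                if inner == "]":
                    break
            else:
                return False  # unterminated bracket atom
        elif ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False  # ')' with no open branch
            depth -= 1
        elif ch.isdigit():
            # Single-digit label: the first use opens a ring bond, the next closes it.
            open_rings ^= {ch}
        elif ch == "%":
            # Two-digit label such as %10.
            label = next(chars, "") + next(chars, "")
            if not (len(label) == 2 and label.isdigit()):
                return False
            open_rings ^= {label}
    return depth == 0 and not open_rings


for s in ["CC(O)C", "C1CCCCC1", "CC(=O)NCCC1=CNc2c1cc(OC)cc2", "CC(O", "C1CCC"]:
    print(s, check_branches_and_rings(s))
```

&lt;p&gt;A filter like this can screen generated strings cheaply before handing them to a real toolkit. Passing it says nothing about whether the molecule is chemically sensible.&lt;/p&gt;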

&lt;p&gt;I acquired 1,200,000 different SMILES strings via &lt;a href=&quot;http://chembank.broadinstitute.org/&quot;&gt;ChemBank&lt;/a&gt;, and used them to train the neural net. Here is some of the output when I sample the result:&lt;/p&gt;

&lt;pre style=&quot;margin-top: 10px;&quot;&gt;
OCCCOc1ccc(cc1)C2=N[C@@](CCS(=O)(=O)c3ccccc3)([C@@H](O2)c4ccccc4)C(=O)NNCc5ccccc5C(F)(F)F
COc1cc(cc(OC)c1OC)c2nnc(c3ccc(C)cc3)c4ccccc24
CN(C)CCCNC(=O)Oc1ccc2oc(C)c(C(=O)C)c2c1
OCCCCCCN1C(C(=O)N(CC=C)Cn2nnc3ccccc23)C4(CC[C@@H]5O4)[C@@H]([C@@H]5C(=O)N(CC=C)Cc6ccccc6)C1=O
COc1ccc(CNNC(=O)[C@@]2(Cc3ccccc3)N=C(O[C@H]2c4ccc(Cl)cc4Cl)c5ccc(OCCCO)cc5)c1OC
OCCCOc1ccc(cc1)C2=N[C@@](CCS(=O)(=O)c3ccccc3)([C@@H](O2)c4ccccc4)C(=O)NNCc5ccccc5C(F)(F)F
CC(C)(C)OC(=O)CC[C@]1(N=C(O[C@@H]1c2ccccc2N=[N+]=[N-])c3ccc(OCCCO)cc3)C(=O)NCc4cccc(F)c4
CCOC(=O)[C@@H]1[C@H]2C(=O)N(CCCO)C(C(=O)N(CC=C)Cn3nnc4ccccc34)C52CC[C@@]1(CC)O5
COc1ccc(cc1O)[C@@H](O)c2ccc(OCCCCCC(=O)NC3CCCCC3)Cc2O
OCCCOc1ccc(cc1)C2=N[C@@](Cc3ccccc3)([C@@H](O2)c4ccccc4)C(=O)NNCCc5cccc(c5)C(=O)NCCO
CCCCCCCCCCCC(=O)N(CCc1ccccc1)C[C@]2(O)CC[C@H]3c4ccc(C[C@@H](O)CC/C(=C\CC[C@@]32C)/C)cc4C(=O)Cc5c(F)cccc5Cl
COc1ccc(cc1O)[C@H](O)CC/C=C/C(=O)NO
CC1(C)CCC[C@@H]2CCC1(C)N2C(=O)C3CCCCC3
CCOC(=O)[C@@H]1[C@H]2C(=O)N(CCCO)C(C(=O)N(CC=C)Cn3nnc4ccccc34)C52CC(C)[C@@]1(C)O5
CCCCCN1CC=C[C@@]23S[C@@H]4/C=C\CCCCOC(=O)[C@@H]4[C@H]3C(=O)N(CCCCCCO)C2C1=O
Cc1cc(C)cc(OCCCCN2CCCCCC2)c1
CC[C@]1(CCC23O1)[C@H]([C@H]3C(=O)N(CCCCCO)C2C(=O)N(CC=C)Cn4nnc5ccccc45)C(=O)OCCCCC=C
CC(C)OC(=O)[C@@H]1[C@@H]2CCC3(O2)C(N(CCO)C(=O)[C@H]13)C(=O)Nc4c(C)cccc4C
COc1cc(ccc1O)C(=O)Nc2cccc(C)c2C
COc1ccc(NC(=O)c2ccc(OC)cc2Br)cc1
CCCCCCCCCCCCCN/C=C/c1ccccc1
...
&lt;/pre&gt;

&lt;p&gt;Remarkably, these are all &lt;b&gt;syntactically valid&lt;/b&gt;! The side-chains are sane, parentheses match, the ring-bonds match, and some of them even include chirality, cis/trans, and triple bonds! They&apos;re for fairly large molecules, but that&apos;s to be expected; my training data mostly consisted of large molecules. Because they&apos;re syntactically valid, they can be rendered into actual pictures using &lt;a href=&quot;http://openbabel.org/wiki/Main_Page&quot;&gt;OpenBabel&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/smiles-output.svg&quot; /&gt;&lt;/p&gt;

&lt;h2&gt;Next steps&lt;/h2&gt;

&lt;p&gt;Neural nets are fascinating. In addition to &lt;i&gt;The Unreasonable Effectiveness of Recurrent Neural Networks&lt;/i&gt;, there are a few other posts that I read last summer &amp;ndash; Google Research&apos;s &lt;a href=&quot;http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html&quot;&gt;Inceptionism&lt;/a&gt;, Felix Sun&apos;s &lt;a href=&quot;http://www.mit.edu/~felixsun/?neural-music.html&quot;&gt;DeepHear&lt;/a&gt; for composing and harmonizing music &amp;ndash; that finally convinced me that neural networks are worth paying attention to. And &lt;i&gt;char-rnn&lt;/i&gt;, in particular, has been simple to get working and pleasant to use.&lt;/p&gt;

&lt;p&gt;If I decide to play around with this more, the next thing I&apos;m going to do is to investigate whether a neural net could be trained to predict the outputs of some simple organic reactions. Modeling organic reactions was the bottleneck of the &lt;a href=&quot;/projects/2013/01/31/carbonate.html&quot;&gt;Carbonate&lt;/a&gt; project; the logic of each reaction had to be written and tested by hand. Using a neural net to learn the rules for reactions automatically might make for a better way forward.&lt;/p&gt;
</description>
        <pubDate>Thu, 08 Oct 2015 00:00:00 +0000</pubDate>
        <link>https://csvoss.com//modeling-molecules-with-rnns</link>
        <guid isPermaLink="true">https://csvoss.com//modeling-molecules-with-rnns</guid>
        
        
        <category>projects</category>
        
      </item>
      
    
      
      <item>
        <title>Transliterating Tengwar</title>
        <description>&lt;h3 class=&quot;tengwar&quot; style=&quot;text-align: center; margin-bottom: 25px;&quot;&gt;&lt;i&gt;175#8j1T7F1Eb% 1b$y6E&lt;/i&gt;&lt;/h3&gt;

&lt;script&gt;
$(&quot;&lt;link /&gt;&quot;, {
    &apos;rel&apos;: &apos;stylesheet&apos;,
    &apos;href&apos;: &apos;/assets/fonts.css&apos;
  }).appendTo(&apos;head&apos;);
&lt;/script&gt;

&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Tengwar&quot;&gt;&lt;b&gt;Tengwar&lt;/b&gt;&lt;/a&gt; is a writing system invented by J.R.R. Tolkien for use by the elves of Middle-Earth. Lately, I&amp;#8217;ve learned how to write in Tengwar – not by learning any Elvish language, but by learning how to transliterate &lt;i&gt;English&lt;/i&gt; into Tengwar using the instructions found in the &lt;a href=&quot;http://www.starchamber.com/paracelsus/elvish/tengwar-textbook.pdf&quot;&gt;Tengwar Textbook&lt;/a&gt;.&lt;/p&gt;
&lt;p class=&quot;tengwar&quot;&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Tengwar&quot;&gt;&lt;b&gt;1b$y6E&lt;/b&gt;&lt;/a&gt; iG `C y71Tb% 88Ú1t$ 5%r5$12$ w`Û s-6-6- 1j^z`B5$ e6Y iJO w`Û @ j$rO_ W t2%2jO·V6E3- j1Ej$`Û· `Br`V j`V6E52$ 9yY 1`N y71TO 5% 1b$y6E  51Y w`Û j`V6E5b% 5#`Û j$rdT jb#`Ms#O· w1U w`Û j`V6E5b% 9yY 1`N 175#8j1T7F1EO b$jdT 5%1`N 1b$y6E iJb% @ 5%817zJ1`B5^_ e`N5&amp;amp;2 5% @ &lt;a href=&quot;http://www.starchamber.com/paracelsus/elvish/tengwar-textbook.pdf&quot;&gt;1b$y6E 1zFæ1w`NzH&lt;/a&gt;-&lt;/p&gt;

&lt;p&gt;I&amp;#8217;ve found this writing system to be useful for writing down small notes-to-self, and I&amp;#8217;ve become quite good at writing it.&lt;/p&gt;
&lt;p class=&quot;tengwar&quot;&gt;`Br`V e`N5&amp;amp;2 4iG y71Tb% 88Ú1t$ 1`N w`V iJeFj&amp;amp; e6Y y71Tb% 2yY5 8tj#j 51YO_\1`N\8j$e· 5#2 `Br`V wzFt^O z`M1TO x`N2^ 1E y71Tb% 1T-
&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/elvish-stickies.png&quot; style=&quot;display: block; margin: 0 auto;&quot; alt=&quot;&quot; height=&quot;434&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Problem is, I&amp;#8217;m still no good at &lt;i&gt;reading&lt;/i&gt; Tengwar. Writing characters down on a piece of paper feels fluid and easy, but once I take a step back and look at the page, it&amp;#8217;s incomprehensible at a glance. I have to sound the words out, character by character, if I want to read what I have written.&lt;/p&gt;
&lt;p class=&quot;tengwar&quot;&gt;q7w^jt$ iG· `Bt 81j%j 5`N x`N2^ 1E &lt;i&gt;7`V2#b%&lt;/i&gt; 1b$y6E- y71Tb% a7DzD16R_ 2yY5 5^ `C q`BiFO W qqE6R e`Vj$_ ej`M2% 5#2 `ViD`Û· w1U 5^iO `B 1zDO `C 81qR wzDz 5#2 j`NzH 1E @ qs#O· 1Ti 5%zt^q79V5$8w%jO 1E `C xj5#iO- `B 9r#O 1`N 8`N5&amp;amp;2 @ yuH_ `N1U· a7DzD16R w`Û a7DzD16R· eG `B y5#1 1`N 7`V2# o1E `B 9r#O y71T15$-&lt;/p&gt;

&lt;p&gt;However, Tengwar is incredibly &lt;em&gt;pretty&lt;/em&gt;. I want to get more practice reading it. What if I could read &lt;i&gt;whatever I want&lt;/i&gt; with this writing system? If I only had a script that could convert English text into readable Tengwar for me!&lt;/p&gt;
&lt;p class=&quot;tengwar&quot;&gt;9yYr$6R· 1b$y6E iG 5%z72$w%j`Û &lt;i&gt;q71R1`Û&lt;/i&gt;- `B y5#1 1`N s1R t7HO q7zD1iGO 7`V2#b% 1T- o1E eG `B z`Nm&amp;amp; 7`V2# &lt;i&gt;o1Er$6R `B y5#1&lt;/i&gt; y3G 4iG y71Tb% 88Ú1t$À eG `B 5^j`Û 92# `C 8z7qT1 41E z`Nm&amp;amp; z5^r6R1 b$jdT 1zFæ1 5%1`N 7`V2#w#jO 1b$y6E e6Y t`VÁ
&lt;/p&gt;

&lt;p&gt;As a glance through the Tengwar Textbook will demonstrate, Tengwar is pretty complicated. There isn&amp;#8217;t a single standard way to write in English using Tengwar: there are a variety of &amp;#8220;modes&amp;#8221;, each of which has a different set of rules. I have my own personal way of writing in Tengwar that combines some features of each of those modes that I like.&lt;/p&gt;
&lt;p class=&quot;tengwar&quot;&gt;iD `C xj5#iO 37`Nv&amp;amp; @ 1b$y6E 1zFæ1w`NzH yj%j 2t$5^8171EO· 1b$y6E iG q71R1`Û zt^qjzG1E2$- 47FO iG51 `C 8b%jO 815#2uD y`C`Û 1`N y71TO 5% b$jdT iJb% 1b$y6E- 47FO 7DO `C r7D`B1R`Û W t2^O_· `VaD W oaG 9iD `C 2eGe7F5$1 81R W 7j&amp;amp;O_- `B 9r#O t`Û yY5 q6R85^j# y`C`Û W y71Tb% 5% 1b$y6E 41E zt^w5%O_ 8t^O e`V1E7JO_ W `VaD W 4iHO t2^O_ 41E `B jzGO-&lt;/p&gt;

&lt;p&gt;Even though there are already some scripts around the Internet that will claim to transliterate English to Tengwar, they don&amp;#8217;t necessarily follow my mode of writing, or even a standard mode. Thus I decided, one evening, to write my own script.&lt;/p&gt;
&lt;p class=&quot;tengwar&quot;&gt;r$5$ 4`Nv&amp;amp; 47FO 7DO j#7`V2#`Û 8t^O 8z7qT1_ 7D`N5&amp;amp;2 @ 5%16R51R 41E yj%j zj`Ct% 1`N 175#8j1T7F1EO b$jdT 1`N 1b$y6E· 4`V`Û 25^1 5iFiF87Dj%`Û ej^jyY t`Û t2^O W y71Tb%· 6Y r$5$ `C 815#2uD t2^O- 3iJ `B 2iF2%2$· 5^O r$5$b%· 1`N y71TO t`Û yY5 8z7qT1-&lt;/p&gt;

&lt;p&gt;I like the look of the Tengwar Annatar font, so the script would convert English text to the characters needed to render Tengwar text in that font. Eventually, I may extend it so that I can also output Tengwar using &lt;a href=&quot;http://get-software.net/macros/latex/contrib/tengwarscript/tengwarscript.pdf&quot;&gt;TengwarScript&lt;/a&gt;, a TeX package. Writing a script with Tengwar Annatar in mind is the more difficult task of the two because of the way it typesets vowels (tehtar), so adding support for TengwarScript onto the existing script would be easy.&lt;/p&gt;
&lt;p class=&quot;tengwar&quot;&gt;`B jzGO @ j`NzH W @ 1b$y6E 5#51E6E e5^1· 8`N @ 8z7qT1 y`Nm&amp;amp; z5^r6R1 b$jdT 1zFæ1 1`N @ a7DzD16R_ 5`V2$2$ 1`N 75$26R 1b$y6E 1zFæ1 5% 41E e5^1- r$5$1`Mj#j`Û· `B t`C`Û zFæ15$2 1T 8`N 41E `B z5# j#8`N `N1Uq1U 1b$y6E iJb% &lt;a href=&quot;http://get-software.net/macros/latex/contrib/tengwarscript/tengwarscript.pdf&quot;&gt;1b$y6E8z7qT1&lt;/a&gt;· `C 1zFæ qzDzs#O- y71Tb% `C 8z7qT1 y3G 1b$y6E 5#51E6E 5% t5%2 iG @ t7HO 2eGezGj&amp;amp;1 1iDz W @ 1y`N wzF`CiJO W @ y`C`Û 1T 1qÙiF1R_ ryYj$_ Œ19V16Eœ· 8`N 2#2b% 8qUq6Y1 e6Y 1b$y6E8z7qT1 5^1`N @ zFæiG1b% 8z7qT1 y`Nm&amp;amp; w`V `ViD`Û-&lt;/p&gt;

&lt;p&gt;As I built this thing and debugged the little errors and inconsistencies that I noticed here and there, I kept track of its output for the sentence I used for testing (&amp;#8220;This was a triumph. I&amp;#8217;m making a note here: huge success!&amp;#8221;). Tengwar has various little complexities – the R-rule, a distinction between voiced and voiceless &amp;#8216;th&amp;#8217;, letter pairs like &amp;#8216;ch&amp;#8217;, &amp;#8216;ph&amp;#8217;, &amp;#8216;ng&amp;#8217;, and &amp;#8216;rd&amp;#8217;, and vowel carriers – that make correct transliteration more difficult. Put together, the history of my testing string visualizes my progress against these complexities as I improved the script.&lt;/p&gt;
&lt;p class=&quot;tengwar&quot;&gt;iD `B w`Mj%1 4iG 3b% 5#2 2w$x&amp;amp;s2$ @ j1T1jO 6R76Y_ 5#2 5%z5^8iG15$8`B`V_ 41E `B 51YiG2$ 97FO 5#2 47FO· `B zqR1 17zDz W o1E 1T `N1Uq1U iD @ 7iFj&amp;amp;1 e6Y @ 85$15$iO Œ4iG yiD `C 17`Bt&amp;amp;e- `Bt tzDb% `C 51YO 97FO- 9s&amp;amp;O 8zJ8iF_Áœ 41E `B iJ2$ e6Y 1iF1b%- 1b$y6E 9iD r7D`B`NiJ j1T1jO zt^qjzFæ1T`B`V_  @ 6\7j&amp;amp;O· `C 2iG15%z1`B5^ w1Ry`V5$ r`NiG2$ 5#2 r`NiGj$iF_ 3· 2`Nw&amp;amp;jO z5^85^5#1_ jzGO a 5#2 e 5#2 b 5#2 u· 5#2 ryYj$ z6E7`B6R_  41E tzDO z6Y7zF1 175#8j1T7F1E`B5^ t7HO 2eGezGj&amp;amp;1- o5$ q1U 1s^4$6R· @ 9iG17H`Û W t`Û 1iF1b% 817b% q7r^2%O_ `C riG`Mj#,G1E`B5^ W t`Û q7x^7iF_ x#`C5%81 4iFO zt^qjzFæ1T`B`V_ iD `B t%q7r^2$ @ 8z7qT1-&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/elvish-debugging.png&quot; style=&quot;display: block; margin: 0 auto;&quot; alt=&quot;&quot; height=&quot;300&quot; /&gt;&lt;/p&gt;
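
&lt;p&gt;Two of the complexities listed above, two-letter consonants and the R-rule, can be sketched in a few lines. This is a minimal sketch, not the code in english_to_tengwar.py. The &lt;code&gt;DIGRAPHS&lt;/code&gt; and &lt;code&gt;SINGLES&lt;/code&gt; tables and the &lt;code&gt;consonant_tengwar&lt;/code&gt; function are hypothetical, and the R-rule is assumed to be the usual one: rómen before a vowel, óre elsewhere. Digraphs are matched before single letters. Vowels pass through unchanged so that a separate pass could turn them into tehtar.&lt;/p&gt;

```python
# Hypothetical tables for illustration only (not the author's mapping).
# Two-letter clusters that each become a single tengwa in this sketch.
DIGRAPHS = {
    "ch": "calma",
    "ph": "formen",
    "ng": "nwalme",
    "rd": "arda",
    "th": "thule",  # voiceless th; voiced th (anto) needs word-level knowledge
}
# A few single consonants; anything else (including vowels) passes through.
SINGLES = {
    "t": "tinco", "p": "parma", "k": "quesse", "d": "ando", "f": "formen",
    "n": "numen", "m": "malta", "l": "lambe", "h": "hyarmen",
}
VOWELS = set("aeiou")  # a set, so the empty string is NOT counted as a vowel


def consonant_tengwar(word):
    """Split a lowercase word into tengwa names, matching digraphs first and
    applying the R-rule (romen before a vowel, ore otherwise)."""
    out = []
    i = 0
    while len(word) > i:
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            # Checked first, so the r in "rd" never reaches the R-rule below.
            out.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = word[i]
        if ch == "r":
            # word[i + 1:i + 2] is "" at the end of the word, which is not in VOWELS.
            out.append("romen" if word[i + 1:i + 2] in VOWELS else "ore")
        else:
            out.append(SINGLES.get(ch, ch))  # vowels/unknown letters kept as-is
        i += 1
    return out


print(consonant_tengwar("triumph"))  # ['tinco', 'romen', 'i', 'u', 'malta', 'formen']
```

&lt;p&gt;In &lt;code&gt;triumph&lt;/code&gt;, the r is followed by a vowel and becomes rómen. The final ph is caught as one unit before p and h are considered separately. &lt;code&gt;VOWELS&lt;/code&gt; is a set rather than the string aeiou on purpose: the empty string is a substring of every string, so a string-based check would treat the end of a word as a vowel and break the R-rule for a word-final r. Telling voiced from voiceless th would need knowledge of individual words, which this sketch leaves out.&lt;/p&gt;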

&lt;p&gt;With the finished product, I can now take my favorite poems and stories, pass them through the transliterator, render the resulting text in the Tengwar Annatar font, and send that document to my Kindle! There are still some little details that could be improved, but I&amp;#8217;m pleased with the result so far.&lt;/p&gt;
&lt;p class=&quot;tengwar&quot;&gt;y3G @ e5%dT2$ q72^zJ1· 5yY `B z5# 1zDO t`Û er#7H1TO q`Nt$_ 5#2 817H`B`V_· qiD_ 4t$ 37`Nv&amp;amp; @ 175#8j1T7F1E6Y· 75$26R @ 7iFj&amp;amp;1b% 1zFæ1 iJb% @ 1b$y6E 5#51E6E e5^1· 5#2 85$2 41E 2zHt&amp;amp;5$1 1`N t`Û z5%2jOÁ 47FO 7DO 81j%j 8t^O j1T1jO 21R`Cj%_ oaG z`Nm&amp;amp; w`V t%q7r^2$ qU5^· w1U `Bt qj`ViD2$ y3G @ 7iFj&amp;amp;1 8`N e6E-&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/elvish-kindle-cropped.jpg&quot; style=&quot;display: block; margin: 0 auto;&quot; alt=&quot;&quot; width=&quot;573&quot; /&gt;&lt;/p&gt;

&lt;p&gt;(&lt;a href=&quot;https://gist.github.com/csvoss/e58302f7394a57860c46&quot;&gt;&lt;b&gt;english_to_tengwar.py&lt;/b&gt;&lt;/a&gt; on GitHub Gist)
&amp;mdash;
&lt;span class=&quot;tengwar&quot;&gt;Œ&lt;a href=&quot;https://gist.github.com/csvoss/e58302f7394a57860c46&quot;&gt;&lt;b&gt;b$jdT·1`N·1b$y6E-q`Û&lt;/b&gt;&lt;/a&gt; 5^ s3Gw&amp;amp; siG1œ&lt;/span&gt;&lt;/p&gt;
</description>
        <pubDate>Thu, 17 Sep 2015 00:00:00 +0000</pubDate>
        <link>https://csvoss.com//transliterating-tengwar</link>
        <guid isPermaLink="true">https://csvoss.com//transliterating-tengwar</guid>
        
        
        <category>projects</category>
        
      </item>
      
    
  </channel>
</rss>
