<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ali="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.1">
   <front>
      <journal-meta>
         <journal-id journal-id-type="publisher-id">peerj</journal-id>
         <journal-id journal-id-type="pmc">peerj</journal-id>
         <journal-id journal-id-type="nlm-ta">PeerJ</journal-id>
         <journal-title-group>
            <journal-title>PeerJ</journal-title>
            <abbrev-journal-title abbrev-type="publisher">PeerJ</abbrev-journal-title>
         </journal-title-group>
         <issn pub-type="epub">2167-8359</issn>
         <publisher>
            <publisher-name>PeerJ Inc.</publisher-name>
            <publisher-loc>San Francisco, USA</publisher-loc>
         </publisher>
      </journal-meta>
      <article-meta>
         <article-id pub-id-type="publisher-id">4320</article-id>
         <article-id pub-id-type="doi">10.7717/peerj.4320</article-id>
         <article-categories>
            <subj-group subj-group-type="categories">
               <subject>Bioinformatics</subject>
               <subject>Ecology</subject>
               <subject>Genomics</subject>
               <subject>Microbiology</subject>
            </subj-group>
         </article-categories>
         <title-group>
            <article-title>Linking pangenomes and metagenomes: the <italic>Prochlorococcus</italic> metapangenome</article-title>
         </title-group>
         <contrib-group content-type="authors">
            <contrib id="author-1" contrib-type="author" corresp="yes">
               <name>
                  <surname>Delmont</surname>
                  <given-names>Tom O.</given-names>
               </name>
               <email>tomodelmont@gmail.com</email><xref ref-type="aff" rid="aff-1">1</xref></contrib>
            <contrib id="author-2" contrib-type="author" corresp="yes">
               <name>
                  <surname>Eren</surname>
                  <given-names>A. Murat</given-names>
               </name>
               <email>a.murat.eren@gmail.com</email>
               <email>meren@uchicago.edu</email><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref></contrib>
            <aff id="aff-1"><label>1</label><institution>Department of Medicine, University of Chicago</institution>, <city>Chicago</city>, <state>IL</state>, <country>United States of America</country></aff>
            <aff id="aff-2"><label>2</label><institution>Josephine Bay Paul Center, Marine Biological Laboratory</institution>, <city>Woods Hole</city>, <state>MA</state>, <country>United States of America</country></aff>
         </contrib-group>
         <contrib-group content-type="editors">
            <contrib contrib-type="editor">
               <name>
                  <surname>Bajic</surname>
                  <given-names>Vladimir</given-names>
               </name>
            </contrib>
         </contrib-group>
         <pub-date pub-type="epub" date-type="pub" iso-8601-date="2018-01-25">
            <day>25</day>
            <month>1</month>
            <year iso-8601-date="2018">2018</year>
         </pub-date>
         <volume>6</volume>
         <elocation-id>e4320</elocation-id>
         <history>
            <date date-type="received" iso-8601-date="2017-10-13">
               <day>13</day>
               <month>10</month>
               <year iso-8601-date="2017">2017</year>
            </date>
            <date date-type="accepted" iso-8601-date="2018-01-13">
               <day>13</day>
               <month>1</month>
               <year iso-8601-date="2018">2018</year>
            </date>
         </history>
         <permissions>
            <copyright-statement>©2018 Delmont and Eren</copyright-statement>
            <copyright-year>2018</copyright-year>
            <copyright-holder>Delmont and Eren</copyright-holder>
            <license xlink:href="http://creativecommons.org/licenses/by/4.0/">
               <license-p>This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.</license-p>
            </license>
         </permissions>
         <self-uri xlink:href="https://peerj.com/articles/4320"/>
         <abstract>
            <p>Pangenomes offer detailed characterizations of core and accessory genes found in a set of closely related microbial genomes, generally by clustering genes based on sequence homology. In comparison, metagenomes facilitate highly resolved investigations of the relative distribution of microbial genomes and individual genes across environments through read recruitment analyses. Combining these complementary approaches can yield unique insights into the functional basis of microbial niche partitioning and fitness; however, advanced software solutions are lacking. Here we present an integrated analysis and visualization strategy that provides an interactive and reproducible framework to generate pangenomes and to study them in conjunction with metagenomes. To investigate its utility, we applied this strategy to a <italic>Prochlorococcus</italic> pangenome in the context of a large-scale marine metagenomic survey. The resulting <italic>Prochlorococcus</italic> metapangenome revealed remarkable differential abundance patterns between very closely related isolates that belonged to the same phylogenetic cluster and that differed by only a small number of gene clusters in the pangenome. While the relationships between these genomes based on gene clusters correlated with their environmental distribution patterns, phylogenetic analyses using marker genes or concatenated single-copy core genes did not recapitulate these patterns. The metapangenome also revealed a small set of core genes that mostly occurred in hypervariable genomic islands of the <italic>Prochlorococcus</italic> populations, which systematically lacked read recruitment from surface ocean metagenomes. Notably, these core gene clusters were all linked to sugar metabolism, suggesting potential benefits to <italic>Prochlorococcus</italic> from a high sequence diversity of sugar metabolism genes. 
The rapidly growing number of microbial genomes and increasing availability of environmental metagenomes provide new opportunities to investigate the functioning and the ecology of microbial populations, and metapangenomes can provide unique insights for any taxon and biome for which genomic and sufficiently deep metagenomic data are available.</p>
         </abstract>
         <kwd-group kwd-group-type="author">
            <kwd>Comparative genomics</kwd>
            <kwd>Metagenomics</kwd>
            <kwd>Microbial ecology</kwd>
            <kwd>Metapangenomics</kwd>
            <kwd>anvi’o</kwd>
            <kwd>Hypervariable genomic islands</kwd>
            <kwd>Sugar metabolism</kwd>
            <kwd>Pangenomics</kwd>
            <kwd>TARA Oceans</kwd>
         </kwd-group>
         <funding-group>
            <award-group id="fund-1">
               <funding-source>University of Chicago</funding-source>
            </award-group>
            <funding-statement>This work was supported by the Frank R. Lillie Research Innovation Award, and startup funds from the University of Chicago. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</funding-statement>
         </funding-group>
      </article-meta>
   </front>
   <body>
      <sec sec-type="intro">
         <title>Introduction</title>
         <p>During the last two decades, the genomic content of more than 100,000 microbial isolates has been characterized and used to study the gene pool, adaptation capabilities, and evolution of microorganisms (<xref ref-type="bibr" rid="ref-65">Smith et al., 1997</xref>; <xref ref-type="bibr" rid="ref-2">Alm et al., 1999</xref>; <xref ref-type="bibr" rid="ref-44">Makarova et al., 2006</xref>; <xref ref-type="bibr" rid="ref-39">Kumar et al., 2011</xref>; <xref ref-type="bibr" rid="ref-23">Fernández-Gómez et al., 2013</xref>). Cultivation-based approaches have paved the way for the emergence of powerful strategies to identify core and accessory genes shared between closely related genomes through pangenomics (<xref ref-type="bibr" rid="ref-58">Read et al., 2003</xref>; <xref ref-type="bibr" rid="ref-69">Tettelin et al., 2005</xref>; <xref ref-type="bibr" rid="ref-78">Zhu et al., 2015</xref>). Genomic comparisons of isolates can shed light on the biogeographic partitioning of variable genes within microbial lineages based on isolation source (<xref ref-type="bibr" rid="ref-59">Reno et al., 2009</xref>; <xref ref-type="bibr" rid="ref-53">Porter et al., 2016</xref>). Yet <italic>de novo</italic> investigations of the role of genomic traits in the adaptation of microorganisms to the environment remain difficult as cultivation alone does not offer insights into the abundance or distribution patterns of isolated populations.</p>
         <p>Shotgun metagenomics, the sequencing of DNA directly extracted from the environment (<xref ref-type="bibr" rid="ref-28">Handelsman et al., 1998</xref>), allows the study of microbial communities without the need for cultivation. As of today, metagenomic data originating from a wide range of ecosystems make up a large fraction of the sequences stored in public databases (<xref ref-type="bibr" rid="ref-55">Qin et al., 2010</xref>; <xref ref-type="bibr" rid="ref-11">Bork et al., 2015</xref>). Researchers have used metagenomics to discover new bioactive molecules (<xref ref-type="bibr" rid="ref-43">Lorenz &amp; Eck, 2005</xref>; <xref ref-type="bibr" rid="ref-70">Thies et al., 2016</xref>), investigate the functional potential of ecosystems (<xref ref-type="bibr" rid="ref-71">Tringe et al., 2005</xref>; <xref ref-type="bibr" rid="ref-1">Al-Amoudi et al., 2016</xref>), and access the genomic context of uncultivated microorganisms (<xref ref-type="bibr" rid="ref-72">Tyson et al., 2004</xref>; <xref ref-type="bibr" rid="ref-29">Haroon et al., 2016</xref>; <xref ref-type="bibr" rid="ref-17">Delmont et al., 2017</xref>). Metagenomic data also provide a means to quantify the abundance and relative distribution of genomes in environmental samples through read recruitment (<xref ref-type="bibr" rid="ref-72">Tyson et al., 2004</xref>; <xref ref-type="bibr" rid="ref-18">Dutilh et al., 2014</xref>; <xref ref-type="bibr" rid="ref-20">Eren et al., 2015</xref>). 
Although the environmental signal resulting from such analyses provides insights into the ecological niche of individual populations (<xref ref-type="bibr" rid="ref-64">Sharon et al., 2013</xref>; <xref ref-type="bibr" rid="ref-7">Bendall et al., 2016</xref>; <xref ref-type="bibr" rid="ref-4">Anderson et al., 2017</xref>; <xref ref-type="bibr" rid="ref-56">Quince et al., 2017</xref>), this approach alone does not reveal to what extent genes that may be linked to the ecology and fitness of microbes are conserved within a phylogenetic clade.</p>
         <p>Recently, pangenomic approaches have been used to characterize the gene content of microbial populations in environmental samples through metagenomic read recruitment (<xref ref-type="bibr" rid="ref-16">Delmont &amp; Eren, 2016</xref>; <xref ref-type="bibr" rid="ref-63">Scholz et al., 2016</xref>; <xref ref-type="bibr" rid="ref-50">Nayfach et al., 2016</xref>). Combining well-established practices from pangenomics (identifying gene clusters and inferring relationships between genomes based on shared genes), with the emerging opportunities from metagenomics (the ability to track populations precisely across environments through genome-wide read recruitment) could provide a framework to investigate the ecological role of gene clusters that may be linked to the niche partitioning and fitness of microbial populations. To explore the potential of this concept, we developed a novel workflow within an existing open-source software platform (<xref ref-type="bibr" rid="ref-20">Eren et al., 2015</xref>), and characterized the metapangenome of <italic>Prochlorococcus</italic> isolates and single-cell genomes on a large scale.</p>
         <p><italic>Prochlorococcus</italic> is an extensively studied photosynthetic bacterial taxon abundant in the euphotic zone of low latitude marine systems (<xref ref-type="bibr" rid="ref-12">Chisholm et al., 1988</xref>; <xref ref-type="bibr" rid="ref-51">Olson et al., 1990</xref>; <xref ref-type="bibr" rid="ref-62">Rusch et al., 2010</xref>), which fixes a substantial amount of carbon from the atmosphere (<xref ref-type="bibr" rid="ref-24">Flombaum et al., 2013</xref>). Cultivation efforts targeting <italic>Prochlorococcus</italic> resulted in the recovery of genomes that represent members from six major phylogenetic clades divided into groups that are adapted to high-light (sub-clades HL-I and HL-II) or low-light (sub-clades LL-I, LL-II, LL-III, and LL-IV) (<xref ref-type="bibr" rid="ref-9">Biller et al., 2014a</xref>). Environmental surveys and culture experiments revealed the ecological niche and temporal dynamics of HL and LL <italic>Prochlorococcus</italic> ecotypes in the oceans, as well as correlations between the genomic traits of isolates and their response to environmental variables (<xref ref-type="bibr" rid="ref-75">West et al., 2001</xref>; <xref ref-type="bibr" rid="ref-61">Rocap et al., 2003</xref>; <xref ref-type="bibr" rid="ref-45">Malmstrom et al., 2010</xref>). A previous study by <xref ref-type="bibr" rid="ref-13">Coleman &amp; Chisholm (2010)</xref> used a pangenome of 12 <italic>Prochlorococcus</italic> isolates to discuss the differential occurrence in <italic>Prochlorococcus</italic> populations between two sampling stations after identifying core versus accessory genes and observing that only a few genes differed significantly in abundance between the sites. In addition, <xref ref-type="bibr" rid="ref-37">Kent et al. 
(2016)</xref> showed a strong association between the <italic>Prochlorococcus</italic> accessory gene functions and the community composition of this lineage on a large scale using metagenomes from the Global Ocean Sampling expedition. Yet to the best of our knowledge, pangenomes have never been linked to metagenomes at an appropriate resolution to monitor the distribution of individual gene clusters. Monitoring individual gene clusters is essential to scrutinize their prevalence across multiple microbial genomes, and infer associations regarding their potential role in fitness and niche partitioning of microbial populations to which they belong.</p>
         <p>Here we investigated the gene clusters we identified in 31 <italic>Prochlorococcus</italic> isolates in conjunction with their occurrence in the surface of marine systems using 30.9 billion metagenomic reads from the TARA Oceans Project (<xref ref-type="bibr" rid="ref-67">Sunagawa et al., 2015</xref>). Our investigation revealed that closely related <italic>Prochlorococcus</italic> populations sharing the same high-light niche (i.e., near the surface) exhibit considerable differences in their relative abundance that could be explained by a small number of differentially occurring gene clusters. Finally, we extended our analysis of 31 isolates with 74 single-amplified genomes (SAGs) and revealed intriguing patterns within <italic>Prochlorococcus</italic> hypervariable genomic islands by quantifying the link between individual gene clusters and the environment.</p>
      </sec>
      <sec sec-type="materials|methods">
         <title>Materials and Methods</title>
         <p>The URL <ext-link ext-link-type="uri" xlink:href="http://merenlab.org/data/2018_Delmont_and_Eren_Metapangenomics/">http://merenlab.org/data/2018_Delmont_and_Eren_Metapangenomics/</ext-link> contains a reproducible workflow that extends the descriptions and parameters of programs used in our study to (1) compute the <italic>Prochlorococcus</italic> pangenome using 31 isolate genomes, (2) profile the reads that the isolate genomes recruited from metagenomes, and (3) generate a metapangenome for <italic>Prochlorococcus</italic>.</p>
         <sec>
            <title>Genomes and metagenomes</title>
            <p>We acquired 31 isolate genomes and 74 SAGs (length &gt;1 Mbp) of <italic>Prochlorococcus</italic> from the National Center for Biotechnology Information (NCBI), and downloaded 93 TARA Oceans metagenomes from the European Bioinformatics Institute (EBI) repositories. <xref ref-type="supplementary-material" rid="supp-4">Table S1</xref> reports accession numbers and other information for each isolate genome, SAG and metagenome.</p>
         </sec>
         <sec>
            <title>Data preparation, quality filtering, and read recruitment</title>
            <p>We removed the low-quality reads from the TARA Oceans dataset using ‘iu-filter-quality-minoche’, which is a program in illumina-utils v1.4.1 (<xref ref-type="bibr" rid="ref-21">Eren et al., 2013</xref>) (available from <ext-link ext-link-type="uri" xlink:href="https://github.com/merenlab/illumina-utils">https://github.com/merenlab/illumina-utils</ext-link>), which implements the noise filtering parameters described by <xref ref-type="bibr" rid="ref-46">Minoche, Dohm &amp; Himmelbauer (2011)</xref>. After simplifying the header lines of 31 FASTA files for <italic>Prochlorococcus</italic> isolate genomes using the anvi’o script ‘reformat-fasta’, we concatenated all FASTA files into a single file, and used Bowtie2 (<xref ref-type="bibr" rid="ref-40">Langmead &amp; Salzberg, 2012</xref>) with default parameters and the additional ‘--no-unal’ flag to recruit quality-filtered short metagenomic reads on to <italic>Prochlorococcus</italic> isolate genomes (‘read recruitment’ is an analogous term to ‘mapping’, or ‘short read alignment’). We used samtools (<xref ref-type="bibr" rid="ref-42">Li et al., 2009</xref>) to convert resulting SAM files into sorted and indexed BAM files.</p>
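            <p>As an illustration of the recruitment steps described above, the following Python sketch assembles the corresponding command lines without executing them. It assumes paired-end input; the function name, file names, and index prefix are hypothetical placeholders, and only the ‘--no-unal’ flag is taken from the text. The exact invocations used in the study are those documented in the reproducible workflow.</p>
            <code language="python"><![CDATA[
```python
import shlex


def recruitment_commands(reference_fasta, r1_fastq, r2_fastq, sample):
    """Return the commands for one metagenome, in execution order.

    All file names are hypothetical; only '--no-unal' comes from the text.
    """
    index = "prochlorococcus_isolates"  # hypothetical Bowtie2 index prefix
    sam = f"{sample}.sam"
    raw_bam = f"{sample}-raw.bam"
    bam = f"{sample}.bam"
    return [
        # Build the Bowtie2 index from the concatenated FASTA file
        # (needed once per reference, repeated here for completeness).
        ["bowtie2-build", reference_fasta, index],
        # Recruit reads with default parameters; '--no-unal' omits
        # reads that failed to align from the SAM output.
        ["bowtie2", "--no-unal", "-x", index,
         "-1", r1_fastq, "-2", r2_fastq, "-S", sam],
        # Convert SAM to BAM, then sort and index the BAM file.
        ["samtools", "view", "-b", "-o", raw_bam, sam],
        ["samtools", "sort", "-o", bam, raw_bam],
        ["samtools", "index", bam],
    ]


if __name__ == "__main__":
    for cmd in recruitment_commands(
        "isolates.fa", "S1_R1.fastq", "S1_R2.fastq", "S1"
    ):
        print(shlex.join(cmd))
```
]]></code>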
         </sec>
         <sec>
            <title>Phylogenomic analysis</title>
            <p>We used Phylosift v1.0.1 (<xref ref-type="bibr" rid="ref-15">Darling et al., 2014</xref>) with default parameters to quantify evolutionary distances between genomes. Briefly, Phylosift (1) identifies a set of 37 marker gene families in each genome, (2) concatenates the alignment of each marker gene family across genomes, and (3) computes a phylogenomic tree from the concatenated alignment using FastTree 2.1 (<xref ref-type="bibr" rid="ref-54">Price, Dehal &amp; Arkin, 2010</xref>). We finalized the phylogenomic tree by setting a midpoint root with FigTree v.1.4.3 (<xref ref-type="bibr" rid="ref-57">Rambaut, 2009</xref>).</p>
         </sec>
         <sec>
            <title>Analysis of metagenomic read recruitment</title>
            <p>We used anvi’o (<xref ref-type="bibr" rid="ref-20">Eren et al., 2015</xref>) v3 (available from <ext-link ext-link-type="uri" xlink:href="http://merenlab.org/software/anvio/">http://merenlab.org/software/anvio/</ext-link>) to profile the read recruitment results following the workflow outlined by <xref ref-type="bibr" rid="ref-20">Eren et al. (2015)</xref>. Briefly, we first used the program ‘anvi-gen-contigs-database’ to profile <italic>Prochlorococcus</italic> genomes, during which Prodigal v2.6.3 (<xref ref-type="bibr" rid="ref-31">Hyatt et al., 2010</xref>) with default settings identified open reading frames. We used InterProScan v5.17-56 (<xref ref-type="bibr" rid="ref-77">Zdobnov &amp; Apweiler, 2001</xref>) and eggNOG-mapper v0.12.6 (<xref ref-type="bibr" rid="ref-30">Huerta-Cepas et al., 2016</xref>) outputs for our genes with the program ‘anvi-import-functions’ to import annotations from other databases, including PFAM (<xref ref-type="bibr" rid="ref-6">Bateman et al., 2004</xref>), and eggNOG (<xref ref-type="bibr" rid="ref-33">Jensen et al., 2008</xref>). We then used the program ‘anvi-run-ncbi-cogs’ to annotate genes with functions by searching them against the December 2014 release of the Clusters of Orthologous Groups (COGs) database (<xref ref-type="bibr" rid="ref-68">Tatusov et al., 2000</xref>) using blastp v2.3.0+ (<xref ref-type="bibr" rid="ref-3">Altschul et al., 1990</xref>). We finally used the program ‘anvi-profile’ to process the BAM file and generate an anvi’o profile database, which stored the coverage and detection statistics of each <italic>Prochlorococcus</italic> genome in the TARA Oceans data. We used ‘anvi-import-collection’ to link contigs to genomes from which they originate. Finally, the program ‘anvi-summarize’ generated a static HTML output that gave access to the mean coverage values of each genome (and individual genes within them) across metagenomes.</p>
         </sec>
         <sec>
            <title>Operational definition of ‘population’</title>
            <p>In the context of our study we define ‘population’ as an agglomerate of naturally occurring microbial cells, genomes of which are similar enough to align to the same genomic reference with high sequence identity as defined by the read recruitment stringency. Therefore, we assume that the isolate genomes in our study provide access to environmental populations to which they belong through the recruitment of short metagenomic reads.</p>
         </sec>
         <sec>
            <title>Criterion for ‘detection’</title>
            <p>Assessing the occurrence of low abundance genomes in complex data accurately can be problematic due to non-specific recruitment of short reads to regions that are conserved across multiple populations. For instance, although <italic>Prochlorococcus</italic> populations are virtually absent from the Southern Ocean (<xref ref-type="bibr" rid="ref-24">Flombaum et al., 2013</xref>), our genomes recruited up to 0.01% of the metagenomic reads from the Southern Ocean metagenomes matching to non-specific targets. To avoid high false-detection rates, we assumed that a genome was ‘detected’ in a given metagenome only if more than 50% of its nucleotide positions had at least 1X coverage.</p>
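            <p>The criterion above can be expressed as a short Python sketch. It is an illustration only and not the anvi’o implementation; the function names and the per-nucleotide coverage list are hypothetical.</p>
            <code language="python"><![CDATA[
```python
def detection(coverage):
    """Fraction of nucleotide positions covered by at least one read."""
    if not coverage:
        return 0.0
    return sum(1 for depth in coverage if depth >= 1) / len(coverage)


def is_detected(coverage, min_detection=0.5):
    """Criterion from the text: more than 50% of positions at >= 1X."""
    return detection(coverage) > min_detection


if __name__ == "__main__":
    # Hypothetical per-nucleotide coverage of a very short genome.
    toy_genome = [0, 3, 5, 0, 2, 1, 0, 4, 2, 1]
    print(detection(toy_genome), is_detected(toy_genome))
```
]]></code>
            <p>Because the comparison is strict, a genome with exactly half of its positions covered is not considered detected in this sketch.</p>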
         </sec>
         <sec>
            <title>Classification of isolate genes as ‘environmental core’ and ‘environmental accessory’</title>
            <p>Assuming the environmental niche of a population is defined by the metagenomes in which it is ‘detected’, here we define ‘environmental core genes’ of a population as the genes that are systematically detected in its niche. In contrast, the genes that are not systematically detected within the niche of a given population represent its environmental accessory genes. Genes in a population that are classified as ‘environmental core’ given metagenomic data can be classified as ‘accessory’ given a pangenome, and vice versa. To avoid any confusion between these operationally distinct class designations, we refer to the genes classified given the metagenomic data as the ‘environmental core genes’ (ECGs), and the ‘environmental accessory genes’ (EAGs). To identify ECGs and EAGs for each genome independently, we used the anvi’o script ‘anvi-script-gen-distribution-of-genes-in-a-bin’ with the parameter ‘--fraction-of-median-coverage 0.25’. This script recovers the sum of coverage values for each gene in a given genome across all metagenomes in which the population is ‘detected’, and marks the genes that have less than 25% of the median coverage of all genes found in the genome as EAGs. We then visualized resulting gene classes using the program ‘anvi-interactive’.</p>
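            <p>A minimal Python sketch of this classification rule follows. It mirrors the description above rather than the source code of the anvi’o script; the function name and the nested-dictionary layout of the coverage values are assumptions made for illustration.</p>
            <code language="python"><![CDATA[
```python
from statistics import median


def classify_genes(gene_coverage, detected_samples, fraction=0.25):
    """Label each gene of one genome as 'ECG' or 'EAG'.

    gene_coverage: {gene_id: {sample: mean coverage}} (hypothetical layout)
    detected_samples: metagenomes in which the genome is 'detected'
    """
    if not gene_coverage:
        return {}
    # Sum each gene's coverage over the metagenomes that define the niche.
    totals = {
        gene: sum(per_sample.get(s, 0.0) for s in detected_samples)
        for gene, per_sample in gene_coverage.items()
    }
    threshold = fraction * median(totals.values())
    # Genes below the threshold are environmental accessory genes.
    return {
        gene: ("EAG" if total < threshold else "ECG")
        for gene, total in totals.items()
    }


if __name__ == "__main__":
    coverage = {
        "gene_a": {"s1": 10, "s2": 12},
        "gene_b": {"s1": 9, "s2": 11},
        # High coverage in s3 is ignored: the genome is not detected there.
        "gene_c": {"s1": 1, "s2": 0, "s3": 500},
    }
    print(classify_genes(coverage, ["s1", "s2"]))
```
]]></code>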
         </sec>
         <sec>
            <title>Computing the pangenome, and the definition of gene clusters</title>
            <p>The anvi’o pangenomic workflow developed for this study consists of three major steps: (1) generating an anvi’o genome database (‘anvi-gen-genomes-storage’) to store DNA and amino acid sequences, as well as functional annotations of each gene in genomes under consideration, (2) computing the pangenome (‘anvi-pan-genome’) from a genome database by identifying ‘gene clusters’, and (3) displaying the pangenome (‘anvi-display-pan’) to visualize the distribution of gene clusters across genomes, interactively bin gene clusters into logical groups, and inspect the alignment of genes in a given cluster interactively. In our study, a ‘gene cluster’ represents sequences of one or more predicted open reading frames grouped together based on their homology at the translated DNA sequence level. Gene clusters with more than one sequence may contain orthologous or paralogous sequences, or both, from one or more genomes analyzed in the pangenome. To compute the <italic>Prochlorococcus</italic> pangenome, we first generated an ‘anvi’o genomes storage database’ from the FASTA files of 31 <italic>Prochlorococcus</italic> isolate genomes using the ‘--internal-genomes’ flag. We then used the program ‘anvi-pan-genome’ with the genomes storage database, the flag ‘--use-ncbi-blast’, and parameters ‘--minbit 0.5’, and ‘--mcl-inflation 10’. 
This program (1) calculates similarities of each amino acid sequence in every genome against every other amino acid sequence using blastp (<xref ref-type="bibr" rid="ref-3">Altschul et al., 1990</xref>), (2) removes weak hits using the ‘minbit heuristic’, originally described in ITEP (<xref ref-type="bibr" rid="ref-8">Benedict et al., 2014</xref>), which filters hits based on the aligned fraction between the two sequences, (3) uses the MCL algorithm (<xref ref-type="bibr" rid="ref-73">Van Dongen &amp; Abreu-Goodger, 2012</xref>) to identify gene clusters in the remaining blastp search results, (4) computes the occurrence of gene clusters across genomes and the total number of genes they contain, (5) performs hierarchical clustering analyses for gene clusters (based on their distribution across genomes) and for genomes (based on gene clusters they share) using Euclidean distance and Ward clustering by default, and finally (6) generates an anvi’o pan database that stores all results for downstream analyses and can be visualized by the program ‘anvi-display-pan’.</p>
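            <p>The filtering in step (2) can be sketched in Python as follows. The sketch assumes that the minbit score of a pair is the bit score of the hit divided by the smaller of the two self-hit bit scores, and that hits scoring at or above the threshold are kept; the function names, the dictionary of bit scores, and the inclusive comparison are assumptions for illustration and do not reproduce the anvi’o code.</p>
            <code language="python"><![CDATA[
```python
def minbit(bitscores, a, b):
    """Bit score of the a-b hit relative to the weaker of the two self-hits."""
    return bitscores[(a, b)] / min(bitscores[(a, a)], bitscores[(b, b)])


def filter_hits(bitscores, threshold=0.5):
    """Keep non-self hits whose minbit score reaches the threshold."""
    return [
        (a, b)
        for (a, b) in bitscores
        if a != b and minbit(bitscores, a, b) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical blastp bit scores, including the self-hits.
    scores = {
        ("g1", "g1"): 200.0,
        ("g2", "g2"): 180.0,
        ("g3", "g3"): 400.0,
        ("g1", "g2"): 150.0,  # 150 / 180: kept
        ("g1", "g3"): 60.0,   # 60 / 200: removed
    }
    print(filter_hits(scores))
```
]]></code>
            <p>The pairs that pass this filter would then be handed to the MCL step to delineate gene clusters.</p>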
         </sec>
         <sec>
            <title>Computing the metapangenome</title>
            <p>Here we define ‘metapangenome’ as the outcome of the analysis of pangenomes in conjunction with the environment where the abundance and prevalence of gene clusters and genomes are recovered through shotgun metagenomes. To connect the environmental distribution patterns of genomes to the <italic>Prochlorococcus</italic> pangenome, we used the program ‘anvi-gen-samples-database’ with the genome coverage estimates reported in the summary of the anvi’o profile database for metagenomic data. To quantify the ratio of ‘environmental core genes’ (ECGs) and the ‘environmental accessory genes’ (EAGs) in each gene cluster in the resulting pangenome, we used the anvi’o program ‘anvi-script-gen-environmental-core-summary’ with default parameters. The program ‘anvi-display-pan’ visualized the <italic>Prochlorococcus</italic> metapangenome, and ‘anvi-summarize’ generated a summary of gene clusters.</p>
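            <p>To make the per-cluster summary concrete, the Python sketch below counts ECGs and EAGs among the genes of each gene cluster. It is a simplified illustration under stated assumptions, not the anvi’o script itself: the function name and input dictionaries are hypothetical, and genes without an environmental class (for example, genes from genomes that were not detected in any metagenome) are skipped.</p>
            <code language="python"><![CDATA[
```python
from collections import Counter


def environmental_summary(cluster_genes, gene_class):
    """Count ECGs and EAGs per gene cluster.

    cluster_genes: {cluster_id: [gene_id, ...]}
    gene_class: {gene_id: 'ECG' or 'EAG'}; unclassified genes are skipped
    """
    summary = {}
    for cluster, genes in cluster_genes.items():
        counts = Counter(gene_class[g] for g in genes if g in gene_class)
        n_classified = counts["ECG"] + counts["EAG"]
        summary[cluster] = {
            "ECG": counts["ECG"],
            "EAG": counts["EAG"],
            # None marks clusters with no classified gene at all.
            "EAG_fraction": (
                counts["EAG"] / n_classified if n_classified else None
            ),
        }
    return summary


if __name__ == "__main__":
    clusters = {"GC_1": ["a1", "b1"], "GC_2": ["a2"], "GC_3": ["x9"]}
    classes = {"a1": "ECG", "b1": "EAG", "a2": "ECG"}
    for cluster, row in environmental_summary(clusters, classes).items():
        print(cluster, row)
```
]]></code>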
         </sec>
         <sec>
            <title>Analysis of <italic>Prochlorococcus</italic> single-amplified genomes</title>
            <p>We performed a pangenomic analysis combining the 74 SAGs and 31 isolate genomes of <italic>Prochlorococcus</italic> following the same workflow as for the isolate genomes alone. From the 74 SAGs, we then selected five phylogenetically distant ones and performed a metapangenomic analysis following the same workflow as for the isolate genomes (including the same metagenomic dataset). Our selection of few distant SAGs was intended to minimize the dilution effect due to competing read recruitment onto identical regions from multiple genomes.</p>
         </sec>
         <sec>
            <title>Visualizations</title>
            <p>We used the ggplot2 (<xref ref-type="bibr" rid="ref-25">Ginestet, 2011</xref>) library for R to visualize the relative distribution of genomic groups on the world map. Anvi’o performed all other visualizations, and we finalized our figures for publication using Inkscape, an open-source vector graphics editor (available from <ext-link ext-link-type="uri" xlink:href="http://inkscape.org/">http://inkscape.org/</ext-link>).</p>
         </sec>
      </sec>
      <sec sec-type="results">
         <title>Results</title>
         <sec>
            <title>Environmental distribution of <italic>Prochlorococcus</italic> isolate genomes</title>
            <p>To estimate the abundance and relative distribution patterns of the 31 <italic>Prochlorococcus</italic> isolate genomes in environmental samples, we mapped to them 30.9 billion quality-filtered metagenomic short reads from 93 TARA Oceans samples (0.2–3 µm planktonic size fraction) that cover the Atlantic Ocean, Pacific Ocean, Indian Ocean, Southern Ocean, Mediterranean Sea and Red Sea (<xref ref-type="supplementary-material" rid="supp-4">Table S1</xref>). <italic>Prochlorococcus</italic> genomes recruited 1.68 billion reads (5.44% of the dataset) from the surface (0–15 m depth; <italic>n</italic> = 61), and the subsurface chlorophyll maximum layer (17–95 m depth; <italic>n</italic> = 32) metagenomes. The relative distribution of all <italic>Prochlorococcus</italic> genomes ranged from below the detection limit in the Southern Ocean to 24.1% in a surface metagenome from the Indian Ocean (<xref ref-type="supplementary-material" rid="supp-5">Table S2</xref>).</p>
            <p>In agreement with the literature, genomes from the Clade LL-II and Clade LL-III were not detected in the metagenomic dataset: although the isolation source for most LL-II/III genomes was 120 m (<xref ref-type="bibr" rid="ref-60">Rocap et al., 2002</xref>), the subsurface samples in TARA Oceans metagenomes averaged 53.7 m and never exceeded 100 m. The remaining clades displayed contrasting distribution patterns. The HL-I and HL-II genomes were enriched in surface samples, but they were geographically antagonistic: HL-I dominated in the Mediterranean Sea, while HL-II, the most abundant <italic>Prochlorococcus</italic> clade in the dataset, occurred mostly in the Indian Ocean and Red Sea (<xref ref-type="supplementary-material" rid="supp-1">Fig. S1</xref>). Read recruitment results were also in line with previous observations suggesting temperature as one of the main drivers of distribution patterns of HL-I and HL-II (<xref ref-type="bibr" rid="ref-34">Johnson et al., 2006</xref>; <xref ref-type="bibr" rid="ref-10">Biller et al., 2014b</xref> and references therein), as 93% and 95% of the reads recruited by the HL-I and HL-II genomes originated from samples that were below and above 22°C, respectively. The LL-I and LL-IV genomes (more characteristic of the subsurface layer) were also detected in different geographic locations, but in lower proportions (<xref ref-type="supplementary-material" rid="supp-5">Table S2</xref>). Overall, the trends observed here are largely consistent with results from previous environmental surveys and culture experiments (<xref ref-type="bibr" rid="ref-34">Johnson et al., 2006</xref>; <xref ref-type="bibr" rid="ref-41">Larkin et al., 2016</xref>), and emphasize the limited niche overlap of <italic>Prochlorococcus</italic> clades in the euphotic layer of marine systems on a large scale.</p>
         </sec>
         <sec>
            <title>The pangenome of <italic>Prochlorococcus</italic> isolate genomes</title>
            <p>Our pangenomic analysis of the 31 <italic>Prochlorococcus</italic> isolate genomes with a total of 60,054 genes resulted in 7,385 gene clusters. We grouped these gene clusters into five bins based on their occurrence across genomes: (1) HL + LL core gene clusters (<italic>n</italic> = 766), (2) HL core gene clusters (<italic>n</italic> = 492), (3) LL core gene clusters (<italic>n</italic> = 144), (4) singletons (i.e., gene clusters associated with a single genome; <italic>n</italic> = 2,215), and (5) other gene clusters that do not fit any of these classes (<italic>n</italic> = 3,768) (<xref ref-type="supplementary-material" rid="supp-2">Fig. S2</xref>). The singletons and HL + LL core gene clusters corresponded to 30% and 10.4% of all clusters, respectively. This relatively small core genome is consistent with previous pangenomic investigations and supports the concept of a <italic>Prochlorococcus</italic> ‘open pangenome’ (<xref ref-type="bibr" rid="ref-38">Kettler et al., 2007</xref>). Overall, 49.1% of all clusters contained genes that were annotated with COG functions (<xref ref-type="supplementary-material" rid="supp-6">Table S3</xref>). The functional annotation rate reached 90.5% for the HL + LL core gene clusters, but was only 37.2% for the singletons. As the shared gene content between genomes is an effective predictor of their phylogenetic relationships (<xref ref-type="bibr" rid="ref-66">Snel, Bork &amp; Huynen, 1999</xref>; <xref ref-type="bibr" rid="ref-19">Dutilh et al., 2004</xref>), we used the distribution of gene clusters to determine the relationships among our genomes. The genomic groups that emerged from this analysis matched the six <italic>Prochlorococcus</italic> phylogenetic clades (<xref ref-type="fig" rid="fig-1">Fig. 1</xref>). However, a noticeable difference emerged from the organization of clades based on gene clusters.
Previous phylogenetic analyses using the internal transcribed spacer region (<xref ref-type="bibr" rid="ref-10">Biller et al., 2014b</xref>) placed LL genomes into polyphyletic clades (LL-I being an outlier), which was echoed by the phylogenomic analysis we performed in this study using 37 core genes (<xref ref-type="fig" rid="fig-1">Fig. 1</xref>). In contrast, gene clusters grouped genomes primarily based on their adaptation to light regimes (<xref ref-type="fig" rid="fig-1">Fig. 1</xref>). This result suggests that employing the whole genomic content, instead of only marker genes, may be more advantageous when the goal is to infer ecological rather than evolutionary relationships between a set of closely related genomes.</p>
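            <p>The five-bin grouping of gene clusters can be sketched as follows. This is a minimal illustration rather than the anvi’o implementation: the function names, the genome labels and the strict membership rules used here (a core cluster contains every genome of its group and no genome outside it) are assumptions made for the example, while the bin sizes are those reported in the text.</p>
            <preformat>
```python
from typing import Dict, Set


def bin_gene_cluster(genomes: Set[str], hl: Set[str], ll: Set[str]) -> str:
    """Assign one gene cluster to an occurrence bin.

    Hypothetical rules, assumed for illustration only: a core cluster
    contains every genome of its group and no genome outside it.
    """
    if genomes == hl | ll:
        return "HL + LL core"
    if genomes == hl:
        return "HL core"
    if genomes == ll:
        return "LL core"
    if len(genomes) == 1:
        return "singleton"
    return "other"


# Bin sizes as reported in the text.
BIN_SIZES: Dict[str, int] = {
    "HL + LL core": 766,
    "HL core": 492,
    "LL core": 144,
    "singleton": 2215,
    "other": 3768,
}


def bin_percentages(sizes: Dict[str, int]) -> Dict[str, float]:
    """Express each bin as a percentage of all gene clusters."""
    total = sum(sizes.values())
    return {name: round(100 * n / total, 1) for name, n in sizes.items()}
```
            </preformat>
            <p>With the reported bin sizes, the five bins sum to 7,385 gene clusters, and <monospace>bin_percentages(BIN_SIZES)</monospace> gives 30.0 for the singletons and 10.4 for the HL + LL core, matching the percentages quoted in the text.</p>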
            <fig id="fig-1">
               <object-id pub-id-type="doi">10.7717/peerj.4320/fig-1</object-id><label>Figure 1</label><caption>
                  <title>Organization of <italic>Prochlorococcus</italic> genomes based on shared gene clusters compared to phylogenomics.</title>
                  <p>The dendrogram at the top shows the clustering of 31 isolate genomes based on the distribution of 7,385 gene clusters recovered from the pangenomic analysis (Euclidean distance and Ward clustering). The tree at the bottom organizes the same genomes based on phylogenomics using 37 concatenated core genes. Colors indicate the phylogenetic affiliations of genomes based on published literature.</p>
               </caption>
               <graphic mimetype="image" mime-subtype="png" xlink:href="https://peerj.com/articles/4320/fig-1.png"/>
            </fig>
         </sec>
         <sec>
            <title>Environmental core and accessory genes in <italic>Prochlorococcus</italic> isolate genomes</title>
            <p>Genomic islands are widespread in <italic>Prochlorococcus</italic> (<xref ref-type="bibr" rid="ref-14">Coleman et al., 2006</xref>; <xref ref-type="bibr" rid="ref-13">Coleman &amp; Chisholm, 2010</xref>) and genes from a given genome may not be found uniformly in all marine ecosystems. Besides the detection estimates at the genome level, recruiting reads from metagenomic data also provides an opportunity to investigate the occurrence and relative distribution of individual genes. We used read recruitment statistics to differentiate genes that co-occurred with the population across metagenomes from those that consistently failed to recruit reads from the environment despite the occurrence of the population. While the first group of genes is common to most cells in a given population (i.e., connected to the environment), the second group of genes occurs only in a fraction of the members of the population, or shows sporadic distribution patterns across environments (i.e., not connected to the environment). This analysis revealed 42,777 environmental core genes (ECGs) and 6,528 environmental accessory genes (EAGs) in 25 <italic>Prochlorococcus</italic> genomes (genomes from the Clade LL-II and Clade LL-III were not detected in the metagenomic data, hence did not yield any estimates) (<xref ref-type="supplementary-material" rid="supp-6">Table S3</xref>). On average, the EAGs represented 13.4% (±4.65%) of all genes in each <italic>Prochlorococcus</italic> genome. This exposes a non-negligible and relatively stable portion of genes that occur in only a small subset of the cells within each population to which we had access through the genomic database and metagenomic data, consistent with previous metagenomic surveys of this lineage (<xref ref-type="bibr" rid="ref-13">Coleman &amp; Chisholm, 2010</xref>).
The positions of most EAGs in a given genome were not random: most were clustered into hypervariable genomic islands (<xref ref-type="fig" rid="fig-2">Fig. 2</xref>). The classification of the genes in an isolate genome based on their environmental connectivity through metagenomics offers unique insights regarding their occurrence within a population. Furthermore, this particular use of metagenomes is also essential to subsequently quantify the environmental connectivity of genes in pangenomes.</p>
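            <p>The distinction between environmental core and accessory genes can be illustrated with a short sketch. It is not the procedure implemented in anvi’o: the function name, the input layout and both thresholds (<monospace>min_genome_cov</monospace> and <monospace>min_ratio</monospace>) are hypothetical values chosen for the example. A gene is labelled an ECG when, across the metagenomes in which its genome is detected, its coverage typically tracks the coverage of the genome, and an EAG otherwise.</p>
            <preformat>
```python
from statistics import median
from typing import Dict, List


def classify_genes(
    gene_cov: Dict[str, List[float]],
    genome_cov: List[float],
    min_genome_cov: float = 1.0,  # assumed cutoff for "population is present"
    min_ratio: float = 0.25,      # assumed cutoff for "gene tracks the genome"
) -> Dict[str, str]:
    """Label each gene ECG, EAG, or NA from per-metagenome coverage values.

    gene_cov maps a gene name to its mean coverage in each metagenome;
    genome_cov holds the mean coverage of the whole genome in the same order.
    """
    # Only metagenomes in which the population occurs are informative.
    present = [i for i, c in enumerate(genome_cov) if c >= min_genome_cov]
    labels: Dict[str, str] = {}
    for gene, cov in gene_cov.items():
        if not present:
            # Genome not detected anywhere: no estimate can be made.
            labels[gene] = "NA"
            continue
        ratios = [cov[i] / genome_cov[i] for i in present]
        labels[gene] = "ECG" if median(ratios) >= min_ratio else "EAG"
    return labels
```
            </preformat>
            <p>In this sketch, a genome that is not detected in any metagenome yields no estimate for its genes, mirroring the treatment of the undetected LL-II and LL-III genomes described above.</p>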
            <fig id="fig-2">
               <object-id pub-id-type="doi">10.7717/peerj.4320/fig-2</object-id><label>Figure 2</label><caption>
                  <title>The gene-level detection of isolates from HL-I and HL-II in TARA Oceans metagenomes.</title>
                  <p>Visualizations describe the gene-level niche partitioning of EQPAC1 and MIT9314, isolates from the clades HL-I and HL-II, respectively, across 93 metagenomes from TARA Oceans. For each isolate, the genes are organized based on their order in the genome, and each layer corresponds to a metagenome, colored based on temperature (&lt;22 °C versus &gt;22 °C, as in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>). The outermost layer describes the environmental connectivity of each gene. Environmental core and accessory genes are colored in green and red, respectively. Genomic sections enriched in environmental accessory genes correspond to hypervariable regions.</p>
               </caption>
               <graphic mimetype="image" mime-subtype="png" xlink:href="https://peerj.com/articles/4320/fig-2.png"/>
            </fig>
            <fig id="fig-3">
               <object-id pub-id-type="doi">10.7717/peerj.4320/fig-3</object-id><label>Figure 3</label><caption>
                  <title>The metapangenome of <italic>Prochlorococcus</italic>.</title>
                  <p>Each one of the 7,385 gene clusters contains one or more genes contributed by one or more isolate genomes. Bars in the first 31 layers indicate the occurrence of gene clusters in a given isolate genome. Gene clusters are organized based on their distribution across genomes (i.e., gene clusters that co-occur in the same group of isolates are closer to each other), and genomes are organized based on gene clusters they share using Euclidean distance and Ward ordination. The next three layers describe the gene clusters in which at least one gene was functionally annotated using Pfams, EggNOGs, or COGs. Another layer describes the ratio of environmental core versus environmental accessory genes (ECGs/EAGs) within each gene cluster. Gray areas account for the genes in genomes undetected in the metagenomic dataset. Finally, the last layer corresponds to our selections of gene clusters. The “HL + LL Core” selection corresponds to the gene clusters that contained genes from all genomes. The “LL Core” and “HL Core” selections correspond to clusters that contained genes characteristic of the LL- and HL-adapted genomes, respectively. The last selection (“Singletons”) corresponds to clusters that contained one or multiple genes from a single genome. The right-hand side section of the figure provides additional data for each isolate. The bottom rectangle displays the relative distribution of genomes across 93 metagenomes and is followed by layers that show the average distribution of each isolate in the metagenomic dataset and the phylogenetic clades to which they belong. The dendrogram at the top represents the hierarchical clustering of genomes based on the occurrence of gene clusters.</p>
               </caption>
               <graphic mimetype="image" mime-subtype="png" xlink:href="https://peerj.com/articles/4320/fig-3.png"/>
            </fig>
         </sec>
         <sec>
            <title>The metapangenome reveals closely related isolates with different levels of fitness</title>
            <p>A metapangenome provides access to the environmental detection of individual genes in gene clusters, along with the ecological niche boundaries of individual genomes. The <italic>Prochlorococcus</italic> metapangenome revealed differences within the members of the Clade HL-II with respect to their rate of detection in the environment (<xref ref-type="fig" rid="fig-3">Fig. 3</xref>; see the interactive version at the URL <ext-link ext-link-type="uri" xlink:href="http://anvi-server.org/p/JNlBAB">http://anvi-server.org/p/JNlBAB</ext-link>). Interestingly, the organization of genomes in HL-II based on gene clusters matched their detection gradient within their niche, with the least abundant and the most abundant genomes in the metagenomic data being at the two extremes of the cluster that described the Clade HL-II (<xref ref-type="fig" rid="fig-3">Fig. 3</xref>, <xref ref-type="supplementary-material" rid="supp-5">Table S2</xref>). We tentatively grouped the HL-II genomes into three sub-groups based on their abundance in the metagenomic dataset: HL-II-Low (<italic>n</italic> = 3) with an average relative abundance of 0.037%, HL-II-Medium (<italic>n</italic> = 10) with an average relative abundance of 0.14%, and HL-II-High (<italic>n</italic> = 4) with an average relative abundance of 0.5%. Based on this grouping, HL-II-High genomes were 13.5 times more abundant in the environment on average compared to HL-II-Low genomes, despite being closely related enough to be described in the same phylogenetic group for HL. In light of this observation, we investigated whether the differentially distributed gene clusters could identify the functional basis of the apparent change in fitness. Noticeably, the HL-II-Low genomes were lacking gene clusters that resolve to DNA repair (DNA ligase; 3-methyladenine DNA glycosylase; DEAD DEAH box helicase) compared to the HL-II-High genomes (<xref ref-type="supplementary-material" rid="supp-6">Table S3</xref>). 
All 31 isolates carried DNA repair genes, as DNA repair is a critical protection mechanism against light-induced damage in the surface layer of marine systems (<xref ref-type="bibr" rid="ref-32">Jeffrey et al., 1996</xref>); however, HL-II-High genomes carried a unique set of DNA repair genes that were missing from HL-II-Low genomes. Also missing from the HL-II-Low genomes were gene clusters corresponding to enzymes of the cupin superfamily, the fructose-bisphosphate aldolase class II, glutamine amino transferase, PAP fibrillin, a metal-binding protein, and 25 gene clusters to which we could not assign a function. The metapangenome provided access to genomic features that may explain the functional basis of such variation in fitness between closely related members of the HL-II group. Assuming that an increased relative abundance in the environment is equivalent to increased fitness, characterization of the genomic features that contribute to these differences, especially those of unknown function, warrants further study.</p>
         </sec>
         <sec>
            <title>Genes and functions connect the hypervariable genomic islands of <italic>Prochlorococcus</italic> populations</title>
            <p>We then turned our attention to the key contribution of our metapangenomic workflow: the environmental connectivity of the pangenome as defined by the proportion of ECGs and EAGs found in each gene cluster. The percentage of EAGs from genomes that occurred in our metagenomic data differed markedly between the HL + LL core gene clusters (4.31%), LL core gene clusters (0.28%), HL core gene clusters (12.4%), and singletons (66%) (<xref ref-type="fig" rid="fig-3">Fig. 3</xref>; <xref ref-type="supplementary-material" rid="supp-6">Table S3</xref>). The difference of more than an order of magnitude between the LL and HL core gene clusters in their proportion of EAGs suggests that, given the available isolate genomes, <italic>Prochlorococcus</italic> genes characteristic of the low-light regime may be more stable than those characteristic of the high-light regime. These results also indicate that genes present in all isolate genomes (HL + LL core) were maintained in a large fraction of the cells in the populations we investigated, while those that are specific to a single isolate largely occurred in a smaller number of cells in the environment and remained below our detection limit. Exceptions to the low number of EAGs in the HL + LL core were gene clusters #33, #44 and #431 (see <xref ref-type="supplementary-material" rid="supp-6">Table S3</xref>). The percentages of EAGs for these gene clusters in HL isolates were 100%, 95.2% and 95.2%, and their functions resolved to ‘nucleotide sugar epimerase’, ‘udp-glucose 6-dehydrogenase’ and ‘mannose-1-phosphate guanylyltransferase’, respectively. In contrast, these gene clusters contained only ECGs in the LL isolates (<xref ref-type="supplementary-material" rid="supp-6">Table S3</xref>).
Sugar uptake by <italic>Prochlorococcus</italic> has been observed in both culture and <italic>in situ</italic> (<xref ref-type="bibr" rid="ref-26">Gomez-Baena et al., 2008</xref>; <xref ref-type="bibr" rid="ref-49">Muñoz Marín et al., 2013</xref>; <xref ref-type="bibr" rid="ref-48">Muñoz-Marín et al., 2017</xref>) and this process can support the growth of <italic>Prochlorococcus</italic> populations in the surface ocean (<xref ref-type="bibr" rid="ref-47">Moisander et al., 2012</xref>). The occurrence of multiple sugar metabolism genes in every HL isolate that are absent in almost all metagenomes poses an interesting conundrum.</p>
            <p>To investigate whether this could be due to a cultivation bias that selects for members from these populations with a certain set of sugar utilization genes, we analyzed 74 single amplified genomes (SAGs) from a study by <xref ref-type="bibr" rid="ref-36">Kashtan et al. (2014)</xref> (<xref ref-type="supplementary-material" rid="supp-7">Table S4</xref>). Our analysis revealed that these gene clusters also occurred in a large number of SAGs (75.7% to 81.1%) (<xref ref-type="supplementary-material" rid="supp-7">Table S4</xref>). Most interestingly, metapangenomic analysis of SAGs using the same metagenomic dataset and bioinformatics workflow we used for the isolates also revealed that all genes in these gene clusters were EAGs (<xref ref-type="supplementary-material" rid="supp-7">Table S4</xref>), consistent with our observations in the HL isolates, and ruling out the ‘cultivation bias’ hypothesis. Yet these results left us with a puzzling observation as we have identified <italic>Prochlorococcus</italic> gene clusters widespread in both isolate genomes and SAGs of the HL clades with genes rarely detected in the surface oceans and seas. Methodological differences could explain the conflict between the high prevalence of these gene clusters across genomes in the pangenome and the low detection of each gene in them across metagenomes: gene clusters are formed based on homology between amino acid sequences (<xref ref-type="bibr" rid="ref-69">Tettelin et al., 2005</xref>), hence can contain genes with relatively low sequence similarity, while metagenomic read recruitment is done at the DNA sequence-level, and is more stringent.</p>
            <p>Notably, genes in clusters #33, #44 and #431 occurred in hypervariable genomic islands of the isolates and SAGs (<xref ref-type="fig" rid="fig-4">Fig. 4</xref>, <xref ref-type="supplementary-material" rid="supp-6">Tables S3</xref> and <xref ref-type="supplementary-material" rid="supp-7">S4</xref>), and as a result are surrounded by other EAGs that are not part of the <italic>Prochlorococcus</italic> core genome. To the best of our knowledge, this is the first time the <italic>Prochlorococcus</italic> core pangenome has been linked to hypervariable genomic islands, indicating that core functionalities of this major lineage associated with sugar metabolism are maintained in a variety of versions within each population. Finally, analyzing the functionality of all EAGs exposed a prevalent role of sugar metabolism in hypervariable genomic islands beyond the three core gene clusters (<xref ref-type="fig" rid="fig-4">Fig. 4</xref> and <xref ref-type="supplementary-material" rid="supp-3">Fig. S3</xref>). Briefly, functions such as udp-glucose 4-epimerase, dTDP-4-dehydrorhamnose 3,5-epimerase, dTDP-4-dehydrorhamnose reductase, dTDP-glucose 4-6-dehydratase, GDP-mannose 4,6-dehydratase and glucose-1-phosphate cytidylyltransferase were dominated by EAGs and occurred mostly in hypervariable genomic islands of the HL populations (<xref ref-type="supplementary-material" rid="supp-8">Table S5</xref>). Overall, our analyses suggested a high rate of diversification among sugar metabolism genes in <italic>Prochlorococcus</italic> that may be contributing to the remarkable fitness of this group in the surface ocean.</p>
            <fig id="fig-4">
               <object-id pub-id-type="doi">10.7717/peerj.4320/fig-4</object-id><label>Figure 4</label><caption>
                  <title>Prevalence of sugar utilization in <italic>Prochlorococcus</italic> hypervariable genomic islands.</title>
                  <p>(A) describes the 25 most environmental accessory functions identified in <italic>Prochlorococcus</italic> isolates, defined by an unusually high ratio of EAGs. (B) and (C) display the coordinates of genes corresponding to the 25 most environmental accessory functions across five isolate genomes and five SAGs of <italic>Prochlorococcus</italic>, respectively (red in the outer layers). Inner layers correspond to the 93 TARA Oceans metagenomes, organized by geographic regions similarly to <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. For each metagenome, black sections correspond to well covered genes while white sections correspond to genes with no read recruitment.</p>
               </caption>
               <graphic mimetype="image" mime-subtype="png" xlink:href="https://peerj.com/articles/4320/fig-4.png"/>
            </fig>
         </sec>
      </sec>
      <sec sec-type="discussion">
         <title>Discussion</title>
         <p>The quantity of data in genomic databases and metagenomic surveys is increasing rapidly thanks to the advances in biotechnology and computation. Metapangenomes take advantage of both genomes and metagenomes to link two important endeavors in microbiology: inferring the relationships between isolate genomes through identifying the core and accessory genes they harbor <italic>de novo</italic>, and investigating the relative distribution of microbial populations and individual genes in the environment through metagenomics.</p>
         <p>Our metapangenomic workflow has similarities to the method described in a recently introduced metagenomics pipeline by <xref ref-type="bibr" rid="ref-50">Nayfach et al. (2016)</xref>, as both efforts offer solutions to expand conventional analyses of pangenomes by not only estimating the abundance and distribution of gene clusters in the environment, but also linking them to the distribution patterns of microbial populations. In addition to this shared goal, our approach provides a flexible starting point with project-specific genomic databases (rather than pre-computed references), and includes a comprehensive visualization strategy to summarize metapangenomes.</p>
         <p>The <italic>Prochlorococcus</italic> metapangenome revealed subtle distribution gradients among isolates that belonged to the same phylogenetic clade, and exposed differentially occurring gene clusters that could be related to genomic traits affecting the fitness among closely related members. It also revealed gene clusters that occurred in every isolate genome and in most single-cell genomes but were largely missing in the environment, exposing a core genome connecting hypervariable genomic islands of distinct <italic>Prochlorococcus</italic> phylogenetic clades. Interestingly, these gene clusters were biased towards sugar utilization. Variable genomic islands of <italic>Prochlorococcus</italic> among co-occurring cells (<xref ref-type="bibr" rid="ref-14">Coleman et al., 2006</xref>) have previously been linked to resistance to viral infections (<xref ref-type="bibr" rid="ref-5">Avrani et al., 2011</xref>). Our findings here suggest that high sequence diversification among genes involved in sugar metabolism may be beneficial for <italic>Prochlorococcus</italic> populations, a possibility that should be addressed further. In addition, gene clusters revealed that at least some of the genes in <italic>Prochlorococcus</italic> genomic islands represent common functions with a high rate of intra-population diversity at the DNA level, rather than recent horizontal transfers from other lineages. These observations contribute to the ongoing debate on the origin, evolution and ecological role of hypervariable genomic islands within microbial populations (<xref ref-type="bibr" rid="ref-27">Hacker &amp; Carniel, 2001</xref>; <xref ref-type="bibr" rid="ref-14">Coleman et al., 2006</xref>; <xref ref-type="bibr" rid="ref-76">Wilhelm et al., 2007</xref>; <xref ref-type="bibr" rid="ref-35">Juhas et al., 2009</xref>; <xref ref-type="bibr" rid="ref-22">Fernández-Gómez et al., 2012</xref>; <xref ref-type="bibr" rid="ref-74">Vineis et al., 2016</xref>).
In addition to these novel insights, the parallels between our findings and the extensive literature on <italic>Prochlorococcus</italic> emphasize the potential of metapangenomics to facilitate the recovery of key insights from novel and less studied microbial populations, including those with no cultured representatives.</p>
         <p>The vast majority of isolate and single-amplified genomes contain only a subset of the complete set of genes microbial populations maintain within their niche boundaries (<xref ref-type="bibr" rid="ref-52">Parkhill et al., 2000</xref>; <xref ref-type="bibr" rid="ref-14">Coleman et al., 2006</xref>; <xref ref-type="bibr" rid="ref-35">Juhas et al., 2009</xref>; <xref ref-type="bibr" rid="ref-13">Coleman &amp; Chisholm, 2010</xref>). Metagenomic data make it possible to classify genes in genomes based on their occurrence in the environment. However, metagenomic short read recruitment alone does not provide access to genes that are lacking in available genomes, even if they may be critical for the functioning of the populations from which they originate. Characterizing all accessory genes of a given population in the environment is challenging due to the limited coverage of the environmental metagenomes and genomic databases. These limitations require careful interpretation of the observations that emerge from the metapangenomic workflow and awareness that a complete understanding of the accessory genes of the environment may require additional efforts (<xref ref-type="bibr" rid="ref-36">Kashtan et al., 2014</xref>).</p>
      </sec>
      <sec>
         <title>Conclusion</title>
         <p>Here we developed novel software solutions and analytical tools within the open-source software platform anvi’o to create and study metapangenomes with interactive visualization and inspection capabilities. Our analysis of the <italic>Prochlorococcus</italic> metapangenome revealed a small number of gene clusters that may be linked to subtle fitness trends among very closely related members of this group, and displayed inter-connectivity of hypervariable genomic islands across multiple clades. Our findings suggest that metapangenomes can provide highly resolved linkage between core and accessory genes of microbial populations and the environment, for any taxon and biome for which genomic and metagenomic data are available, and can provide experimental targets to explore the functional basis of niche partitioning and fitness. Besides isolate and single-cell genomes, this strategy can also employ metagenome-assembled genomes, and be used to study questions in the context of biotechnology or medicine.</p>
      </sec>
      <sec sec-type="supplementary-material" id="supplemental-information">
         <title>Supplemental Information</title>
         <supplementary-material id="supp-1" mimetype="image" mime-subtype="png" xlink:href="https://peerj.com/articles/4320/Figure-S1.png">
            <object-id pub-id-type="doi">10.7717/peerj.4320/supp-1</object-id><label>Figure S1</label><caption>
               <title>The distribution of isolates from HL-I and HL-II in TARA Oceans metagenomes</title>
               <p>World maps describe the cumulative relative distribution of <italic>Prochlorococcus</italic> isolates from the clades HL-I (3 genomes) and HL-II (17 genomes) across 61 surface metagenomes. The size and color of dots vary as a function of relative distributions and temperature range (&lt;22 °C versus &gt;22 °C), respectively.</p>
            </caption>
         </supplementary-material>
         <supplementary-material id="supp-2" mimetype="image" mime-subtype="png" xlink:href="https://peerj.com/articles/4320/Figure_S2.png">
            <object-id pub-id-type="doi">10.7717/peerj.4320/supp-2</object-id><label>Figure S2</label><caption>
               <title>The pangenome of <italic>Prochlorococcus</italic></title>
               <p>Each one of the 7,385 gene clusters contains one or more genes contributed by one or more isolate genomes. Bars in the 31 horizontal layers indicate the occurrence of gene clusters in a given isolate genome. Gene clusters are organized based on their distribution across genomes (i.e., gene clusters that co-occur in the same group of isolates are closer to each other), and genomes are organized based on gene clusters they share using Euclidean distance and Ward ordination. The “HL + LL Core” selection corresponds to the clusters that contained genes from all genomes. The “LL Core” and “HL Core” selections correspond to gene clusters that contained genes characteristic of the LL- and HL-adapted genomes, respectively. The last selection (“Singletons”) corresponds to clusters that contained one or multiple genes from a single genome.</p>
            </caption>
         </supplementary-material>
         <supplementary-material id="supp-3" mimetype="image" mime-subtype="png" xlink:href="https://peerj.com/articles/4320/Figure_S3.png">
            <object-id pub-id-type="doi">10.7717/peerj.4320/supp-3</object-id><label>Figure S3</label><caption>
               <title>Prevalence of sugar utilization in <italic>Prochlorococcus</italic> hypervariable genomic islands</title>
               <p>The figure displays the coordinates of genes corresponding to the 25 most environmental accessory functions across <italic>Prochlorococcus</italic> isolate genomes (red in the outer layers). Inner layers correspond to the 93 TARA Oceans metagenomes, organized by geographic regions similarly to <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. For each metagenome, black sections correspond to well covered genes while white sections correspond to genes with no read recruitment. Genomes are organized based on gene clusters similarly to <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
            </caption>
         </supplementary-material>
         <supplementary-material id="supp-4" mimetype="application" mime-subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet" xlink:href="https://peerj.com/articles/4320/Table_S1.xlsx">
            <object-id pub-id-type="doi">10.7717/peerj.4320/supp-4</object-id><label>Table S1</label><caption>
               <title>Summary of genomes and metagenomes</title>
               <p>Summary of 31 <italic>Prochlorococcus</italic> genomes and 93 metagenomes from the TARA Oceans project.</p>
            </caption>
         </supplementary-material>
         <supplementary-material id="supp-5" mimetype="application" mime-subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet" xlink:href="https://peerj.com/articles/4320/Table_S2.xlsx">
            <object-id pub-id-type="doi">10.7717/peerj.4320/supp-5</object-id><label>Table S2</label><caption>
               <title>Genomic detection in the environment</title>
               <p>Read recruitment, detection and relative distribution of 31 <italic>Prochlorococcus</italic> genomes in 93 metagenomes from the TARA Oceans project.</p>
            </caption>
         </supplementary-material>
         <supplementary-material id="supp-6" mimetype="application" mime-subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet" xlink:href="https://peerj.com/articles/4320/Table_S3.xlsx">
            <object-id pub-id-type="doi">10.7717/peerj.4320/supp-6</object-id><label>Table S3</label><caption>
               <title>Metapangenomics summary</title>
               <p>Summary of the metapangenomic analysis of <italic>Prochlorococcus</italic> isolates. The table describes the functionality and environment connectivity of genes identified in the 31 <italic>Prochlorococcus</italic> isolate genomes, and links each gene to a gene cluster in the <italic>Prochlorococcus</italic> pangenome.</p>
            </caption>
         </supplementary-material>
         <supplementary-material id="supp-7" mimetype="application" mime-subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet" xlink:href="https://peerj.com/articles/4320/Table_S4.xlsx">
            <object-id pub-id-type="doi">10.7717/peerj.4320/supp-7</object-id><label>Table S4</label><caption>
               <title>Metapangenomics summary when including SAGs</title>
               <p>Summary of the pangenomic and metapangenomic analyses of <italic>Prochlorococcus</italic> isolates and SAGs. The table describes the pangenomic analysis of 31 <italic>Prochlorococcus</italic> isolate genomes and 74 <italic>Prochlorococcus</italic> single cell genomes (SAGs). The table also describes the metapangenome of five SAGs, which includes the functionality and environment connectivity of genes and links each gene to a gene cluster in the corresponding <italic>Prochlorococcus</italic> pangenome.</p>
            </caption>
         </supplementary-material>
         <supplementary-material id="supp-8" mimetype="application" mime-subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet" xlink:href="https://peerj.com/articles/4320/Table_S5.xlsx">
            <object-id pub-id-type="doi">10.7717/peerj.4320/supp-8</object-id><label>Table S5</label><caption>
               <title>Environmental connectivity of functions</title>
               <p>Summary of the environmental connectivity of functions identified in the 31 <italic>Prochlorococcus</italic> isolates. The table also links the environmental connectivity of functions to the different clades of <italic>Prochlorococcus</italic>.</p>
            </caption>
         </supplementary-material>
      </sec>
   </body>
   <back>
      <ack>
         <p>We thank Bana Jabri, Sean Crosson, Ryan J. Newton, Maureen L. Coleman, Bas Dutilh, Loïs Maignien, Julie Reveillaud, Michael D. Lee, and the members of the Meren Lab for helpful discussions. We are also grateful to our anonymous reviewers for scrutinizing our work, Özcan C. Esen for his technical insights and help, and Hilary G. Morrison for her guidance to improve our manuscript. Finally, we are indebted to the scientists who made this study possible by generating the genomes and metagenomes, and making them publicly available.</p>
      </ack>
      <sec sec-type="additional-information">
         <title>Additional Information and Declarations</title>
         <fn-group content-type="competing-interests">
            <title>Competing Interests</title><fn id="conflict-1" fn-type="conflict"><p>A. Murat Eren is an Academic Editor for PeerJ.</p></fn></fn-group>
         <fn-group content-type="author-contributions">
            <title>Author Contributions</title><fn id="contribution-1" fn-type="con"><p><xref ref-type="contrib" rid="author-1">Tom O. Delmont</xref> and <xref ref-type="contrib" rid="author-2">A. Murat Eren</xref> conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, and reviewed drafts of the paper.</p></fn></fn-group>
         <fn-group content-type="other">
            <title>Data Availability</title><fn id="addinfo-1"><p>The following information was supplied regarding data availability:</p>
            <p>The TARA Oceans metagenomes are publicly available through the European Bioinformatics Institute (accession ID ERP001736) at <ext-link ext-link-type="uri" xlink:href="https://www.ebi.ac.uk/metagenomics/projects/ERP001736">https://www.ebi.ac.uk/metagenomics/projects/ERP001736</ext-link>.</p>
            <p>We also made available:</p>
            <p>(1) <italic>Prochlorococcus</italic> isolate genomes and SAGs <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.5447221.v1">https://doi.org/10.6084/m9.figshare.5447221.v1</ext-link>;</p>
            <p>(2) the anvi’o database files and the static HTML summary output for <italic>Prochlorococcus</italic> isolate genomes across TARA Oceans metagenomes <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.5447224">https://doi.org/10.6084/m9.figshare.5447224</ext-link>;</p>
            <p>(3) the metapangenome of <italic>Prochlorococcus</italic> isolates <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.5447227">https://doi.org/10.6084/m9.figshare.5447227</ext-link>;</p>
            <p>(4) an extended pangenome of <italic>Prochlorococcus</italic> isolates and SAGs <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.5447230">https://doi.org/10.6084/m9.figshare.5447230</ext-link>; and</p>
            <p>(5) the metapangenome of <italic>Prochlorococcus</italic> SAGs <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.5447233">https://doi.org/10.6084/m9.figshare.5447233</ext-link>.</p>
            <p>The URL <ext-link ext-link-type="uri" xlink:href="https://anvi-server.org/merenlab/prochlorococcus_metapangenome">https://anvi-server.org/merenlab/prochlorococcus_metapangenome</ext-link> serves an interactive version of the metapangenome of <italic>Prochlorococcus</italic> isolates.</p></fn></fn-group>
      </sec>
      <ref-list content-type="authoryear">
         <title>References</title>
         <ref id="ref-1"><label>Al-Amoudi et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Al-Amoudi</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Razali</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Essack</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Amini</surname>
                     <given-names>MS</given-names>
                  </name>
                  <name>
                     <surname>Bougouffa</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Archer</surname>
                     <given-names>JAC</given-names>
                  </name>
                  <name>
                     <surname>Lafi</surname>
                     <given-names>FF</given-names>
                  </name>
                  <name>
                     <surname>Bajic</surname>
                     <given-names>VB</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Metagenomics as a preliminary screen for antimicrobial bioprospecting</article-title>
               <source>Gene</source>
               <volume>594</volume>
               <fpage>248</fpage>
               <lpage>258</lpage>
               <pub-id pub-id-type="doi">10.1016/j.gene.2016.09.021</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-2"><label>Alm et al. (1999)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Alm</surname>
                     <given-names>RA</given-names>
                  </name>
                  <name>
                     <surname>Ling</surname>
                     <given-names>LS</given-names>
                  </name>
                  <name>
                     <surname>Moir</surname>
                     <given-names>DT</given-names>
                  </name>
                  <name>
                     <surname>King</surname>
                     <given-names>BL</given-names>
                  </name>
                  <name>
                     <surname>Brown</surname>
                     <given-names>ED</given-names>
                  </name>
                  <name>
                     <surname>Doig</surname>
                     <given-names>PC</given-names>
                  </name>
                  <name>
                     <surname>Smith</surname>
                     <given-names>DR</given-names>
                  </name>
                  <name>
                     <surname>Noonan</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Guild</surname>
                     <given-names>BC</given-names>
                  </name>
                  <name>
                     <surname>DeJonge</surname>
                     <given-names>BL</given-names>
                  </name>
                  <name>
                     <surname>Carmel</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Tummino</surname>
                     <given-names>PJ</given-names>
                  </name>
                  <name>
                     <surname>Caruso</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Uria-Nickelsen</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Mills</surname>
                     <given-names>DM</given-names>
                  </name>
                  <name>
                     <surname>Ives</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Gibson</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Merberg</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Mills</surname>
                     <given-names>SD</given-names>
                  </name>
                  <name>
                     <surname>Jiang</surname>
                     <given-names>Q</given-names>
                  </name>
                  <name>
                     <surname>Taylor</surname>
                     <given-names>DE</given-names>
                  </name>
                  <name>
                     <surname>Vovis</surname>
                     <given-names>GF</given-names>
                  </name>
                  <name>
                     <surname>Trust</surname>
                     <given-names>TJ</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="1999">1999</year>
               <article-title>Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen <italic>Helicobacter pylori</italic></article-title>
               <source>Nature</source>
               <volume>397</volume>
               <fpage>176</fpage>
               <lpage>180</lpage>
               <pub-id pub-id-type="doi">10.1038/16495</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-3"><label>Altschul et al. (1990)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Altschul</surname>
                     <given-names>SF</given-names>
                  </name>
                  <name>
                     <surname>Gish</surname>
                     <given-names>W</given-names>
                  </name>
                  <name>
                     <surname>Miller</surname>
                     <given-names>W</given-names>
                  </name>
                  <name>
                     <surname>Myers</surname>
                     <given-names>EW</given-names>
                  </name>
                  <name>
                     <surname>Lipman</surname>
                     <given-names>DJ</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="1990">1990</year>
               <article-title>Basic local alignment search tool</article-title>
               <source>Journal of Molecular Biology</source>
               <volume>215</volume>
               <fpage>403</fpage>
               <lpage>410</lpage>
               <pub-id pub-id-type="doi">10.1016/S0022-2836(05)80360-2</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-4"><label>Anderson et al. (2017)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Anderson</surname>
                     <given-names>RE</given-names>
                  </name>
                  <name>
                     <surname>Reveillaud</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Reddington</surname>
                     <given-names>E</given-names>
                  </name>
                  <name>
                     <surname>Delmont</surname>
                     <given-names>TO</given-names>
                  </name>
                  <name>
                     <surname>Eren</surname>
                     <given-names>AM</given-names>
                  </name>
                  <name>
                     <surname>McDermott</surname>
                     <given-names>JM</given-names>
                  </name>
                  <name>
                     <surname>Seewald</surname>
                     <given-names>JS</given-names>
                  </name>
                  <name>
                     <surname>Huber</surname>
                     <given-names>JA</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2017">2017</year>
               <article-title>Genomic variation in microbial populations inhabiting the marine subseafloor at deep-sea hydrothermal vents</article-title>
               <source>Nature Communications</source>
               <volume>8</volume>
               <fpage>1114</fpage>
               <pub-id pub-id-type="doi">10.1038/s41467-017-01228-6</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-5"><label>Avrani et al. (2011)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Avrani</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Wurtzel</surname>
                     <given-names>O</given-names>
                  </name>
                  <name>
                     <surname>Sharon</surname>
                     <given-names>I</given-names>
                  </name>
                  <name>
                     <surname>Sorek</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Lindell</surname>
                     <given-names>D</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2011">2011</year>
               <article-title>Genomic island variability facilitates <italic>Prochlorococcus</italic>-virus coexistence</article-title>
               <source>Nature</source>
               <volume>474</volume>
               <fpage>604</fpage>
               <lpage>608</lpage>
               <pub-id pub-id-type="doi">10.1038/nature10172</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-6"><label>Bateman et al. (2004)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Bateman</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Coin</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Durbin</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Finn</surname>
                     <given-names>RD</given-names>
                  </name>
                  <name>
                     <surname>Hollich</surname>
                     <given-names>V</given-names>
                  </name>
                  <name>
                     <surname>Griffiths-Jones</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Khanna</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Marshall</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Moxon</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Sonnhammer</surname>
                     <given-names>ELL</given-names>
                  </name>
                  <name>
                     <surname>Studholme</surname>
                     <given-names>DJ</given-names>
                  </name>
                  <name>
                     <surname>Yeats</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Eddy</surname>
                     <given-names>SR</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2004">2004</year>
               <article-title>The Pfam protein families database</article-title>
               <source>Nucleic Acids Research</source>
               <volume>32</volume>
               <fpage>D138</fpage>
               <lpage>D141</lpage>
               <pub-id pub-id-type="doi">10.1093/nar/gkh121</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-7"><label>Bendall et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Bendall</surname>
                     <given-names>ML</given-names>
                  </name>
                  <name>
                     <surname>Stevens</surname>
                     <given-names>SLR</given-names>
                  </name>
                  <name>
                     <surname>Chan</surname>
                     <given-names>LK</given-names>
                  </name>
                  <name>
                     <surname>Malfatti</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Schwientek</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Tremblay</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Schackwitz</surname>
                     <given-names>W</given-names>
                  </name>
                  <name>
                     <surname>Martin</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Pati</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Bushnell</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Froula</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Kang</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Tringe</surname>
                     <given-names>SG</given-names>
                  </name>
                  <name>
                     <surname>Bertilsson</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Moran</surname>
                     <given-names>MA</given-names>
                  </name>
                  <name>
                     <surname>Shade</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Newton</surname>
                     <given-names>RJ</given-names>
                  </name>
                  <name>
                     <surname>McMahon</surname>
                     <given-names>KD</given-names>
                  </name>
                  <name>
                     <surname>Malmstrom</surname>
                     <given-names>RR</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations</article-title>
               <source>ISME Journal</source>
               <volume>10</volume>
               <fpage>1589</fpage>
               <lpage>1601</lpage>
               <pub-id pub-id-type="doi">10.1038/ismej.2015.241</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-8"><label>Benedict et al. (2014)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Benedict</surname>
                     <given-names>MN</given-names>
                  </name>
                  <name>
                     <surname>Henriksen</surname>
                     <given-names>JR</given-names>
                  </name>
                  <name>
                     <surname>Metcalf</surname>
                     <given-names>WW</given-names>
                  </name>
                  <name>
                     <surname>Whitaker</surname>
                     <given-names>RJ</given-names>
                  </name>
                  <name>
                     <surname>Price</surname>
                     <given-names>ND</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2014">2014</year>
               <article-title>ITEP: an integrated toolkit for exploration of microbial pan-genomes</article-title>
               <source>BMC Genomics</source>
               <volume>15</volume>
               <fpage>8</fpage>
               <pub-id pub-id-type="doi">10.1186/1471-2164-15-8</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-9"><label>Biller et al. (2014a)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Biller</surname>
                     <given-names>SJ</given-names>
                  </name>
                  <name>
                     <surname>Berube</surname>
                     <given-names>PM</given-names>
                  </name>
                  <name>
                     <surname>Berta-Thompson</surname>
                     <given-names>JW</given-names>
                  </name>
                  <name>
                     <surname>Kelly</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Roggensack</surname>
                     <given-names>SE</given-names>
                  </name>
                  <name>
                     <surname>Awad</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Roache-Johnson</surname>
                     <given-names>KH</given-names>
                  </name>
                  <name>
                     <surname>Ding</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Giovannoni</surname>
                     <given-names>SJ</given-names>
                  </name>
                  <name>
                     <surname>Rocap</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Moore</surname>
                     <given-names>LR</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2014">2014a</year>
               <article-title>Genomes of diverse isolates of the marine cyanobacterium <italic>Prochlorococcus</italic></article-title>
               <source>Scientific Data</source>
               <volume>1</volume>
               <fpage>140034</fpage>
               <pub-id pub-id-type="doi">10.1038/sdata.2014.34</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-10"><label>Biller et al. (2014b)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Biller</surname>
                     <given-names>SJ</given-names>
                  </name>
                  <name>
                     <surname>Berube</surname>
                     <given-names>PM</given-names>
                  </name>
                  <name>
                     <surname>Lindell</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2014">2014b</year>
               <article-title><italic>Prochlorococcus</italic>: the structure and function of collective diversity</article-title>
               <source>Nature Reviews Microbiology</source>
               <volume>13</volume>
               <fpage>13</fpage>
               <lpage>27</lpage>
               <pub-id pub-id-type="doi">10.1038/nrmicro3378</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-11"><label>Bork et al. (2015)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Bork</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Bowler</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>de Vargas</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Gorsky</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Karsenti</surname>
                     <given-names>E</given-names>
                  </name>
                  <name>
                     <surname>Wincker</surname>
                     <given-names>P</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2015">2015</year>
               <article-title>Tara Oceans studies plankton at planetary scale</article-title>
               <source>Science</source>
               <volume>348</volume>
               <fpage>873</fpage>
               <pub-id pub-id-type="doi">10.1126/science.aac5605</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-12"><label>Chisholm et al. (1988)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
                  <name>
                     <surname>Olson</surname>
                     <given-names>RJ</given-names>
                  </name>
                  <name>
                     <surname>Zettler</surname>
                     <given-names>ER</given-names>
                  </name>
                  <name>
                     <surname>Goericke</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Waterbury</surname>
                     <given-names>JB</given-names>
                  </name>
                  <name>
                     <surname>Welschmeyer</surname>
                     <given-names>NA</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="1988">1988</year>
               <article-title>A novel free-living prochlorophyte abundant in the oceanic euphotic zone</article-title>
               <source>Nature</source>
               <volume>334</volume>
               <fpage>340</fpage>
               <lpage>343</lpage>
               <pub-id pub-id-type="doi">10.1038/334340a0</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-13"><label>Coleman &amp; Chisholm (2010)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Coleman</surname>
                     <given-names>ML</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2010">2010</year>
               <article-title>Ecosystem-specific selection pressures revealed through comparative population genomics</article-title>
               <source>Proceedings of the National Academy of Sciences of the United States of America</source>
               <volume>107</volume>
               <fpage>18634</fpage>
               <lpage>18639</lpage>
               <pub-id pub-id-type="doi">10.1073/pnas.1009480107</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-14"><label>Coleman et al. (2006)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Coleman</surname>
                     <given-names>ML</given-names>
                  </name>
                  <name>
                     <surname>Sullivan</surname>
                     <given-names>MB</given-names>
                  </name>
                  <name>
                     <surname>Martiny</surname>
                     <given-names>AC</given-names>
                  </name>
                  <name>
                     <surname>Steglich</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Barry</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>DeLong</surname>
                     <given-names>EF</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2006">2006</year>
               <article-title>Genomic islands and the ecology and evolution of <italic>Prochlorococcus</italic></article-title>
               <source>Science</source>
               <volume>311</volume>
               <fpage>1768</fpage>
               <lpage>1770</lpage>
               <pub-id pub-id-type="doi">10.1126/science.1122050</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-15"><label>Darling et al. (2014)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Darling</surname>
                     <given-names>AE</given-names>
                  </name>
                  <name>
                     <surname>Jospin</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Lowe</surname>
                     <given-names>E</given-names>
                  </name>
                  <name>
                     <surname>Matsen</surname>
                     <given-names>FA</given-names>
                  </name>
                  <name>
                     <surname>Bik</surname>
                     <given-names>HM</given-names>
                  </name>
                  <name>
                     <surname>Eisen</surname>
                     <given-names>JA</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2014">2014</year>
               <article-title>PhyloSift: phylogenetic analysis of genomes and metagenomes</article-title>
               <source>PeerJ</source>
               <volume>2</volume>
               <fpage>e243</fpage>
               <pub-id pub-id-type="doi">10.7717/peerj.243</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-16"><label>Delmont &amp; Eren (2016)</label><element-citation publication-type="workingpaper">
               <person-group person-group-type="author">
                  <name>
                     <surname>Delmont</surname>
                     <given-names>TO</given-names>
                  </name>
                  <name>
                     <surname>Eren</surname>
                     <given-names>AM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Linking comparative genomics and environmental distribution patterns of microbial populations through metagenomics</article-title>
               <source>BioRxiv</source>
               <pub-id pub-id-type="doi">10.1101/058750</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-17"><label>Delmont et al. (2017)</label><element-citation publication-type="workingpaper">
               <person-group person-group-type="author">
                  <name>
                     <surname>Delmont</surname>
                     <given-names>TO</given-names>
                  </name>
                  <name>
                     <surname>Quince</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Shaiber</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Esen</surname>
                     <given-names>ÖC</given-names>
                  </name>
                  <name>
                     <surname>Lee</surname>
                     <given-names>STM</given-names>
                  </name>
                  <name>
                     <surname>Lücker</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Eren</surname>
                     <given-names>AM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2017">2017</year>
               <article-title>Nitrogen-fixing populations of Planctomycetes and Proteobacteria are abundant in the surface ocean</article-title>
               <source>BioRxiv</source>
               <pub-id pub-id-type="doi">10.1101/129791</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-18"><label>Dutilh et al. (2014)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Dutilh</surname>
                     <given-names>BE</given-names>
                  </name>
                  <name>
                     <surname>Cassman</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>McNair</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Sanchez</surname>
                     <given-names>SE</given-names>
                  </name>
                  <name>
                     <surname>Silva</surname>
                     <given-names>GGZ</given-names>
                  </name>
                  <name>
                     <surname>Boling</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Barr</surname>
                     <given-names>JJ</given-names>
                  </name>
                  <name>
                     <surname>Speth</surname>
                     <given-names>DR</given-names>
                  </name>
                  <name>
                     <surname>Seguritan</surname>
                     <given-names>V</given-names>
                  </name>
                  <name>
                     <surname>Aziz</surname>
                     <given-names>RK</given-names>
                  </name>
                  <name>
                     <surname>Felts</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Dinsdale</surname>
                     <given-names>EA</given-names>
                  </name>
                  <name>
                     <surname>Mokili</surname>
                     <given-names>JL</given-names>
                  </name>
                  <name>
                     <surname>Edwards</surname>
                     <given-names>RA</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2014">2014</year>
               <article-title>A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes</article-title>
               <source>Nature Communications</source>
               <volume>5</volume>
               <fpage>4498</fpage>
               <pub-id pub-id-type="doi">10.1038/ncomms5498</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-19"><label>Dutilh et al. (2004)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Dutilh</surname>
                     <given-names>BE</given-names>
                  </name>
                  <name>
                     <surname>Huynen</surname>
                     <given-names>MA</given-names>
                  </name>
                  <name>
                     <surname>Bruno</surname>
                     <given-names>WJ</given-names>
                  </name>
                  <name>
                     <surname>Snel</surname>
                     <given-names>B</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2004">2004</year>
               <article-title>The consistent phylogenetic signal in genome trees revealed by reducing the impact of noise</article-title>
               <source>Journal of Molecular Evolution</source>
               <volume>58</volume>
               <fpage>527</fpage>
               <lpage>539</lpage>
               <pub-id pub-id-type="doi">10.1007/s00239-003-2575-6</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-20"><label>Eren et al. (2015)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Eren</surname>
                     <given-names>AM</given-names>
                  </name>
                  <name>
                     <surname>Esen</surname>
                     <given-names>ÖC</given-names>
                  </name>
                  <name>
                     <surname>Quince</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Vineis</surname>
                     <given-names>JH</given-names>
                  </name>
                  <name>
                     <surname>Morrison</surname>
                     <given-names>HG</given-names>
                  </name>
                  <name>
                     <surname>Sogin</surname>
                     <given-names>ML</given-names>
                  </name>
                  <name>
                     <surname>Delmont</surname>
                     <given-names>TO</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2015">2015</year>
               <article-title>Anvi’o: an advanced analysis and visualization platform for ’omics data</article-title>
               <source>PeerJ</source>
               <volume>3</volume>
               <fpage>e1319</fpage>
               <pub-id pub-id-type="doi">10.7717/peerj.1319</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-21"><label>Eren et al. (2013)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Eren</surname>
                     <given-names>AM</given-names>
                  </name>
                  <name>
                     <surname>Vineis</surname>
                     <given-names>JH</given-names>
                  </name>
                  <name>
                     <surname>Morrison</surname>
                     <given-names>HG</given-names>
                  </name>
                  <name>
                     <surname>Sogin</surname>
                     <given-names>ML</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2013">2013</year>
               <article-title>A filtering method to generate high quality short reads using Illumina paired-end technology</article-title>
               <source>PLOS ONE</source>
               <volume>8</volume>
               <fpage>e66643</fpage>
               <pub-id pub-id-type="doi">10.1371/journal.pone.0066643</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-22"><label>Fernández-Gómez et al. (2012)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Fernández-Gómez</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Fernàndez-Guerra</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Casamayor</surname>
                     <given-names>EO</given-names>
                  </name>
                  <name>
                     <surname>González</surname>
                     <given-names>JM</given-names>
                  </name>
                  <name>
                     <surname>Pedrós-Alió</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Acinas</surname>
                     <given-names>SG</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2012">2012</year>
               <article-title>Patterns and architecture of genomic islands in marine bacteria</article-title>
               <source>BMC Genomics</source>
               <volume>13</volume>
               <fpage>347</fpage>
               <pub-id pub-id-type="doi">10.1186/1471-2164-13-347</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-23"><label>Fernández-Gómez et al. (2013)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Fernández-Gómez</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Richter</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Schüler</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Pinhassi</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Acinas</surname>
                     <given-names>SG</given-names>
                  </name>
                  <name>
                     <surname>González</surname>
                     <given-names>JM</given-names>
                  </name>
                  <name>
                     <surname>Pedrós-Alió</surname>
                     <given-names>C</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2013">2013</year>
               <article-title>Ecology of marine Bacteroidetes: a comparative genomics approach</article-title>
               <source>The ISME Journal</source>
               <volume>7</volume>
               <fpage>1026</fpage>
               <lpage>1037</lpage>
               <pub-id pub-id-type="doi">10.1038/ismej.2012.169</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-24"><label>Flombaum et al. (2013)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Flombaum</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Gallegos</surname>
                     <given-names>JL</given-names>
                  </name>
                  <name>
                     <surname>Gordillo</surname>
                     <given-names>RA</given-names>
                  </name>
                  <name>
                     <surname>Rincon</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Zabala</surname>
                     <given-names>LL</given-names>
                  </name>
                  <name>
                     <surname>Jiao</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Karl</surname>
                     <given-names>DM</given-names>
                  </name>
                  <name>
                     <surname>Li</surname>
                     <given-names>WKW</given-names>
                  </name>
                  <name>
                     <surname>Lomas</surname>
                     <given-names>MW</given-names>
                  </name>
                  <name>
                     <surname>Veneziano</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Vera</surname>
                     <given-names>CS</given-names>
                  </name>
                  <name>
                     <surname>Vrugt</surname>
                     <given-names>JA</given-names>
                  </name>
                  <name>
                     <surname>Martiny</surname>
                     <given-names>AC</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2013">2013</year>
               <article-title>Present and future global distributions of the marine Cyanobacteria <italic>Prochlorococcus</italic> and <italic>Synechococcus</italic></article-title>
               <source>Proceedings of the National Academy of Sciences of the United States of America</source>
               <volume>110</volume>
               <fpage>9824</fpage>
               <lpage>9829</lpage>
               <pub-id pub-id-type="doi">10.1073/pnas.1307701110</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-25"><label>Ginestet (2011)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Ginestet</surname>
                     <given-names>C</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2011">2011</year>
               <article-title>ggplot2: elegant graphics for data analysis</article-title>
               <source>Journal of the Royal Statistical Society: Series A (Statistics in Society)</source>
               <volume>174</volume>
               <fpage>245</fpage>
               <lpage>246</lpage>
               <pub-id pub-id-type="doi">10.1111/j.1467-985X.2010.00676_9.x</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-26"><label>Gomez-Baena et al. (2008)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Gomez-Baena</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Lopez-Lozano</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Gil-Martinez</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Lucena</surname>
                     <given-names>JM</given-names>
                  </name>
                  <name>
                     <surname>Diez</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Candau</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Garcia-Fernandez</surname>
                     <given-names>JM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2008">2008</year>
               <article-title>Glucose uptake and its effect on gene expression in <italic>Prochlorococcus</italic></article-title>
               <source>PLOS ONE</source>
               <volume>3</volume>
               <elocation-id>e3416</elocation-id>
               <pub-id pub-id-type="doi">10.1371/journal.pone.0003416</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-27"><label>Hacker &amp; Carniel (2001)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Hacker</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Carniel</surname>
                     <given-names>E</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2001">2001</year>
               <article-title>Ecological fitness, genomic islands and bacterial pathogenicity: a Darwinian view of the evolution of microbes</article-title>
               <source>EMBO Reports</source>
               <volume>2</volume>
               <fpage>376</fpage>
               <lpage>381</lpage>
               <pub-id pub-id-type="doi">10.1093/embo-reports/kve097</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-28"><label>Handelsman et al. (1998)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Handelsman</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Rondon</surname>
                     <given-names>MR</given-names>
                  </name>
                  <name>
                     <surname>Brady</surname>
                     <given-names>SF</given-names>
                  </name>
                  <name>
                     <surname>Clardy</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Goodman</surname>
                     <given-names>RM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="1998">1998</year>
               <article-title>Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products</article-title>
               <source>Chemistry &amp; Biology</source>
               <volume>5</volume>
               <fpage>R245</fpage>
               <lpage>R249</lpage>
               <pub-id pub-id-type="doi">10.1016/S1074-5521(98)90108-9</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-29"><label>Haroon et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Haroon</surname>
                     <given-names>MF</given-names>
                  </name>
                  <name>
                     <surname>Thompson</surname>
                     <given-names>LR</given-names>
                  </name>
                  <name>
                     <surname>Parks</surname>
                     <given-names>DH</given-names>
                  </name>
                  <name>
                     <surname>Hugenholtz</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Stingl</surname>
                     <given-names>U</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>A catalogue of 136 microbial draft genomes from Red Sea metagenomes</article-title>
               <source>Scientific Data</source>
               <volume>3</volume>
               <fpage>160050</fpage>
               <pub-id pub-id-type="doi">10.1038/sdata.2016.50</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-30"><label>Huerta-Cepas et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Huerta-Cepas</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Forslund</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Szklarczyk</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Jensen</surname>
                     <given-names>LJ</given-names>
                  </name>
                  <name>
                     <surname>Von Mering</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Bork</surname>
                     <given-names>P</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper</article-title>
               <source>BioRxiv</source>
               <pub-id pub-id-type="doi">10.1101/076331</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-31"><label>Hyatt et al. (2010)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Hyatt</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Chen</surname>
                     <given-names>G-L</given-names>
                  </name>
                  <name>
                     <surname>Locascio</surname>
                     <given-names>PF</given-names>
                  </name>
                  <name>
                     <surname>Land</surname>
                     <given-names>ML</given-names>
                  </name>
                  <name>
                     <surname>Larimer</surname>
                     <given-names>FW</given-names>
                  </name>
                  <name>
                     <surname>Hauser</surname>
                     <given-names>LJ</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2010">2010</year>
               <article-title>Prodigal: prokaryotic gene recognition and translation initiation site identification</article-title>
               <source>BMC Bioinformatics</source>
               <volume>11</volume>
               <fpage>119</fpage>
               <pub-id pub-id-type="doi">10.1186/1471-2105-11-119</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-32"><label>Jeffrey et al. (1996)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Jeffrey</surname>
                     <given-names>WH</given-names>
                  </name>
                  <name>
                     <surname>Pledger</surname>
                     <given-names>RJ</given-names>
                  </name>
                  <name>
                     <surname>Aas</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Hager</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Coffin</surname>
                     <given-names>RB</given-names>
                  </name>
                  <name>
                     <surname>Von Haven</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Mitchell</surname>
                     <given-names>DL</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="1996">1996</year>
               <article-title>Diel and depth profiles of DNA photodamage in bacterioplankton exposed to ambient solar ultraviolet radiation</article-title>
               <source>Marine Ecology Progress Series</source>
               <volume>137</volume>
               <fpage>283</fpage>
               <lpage>291</lpage>
               <pub-id pub-id-type="doi">10.3354/meps137283</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-33"><label>Jensen et al. (2008)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Jensen</surname>
                     <given-names>LJ</given-names>
                  </name>
                  <name>
                     <surname>Julien</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Kuhn</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Von Mering</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Muller</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Doerks</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Bork</surname>
                     <given-names>P</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2008">2008</year>
               <article-title>eggNOG: automated construction and annotation of orthologous groups of genes</article-title>
               <source>Nucleic Acids Research</source>
               <volume>36</volume>
               <fpage>D250</fpage>
               <lpage>D254</lpage>
               <pub-id pub-id-type="doi">10.1093/nar/gkm796</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-34"><label>Johnson et al. (2006)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Johnson</surname>
                     <given-names>ZI</given-names>
                  </name>
                  <name>
                     <surname>Zinser</surname>
                     <given-names>ER</given-names>
                  </name>
                  <name>
                     <surname>Coe</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>McNulty</surname>
                     <given-names>NP</given-names>
                  </name>
                  <name>
                     <surname>Woodward</surname>
                     <given-names>EMS</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2006">2006</year>
               <article-title>Partitioning among <italic>Prochlorococcus</italic> ecotypes along environmental gradients</article-title>
               <source>Science</source>
               <volume>311</volume>
               <fpage>1737</fpage>
               <lpage>1740</lpage>
               <pub-id pub-id-type="doi">10.1126/science.1118052</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-35"><label>Juhas et al. (2009)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Juhas</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Van Der Meer</surname>
                     <given-names>JR</given-names>
                  </name>
                  <name>
                     <surname>Gaillard</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Harding</surname>
                     <given-names>RM</given-names>
                  </name>
                  <name>
                     <surname>Hood</surname>
                     <given-names>DW</given-names>
                  </name>
                  <name>
                     <surname>Crook</surname>
                     <given-names>DW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2009">2009</year>
               <article-title>Genomic islands: tools of bacterial horizontal gene transfer and evolution</article-title>
               <source>FEMS Microbiology Reviews</source>
               <volume>33</volume>
               <fpage>376</fpage>
               <lpage>393</lpage>
               <pub-id pub-id-type="doi">10.1111/j.1574-6976.2008.00136.x</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-36"><label>Kashtan et al. (2014)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Kashtan</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Roggensack</surname>
                     <given-names>SE</given-names>
                  </name>
                  <name>
                     <surname>Rodrigue</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Thompson</surname>
                     <given-names>JW</given-names>
                  </name>
                  <name>
                     <surname>Biller</surname>
                     <given-names>SJ</given-names>
                  </name>
                  <name>
                     <surname>Coe</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Ding</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Marttinen</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Malmstrom</surname>
                     <given-names>RR</given-names>
                  </name>
                  <name>
                     <surname>Stocker</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Follows</surname>
                     <given-names>MJ</given-names>
                  </name>
                  <name>
                     <surname>Stepanauskas</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2014">2014</year>
               <article-title>Single-cell genomics reveals hundreds of coexisting subpopulations in wild <italic>Prochlorococcus</italic></article-title>
               <source>Science</source>
               <volume>344</volume>
               <fpage>416</fpage>
               <lpage>420</lpage>
               <pub-id pub-id-type="doi">10.1126/science.1248575</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-37"><label>Kent et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Kent</surname>
                     <given-names>AG</given-names>
                  </name>
                  <name>
                     <surname>Dupont</surname>
                     <given-names>CL</given-names>
                  </name>
                  <name>
                     <surname>Yooseph</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Martiny</surname>
                     <given-names>AC</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Global biogeography of <italic>Prochlorococcus</italic> genome diversity in the surface ocean</article-title>
               <source>The ISME Journal</source>
               <volume>10</volume>
               <fpage>1856</fpage>
               <lpage>1865</lpage>
               <pub-id pub-id-type="doi">10.1038/ismej.2015.265</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-38"><label>Kettler et al. (2007)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Kettler</surname>
                     <given-names>GC</given-names>
                  </name>
                  <name>
                     <surname>Martiny</surname>
                     <given-names>AC</given-names>
                  </name>
                  <name>
                     <surname>Huang</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Zucker</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Coleman</surname>
                     <given-names>ML</given-names>
                  </name>
                  <name>
                     <surname>Rodrigue</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Chen</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Lapidus</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Ferriera</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Johnson</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Steglich</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Church</surname>
                     <given-names>GM</given-names>
                  </name>
                  <name>
                     <surname>Richardson</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2007">2007</year>
               <article-title>Patterns and implications of gene gain and loss in the evolution of <italic>Prochlorococcus</italic></article-title>
               <source>PLOS Genetics</source>
               <volume>3</volume>
               <fpage>2515</fpage>
               <lpage>2528</lpage>
               <pub-id pub-id-type="doi">10.1371/journal.pgen.0030231</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-39"><label>Kumar et al. (2011)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Kumar</surname>
                     <given-names>V</given-names>
                  </name>
                  <name>
                     <surname>Sun</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Vamathevan</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Li</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Ingraham</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Palmer</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Huang</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Brown</surname>
                     <given-names>JR</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2011">2011</year>
               <article-title>Comparative genomics of <italic>Klebsiella pneumoniae</italic> strains with different antibiotic resistance profiles</article-title>
               <source>Antimicrobial Agents and Chemotherapy</source>
               <volume>55</volume>
               <fpage>4267</fpage>
               <lpage>4276</lpage>
               <pub-id pub-id-type="doi">10.1128/AAC.00052-11</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-40"><label>Langmead &amp; Salzberg (2012)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Langmead</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Salzberg</surname>
                     <given-names>SL</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2012">2012</year>
               <article-title>Fast gapped-read alignment with Bowtie 2</article-title>
               <source>Nature Methods</source>
               <volume>9</volume>
               <fpage>357</fpage>
               <lpage>359</lpage>
               <pub-id pub-id-type="doi">10.1038/nmeth.1923</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-41"><label>Larkin et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Larkin</surname>
                     <given-names>AA</given-names>
                  </name>
                  <name>
                     <surname>Blinebry</surname>
                     <given-names>SK</given-names>
                  </name>
                  <name>
                     <surname>Howes</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Lin</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Loftus</surname>
                     <given-names>SE</given-names>
                  </name>
                  <name>
                     <surname>Schmaus</surname>
                     <given-names>CA</given-names>
                  </name>
                  <name>
                     <surname>Zinser</surname>
                     <given-names>ER</given-names>
                  </name>
                  <name>
                     <surname>Johnson</surname>
                     <given-names>ZI</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Niche partitioning and biogeography of high light adapted <italic>Prochlorococcus</italic> across taxonomic ranks in the North Pacific</article-title>
               <source>The ISME Journal</source>
               <volume>10</volume>
               <fpage>1555</fpage>
               <lpage>1567</lpage>
               <pub-id pub-id-type="doi">10.1038/ismej.2015.244</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-42"><label>Li et al. (2009)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Li</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Handsaker</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Wysoker</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Fennell</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Ruan</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Homer</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Marth</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Abecasis</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Durbin</surname>
                     <given-names>R</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2009">2009</year>
               <article-title>The Sequence Alignment/Map format and SAMtools</article-title>
               <source>Bioinformatics</source>
               <volume>25</volume>
               <fpage>2078</fpage>
               <lpage>2079</lpage>
               <pub-id pub-id-type="doi">10.1093/bioinformatics/btp352</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-43"><label>Lorenz &amp; Eck (2005)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Lorenz</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Eck</surname>
                     <given-names>J</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2005">2005</year>
               <article-title>Metagenomics and industrial applications</article-title>
               <source>Nature Reviews Microbiology</source>
               <volume>3</volume>
               <fpage>510</fpage>
               <lpage>516</lpage>
               <pub-id pub-id-type="doi">10.1038/nrmicro1161</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-44"><label>Makarova et al. (2006)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Makarova</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Slesarev</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Wolf</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Sorokin</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Mirkin</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Koonin</surname>
                     <given-names>E</given-names>
                  </name>
                  <name>
                     <surname>Pavlov</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Pavlova</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Karamychev</surname>
                     <given-names>V</given-names>
                  </name>
                  <name>
                     <surname>Polouchine</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Shakhova</surname>
                     <given-names>V</given-names>
                  </name>
                  <name>
                     <surname>Grigoriev</surname>
                     <given-names>I</given-names>
                  </name>
                  <name>
                     <surname>Lou</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Rohksar</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Lucas</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Huang</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Goodstein</surname>
                     <given-names>DM</given-names>
                  </name>
                  <name>
                     <surname>Hawkins</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Plengvidhya</surname>
                     <given-names>V</given-names>
                  </name>
                  <name>
                     <surname>Welker</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Hughes</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Goh</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Benson</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Baldwin</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Lee</surname>
                     <given-names>J-H</given-names>
                  </name>
                  <name>
                     <surname>Díaz-Muñiz</surname>
                     <given-names>I</given-names>
                  </name>
                  <name>
                     <surname>Dosti</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Smeianov</surname>
                     <given-names>V</given-names>
                  </name>
                  <name>
                     <surname>Wechter</surname>
                     <given-names>W</given-names>
                  </name>
                  <name>
                     <surname>Barabote</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Lorca</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Altermann</surname>
                     <given-names>E</given-names>
                  </name>
                  <name>
                     <surname>Barrangou</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Ganesan</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Xie</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Rawsthorne</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Tamir</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Parker</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Breidt</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Broadbent</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Hutkins</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>O’Sullivan</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Steele</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Unlu</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Saier</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Klaenhammer</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Richardson</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Kozyavkin</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Weimer</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Mills</surname>
                     <given-names>D</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2006">2006</year>
               <article-title>Comparative genomics of the lactic acid bacteria</article-title>
               <source>Proceedings of the National Academy of Sciences of the United States of America</source>
               <volume>103</volume>
               <fpage>15611</fpage>
               <lpage>15616</lpage>
               <pub-id pub-id-type="doi">10.1073/pnas.0607117103</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-45"><label>Malmstrom et al. (2010)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Malmstrom</surname>
                     <given-names>RR</given-names>
                  </name>
                  <name>
                     <surname>Coe</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Kettler</surname>
                     <given-names>GC</given-names>
                  </name>
                  <name>
                     <surname>Martiny</surname>
                     <given-names>AC</given-names>
                  </name>
                  <name>
                     <surname>Frias-Lopez</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Zinser</surname>
                     <given-names>ER</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2010">2010</year>
               <article-title>Temporal dynamics of <italic>Prochlorococcus</italic> ecotypes in the Atlantic and Pacific oceans</article-title>
               <source>The ISME Journal</source>
               <volume>4</volume>
               <fpage>1252</fpage>
               <lpage>1264</lpage>
               <pub-id pub-id-type="doi">10.1038/ismej.2010.60</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-46"><label>Minoche, Dohm &amp; Himmelbauer (2011)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Minoche</surname>
                     <given-names>AE</given-names>
                  </name>
                  <name>
                     <surname>Dohm</surname>
                     <given-names>JC</given-names>
                  </name>
                  <name>
                     <surname>Himmelbauer</surname>
                     <given-names>H</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2011">2011</year>
               <article-title>Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems</article-title>
               <source>Genome Biology</source>
               <volume>12</volume>
               <fpage>R112</fpage>
               <pub-id pub-id-type="doi">10.1186/gb-2011-12-11-r112</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-47"><label>Moisander et al. (2012)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Moisander</surname>
                     <given-names>PH</given-names>
                  </name>
                  <name>
                     <surname>Zhang</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Boyle</surname>
                     <given-names>EA</given-names>
                  </name>
                  <name>
                     <surname>Hewson</surname>
                     <given-names>I</given-names>
                  </name>
                  <name>
                     <surname>Montoya</surname>
                     <given-names>JP</given-names>
                  </name>
                  <name>
                     <surname>Zehr</surname>
                     <given-names>JP</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2012">2012</year>
               <article-title>Analogous nutrient limitations in unicellular diazotrophs and <italic>Prochlorococcus</italic> in the South Pacific Ocean</article-title>
               <source>The ISME Journal</source>
               <volume>6</volume>
               <fpage>733</fpage>
               <lpage>744</lpage>
               <pub-id pub-id-type="doi">10.1038/ismej.2011.152</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-48"><label>Muñoz-Marín et al. (2017)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Muñoz-Marín</surname>
                     <given-names>MDC</given-names>
                  </name>
                  <name>
                     <surname>Gómez-Baena</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Díez</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Beynon</surname>
                     <given-names>RJ</given-names>
                  </name>
                  <name>
                     <surname>González-Ballester</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Zubkov</surname>
                     <given-names>MV</given-names>
                  </name>
                  <name>
                     <surname>García-Fernández</surname>
                     <given-names>JM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2017">2017</year>
               <article-title>Glucose uptake in <italic>Prochlorococcus</italic>: diversity of kinetics and effects on the metabolism</article-title>
               <source>Frontiers in Microbiology</source>
               <volume>8</volume>
               <pub-id pub-id-type="doi">10.3389/fmicb.2017.00327</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-49"><label>Muñoz Marín et al. (2013)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Muñoz Marín</surname>
                     <given-names>MDC</given-names>
                  </name>
                  <name>
                     <surname>Luque</surname>
                     <given-names>I</given-names>
                  </name>
                  <name>
                     <surname>Zubkov</surname>
                     <given-names>MV</given-names>
                  </name>
                  <name>
                     <surname>Hill</surname>
                     <given-names>PG</given-names>
                  </name>
                  <name>
                     <surname>Diez</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>García-Fernández</surname>
                     <given-names>JM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2013">2013</year>
               <article-title><italic>Prochlorococcus</italic> can use the Pro1404 transporter to take up glucose at nanomolar concentrations in the Atlantic Ocean</article-title>
               <source>Proceedings of the National Academy of Sciences of the United States of America</source>
               <volume>110</volume>
               <fpage>8597</fpage>
               <lpage>8602</lpage>
               <pub-id pub-id-type="doi">10.1073/pnas.1221775110</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-50"><label>Nayfach et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Nayfach</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Rodriguez-Mueller</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Garud</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Pollard</surname>
                     <given-names>KS</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography</article-title>
               <source>Genome Research</source>
               <volume>26</volume>
               <fpage>1612</fpage>
               <lpage>1625</lpage>
               <pub-id pub-id-type="doi">10.1101/gr.201863.115</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-51"><label>Olson et al. (1990)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Olson</surname>
                     <given-names>RJ</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
                  <name>
                     <surname>Zettler</surname>
                     <given-names>ER</given-names>
                  </name>
                  <name>
                     <surname>Altabet</surname>
                     <given-names>MA</given-names>
                  </name>
                  <name>
                     <surname>Dusenberry</surname>
                     <given-names>JA</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="1990">1990</year>
               <article-title>Spatial and temporal distributions of prochlorophyte picoplankton in the North Atlantic Ocean</article-title>
               <source>Deep Sea Research Part A. Oceanographic Research Papers</source>
               <volume>37</volume>
               <fpage>1033</fpage>
               <lpage>1051</lpage>
               <pub-id pub-id-type="doi">10.1016/0198-0149(90)90109-9</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-52"><label>Parkhill et al. (2000)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Parkhill</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Wren</surname>
                     <given-names>BW</given-names>
                  </name>
                  <name>
                     <surname>Mungall</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Ketley</surname>
                     <given-names>JM</given-names>
                  </name>
                  <name>
                     <surname>Churcher</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Basham</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Chillingworth</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Davies</surname>
                     <given-names>RM</given-names>
                  </name>
                  <name>
                     <surname>Feltwell</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Holroyd</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Jagels</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Karlyshev</surname>
                     <given-names>AV</given-names>
                  </name>
                  <name>
                     <surname>Moule</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Pallen</surname>
                     <given-names>MJ</given-names>
                  </name>
                  <name>
                     <surname>Penn</surname>
                     <given-names>CW</given-names>
                  </name>
                  <name>
                     <surname>Quail</surname>
                     <given-names>MA</given-names>
                  </name>
                  <name>
                     <surname>Rajandream</surname>
                     <given-names>MA</given-names>
                  </name>
                  <name>
                     <surname>Rutherford</surname>
                     <given-names>KM</given-names>
                  </name>
                  <name>
                     <surname>Van Vliet</surname>
                     <given-names>AH</given-names>
                  </name>
                  <name>
                     <surname>Whitehead</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Barrell</surname>
                     <given-names>BG</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2000">2000</year>
               <article-title>The genome sequence of the food-borne pathogen <italic>Campylobacter jejuni</italic> reveals hypervariable sequences</article-title>
               <source>Nature</source>
               <volume>403</volume>
               <fpage>665</fpage>
               <lpage>668</lpage>
               <pub-id pub-id-type="doi">10.1038/35001088</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-53"><label>Porter et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Porter</surname>
                     <given-names>SS</given-names>
                  </name>
                  <name>
                     <surname>Chang</surname>
                     <given-names>PL</given-names>
                  </name>
                  <name>
                     <surname>Conow</surname>
                     <given-names>CA</given-names>
                  </name>
                  <name>
                     <surname>Dunham</surname>
                     <given-names>JP</given-names>
                  </name>
                  <name>
                     <surname>Friesen</surname>
                     <given-names>ML</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Association mapping reveals novel serpentine adaptation gene clusters in a population of symbiotic <italic>Mesorhizobium</italic></article-title>
               <source>ISME Journal</source>
               <volume>11</volume>
               <fpage>248</fpage>
               <lpage>262</lpage>
               <pub-id pub-id-type="doi">10.1038/ismej.2016.88</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-54"><label>Price, Dehal &amp; Arkin (2010)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Price</surname>
                     <given-names>MN</given-names>
                  </name>
                  <name>
                     <surname>Dehal</surname>
                     <given-names>PS</given-names>
                  </name>
                  <name>
                     <surname>Arkin</surname>
                     <given-names>AP</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2010">2010</year>
               <article-title>FastTree 2—approximately maximum-likelihood trees for large alignments</article-title>
               <source>PLOS ONE</source>
               <volume>5</volume>
               <elocation-id>e9490</elocation-id>
               <pub-id pub-id-type="doi">10.1371/journal.pone.0009490</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-55"><label>Qin et al. (2010)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Qin</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Li</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Raes</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Arumugam</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Burgdorf</surname>
                     <given-names>KS</given-names>
                  </name>
                  <name>
                     <surname>Manichanh</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Nielsen</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Pons</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Levenez</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Yamada</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Mende</surname>
                     <given-names>DR</given-names>
                  </name>
                  <name>
                     <surname>Li</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Xu</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Li</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Li</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Cao</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Wang</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Liang</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Zheng</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Xie</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Tap</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Lepage</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Bertalan</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Batto</surname>
                     <given-names>JM</given-names>
                  </name>
                  <name>
                     <surname>Hansen</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Le Paslier</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Linneberg</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Nielsen</surname>
                     <given-names>HB</given-names>
                  </name>
                  <name>
                     <surname>Pelletier</surname>
                     <given-names>E</given-names>
                  </name>
                  <name>
                     <surname>Renault</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Sicheritz-Pontén</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Turner</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Zhu</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Yu</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Jian</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Zhou</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Li</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Zhang</surname>
                     <given-names>X</given-names>
                  </name>
                  <name>
                     <surname>Qin</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Yang</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Wang</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Brunak</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Doré</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Guarner</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Kristiansen</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Pedersen</surname>
                     <given-names>O</given-names>
                  </name>
                  <name>
                     <surname>Parkhill</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Weissenbach</surname>
                     <given-names>J</given-names>
                  </name>
                  <collab>MetaHIT Consortium</collab>
                  <name>
                     <surname>Bork</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Ehrlich</surname>
                     <given-names>SD</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2010">2010</year>
               <article-title>A human gut microbial gene catalogue established by metagenomic sequencing</article-title>
               <source>Nature</source>
               <volume>464</volume>
               <fpage>59</fpage>
               <lpage>65</lpage>
               <pub-id pub-id-type="doi">10.1038/nature08821</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-56"><label>Quince et al. (2017)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Quince</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Delmont</surname>
                     <given-names>TO</given-names>
                  </name>
                  <name>
                     <surname>Raguideau</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Alneberg</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Darling</surname>
                     <given-names>AE</given-names>
                  </name>
                  <name>
                     <surname>Collins</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Eren</surname>
                     <given-names>AM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2017">2017</year>
               <article-title>DESMAN: a new tool for de novo extraction of strains from metagenomes</article-title>
               <source>Genome Biology</source>
               <volume>18</volume>
               <elocation-id>181</elocation-id>
               <pub-id pub-id-type="doi">10.1186/s13059-017-1309-9</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-57"><label>Rambaut (2009)</label><element-citation publication-type="book">
               <person-group person-group-type="author">
                  <name>
                     <surname>Rambaut</surname>
                     <given-names>A</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2009">2009</year>
               <source>FigTree, a graphical viewer of phylogenetic trees</source>
               <publisher-name>Institute of Evolutionary Biology, University of Edinburgh</publisher-name>
               <publisher-loc>Edinburgh</publisher-loc>
            </element-citation>
         </ref>
         <ref id="ref-58"><label>Read et al. (2003)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Read</surname>
                     <given-names>TD</given-names>
                  </name>
                  <name>
                     <surname>Peterson</surname>
                     <given-names>SN</given-names>
                  </name>
                  <name>
                     <surname>Tourasse</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Baillie</surname>
                     <given-names>LW</given-names>
                  </name>
                  <name>
                     <surname>Paulsen</surname>
                     <given-names>IT</given-names>
                  </name>
                  <name>
                     <surname>Nelson</surname>
                     <given-names>KE</given-names>
                  </name>
                  <name>
                     <surname>Tettelin</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Fouts</surname>
                     <given-names>DE</given-names>
                  </name>
                  <name>
                     <surname>Eisen</surname>
                     <given-names>JA</given-names>
                  </name>
                  <name>
                     <surname>Gill</surname>
                     <given-names>SR</given-names>
                  </name>
                  <name>
                     <surname>Holtzapple</surname>
                     <given-names>EK</given-names>
                  </name>
                  <name>
                     <surname>Økstad</surname>
                     <given-names>OA</given-names>
                  </name>
                  <name>
                     <surname>Helgason</surname>
                     <given-names>E</given-names>
                  </name>
                  <name>
                     <surname>Rilstone</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Wu</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Kolonay</surname>
                     <given-names>JF</given-names>
                  </name>
                  <name>
                     <surname>Beanan</surname>
                     <given-names>MJ</given-names>
                  </name>
                  <name>
                     <surname>Dodson</surname>
                     <given-names>RJ</given-names>
                  </name>
                  <name>
                     <surname>Brinkac</surname>
                     <given-names>LM</given-names>
                  </name>
                  <name>
                     <surname>Gwinn</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>DeBoy</surname>
                     <given-names>RT</given-names>
                  </name>
                  <name>
                     <surname>Madpu</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Daugherty</surname>
                     <given-names>SC</given-names>
                  </name>
                  <name>
                     <surname>Durkin</surname>
                     <given-names>AS</given-names>
                  </name>
                  <name>
                     <surname>Haft</surname>
                     <given-names>DH</given-names>
                  </name>
                  <name>
                     <surname>Nelson</surname>
                     <given-names>WC</given-names>
                  </name>
                  <name>
                     <surname>Peterson</surname>
                     <given-names>JD</given-names>
                  </name>
                  <name>
                     <surname>Pop</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Khouri</surname>
                     <given-names>HM</given-names>
                  </name>
                  <name>
                     <surname>Radune</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Benton</surname>
                     <given-names>JL</given-names>
                  </name>
                  <name>
                     <surname>Mahamoud</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Jiang</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Hance</surname>
                     <given-names>IR</given-names>
                  </name>
                  <name>
                     <surname>Weidman</surname>
                     <given-names>JF</given-names>
                  </name>
                  <name>
                     <surname>Berry</surname>
                     <given-names>KJ</given-names>
                  </name>
                  <name>
                     <surname>Plaut</surname>
                     <given-names>RD</given-names>
                  </name>
                  <name>
                     <surname>Wolf</surname>
                     <given-names>AM</given-names>
                  </name>
                  <name>
                     <surname>Watkins</surname>
                     <given-names>KL</given-names>
                  </name>
                  <name>
                     <surname>Nierman</surname>
                     <given-names>WC</given-names>
                  </name>
                  <name>
                     <surname>Hazen</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Cline</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Redmond</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Thwaite</surname>
                     <given-names>JE</given-names>
                  </name>
                  <name>
                     <surname>White</surname>
                     <given-names>O</given-names>
                  </name>
                  <name>
                     <surname>Salzberg</surname>
                     <given-names>SL</given-names>
                  </name>
                  <name>
                     <surname>Thomason</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Friedlander</surname>
                     <given-names>AM</given-names>
                  </name>
                  <name>
                     <surname>Koehler</surname>
                     <given-names>TM</given-names>
                  </name>
                  <name>
                     <surname>Hanna</surname>
                     <given-names>PC</given-names>
                  </name>
                  <name>
                     <surname>Kolstø</surname>
                     <given-names>A-B</given-names>
                  </name>
                  <name>
                     <surname>Fraser</surname>
                     <given-names>CM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2003">2003</year>
               <article-title>The genome sequence of <italic>Bacillus anthracis</italic> Ames and comparison to closely related bacteria</article-title>
               <source>Nature</source>
               <volume>423</volume>
               <fpage>81</fpage>
               <lpage>86</lpage>
               <pub-id pub-id-type="doi">10.1038/nature01586</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-59"><label>Reno et al. (2009)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Reno</surname>
                     <given-names>ML</given-names>
                  </name>
                  <name>
                     <surname>Held</surname>
                     <given-names>NL</given-names>
                  </name>
                  <name>
                     <surname>Fields</surname>
                     <given-names>CJ</given-names>
                  </name>
                  <name>
                     <surname>Burke</surname>
                     <given-names>PV</given-names>
                  </name>
                  <name>
                     <surname>Whitaker</surname>
                     <given-names>RJ</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2009">2009</year>
               <article-title>Biogeography of the <italic>Sulfolobus islandicus</italic> pan-genome</article-title>
               <source>Proceedings of the National Academy of Sciences of the United States of America</source>
               <volume>106</volume>
               <fpage>8605</fpage>
               <lpage>8610</lpage>
               <pub-id pub-id-type="doi">10.1073/pnas.0808945106</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-60"><label>Rocap et al. (2002)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Rocap</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Distel</surname>
                     <given-names>DL</given-names>
                  </name>
                  <name>
                     <surname>Waterbury</surname>
                     <given-names>JB</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2002">2002</year>
               <article-title>Resolution of <italic>Prochlorococcus</italic> and <italic>Synechococcus</italic> ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences</article-title>
               <source>Applied and Environmental Microbiology</source>
               <volume>68</volume>
               <fpage>1180</fpage>
               <lpage>1191</lpage>
               <pub-id pub-id-type="doi">10.1128/AEM.68.3.1180-1191.2002</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-61"><label>Rocap et al. (2003)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Rocap</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Larimer</surname>
                     <given-names>FW</given-names>
                  </name>
                  <name>
                     <surname>Lamerdin</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Malfatti</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Chain</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Ahlgren</surname>
                     <given-names>NA</given-names>
                  </name>
                  <name>
                     <surname>Arellano</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Coleman</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Hauser</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Hess</surname>
                     <given-names>WR</given-names>
                  </name>
                  <name>
                     <surname>Johnson</surname>
                     <given-names>ZI</given-names>
                  </name>
                  <name>
                     <surname>Land</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Lindell</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Post</surname>
                     <given-names>AF</given-names>
                  </name>
                  <name>
                     <surname>Regala</surname>
                     <given-names>W</given-names>
                  </name>
                  <name>
                     <surname>Shah</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Shaw</surname>
                     <given-names>SL</given-names>
                  </name>
                  <name>
                     <surname>Steglich</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Sullivan</surname>
                     <given-names>MB</given-names>
                  </name>
                  <name>
                     <surname>Ting</surname>
                     <given-names>CS</given-names>
                  </name>
                  <name>
                     <surname>Tolonen</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Webb</surname>
                     <given-names>EA</given-names>
                  </name>
                  <name>
                     <surname>Zinser</surname>
                     <given-names>ER</given-names>
                  </name>
                  <name>
                     <surname>Chisholm</surname>
                     <given-names>SW</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2003">2003</year>
               <article-title>Genome divergence in two <italic>Prochlorococcus</italic> ecotypes reflects oceanic niche differentiation</article-title>
               <source>Nature</source>
               <volume>424</volume>
               <fpage>1042</fpage>
               <lpage>1047</lpage>
               <pub-id pub-id-type="doi">10.1038/nature01947</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-62"><label>Rusch et al. (2010)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Rusch</surname>
                     <given-names>DB</given-names>
                  </name>
                  <name>
                     <surname>Martiny</surname>
                     <given-names>AC</given-names>
                  </name>
                  <name>
                     <surname>Dupont</surname>
                     <given-names>CL</given-names>
                  </name>
                  <name>
                     <surname>Halpern</surname>
                     <given-names>AL</given-names>
                  </name>
                  <name>
                     <surname>Venter</surname>
                     <given-names>JC</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2010">2010</year>
               <article-title>Characterization of <italic>Prochlorococcus</italic> clades from iron-depleted oceanic regions</article-title>
               <source>Proceedings of the National Academy of Sciences of the United States of America</source>
               <volume>107</volume>
               <fpage>16184</fpage>
               <lpage>16189</lpage>
               <pub-id pub-id-type="doi">10.1073/pnas.1009513107</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-63"><label>Scholz et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Scholz</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Ward</surname>
                     <given-names>DV</given-names>
                  </name>
                  <name>
                     <surname>Pasolli</surname>
                     <given-names>E</given-names>
                  </name>
                  <name>
                     <surname>Tolio</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Zolfo</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Asnicar</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Truong</surname>
                     <given-names>DT</given-names>
                  </name>
                  <name>
                     <surname>Tett</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Morrow</surname>
                     <given-names>AL</given-names>
                  </name>
                  <name>
                     <surname>Segata</surname>
                     <given-names>N</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Strain-level microbial epidemiology and population genomics from shotgun metagenomics</article-title>
               <source>Nature Methods</source>
               <volume>13</volume>
               <fpage>435</fpage>
               <lpage>438</lpage>
               <pub-id pub-id-type="doi">10.1038/nmeth.3802</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-64"><label>Sharon et al. (2013)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Sharon</surname>
                     <given-names>I</given-names>
                  </name>
                  <name>
                     <surname>Morowitz</surname>
                     <given-names>MJ</given-names>
                  </name>
                  <name>
                     <surname>Thomas</surname>
                     <given-names>BC</given-names>
                  </name>
                  <name>
                     <surname>Costello</surname>
                     <given-names>EK</given-names>
                  </name>
                  <name>
                     <surname>Relman</surname>
                     <given-names>DA</given-names>
                  </name>
                  <name>
                     <surname>Banfield</surname>
                     <given-names>JF</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2013">2013</year>
               <article-title>Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization</article-title>
               <source>Genome Research</source>
               <volume>23</volume>
               <fpage>111</fpage>
               <lpage>120</lpage>
               <pub-id pub-id-type="doi">10.1101/gr.142315.112</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-65"><label>Smith et al. (1997)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Smith</surname>
                     <given-names>DR</given-names>
                  </name>
                  <name>
                     <surname>Doucette-Stamm</surname>
                     <given-names>LA</given-names>
                  </name>
                  <name>
                     <surname>Deloughery</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Lee</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Dubois</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Aldredge</surname>
                     <given-names>T</given-names>
                  </name>
                  <name>
                     <surname>Bashirzadeh</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Blakely</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Cook</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Gilbert</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Harrison</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Hoang</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Keagle</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Lumm</surname>
                     <given-names>W</given-names>
                  </name>
                  <name>
                     <surname>Pothier</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Qiu</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Spadafora</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Vicaire</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Wang</surname>
                     <given-names>Y</given-names>
                  </name>
                  <name>
                     <surname>Wierzbowski</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Gibson</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Jiwani</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Caruso</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Bush</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Reeve</surname>
                     <given-names>JN</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="1997">1997</year>
               <article-title>Complete genome sequence of <italic>Methanobacterium thermoautotrophicum</italic> deltaH: functional analysis and comparative genomics</article-title>
               <source>Journal of Bacteriology</source>
               <volume>179</volume>
               <fpage>7135</fpage>
               <lpage>7155</lpage>
               <pub-id pub-id-type="doi">10.1128/jb.179.22.7135-7155.1997</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-66"><label>Snel, Bork &amp; Huynen (1999)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Snel</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Bork</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Huynen</surname>
                     <given-names>MA</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="1999">1999</year>
               <article-title>Genome phylogeny based on gene content</article-title>
               <source>Nature Genetics</source>
               <volume>21</volume>
               <fpage>108</fpage>
               <lpage>110</lpage>
               <pub-id pub-id-type="doi">10.1038/5052</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-67"><label>Sunagawa et al. (2015)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Sunagawa</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Coelho</surname>
                     <given-names>LP</given-names>
                  </name>
                  <name>
                     <surname>Chaffron</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Kultima</surname>
                     <given-names>JR</given-names>
                  </name>
                  <name>
                     <surname>Labadie</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Salazar</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Djahanschiri</surname>
                     <given-names>B</given-names>
                  </name>
                  <name>
                     <surname>Zeller</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Mende</surname>
                     <given-names>DR</given-names>
                  </name>
                  <name>
                     <surname>Alberti</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Cornejo-Castillo</surname>
                     <given-names>FM</given-names>
                  </name>
                  <name>
                     <surname>Costea</surname>
                     <given-names>PI</given-names>
                  </name>
                  <name>
                     <surname>Cruaud</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>D’Ovidio</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Engelen</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Ferrera</surname>
                     <given-names>I</given-names>
                  </name>
                  <name>
                     <surname>Gasol</surname>
                     <given-names>JM</given-names>
                  </name>
                  <name>
                     <surname>Guidi</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Hildebrand</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Kokoszka</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Lepoivre</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Lima-Mendez</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Poulain</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Poulos</surname>
                     <given-names>BT</given-names>
                  </name>
                  <name>
                     <surname>Royo-Llonch</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Sarmento</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Vieira-Silva</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Dimier</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Picheral</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Searson</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Kandels-Lewis</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Bowler</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>De Vargas</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Gorsky</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Grimsley</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Hingamp</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Iudicone</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Jaillon</surname>
                     <given-names>O</given-names>
                  </name>
                  <name>
                     <surname>Not</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Ogata</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Pesant</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Speich</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Stemmann</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Sullivan</surname>
                     <given-names>MB</given-names>
                  </name>
                  <name>
                     <surname>Weissenbach</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Wincker</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Karsenti</surname>
                     <given-names>E</given-names>
                  </name>
                  <name>
                     <surname>Raes</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Acinas</surname>
                     <given-names>SG</given-names>
                  </name>
                  <name>
                     <surname>Bork</surname>
                     <given-names>P</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2015">2015</year>
               <article-title>Ocean plankton. Structure and function of the global ocean microbiome</article-title>
               <source>Science</source>
               <volume>348</volume>
               <fpage>1261359</fpage>
               <pub-id pub-id-type="doi">10.1126/science.1261359</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-68"><label>Tatusov et al. (2000)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Tatusov</surname>
                     <given-names>RL</given-names>
                  </name>
                  <name>
                     <surname>Galperin</surname>
                     <given-names>MY</given-names>
                  </name>
                  <name>
                     <surname>Natale</surname>
                     <given-names>DA</given-names>
                  </name>
                  <name>
                     <surname>Koonin</surname>
                     <given-names>EV</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2000">2000</year>
               <article-title>The COG database: a tool for genome-scale analysis of protein functions and evolution</article-title>
               <source>Nucleic Acids Research</source>
               <volume>28</volume>
               <fpage>33</fpage>
               <lpage>36</lpage>
               <pub-id pub-id-type="doi">10.1093/nar/28.1.33</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-69"><label>Tettelin et al. (2005)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Tettelin</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Masignani</surname>
                     <given-names>V</given-names>
                  </name>
                  <name>
                     <surname>Cieslewicz</surname>
                     <given-names>MJ</given-names>
                  </name>
                  <name>
                     <surname>Donati</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Medini</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Ward</surname>
                     <given-names>NL</given-names>
                  </name>
                  <name>
                     <surname>Angiuoli</surname>
                     <given-names>SV</given-names>
                  </name>
                  <name>
                     <surname>Crabtree</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Jones</surname>
                     <given-names>AL</given-names>
                  </name>
                  <name>
                     <surname>Durkin</surname>
                     <given-names>AS</given-names>
                  </name>
                  <name>
                     <surname>Deboy</surname>
                     <given-names>RT</given-names>
                  </name>
                  <name>
                     <surname>Davidsen</surname>
                     <given-names>TM</given-names>
                  </name>
                  <name>
                     <surname>Mora</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Scarselli</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Margarit y Ros</surname>
                     <given-names>I</given-names>
                  </name>
                  <name>
                     <surname>Peterson</surname>
                     <given-names>JD</given-names>
                  </name>
                  <name>
                     <surname>Hauser</surname>
                     <given-names>CR</given-names>
                  </name>
                  <name>
                     <surname>Sundaram</surname>
                     <given-names>JP</given-names>
                  </name>
                  <name>
                     <surname>Nelson</surname>
                     <given-names>WC</given-names>
                  </name>
                  <name>
                     <surname>Madupu</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Brinkac</surname>
                     <given-names>LM</given-names>
                  </name>
                  <name>
                     <surname>Dodson</surname>
                     <given-names>RJ</given-names>
                  </name>
                  <name>
                     <surname>Rosovitz</surname>
                     <given-names>MJ</given-names>
                  </name>
                  <name>
                     <surname>Sullivan</surname>
                     <given-names>SA</given-names>
                  </name>
                  <name>
                     <surname>Daugherty</surname>
                     <given-names>SC</given-names>
                  </name>
                  <name>
                     <surname>Haft</surname>
                     <given-names>DH</given-names>
                  </name>
                  <name>
                     <surname>Selengut</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Gwinn</surname>
                     <given-names>ML</given-names>
                  </name>
                  <name>
                     <surname>Zhou</surname>
                     <given-names>L</given-names>
                  </name>
                  <name>
                     <surname>Zafar</surname>
                     <given-names>N</given-names>
                  </name>
                  <name>
                     <surname>Khouri</surname>
                     <given-names>H</given-names>
                  </name>
                  <name>
                     <surname>Radune</surname>
                     <given-names>D</given-names>
                  </name>
                  <name>
                     <surname>Dimitrov</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Watkins</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>O’Connor</surname>
                     <given-names>KJB</given-names>
                  </name>
                  <name>
                     <surname>Smith</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Utterback</surname>
                     <given-names>TR</given-names>
                  </name>
                  <name>
                     <surname>White</surname>
                     <given-names>O</given-names>
                  </name>
                  <name>
                     <surname>Rubens</surname>
                     <given-names>CE</given-names>
                  </name>
                  <name>
                     <surname>Grandi</surname>
                     <given-names>G</given-names>
                  </name>
                  <name>
                     <surname>Madoff</surname>
                     <given-names>LC</given-names>
                  </name>
                  <name>
                     <surname>Kasper</surname>
                     <given-names>DL</given-names>
                  </name>
                  <name>
                     <surname>Telford</surname>
                     <given-names>JL</given-names>
                  </name>
                  <name>
                     <surname>Wessels</surname>
                     <given-names>MR</given-names>
                  </name>
                  <name>
                     <surname>Rappuoli</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Fraser</surname>
                     <given-names>CM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2005">2005</year>
               <article-title>Genome analysis of multiple pathogenic isolates of <italic>Streptococcus agalactiae</italic>: implications for the microbial “pan-genome”</article-title>
               <source>Proceedings of the National Academy of Sciences of the United States of America</source>
               <volume>102</volume>
               <fpage>13950</fpage>
               <lpage>13955</lpage>
               <pub-id pub-id-type="doi">10.1073/pnas.0506758102</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-70"><label>Thies et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Thies</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Rausch</surname>
                     <given-names>SC</given-names>
                  </name>
                  <name>
                     <surname>Kovacic</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Schmidt-Thaler</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Wilhelm</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Rosenau</surname>
                     <given-names>F</given-names>
                  </name>
                  <name>
                     <surname>Daniel</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Streit</surname>
                     <given-names>W</given-names>
                  </name>
                  <name>
                     <surname>Pietruszka</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Jaeger</surname>
                     <given-names>K-E</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Metagenomic discovery of novel enzymes and biosurfactants in a slaughterhouse biofilm microbial community</article-title>
               <source>Scientific Reports</source>
               <volume>6</volume>
               <fpage>27035</fpage>
               <pub-id pub-id-type="doi">10.1038/srep27035</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-71"><label>Tringe et al. (2005)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Tringe</surname>
                     <given-names>SG</given-names>
                  </name>
                  <name>
                     <surname>Von Mering</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Kobayashi</surname>
                     <given-names>A</given-names>
                  </name>
                  <name>
                     <surname>Salamov</surname>
                     <given-names>AA</given-names>
                  </name>
                  <name>
                     <surname>Chen</surname>
                     <given-names>K</given-names>
                  </name>
                  <name>
                     <surname>Chang</surname>
                     <given-names>HW</given-names>
                  </name>
                  <name>
                     <surname>Podar</surname>
                     <given-names>M</given-names>
                  </name>
                  <name>
                     <surname>Short</surname>
                     <given-names>JM</given-names>
                  </name>
                  <name>
                     <surname>Mathur</surname>
                     <given-names>EJ</given-names>
                  </name>
                  <name>
                     <surname>Detter</surname>
                     <given-names>JC</given-names>
                  </name>
                  <name>
                     <surname>Bork</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Hugenholtz</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Rubin</surname>
                     <given-names>EM</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2005">2005</year>
               <article-title>Comparative metagenomics of microbial communities</article-title>
               <source>Science</source>
               <volume>308</volume>
               <fpage>554</fpage>
               <lpage>557</lpage>
               <pub-id pub-id-type="doi">10.1126/science.1107851</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-72"><label>Tyson et al. (2004)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Tyson</surname>
                     <given-names>GW</given-names>
                  </name>
                  <name>
                     <surname>Chapman</surname>
                     <given-names>J</given-names>
                  </name>
                  <name>
                     <surname>Hugenholtz</surname>
                     <given-names>P</given-names>
                  </name>
                  <name>
                     <surname>Allen</surname>
                     <given-names>EE</given-names>
                  </name>
                  <name>
                     <surname>Ram</surname>
                     <given-names>RJ</given-names>
                  </name>
                  <name>
                     <surname>Richardson</surname>
                     <given-names>PM</given-names>
                  </name>
                  <name>
                     <surname>Solovyev</surname>
                     <given-names>VV</given-names>
                  </name>
                  <name>
                     <surname>Rubin</surname>
                     <given-names>EM</given-names>
                  </name>
                  <name>
                     <surname>Rokhsar</surname>
                     <given-names>DS</given-names>
                  </name>
                  <name>
                     <surname>Banfield</surname>
                     <given-names>JF</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2004">2004</year>
               <article-title>Community structure and metabolism through reconstruction of microbial genomes from the environment</article-title>
               <source>Nature</source>
               <volume>428</volume>
               <fpage>37</fpage>
               <lpage>43</lpage>
               <pub-id pub-id-type="doi">10.1038/nature02340</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-73"><label>Van Dongen &amp; Abreu-Goodger (2012)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Van Dongen</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Abreu-Goodger</surname>
                     <given-names>C</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2012">2012</year>
               <article-title>Using MCL to extract clusters from networks</article-title>
               <source>Methods in Molecular Biology</source>
               <volume>804</volume>
               <fpage>281</fpage>
               <lpage>295</lpage>
               <pub-id pub-id-type="doi">10.1007/978-1-61779-361-5_15</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-74"><label>Vineis et al. (2016)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Vineis</surname>
                     <given-names>JH</given-names>
                  </name>
                  <name>
                     <surname>Ringus</surname>
                     <given-names>DL</given-names>
                  </name>
                  <name>
                     <surname>Morrison</surname>
                     <given-names>HG</given-names>
                  </name>
                  <name>
                     <surname>Delmont</surname>
                     <given-names>TO</given-names>
                  </name>
                  <name>
                     <surname>Dalal</surname>
                     <given-names>S</given-names>
                  </name>
                  <name>
                     <surname>Raffals</surname>
                     <given-names>LH</given-names>
                  </name>
                  <name>
                     <surname>Antonopoulos</surname>
                     <given-names>DA</given-names>
                  </name>
                  <name>
                     <surname>Rubin</surname>
                     <given-names>DT</given-names>
                  </name>
                  <name>
                     <surname>Eren</surname>
                     <given-names>AM</given-names>
                  </name>
                  <name>
                     <surname>Chang</surname>
                     <given-names>EB</given-names>
                  </name>
                  <name>
                     <surname>Sogin</surname>
                     <given-names>ML</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2016">2016</year>
               <article-title>Patient-specific <italic>Bacteroides</italic> genome variants in pouchitis</article-title>
               <source>mBio</source>
               <volume>7</volume>
               <elocation-id>e01713-16</elocation-id>
               <pub-id pub-id-type="doi">10.1128/MBIO.01713-16</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-75"><label>West et al. (2001)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>West</surname>
                     <given-names>NJ</given-names>
                  </name>
                  <name>
                     <surname>Schönhuber</surname>
                     <given-names>WA</given-names>
                  </name>
                  <name>
                     <surname>Fuller</surname>
                     <given-names>NJ</given-names>
                  </name>
                  <name>
                     <surname>Amann</surname>
                     <given-names>RI</given-names>
                  </name>
                  <name>
                     <surname>Rippka</surname>
                     <given-names>R</given-names>
                  </name>
                  <name>
                     <surname>Post</surname>
                     <given-names>AF</given-names>
                  </name>
                  <name>
                     <surname>Scanlan</surname>
                     <given-names>DJ</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2001">2001</year>
               <article-title>Closely related <italic>Prochlorococcus</italic> genotypes show remarkably different depth distributions in two oceanic regions as revealed by <italic>in situ</italic> hybridization using 16S rRNA-targeted oligonucleotides</article-title>
               <source>Microbiology</source>
               <volume>147</volume>
               <fpage>1731</fpage>
               <lpage>1744</lpage>
               <pub-id pub-id-type="doi">10.1099/00221287-147-7-1731</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-76"><label>Wilhelm et al. (2007)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Wilhelm</surname>
                     <given-names>LJ</given-names>
                  </name>
                  <name>
                     <surname>Tripp</surname>
                     <given-names>HJ</given-names>
                  </name>
                  <name>
                     <surname>Givan</surname>
                     <given-names>SA</given-names>
                  </name>
                  <name>
                     <surname>Smith</surname>
                     <given-names>DP</given-names>
                  </name>
                  <name>
                     <surname>Giovannoni</surname>
                     <given-names>SJ</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2007">2007</year>
               <article-title>Natural variation in SAR11 marine bacterioplankton genomes inferred from metagenomic data</article-title>
               <source>Biology Direct</source>
               <volume>2</volume>
               <fpage>27</fpage>
               <pub-id pub-id-type="doi">10.1186/1745-6150-2-27</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-77"><label>Zdobnov &amp; Apweiler (2001)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Zdobnov</surname>
                     <given-names>EM</given-names>
                  </name>
                  <name>
                     <surname>Apweiler</surname>
                     <given-names>R</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2001">2001</year>
               <article-title>InterProScan—an integration platform for the signature-recognition methods in InterPro</article-title>
               <source>Bioinformatics</source>
               <volume>17</volume>
               <fpage>847</fpage>
               <lpage>848</lpage>
               <pub-id pub-id-type="doi">10.1093/bioinformatics/17.9.847</pub-id>
            </element-citation>
         </ref>
         <ref id="ref-78"><label>Zhu et al. (2015)</label><element-citation publication-type="journal">
               <person-group person-group-type="author">
                  <name>
                     <surname>Zhu</surname>
                     <given-names>C</given-names>
                  </name>
                  <name>
                     <surname>Delmont</surname>
                     <given-names>TO</given-names>
                  </name>
                  <name>
                     <surname>Vogel</surname>
                     <given-names>TM</given-names>
                  </name>
                  <name>
                     <surname>Bromberg</surname>
                     <given-names>Y</given-names>
                  </name>
               </person-group>
               <year iso-8601-date="2015">2015</year>
               <article-title>Functional basis of microorganism classification</article-title>
               <source>PLOS Computational Biology</source>
               <volume>11</volume>
               <elocation-id>e1004472</elocation-id>
               <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004472</pub-id>
            </element-citation>
         </ref>
      </ref-list>
   </back>
</article>
