Building a genomic resource across Earth’s biomes for the community

Finding biological “gems” and an open invitation to explore

I recall that a year before I started graduate school at Scripps Institution of Oceanography, two landmark publications1,2 came out describing a new approach for “whole-genome shotgun sequencing” of natural microbial communities. It’s remarkable to think that within the span of my graduate career, metagenomics was born and flourished into a transformative field impacting all sectors of science. Little did I know at the time that over 15 years later, I would work with an incredible team of scientists at the DOE Joint Genome Institute and KBase, along with over 200 scientists from around the world, to release an expansive genomic catalog of uncultivated bacterial and archaeal genomes to support a new wave of genome-centric analyses of the vast troves of publicly available metagenomic data.

When we initiated the project over three years ago, we set out to demonstrate the value of performing genome-resolved metagenomics at scale and to develop easily accessible tools so the research community could explore and analyze these population genomes through the Integrated Microbial Genomes and Microbiomes (IMG/M) platform or perform metabolic modeling in KBase. But the fun really started when we began thinking: “We have this huge genomic resource from uncultivated microbes that significantly expands the known diversity of bacteria and archaea - what can we do with it?”

We decided to have a few teams showcase possible answers to that question. One analysis was headed by Dan Udwary, a computational biologist in the Secondary Metabolites group led by JGI Director Nigel Mouncey. With the decline in antibiotic discovery and the pressing need for new antimicrobials, Dan hypothesized that our large genome catalog could serve as a resource for discovering new biosynthetic gene clusters (BGCs). Dan discusses this in the genome mining primer episode of JGI’s Natural Prodcast podcast.

In certain bacterial families and genera, secondary metabolism can be very rich. For example, particular groups of Streptomyces typically contain 25-35 identifiable BGCs per genome, while other species have far fewer. We scanned the genome catalog and found more than 100,000 predicted clusters, the majority of which (87%) had no significant alignment to any reference sequence, suggesting that they represent unique biosynthetic capabilities and thus novel secondary metabolites. Though the organisms themselves are not available in laboratory culture, synthetic biology could be used in the future to elucidate the chemical reactions encoded by these sequences and the metabolites they produce. We also hope to dive into more specific analysis of BGC families and their related chemistry, using recent tools like BiG-SLiCE and BiG-FAM3,4.

As a part of the US Department of Energy, our mission focuses on finding solutions to energy and environmental challenges, not on drug discovery. So what else can we do with these novel secondary metabolites? We’re interested in learning more about the environmental and ecological roles of secondary metabolites. For example, plants may be benefiting from molecules produced by bacteria.

Another analysis was headed by Rekha Seshadri, a computational biologist in the Functional Annotation group led by Natalia Ivanova. Rekha was interested in exploring well-studied groups. One such example highlighted in the paper is a putative new Coxiella sp. reconstructed from coastal seawater samples. These genomes are closely related to the obligate intracellular human pathogen Coxiella burnetii, the etiological agent of Q fever, and even retain some of the known virulence factors (Fig. 1). Similarly, other reconstructed genomes related to pathogens (e.g., Chlamydia spp. or Clostridium botulinum) provide opportunities to study the evolution of virulence, while relatives of species used in industrial applications (Bacillus subtilis, B. thuringiensis, Pseudomonas putida, methanotrophic taxa) can inform strategies for strain improvement. Analysis of our reconstructed genomes enables identification of known metabolic pathways (e.g., nitrogen fixation, alkane biosynthesis). It can also reveal important auxiliary genes or dependencies of these pathways, such as cofactor requirements, that could not be gleaned from unbinned metagenomes. Increasing the number of constituent genomes for a group also adds statistical power, enabling more robust comparisons; learn more in our IMG webinar on Metagenome Bins.

Fig. 1: Schematic comparing the organization of genes for type IV secretion (T4SS) in an isolated strain of C. burnetii and a metagenome-assembled genome (MAG) reconstructed in our study. T4SS is an important virulence factor in C. burnetii that secretes host-modulating effector proteins directly into the host environment. Such interspecies comparisons shed light on core components with conserved organization, while also highlighting rearrangements, duplications, or deletions/insertions that might signify functional changes of the T4SS in uncultivated species.

The analyses we’ve highlighted in the paper are just the tip of the iceberg. Where do we go from here? I’ll reframe the question posited earlier: “We’ve made publicly available this huge genomic resource from uncultivated microbes that significantly expands the known diversity of bacteria and archaea - what can you do with it?”

  1. Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37-43 (2004).
  2. Venter, J. C. et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66-74 (2004).
  3. Kautsar, S. A., van der Hooft, J. J. J., de Ridder, D. & Medema, M. H. BiG-SLiCE: a highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters. bioRxiv, 2020.08.17.240838 (2020).
  4. Kautsar, S. A., Blin, K., Shaw, S., Weber, T. & Medema, M. H. BiG-FAM: the biosynthetic gene cluster families database. Nucleic Acids Research (2020).

Poster image: Artistic interpretation of how microbial genome sequences from the GEM catalog can help fill in gaps of knowledge about the microbes that play key roles in the Earth’s microbiomes. (Rendered by Zosia Rostomian, Berkeley Lab)