Genomic surveillance of SARS-CoV-2 made easy and cost-effective

A new versatile and cost-effective method enables genomic surveillance of SARS-CoV-2 by preparing multiplexed next-generation sequencing (NGS) libraries from many samples in parallel

At the end of 2019, a cluster of pneumonia cases emerged in Wuhan, China. The cause was identified as a novel coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19). Since then, the virus has spread across the globe, causing a pandemic that is still ongoing.

Since the beginning of the pandemic, thousands of SARS-CoV-2 genomes have been sequenced using various next-generation sequencing (NGS) techniques. This has enabled researchers worldwide to reconstruct the phylogenetic evolution of the virus and to identify the novel variants of concern (VOC) that began to emerge in 2020 and early 2021. Many countries now use NGS to detect emerging VOC as early as possible.

Alongside efforts to trace the evolution of the virus, developing effective vaccines became the top priority for many researchers around the world. All existing SARS-CoV-2 vaccination strategies target the spike (S) protein of the virus, which is crucial for viral entry into target cells. Monitoring the evolution of the virus and the mutations that arise, particularly in the S gene, is therefore critical to ensuring that vaccines remain effective.

For these reasons, several NGS approaches and commercial kits have been developed. However, most current NGS techniques are very costly and difficult to apply on a large scale, because one sequencing library usually has to be prepared for each individual sample. This limits the number of samples that can be sequenced and increases the risk of missing emerging VOC.

Motivated by the need to improve current methods for genomic surveillance of SARS-CoV-2, we adapted our previously described CUTseq method into a new method, COVseq, which enables hundreds of viral samples to be sequenced in parallel at an affordable cost. A description of COVseq is available here.

COVseq starts with a multiplexed PCR assay that amplifies the whole SARS-CoV-2 genome as multiple amplicons. We successfully tested this first step with two different approaches: one designed by the U.S. Centers for Disease Control and Prevention (CDC), which uses six amplicon pools, and one developed by the ARTIC network (https://artic.network/), which requires only two.

The multiplexed PCR step is followed by CUTseq on the purified amplicons. Two restriction enzymes, MseI and NlaIII, cut the viral amplicons at defined locations. Next, oligonucleotide adapters that contain, among other elements, a sample-specific barcode sequence and the T7 promoter sequence are ligated to the restriction sites. After ligation, up to 384 samples are pooled together and the barcoded DNA is linearly amplified by in vitro transcription, followed by final library preparation and sequencing on Illumina platforms. A schematic workflow of the protocol is shown in Figure 1b of our paper.
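To make the double-digestion step concrete, here is a minimal in silico sketch of where the two enzymes would cut a sequence. It relies only on the standard recognition sites (MseI cuts T^TAA, NlaIII cuts CATG^). The function names and the toy sequence are hypothetical and chosen for illustration; this is not the analysis code used in the paper.

```python
import re

# Recognition site and top-strand cut offset within the site:
# MseI cuts T^TAA (after position 1), NlaIII cuts CATG^ (after position 4).
ENZYMES = {"MseI": ("TTAA", 1), "NlaIII": ("CATG", 4)}

def cut_positions(seq, enzymes=ENZYMES):
    """Return sorted 0-based coordinates at which the top strand is cut."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # A lookahead pattern also finds overlapping occurrences of a site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    # A cut at the very start or end of the molecule produces no new fragment.
    return sorted(c for c in cuts if 0 < c < len(seq))

def digest(seq, enzymes=ENZYMES):
    """Split seq into the fragments produced by a complete digestion."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

# Toy sequence for illustration only (not a real SARS-CoV-2 amplicon).
amplicon = "ACGTTAAGGCATGCCTTAAT"
fragments = digest(amplicon)
print(cut_positions(amplicon))
print(fragments)
```

Running the same functions on a reference genome would give a rough idea of the expected fragment-size distribution for each enzyme, alone or in combination.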

We first tested the COVseq workflow by preparing a single library from RNA extracted from the supernatant of a SARS-CoV-2 culture, showing that our double-digestion strategy enables near-complete coverage of the viral genome (98.8%). To minimize the volume of reagents used and reduce the cost per sample, we performed all the barcoding reactions in nanoliter volumes using the I.DOT nanodispenser, which we previously deployed for high-throughput CUTseq.
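For readers unfamiliar with the metric, breadth of coverage is the fraction of genome positions covered by sequencing reads. The sketch below assumes a per-position list of read depths and a minimum depth threshold; the function name, the threshold values and the toy depths are illustrative assumptions, not the settings used in the paper.

```python
def breadth_of_coverage(depth, min_depth=1):
    """Fraction of positions with read depth >= min_depth.

    depth: per-position read depths, one value per genome position.
    min_depth: assumed threshold for calling a position "covered".
    """
    if not depth:
        raise ValueError("depth must contain at least one position")
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / len(depth)

# Made-up depths for a 10-position toy genome, for illustration only.
toy_depth = [0, 12, 30, 30, 7, 0, 55, 41, 3, 18]
breadth_any = breadth_of_coverage(toy_depth)               # covered by >= 1 read
breadth_10x = breadth_of_coverage(toy_depth, min_depth=10)  # covered by >= 10 reads
print(f"{breadth_any:.1%} at >=1x, {breadth_10x:.1%} at >=10x")
```

Raising the threshold makes the metric stricter, which is why any breadth figure is only meaningful together with the depth cutoff used to compute it.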

Next, we applied COVseq to 29 SARS-CoV-2-positive RNA samples extracted from nasopharyngeal swabs and, to benchmark our method, we prepared 29 individual libraries from the same samples using a commercial kit (NEBNext). Both the breadth of SARS-CoV-2 genome coverage and the number of single-nucleotide variations (SNVs) detected were highly correlated between the two approaches.
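A comparison of this kind can be sketched as a correlation between paired per-sample values, for example the breadth of coverage obtained for each sample with the two methods. The example below uses the Pearson coefficient as an assumption (this post does not state which statistic was used), and the paired values are invented purely for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain >= 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Invented per-sample breadth values for two methods (illustration only).
method_a = [0.99, 0.97, 0.85, 0.60, 0.98]
method_b = [0.98, 0.96, 0.88, 0.65, 0.99]
r = pearson(method_a, method_b)
print(f"Pearson r = {r:.3f}")
```

The same function applies unchanged to per-sample SNV counts, since it only needs one pair of numbers per sample.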

Encouraged by these results, we demonstrated the reproducibility of COVseq and its sensitivity in detecting known VOC by preparing three replicate libraries from an additional 95 samples, including 7 samples suspected to contain the Alpha variant based on a routine PCR test. Both the breadth of SARS-CoV-2 genome coverage and the SNVs identified were extremely highly correlated between replicates. Furthermore, all 7 samples suspected to contain the Alpha variant were correctly assigned to the B.1.1.7 Pangolin lineage.

Lastly, we performed a comparative cost analysis to assess the applicability of COVseq in mass-scale genomic surveillance programs. We compared COVseq to readily available commercial kits and showed that our method reduces reagent costs to ~$15 per sample (excluding sequencing costs), which is around 5 times cheaper than the commercial kits that we examined.
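As a back-of-the-envelope illustration of what these numbers mean at scale, the sketch below multiplies the per-sample figures by a batch of 384 samples, the maximum pool size mentioned above. The commercial-kit cost is not quoted in this post; it is only implied by the "around 5 times" ratio, so treat it as an assumption. Sequencing costs are excluded, as in the text.

```python
def reagent_cost(n_samples, cost_per_sample):
    """Total reagent cost in USD for a batch of samples."""
    return n_samples * cost_per_sample

COVSEQ_PER_SAMPLE = 15.0   # approximate figure from the text (~$15)
FOLD_DIFFERENCE = 5        # "around 5 times cheaper", from the text
# Assumption: implied kit cost, derived from the ratio rather than quoted.
implied_kit_per_sample = COVSEQ_PER_SAMPLE * FOLD_DIFFERENCE

n_samples = 384  # one full pool of barcoded samples
covseq_total = reagent_cost(n_samples, COVSEQ_PER_SAMPLE)
kit_total = reagent_cost(n_samples, implied_kit_per_sample)
savings = kit_total - covseq_total
print(f"COVseq: ${covseq_total:,.0f}; implied kit: ${kit_total:,.0f}; "
      f"difference: ${savings:,.0f}")
```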

In conclusion, we have developed a simple and cost-effective method that enables genomic surveillance of the still ongoing SARS-CoV-2 pandemic. Our approach can be used to trace the phylogenetic evolution of the virus and to detect newly emerging VOC that may have higher infectivity or pathogenicity, or that might confer resistance to existing vaccines. COVseq can also be adapted to other viruses, such as influenza viruses and DNA viruses.

If you want to start using COVseq and need expert advice, just contact us at nicola.crosetto@gmail.com. We look forward to helping you!

Magda Bienko

Principal Investigator, Karolinska Institute