Single-cell RNA-seq is worth long-read sequencing

Consider the huge number of cells that make up a human or a mouse body (~10^13 and ~10^10 cells, respectively). To understand physiology and pathophysiology, it is crucial to shed light on the diversity of cells that constitute an organism. Projects such as the Human Cell Atlas or the Tabula Muris aim to catalog all existing cell types and cell states by measuring gene expression in a large number of human or murine organs. These atlases will serve as global positioning systems for next-generation biologists, providing answers to many questions: How many cell types are there? What are the differences between identical cell types from different organs? How do they communicate with each other? ...

Several technological breakthroughs have made such atlases feasible: (1) progress in microfluidics, which allows the isolation of thousands of single cells [1,2,3]; (2) the development of highly sensitive protocols that enable sequencing library preparation from the tiny amount of nucleic acid present in a single cell [4]; (3) advances in computational biology, which led to pipelines capable of analyzing, exploring and visualizing large gene expression datasets containing billions of measurements [5, 6].

There are now many protocols to analyze not only transcriptomes (scRNA-seq), but also somatic mutations, DNA methylation, chromatin accessibility, epigenetic profiles and copy number variations at single-cell resolution. ScRNA-seq is currently the single-cell omics approach with the highest throughput: it allows the transcriptomes of thousands of cells to be analyzed in parallel.

One limitation of high throughput scRNA-seq approaches is that they only generate 5’ or 3’ sequence information. The data resemble the SAGE (serial analysis of gene expression) data generated in the early days of high throughput transcriptome analysis. Information on splicing and sequence heterogeneity is lost for most of the transcript. Ironically, most single-cell RNA sequencing protocols do synthesize full length cDNA, but it is then fragmented for short-read sequencing. Since only the 3’ end of the transcript is tagged with a cell barcode and a UMI (unique molecular identifier) during reverse transcription, internal and 5’ fragments cannot be associated with a cell and their sequence information is lost.

An obvious way to obtain full length transcriptome sequence information is to skip the cDNA fragmentation and to sequence the entire cDNA on a long-read sequencer. High throughput scRNA-seq typically requires a huge number (> 10^8) of sequencing reads. Oxford Nanopore PromethION sequencing is, in our opinion, currently by far the best suited technology to provide the required throughput at a reasonable cost. The principal challenge was to accurately assign Nanopore reads, which have a mean error rate of about 5%, to a cell and a UMI.
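To give a feel for why a 5% error rate makes this assignment hard, here is a minimal Python sketch. It is not the ScNaUmi-seq algorithm described in [7]; it is a generic illustration that assumes independent per-base errors, a hypothetical tag of 16 barcode bases plus 12 UMI bases, and a simple edit-distance match against a list of known barcodes. All function names, the tag lengths and the `max_distance` threshold are illustrative assumptions. Under these assumptions, only about 24% of 28-base tags would be read without any error, so exact matching alone would discard most reads.

```python
from typing import Iterable, Optional


def error_free_probability(tag_length: int, per_base_error: float = 0.05) -> float:
    """Chance that a tag is read without any error.

    Assumes independent per-base errors, which is a simplification.
    """
    return (1.0 - per_base_error) ** tag_length


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def assign_barcode(observed: str,
                   known_barcodes: Iterable[str],
                   max_distance: int = 2) -> Optional[str]:
    """Return the single closest known barcode within max_distance.

    Returns None when no barcode is close enough or when two barcodes
    are equally close (an ambiguous read). The threshold is hypothetical.
    """
    best, best_distance, tie = None, max_distance + 1, False
    for candidate in known_barcodes:
        d = edit_distance(observed, candidate)
        if d < best_distance:
            best, best_distance, tie = candidate, d, False
        elif d == best_distance and d <= max_distance:
            tie = True
    return None if tie else best


if __name__ == "__main__":
    # Hypothetical barcodes; real ones would come from the experiment.
    known = ["ACGTACGTACGTACGT", "TTTTGGGGCCCCAAAA"]
    print(round(error_free_probability(16 + 12), 3))
    print(assign_barcode("ACGTACGAACGTACGT", known))  # one substitution
    print(assign_barcode("ACGTACGACGTACGT", known))   # one deletion
```

Edit distance is used here rather than Hamming distance because Nanopore errors include insertions and deletions, which shift every downstream base. Rejecting ties trades some yield for fewer misassigned reads.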

We therefore developed a method, ScNaUmi-seq, that enables this assignment [7]. Experiments performed on embryonic mouse brain show that this approach can generate high accuracy full length sequence information at high sequencing depths. It allows splicing and single-nucleotide variations (such as RNA editing) to be defined at single-cell resolution. The approach is easy to implement, since it only requires long-read sequencing of an aliquot of the unfragmented cDNA generated in the standard scRNA-seq workflow.
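Once each long read carries a cell barcode, a UMI and an isoform label, isoform abundance per cell can be obtained by counting distinct UMIs rather than reads, so that PCR duplicates of the same molecule are counted once. The sketch below illustrates that counting step only; the input format, the function name and the isoform labels are hypothetical, and it is not the published pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical input record: (cell barcode, UMI, isoform label)
Read = Tuple[str, str, str]


def count_isoform_molecules(reads: Iterable[Read]) -> Dict[str, Dict[str, int]]:
    """Count distinct UMIs per isoform within each cell."""
    umis = defaultdict(set)
    for cell, umi, isoform in reads:
        # Reads sharing cell, isoform and UMI collapse into one molecule.
        umis[(cell, isoform)].add(umi)

    counts: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (cell, isoform), seen in umis.items():
        counts[cell][isoform] = len(seen)
    return dict(counts)


if __name__ == "__main__":
    example_reads = [
        ("cellA", "U1", "isoform_1"),
        ("cellA", "U1", "isoform_1"),  # duplicate read of the same molecule
        ("cellA", "U2", "isoform_1"),
        ("cellA", "U3", "isoform_2"),
        ("cellB", "U1", "isoform_2"),
    ]
    print(count_isoform_molecules(example_reads))
```

A per-cell table of this kind is the sort of input from which per-cell-type isoform proportions could then be summarized.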

Full-length transcript sequence information should enrich single-cell studies and will surely enhance newer versions of cell atlases. Another field where we expect an impact is oncology, where ScNaUmi-seq provides a way to analyze the mutational landscape of tumors.

Figure: Full length transcriptome sequence information at single-cell resolution with ScNaUmi-seq. Top: t-SNE representation of single-cell expression in mouse fetal brain for the two main RNA isoforms of clathrin light chain A (Clta). Bottom: exon structure of the two major isoforms, and bar graph of the abundance of the different Clta isoforms in different cell types (1, Cajal-Retzius; 2, radial glia; 3, cycling radial glia; 4, intermediate progenitor; 5, immature GABAergic; 6, mature GABAergic; 7, immature glutamatergic; 8, mature glutamatergic).

[1] Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. (2017) doi:10.1038/ncomms14049.

[2] Wilbrey-Clark, A., Roberts, K. & Teichmann, S. A. Cell Atlas technologies and insights into tissue architecture. Biochem. J. (2020) doi:10.1042/bcj20190341.

[3] Hagemann-Jensen, M. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat. Biotechnol. (2020) doi:10.1038/s41587-020-0497-0.

[4] Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat. Rev. Genet. (2019) doi:10.1038/s41576-019-0150-2.

[5] Lähnemann, D. et al. Eleven grand challenges in single-cell data science. Genome Biol. (2020) doi:10.1186/s13059-020-1926-6.

[6] Luecken, M. D. & Theis, F. J. Current best practices in single‐cell RNA‐seq analysis: a tutorial. Mol. Syst. Biol. (2019) doi:10.15252/msb.20188746.

[7] Lebrigand, K., Magnone, V., Barbry, P. & Waldmann, R. High throughput error corrected Nanopore single cell transcriptome sequencing. Nat. Commun. 11, 4025 (2020) https://rdcu.be/b6dJw

Pascal BARBRY

Research Director, Université Côte d'Azur & CNRS
