Mapping DNA sequence to function enables engineering of novel genetic parts

Here I share the motivation, findings and bigger questions related to our study of the function of thousands of designed DNA sequences. We used a novel long-read sequencing and computational analysis workflow to reveal DNA design principles and engineer novel genetic parts that regulate transcription.

Finding a needle in a hyper-astronomically large haystack

Complexity is encoded into life at all levels, and this is particularly true for DNA. Even for short DNA sequences, the number of possible sequences frequently exceeds the gigantic numbers used in astronomy, so these genetic possibility spaces have been described as hyper-astronomical [1]. Synthetic biologists therefore face a challenge: how do you design a functional DNA sequence from scratch in a space that large?

Designing a novel functional DNA sequence in hyper-astronomical genetic space is like trying to spot a meteor: you need to look far and wide (Image via Wikimedia Commons)
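
To make the scale of this space concrete, here is a small back-of-the-envelope sketch. It counts the possible DNA sequences of a given length (four bases per position) and compares that with 10^80, a commonly quoted order-of-magnitude estimate for the number of atoms in the observable universe. That estimate and the function names are my own choices for illustration, not figures from the paper.

```python
# Rough comparison of DNA sequence space with an "astronomical" number.
# Assumption: 10**80 is used as an order-of-magnitude estimate of the number
# of atoms in the observable universe; it is only a yardstick.
ATOMS_IN_OBSERVABLE_UNIVERSE = 10**80


def sequence_space_size(length: int) -> int:
    """Number of distinct DNA sequences of this length (A, C, G or T per position)."""
    return 4**length


def shortest_length_exceeding(limit: int) -> int:
    """Smallest sequence length whose sequence space is larger than `limit`."""
    length = 0
    while sequence_space_size(length) <= limit:
        length += 1
    return length


for n in (10, 50, 100, 150):
    # The digit count of the integer gives the order of magnitude.
    magnitude = len(str(sequence_space_size(n))) - 1
    print(f"{n:>4} nt: roughly 10^{magnitude} possible sequences")

print(shortest_length_exceeding(ATOMS_IN_OBSERVABLE_UNIVERSE))
```

Under this estimate, a sequence of only 133 nucleotides already has more possible variants than there are atoms in the observable universe.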

One strategy, which we took in our recent Nature Communications article, is to develop methods that test the function of many variants of a sequence, with a view to using those data to predict function from sequence. For our case study, we focused on a well-characterised short sequence motif: the intrinsic terminator.

Terminators are valves

Why did we focus on intrinsic terminators? Terminators regulate transcription, the first stage at which the complexity encoded in DNA manifests as physical molecules. Transcription, and therefore termination, are fundamental processes from which living organisms arise. Transcription of DNA to an RNA molecule is initiated and terminated by common sequence motifs in the DNA: the promoter and the terminator respectively. Whilst the variability of promoter strength is well known, variability in terminator strength is currently in the spotlight. Lalanne et al. [2] and Dar et al. [3] recently revealed that terminators rarely stop transcription completely. Instead, termination is normally incomplete, and it tunes the number of RNA polymerases (RNAPs) that continue to transcribe past a gene. We therefore consider terminators as valves which, like a valve in a pipe, influence the flow of transcribing RNAPs along DNA.

Terminators can be thought of as valves that can shape the proportion of transcribing RNA polymerases which pass by (Image courtesy of Dr. T. E. Gorochowski)

Incomplete termination results in the production of diverse transcript isoforms from the same promoter. Each isoform has a different length, and terminators determine the ratios of these transcripts which, along with translational regulation, dictate the protein stoichiometries in cells. Besides the usefulness of terminators, we selected them because a new technology [4] enabled us to develop a method to measure the function of large libraries of terminators simultaneously, which could provide the datasets necessary for predictive design of terminators. In this work we successfully generated such a dataset, and future work could involve developing predictive algorithms. Being able to predict terminator function from sequence was important for us since terminators are used in synthetic biology (the subject of my PhD) to design genetic circuits that engineer the behaviour of microorganisms [5]. This carries great responsibility, though I’ll come to that at the end.

A method for multiplexed characterisation of RNA parts, devices and circuits 

The simple and fast method that we have published can be used to assemble and characterise large libraries of transcriptional genetic parts in under a week. First, the library is constructed by combinatorially assembling short DNA oligonucleotides using DNA ligase. This assembly method is fast and cheap, though like other DNA assembly methods it has limitations: it leaves assembly scars and the abundance of library members is not completely even. Achieving even library coverage is an emerging challenge for DNA assembly, as it is a prerequisite for thorough library characterisation in high-throughput studies of genetic parts.
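
The combinatorial idea can be sketched in a few lines: every part from one pool is joined to every part from the next, with a fixed scar at each junction. The part names, the sequences and the `SCAR` value below are placeholders invented for illustration; they are not the oligonucleotides or scars used in the paper.

```python
from itertools import product

# Placeholder part pools. Names and sequences are invented for illustration
# and do not correspond to real library members.
PART_POOLS = [
    {"mod_A": "ATGCA", "mod_B": "TTGAC", "mod_C": "CGATA"},  # upstream modifiers
    {"term_1": "GCGCCTTTT", "term_2": "CCGGATTTT"},          # core terminators
]
SCAR = "GGAG"  # hypothetical scar sequence left at each ligation junction


def build_library(part_pools, scar):
    """Enumerate every combination of one part per pool, joined by the scar."""
    library = {}
    for combo in product(*(pool.items() for pool in part_pools)):
        name = "-".join(part_name for part_name, _ in combo)
        sequence = scar.join(part_seq for _, part_seq in combo)
        library[name] = sequence
    return library


library = build_library(PART_POOLS, SCAR)
print(len(library), "designs")
for name, sequence in library.items():
    print(name, sequence)
```

The library size is the product of the pool sizes, which is why a handful of short oligonucleotides per position quickly yields hundreds or thousands of designs. The sketch says nothing about how evenly those designs are represented after assembly, which is the practical limitation noted above.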

The entire pooled library is then transcribed using T7 RNA polymerase, which we selected since it is commonly used in a multitude of bioengineering applications. The RNA transcripts are then sequenced directly using nanopore RNA sequencing, enabling the function of each design to be characterised in one sequencing run. The long reads acquired using this method make it suitable for characterising transcription of genetic circuits, which can be many kilobases in length and are therefore unsuitable for short-read sequencing. However, we found that RNAs were frequently fragmented during library preparation, as is regularly seen in nanopore dRNA-seq studies [6]. This means that many of the truncated sequencing reads fall by the wayside, which leaves room for improvement. In the paper we propose a simple model of the fragmentation process. This model is a small part of the computational pipeline that we developed for demultiplexing sequencing reads, which we hope to share online in due course.

The steps involved in the assembly of the modular transcriptional valve library and its pooled characterisation using nanopore-based direct RNA sequencing

The secret ingredient that makes our method viable is using the terminators as “intrinsic barcodes”. These barcodes are used to find sequencing reads which match designs in the library. Our method is one of many that use sequencing as a “molecular counter” [7] and could be extended to study other transcriptional functions. Whilst this RNA sequencing approach measures the transcription of genetic parts well, there are limitations. Only function (phenotype) in the transcribed products can be measured, and the function has to be encoded on the same RNA as the barcode. After demultiplexing using the barcodes, the sequencing reads are piled up to give a transcriptional read profile, from which the termination efficiency, the proportion of transcribing RNAPs that terminate, can be calculated. Using our method, we measured the function of several libraries of terminators, revealing where termination occurs with incredible resolution: nucleotide by nucleotide.
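
As a rough illustration of the pile-up step, the sketch below turns the aligned reads for one design into a per-nucleotide read profile and estimates termination efficiency as the fractional drop in coverage across the terminator. It assumes the reads have already been demultiplexed and aligned, and the window size, coordinates and function names are hypothetical. It is not the pipeline from the paper, which also has to deal with read fragmentation.

```python
def read_profile(alignments, template_length):
    """Pile up reads given as half-open (start, end) intervals on the template."""
    # A difference array lets each read be added in constant time.
    diff = [0] * (template_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    profile, depth = [], 0
    for change in diff[:template_length]:
        depth += change
        profile.append(depth)
    return profile


def termination_efficiency(profile, term_start, term_end, window=10):
    """Estimate the fraction of transcripts that stop at the terminator.

    Assumption: mean coverage in a window just upstream of the terminator
    counts all transcripts that reached it, and mean coverage in a window
    just downstream counts the transcripts that read through.
    """
    upstream = profile[max(0, term_start - window):term_start]
    downstream = profile[term_end:term_end + window]
    if not upstream or not downstream:
        raise ValueError("terminator needs flanking sequence on both sides")
    mean_up = sum(upstream) / len(upstream)
    mean_down = sum(downstream) / len(downstream)
    if mean_up == 0:
        raise ValueError("no reads cover the region upstream of the terminator")
    return 1 - mean_down / mean_up


# Toy data: 8 transcripts end inside the terminator (positions 40 to 60),
# 2 read through to the end of a 100 nt template.
alignments = [(0, 55)] * 8 + [(0, 100)] * 2
profile = read_profile(alignments, template_length=100)
print(round(termination_efficiency(profile, term_start=40, term_end=60), 3))
```

Because the profile is kept at single-nucleotide resolution, the same array can also be inspected position by position to see where within the terminator the coverage drops.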

We chose to develop our method for characterisation of genetic parts in vitro in order to optimise the assay in a simple setting and offer tools for bioengineering in vitro. However, yes, we are looking into characterising libraries in vivo. We would do this by expressing the library of genetic constructs in cells, sorting the cells based on their fluorescent phenotypes using flow cytometry and finally using DNA sequencing to measure the abundance of each design within each sorted cell population [8]. This approach has its own limitations: it would not give the nucleotide resolution of transcriptional termination that we have been able to measure in vitro. Furthermore, the measurements would arise from more than just termination, for example mRNA degradation would also have an influence. Nonetheless, this extension of our approach would similarly enable collection of large datasets to inform genetic engineers as they design novel functional DNA and RNA sequences. 

New genetic tools for engineering transcript isoforms in vitro

We used our method to characterise the transcriptional function of a total of 1780 valves in vitro. These valves offer the ability to tune the stoichiometry of transcript isoforms from consecutive genes from 1:1 to 11:1. Using an iterative process, we designed and tested three libraries of valves. Our study highlights the importance of the region upstream of the core-terminator hairpin for determining termination efficiency. Depending upon the sequence, this region can be used to tune termination or to insulate it from sequence further upstream. Our results indicate that sequence motifs in this region can influence termination by interacting with the core-terminator through structure or base-pairing. Synthetic biologists designing genetic circuits should bear in mind that terminator function can change as upstream genes are varied.
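
One way to relate termination efficiency to isoform stoichiometry is a deliberately simple two-gene model, sketched below. It assumes that every transcript starts at the same promoter and contains the upstream gene, and that only transcripts that read through the valve contain the downstream gene. The model and the function names are my own simplification for illustration, not a calculation taken from the paper.

```python
def isoform_ratio(termination_efficiency):
    """Upstream:downstream transcript ratio for a valve between two genes.

    Assumed model: all transcripts carry the upstream gene, and a fraction
    (1 - termination_efficiency) reads through to carry the downstream gene.
    """
    if not 0 <= termination_efficiency < 1:
        raise ValueError("termination efficiency must be in [0, 1)")
    return 1 / (1 - termination_efficiency)


def efficiency_for_ratio(ratio):
    """Termination efficiency needed for a target upstream:downstream ratio."""
    if ratio < 1:
        raise ValueError("under this model the ratio cannot be below 1:1")
    return 1 - 1 / ratio


for target in (1, 2, 5, 11):
    print(f"{target}:1 needs a termination efficiency of {efficiency_for_ratio(target):.2f}")
```

Under this model a 1:1 ratio corresponds to a valve that lets everything through, and an 11:1 ratio corresponds to a termination efficiency of about 0.91.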

The libraries of valves that were characterised can be used to shape the stoichiometry of biomolecules transcribed from arrays (Image courtesy of Dr. T. E. Gorochowski)

Our results also hint at several little known yet important properties of T7 RNAP. Firstly, we observed a general decrease in transcription from the 5’- to the 3’-end of the DNA template. This is not observed in the RNA calibration strand control, which is not transcribed. This could indicate that there is a slow fall-off of T7 RNAP during transcription. Furthermore, two independent results suggest that termination decreases as T7 RNAP traffic increases. At a 5-fold lower concentration of T7 RNAP, termination increased for some terminators. Furthermore, for all terminators that were tested, termination efficiency decreases the closer that terminators are to the promoter. Taken together this is a glimpse of the molecular insights that the single nucleotide resolution offered by pooled sequencing assays can bring. It shows the unforeseeable context-dependency of the function encoded in genetic parts [9].

Using transcriptional valves to regulate an array of CRISPR guide RNAs

Finally, we use some of the engineered valves to regulate the production of biomolecules in vitro. This also uses a multiplexed assay, which could easily be extended to study the transcription of thousands of arrays simultaneously. However, DNA synthesis limitations have to be overcome for designing arrays, which are much longer sequences than valves. The main limitations are the high cost and that repeated DNA sequences cannot be synthesised, meaning that genetic parts cannot be reused in the same array. Fortunately, the latter can often be overcome in biology, where there are many alternative sequences encoding the same genetic function. In our case, we drew from the diversity of valves, CRISPR handles and guide RNAs to design arrays that could be synthesised. To overcome high costs, DNA assembly of less expensive short oligonucleotides with unique molecular identifiers [10] placed in areas unlikely to influence function may be preferable. Our experimental and computational methods were easily adapted for characterisation of the transcriptional function encoded in all arrays in a single sequencing run. This demonstrates how transcriptional valves can be used to regulate arrays transcribing many kinds of biomolecules.

Innovating responsibly

Developing applications of the transcriptional valves raises the question of how this could be done equitably and responsibly. Throughout the PhD, my awareness of the role of technology in society and the living world has grown [11]. Engineering DNA, part of the fabric of life, carries great responsibility to all living creatures. The interdependence of living systems shows that this concerns not just me but all living things [12]. The same goes for technologies, which influence and are influenced by politics, economic systems, ecological systems and more [13].

Technologies arise from and also influence politics [13], ecologies, economies, wellbeing and more

With so many novel biotechnologies developed in the past century, asking how they can be used and developed responsibly, though hard to contemplate, could help shape a healthy living world for all. A framework for responsible innovation exists, and it encourages anticipating the future impacts of a technology, reflecting upon them, engaging with and deliberating on them, and acting to influence the direction of the research [14]. To enable innovators to do this, tools for assessing how technologies can enable us to live well [15] and more equitable approaches to entrepreneurship can help.

This blog was written by Matthew Tarnowski and was inspired by the work that we have published here: Tarnowski, M.J., Gorochowski, T.E. Massively parallel characterization of engineered transcript isoforms using direct RNA sequencing. Nat Commun 13, 434 (2022).


  1. Louis, A. A. Contingency, convergence and hyper-astronomical numbers in biological evolution. Stud. Hist. Philos. Biol. Biomed. Sci. 58, 107–116 (2016).
  2. Lalanne, J.-B. et al. Evolutionary Convergence of Pathway-Specific Enzyme Expression Stoichiometry. Cell 173, 749–761.e38 (2018).
  3. Dar, D. et al. Term-seq reveals abundant ribo-regulation of antibiotics resistance in bacteria. Science 352, aad9822 (2016).
  4. Kono, N. & Arakawa, K. Nanopore sequencing: Review of potential applications in functional genomics. Dev. Growth Differ. 61, 316–326 (2019).
  5. Nielsen, A. A. K. et al. Genetic circuit design automation. Science 352, aac7341 (2016).
  6. Grünberger, F., Ferreira-Cerca, S. & Grohmann, D. Nanopore sequencing of RNA and cDNA molecules expands the transcriptomic toolbox in prokaryotes. doi:10.1101/2021.06.14.448286.
  7. Liszczak, G. & Muir, T. W. Nucleic Acid-Barcoding Technologies: Converting DNA Sequencing into a Broad-Spectrum Molecular Counter. Angew. Chem. Int. Ed Engl. 58, 4144–4162 (2019).
  8. Gorochowski, T. E. & Ellis, T. Designing efficient translation. Nat. Biotechnol. 36, 934–935 (2018).
  9. Cardinale, S. & Arkin, A. P. Contextualizing context for synthetic biology – identifying causes of failure of synthetic biological systems. Biotechnol. J. 7, 856–866 (2012).
  10. Karst, S. M. et al. High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing. Nat. Methods 18, 165–169 (2021).
  11. Pansera, M., Owen, R., Meacham, D. & Kuh, V. Embedding responsible innovation within synthetic biology research and innovation: insights from a UK multi-disciplinary research centre. Journal of Responsible Innovation (2020) doi:10.1080/23299460.2020.1785678.
  12. Trinh, P., Zaneveld, J. R., Safranek, S. & Rabinowitz, P. M. One Health Relationships Between Human, Animal, and Environmental Microbiomes: A Mini-Review. Front Public Health 6, 235 (2018).
  13. Winner, L. The Whale and the Reactor: A Search for Limits in an Age of High Technology, Second Edition. (University of Chicago Press, 2020).
  14. Owen, R. et al. A Framework for Responsible Innovation. Responsible Innovation 27–50 (2013) doi:10.1002/9781118551424.ch2.
  15. Vetter, A. The Matrix of Convivial Technology – Assessing technologies for degrowth. Journal of Cleaner Production vol. 197 1778–1786 (2018).

Matthew Tarnowski

PhD, University of Bristol