Back in 2014, Alex Green of the Collins lab found that he could switch messages for bacterial protein expression on and off with a simple lock-and-key mechanism [1,2]. These protein-coding messages are carried by messenger ribonucleic acids (mRNAs) and can be locked using portions of the mRNA molecule itself. The lock can then be opened again with a corresponding key. In nature the key is often a small molecule or protein, but Alex designed a lock whose key was a separate, completely programmable RNA molecule. This made the system vastly more versatile and engineering-friendly. Named after its mechanism of action, the toehold switch has since become a staple of RNA-based synthetic biology.
However, precisely how toehold switches, and RNA-based switches in general, work remained incompletely understood.
Ribonucleic acids can form knots or hairpins that lock out interacting molecules. If an mRNA contains such a hairpin lock at the beginning of a gene, the ribosome won't initiate translation. Controlled unlocking of the mRNA is an attractive way to make a genetic switch, since it avoids the complexity of regulating gene expression at the level of transcription. But for the longest time, people who used toehold switches were stymied by the fact that not every lock actually turned off protein expression, and not every key opened its designed lock. Hundreds of switches were made and tested, but rules for what made a good lock-and-key pair couldn’t be clearly identified. We suspected this was because of the complexity of the design. Just one switch design can have over 10¹⁸ different possible variants! This gives us a lot of flexibility but also makes it hard to know whether any given switch will actually work.
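To get a feel for how quickly the numbers grow, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, a stretch of 30 freely designable nucleotides where each position can be any of the four RNA bases; this is not necessarily how the figure quoted above was derived, and the function name `num_variants` is hypothetical.

```python
# Illustrative count of possible sequences for freely designable RNA positions.
# The 30-nt length is a hypothetical example, not the study's design constraints.
RNA_ALPHABET = "ACGU"


def num_variants(free_positions: int) -> int:
    """Number of distinct sequences when every position can be A, C, G or U."""
    return len(RNA_ALPHABET) ** free_positions


n = num_variants(30)
print(f"{n:.3e}")  # 1.153e+18
```

Every extra free position multiplies the count by four, which is why exhaustively testing a design space like this is out of reach.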
To better address the issue, we designed and tested a much larger dataset of ~100,000 new lock-and-key combinations to determine the rules for building effective and useful genetic switches.
We measured the extent to which each lock turned expression off, and how much its corresponding key turned gene expression back on. Dividing the ON level by the OFF level gave us each switch’s dynamic range, or on/off ratio. What we found surprised us. Most of the common approaches for analyzing RNA knots and hairpins based on physical thermodynamic and kinetic models (e.g., NUPACK) were completely useless for predicting whether a lock-and-key combination worked. For a panel of thirty predicted thermodynamic parameters, the highest R² against our measured on/off ratio was only 0.04. Even when we fitted all those features jointly with a logistic regression model, the R² only reached 0.11. For all practical purposes, it might as well have been zero.
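As a minimal sketch of the two calculations in this paragraph, the snippet below computes an on/off ratio for each switch, scores single features by R² (here taken as the squared Pearson correlation), and picks the best-scoring feature from a panel. The data, feature names and helper functions are hypothetical, this is not the paper's analysis pipeline, and the joint multi-feature fit is not shown.

```python
def on_off_ratio(on_signal: float, off_signal: float, floor: float = 1e-9) -> float:
    """Dynamic range: expression with the key (ON) divided by expression without it (OFF)."""
    # The floor guards against division by zero for a perfectly silent OFF state.
    return on_signal / max(off_signal, floor)


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between one feature and the measured response."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature (or response) explains no variance
    return sxy * sxy / (sxx * syy)


def best_single_feature(features: dict[str, list[float]], response: list[float]) -> tuple[str, float]:
    """Return (name, R^2) for the feature that best correlates with the response."""
    scores = {name: r_squared(values, response) for name, values in features.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data for four switches.
on = [120.0, 40.0, 300.0, 15.0]
off = [10.0, 20.0, 5.0, 15.0]
ratios = [on_off_ratio(a, b) for a, b in zip(on, off)]
print(ratios)  # [12.0, 2.0, 60.0, 1.0]

features = {
    "hairpin_free_energy": [-12.1, -9.8, -15.3, -7.0],  # made-up values
    "toehold_free_energy": [-3.0, -3.1, -2.9, -3.0],    # made-up values
}
print(best_single_feature(features, ratios))
```

With thirty real features in place of the two toy ones, `best_single_feature` returns the kind of "highest R²" number quoted above.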
To capture the variance in this highly complex data, we knew we needed more firepower and flexibility. We set aside standard thermodynamic analyses of RNA molecules and instead drew from the field of deep learning. When we fed our data to a variety of neural networks (a class of models known for their strength in pattern recognition) and let them learn underlying patterns straight from the RNA sequence, they performed ten times better than any of the individual thermodynamic parameters, reaching an R² of 0.43. With the help of collaborators in an accompanying paper, we were able to use these models to experimentally improve the performance of broken lock-and-key pairs.
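Letting a network learn straight from the RNA sequence first requires turning letters into numbers. One common choice, assumed here for illustration rather than taken from the paper, is one-hot encoding, where each nucleotide becomes a length-4 indicator vector. The function name `one_hot` is hypothetical, and the network itself is not shown.

```python
# Hypothetical helper: one-hot encode an RNA sequence as a (length x 4) matrix,
# a common input format for sequence-based neural networks.
BASES = "ACGU"


def one_hot(seq: str) -> list[list[int]]:
    seq = seq.upper().replace("T", "U")  # also accept DNA-style input
    rows = []
    for base in seq:
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r}")
        # One column per base in A, C, G, U order; exactly one 1 per row.
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


print(one_hot("GAUC"))
# [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
```

The encoding makes no assumptions about folding, so any structural rules the network uses have to be learned from the data.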
One downside of using neural networks is that you can’t always tell how the model is making its guesses. While physical thermodynamic models like NUPACK had failed at predicting our data, they were at least based on fully defined, rational hypotheses of how RNA folds into hairpins and knots. Neural networks, by contrast, tend to be black boxes: sequence in, prediction out, and it’s hard to tell how the sausage gets made. To help with this, we trained our neural network models on a particular type of input, a complementarity matrix, in which an RNA is represented not as a string of letters but as a grid where its potential secondary structures show up as diagonal lines. To our delight, by looking at attention maps of these matrices we could see which kinds of hairpins and knots our models based their guesses on. These maps were visually intuitive and analogous to common formats for visualizing RNA secondary structure. Here is a quick visual comparison with NUPACK's user interface:
Using this approach we discovered that for most of the locks that failed to turn off protein expression, our neural network models were focusing on secondary structures outside of the intended hairpin, meaning that unintended structures had worked their way into the design. This was something that had been previously hypothesized, but hadn’t been successfully predicted by thermodynamic models. We hope this approach can be applied to help shed light on the inner workings of other deep learning analyses in RNA biology.
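The complementarity-matrix input and the idea of checking whether a model looks outside the intended hairpin can be sketched in a few lines. This is a simplified illustration under stated assumptions: it marks Watson-Crick and G-U wobble pairs as complementary with plain 0/1 entries, and it treats the attention map as any non-negative N x N weight matrix. The helper names and the off-target score are hypothetical, not the paper's exact encoding or metric.

```python
# Pairs treated as complementary in this sketch (an assumption):
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def complementarity_matrix(seq: str) -> list[list[int]]:
    """Return an N x N matrix with M[i][j] = 1 if base i could pair with base j."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    return [[1 if (seq[i], seq[j]) in PAIRS else 0 for j in range(n)] for i in range(n)]


def off_target_fraction(attention: list[list[float]], intended_pairs: set[tuple[int, int]]) -> float:
    """Fraction of total attention weight falling outside the intended stem.

    attention: N x N non-negative weights over the complementarity matrix.
    intended_pairs: (i, j) index pairs of the designed hairpin stem.
    """
    total = sum(sum(row) for row in attention)
    if total == 0:
        return 0.0
    # Count each intended pair in both orientations, since the matrix is symmetric.
    cells = {(i, j) for i, j in intended_pairs} | {(j, i) for i, j in intended_pairs}
    on_target = sum(attention[i][j] for i, j in cells)
    return 1.0 - on_target / total


seq = "GGGAAACCC"
stem = {(0, 8), (1, 7), (2, 6)}  # designed G-C pairs of this toy hairpin
M = complementarity_matrix(seq)
print([M[i][j] for i, j in sorted(stem)])  # [1, 1, 1]

focused = [[0.0] * len(seq) for _ in seq]
for i, j in stem:
    focused[i][j] = focused[j][i] = 1.0
uniform = [[1.0] * len(seq) for _ in seq]
print(off_target_fraction(focused, stem))            # 0.0
print(round(off_target_fraction(uniform, stem), 3))  # 0.926
```

In the toy hairpin, the three designed pairs line up as a short diagonal streak, while other 1s in the matrix mark alternative pairings that could form unintended structures. An attention map concentrated on the designed stem scores zero; one spread across the whole matrix scores close to one.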
Read the paper here: https://www.nature.com/articles/s41467-020-18677-1
1. A. A. Green, P. A. Silver, J. J. Collins, P. Yin. "Toehold switches: de-novo-designed regulators of gene expression". Cell vol. 159, 925-939 (2014).
2. K. Pardee, A. A. Green, T. Ferrante, D. E. Cameron, A. DaleyKeyser, P. Yin, J. J. Collins. "Paper-based synthetic gene networks". Cell vol. 159, 940-954 (2014).