Trained as a chemist, I became interested in epigenetics during my Ph.D. in Dr. Chuan He’s lab at the University of Chicago, not only because the versatile chemical groups found in epigenetic modifications offer a wonderland for a chemist to play with in an otherwise bland genome, but also because these modifications are very important in biology. The two major epigenetic modifications in human DNA are 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). Both are derived from cytosine, and they play crucial roles in a broad range of biological processes, from gene regulation to normal development. More importantly, changes in the abundance and location of 5mC and 5hmC are associated with various diseases, and abnormal 5mC and 5hmC patterns are hallmarks of cancer. As one can imagine, then, being able to map 5mC and 5hmC is invaluable for both basic research and clinical applications. In particular, one of my key aims is to sequence 5mC and 5hmC in circulating cell-free DNA (cfDNA) from blood for non-invasive early cancer detection.
However, before we can do that, we need the right tools. In standard DNA sequencing, 5mC and 5hmC are both read as cytosine. Currently, the gold standard for 5mC and 5hmC sequencing is bisulfite sequencing, which is based on a 50-year-old chemical reaction. Bisulfite treatment converts unmodified cytosine to uracil, which is read as thymine after PCR amplification, so unmodified cytosines show up as a cytosine-to-thymine (C-to-T) transition, while 5mC and 5hmC are not altered. Therefore, after the bisulfite reaction, the few cytosines that remain mark the positions of 5mC and 5hmC. However, bisulfite sequencing has always been unsatisfactory for a number of reasons. First, it detects 5mC and 5hmC indirectly: because unmodified cytosine accounts for 95% of all cytosines, converting all of these to thymine leaves a genome with an imbalanced base composition. The resulting excess of thymine and shortage of cytosine cause problems in DNA sequencing and downstream data analysis, leading to errors and higher sequencing costs. Second, as anyone who has used bisulfite knows, it is a harsh chemical reaction that damages up to 99% of the DNA. This makes it challenging to use on low-input samples, such as the very limited amount of cfDNA we can obtain from blood.
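To make the imbalance concrete, here is a minimal sketch of an idealised bisulfite readout on a single toy strand. It assumes a hypothetical one-letter encoding ("C" for unmodified cytosine, "M" for 5mC, "H" for 5hmC), ignores the complementary strand, incomplete conversion and DNA damage, and is an illustration of the conversion rule only, not any real analysis tool.

```python
# Hypothetical one-letter encoding for a single DNA strand (an assumption
# for illustration): "C" = unmodified cytosine, "M" = 5mC, "H" = 5hmC.
# A, G and T are ordinary bases.
from collections import Counter

# Idealised bisulfite readout: unmodified C is deaminated to uracil and
# read as T after PCR; 5mC and 5hmC are protected and read as C.
BISULFITE = str.maketrans({"C": "T", "M": "C", "H": "C"})


def bisulfite_readout(strand: str) -> str:
    """Return the bases a sequencer would report after bisulfite treatment."""
    return strand.translate(BISULFITE)


def base_fractions(read: str) -> dict:
    """Fraction of A, C, G and T in a sequenced read."""
    counts = Counter(read)
    return {base: counts[base] / len(read) for base in "ACGT"}


strand = "ACMGTCCAHGTC"            # four unmodified C, one 5mC, one 5hmC
read = bisulfite_readout(strand)
print(read)                        # ATCGTTTACGTT
print(base_fractions(read))
```

In this toy strand, T rises from 2 of 12 bases to 6 of 12 after conversion, and only the two modified positions still read as C. Across a whole genome, where unmodified cytosine dominates, the same rule produces the low-complexity, T-rich reads described above.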
Ever since I started my own research group at the University of Oxford three years ago, I have been thinking that there must be a better way to map 5mC and 5hmC. When a very talented chemistry postdoc, Yibin Liu, joined my group in 2017, I set him our goal: to develop a method that fixes both problems of the bisulfite approach. In other words, we aimed to find a mild reaction that detects 5mC and 5hmC directly without affecting unmodified cytosine. Challenging a decades-old gold standard was a very ambitious goal, but from a chemistry point of view it was very exciting. After months of trials and screening, Yibin found a new reaction, not on 5mC or 5hmC, but on another, very minor epigenetic modification, 5-carboxylcytosine (5caC): he discovered that pyridine borane could induce a C-to-T transition on 5caC. At first, this reaction seemed to miss our goal of sequencing 5mC and 5hmC, but luckily, nature had already solved the other half of the puzzle for us. An enzyme found in human cells, called TET, naturally oxidizes 5mC and 5hmC to 5caC. So, by simply combining the TET and pyridine borane reactions, we could detect 5mC and 5hmC directly, without affecting unmodified cytosine. We named the new method TET-Assisted Pyridine borane Sequencing, or TAPS.
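The two-step logic of TAPS can be sketched the same way. This minimal model assumes a hypothetical one-letter encoding ("C" unmodified cytosine, "M" 5mC, "H" 5hmC, "K" 5caC) and treats both reactions as complete and perfectly specific on a single strand; it illustrates the conversion rules described above rather than reproducing any lab protocol or software.

```python
# Hypothetical one-letter encoding for a single strand (illustrative only):
# "C" = unmodified cytosine, "M" = 5mC, "H" = 5hmC, "K" = 5caC.

# Step 1: TET oxidises 5mC and 5hmC to 5caC; unmodified C is untouched.
TET_OXIDATION = str.maketrans({"M": "K", "H": "K"})
# Step 2: pyridine borane converts 5caC (to dihydrouracil), which is read
# as T after PCR; unmodified C is untouched.
PYRIDINE_BORANE = str.maketrans({"K": "T"})
# Standard sequencing cannot tell modified from unmodified C.
STANDARD = str.maketrans({"M": "C", "H": "C", "K": "C"})


def taps_readout(strand: str) -> str:
    """Idealised TAPS readout: TET oxidation followed by pyridine borane."""
    return strand.translate(TET_OXIDATION).translate(PYRIDINE_BORANE)


def reference(strand: str) -> str:
    """What standard sequencing reports for the same strand."""
    return strand.translate(STANDARD)


def changed_positions(strand: str) -> list:
    """Indices where the TAPS readout differs from the standard readout."""
    ref, read = reference(strand), taps_readout(strand)
    return [i for i, (r, t) in enumerate(zip(ref, read)) if r != t]


strand = "ACMGTCCAHGTC"
print(reference(strand))           # ACCGTCCACGTC
print(taps_readout(strand))        # ACTGTCCATGTC
print(changed_positions(strand))   # [2, 8]
# Pyridine borane alone leaves 5mC and 5hmC unchanged:
print("ACMGH".translate(PYRIDINE_BORANE))  # ACMGH
```

Only the two modified positions change, and the four unmodified cytosines still read as C, which is what "direct detection without affecting unmodified cytosine" means at the level of the sequencing readout.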
After developing the core chemistry, we spent almost a year perfecting it, with a great deal of help from my student Paulina Siejka-Zielińska. We showed that TAPS is a very mild reaction that causes no DNA damage and can work with very low amounts of DNA, such as cfDNA. We then applied TAPS to whole-genome 5mC and 5hmC sequencing of genomic DNA from mouse embryonic stem cells. My colleague, the computational biologist Benjamin Schuster-Böckler, helped us develop new computational analysis pipelines. We found that TAPS generates much higher-quality sequencing data, with fewer errors than bisulfite sequencing. As a result, TAPS halves the sequencing cost compared with bisulfite sequencing while still offering more comprehensive 5mC and 5hmC sequencing. TAPS data are also more than three times faster to process computationally than bisulfite data. Moreover, TAPS can easily be extended to sequence 5mC and 5hmC separately, in the variants TAPSβ and CAPS. Finally, because TAPS leaves unmodified cytosine unchanged, it is compatible with genetic analysis, such as detecting mutations and structural variations, enabling simultaneous epigenetic and genetic sequencing. This ability to integrate epigenetic and genetic analysis could substantially reduce sequencing costs by eliminating the need for separate standard whole-genome sequencing.
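Downstream, a pipeline summarises the reads covering each reference cytosine into a modification level. The sketch below is a simplified illustration, not the pipeline developed for the paper: it assumes reads from one strand are already aligned, takes the base each read reports at a single reference C, and ignores the opposite strand, base qualities and genetic variants. The function names are hypothetical. It shows how the conversion rules invert between the two methods: under TAPS a modified C reads as T, while under bisulfite it is the one that stays C.

```python
from collections import Counter
from typing import Iterable, Optional


def _c_and_t(bases: Iterable[str]) -> tuple:
    """Count reads showing C and T at one reference cytosine; other bases
    (e.g. sequencing errors) are ignored as uninformative."""
    counts = Counter(bases)
    return counts["C"], counts["T"]


def taps_level(bases: Iterable[str]) -> Optional[float]:
    """TAPS: a modified C reads as T, so level = T / (C + T)."""
    c, t = _c_and_t(bases)
    return t / (c + t) if c + t else None


def bisulfite_level(bases: Iterable[str]) -> Optional[float]:
    """Bisulfite: a modified C stays C, so level = C / (C + T)."""
    c, t = _c_and_t(bases)
    return c / (c + t) if c + t else None


# Eight hypothetical reads covering one cytosine that is modified on
# 6 of 8 DNA molecules, as each method would report them:
print(taps_level("TTTCTTCT"))       # 0.75
print(bisulfite_level("CCCTCCTC"))  # 0.75
print(taps_level("AG"))             # None (no informative reads)
```

Both methods recover the same level from the same molecules; the difference is that under TAPS only the minority of modified positions carry a T, so the rest of each read keeps its original sequence.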
We think TAPS could directly replace bisulfite sequencing as a new standard in DNA epigenetic sequencing and beyond. By making DNA epigenetic sequencing more affordable, it also makes it accessible to a wider range of academic research and clinical applications. Now my lab is busy applying TAPS to cfDNA towards our next goal: early cancer detection.
ACKNOWLEDGMENT. The author of this post is grateful to Françoise Howe for editing this blog article.
Our paper: Liu, Y., Siejka-Zielińska, P., Velikova, G., Bi, Y., Yuan, F., Tomkova, M., Bai, C., Chen, L., Schuster-Böckler, B. & Song, C.-X. Bisulfite-free direct detection of 5-methylcytosine and 5-hydroxymethylcytosine at base resolution. Nat Biotechnol. [doi: 10.1038/s41587-019-0041-2].