Software-based targeted nanopore sequencing with UNCALLED

How a class project turned into a targeted sequencing method that uses code to manipulate individual DNA molecules


In the spring of 2017 I had to propose a final project in a class taught by my advisor, Michael Schatz. I was in the first year of my PhD, and I had never really come up with a research project from scratch before. I thought nanopore sequencing was cool, and I had recently learned about DNA alignment in a class taught by Ben Langmead, the creator of Bowtie himself. A few days before the deadline I talked to my classmate, Taher Mun, about a vague idea of how to align nanopore signal to DNA references. He thought the idea was good enough, so we roped in a classmate who knew something about nanopore, Yunfan Fan, and got to work. A month or so later we cobbled together enough code, simulated data, and plots to convince my advisor that we had the beginnings of a cool algorithm that could theoretically do something useful someday. 

A DNA strand being ejected mid-sequencing via ReadUntil (source: Oxford Nanopore Technologies)

But what was that “something useful”? Since its release in 2015, the Oxford Nanopore MinION has been able to selectively stop sequencing individual reads by ejecting them from the pore, a method known as ReadUntil, which could enable software-based targeted sequencing. In 2016 Matt Loose et al. showed that you could use ReadUntil to enrich for certain sequences using a signal-to-basepair alignment algorithm called dynamic time warping (DTW). The catch was that DTW is far too slow to enrich for anything larger than a small viral genome, and nobody had used ReadUntil on anything as large as a bacterial genome. This was the useful thing that my advisor and I thought the class project could someday address.
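
To give a feel for why DTW gets slow, here is a minimal sketch of the textbook global DTW recurrence in Python. It is not the code from the 2016 ReadUntil work or from UNCALLED; the function name `dtw_cost` and the toy numbers are made up for illustration, and `reference` is assumed to be a list of expected signal levels derived from a DNA sequence.

```python
import math

def dtw_cost(signal, reference):
    """Textbook global DTW cost between two numeric sequences (illustrative only)."""
    n, m = len(signal), len(reference)
    # cost[i][j] = cheapest alignment of signal[:i] with reference[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(signal[i - 1] - reference[j - 1])
            # Either sequence may "stretch" in time: step in i, in j, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical toy values: the same levels, with the signal lingering twice as long.
print(dtw_cost([1, 1, 2, 2, 3, 3], [1, 2, 3]))
```

The nested loops fill one cell for every (signal sample, reference position) pair, so the work grows with the product of the two lengths. That is manageable for a small viral genome but quickly becomes too much for deciding, in real time, whether to eject a read when the reference is millions of bases long.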

The UNCALLED logo and outline of the algorithm

Fast forward 3.5 years and we did it! We showed that UNCALLED is able to enrich for or deplete collections of multiple bacterial genomes or large panels of human genes associated with hereditary cancer. In the process I got rare hands-on wet-lab experience, which for a CS student mostly meant watching Yunfan pipet things and waiting to press enter. Later I met Matt Loose and Alex Payne at a conference and learned they were still working on ReadUntil. They were very encouraging, but the fact that I had competition really motivated me to finish things. In the end we were ready to submit our papers within a week of each other, and agreed to co-submit to Nature Biotechnology.

Working diligently in my COVID shelter/basement office. Also featuring Electro: the computer used for all sequencing experiments in the paper. (Source: Katie Jenike)

The paper was submitted a few weeks before COVID-19 shut down all in-person lab work at Johns Hopkins. This was a great opportunity to write a ReadUntil simulator, which we added to the paper in the review process to show that we should be able to target the entire human exome. Since then I’ve been digging deeper into UNCALLED to understand what needs improvement. We plan to begin targeting RNA sequences soon, and we believe raw signal alignment could have applications outside of ReadUntil, such as identifying epigenetic modifications. I’m really excited to see where we can take this project next.

Recent work on improving the algorithm. Top: signal from the first second of the read showing mapped events (green), unmapped events (blue), and ignored events (red). Bottom: a signal-to-reference dotplot, showing the signal correctly aligned by UNCALLED (green), other alignments considered by UNCALLED (dark blue/yellow gradient), and the basecalled alignment translated into signal-space (light blue). 

Sam Kovaka

PhD Student, Johns Hopkins University

1 Comment

Ammar Husami about 2 months ago

That's very cool. I hope you adopt it for RNA and base modifications