Single-cell mapping of lineage and identity in direct reprogramming.

Biddy BA, Kong W, Kamimoto K, Guo G, Waye SE, Sun T, Morris SA. Single-cell mapping of lineage and identity in direct reprogramming. Nature. 2018.

Graphical abstract: Biddy et al., Nature 2018: Single-cell mapping of lineage and identity in direct reprogramming

The early development of all mammals had long been thought to involve a sequence of cell fate decisions along an irreversible pathway of restricted potential and increasing specialization. Over the past half-century, this notion has been challenged, leading to the birth of a discipline termed ‘reprogramming’. Reprogramming can take several forms, but at its core is the manipulation of cellular form and function; at its most extreme, it guides cells back to their most immature forms, capable of generating the full spectrum of cellular identities. My lab is focused on a form of reprogramming that is designed to convert cells from one distinct, fully mature identity directly into another, disparate, fully differentiated cell type. We favor this approach, known as ‘direct lineage reprogramming’, as a shortcut that bypasses immature fates, in an attempt to boost the speed and efficiency of cell identity engineering. Ultimately, these approaches promise practical utility in regenerative medicine: transforming accessible and abundant cells into clinically valuable cell types. These cells can, in turn, be used to repair diseased or damaged organs, or to support studies of otherwise inaccessible cell types ‘in the dish’.

Although there is much optimism for the role of cell fate reprogramming in regenerative medicine, there are some substantial limitations that we must first overcome. First, in many cases, cells only partially reprogram, reaching immature, embryonic-like states that lack full adult function. Second, reprogramming tends to be an inefficient process, with few cells successfully converting to a different identity. Understanding and overcoming these limitations would allow us to engineer cell identity precisely and, in principle, gain access to any cell type. I’d worked on this problem as a postdoc in George Daley’s lab, with Patrick Cahan. In 2014, we developed a computational tool, CellNet, that assesses cell identity and identifies factors to improve existing reprogramming protocols. We found that most published protocols resulted in the production of partially specified cell types. To explore this in more experimental detail, we focused on a reprogramming strategy developed in 2011 by the Suzuki lab, in which fibroblasts, a connective tissue cell type, are converted to hepatocytes, the most abundant cell type in the liver and highly prized in regenerative medicine. Sekiya and Suzuki coined the term ‘induced hepatocytes’, abbreviated to ‘iHeps’, for these reprogrammed cells. This remarkable conversion of cell identity, across germ layers, was achieved using a relatively simple cocktail of just two transcription factors, Foxa1 and Hnf4a, which had been teased apart from a much larger pool of candidate factors known to play roles in hepatic fate specification. It was clear from this initial study, though, that iHeps more closely resembled immature hepatocytes and could self-renew in culture, a feature not typical of mature hepatocytes. Moreover, very few fibroblasts successfully converted to iHeps, making this protocol an ideal prototype of direct conversion to assess with our CellNet platform.

We were able to quickly replicate Sekiya and Suzuki’s iHep reprogramming protocol, which we found to be very reproducible, successfully converting fibroblasts on our first attempt. After deriving these initial iHep lines, we assessed their identity using CellNet. From this analysis, based on bulk gene expression measurements collected via microarray analysis, we found that iHeps only weakly resembled liver and retained strong fibroblast gene expression signatures. The surprising result from these analyses was that iHeps also harbored intestinal identity. These findings spurred a series of experiments designed to explore the intestinal potential of the reprogrammed cells. Eventually, this led to our transplanting GFP-labelled iHeps into damaged mouse colon. It was a thrilling result to find that these cells could functionally engraft the intestine, differentiating into an array of diverse intestinal cell types. Given that Sekiya and Suzuki had previously shown that the same cells can functionally engraft liver, we proposed that these cells more closely resemble embryonic-like progenitors, renaming them ‘induced endoderm progenitors’, or 'iEPs'. The Suzuki lab has since picked up the baton to show that the addition of two further transcription factors, Gata6 and Cdx2, can drive iEPs to an intestinal stem cell state in culture, showing how versatile and exciting this reprogramming protocol is.

Mouse embryonic fibroblasts are reprogrammed to induced endoderm progenitors (iEPs) via overexpression of the transcription factors Foxa1 and Hnf4a. We previously found that transplantation of iEPs into the large intestine results in functional engraftment. The image on the right shows repopulation of an intestinal crypt by GFP-labelled iEPs (Morris et al., Cell 2014).

I launched my own independent research group just over three years ago, and we’ve continued to work on reprogramming to iEPs, as well as adding a few different cell fates to our repertoire, including macrophages, cardiomyocytes, and pluripotent stem cells. I’d always been fascinated by why iEPs have such broad potential and why so few fibroblasts successfully reprogram. Now, I’m thrilled and grateful to have a team engaged by the same questions. Reprogramming mechanisms have traditionally been challenging to study precisely because so few cells successfully convert. This is particularly true for fibroblast-to-iEP conversion, where only 1 in 100 cells fully reprograms. This had always been a caveat of our previous studies using bulk expression measurements: we were assessing a heterogeneous population of cells in which the signal from successfully reprogrammed cells was drowned out by the overwhelming presence of partially converted or unconverted cells. It was clear to us, then, that our analyses would benefit from single-cell profiling to deconstruct this heterogeneity. Back in 2014, with Patrick Cahan, I’d sequenced 300 converted cells, but this was nowhere near enough: at a conversion rate of 1 in 100, we would expect to capture only about three iEPs in a sample of that size. As I moved to establish my own lab, I knew single-cell biology would form an essential part of our analytical toolkit, but its relatively low throughput and high cost concerned me. Fortunately, just as I was starting my lab, high-throughput, affordable single-cell technologies were emerging. Specifically, we were able to establish Drop-seq in the lab quickly, thanks to the openness of the McCarroll lab in posting their working protocols and seeding a lively community. It felt as if, overnight, we had powered up to affordably sequence thousands of individual cells; it was an incredibly exciting time to build a new lab.
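The arithmetic behind that estimate can be made explicit. The short Python sketch below assumes a simple binomial model, in which each captured cell is an independent draw with a fixed 1-in-100 chance of being a fully reprogrammed iEP; that independence assumption and the function names are ours, for illustration only.

```python
from math import comb


def expected_captured(n_cells: int, efficiency: float) -> float:
    """Expected number of fully reprogrammed cells in a random sample."""
    return n_cells * efficiency


def prob_at_least(k: int, n_cells: int, efficiency: float) -> float:
    """P(X >= k) for X ~ Binomial(n_cells, efficiency).

    Computed as 1 minus the summed probabilities of capturing 0..k-1 cells.
    """
    p_below = sum(
        comb(n_cells, i) * efficiency**i * (1 - efficiency) ** (n_cells - i)
        for i in range(k)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Compare the 2014-scale experiment with Drop-seq-scale sampling.
    for n in (300, 5_000, 20_000):
        print(
            f"{n:>6} cells: expect {expected_captured(n, 0.01):.0f} iEPs, "
            f"P(>=10 iEPs) = {prob_at_least(10, n, 0.01):.4f}"
        )
```

Under this model, capturing even ten iEPs from 300 cells has a probability well under 1%, whereas with 5,000 cells it is essentially certain, which is why throughput mattered so much here.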

After establishing Drop-seq in the lab, we quickly applied it to fibroblast-to-iEP reprogramming. In hindsight, generating the data was straightforward; analyzing it was much more challenging. Using analytical platforms such as Seurat, we were able to deconstruct the heterogeneity arising during the reprogramming process. What surprised us was the sheer number of distinct transcriptional states these analyses revealed. Although we had gained better insight into this landscape of possible cell identities, we had no way to trace successfully reprogrammed cells back to their origins, or to distinguish potential reprogramming dead-ends from transition states. To this end, we needed a way to connect related cells over time, so that we could better ‘join the dots’ through the reprogramming process. We found inspiration in the Drop-seq pipeline, where each single-cell transcriptome is labelled by a cell barcode introduced during cDNA synthesis. What if we could introduce a second barcode to label a living cell, a barcode inherited by all the descendants of that cell, so that we could track their behavior over time?
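To make the two-barcode idea concrete, here is a minimal Python sketch that pairs each transcript's cell barcode with a second, heritable lineage barcode. It assumes the standard Drop-seq read 1 layout (a 12-nt cell barcode followed by an 8-nt UMI) and, hypothetically, that the lineage barcode is an 8-nt random sequence recoverable from read 2 between two fixed flanking sequences. The flanks `GGT`/`GAATTC` and the function name `tag_table` are placeholders, not the published CellTag construct or pipeline.

```python
import re
from collections import defaultdict

CELL_BC_LEN = 12  # Drop-seq read 1: cell barcode occupies nt 1-12
UMI_LEN = 8       # followed by a UMI at nt 13-20
# Placeholder pattern (assumption): an 8-nt random lineage barcode between
# two fixed flanks. Substitute the real flanking sequence of your construct.
LINEAGE_TAG = re.compile(r"GGT([ACGT]{8})GAATTC")


def tag_table(read_pairs):
    """Return {cell barcode: {lineage barcode: number of distinct UMIs}}.

    Counting distinct UMIs, rather than reads, collapses PCR duplicates of
    the same captured transcript.
    """
    umis = defaultdict(lambda: defaultdict(set))
    for read1, read2 in read_pairs:
        if len(read1) < CELL_BC_LEN + UMI_LEN:
            continue  # too short to hold a cell barcode and a UMI
        cell = read1[:CELL_BC_LEN]
        umi = read1[CELL_BC_LEN:CELL_BC_LEN + UMI_LEN]
        match = LINEAGE_TAG.search(read2)
        if match:
            umis[cell][match.group(1)].add(umi)
    return {cell: {tag: len(u) for tag, u in tags.items()}
            for cell, tags in umis.items()}


if __name__ == "__main__":
    reads = [
        ("AAAACCCCGGGG" + "TTTTAAAA", "CTAGGTACGTACGTGAATTCAA"),
        ("AAAACCCCGGGG" + "TTTTAAAA", "CTAGGTACGTACGTGAATTCAA"),  # PCR duplicate
        ("AAAACCCCGGGG" + "CCCCTTTT", "CTAGGTACGTACGTGAATTCAA"),
    ]
    print(tag_table(reads))  # {'AAAACCCCGGGG': {'ACGTACGT': 2}}
```

Because the lineage barcode travels inside an ordinary transcript, it is read out by the same cell barcode that labels the rest of that cell's expression profile, so no separate assay is needed to link lineage to identity.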

Corresponding author, Sam Morris and co-author, MD PhD student Chuner Guo, working on single-cell library preparation. Photo credit: Matt Miller

In late 2015, we began our cell-labeling pilot experiments. We opted for a lentivirus-based approach to integrate short random DNA barcodes that are expressed as transcripts within each individual cell's transcriptome. Unique combinations of these barcodes label each cell and all of its progeny; we named this approach ‘CellTagging’. We found that CellTags are easily captured as part of each single-cell transcriptome, without any modification to existing library preparation protocols. An important goal for us has been to make this method as easy and flexible as possible to implement across diverse cell types and species, and lentivirus-based integration seemed a natural choice to achieve this. This flexibility let us rapidly extend our own protocol: we CellTag cells repeatedly over the course of reprogramming, which enables the construction of lineage trees and led to some surprising results.
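Once each cell has a set of CellTags, clonally related cells can be grouped by how similar those sets are. The Python sketch below links cells whose CellTag signatures have a Jaccard similarity at or above a cutoff and reports the connected groups as clones. The 0.7 cutoff, the two-tag minimum (meant to stop a single shared tag from linking unrelated cells) and all names are illustrative assumptions; this is not necessarily the exact clone-calling pipeline used in Biddy et al.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two CellTag signatures (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def call_clones(signatures: dict[str, set[str]],
                threshold: float = 0.7,   # hypothetical cutoff
                min_tags: int = 2) -> list[list[str]]:
    """Group cells into clones by single-linkage on CellTag similarity."""
    # Cells with too few tags are too ambiguous to assign to a clone.
    cells = [c for c, tags in signatures.items() if len(tags) >= min_tags]
    parent = {c: c for c in cells}  # union-find forest

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Link every sufficiently similar pair; union-find merges transitive links.
    for a, b in combinations(cells, 2):
        if jaccard(signatures[a], signatures[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for c in cells:
        groups.setdefault(find(c), []).append(c)
    # Singletons have no detected relatives, so they are not reported.
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    sigs = {
        "cell1": {"AAGTCCTA", "TTGACCGT", "CGATAGCA"},
        "cell2": {"AAGTCCTA", "TTGACCGT", "CGATAGCA"},
        "cell3": {"AAGTCCTA", "TTGACCGT", "CGATAGCA", "GGCTTAAC"},
        "cell4": {"GGCTTAAC", "ACGTACGT"},
        "cell5": {"ACGTACGT"},  # dropped: fewer than min_tags tags
    }
    print(call_clones(sigs))  # [['cell1', 'cell2', 'cell3']]
```

Here cell3 joins the clone despite carrying an extra tag (similarity 3/4), whereas cell4 shares only one tag with it and stays unassigned. The all-pairs loop is quadratic in the number of cells, which is workable for a few thousand cells; larger datasets would call for indexing cells by shared tags first.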

In Biddy et al., using our CellTagging system, we’ve been able to define two distinct reprogramming trajectories: one leading to a successfully reprogrammed state, and one leading to a dead-end where cells start re-expressing fibroblast genes. Cells on the successful trajectory express Mettl7a1, a previously undescribed methyltransferase; adding it to our Foxa1 and Hnf4a reprogramming cocktail increases the yield of iEPs threefold. Without CellTags, and therefore without the ability to trace changes in cell identity back to ancestral cells, it would have been incredibly difficult to define these trajectories. Overall, we were surprised by how similarly related clones behaved: they have a strong ‘family resemblance’. This was curious, since reprogramming is generally thought to be stochastic; that is, the genes required for successful reprogramming are rarely engaged, which is thought to explain why cell fate conversion is so inefficient. In contrast, we find that subclones of cells derived from the same common ancestor reprogram with the same efficiency. This suggests that, at the time reprogramming factors are expressed, cells can exist in a ‘privileged state’ in which their reprogramming outcome, along with that of their progeny, is determined. If we can unlock this permissive state, it could reveal new ways to improve the efficiency and fidelity of the reprogramming process.
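The ‘family resemblance’ observation can be phrased as a testable statement: sister cells from the same clone should share reprogramming outcomes more often than random pairs of cells would. One minimal way to check this, sketched in Python below, is a permutation test that shuffles outcome labels across cells while keeping clone sizes fixed. The concordance statistic, the toy data and the function names are our assumptions for illustration, not the analysis reported in the paper.

```python
import random
from collections import defaultdict
from itertools import combinations


def sibling_concordance(clone_ids, outcomes):
    """Fraction of within-clone cell pairs that share the same outcome."""
    by_clone = defaultdict(list)
    for clone, outcome in zip(clone_ids, outcomes):
        by_clone[clone].append(outcome)
    same = total = 0
    for members in by_clone.values():
        for a, b in combinations(members, 2):
            total += 1
            same += a == b
    return same / total if total else float("nan")


def permutation_p_value(clone_ids, outcomes, n_perm=10_000, seed=0):
    """One-sided test: is observed concordance higher than chance?

    Shuffling outcomes across cells keeps clone sizes and the overall
    reprogramming rate fixed, but breaks any link between clone and outcome.
    """
    rng = random.Random(seed)
    observed = sibling_concordance(clone_ids, outcomes)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sibling_concordance(clone_ids, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: four clones of three cells, each clone uniform in outcome.
    clones = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3
    reprogrammed = [True] * 3 + [False] * 3 + [True] * 3 + [False] * 3
    observed, p = permutation_p_value(clones, reprogrammed)
    print(f"sibling concordance = {observed:.2f}, permutation p = {p:.4f}")
```

In this toy case only 6 of the C(12, 6) = 924 possible labelings make every clone uniform, so roughly 0.65% of random shuffles match the observed concordance and the estimated p-value should land near that value. Under a purely stochastic model, real data would instead show concordance close to the shuffled baseline.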

"We hope to demonstrate and expand the applicability of CellTagging, and in turn, generate interest in CellTagging among the community."
Brent Biddy, First Author.

So what is next? Within our own lab, we are interested in adopting and developing new technologies to record events early in the reprogramming process. Current single-cell RNA-sequencing approaches are limited in that we have to destroy each cell in order to analyze it. Thus, we are looking to record events such as transcription factor binding at the earliest stages of reprogramming and read them out later in the process, once we know whether or not a cell has successfully reprogrammed… at single-cell resolution, of course. In the more immediate future, we are applying CellTagging to a range of reprogramming strategies to probe the general mechanisms underpinning cell fate conversion. More broadly, we hope that CellTagging will be adopted by many labs, and we are committed to sharing this technology. This started with our publication of CellTagging as a preprint on bioRxiv in 2017. For the lab’s first paper, preprinting proved to be a fantastic decision: I believe it enabled us to integrate into the single-cell community much faster, and it has provided us with valuable feedback. We have also made our CellTag pooled libraries available via Addgene, and have shared a working protocol online, code and tutorials on GitHub, and other resources via our CellTagging portal. CellTagging is a tool that we hope to develop further with the community, and we look forward to seeing it evolve.