FLAMINGO: Understanding genome structures in the 3D space
We developed a highly scalable algorithm, FLAMINGO, to reconstruct the 3D chromosome structures at unprecedented resolution and unraveled the structural basis of chromatin interactions, long-range QTLs and high-order multi-way interactions.
What is the magic behind fitting a two-meter-long DNA sequence into a tiny human cell? The answer is relatively straightforward if you have ever seen a ball of wool: the DNA is densely packed into the cell in highly complex 3D structures. These intricate 3D chromosome structures bring genomic loci that are distal in 1D space into small 3D neighborhoods, thus providing the structural basis of various biological events and playing an essential role in cell development, cell differentiation, and gene regulation. In recent years, the toolbox for characterizing the genome-wide chromatin interaction landscape has expanded considerably and now includes ChIA-PET, Capture-C and Hi-C. These methods measure the contact frequencies between pairs of genomic loci and characterize chromatin interactions in 2D space. However, 2D chromatin contact maps cannot directly reveal the 3D chromosome structures. The rich structural information they contain must instead be mined by reconstructing the 3D chromosome structures from the 2D contact maps.
We realized that accurately reconstructing 3D chromosome structures at high resolution is challenging. First, limited by sequencing depth, 2D chromatin contact maps are extremely sparse at high resolution (more than 99% of the entries are missing at 1kb resolution), which calls for algorithms that can cope with high missing rates (Figure 1.a). Second, reconstructing 3D structures at high resolution involves assigning 3D coordinates to tens of thousands of genomic loci (Figure 1.a). As a representative example, human chromosome 1 has around 44,000 genomic loci at 5kb resolution (Figure 1.b). Placing such a large number of points simultaneously is computationally expensive and requires a highly scalable and efficient algorithm. To address these problems, we developed FLAMINGO to reconstruct high-resolution 3D chromosome structures with superior accuracy and scalability. Remarkably, FLAMINGO successfully reconstructs the structures of all 23 human chromosomes at 1kb resolution, the highest resolution to date.
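To give a feel for the scale of the problem, the short calculation below works out how many pairwise entries and how many unknown coordinates are involved for the roughly 44,000 loci quoted above. It is only a back-of-the-envelope sketch: the variable names are illustrative, and the 8 bytes per entry is an assumption about storing the contact map as dense double-precision numbers.

```python
# Back-of-the-envelope sizes for ~44,000 loci (chromosome 1 at 5kb resolution).
n_loci = 44_000

# Number of unordered locus pairs, i.e. entries in the upper triangle of the contact map.
n_pairs = n_loci * (n_loci - 1) // 2

# Unknowns to solve for: one (x, y, z) coordinate per locus.
n_coords = 3 * n_loci

# Memory for a dense contact map in gigabytes (assumption: 8 bytes per entry).
dense_gb = n_loci * n_loci * 8 / 1e9

print(f"locus pairs:        {n_pairs:,}")
print(f"unknown coordinates: {n_coords:,}")
print(f"dense map size:      {dense_gb:.1f} GB")
```

Nearly a billion possible pairwise entries constrain only 132,000 unknown coordinates, which is one way to see why a very sparse contact map can still carry enough information for reconstruction.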
The key algorithmic design of FLAMINGO borrows from low-rank matrix completion, in which the matrix under consideration can be fully represented in a low-dimensional space. Interestingly, 2D chromatin contact maps have this low-rank property: because they are induced by the 3D coordinate matrices, they have a rank of no more than five (Figure 1.a). The entries of the chromatin contact maps are therefore redundant, and the 3D chromosome structures can be accurately reconstructed from highly sparse 2D chromatin contact maps, even with only 1% of the contacts measured. Based on this idea, we designed an optimization-based framework and further boosted its scalability with parallel computing.
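The rank bound can be checked numerically. The sketch below is not FLAMINGO's code; it only illustrates the underlying geometric fact that the matrix of squared Euclidean distances between points in 3D has rank at most five, however many points there are. The random coordinates are made up, and the sketch assumes contact frequencies have already been converted to pairwise distances (a step not described here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3D coordinates for 200 loci (random, for illustration only).
n_loci = 200
coords = rng.normal(size=(n_loci, 3))

# Squared distance matrix: D[i, j] = |x_i|^2 + |x_j|^2 - 2 * x_i . x_j
sq_norms = np.sum(coords**2, axis=1)
sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * coords @ coords.T

# The first two terms each contribute rank 1 and the Gram matrix contributes
# rank at most 3, so the total rank is at most 1 + 1 + 3 = 5.
rank = np.linalg.matrix_rank(sq_dist, tol=1e-8)
print(f"matrix shape: {sq_dist.shape}, numerical rank: {rank}")
```

A 200 by 200 matrix with rank at most five is highly redundant, which is the property that low-rank matrix completion exploits to fill in missing entries.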
Exploiting the low-rank property significantly boosts the accuracy and scalability of FLAMINGO over existing algorithms. Using various orthogonal experimental observations, we extensively benchmarked the 3D chromosome structures predicted by FLAMINGO against those predicted by five state-of-the-art algorithms. Across all comparisons, FLAMINGO demonstrates the highest accuracy. In practice, FLAMINGO speeds up reconstruction by up to 10 times over existing algorithms and completes the 3D reconstruction of chromosome 1 in only 29 minutes.
Going beyond 2D contact maps, the 3D chromosome structures can help to explain long-range associations between genetic variants and gene expression (eQTLs) and to identify multi-way chromatin interactions. Since long-range eQTLs are mainly identified by association-based methods, the underlying mechanisms of SNP-gene interactions remain largely unknown. In the predicted 3D chromosome structures, we observed that statistically associated genetic variants are brought into spatial proximity with their target gene promoters by reconstructed chromatin loops. We foresee that the 3D characterization of chromosome configuration can substantially improve the interpretability of eQTLs.
Another important use of the reconstructed 3D chromosome structures is to identify higher-order multi-way interactions. Whereas 2D contact maps can only measure pairwise interactions between two genomic anchors, profiling the 3D structures enables the joint consideration of multiple anchors, leading to the prediction of multi-way chromatin interactions. Compared with 2D contact maps, the 3D structures predicted by FLAMINGO are better supported by experimentally validated multi-way chromatin interactions (SPRITE), suggesting the great potential of FLAMINGO for understanding complex high-order chromatin conformations.
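One simple way to picture how 3D coordinates allow several anchors to be considered jointly is to look for groups of loci that are all mutually close in space. The sketch below is an illustration under stated assumptions, not the procedure used by FLAMINGO: the function name find_multiway, the distance threshold, and the toy coordinates are all hypothetical.

```python
from itertools import combinations

import numpy as np


def find_multiway(coords, radius, size=3):
    """Return groups of `size` loci whose pairwise 3D distances are all below `radius`.

    Hypothetical helper for illustration; brute force, so only suitable for small inputs.
    """
    # Pairwise Euclidean distances between all loci.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    close = dist < radius

    hits = []
    for group in combinations(range(len(coords)), size):
        # Keep the group only if every pair inside it is spatially close.
        if all(close[i, j] for i, j in combinations(group, 2)):
            hits.append(group)
    return hits


# Toy coordinates (made up): loci 0, 2 and 4 sit together, loci 1 and 3 are far away.
coords = np.array([
    [0.0, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [0.2, 0.1, 0.0],
    [9.0, 9.0, 9.0],
    [0.1, 0.3, 0.1],
])

triplets = find_multiway(coords, radius=1.0)
print(triplets)
```

Loci 0, 2 and 4 are far apart in the 1D ordering but form a single spatial cluster, which a pairwise contact map would only report as three separate contacts.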
To further broaden the applicability of FLAMINGO, we developed iFLAMINGO, an integrative version of FLAMINGO, to make cross-cell-type predictions of 3D structures and to enhance the resolution of chromatin contact maps by incorporating cell-type-specific epigenomic signals. For cell types with no chromatin contact maps, iFLAMINGO predicts the 3D chromosome structures by using chromatin contact maps from other relevant cell types together with cell-type-specific epigenomic datasets. In a systematic evaluation of 30 cross-cell-type prediction tasks, iFLAMINGO consistently demonstrates improved performance. Furthermore, iFLAMINGO can improve the resolution of chromatin contact maps: taking advantage of high-resolution epigenomic signals, it can predict high-resolution chromatin structures from low-resolution chromatin contact maps. Together, these two capabilities extend the analysis of chromatin conformations to understudied cell types.
In summary, FLAMINGO and iFLAMINGO allow comprehensive characterization of 3D chromosome structures, mechanistic interpretation of long-range QTLs, and identification of high-order chromatin conformations. The rich resource of predicted 3D chromosome structures and the advanced algorithmic design of FLAMINGO will contribute substantially to the understanding of complex 3D genome organization.