Tensor-cell2cell: Unraveling the complex cell-cell communication patterns driving phenotype

We recently introduced Tensor-cell2cell, an unsupervised method based on tensor decomposition. It uncovers context-driven patterns of intercellular communication associated with different phenotypic states. In this post, we explain how we devised our approach and discuss future directions.

With the adoption of single-cell technologies, the study of cell-cell communication from gene expression has grown rapidly in the last few years. About four years ago, we started to explore ideas in this emerging field in the Lewis Lab at UC San Diego, which led us to write a review article summarizing the main approaches for inferring intercellular communication [1]. We began bouncing around the idea of how to model changes in cell-cell interactions from time-series datasets. Here is where a collaboration-oriented environment and adapting ideas from disparate fields played a crucial role: at the student seminars during the second year of our PhD program, Cameron Martino presented his project in the Knight Lab, which used tensor factorization to obtain dynamics of microbial composition across different time points and subjects [2]. This sort of approach was exactly what we were looking for to study cell-cell communication across multiple samples and find how they relate across time. We soon realized that this concept could be extended beyond time points to any context variable of interest. Existing methods can infer communication scores for each sample [1]; these scores are then arranged into a tensor for decomposition and identification of communication patterns across samples. We felt this could be a powerful approach to gain mechanistic insights into multicellular systems, as intercellular communication is not static but forms context-dependent patterns. Thus, we began to implement different tensor decomposition methods to identify the best approach for our purpose.

A natural question is why tensor decomposition is appropriate for extracting patterns of intercellular communication. One way to answer this relies on how tensor decomposition works. Arranging samples as a multidimensional tensor preserves the correlation structure across the data better than matrices do. Thus, we can decompose a tensor and extract latent patterns in a more robust manner. In particular, the CANDECOMP/PARAFAC (CP) decomposition method used by Tensor-cell2cell summarizes a tensor into a specified number of factors (R factors as in Fig. 1a), each of which contains key patterns representing properties of the data. In other words, the decomposition approximates the original tensor with a compressed tensor obtained by adding up the factors (Fig. 1b), thus summarizing the most prominent features in the dataset in an easily interpretable manner. To simplify this idea, we can use the analogy of a picture represented as a tensor (Fig. 1c). If we decompose the picture into three factors using a non-negative CP method, we get three rank-1 tensors, each of which summarizes a distinctive part of the original picture (Fig. 1d). When added together, these three parts approximately reconstruct the whole picture, capturing its most prominent components (Fig. 1e). Tensor-cell2cell analogously arranges communication scores of multiple ligand-receptor and cell pairs across different contexts into a tensor, and extracts factors that each represent a distinct communication pattern. Each factor encompasses a part of the communication tensor, representing different communication mechanisms, biological processes, or signaling pathways involving few or multiple cells and mediators (e.g. during interleukin secretion, antigen presentation, and immune-response regulation, as in Fig. 1f), while simultaneously accounting for the weight of the contribution of these parts in each of the contexts.
In this sense, each factor captures the combination of ligand-receptor and cell-cell pair interactions across contexts that represent one distinct module of communication.
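
To make the "sum of rank-1 tensors" idea concrete, here is a minimal NumPy sketch of how a CP model rebuilds a 4D communication tensor from its factor loadings. It is an illustration only, not Tensor-cell2cell's own code: the function name `cp_reconstruct`, the tensor sizes, and the random loadings are all assumptions made for this example.

```python
import numpy as np

def cp_reconstruct(factors):
    """Sum R rank-1 tensors built from the columns of each loading matrix."""
    rank = factors[0].shape[1]
    shape = tuple(f.shape[0] for f in factors)
    tensor = np.zeros(shape)
    for r in range(rank):
        # The outer product of the r-th column of every mode is one rank-1 tensor,
        # i.e. one "factor" in the sense used above.
        component = factors[0][:, r]
        for f in factors[1:]:
            component = np.multiply.outer(component, f[:, r])
        tensor += component
    return tensor

rng = np.random.default_rng(0)
# Hypothetical sizes: 4 contexts, 6 ligand-receptor pairs,
# 3 sender cells, 3 receiver cells, and R = 2 factors.
dims, rank = (4, 6, 3, 3), 2
# Non-negative loadings: one (dimension size x R) matrix per tensor mode.
factors = [rng.random((d, rank)) for d in dims]
approx = cp_reconstruct(factors)
print(approx.shape)
```

Each column `r` holds the loadings of one factor: how strongly each context, ligand-receptor pair, sender cell, and receiver cell participates in that communication pattern.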

Fig. 1. How the tensor decomposition used by Tensor-cell2cell works. (a) An original tensor can be approximated by a sum of multiple other tensors, each of which is of rank 1. (b) Adding up these rank-1 tensors builds a compressed tensor, which approximately reconstructs the original tensor. (c) A color picture can be represented as a 3D-tensor. The first two dimensions are the XY plane containing the pixels, and the third dimension contains the values of each color channel (usually values between 0 and 255 for the red, green, and blue channels). (d) This picture could be decomposed into multiple factors, each representing distinctive parts of the picture. Elements captured in one factor could also be present in another, as occurs with the pillar structures in factors 1 and 2. Similar behaviors should be expected when running Tensor-cell2cell. (e) When summing up these parts, an approximated reconstruction of the original picture is recovered. Here, the compressed tensor contains the main properties of the original data, but it also loses some details (decreasing the resolution of the approximated picture), an effect that can be compensated for by increasing the number of factors in the decomposition. (f) Analogously, a communication tensor carrying immune cells in a sample and their ligand-receptor interactions can be built and decomposed into multiple factors. These factors are expected to represent different biological processes.

This idea was originally inspired by a desire to capture communicatory dynamics, but we designed Tensor-cell2cell to be as flexible as possible in the types of questions it could be applied to. We first assessed the potential of the method by simulating a tensor embedded with four distinct temporal patterns and testing whether decomposition could recover those latent signals at increasing levels of noise (Fig. 2). Once we saw that tensor decomposition could capture simulated patterns, we moved on to real-world datasets. At this point, it was late 2020 and the COVID-19 pandemic was ongoing. There was major worldwide interest in understanding the infection to improve treatment, and thus an unprecedented surge of publications regarding COVID-19. From single-cell transcriptomic data of patients with different severities to experimental validations of immune mechanisms, we had a wealth of information to assess our method and the opportunity to contribute new findings regarding immune cell communication in the context of SARS-CoV-2 infection. We found literature-supported cell-cell communication patterns that are correlated with disease severity, and other patterns that distinguished one severity group from the others, providing molecular insights into how communication of immune cells is associated with patient phenotypes. Our lab also has multiple ongoing projects regarding Autism Spectrum Disorder (ASD), so we also assessed an ASD dataset [3]. In this case, we focused on how Tensor-cell2cell's outputs can be leveraged to perform downstream analyses (e.g. enrichment analysis and multi-factor communication networks) that extend and facilitate the interpretation of the results. Downstream analyses demonstrated that a combination of multiple dysregulated cell-cell communication patterns distinguishes subjects with and without ASD, and affects key signaling pathways in the brain, potentially shaping the neuronal circuit.

Fig. 2. Simulated communication tensor and its decomposition. Simulation of a system of three interacting cells using multiple ligand-receptor interactions. Unique combinations of cell-cell and ligand-receptor pairs were assigned one of four communication patterns across time, and used to build a 4D-communication tensor with time as the context dimension. Using Tensor-cell2cell, all simulated patterns, involving specific cell-cell pairs and their ligand-receptor mediators, were properly recovered as reflected in the loadings of each factor.
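
A simulation in the spirit of Fig. 2 can be sketched in a few lines of NumPy. This is a simplified stand-in rather than the simulation we actually ran: the four pattern shapes, the tensor sizes, the uniform noise model, and the helper names `simulate_tensor`, `nn_cp`, and `relative_error` are assumptions for illustration, and the decomposition is a basic non-negative CP with multiplicative updates, not the solver used by Tensor-cell2cell.

```python
import numpy as np

def simulate_tensor(n_time=12, n_lr=20, n_cells=3, noise=0.05, seed=0):
    """Build a (time, LR pair, sender, receiver) tensor from four temporal patterns."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_time)
    # Four illustrative, non-negative temporal patterns (assumed shapes):
    # increase, decrease, oscillation, and a pulse in the middle.
    patterns = [t, 1.0 - t, 0.5 * (1.0 + np.sin(2 * np.pi * t)),
                np.exp(-((t - 0.5) ** 2) / 0.02)]
    tensor = np.zeros((n_time, n_lr, n_cells, n_cells))
    for p, pattern in enumerate(patterns):
        lr = np.zeros(n_lr)
        lr[p * n_lr // 4:(p + 1) * n_lr // 4] = 1.0  # LR pairs that follow this pattern
        sender = np.zeros(n_cells)
        receiver = np.zeros(n_cells)
        sender[p % n_cells] = 1.0          # one sender cell per pattern
        receiver[(p + 1) % n_cells] = 1.0  # one receiver cell per pattern
        tensor += np.einsum("i,j,k,l->ijkl", pattern, lr, sender, receiver)
    noisy = tensor + noise * rng.random(tensor.shape)  # uniform noise keeps entries >= 0
    return tensor, noisy

def nn_cp(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Non-negative CP of a 4D tensor via multiplicative updates."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((d, rank)) for d in X.shape]
    letters = "ijkl"
    for _ in range(n_iter):
        for n in range(4):
            others = [m for m in range(4) if m != n]
            # Numerator: tensor contracted with the loadings of all other modes.
            spec = ("ijkl," + ",".join(letters[m] + "r" for m in others)
                    + "->" + letters[n] + "r")
            num = np.einsum(spec, X, *[factors[m] for m in others])
            # Denominator uses the Hadamard product of the other modes' Gram matrices.
            gram = np.ones((rank, rank))
            for m in others:
                gram *= factors[m].T @ factors[m]
            factors[n] *= num / (factors[n] @ gram + eps)
    return factors

def relative_error(X, factors):
    """Frobenius norm of the residual, relative to the norm of X."""
    recon = np.einsum("ir,jr,kr,lr->ijkl", *factors)
    return np.linalg.norm(X - recon) / np.linalg.norm(X)

clean, noisy = simulate_tensor(noise=0.05)
factors = nn_cp(noisy, rank=4)
err = relative_error(noisy, factors)
print(f"relative reconstruction error: {err:.3f}")
```

Re-running with larger `noise` values and comparing the time-mode loadings in `factors[0]` against the four embedded patterns mimics the robustness check described above.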

While we have observed that Tensor-cell2cell is quite robust in identifying biologically relevant communication patterns, we envision a number of future developments that can improve its capabilities. The majority of single-cell RNA-sequencing methods have been geared towards understanding the intricacies of one sample. More recently, multi-sample datasets representing two or more contexts have emerged, and methods to analyze these contexts are being developed. While these methods initially focused on pre-processing, e.g. appropriate batch correction, the field is now beginning to develop downstream analyses to understand the effect of specific contexts [4]. As these methods become more amenable to assessing multiple contexts beyond pairwise comparisons, they will enhance the analytical capabilities of Tensor-cell2cell. One such example is compositional analysis across multiple contexts [5,6]. Differences in cell population composition will affect cellular communication via ligand-receptor binding dynamics, particularly in local microenvironments and when communication scoring is modeled by physical laws [7]. For example, two cell types may express the same average levels of a receptor, but if one cell type is much more abundant than the other, it could competitively sequester the ligand, decreasing the total number of binding events with the other cell type. Thus, considering compositional changes across contexts when running Tensor-cell2cell may help improve the accuracy of the extracted patterns.
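
A toy calculation illustrates the sequestration example. It rests on a strong simplifying assumption that is not part of Tensor-cell2cell: a limited ligand pool is split among cell types in proportion to their total receptor amount (cell count times receptors per cell). The function `binding_share` and the numbers are hypothetical.

```python
def binding_share(cell_counts, receptors_per_cell):
    """Share of binding events per cell type under a proportional-split assumption."""
    totals = [n * r for n, r in zip(cell_counts, receptors_per_cell)]
    pool = sum(totals)
    return [t / pool for t in totals]

# Two cell types with the same average receptor expression,
# but one is nine times more abundant than the other (made-up counts).
shares = binding_share(cell_counts=[900, 100], receptors_per_cell=[1.0, 1.0])
print(shares)
```

Under this assumption the abundant cell type captures most of the binding events even though both types express the receptor equally, which is the effect a composition-aware score would need to capture.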

Other facets of Tensor-cell2cell can similarly be improved by merging it with computational and technological improvements. One such example is how the algorithm handles sparsity. One must consider how to deal with elements that are not present across all contexts. In its current implementation, Tensor-cell2cell simply takes the intersection of all ligand-receptor (LR) pairs and cell types across all contexts, dropping those that are not uniformly present. If one takes the union instead, there is an interesting question of what values to give those elements that are not present in certain contexts. It may be reasonable to assume that an entirely missing cell type represents a true, biological zero, whereas a missing ligand or receptor may instead represent sequencing dropouts. Technologically, as sequencing depth improves, one can imagine dropouts becoming less of a problem. Computationally, we can improve how these missing elements are handled by implementing decomposition algorithms that can handle varying levels of sparsity. Regardless, Tensor-cell2cell will assign a loading to all elements, and in this sense, could potentially serve as an imputation method for missing values. Similarly, Tensor-cell2cell could be improved by making the decomposition aware of the relationships between different elements across contexts (e.g., cell type lineages over time).
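
The intersection-versus-union choice can be sketched as follows. This is a hypothetical helper, not the Tensor-cell2cell API: the name `build_tensor`, its `how` and `fill` arguments, and the toy LR pair and cell type labels are assumptions for illustration. Missing elements are filled with NaN here so that the question of which value to assign stays explicit.

```python
import numpy as np

def build_tensor(contexts, how="inner", fill=np.nan):
    """Stack per-context score arrays into a 4D tensor.

    contexts: list of (lr_names, cell_names, scores), where scores has shape
    (len(lr_names), len(cell_names), len(cell_names)).
    how: "inner" keeps elements present in every context (intersection);
         "outer" keeps elements present in any context (union).
    """
    combine = set.intersection if how == "inner" else set.union
    lrs = sorted(combine(*[set(lr) for lr, _, _ in contexts]))
    cells = sorted(combine(*[set(c) for _, c, _ in contexts]))
    tensor = np.full((len(contexts), len(lrs), len(cells), len(cells)),
                     fill, dtype=float)
    for ctx, (lr_names, cell_names, scores) in enumerate(contexts):
        # Map (position in the tensor, position in this context) for shared names.
        lr_idx = [(lrs.index(n), i) for i, n in enumerate(lr_names) if n in lrs]
        cell_idx = [(cells.index(n), i) for i, n in enumerate(cell_names) if n in cells]
        for t_lr, s_lr in lr_idx:
            for t_snd, s_snd in cell_idx:
                for t_rcv, s_rcv in cell_idx:
                    tensor[ctx, t_lr, t_snd, t_rcv] = scores[s_lr, s_snd, s_rcv]
    return tensor, lrs, cells

rng = np.random.default_rng(0)
# Two toy contexts with partially overlapping LR pairs and cell types (made-up labels).
context_a = (["L1^R1", "L2^R2"], ["B", "T"], rng.random((2, 2, 2)))
context_b = (["L1^R1"], ["B", "T", "NK"], rng.random((1, 3, 3)))

inner, inner_lrs, inner_cells = build_tensor([context_a, context_b], how="inner")
outer, outer_lrs, outer_cells = build_tensor([context_a, context_b], how="outer")
print(inner.shape, outer.shape, int(np.isnan(outer).sum()))
```

With the union, one could then replace the NaN entries differently depending on their origin, for instance zero for an entirely missing cell type, while leaving missing ligands or receptors masked for a decomposition that tolerates missing values.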

References

  1. Armingol, E., Officer, A., Harismendy, O. & Lewis, N. E. Deciphering cell-cell interactions and communication from gene expression. Nat. Rev. Genet. 22, 71–88 (2021).
  2. Martino, C. et al. Context-aware dimensionality reduction deconvolutes gut microbial community dynamics. Nat. Biotechnol. 39, 165–168 (2021).
  3. Velmeshev, D. et al. Single-cell genomics identifies cell type-specific molecular changes in autism. Science 364, 685–689 (2019).
  4. Petukhov, V. et al. Case-control analysis of single-cell RNA-seq studies. bioRxiv 2022.03.15.484475 (2022) doi:10.1101/2022.03.15.484475.
  5. Reshef, Y. A. et al. Co-varying neighborhood analysis identifies cell populations associated with phenotypes of interest from single-cell transcriptomics. Nat. Biotechnol. 40, 355–363 (2022).
  6. Burkhardt, D. B. et al. Quantifying the effect of experimental perturbations at single-cell resolution. Nat. Biotechnol. 39, 619–629 (2021).
  7. Jin, S. et al. Inference and analysis of cell-cell communication using CellChat. Nat. Commun. 12, 1088 (2021).