Inferring pattern-driving intercellular flows from single-cell and spatial transcriptomics

FlowSig uses gene expression measurements and output from cell–cell communication inference to learn intercellular flows that describe directed dependencies. These dependencies are oriented from inflowing intercellular signals to intracellular GEMs, which could be individual TFs or cellwise enrichment for correlated gene sets, and from GEMs to outflowing intercellular signals (Fig. 1a). We model the intercellular flows using graphical causal models, where nodes represent the flow variables—inflowing signals, GEMs and outflowing signals—and learn a directed graph using conditional independence testing and the unknown target interventional greedy sparsest permutation algorithm (UT-IGSP)24. Considering that one can use statistical conditional independence relations to infer, at best, a set of equivalent directed acyclic graphs (DAGs) with the same undirected skeleton graph and directed v-structures (connected node triplets (x, y, z) with the directed edges x→ y← z)25, we use UT-IGSP to learn an initial CPDAG, which can contain both directed arcs and undirected edges. We then construct the intercellular flow network by reorienting undirected edges and removing biologically unrealistic arcs so that edges are directed from inflowing signals to GEMs, between GEMs and from GEMs to outflowing signals.Fig. 1: Description of the FlowSig model.a, We model intercellular flows to be directed from inflowing intercellular signals to GEMs that capture intracellular regulatory responses and that drive outflowing intercellular signals. FlowSig outputs an intercellular flow network describing directed edges from inflow signal variables (receptor gene expression weighted by the average expression of downstream TF gene set, Ri × TFi), to GEMs (cell membership to latent GEM factors, GEMi) to outflow variables (signal ligand gene expression, Li). b, FlowSig uses additional perturbation data and pathway knowledge of immediate downstream TF targets to learn accurate intercellular flows resulting from cell–cell communication. c, From spatial transcriptomics data, we can infer the amount of inflowing signals received at each spatial location more accurately, enabling us to infer intercellular flows without additional perturbation data. FlowSig outputs an intercellular flow network describing directed edges from inflow signal variables (inferred amount of received signal ligand from COMMOT, rec. Li) to spatially resolved GEMs (membership to GEMs, GEMi), to outflow variable (ligand gene expression, Li).Although the core steps in using FlowSig to analyze non-spatial scRNA-seq and ST data are the same, there are several differences. For non-spatial scRNA-seq data, we must overcome a fundamental issue: it is not possible to directly measure the intercellular signals that each cell received. Therefore, we impose two constraints (Fig. 1b). First, we consider only studies comparing a ‘control’ condition against one or more perturbed conditions, for example, healthy versus diseased. We use the additional information gained from perturbation data through conditional invariance testing to narrow down the set of possible flow graphs, reducing the occurrence of false positive edge discovery. Second, for each ligand–receptor interaction inferred from cell–cell communication inference, we extract downstream TF targets from the OmniPath database26 to measure signal inflow. Receptor gene expression quantifies the potential for a cell to receive an intercellular signal, and downstream TF expression quantifies the extent to which the cell actually received the signal; we define signal inflow as the product of receptor gene expression and the average expression of downstream TF targets.ST technologies are currently in their infancy, so there are relatively fewer control versus perturbed ST studies than scRNA-seq studies. However, we can use communication methods such as COMMOT14 to spatially constrain and measure the amount of inflowing signal more accurately (Fig. 1c). Therefore, FlowSig uses the greedy sparsest algorithm (GSP)27, which does not use perturbation data, to analyze ST data.Synthetic validation of FlowSigWe first benchmarked FlowSig using synthetic data generated from mathematical models of intercellular flows (see ‘Generating synthetic data from model simulations’ in the Supplementary Notes). For simplicity, we modeled GEMs as individual TFs. We considered three scenarios. In the first scenario, we examined unidirectional intercellular flow induced by SHH signaling that generates outflow of BMP4 through FOXF1 (ref. 28), with flows learned over a set of five nodes: SHH ligand, unbound PTCH1 receptor, SHH inflow due to SHH–PTCH1 binding, FOXF1 TF and BMP4 ligand (Fig. 2a). The second scenario involved SHH-induced tissue patterning, characterized by the expression of NKX2.2, OLIG2, PAX6 and IRX3 (ref. 29). Flows were inferred over a set of seven nodes: SHH ligand, unbound PTCH1 receptor, SHH inflow (SHH–PTCH1 complex), NKX2.2 TF, OLIG2 TF, PAX6 TF and IRX3 TF (Fig. 2b). In the third scenario, we explored competition between SHH and BMP4 in driving dorsoventral patterning30. Flows were learned over a set of nine nodes, including SHH ligand, unbound PTCH1 receptor, inflowing SHH (SHH–PTCH1 complex), BMP4 ligand, unbound BMP1A and BMPR2 receptor, inflowing BMP4 (BMP complex) and three GEM variables, dorsal, intermediate and ventral (Fig. 2c). We wanted to validate two core FlowSig assumptions. The first is that accurate measurement of inflowing signal is needed to infer intercellular flows. For all models, we compared the use of bound ligand–receptor complex as signal inflow to total receptor expression (free receptor plus bound complex), the latter of which is directly measured from scRNA-seq and ST data. The second is that including perturbation data increases the accuracy of intercellular flow inference. We quantified the accuracy of FlowSig by measuring the true positive rate (TPR) and true negative rate (TNR) for each scenario. For all scenarios (Fig. 2d–f), we found that the average TPR does not change if we use bound receptor expression to measure signal inflow, or if perturbation data are introduced. However, measuring inflow using bound receptor increases the average TNR. This is especially true for the models describing SHH-driven patterning and competition between SHH and BMP4, in which flows are more complex and multidirectional (Fig. 2e,f). Incorporating perturbation data through conditional invariance testing reduces the variation in TNR values, both in terms of the interquartile range and outliers, resulting in ‘tighter’ estimates of intercellular flows. These results suggest that FlowSig reduces the number of false positive discoveries inferred from baseline GSP and UT-IGSP algorithms.Fig. 2: Synthetic validation of FlowSig.a–c, Causal diagrams representing activation (green arrow) or inhibition (red arrow) of unidirectional activation from inflowing SHH to outflowing BMP4 (a), SHH-inflow-driven patterning of NKX2.2, OLIG2, PAX6 and IRX3 (b) and competition between SHH inflow and BMP4 inflow to drive dorsoventral (dorsal (D), intermediate (I) and ventral (V)) patterning (c). d–f, The TPR and TNR of FlowSig output for a–c, respectively. We considered the effect of additional perturbation data and the effect of applying our biological flow model to constrain edges. In d–f, plots were generated over 500 simulations. Light blue boxes indicate the cases when total receptor expression (free plus bound receptor) was used as the inflow variable, while dark blue boxes indicate the cases when bound receptor expression was used as the inflow variable. Box plot whisker bounds are: d, minimum (TPR: control only 0.5, 0.5; control + perturbation: 0.5, 0.5; TNR: control only 0.56, 0.64; control + perturbation: 0.62, 0.72), maximum (TPR: control only 0.5, 0.5; control + perturbation: 0.5, 1.0; TNR: control only 0.79, 0.85; control + perturbation: 0.79, 0.85). Horizontal lines are defined by Q1 (TPR: control only 0.5, 0.5; control + perturbation: 0.5, 0.5; TNR: control only 0.56, 0.72; control + perturbation: 0.69, 0.77), median (TPR: control only 0.5, 0.5; control + perturbation: 0.5, 0.5; TNR: control only 0.69, 0.77; control + perturbation: 0.72, 0.77) and Q3 (TPR: control only 0.5, 0.5; control + perturbation: 0.5, 0.5; TNR: control only 0.77, 0.79; control + perturbation: 0.77, 0.79). e, Minimum (TPR: control only 0.18, 0.18; control + perturbation: 0.36, 0.45; TNR: control only 0.56, 0.60; control + perturbation: 0.56, 0.76), maximum (TPR: control only 0.73, 0.73; control + perturbation: 0.73, 0.73; TNR: control only 0.80, 0.92; control + perturbation: 0.76, 0.88). Horizontal lines are defined by Q1 (TPR: control only 0.55, 0.55; control + perturbation: 0.55, 0.55; TNR: control only 0.64, 0.76; control + perturbation: 0.76, 0.84), median (TPR: control only 0.55, 0.55; control + perturbation: 0.55, 0.55; TNR: control only 0.72, 0.84; control + perturbation: 0.76, 0.88) and Q3 (TPR: control only 0.55, 0.55; control + perturbation: 0.55, 0.55; TNR: control only 0.76, 0.88; control + perturbation: 0.76, 0.88). f, Minimum (TPR: control only 0.3, 0.3; control + perturbation: 0.4, 0.4; TNR: control only 0.56, 0.64; control + perturbation: 0.62, 0.72), maximum (TPR: control only 0.8, 0.8; control + perturbation: 0.8, 0.8; TNR: control only 0.79, 0.85; control + perturbation: 0.79, 0.85). Horizontal lines are defined by Q1 (TPR: control only 0.4, 0.4; control + perturbation: 0.4, 0.4; TNR: control only 0.67, 0.72; control + perturbation: 0.69, 0.77), median (TPR: control only 0.5, 0.5; control + perturbation: 0.5, 0.5; TNR: control only 0.69, 0.77; control + perturbation: 0.72, 0.77) and Q3 (TPR: control only 0.6, 0.6; control + perturbation: 0.6, 0.6; TNR: control only 0.77, 0.79; control + perturbation: 0.77, 0.79). Diamonds indicate outliers (less than Q1 – 1.5 × IQR or greater than Q3 + 1.5 × IQR).Benchmarking FlowSig against multicellular representation methodsTo provide additional insight into FlowSig’s capabilities, we benchmarked it against methods that construct multicellular program representations from scRNA-seq and ST data, including DIALOGUE5, scITD20, MOFAcellular22, MOFAtalk22, MultiNicheNet23 and Tensor-cell2cell21. We also compared FlowSig with direct CellChat output (Supplementary Table 1). All methods were benchmarked using an scRNA-seq dataset of stimulated peripheral blood mononuclear cells sampled from people with lupus, which was generated by Kang et al.31. We summarize key points here (see ‘Comparison to other methods’ in the Supplementary Results for a full discussion). We also evaluated FlowSig’s robustness to different inputs constructed by alternative cell–cell communication and GEM construction methods (see ‘Robustness of FlowSig to different input methodologies’ in the Supplementary Results) and found that different cell–cell communication methods can result in different sets of intercellular flows, owing to discrepancies in inferred ligand–receptor interactions; however, FlowSig will infer intercellular flows through GEMs constructed by different methods that are enriched for the same regulatory TFs.Analyzing CellChat output directly suggested there were 6,886 potential inflow-to-outflow relationships. Of these, 3,167 were shared across both conditions, 1,511 were unique to the control condition and 2,208 were unique to the stimulated condition. From CellChat results alone, we cannot infer which of these relations are truly intercellular flows, that is, whether the second interaction depends on the first, and we cannot infer the intracellular mediators of these intercellular flows. By contrast, FlowSig inferred only 44 intercellular flows across 6 signal inflow variables, 20 GEMs and 12 signal outflow variables (see ‘Comparison to other methods’ in the Supplementary Results and Supplementary Fig. 1).DIALOGUE identified four multicellular programs (MCPs) from the Kang et al. dataset. MCP1 was enriched across CD14+ monocytes, CD8+ T cells and B cells, suggesting that there was coordination through intercellular flows between these cell types (Supplementary Fig. 2a). In MCP4, CD8+ T and CD14+ cells exhibited significant differential expression between conditions (Supplementary Fig. 2b). DIALOGUE identified upregulation of the signal ligand CCL4 (in CD8+ T cells), which FlowSig inferred to drive signal outflow. scITD decomposed the dataset into two latent factors (Supplementary Fig. 3a): Factor 1 was significantly enriched for FlowSig signal outflow ligands CXCL10, CXCL11 and TNFSF10 (Supplementary Fig. 3b) and intercellular-flow-driving interactions (Supplementary Fig. 3c). MOFAcellular decomposed the dataset into five factors (Supplementary Fig. 4a): Factor 1 was enriched for signal outflow variables CXCL11 and TNFSF10 (Supplementary Fig. 4b). Applying MOFATalk to the ligand–receptor interaction scores inferred from LIANA32 yielded four factors (Supplementary Fig. 4c): Factor 1 was enriched for the interactions CCL2–CCR1 and CCL8–CCR1 (between CD14+ cells, dendritic cells (DCs) and FGR3+ cells) and signal outflow of TNFSF13B (Supplementary Fig. 4d). Tensor-cell2cell extracted six factors from ligand–receptor interaction scores inferred from LIANA (Supplementary Fig. 5a): CD14+ cells, DCs and FGR3+ cells were identified as key signal receiver groups (Supplementary Fig. 5b). Clustering ligand–receptor interactions identified that CCL2–CCR1, CCL3–CCR1, CCL4–CCR1 and CCL8–CCR1 were upregulated after stimulation (Supplementary Fig. 5c). Finally, MultiNicheNet identified CCL2–CCR1, CCL3–CCR1, CCL4–CCR1 and CCL8–CCR1 as differentially expressed between conditions (Supplementary Fig. 6a). MultiNicheNet also identified outflow of CXCL10, CXCL11 and FASLG and inflow into CCR1 (Supplementary Fig. 6b).Validating FlowSig using a cortical organoid systemWe tested FlowSig against new scRNA-seq data generated from an organoid model of cortical development, for which fibroblast growth factor (FGF) and bone morphogenetic protein (BMP) signaling are known to drive patterning33. We generated cortical organoids from human embryonic stem cells and collected the organoids at day 18 (D18) and D35 in culture for scRNA-seq analysis. In the organoid system, the cell fate for cortical identity is determined by D18, and signal responses to FGF and BMP, as measured by graded TF expression, are established by D35. The continual exposure of FGF and BMP signaling drives drastic changes in gene expression, and thus between D18 and D35 there are transcriptional changes and changes in cell type composition as the organoids mature. Hence, when applying FlowSig to this dataset, rather than assume the D18 and D35 populations are sampled from the same underlying ‘steady state’ distributions of gene transcription, we treat the D35 data as a ‘perturbed’ form of the ‘control’ D18 data due to exposure to FGF and BMP signaling.We identified differentially flowing signals from the 77 unique ligand–receptor interactions identified by CellChat34 analysis. FlowSig identified 26 differentially inflowing signals (Fig. 3a) and 16 differentially outflowing signals (Fig. 3b), including FGF and BMP (see ‘Identifying differentially flowing signal variables’ in the Methods). We used pyLIGER35 to construct 20 GEMs from 2,793 highly variable genes (Fig. 3c and Supplementary Fig. 8a–c). Cells from the D18 timepoint were more enriched for GEM-2 through GEM-4, GEM-7, GEM-10, GEM-18 and GEM-19, whereas cells from the D35 timepoint were enriched for GEM-8, GEM-11, GEM-12, GEM-16 and GEM-20. Altogether, FlowSig constructed 62 variables for intercellular flow inference. After inference, we aggregated signal inflow variables by their parent signaling pathway. For example, we classified both FGFR1 and FGFR3 inflows under the FGF signaling pathway, which were activated by received FGF2 ligand.Fig. 3: Experimental validation of FlowSig using a cortical organoid model.a, Differentially inflowing signals between D18 and D35 timepoints. b, Differentially outflowing signals between D18 and D35 timepoints. c, Constructed gene expression modules capture time-specific and time-shared gene expression patterns. Some modules have been highlighted by the timepoint for which they are more enriched. d, Inferred intercellular flows due to FGF signaling. We speculated that EOMES is a key regulatory TF downstream of FGFR1 inflow. e, Inferred intercellular flows due to BMP signaling. We speculated that NR2F1 (CoupTF1) and PAX6 are downstream TF targets of BMP inflow. f, Measurement of the FC of EOMES gene expression using RT–qPCR following bath application of FGF, with four biological replicates and two technical replicates. Unpaired two-tailed t-tests were performed (t = 3.135, d.f. = 14, P = 0.0073). g, Measurement of the FC of PAX6 and NR2F1 (CoupTF1) gene expression using RT–qPCR following bath application of BMP, with two biological replicates and two technical replicates each. One-way ANOVA using Tukey’s multiple-comparisons test was used to calculate adjusted P values. For PAX6, control versus 10 ng ml–1: mean diff. 0.50 with 95% confidence interval (CI) (0.40, 0.61), adjusted P < 1 × 10−4. For PAX6, control versus 50 ng ml–1: mean diff. 0.57 with 95% CI (0.47, 0.68), adjusted P < 1 × 10−4. For NR2F1, control versus 10 ng ml–1: mean diff. −0.91 with 95% CI (−1.88, 0.06), adjusted P = 0.066. For NR2F1 control versus 50 ng ml–1: mean diff. −1.43 with 95% CI (−2.40, −0.45), adjusted P = 0.0068. In f, *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001, ****P ≤ 0.0001. In g, *adjusted P ≤ 0.05, **adjusted P ≤ 0.01, ***adjusted P ≤ 0.001, ****adjusted P ≤ 0.0001. Error bars represent s.d.To determine the dominant drivers of intercellular flow, we ranked signal inflow variables by their total edge frequency. We found that FGF, midkine (MK), pleiotrophin (PTN) and neuregulin (NRG) were drivers of intercellular flow. FGF inflow, in particular, drove signal outflow, including BMP4, insulin-like growth factor-II (IGF-II), nerve growth factor (NGF), NRG1 and NRG3, through numerous GEMs (Fig. 3d). By examining the top GEM-specific TFs mediated by FGF-induced flow (see ‘Interpreting gene expression modules’ in the Methods), we found that EOMES could be a potential regulatory candidate of FGF inflow. We observed that BMP inflow was regulated through many fewer GEMs (Fig. 3e) and could be mediated by PAX6 or NR2F1.To verify FlowSig analysis, we analyzed a perturbed organoid culture in which we activated the FGF and BMP signaling pathways by adding FGF8b and BMP4, respectively, between D15 and D21. We collected organoid samples at D35 and subjected them to quantitative reverse transcription PCR (RT–qPCR) for gene expression analysis (Fig. 3f,g). Compared with the non-exposed control organoids, we observed that activating FGF signaling significantly downregulated the expression of EOMES (Fig. 3f), whereas elevating BMP signaling simultaneously downregulated PAX6 and upregulated NR2F1 (Fig. 3g). These experimental data demonstrate that FlowSig accurately captures the dominant drivers of intercellular flows from real biological datasets.FlowSig identifies changes in intercellular flows due to stimulationTo demonstrate how FlowSig recovers intercellular flows driven by an external perturbation, we analyzed scRNA-seq data of human pancreatic islets stimulated by interferon-γ (IFN-γ)36. We constructed ten GEMs using pyLIGER that aligned with the five cell-type clusters, alpha, beta 1 to 3, and delta, that we identified independently (Fig. 4a and Supplementary Fig. 9a–c). We used these cell type annotations as input for preliminary CellChat analysis; that is, for each condition, CellChat infers significant pairwise ligand–receptor interactions between the cell groups defined by these cell-type labels.Fig. 4: Application of FlowSig to perturbed non-spatial scRNA-seq of pancreatic islets.a, Constructed GEM modules align primarily with cell types identified from clustering. b,c, Differentially inflowing (b) and outflowing (c) signals due to IFN-γ stimulation. d, Identified global intercellular flow network inferred by FlowSig, capturing condition-shared and condition-specific flows. e,f, Intercellular flows regulating outflowing signals upregulated (e) and downregulated (f) by IFN-γ stimulation.IFN-γ stimulation increased inflow of the FGF signaling pathway through FGFR1 (through ligands FGF7 and FGF9, specifically), interleukin-6 (IL-6) through IL-6R and IL-6ST, MIF through CD74 and CD44, MDK through NCL and SST through SSTR2 (Fig. 4b). IFN-γ stimulation increased outflow of GCG, INHBA and NAMPT, and decreased outflow of ANGPTL2, SPP1, transforming growth factor β1 (TGFβ1), tumor necrosis superfactor family member 12 (TNFSF12) and UCN3 (Fig. 4c). FlowSig identified that FGF, IL-6, MDK and SST were the dominant drivers of intercellular flows that drove the outflow of GCG, INHBA, NAMPT, SPP1, TGFB1, TNFSF12 and UCN3 through GEM-1, GEM-3, GEM-5 and GEM-6 (Fig. 4d). We observed that GEM-1 is enriched in both the alpha and beta 1 clusters, GEM-3 and GEM-5 are enriched in the alpha cluster, GEM-4 is enriched in the beta 2 cluster and GEM-6 is enriched in the beta 1 cluster (Fig. 4a), suggesting that intercellular flows are driven by cell type. These results agree with previous work establishing that, in the pancreas, alpha cells are the main secretors of GCG and beta cells are the main secretors of UCN3, and that SST regulates both GCG and UCN3 (ref. 37). We observed that the same TFs contributed to all of these GEMs—ID1, NR1D1, TFF3 and ZNF419—suggesting that these TFs mediate intercellular flows across both conditions.To further explore the effects of IFN-γ stimulation, we split the global intercellular flow network into two networks. First, we constructed a network corresponding to outflow signals upregulated by IFN-γ stimulation by taking outflowing signals that were differentially expressed for the IFN-γ condition (adjusted P < 0.05 and log2(fold change (FC)) > 0.5), the GEMs that connected to these outflow variables and the signal inflows nodes connected to these GEMs. From this node set, we then extracted the subgraph from the global intercellular flow network (Fig. 4e). The second network corresponded to the intercellular flow network of outflowing signals downregulated by IFN-γ (adjusted P < 0.05 and log2(FC) < –0.5) and was constructed in a similar manner (Fig. 4f). Both networks contain the same signal inflow nodes and share near-identical GEMs. However, GEM-3, which drives GCG and NAMPT outflow and is itself regulated by SSTR2 (SST) signaling, is present only in the ‘upregulated’ network, suggesting that it has a specialized role activated by IFN-γ stimulation. GEM-3 is primarily enriched within alpha cells, suggesting that stimulation drives outflow of GCG and NAMPT from alpha cells. All other inflowing signals and GEMs are shared across both conditions, suggestive of dual regulatory roles. For example, IL-6 signaling drives both upregulation of INHBA and NAMPT and downregulation of SPP1, TGFB1 and UCN3 (through GEM-4).FlowSig uses multiple perturbations to find disease-driven changesTo demonstrate that FlowSig can handle multiple perturbations, we analyzed scRNA-seq of human bronchoalveolar lavage fluid (BALF) cells sampled from healthy controls and from people with either moderate or severe COVID-19 (ref. 38). We used CellChat and the cell-type annotations from the original study to infer significant ligand–receptor interactions, and found 46, 55 and 54 active signaling pathways for healthy controls and the moderate and severe COVID-19 groups, respectively.We constructed 20 GEMs using pyLIGER (Supplementary Fig. 10a b) that captured differences across both condition (Fig. 5a) and cell type (Fig. 5b). FlowSig identified differentially inflowing and outflowing signals specific to each COVID-19 condition with respect to healthy controls (Fig. 5c and Supplementary Fig. 10c,d). We note the differential expression of many inflammatory CC chemokines (CCLs) in severe COVID-19, including CCL2, CCL3, CCL8, CCL3L1 and CCL7, and CXC chemokines such as CXCL2 and CXCL8 (Supplementary Fig. 10d). In moderate COVID-19, we observed differential outflow of fewer inflammatory cytokines, including CCL5 and CCL23.Fig. 5: Application of FlowSig to scRNA-seq of human BALF sampled from people with moderate or severe COVID-19.a,b, Constructed GEM modules align with both COVID-19 conditions (a) and cell types identified from the original study (b). NK, natural killer; mDC, myeloid dendritic cell; pDC, plasmacytoid dendritic cell. c, Differential expression analysis identifies distinct sets of outflowing signal ligands for each condition. d–f, Identified intercellular flows driving outflow signals differentially expressed in healthy controls (d) and individuals with moderate (e) or severe (f) COVID-19. g,h, Upset plots quantifying the inferred inflow signals (g) and mediating GEMs (h) that are shared across COVID-19 conditions. The vertical bars represent the sizes of the intersection sets (number of ligand-receptor interactions and GEMs, respectively), and the horizontal bars represent the number of inflow signals (g) and GEMs (h) in each condition. In (g), some intersection sets are annotated with the constituent ligand-receptor interactions, of the form L–R, where L is the ligand and R is the receptor. In the case where multiple ligands competitively bind to the same receptor, different ligands are separated by a slash (/). For example, the interaction L1/L2–R denotes that L1 and L2 are the ligands that can separately bind to the receptor, R.To analyze the intercellular flows driving these differential outflows, for each set of differentially outflowing signals, we extracted the upstream inflowing signals for which there was a directed path to at least one of the outflowing signals and the corresponding GEMs from the inferred FlowSig network (Fig. 5d–f). Despite the number of differentially outflowing signals increasing with COVID-19 severity, the number of inferred signal inflows decreased from 37 to 32 (loss of AXL, CD4, F2RL1, ITGAX and ITGB2, TNFRSF12A and TNFRSF14; gain of CAP1) to 25 (loss of CD27, CXCR3, FPR1, IL-6R and IL-6ST, LTBR, NCL, NRP2 and PLXNA2, SDC1, TNFRSF13B, TNFRSF17 and TNFRSF25; gain of AXL, CD4, F2RL1 and TNFRSF14). GEMs showed a similar trend: the number of regulatory GEMs decreased from 16 to 13 between healthy and moderate COVID-19 (loss of GEM-4, GEM-10, GEM-12 and GEM-14; gain of GEM-7). The results from Figure 5a,b suggest that the shift from healthy to moderate COVID-19 is associated with a downregulation in intercellular flows through epithelial cells (GEM-4), plasma and T cells (GEM-10) and macrophages and neutrophils (GEM-12), but an upregulation of intercellular flows through mast cells (GEM-7). From moderate to severe COVID-19, there was a decrease from 13 to 8 (loss of GEM-1, GEM-2, GEM-5, GEM-11, GEM-13, GEM-18 and GEM-19; gain of GEM-12 and GEM-14).We also calculated the intersections between the signal inflow sets (Fig. 5g) and GEM sets (Fig. 5h) driving signal outflows. We observed that 20 out of 37 signal inflows are shared across all three conditions. There were no signal inflows unique to either moderate or severe COVID-19 alone, whereas inflow through TNFRSF12A (due to TNFSF12) and ITGAX and ITGB2 (due to C3) drive outflows in only healthy controls. Only inflow through CAP (from RETN1) is shared between moderate and severe COVID-19 but is absent in healthy controls. There were more signal inflows shared between the healthy and moderate COVID-19 groups than between the healthy and severe COVID-19 groups or between the moderate and severe COVID-19 groups. We observed a similar trend amongst inferred regulatory GEMs. The most shared GEMs were between only the healthy and moderate COVID-19 groups (7 out of 17) and across all three conditions (5 out of 17). GEM-4 and GEM-10, which are associated with epithelial cells and T cells, respectively, mediated signal outflows in only healthy individuals. Only GEM-7, which is associated with mast cells, was shared between the moderate and severe COVID-19 groups but not healthy controls. No GEMs that regulate the differential outflows in severe COVID-19 were unique to the severe COVID-19 group. These results demonstrate how FlowSig can use multiple perturbations to identify trends in intercellular flows. Here, FlowSig identified that increasing severity of COVID-19 is associated with (1) a gradual loss of regulatory intercellular inflows and (2) an increase of inflammatory chemokine outflow that is driven by macrophages and neutrophils.FlowSig identifies regulators of spatial intercellular flowWe applied FlowSig to spatial Stereo-seq data of mouse embryogenesis sampled at stage E9.5 of embryogenesis39. We used non-negative spatial factorization4 to construct 20 spatially resolved GEMs from 712 spatially variable genes (Fig. 6a and Supplementary Fig. 11a). We identified Shh outflow to be highly spatially variable (Moran’s I = 0.37; adjusted P = 0.014; Supplementary Fig. 11b), and inferred Shh inflow across the tissue (Supplementary Fig. 11c), in line with Shh’s importance in development40. FlowSig identified several upstream drivers of Shh outflow, including Bmp4, Cxcl12, Fgf15, Mdk, Ptn and Wnt5a, which regulate Shh outflow through GEM-2, GEM-5, GEM-11 and GEM-14 (Fig. 6b) and inferred that received Shh inflow (denoted for brevity as r-Shh) drives outflow of several signal ligands through GEM-2, GEM-5, GEM-9, GEM-11, GEM-12, GEM-14, GEM-15 and GEM-17 (Fig. 6c).Fig. 6: Application of FlowSig to spatial Stereo-seq data of E9.5 mouse embryo.a, Twenty identified spatial GEMs. Spatial spots are labeled by the GEM to which the spot membership is highest. b, Inflowing signals that drive Shh outflow. c, Identified downstream outflowing signals that are driven by inflowing Shh (r-Shh). d, Top upstream TFs ranked by their regulatory effects on outflowing Shh, as measured by random forest feature (Gini) importance. Feature importances were calculated over 10 runs. Data are shown as mean ± s.d. e, Potential downstream TFs regulated by inflowing r-Shh, ranked by Spearman correlation. The heatmap shows the scaled (z-transformed) values of the fitted gene expression values as a function of r-Shh. f, The top upstream TFs of outflowing Wnt5a that are also downstream targets of inflowing r-Shh. Feature importance was averaged over 10 runs. Data are shown as mean ± s.d. g, Potential downstream TFs regulated by Wnt5a that are also upstream regulators of outflowing Shh. h, Suggested activator–inhibitor feedback between Shh and Wnt5a, as implicated by d–g.We used these spatially resolved measurements to infer both specific upstream regulators of Shh outflow and downstream targets of r-Shh inflow. For each GEM, we extracted the top 10 TFs by module membership (see ‘Interpreting gene expression modules’ in the Methods). We identified potential upstream TFs of Shh outflow using random forest models41, where we ranked TFs by feature (Gini) importance relative to all potential upstream TFs of Shh (see ‘Inferring upstream TF regulators of spatial signals’ in the Methods; Fig. 6d). We identified Foxa2, Foxp2, Myc, Zc3h7a and Foxa1 as the top five upstream regulatory TFs of Shh outflow. Of these, Foxa1 and Foxa2 have been established to regulate Shh42, as has Foxp2 (ref. 43). Although Myc has been established to be regulated downstream of Shh signaling44,45, its role as an upstream regulator is less clear.To identify downstream targets of r-Shh inflow, we used pyGAM46 (cubic splines, a gamma error distribution, and log link) to fit expression of the top 10 TFs of each inferred downstream GEM as a function of r-Shh inflow. We ranked TFs by the Spearman correlation between predicted TF expression and r-Shh itself (Fig. 6e). The downstream TFs that correlated least with r-Shh included known downstream targets Barhl1 (ref. 47) and Nkx2-1 (ref. 48), as well as Meox1, Tcf21 and Foxp2, whereas the TFs that were most correlated included known targets like Foxe1 (ref. 49) and Nkx2-2 (ref. 50), as well as Pou3f1, Tlx2 and Nkx2-4. We observe that Foxa2 is implicated both upstream and downstream of Shh outflow and inflow, respectively, suggesting that Foxa2 could drive self-production of Shh.We observed potential bidirectional flows between Shh and Bmp4, Cxcl12, Igf2, Mdk and Wnt5a. To validate these flows further, we performed the following analysis. For each ligand, we extracted the top GEM-specific TFs that were both upstream of the ligand and downstream of r-Shh. We used random forest modeling to calculate feature importance for each TF to ligand outflow. Only Wnt5a was significantly regulated by TFs that were also downstream targets of r-Shh inflow through GEM-5 (Fig. 6f). Furthermore, outflowing Wnt5a and inflowing r-Wnt5a were spatially colocalized with inflowing r-Shh and outflowing Shh (Supplementary Fig. 11d,e). Foxa2, Nkx6-1 and Sox21 were the top upstream regulators of Wnt5a through GEM-5, in which Foxa2 is known to regulate Wnt5a51. To infer whether inflowing r-Wnt5a regulated Shh outflow, we used pyGAM to fit the top TFs of GEM-11 as functions of r-Wnt5a inflow and ranked them by Spearman correlation of the predicted values with r-Wnt5a (Fig. 6g). We observed that Myc, one of the top upstream regulators of Shh outflow, negatively correlated with r-Wnt5a inflow.These observations suggested the following bidirectional flow between Shh and Wnt5a (Fig. 6h). First, outflow and diffusion of Shh drives inflow of r-Shh, self-amplifying Shh outflow through Foxa2. Inflow of r-Shh also drives Wnt5a outflow through Foxa2, Nkx6-1 and/or Sox21. Inflow of r-Wnt5a through spatial diffusion downregulates Shh outflow through Myc. This module resembles an activator–inhibitor system that can generate potential Turing patterns52, with three key features. First, one or both signals can propagate—here, both Shh and Wnt5a ligands diffuse. Second, one of the signals—Shh—upregulates both itself through Foxa2 and the other signal, Wnt5a through Foxa1, Nkx6-1 and Sox21. Third, the other signal, Wnt5a, inhibits the activating signal, Shh. We found that Wnt5a inhibits Shh by downregulating Myc, an upstream regulator of Shh. It has been shown that activator–inhibitor systems can generate Turing patterns, which are defined by their complex spatial variation and are known to drive cell fate patterning in development53,54,55, suggesting that at E9.5, Shh and Wnt5a play similar roles.

Inferring pattern-driving intercellular flows from single-cell and spatial transcriptomics

Turbocharging protein binding site prediction with geometric attention, inter-resolution transfer learning, and homology-based augmentation | BMC Bioinformatics

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Hot Topics

Turbocharging protein binding site prediction with geometric attention, inter-resolution transfer learning, and homology-based augmentation | BMC Bioinformatics

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Related Articles

Balancing Act: Pregnancy and Bipolar Disorder

Cohesion at the cellular level: flexible yet stable

Gut bacteria influence responses to immunotherapy in patients with asbestos related cancer

Quick Links

Must Read

Turbocharging protein binding site prediction with geometric attention, inter-resolution transfer learning, and homology-based augmentation | BMC Bioinformatics

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Popular Articles

Turbocharging protein binding site prediction with geometric attention, inter-resolution transfer learning, and homology-based augmentation | BMC Bioinformatics

Zero-shot transfer of protein sequence likelihood models to thermostability prediction

Poisoning medical knowledge using large language models