Automated spatial omics landscape analysis approach reveals novel tissue architectures in ulcerative colitis

Data set acquisition
CyTOF datasets and segmented, fluorescence-assigned CODEX datasets were acquired from the original authors [23,24]. The scRNA-seq dataset was downloaded from GEO [26].

CODEX follicle extraction
Follicle ROIs were identified in FIJI based on aggregation of CD19+ staining in the original images. Cell centers within each ROI were extracted in R, using the original .csv file of fluorescent intensities together with the follicle-delimiting spline exported from FIJI to rapidly determine which cells fell inside the ROI and which fell outside it.

CODEX data pre-processing
Manual inspection of the fluorescent image channels revealed that imaging “artifacts” derived from two primary sources: eosinophils, which exhibited broad cytoplasmic binding to the majority of antibodies but membrane staining for CD15 and CD66; and smaller, punctate sources that persisted across the majority of channels. True (non-eosinophil) artifacts were identified based on broad, high-level expression of many markers except CD15 and CD66; their cell center positions were retained but all marker intensities were set to 0. Eosinophils, which showed high expression of many markers together with membrane-stain signatures for CD15 and CD66, were identified and then temporarily removed from analysis. Residual artifact impacts, likely derived from segmentation, were identified by selecting a representative channel with minimal signal other than artifact, determining the average signal within a stripe not impacted by artifact, flagging all cells with signal higher than three times this average background, and then setting all marker intensities in flagged cells to 0.

SADIE analysis
For each fluorescent channel, all cells with fluorescent signal exceeding 0.33 times the channel maximum were set to a value of 1 (marker-positive); all others were set to a value of 0 (marker-negative).
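The per-channel binarization rule can be illustrated with a minimal sketch. It is written in plain Python for illustration only (the original analysis was carried out in R), and the function names, the dictionary layout, and the intensity values are hypothetical rather than taken from the authors' code.

```python
def binarize_channel(intensities, fraction=0.33):
    """Return 1 for cells whose signal exceeds fraction * channel maximum, else 0."""
    if not intensities:
        return []
    cutoff = fraction * max(intensities)
    return [1 if value > cutoff else 0 for value in intensities]


def binarize_cells(cells, fraction=0.33):
    """Binarize every channel of a {marker: [intensity per cell]} table."""
    return {marker: binarize_channel(values, fraction)
            for marker, values in cells.items()}


# Hypothetical intensities for five cells in two channels (illustrative values only).
example = {
    "CD19": [0.0, 10.0, 40.0, 90.0, 100.0],
    "CD45RA": [5.0, 5.0, 6.0, 30.0, 12.0],
}
calls = binarize_cells(example)
```

Because the cutoff is relative to each channel's own maximum, a channel with a low overall signal range is thresholded on its own scale and not on a global one.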
The threshold of 0.33 was selected due to the need to remove residual signal bleed from image artifact, as well as the multimodal intensity distribution of CD45RA. SADIE analysis was then performed once for each fluorescent channel, coupled with the cell center-associated x and y coordinates, using the epiphy package implementation of SADIE with the following parameters: index = "Perry", nperm = 100, method = "shortsimplex", verbose = TRUE.

CODEX feature selection
SADIE analysis compared the spatial distribution of marker-positive cells to 100 random distributions using the same number of marker-positive cells [17] to generate a p-value (Pa). A core set of canonical cell markers was then supplemented by non-extracellular matrix markers with a Pa value below 0.05 (Supplemental Table S2).

CODEX cell clustering
Initial cell clustering was performed in R using the built-in k-means function. The initial number of clusters was determined using the fviz_nbclust implementation of elbow plots in factoextra [36]. After normalization, architecture-associated features that were marker-positive in fewer than 25 cells were given additional weight. Clusters were then displayed on the original multiplexed fluorescent image for validation using code developed by Goltsev et al. [8], with parameters tuned accordingly.

CODEX cell annotation refinement
Cell annotation refinement was performed using in-house software developed by M. Ferenc using tensorflow [37]. We built meaningful vector spaces with which we could observe well-separated clusters. Briefly, data were encoded using a one-hot approach for input into the MLP model. Principal Component Analysis, tSNE, and UMAP were used to visualize the new vector space for cluster discrimination.
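As a rough illustration of one-hot encoding, the sketch below converts categorical labels into 0/1 vectors. The text does not state which variable was one-hot encoded; this example assumes it is the initial cluster ID used as the training label, and the function name and example IDs are hypothetical.

```python
def one_hot(labels):
    """Map each label to a 0/1 vector with a single 1 at its class index."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    vectors = []
    for label in labels:
        row = [0] * len(classes)
        row[index[label]] = 1
        vectors.append(row)
    return classes, vectors


# Hypothetical initial cluster IDs for four cells.
cluster_ids = [3, 1, 3, 7]
classes, encoded = one_hot(cluster_ids)
```

Each row has exactly one non-zero entry, so the encoding carries no implied ordering between cluster IDs.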
Data clusters were re-annotated based on the UMAP visualization.

Initial annotations for clustering were compiled into a single matrix per follicle in which each row is a single cell and the columns correspond to the feature-extracted, arcsinh-transformed marker fluorescent intensities, the initial cluster ID, and the X and Y coordinates. For initial data preprocessing, to address imbalances in cluster sizes across the dataset (Supplemental Fig. S6), 150 cells were randomly sampled from each cluster.

Model training was performed using a basic multilayer perceptron (MLP) model with an input layer of 15 neurons corresponding to 13 fluorescent features plus the x and y coordinates, a hidden layer of 16 neurons, and an output layer of 10 neurons corresponding to the 10 desired classes for cellular annotation refinement. We performed fivefold cross-validation using stochastic gradient descent with early stopping to avoid model overfitting. Cross-validation scores are shown in Supplemental Table S4.

The internal latent-space representation corresponding to our 16-neuron hidden layer that emerged over the training process was then visualized using PCA, UMAP, and tSNE for dimensional reduction from 16 to 3 dimensions. Final annotation refinement was performed based on the clustering observed in the UMAP visualization.

Generation of Voronoi images
Voronoi diagrams were created using custom code developed in Goltsev et al. [8].

Identification of CODEX Neighborhoods
For each cell in the follicle, the 10 nearest cells, including the original cell, were determined based on the annotations from CODEX cell clustering and refinement. The composition of these microenvironments was clustered using X-shift clustering with supervised annotation, using publicly available software from the Nolan lab GitHub.
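The neighborhood window described above can be sketched as follows. This is a brute-force illustration in plain Python, not the authors' implementation: it assumes that "nearest" means Euclidean distance between cell centers and that each window is summarized by counting refined annotations. The function name, the tuple layout, and the example cells are hypothetical, and the X-shift clustering step is not reproduced.

```python
from collections import Counter
import math


def neighborhood_composition(cells, k=10):
    """For each cell, count the annotations among its k nearest cells.

    cells: list of (x, y, annotation) tuples.
    Returns one Counter per cell, in input order.
    """
    compositions = []
    for x0, y0, _ in cells:
        # Sort all cells by Euclidean distance to the current cell. The cell
        # itself has distance 0, so it falls inside its own window unless
        # more than k cells share exactly the same coordinates.
        nearest = sorted(
            cells, key=lambda c: math.hypot(c[0] - x0, c[1] - y0)
        )[:k]
        compositions.append(Counter(annotation for _, _, annotation in nearest))
    return compositions


# Hypothetical cells; k=3 is used only because the example is tiny.
cells = [
    (0, 0, "B"), (1, 0, "B"), (0, 1, "T"),
    (10, 10, "T"), (11, 10, "DC"),
]
windows = neighborhood_composition(cells, k=3)
```

The all-pairs sort scales poorly with cell number; for a full follicle, a spatial index such as a k-d tree would be the more practical choice.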
Neighborhood annotations were then re-displayed in Voronoi images.

CODEX secondary feature selection for cross-platform comparison
Features for cross-platform comparison were selected based on association with architectural sub-structures. When these features were not present in the CyTOF or scRNA-seq datasets, functionally similar markers were used instead (for example, PCNA instead of Ki67). Where this was not possible, we used markers that demonstrated spatial colocalization, either directly or as a superset.

scRNAseq analysis
The R package Seurat [38] was used for analysis of scRNA-seq datasets, which had already been subjected to standard pre-processing [26]. The selection of initial parameters was guided by elbow plots, with an initial clustering resolution of 1.5, and then further tuned based on the visualization of canonical markers on UMAP feature plots. Automated cell annotation was performed using ScType [39].

CyTOF analysis
Processed CyTOF data were obtained from the original authors and analyzed as previously described [24]. In brief, FlowJo software was used to gate cellular events and calculate statistics according to published conventions, and GraphPad PRISM 9 was used to conduct additional statistical tests and plot figures. P-values were computed by unpaired Student's t-test.
