Quality control for single-cell analysis of high-plex tissue profiles using CyLinter

CyLinter software

CyLinter is written in Python 3, version controlled on Git/GitHub (https://github.com/labsyspharm/cylinter) [39], validated for Mac, PC and Linux operating systems, and archived on the Anaconda package repository. The tool can be installed at the command line using the Anaconda package installer (see the CyLinter website, https://labsyspharm.github.io/cylinter/, for details) and is executed with the command cylinter cylinter_config.yml, where cylinter_config.yml is an experiment-specific YAML configuration file. An optional --module flag can be passed before the path to the configuration file to begin the pipeline at a specified module. More details on configuration settings can be found at the CyLinter website and GitHub repository. The tool uses the Napari image viewer for image browsing and annotation tasks, together with numerical and image-processing routines from multiple Python data science libraries, including pandas, numpy, matplotlib, seaborn, SciPy, scikit-learn and scikit-image. OME-TIFF files are read using tifffile and processed into multi-resolution pyramids using a combination of Zarr and dask routines that allow for rapid panning, zooming and processing of large (hundreds of gigabytes) images. The CyLinter pipeline consists of multiple QC modules, each implemented as a Python function, that perform different visualization, data filtration or analysis tasks. Several modules return redacted versions of the input spatial feature table, while others perform analysis tasks such as cell clustering. CyLinter is freely available for academic reuse under the MIT license. A minimal example dataset consisting of four tissue cores from the EMIT TMA (Dataset 3) used in this study can be downloaded from the Synapse data repository (Synapse ID: syn52859560) by following instructions at the CyLinter website (https://labsyspharm.github.io/cylinter/exemplar/).
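The module-based design described above can be illustrated with a minimal sketch. This is not CyLinter's code: the module names, the toy table fields (area, dna) and the run_pipeline helper are all hypothetical, and the sketch simply restarts from the raw table rather than from saved output of earlier modules. It shows only the general pattern of an ordered list of functions, each taking and returning a feature table, with an optional starting module.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy spatial feature table: one dict per cell (field names are hypothetical).
Table = List[Dict[str, float]]


def area_filter(table: Table) -> Table:
    """Filtration-style module: return a redacted table without size outliers."""
    return [cell for cell in table if 50 <= cell["area"] <= 500]


def dna_filter(table: Table) -> Table:
    """Filtration-style module: drop cells with a weak nuclear counterstain signal."""
    return [cell for cell in table if cell["dna"] >= 100]


def summarize(table: Table) -> Table:
    """Analysis-style module: report on the table and pass it through unchanged."""
    print(f"{len(table)} cells remain")
    return table


# Ordered pipeline of (name, function) pairs; names and order are illustrative.
PIPELINE: List[Tuple[str, Callable[[Table], Table]]] = [
    ("area_filter", area_filter),
    ("dna_filter", dna_filter),
    ("summarize", summarize),
]


def run_pipeline(table: Table, start_module: Optional[str] = None) -> Table:
    """Run modules in order, optionally starting at a named module."""
    names = [name for name, _ in PIPELINE]
    start = 0 if start_module is None else names.index(start_module)
    for _, module in PIPELINE[start:]:
        table = module(table)
    return table


cells = [
    {"area": 120, "dna": 900},
    {"area": 20, "dna": 850},   # too small: removed by area_filter
    {"area": 200, "dna": 40},   # weak DNA signal: removed by dna_filter
]
full_run = run_pipeline(cells)
resumed_run = run_pipeline(cells, start_module="dna_filter")
```

Starting at dna_filter skips the area filter, so the undersized cell survives in resumed_run; this mirrors, in a simplified way, the effect of beginning the pipeline at a specified module.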
All CyLinter analyses presented in this work were performed on a commercially available 2019 MacBook Pro equipped with eight 2.4 GHz Intel Core i9 processors (5.0 GHz Turbo Boost) and 32 GB of 2,400 MHz DDR4 memory. Imaging data analyzed in this study were stored on and accessed from an external hard drive with 12 TB capacity. Implemented software versions were as follows: CyLinter v0.0.46–v0.0.49, Python v3.8–v3.11.

t-CyCIF

The CyCIF approach to multiplex imaging involves iterative cycles of antibody incubation with tissue, imaging and fluorophore deactivation as described previously [2]; protocols and methods related to CyCIF are available on Protocols.io (see ‘Detailed experimental protocols’ section). Briefly, multiplex CyCIF images were collected using a RareCyte CyteFinder II HT Instrument equipped with a 20× (0.75 numerical aperture (NA)) objective and 2 × 2 pixel binning. This setup allowed for the acquisition of four-channel image tiles with dimensions of 1,280 × 1,080 pixels and a corresponding pixel size of 0.65 μm per pixel. All four channels are imaged during each round of CyCIF, one of which is always reserved for a nuclear counterstain (Hoechst or 4′,6-diamidino-2-phenylindole (DAPI)) to visualize cell nuclei. RCPNL files containing 16-bit imaging data were generated (one per image tile) during each imaging cycle.

Image processing

Raw microscopy image tiles (RCPNL files) for the datasets described in this study were processed into stitched, registered and segmented OME-TIFF [40] files using the MCMICRO image-processing software [19]. Corresponding cell × feature CSV files (that is, spatial feature tables) were also generated by MCMICRO.
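To make the cell × feature layout concrete, the sketch below parses a tiny made-up table with one row per cell and one column per feature. The column names and values are illustrative assumptions, not actual MCMICRO output, and the conversion to micrometers assumes that centroids are stored in pixels at the 0.65 μm per pixel size quoted for the CyCIF data.

```python
import csv
import io

PIXEL_SIZE_UM = 0.65  # nominal pixel size quoted in the text

# Hypothetical three-cell spatial feature table; column names are illustrative only.
CSV_TEXT = """CellID,X_centroid,Y_centroid,Area,Hoechst,CD3
1,10.5,20.0,120,850.2,15.1
2,300.0,42.7,95,910.8,640.3
3,55.2,611.9,210,120.4,22.0
"""


def load_feature_table(text):
    """Parse CSV text into a list of per-cell dicts with numeric values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        cell = {key: float(value) for key, value in row.items()}
        cell["CellID"] = int(cell["CellID"])
        # Assumed convention: centroids are in pixels, so scale to micrometers.
        cell["x_um"] = cell["X_centroid"] * PIXEL_SIZE_UM
        cell["y_um"] = cell["Y_centroid"] * PIXEL_SIZE_UM
        rows.append(cell)
    return rows


table = load_feature_table(CSV_TEXT)
mean_cd3 = sum(cell["CD3"] for cell in table) / len(table)
```

In practice such tables hold many thousands of cells and would more naturally be handled with pandas; the standard library is used here only to keep the example self-contained.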
Specific algorithms implemented in MCMICRO for the processing of each dataset are as follows: BaSiC [41] (v1.0.1), a Fiji/ImageJ plugin for background and shading correction used to perform flatfield and darkfield image correction; ASHLAR [32] (v1.11.1), a program for seamless mosaic image processing across imaging cycles; Coreograph (v2.2.0), a program for dearraying TMA images into individual TIFF and CSV files per tissue core (https://github.com/HMS-IDAC/UNetCoreograph); UnMICST [33] (v2.4.7), an implementation of semantic cell segmentation based on the U-Net architecture [42]; S3segmenter (v1.2.0), a watershed algorithm used in conjunction with UnMICST (https://github.com/HMS-IDAC/S3segmenter); and MCQuant (v1.3.1), an algorithm used for per-cell feature extraction including X,Y spatial coordinates, segmentation areas, mean marker intensities and nuclear morphology attributes (https://github.com/labsyspharm/quantification).

Automated artifact detection in CyLinter with classical algorithms

An algorithm consisting of classical image analysis steps was designed to automatically identify artifacts commonly found in highly multiplexed images (for example, illumination aberrations, antibody aggregates and tissue folding). The algorithm is applied on a channel-by-channel basis and works on downsampled versions of each channel, rescaling pixel values to uint8 bit depth for efficient processing. A series of mathematical morphology operations consisting of erosion and local mean smoothing followed by dilation is applied to transform each downsampled image channel. These three steps use a disk kernel whose size is a user-defined parameter, assumed to have a diameter on the order of three to five single cells, conditional on image pixel size. This kernel is then expanded to find local maxima seed points corresponding to putative artifacts.
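The transform and seeding steps above, together with the flood-fill extraction that completes the procedure in this section, can be sketched with NumPy and SciPy. This is a minimal illustration under stated assumptions, not CyLinter's implementation: the function names, the stride-based downsampling, the doubling of the kernel radius for seed finding, the min_seed cutoff and the x/y cell fields are all hypothetical, and flood fill is emulated by connected-component labeling of pixels within the tolerance of each seed value.

```python
import numpy as np
from scipy import ndimage as ndi


def disk(radius):
    """Boolean disk-shaped footprint with the given pixel radius."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (yy ** 2 + xx ** 2) <= radius ** 2


def artifact_mask(channel, radius=3, tolerance=20.0, downscale=2, min_seed=10.0):
    """Return a boolean artifact mask with the same shape as `channel`."""
    # Downsample by striding and rescale intensities to the uint8 range.
    small = channel[::downscale, ::downscale].astype(float)
    span = small.max() - small.min()
    small = (small - small.min()) / span * 255 if span > 0 else np.zeros_like(small)
    small = small.astype(np.uint8)

    # Erosion, local mean smoothing, then dilation, all with the same disk kernel.
    fp = disk(radius)
    eroded = ndi.grey_erosion(small, footprint=fp)
    smoothed = ndi.convolve(eroded.astype(float), fp / fp.sum(), mode="nearest")
    dilated = ndi.grey_dilation(smoothed, footprint=fp)

    # Seeds: local maxima over an expanded disk (the factor of 2 is an assumption),
    # ignoring flat background below the hypothetical min_seed cutoff.
    local_max = ndi.maximum_filter(dilated, footprint=disk(2 * radius))
    seeds = np.argwhere((dilated == local_max) & (dilated >= min_seed))

    # Flood fill from each seed: connected pixels within `tolerance` of the seed value.
    mask_small = np.zeros(small.shape, dtype=bool)
    for row, col in seeds:
        if mask_small[row, col]:
            continue  # already covered by an earlier fill
        within = np.abs(dilated - dilated[row, col]) <= tolerance
        labels, _ = ndi.label(within)
        mask_small |= labels == labels[row, col]

    # Resize the union of filled regions back to the original image dimensions.
    mask = np.repeat(np.repeat(mask_small, downscale, axis=0), downscale, axis=1)
    return mask[:channel.shape[0], :channel.shape[1]]


def drop_masked_cells(cells, mask):
    """Keep only cells whose (x, y) pixel centroid lies outside the mask."""
    return [c for c in cells if not mask[int(c["y"]), int(c["x"])]]


# Synthetic 16-bit channel with one bright square standing in for an artifact.
channel = np.full((64, 64), 100, dtype=np.uint16)
channel[20:36, 20:36] = 60000
mask = artifact_mask(channel)
cells = [{"id": 1, "x": 28, "y": 28}, {"id": 2, "x": 5, "y": 5}]
kept = drop_masked_cells(cells, mask)
```

In this toy case the cell inside the bright square is dropped and the background cell is retained. Raising the tolerance grows each filled region, which is why an interactively adjustable tolerance is useful.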
Each artifact is extracted via a flood fill operation according to a tolerance parameter that is adjusted in real time by the user. The union of the flood fill regions produces a binary artifact mask that is resized to the original image dimensions; cells falling within mask boundaries are then dropped from the corresponding spatial feature table.

Deep learning-based automated artifact detection

The machine learning artifact detection model implemented in this study derives from the Feature Pyramid Network (FPN) [43], a fully convolutional encoder–decoder architecture designed for object detection tasks and applicable to semantic image segmentation. The encoder network is implemented using a ResNet34 backbone [44] with model parameters initialized from weights pretrained on ImageNet. Input image tiles of size 2,048 × 2,048 pixels (acquired at a nominal resolution of 0.65 µm per pixel) were downsampled to 256 × 256 pixels and fed into the encoder network to produce low-resolution feature maps. The resulting feature maps were then decoded into feature pyramids through iterated upsampling using bilinear interpolation and combined with the original feature maps. Each layer of the feature pyramid was upsampled to the same resolution and segmented, and all resulting predicted artifact masks were combined to yield the final composite prediction mask. The FPN architecture is implemented using the Segmentation Models library for image segmentation based on the Python and PyTorch frameworks [45]. The model was trained using the Adam optimizer with a Dice similarity coefficient loss function and a fixed learning rate (1 × 10⁻⁴) using a batch size of 16 image tiles for 10 epochs.

Dataset 1 (TOPACIO, CyCIF)

The TOPACIO dataset used in this study consists of 25 deidentified FFPE tissue sections (5 μm thick) of TNBC from patients enrolled in the TOPACIO clinical trial (ClinicalTrials.gov Identifier: NCT02657889).
Specimens were collected via one of three different biopsy methods (fine needle, punch needle or gross tumor resection) and procured from Tesaro and Merck & Co. as part of the recently completed trial. Sections were mounted onto Superfrost Plus glass microscope slides (Fisher Scientific, 12-550-15) then dewaxed and antigen-retrieved using a Leica BOND RX Fully Automated Research Stainer before multiplex data acquisition by CyCIF. The TOPACIO dataset was collected during this study using a CyteFinder slide scanning fluorescence microscope and its built-in image acquisition software (RareCyte). Images were acquired at 20× magnification with 2 × 2 binning (0.65 μm per pixel nominal resolution) over 10 CyCIF cycles using 27 markers (19 plus Hoechst evaluated in this study); see Supplementary Table 1 for further details. The following antibodies were used in the acquisition of this dataset (name, clone, vendor, catalog number, RRID, dilution):

Donkey anti-Rat A488 (secondary only), polyclonal, Invitrogen, A21208, AB_2535794, 1:1,000
Donkey anti-Rabbit A555 (secondary only), polyclonal, Invitrogen, A31572, AB_162543, 1:1,000
Donkey anti-Goat A647 (secondary only), polyclonal, Invitrogen, A21447, AB_2535864, 1:1,000
CD3 (secondary conjugated), CD3-12, Abcam, ab11089, AB_2889189, 1:200
PDL1 (secondary conjugated), E1L3N, Cell Signaling Technology, 13684S, AB_2687655, 1:200
53BP1 (secondary conjugated), polyclonal, Bethyl Laboratories, A303-906A, AB_2620256, 1:200
E-Cadherin(A488), 24E10, Cell Signaling Technology, 3199S, AB_2291471, 1:400
panCK(e570), AE1/AE3, EBioscience, 41-9003-82, AB_11218704, 1:800
PD1(A647), EPR4877(2), Abcam, ab201825, AB_2728811, 1:200
CD8a(A488), AMC908, EBioscience, 53-0008-82, AB_2574413, 1:200
CD45(PE), 2D1, R&D, FAB1430P, AB_2237898, 1:100
GrB(A647), 2C5, Santa Cruz, sc-8022AF647, AB_2232723, 1:200
CD163(A488), EPR14643-36, Abcam, ab218293, AB_2889155, 1:400
CD68(PE), D4B9C, Cell Signaling Technology, 79594S, AB_2799935, 1:200
CD20(e660), L26, EBioscience, 50-0202-80, AB_11151691, 1:400
CD4(A488), polyclonal, R&D Systems, FAB8165G, AB_2728839, 1:200
FOXP3(e570), 236A/E7, EBioscience, 41-4777-82, AB_2573609, 1:100
SMA(e660), 1A4, EBioscience, 50-9760-82, AB_2574362, 1:800
CD11b(A488), C67F154, EBioscience, 53-0196-82, AB_2637196, 1:150
pSTAT1(A555), 58D6, Cell Signaling Technology, 8183S, AB_10860600, 1:200
yH2AX(A647), 2F3, BioLegend, 613407, AB_2295046, 1:200
CD57(FITC), NK-1, BD, 561906, AB_395986, 1:100
Ki67(e570), 20Raj1, EBioscience, 41-5699-82, AB_11220278, 1:100
MHCII/HLA-DPB1(A647), EPR11226, Abcam, ab201347, AB_2861375, 1:400
STING(A488), EPR13130, Abcam, ab198950, AB_2889208, 1:400
pTBK1(A555), D52C2, Cell Signaling Technology, 13498S, AB_2943237, 1:200
pSTAT3(A647), D3A7, Cell Signaling Technology, 4324S, AB_10694637, 1:200
PCNA(A488), PC10, Cell Signaling Technology, 8580S, AB_2617115, 1:400
HLA-A(A555), EP1395Y, Abcam, ab207872, AB_2889202, 1:400
cPARP(A647), D64E10, Cell Signaling Technology, 6987S, AB_10699459, 1:100

Dataset 2 (CRC, CyCIF)

The CRC dataset was previously published [15] and consists of a whole-slide section (1.6 cm²) of human colorectal adenocarcinoma tissue (section# 097) from a 69-year-old white male imaged at 20× magnification with 2 × 2 binning (0.65 μm per pixel nominal resolution) over 10 CyCIF cycles using 24 markers (21 plus Hoechst evaluated in the current study). The data were collected as part of the Human Tumor Atlas Network (HTAN) and are available through the HTAN Data Portal (https://data.humantumoratlas.org). See Supplementary Table 1 for further details and associated identifiers.

Dataset 3 (EMIT TMA22, CyCIF)

The EMIT TMA dataset was previously published [19] and consists of human tissue specimens from 42 patients organized as a multi-tissue array (HTMA427) under an excess tissue protocol (clinical discards) approved by the institutional review board (IRB) at Brigham and Women’s Hospital (BWH IRB 2018P001627).
Two 1.5-mm-diameter cores were acquired from each of 60 tissue regions with the goal of acquiring one or two examples of as many tumors as possible (with matched normal tissue from the same resection when feasible). Overall, the TMA contains 123 cores including 3 ‘marker cores’ consisting of normal kidney cortex that were added to the TMA in an arrangement that makes it possible to orient the overall TMA image. Not including the marker cores, 44 cores were from males and 76 were from females between 21 and 86 years of age. The EMIT TMA22 dataset was acquired at 20× magnification with 2 × 2 binning (0.65 μm per pixel nominal resolution) over 10 CyCIF cycles using 27 markers (20 plus Hoechst evaluated in the current study) and is available for download from the Synapse data repository (https://www.synapse.org/#!Synapse:syn22345750); see Supplementary Table 1 for further details.

Dataset 4 (HNSCC, CODEX)

The HNSCC CODEX dataset consists of two sections of the same deidentified specimen of head and neck squamous carcinoma (HNSCC) imaged at 20× magnification with 2 × 2 binning (0.65 μm per pixel nominal resolution) over 9 imaging cycles using 15 markers plus DAPI.
These data were collected by the laboratory of Kai Wucherpfennig at Dana-Farber Cancer Institute; see Supplementary Table 1 for further details.

Dataset 5 (normal tonsil, mIHC)

The mIHC dataset was previously published [19] and consists of a deidentified whole-slide tonsil specimen from a 4-year-old female of European ancestry procured from the Cooperative Human Tissue Network (CHTN), Western Division, as part of the HTAN SARDANA Trans-Network Project and imaged at 20× magnification with 2 × 2 binning (0.5 μm per pixel nominal resolution) over 5 mIHC cycles using 18 markers plus Hoechst; see Supplementary Table 1 for further details.

Dataset 6 (normal large intestine, CODEX, specimen 1)

This large intestine CODEX dataset consists of a single section of deidentified human tissue from a 78-year-old African American male imaged at 20× magnification (0.75 NA, 0.38 μm per pixel nominal resolution) over 23 imaging cycles using 59 markers (58 evaluated in this study, as DRAQ5 was excluded due to its overlap with Hoechst). These data were collected at Stanford University as part of the Human BioMolecular Atlas Program (HuBMAP); see Supplementary Table 1 for further details.

Dataset 7 (normal large intestine, CODEX, specimen 2)

This large intestine CODEX dataset consists of a single section of deidentified human tissue from a 24-year-old white male imaged at 20× magnification (0.75 NA, 0.38 μm per pixel nominal resolution) over 24 imaging cycles using 54 markers (53 evaluated in this study, as DRAQ5 was excluded due to its overlap with Hoechst). These data were collected at Stanford University as part of the Human BioMolecular Atlas Program (HuBMAP); see Supplementary Table 1 for further details.

Detailed experimental protocols

FFPE Tissue Pre-treatment on Leica Bond RX V.2 (https://doi.org/10.17504/protocols.io.bji2kkge)

Tissue Cyclic Immunofluorescence (t-CyCIF) V.2 (https://doi.org/10.17504/protocols.io.bjiukkew)

Ethics and IRB statement

The TOPACIO clinical trial (ClinicalTrials.gov Identifier: NCT02657889) was conducted in accordance with the ethical principles founded in the Declaration of Helsinki and received central approval by the Dana-Farber IRB and/or relevant competent authorities at each treatment site. All patients provided written informed consent to participate in the study. Tissue specimens and metadata were deidentified for the work performed at Harvard Medical School, which complied with all relevant ethical regulations and was reviewed and approved under IRB protocol 19-0186. The research described in this study is considered non-human subjects research.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
