High-throughput image processing software for the study of nuclear architecture and gene expression

siRNA oligos transfection and immunofluorescenceHCT116-Cas9 cells46 were grown in RPMI-1640 medium (ATCC, Cat. No. 30-2001) supplemented with 10% fetal bovine serum (FBS, Gibco, Cat. No. 10-082-147) and maintained at 37°C in 5% CO2. siRNA oligos reverse transfection and immunofluorescence staining were performed in 384-well glass bottom plates (CellVis, Cat. No. P384-1.5H-N). The siRNAs used were siNCAPH2 (Thermo Fisher Scientific, Cat. No. 4392420, Assay ID s26585), and siScrambled (Thermo Fisher Scientific, Cat. No. 4390847). For reverse transfection, 150 nl of 5 M siRNA and 50 nl of Lipofectamine RNAiMAX reagent (Invitrogen, Cat. No. 13778075) were individually diluted in 20 µl of serum-free OptiMEM medium (Thermo Fisher Scientific, Cat. No. 31985070) and sequentially added to each well. The siRNA and RNAiMAX mix was incubated for 30 min at RT. Cells were trypsinized, and a cell suspension (2000 cells in a volume of 20 µl) was prepared in OptiMEM supplemented with 20% FBS. 20 µl of the cell suspension was added to each well containing the RNAiMAX/siRNA oligo complexes. Transfected cells were grown for 72 h in a cell incubator at 37°C and then fixed with 2% paraformaldehyde (PFA, Electron Microscopy Sciences, Cat. No. 15710) in PBS. Fixed cells were washed three times with PBS. Cells were then permeabilized using a 0.5% Triton X-100 (Milipore Sigma, Cat No. 9036-19-5) solution in PBS for 15 min at RT, washed three times with 50 µl of PBS, and blocked in a 5% BSA (Milipore Sigma, Cat No. 9048-46-8) solution in PBS for 15 min at RT. Immunofluorescence staining against the centromere protein CENPC was performed using a primary CENPC antibody (MBL Bio Science, Cat. No. PD030, raised in Guinea pig) at 1:1000 dilution for 1 h at RT, and a Goat Anti-Guinea pig IgG H&L secondary antibody (AlexaFluor 488, Abcam, Cat. No. Ab150185) at 1:500 dilution for 1 h at RT. For nuclear staining, 40 µl of a 5 mg/ml solution of 4′,6-diamidino-2-phenylindole (DAPI, Thermo Fisher Scientific, Cat. No. 62248) in PBS were added to each well.High-throughput DNA FISHHCT116 RAD21-mAID-mClover (HCT116-RAD21-AID) cells27 were grown at 37 °C in 5% CO2 in McCoy’s 5A medium supplemented with 10% FBS, 2 mM L-glutamine, 100 U/ml penicillin, and 100 µg/ml streptomycin. For FISH experiments, cells were plated at a density of 8000 cells per well in 384-well imaging plates (PhenoPlate 384-well, Revvity, Cat. No. 6057500) and grown overnight. The following day, the medium was replaced with either supplemented medium containing 170 mM Auxin (Sigma-Aldrich, Cat. No. I3750) to induce the degradation of RAD21, or with medium with an equivalent amount of DMSO alone as vehicle control. The cells were then incubated with or without Auxin for 3 h and fixed in 4% PFA (Electron Microscopy Sciences, Cat. No. 15710) in PBS for 10 min. After fixation, the plates were rinsed three times in PBS and stored in PBS at 4 °C.We conducted high-throughput fluorescence in situ hybridization (hiFISH) as previously described4,47. BAC FISH probes were selected to hybridize to the boundary regions of the topologically associated domain (TAD) on chromosome 7 containing the EGFR gene. Fluorescently labeled BAC probes were generated by nick translation at 14 °C for 1 h and 20 min. The reaction mixture included 40 ng/ml DNA, 0.05 M Tris–HCl pH 8.0, 5 mM MgCl2, 0.05 mg/ml BSA, 0.05 mM dNTPs (including fluorescently tagged dUTP), 1 mM β-mercaptoethanol, 0.5 U/ml E. coli DNA Polymerase, and 0.5 mg/ml DNase I. The reaction was stopped by adding 1 µl of EDTA per 50 µl reaction volume, followed by a heat shock at 72 °C for 10 min. Probes were labeled either with DY549P1-dUTP (Dyomics, Cat. # 549P1-34) or with DY647P1-dUTP (Dyomics, Cat. # 647P1-34). The reaction was then stored at -20 °C overnight. Next, the two probes (0.5 mg per probe) were combined, precipitated with ethanol, and resuspended in 14 µl of hybridization buffer (50% formamide pH 7.0, 10% dextran sulfate, and 1% Tween-20 in 2X SSC) per well. Cells were rinsed twice with PBS and subjected to permeabilization. Permeabilization was performed at room temperature for 20 min using 0.5% w/v saponin/0.5% v/v Triton X-100 in PBS. After rinsing the cells with PBS twice, cells were deproteinated for 15 min in 0.1 N HCl and neutralized for 5 min in 2X SSC at room temperature. Cells were equilibrated overnight in 50% formamide/2X SSC at 4 °C. The probe mix was warmed to 72 °C prior to the hybridization reaction. Next, 14 µl/well of resuspended probe mix was added to the plate and denatured at 85 °C for 7 min, followed by immediate transfer to a 37 °C incubator for a 48-h hybridization period. Post-hybridization, plates were rinsed once at room temperature with 2X SSC, followed by three rinses with 1X SSC and 0.1X SSC, all warmed to 45 °C. Cells were stained with 3 µg/ml DAPI for 15 min, then rinsed and mounted in PBS and subsequently imaged on a high-throughput confocal microscope.High-throughput live cell imaging of transcriptionFor the live cell transcription assay, human bronchial epithelial cell lines (HBEC3-KT) with a monoallelic insertion of an MS2 array in the intron of the model genes were used, as previously described15. To enable visualization of the nascent RNA, the viral MS2 capsid protein (MCP) fused to GFP and to an NLS (nuclear localization signal) tag was stably introduced into the cells using lentiviral expression vectors48. Cells were grown in Keratinocyte serum-free medium (Thermo Fisher Scientific, Cat. No. 17005042) supplemented with bovine pituitary extract (Thermo Fisher Scientific, Cat. No. 13028014) and human growth hormone (Thermo Fisher Scientific, Cat. No. 1045013). For imaging experiments, cells were cultured in 384 well plates (PhenoPlate 384-well, Revvity, Cat. No. 6057500) and imaged at 37 °C, 5% CO2, 80% humidity.High-throughput image acquisitionHigh-throughput imaging was performed using either a Yokogawa CV7000 or a CV8000 high-throughput spinning disk confocal microscopes.For DNA FISH experiments, we used 405 nm (DAPI Channel), 561 nm (Probe A channel), or 640 nm (Probe B channel) excitation lasers. In addition, we used a 405/488/561/640 nm excitation dichroic mirror, a 60X water objective (NA 1.2), and 445/45 nm (DAPI Channel), 600/37 nm (Probe A channel), or 676/29 nm (Probe B channel) bandpass emission mirrors in front of a 16-bit sCMOS camera (2048 × 2048 pixel, binning 1X1, pixel size: 0.108 microns). Z-stacks of 7 microns were acquired at 1 micron intervals and maximally projected on the fly. Images were acquired in 32 fields of view (FOV) per well.For IF experiments, we used 405 nm (DAPI Channel) or 488 nm (CENPC channel) excitation lasers, a 405/488/561/640 nm excitation dichroic mirror, a 60X water objective (NA 1.2), 445/45 (DAPI Channel) or 525/50 nm (CENPC Channel) bandpass emission mirrors, and a 16-bit sCMOS camera (2048 × 2048 pixel, binning 1X1, pixel size: 0.108 microns). Z-stacks of 14 microns were acquired at 1 micron intervals and maximally projected on the fly. Images were acquired in 22 FOV per well.For live cell imaging experiments, we used a 488 nm excitation laser, a 405/488/561/640 nm excitation dichroic mirror, either a 40X air objective (NA 0.95) or a 40X water objective (NA 1.15), a 525/50 nm bandpass emission mirror, and a 16-bit sCMOS camera (2048 × 2048 pixel, binning 2X2, pixel size: 0.325 microns). Z-stacks of 0.5 microns were acquired at 100 s intervals and maximally projected on the fly for 10 h.In all cases, images were corrected on the fly with Yokogawa proprietary software to subtract the camera dark background, to compensate for illumination artifacts (vignetting), and for chromatic aberrations and cameras alignment.HiTIPS implementationHiTIPS uses the PyQt5 Python module, which offers a user-friendly GUI enabling interactive data analysis. Its architecture implements multiprocessing to optimize computational efficiency, a critical aspect when dealing with large-scale bioinformatics datasets. The parallel processing scheme in HiTIPS is designed to completely analyze (nuclei segmentation, spot detection etc.) each FOV in a separate thread. Depending on the available hardware, parallel processing in HiTIPS can reduce the analysis time by 5- to 8-fold, depending on the analysis workflow.HiTIPS depends on several Python scientific computing libraries, including numerical computation and data manipulation (SciPy, Pandas), image processing (Pillow, Matplotlib, imageio, scikit-image, and OpenCV), dynamic cell tracking (btrack), machine learning-based image segmentation and classification (DeepCell49 and Cellpose50), image input/output and format conversion (aicsimageio, nd2reader), and Hidden Markov Model fitting (hmmlearn). At least 8 GB of RAM are required to run HiTIPS, but having 32 GB of RAM may be required for larger FOVs and for 3D volumes. In addition, when using deep learning based nuclear segmentation or cell tracking models in HiTIPS (i.e., Cellpose and DeepCell), the availability of graphical processing units (GPUs) substantially improves the inference speed of these models. Additional details about the specifications for typical hardware configurations can be found in Supplementary Note 1, while details about the implementation of the pipeline in code can be found in Supplementary Note 4.Nucleus segmentationNucleus segmentation using images of nuclei stained with a fluorescent dye or a recombinant fluorescent nuclear protein is the key first step in the vast majority of HTI analysis pipelines. Given the high relevance of this step for HTI, a substantial amount of work in the field has been devoted to making nuclear segmentation algorithms fast, precise, and robust to fluctuations in cell confluency and to heterogeneity in nucleus morphology across different cells51. For this reason, and to take advantage of previous advances made by other groups, we focused on integrating existing state-of-the-art nucleus segmentation algorithms into HiTIPS so that end users can easily access them and modify their parameters if needed. Accordingly, the HiTIPS GUI allows users to choose among a traditional CPU based method (Supplementary Note 2, Algorithm 1) for segmentation in the nucleus segmentation module, which we have developed to handle cases that do not necessarily require deep learning models, which require high-end graphical processing units (GPUs). In addition, HiTIPS also adopts two recent deep learning-based methods for nuclear segmentation, Cellpose50,52 and DeepCell53. Deep learning-based nuclei segmentation models do not involve time consuming parameter optimization, and they generally provide excellent segmentation performance on a variety of different cell lines “out-of-the-box”. On the other hand, the speed performance of segmentation models really benefits from access to GPUs, which tend to be expensive and difficult to setup for end users. Traditional image processing algorithms for nucleus segmentation can be fast if properly optimized and can handle a variety of edge cases upon expert parameter optimization. The watershed-based segmentation method is the CPU-based approach integrated into HiTIPS. It starts with image padding and noise reduction via median filtering, followed by image binarization using Li’s iterative method54. The binary image is then processed using morphological operations and a Gaussian kernel to connect fragmented nuclei. The method labels connected components and it calculates the center of mass for each, creating a new mask image. A watershed transform55 is applied using this mask and the distance-transformed image to separate adjacent nuclei effectively. Finally, a boundary image is created, resized to the original size, and any holes are filled to generate the final mask. By providing an easy selection of different nuclear segmentation methods via a GUI, HiTIPS allows users to choose and optimize the method that works best on their images and in the context of the available computational hardware infrastructure.Spot detectionHiTIPS includes morphologic, intensity, and filtering-based approaches for fluorescent spot detection. Currently, HiTIPS incorporates four different spot detection methods: Direct Thresholding, Gaussian Filter, Gaussian Laplacian, and Enhanced Gaussian Filter and Laplacian. The spot detection methods offered as part of HiTIPS have their own set of strengths and limitations, which need to be considered when choosing the appropriate spot detection method for a given type of biological sample and imaging assay.The Direct Thresholding method applies a direct thresholding technique for spot segmentation without any filtering. It is a straightforward and computationally efficient approach suitable for scenarios where spots have large contrast. However, this method may be less effective when dealing with spots that have low contrast or are close to the background intensity level. Additionally, it has limited capability to handle spots with varying intensity gradients. The Gaussian Filter method utilizes a Gaussian filter to reduce noise and enhance spots. This method performs well when spots have a relatively uniform intensity distribution and works better when spots are close together or overlapping compared to the Gaussian Laplacian method. However, it may be less effective in enhancing spots with sharp intensity variations or irregular shapes. Careful consideration of parameters such as the Gaussian filter size (sigma) and thresholding parameters is necessary. The Gaussian Laplacian method enhances spots by applying a Gaussian Laplacian filter to the input image and then segments the spots using thresholding. By utilizing the negative lobes of the Gaussian Laplacian kernel, this method not only enhances the spots but also removes the background around the spot, improving the effectiveness of automatic thresholding. It is a relatively simple and computationally efficient method. However, it may face challenges when spots are closely located or overlapping due to limited resolution. Sensitivity to parameters such as the Laplacian filter size (sigma) and thresholding parameters should be considered. The Enhanced Gaussian Filter and Laplacian method, combines the strengths of both the Gaussian Filter and Gaussian Laplacian methods. It first applies a Gaussian filter to the input image, followed by a Gaussian Laplacian filter on the filtered image, and it finally uses fluorescence thresholding for spot segmentation. This method provides enhanced capabilities for detecting spots with varying intensity gradients and can improve overall spot detection accuracy. However, achieving optimal results may require careful tuning of filter sizes (sigma), and the choice of thresholding parameters may still impact its performance.The spot detection methods provided in HiTIPS enable the detection and localization in the X and Y dimensions of fluorescent spots generated by DNA/RNA FISH staining, or from other biological structures in maximally projected 3D z-stacks microscopy images. Subsequently, maximum intensity or Gaussian-fitted maximum intensity can be employed to estimate the spot center positions in the Z dimension of the z-stack.Nuclei trackingNuclei tracking can be framed as a linear assignment problem in which Ni objects in frame i are matched up with Ni+1 objects in frame i + 1. Shadow objects can be introduced to account for births (i.e. from cell division events) or deaths (i.e. cells leaving the field of view). We incorporated two cell tracking methods in HiTIPS to accommodate HTI assays using cell lines with different levels of confluency and mobility.The first method we adopted56 revisits and updates the Kalman filtering algorithm57 and uses a Bayesian framework to improve the cell tracking accuracy and reliability. At the onset, the algorithm constructs tracklets, which are links between consecutive cell detections that do not exhibit cell division events. These tracklets from a prior frame are paired with observed cells in the current frame to form a Bayesian belief matrix, which initially holds a uniform probability of associations. Crucially, each tracklet deploys its own Kalman filter to predict the future state of a cell, basing its predictions on motion models and information from a cell state classifier. This classifier discerns nuclear morphological variations and chromatin condensation levels, which are crucial visual features in tracking. Belief updates in the matrix consider both motion evidence (using a constant velocity model) and appearance evidence (through a cell state transition matrix). The motion aspect focuses on the estimated future position of a cell, while the appearance aspect evaluates linking probabilities based on the state transitions determined by the classifier. Notably, this combination method aids in accurately identifying instances like cell divisions. After forming these tracklets, a global optimization approach employing multiple hypothesis testing is used. It constructs a large number of hypotheses based on the appearance and motion features, with the aim of identifying the most optimal track-linking solution. Hypotheses account for diverse cell behaviors, including cell divisions, false-positive tracks, or apoptosis events. The optimal set of hypotheses is determined using a maximization function, resulting in the amalgamation of tracklets into final tracks. Ultimately, a graph-based approach is leveraged to assemble these tracks into lineage trees, outputting a set of additional measurements such as generational depth and cell lineage.The second tracking method adopted by HiTIPS is DeepCell53,58. DeepCell employs a fully connected neural network that considers various features of each cell, including its appearance, local neighborhood, morphology, and motion. These pieces of information are fed into the neural network, which then processes and summarizes them into a vector representation using a deep learning sub-model. To determine the relationship between cells in consecutive frames, the information from the past frame and the current frame is utilized. The Hungarian algorithm59 is employed for this purpose, which is a combinatorial optimization algorithm that assigns the best possible associations between cells across frames. It determines whether the current cell is the same as a cell in the previous frame, a different cell, or a child cell derived from the cell in the previous frame. By combining the DeepCell neural network with the Hungarian algorithm, this tracking method aims to accurately track and link cells across frames, considering their various characteristics and relationships.Nuclei/RNA spot image patch generationEach cell track, representing the same nucleus monitored across the time-lapse movie, is precisely cropped from the full FOV into 128 × 128 pixel image patches. At each time point, the cropping algorithm positions the center of mass of the segmented nucleus ROI at the center of the cropped image, thus optimizing the positioning and the orientation of the nuclei, which is further refined in the subsequent nucleus registration step. While it is possible to segment and track partial nuclei ROIs from the moment they enter the full FOV, until they partially exit it, this approach carries the risk of missing crucial objects or events in the nucleus. This can also potentially decrease the precision of frame-to-frame nuclei registration. Accordingly, HiTIPS provides the option to select only nuclei which are entirely within the frame, thereby improving the accuracy of tracking and RNA spot detection.Nuclei/RNA spots image patch registrationAccurate frame-to-frame rigid and rotational registration of the nucleus ROI is indispensable to track the dynamics of spots signals in the same nucleus over time. For example, in the time-lapse images of MS2/MCP-GFP labeled transcription sites, nuclei segmentation is often performed using the nucleoplasmic fluorescence background of the NLS-tagged MCP-GFP protein, which contains very little to no information about other sub-nuclear structures, such as nucleoli or chromocenters. As a consequence, feature-based image registration methods such as SIFT or SURF60,61,62, which rely on the presence of prominent texture features that remain consistent over time, cannot be utilized. To overcome this limitation, HiTIPS incorporates two novel registration methods that correct for nucleus translation and rotation across time-lapse movies. This is achieved by taking into account subtle variations in nuclear shape across the cropped ROI time-lapse movie in a two-step process (Supplementary Note 2, Algorithm 2 and Algorithm 3) that have been specifically designed for and implemented in HiTIPS. These methods provide an effective solution for tracking nuclei positioning across frames, thus facilitating the successful tracking of transcription sites in live cells.The first method that we developed (Supplementary Note 2, Algorithm 2) starts with setting the angle between the major axis of the nucleus (α) and the horizon as zero. For each iteration greater than zero, two variables, α’ and α”, are initialized at 0 and 180 degrees, respectively. The algorithm then computes eight specific features, namely, Cosine Similarity Index, Mutual Information, Structural Similarity Index, Mean Square Error, Variation of Information, Adaptive Random Error, and Peak Signal to Noise Ratio for these initial values of α’ and α”. If the majority of the features for α’ exceed those for α”, then the value of α for this iteration is set as α’, else it is set as α”. This is followed by a process where α’ and α” are set to α ± i (where i varies from 1 to 5) and a similar comparison of features is performed to update α. The algorithm then increments the iteration count (n), and repeats these steps until the final step, where the spot coordinates are mapped to the nuclei patch and used for spot tracking.The second method that we developed (Supplementary Note 2, Algorithm 3) begins by assigning an initial rotation of 15 degrees to α to prevent cumulative drift. Next, a series of transformations is performed on the centered nuclei including median filtering, upsampling, and polar warping. Following these transformations, subpixel image translation registration is carried out by cross-correlation in the polar Fourier domain. The algorithm then corrects for the initial rotation assigned in step one by subtracting it. The final step involves mapping the spot coordinates to the nuclei patch and tracking these spots using Algorithms 5 and 6 (Supplementary Note 2,).The intensity-based registration approach (Supplementary Note 2, Algorithm 2) has been proven to work better when cells shape changes during along the movement on their trajectory, however, large frame-to-frame intensity variations can introduce angle shift or translation in the registration results. On the other hand, the Fourier phase transform-based approach (Supplementary Note 2, Algorithm 3) is more robust to intensity change and less robust to frame-to-frame shape deformation.Assignment of transcription spots to timelapse tracksThe assignment of individual fluorescent spots detected at different time points to common tracks using hierarchical clustering integrates several algorithms for optimal results. We have developed a two-step process to effectively identify and organize spatial clusters of spots within each cell in projected stacks of images across the time dimension. The first step (Supplementary Note 2, Algorithm 4) calculates the pairwise Euclidean distance between all transcription spots in the time-projected image, and then uses single-linkage hierarchical clustering to generate an initial set of labels for the clusters. This algorithm also identifies the centroids of each unique cluster label, it calculates the standard deviation of distances from the centroid for each cluster, and it identifies outlier spots that exceed a user-defined threshold distance from the centroid of the clusters. Once the outliers are identified they are arbitrarily labeled with a label of “zero”. The second algorithm (Supplementary Note 2, Algorithm 5) first determines the size of each cluster (i.e. the number of spots in the cluster) and separates them into “large” and “small” categories based on a user-defined threshold. Then, the algorithm calculates the Euclidean distances between points for each small cluster and all points in all the other large clusters. If the distance between the closest point in a large cluster is less than a user defined distance, the label of that closest point is assigned to the points in the small cluster. If not, it is labeled as an outlier. Through this process, small clusters and outliers are effectively merged into larger, more significant clusters, which streamlines the data structure and improves the interpretability of the results. The updated cluster labels, which correspond to tracks, for example of MS2/MCP-GFP representing sites of active gene transcription, across time, are then returned.Integrated intensity measurementThe integrated fluorescence intensity measurement of spots includes two components: Local background estimation and Gaussian mask fitting63. The Local Background Estimation Algorithm (Supplementary Note 2, Algorithm 6) first addresses the preprocessing of the images. This algorithm utilizes a least squares method to fit a background 2D plane using the fluorescence intensity values of the pixels at the border of an 11 × 11 pixel matrix centered around the location of the spot. The estimated background plane is then subtracted from the original image, thus locally correcting for potential non-uniform illumination, and compensating for systematic imaging noise.Following the background correction, the Gaussian Mask Fitting Algorithm (Supplementary Note 2, Algorithm 7) fits a Gaussian mask to the image to isolate and analyze individual spots within the image. The Gaussian mask can be either statically applied based on a given centroid or it can be iteratively adjusted to improve the accuracy of the fitting. The process involves iterative computation and adjustment of the centroid coordinates of the Gaussian mask until the difference between the old and new centroids becomes negligible, or until the maximum iteration count is reached. The final output from this process is the centroid coordinates of the Gaussian mask and the estimated photon number, which can then be used for further intensity track analysis.CENPC clustering score calculationIn our analysis, we employ a derivative of Ripley’s K-function 32, specifically designed to estimate the degree of spatial clustering at the single-cell level. This statistical measure, denoted as K(r), is defined as:$$K\left(r\right)=\frac{A}{N(N-1)}\sum_{i\in \{C\}}\sum_{j\in \left\{C\right\},j\ne j}I({d}_{ij}<r)$$In this equation, \(A\) represents the nucleus area for each cell, \(N\) is the total number of CENPC spots in the nucleus, \({d}_{ij}\) stands for the Euclidean distance between the i-th and j-th spots, and \(r\) is a predefined radius within which we evaluate the clustering. The indicator function, \(I()\), returns 1 if \(({d}_{ij}<r)\), and 0 otherwise. The calculation of K(r) involves the summation over all unique pairs of points (i, j) in the cell (C). The resulting sum is then normalized by multiplying it to the ratio between A and the product of N and N−1.To correct for edge effects, which can potentially bias results for spots in proximity of the nucleus ROI periphery, we employ Ripley’s edge-corrected K-function. The correction to the K-function adds a weighting term for each point that is inversely proportional to the area of the region accessible to other points within the specified radius, \(r\), without crossing the boundary of the study area. This results in a correction factor that adjusts for the reduced probability of finding neighboring points near the edges of the region under study. The K(r) calculations were performed using the Astropy package in a Jupyter notebook separate from HiTIPS. Finally, the difference between a Poisson point process (representing complete spatial randomness) and the actual data from the cell is computed. The percentage of radii where the measured value of K(r) is higher than the K(r) for the Poisson process is then calculated as a clustering score on a per cell basis.Statistical analysisStatistical analysis for the DNA FISH and for the CENPC clustering data was performed using the R statistical programming language, and these R packages: tidyverse, data.table, fs, and ggthemes.Statistical analysis for the MS2-GFP live cell data was performed in Python 3.9 using these libraries: pandas, Seaborn and Matplotlib.

Hot Topics

Related Articles