UNSEG: unsupervised segmentation of cells and their nuclei in complex tissue samples

Generation of GIT dataset and other imagesFor GIT dataset, formalin-fixed paraffin-embedded (FFPE) tissue microarray (TMA) slides were obtained from Pantomics (Pantomics, DID381) Tissue TMA samples for Supplementary Figs. 12–14 were obtained from the Department of Pathology at University of Pittsburgh Medical Center Presbyterian Hospital. The slides went through cyclic immunofluorescence antigen retrieval protocol10. The corresponding figure slides were stained in cycles with 1:200 dilution of Anti-Sodium Potassium ATPase antibody (Abcam ab198367, clone EP1845Y), 1:100 dilution of P53 antibody (Abcam ab270192, clone SP5), 1:50 dilution of EZH2 antibody (CST 45638, clone D2C9), and 1:100 dilution of \({{{\rm{LAMIN}}}}\,{{{\rm{A}}}}/{{{\rm{C}}}}\) antibody (CST 8617, clone 4C11) overnight at 4 °C in the dark, followed by staining with Hoechst 33342 (CST 4082S) for 10 min at room temperature in the dark. TMA images were acquired using a 0.95 NA and a 40× objective on a Nikon Ti2E microscope.Seventy-five, 1000 × 1000 high-quality regions were identified and extracted from the TMA images and saved as tiff images. Expert pathologists independently annotated these images. The annotations were done using Cellthon, a Python-based cell annotation graphical user interface (GUI) we created using Tkinter toolkit62. Together these 75 images and their cell and nucleus annotations comprise the GIT dataset.UNSEG algorithmInput imageThe input to our algorithm is a two-channel image. An example is illustrated in the “input” panel of Fig. 1 and Supplementary Fig. 15, as well as in Fig. 3 and Supplementary Fig. 13. Channel one, depicted in blue, and channel two shown in red, are respectively associated with nucleus and cell membrane marker expressions. Each channel of the image is independently scaled to 0 and 1, such that Ii: Ω → [0, 1]. Here Ii is the normalized image intensity for ith channel, Ω is the image domain, and i = 1, 2 is the indexing representing the two channels.The algorithm performs nucleus and cell segmentation utilizing a Bayesian framework: the posterior probability estimates of nucleus and cell masks are obtained from their a priori and likelihood estimates that UNSEG computes from the normalized two-channel image. These posterior estimates are then used to obtain the final nucleus and cell segmentations. UNSEG implements this framework through four processing stages detailed below and illustrated in Fig. 1 and Supplementary Fig. 15.Processing stage 1: computing a priori nucleus and cell membrane masksIn Stage 1, we compute a priori estimates of the image foreground for each channel. The estimates are computed at the global and local scale as described below.A priori probability: Each channel, Ii(x, y), i = 1, 2, is first pre-processed using a combination of a Gaussian filter63 and multi-level Otsu63,64,65. The standard deviation of the Gaussian filter kernel, σ is a parameter of the algorithm that allows the user to control the degree of smoothing. This and other algorithm parameters are summarized in Supplementary Table 2. Our default setting is σ = 3. A three-level Otsu is next applied to the smoothed image, and the lowest level is selected as the threshold to obtain the initial estimate of the channel foreground.We use the initial, per-channel foreground estimate to compute the cumulative distribution function (CDF), \({{{{\mathcal{F}}}}}_{i}\) of Ii using intensity values, Ii(x, y), of pixels (x, y) within this estimate. Two examples of CDFs are presented in Supplementary Fig. 15. Using the monotonically non-decreasing property of CDF, we map Ii to its cumulative probabilistic representation \({P}_{i}^{e}\), where \({P}_{i}^{e}(x,y)={{{{\mathcal{F}}}}}_{i}\left({I}_{i}(x,y)\right)\). We define \({P}_{i}^{e}(x,y)\) to be the a priori probability of the pixel being the nucleus (i = 1) or cell membrane (i = 2). We note that this definition quantifies the intuition that stronger the marker intensity at a particular pixel, the higher its a priori probability. Examples of a priori probabilities for nuclei (\({P}_{1}^{e}\)) and cell membranes, (\({P}_{2}^{e}\)) are presented in Fig. 1 and Supplementary Fig. 15.A priori global mask: We compute the a priori global mask \({M}_{i}^{g}(x,y)\), i = 1, 2 using \({P}_{i}^{e}\) and a simple filter called local mean suppression filter (LMSF) that we have developed. The foreground pixels (x, y) where \({M}_{i}^{g}(x,y)=1\) are designed to be a superset of the pixels belonging to the true nucleus (i = 1) and cell membrane (i = 2) compartments of cells in Ii(x, y), i = 1, 2. \({M}_{i}^{g}\), therefore, ensures that no pixels belonging to the cells are missed.LMSF is designed to identify the valleys (or space) that exist between nuclei (or cell membranes) of closely located cells that nevertheless have some spillover marker expression, and are therefore, difficult to identify as background. We define LMSF as,$${\hat{I}}_{i}(x,y)= \left\{\begin{array}{ll}0,\quad &{{{\bf{if}}}}\,\,\frac{{I}_{i}(x,y)}{{\bar{I}}_{i}(x,y)} < \, {t}_{0}\\ {I}_{i}(x,y),\quad &{{{\bf{otherwise}}}}\end{array}\right.,\,\, \\ {{{\rm{where}}}}\,\,{\bar{I}}_{i}(x,y)= \frac{1}{{\left(2{n}_{0}+1\right)}^{2}}{\sum}_{\xi =x-{n}_{0}}^{x+{n}_{0}}{\sum}_{\eta =y-{n}_{0}}^{y+{n}_{0}}{I}_{i}(\xi ,\eta ).$$
(1)
The above definition states that for a given pixel (x, y) ∈ Ω, LMSF replaces the original intensity value with 0 only if the ratio of the pixel intensity to the average intensity, computed locally around the pixel neighborhood, is below the threshold parameter t0. The size of the kernel defining the neighborhood over which the local mean intensity is computed is parameterized by n0. We set t0 = 0.5. Consequently, all pixels with intensity value less than half the mean intensity in their respective neighborhoods are replaced with zeros, allowing us to identify valleys between cells. By varying n0 we can identify valleys and gaps of different widths. UNSEG performs LMSF filtering for n0 = 5, 10, 20, 40. If \({\hat{I}}_{i}(x,y)=0\) for any value of n0, then the final pixel value is set to 0 and assigned to be background in the global mask, \({M}_{i}^{g}(x,y)\). Thus, LMSF allows us to capture valleys of different widths. The values of n0 are user-defined and can be optimized according to complexity of individual images.We refine the global mask \({M}_{i}^{g}(x,y)\) by reassigning those pixels currently in the foreground that have a priori probability \({P}_{i}^{e}(x,y) \, < \, {p}_{i}\), i = 1, 2 to the background. This refinement is particularly useful for images with highly heterogeneous tissue with varying marker expression. The threshold value pi should be small and by default is set to 0.01.An example of a priori global mask is presented in Supplementary Fig. 15.A priori local mask: Complementing \({M}_{i}^{g}(x,y)\), we next compute \({M}_{i}^{l}(x,y)\), the a priori local mask corresponding to image Ii(x, y). \({M}_{i}^{l}(x,y)\) captures the local peculiarities of the compartments—nuclei or cell membranes – associated with their local structure and morphology.First, Ii(x, y) is filtered by applying a single iteration of gradient adaptive smoothing (GAS)45,66,$${\tilde{I}}_{i}(x,y) = \frac{1}{{N}_{i}(x,y)}{\sum}_{\xi =-1}^{1}{\sum }_{\eta =-1}^{1}{I}_{i}(x+\xi ,y+\eta ){w}_{i}(x+\xi ,y+\eta ),\,\, \\ {{{\rm{where}}}}\quad {N}_{i}(x,y) = {\sum}_{\xi =-1}^{1}{\sum}_{\eta =-1}^{1}{w}_{i}(x+\xi ,y+\eta ),\\ {w}_{i}(x,y) = \exp \left[-\frac{{d}_{i}^{2}(x,y)}{2{k}_{0}^{2}}\right],\quad {d}_{i}(x,y)=\sqrt{{\left[\frac{\partial {I}_{i}(x,y)}{\partial x}\right]}^{2}+{\left[\frac{\partial {I}_{i}(x,y)}{\partial y}\right]}^{2}}.$$
(2)
This GAS-filtered image, \({\tilde{I}}_{i}(x,y)\) smooths the original image, Ii(x, y), while preserving the local variations within and around cell nuclei and membranes. The local neighborhood is defined via a 3 × 3 kernel, wi, that also performs variation preserving smoothing. Here, variation is quantified via computation of local gradient and the degree of smoothing is controlled by k0, which is an algorithmic parameter. Its default setting is 1.To obtain \({M}_{i}^{l}(x,y)\), a two-level, local Otsu is applied to \({\tilde{I}}_{i}(x,y)\) based on disk kernel whose radius r0 is an algorithmic parameter. Its default setting is 5 pixels. The Otsu output faithfully captures the local structure but is also noisy, particularly in image regions where no tissue samples are present and the gradients are being computed on the background noise. As \({M}_{i}^{g}(x,y)\) can accurately identify such background, the output of the local Otsu is restricted to where \({M}_{i}^{g}(x,y)=1\), resulting in local foreground mask \({M}_{i}^{l}(x,y)\).An example of a priori local mask is presented in Supplementary Fig. 15.Processing stage 2: computing a posteriori nucleus and cell membrane masksThe a priori global and local binary masks are computed independently for both channels. As a result, non-negligible probability exists for a pixel to be classified as being both in the nucleus and cell membrane. This is particularly true in tissue regions with crowded cells, or when the nature of the tissue section is such that cell membrane is laying over the nucleus. This processing stage reconciles these overlaps and generates a posteriori global and local nucleus and cell membrane masks.Contrast-based likelihood function: Human visual perception of cell membranes and nuclei is based on inherent contrast between the two channels. Usually this contrast is visualized via imbuing the individual intensity-based channels with colors. Here, we adapt this notion to compute a visual contrast function based on nucleus and cell membrane marker-specific expression to quantify the likelihood of pixel belonging to either the nucleus or cell membrane. The first step computes the contrast function for each pixel in the a priori local mask as follows,$${L}_{0}(x,y)=\left\{\begin{array}{ll}\frac{{I}_{2}(x,y)-{I}_{1}(x,y)}{{I}_{2}(x,y)+{I}_{1}(x,y)},\quad &{{{\bf{if}}}}\,\,{I}_{1}(x,y) \, > \, {i}_{1}\,\,{{{\bf{or}}}}\,\,{I}_{2}(x,y) \, > \, {i}_{2}\\ 0, \hfill \quad &{{{\bf{otherwise}}}}\end{array}\right.,$$where \({i}_{i}={\min }_{(x,y)\in {{{\Omega }}}_{i}}{I}_{i}(x,y),\,{{{\Omega }}}_{i}=\left\{(x,y)\in {{\Omega }}\,| \,{M}_{i}^{l}(x,y)=1\right\},\,i=1,2\). The second step ensures that this function is consistent with the a priori global mask for each channel, resulting in the contrast-based likelihood function,$${{{\bf{L}}}}(x,y)=\left\{\begin{array}{ll}{L}_{0}(x,y),\quad &{{{\bf{if}}}}\,\,{L}_{0}(x,y) \, < \, 0\,{{{\bf{and}}}}\,{M}_{1}^{g}(x,y)=1\,\,{{{\bf{or}}}}\,\,{L}_{0}(x,y) \, > \, 0\,{{{\bf{and}}}}\,{M}_{2}^{g}(x,y)=1\\ 0, \hfill \quad &{{{\bf{otherwise}}}} \hfill \end{array}\right..$$
(3)
L(x, y) is bounded between [ − 1, 1], with the contrast of  − 1 indicating the strong likelihood that the pixel (x, y) belongs to the nucleus, while 1 indicating the pixel most likely belongs to the cell membrane. Two examples of likelihood function are presented in Fig. 1 and Supplementary Fig. 15.A posteriori global mask: We combine the a priori probability with the contrast-based likelihood function to compute the a posteriori global mask Mg(x, y), such that Mg : Ω → {0, 1, 2}, where the labels 0, 1, and 2 correspond to the background, nuclei, and cell membranes, respectively. However, before performing this combination, we enhance \({P}_{i}^{e}(x,y)\) as follows,$$\begin{array}{r}{P}_{i}^{s}(x,y)=\left\{\begin{array}{ll}1,\quad &{{{\bf{if}}}}\,\,{M}_{i}^{l}(x,y)=1\\ {P}_{i}^{e}(x,y),\quad &{{{\bf{otherwise}}}}\end{array}\right.,\end{array}$$
(4)
where i = 1, 2. This enhancement, saturates \({P}_{i}^{e}(x,y)\) —that is, sets \({P}_{i}^{e}(x,y)=1\) —where the a priori local mask is 1. It ensures graceful performance of our algorithm in the global context, when computing a posteriori global mask Mg(x, y). We then compute the a posteriori global probability \({P}_{i}^{g}(x,y)\), via \({P}_{i}^{s}(x,y)\)-weighted convex combination of the likelihood and a priori belief,$$\begin{array}{r}{P}_{1}^{g}(x,y)=\left\{\begin{array}{ll}{P}_{1}^{s}(x,y)+\left(1-{P}_{1}^{s}(x,y)\right)\,| {{{\bf{L}}}}(x,y)| ,\quad &{{{\bf{if}}}}\,\,{{{\bf{L}}}}(x,y) \, < \, 0\\ 0, \hfill \quad &{{{\bf{otherwise}}}}\end{array}\right.,\\ {P}_{2}^{g}(x,y)=\left\{\begin{array}{ll}{P}_{2}^{s}(x,y)+\left(1-{P}_{2}^{s}(x,y)\right)\,| {{{\bf{L}}}}(x,y)| ,\quad &{{{\bf{if}}}}\,\,{{{\bf{L}}}}(x,y) \, > \, 0\\ 0, \hfill \quad &{{{\bf{otherwise}}}}\end{array}\right..\end{array}$$
(5)
The final posterior global mask is obtained by either applying k-means clustering, with k = 3, or argmax operation45 on \({P}_{i}^{g}(x,y)\), i = 1, 2 (Eq. (5)) to compute Mg(x, y). The default setting is argmax. We note that k-means (or argmax) is performed under the constraint that pixel (x, y) ∈ Ω is assigned to the common background if both global probabilities have zeros values, i.e., \({P}_{i}^{g}(x,y)=0\), i = 1, 2. Examples of the a posteriori global mask are presented in Fig. 1 and Supplementary Fig. 15.A posteriori local mask: We define the a posteriori local mask, Ml: Ω → {0, 1, 2}, simply by restricting the a priori probability \({P}_{i}^{e}(x,y)\) to the local mask \({M}_{i}^{l}(x,y)\),$${P}_{i}^{l}(x,y)=\left\{\begin{array}{ll}{P}_{i}^{e}(x,y),\quad &{{{\bf{if}}}}\,\,{M}_{i}^{l}(x,y)=1\\ 0,\quad &{{{\bf{otherwise}}}}\end{array}\right.,$$
(6)
where i = 1, 2. This restriction allows us to optimally capture the local a posteriori structure of the nuclei and cell membranes in a self-consistent manner.Similar to computing the a posteriori global mask, we either apply k-means clustering or argmax (default setting) operation on \({P}_{i}^{l}(x,y)\), i = 1, 2 (Eq. (6)) to obtain the a posteriori local mask Ml(x, y). As mentioned above for the a posteriori global mask, the same constraint for the common background is also applied here. Examples are presented in Fig. 1 and Supplementary Fig. 15.Processing stage 3: nucleus segmentationThe a posteriori global and local masks provide a semantic segmentation of image pixels comprising the tissue into nuclei and cell membranes. This, and the following processing stages are designed to obtain every instance of individual nucleus and its cell from the semantic segmentation of the tissue. Specifically, in this stage, we first segment all nuclei, and use them as a basis to identify their cells in the next stage. These steps ensure that the nucleus and cell segmentations are internally consistent with the latter always bounding the former.To segment nuclei we process the a posteriori global mask for the nuclei, \({{{{\bf{M}}}}}_{nuc}^{g}(x,y):= {{{{\bf{M}}}}}^{g}(x,y){| }_{{{{\rm{label}}}} = 1}\) with help from the a posteriori local mask for the cell membrane, \({{{{\bf{M}}}}}_{cell}^{l}(x,y):= {{{{\bf{M}}}}}^{l}(x,y){| }_{{{{\rm{label}}}} = 2}\). Particular examples of these two masks are presented in Supplementary Fig. 15.Convexity analysis: Nucleus segmentation begins with convex analysis of every connected component of \({{{{\bf{M}}}}}_{nuc}^{g}(x,y)\). As a part of this analysis, we compute area and the steepest concave point (SCP)37 of every component. SCP is a boundary point of the component with the largest deviation from its convex hull. The area parameter allows us to filter out exceedingly small objects that are not nuclei, while SCP helps us determine if the component is nucleus cluster (NC) or not. The component is kept for further analysis only if the area of the component exceeds a0. Otherwise it is removed. Each component that passes the area threshold, is either classified as an NC or non-NC depending on whether SCP is above or below the threshold d0. Both a0—default set to 20 pixels—and d0—default value is 4 pixels—are the primary algorithm parameters (Supplementary Table 2). The non-NC components are statistically analyzed to obtain the initial segmentation for all individual nuclei, along with a small component (SC) list comprising of small convex objects that we are less confident about being nuclei.Convexity analysis of \({{{{\bf{M}}}}}_{nuc}^{g}(x,y)\), is illustrated in Supplementary Fig. 15.Perturbed watershed and virtual cuts: We process the NC components using perturbed watershed (PW) and virtual cut (VC) algorithms that we have developed. Their goal is to partition the NC into individual nuclei.PW steps are illustrated in Fig. 2. Briefly, the NC component mask (Fig. 2d) is first modified by \({{{{\bf{M}}}}}_{cell}^{l}\) (Fig. 2e). Specifically, cuts are introduced in the NC component mask where the local cell membrane is indicated in the \({{{{\bf{M}}}}}_{cell}^{l}\) spatially corresponding to the NC component (Fig. 2f). We next apply distance transform (DT) on the modified NC component and use the resulting DT image (Fig. 2g) to compute davr—the average of all non-zero DT values in the DT image. davr is used to threshold the distance transform to identify n sub-regions with large DT values indicative of interior of the sub-regions—putative nuclei—making up the NC splitting (Fig. 2h). Within every sub-region we identify a pixel with the maximal distance-transform value as the watershed seed point (marker) for that sub-region. We perform watershed segmentation of NC based on these n seed points to obtain our initial estimate of the nuclei comprising the NC (Fig. 2i). If these estimates are correct, then perturbing the markers does not affect segmentation of the NC. However, if the estimates are incorrect, then sub-region estimates are not stable on perturbation. We exploit this perturbation-based stability to identify the correct segmentation of the NC. Specifically, we perturb the marker location and recompute the watershed-based segmentation. The perturbations are implemented by shifting each watershed marker location sequentially in the horizontal and vertical directions by ± ⌊davr⌋, resulting in four perturbations: (xj ± ⌊davr⌋, yj) and (xj, yj ± ⌊davr⌋) with j = 1, …, n (Fig. 2j–m). Here, ⌊ ⋅ ⌋ stands for the floor function. If during any of the four scenarios, the size of any of the n putative nuclei collapses to a point object with an area size bounded to a few pixels (Fig. 2j, l, m), we deem them as unstable and remove their corresponding seed points from the list of n seed points, and recompute the watershed-based segmentation with the remaining seed points (Fig. 2n). If the segmentation results remain stable for all four shifts, then the estimate is considered correct. To ensure that each of the segmented sub-regions are indeed nuclei and not smaller NCs, we recursively perform convexity analysis and PW on each sub-region. An example of this recursion is illustrated in Supplementary Fig. 16.The above recursive segmentation of an NC can sometimes result in a specific pathological situation, where the convex analysis identifies a sub-region as an NC, but PW does not segment it into sub-regions. For this specific scenario, we have developed the virtual cuts (VC) approach, where a virtual cut is defined through the SCP of the NC component mask to identify virtual sub-regions. We use “virtual” to emphasize that this cut and the resulting sub-regions are only used to identify their respective watershed seed points based on which we perform the actual segmentation. The hypothesis driving the VC method is based on the idea of PW method: although the locations of the respective watershed markers identified using virtual cuts might not exactly coincide with their true locations, they do represent a perturbed version of the true location. Thus, they yield stable and accurate segmentation into the two sub-regions. These sub-regions follow the same recursive logic of the PW method detailed above. VC method is illustrated in Supplementary Fig. 15.Finally, we process the small components in the SC list in a context-dependent manner, with small isolated SCs included in the final nucleus segmentation result. Multiple examples of nucleus segmentation are presented in Figs. 1, 4, and 6 as well as in Supplementary Figs. 7–10, 12, 13, and 15, where the contours of nuclei are outlined in white.Processing stage 4: cell segmentationWe segment cells via the joint use of a posteriori global mask for the cell membranes \({{{{\bf{M}}}}}_{cell}^{g}(x,y):= {{{{\bf{M}}}}}^{g}(x,y){| }_{{{{\rm{label}}}} = 2}\) and the segmented nuclei.We begin by initializing the segmented cell mask as the segmented nucleus mask. The cell mask is then expanded till its boundary coincides with that of the closest cell membrane around it. It is possible that the cell membrane marker used for cell segmentation is not expressed by all cells. Therefore, for cells without any cell membrane marker expression, the nucleus mask is morphologically dilated a small amount u0 (1–10 pixels) to obtain an estimate of the cell membrane. u0 with its 9 pixels default value is one more algorithm parameter (Supplementary Table 2). In the opposite scenario, where due to the nature of the tissue section, a cell is present with a membrane but without a nucleus, we utilize \({{{{\bf{M}}}}}_{cell}^{g}\). Specifically, the skeleton of \({{{{\bf{M}}}}}_{cell}^{g}\) is computed and subtracted from \({{{{\bf{M}}}}}_{cell}^{g}\) itself. This operation naturally reveals the cell membrane contour within \({{{{\bf{M}}}}}_{cell}^{g}\), which we identify via computing the Euler number of its connected component. When the Euler number is zero and the area of the connected component exceeds half of the average area of nuclei, the connected component is identified as the segmented cell. Examples of cell segmentation are presented in Figs. 1, 4, and 6 as well as in Supplementary Figs. 4–10, 12, 13, and 15, where the contours of the segmented cells are outlined in green.Performance evaluationTo evaluate UNSEG performance and compare it with Cellpose25 and Mesmer26 results, we used the F1 score (or Dice coefficient) as the accuracy metric46. To compute the F1 score, we first estimated the true positive (TP), false positive (FP) and false negative (FN) values by comparing the predicted segmentation with the expert annotated ground truth and using intersection over union (IoU) as the threshold value46. The IoU threshold, ranging from 0 to 1, indicates how much of an overlap between the predicted segmentation and ground truth is considered a match, which is then used to estimate the number of TP, FP, and FN segmented objects. The F1 score is then given by$${F}_{1}=\frac{2\,TP}{2\,TP+FP+FN}.$$
(7)
Varying the IoU threshold from 0 to 1, gives us the corresponding F1 curve as a function of the IoU threshold.Statistics and reproducibilityStatistical robustness of UNSEG, and its reproducibility has been exhaustively tested on the GIT dataset and three publicly available datasets47,48.

Hot Topics

Related Articles