Rational selection of morphological phenotypic traits to extract essential similarities in chemical perturbation in the ergosterol pathway

Strains and media
Saccharomyces cerevisiae Y8835 (MATα ura3∆0::natMX4 can1∆::STE2pr-Sp_his5 lyp1∆ his3∆1 leu2∆0 met15∆0 LYS2), which is derived from S288C, was used as the wild-type strain [25]. C. albicans ATCC24433 was obtained from the Medical Mycology Research Center, Chiba University. Media and growth conditions for S. cerevisiae were as described previously [19]. C. albicans was grown at 35°C on Sabouraud agar medium (1% (w/v) Bacto-peptone (BD Biosciences, CA, USA), 4% (w/v) glucose (Fujifilm Wako Pure Chemical Corporation, Osaka, Japan), and 1.5% (w/v) agar). For Candida susceptibility testing, RPMI 1640 (Fujifilm Wako Pure Chemical Corporation) was adjusted to pH 6.9 by addition of 165 mM 3-(N-morpholino)propanesulfonic acid (Fujifilm Wako Pure Chemical Corporation).

Drugs and reagents
TBF (Tokyo Chemical Industry, Tokyo, Japan), FCZ (Tokyo Chemical Industry), AMF (Tokyo Chemical Industry), MFG (Funakoshi, Tokyo, Japan), and echinocandin B (ECB; Chugai Pharmaceutical Corporation, Tokyo, Japan) were all dissolved in dimethyl sulfoxide (DMSO; Fujifilm Wako Pure Chemical Corporation). The concentrations of the drugs are described in Supplementary Table S1.

Antifungal susceptibility testing for S. cerevisiae
The antifungal susceptibility of the S. cerevisiae wild-type strain Y8835 was examined using a method described previously [25]. The concentrations of each drug for Y8835 testing were as follows: FCZ, 0, 1.56, 3.13, 6.25, 12.5, 25, 50, and 100 μg/mL; TBF, 0, 8, 16, 32, 64, 128, 256, and 512 μg/mL; AMF, 0, 0.0313, 0.0625, 0.125, 0.25, 0.5, 1, and 2 μg/mL; ECB, 0, 0.5, 1, 2, 4, 8, 16, and 32 μg/mL. The optical density at 600 nm was measured using a SpectraMax Plus 384 plate reader (Molecular Devices, San Jose, CA, USA). The half-maximal inhibitory concentration was estimated by the Markov chain Monte Carlo method (5000 iterations, including the first 2000 iterations as warm-up, in each of eight chains) in the rstan package (https://mc-stan.org/users/interfaces/rstan), using a reparameterization of the four-parameter log-logistic equation implemented in the drc package in R [26].

Checkerboard assay
The synergy of the test compounds and the fractional inhibitory concentration (FIC) index were examined by the checkerboard method [27]. Test dilutions were selected based on the minimum inhibitory concentration (MIC) of each substance. The strains were exposed to various concentrations of FCZ (0, 1.56, 3.13, 6.25, 12.5, 25, 50, and 100 μg/mL) in combination with TBF (0, 8, 16, 32, 64, 128, 256, and 512 μg/mL), AMF (0.0313, 0.0625, 0.125, 0.25, 0.5, 1, and 2 μg/mL), or ECB (0, 0.5, 1, 2, 4, 8, 16, and 32 μg/mL). The FIC index was calculated using the following formula: FIC index = FIC (A) + FIC (B) = [MIC of A in combination / MIC of A alone] + [MIC of B in combination / MIC of B alone]. Synergy was defined as an FIC index of ≤ 0.5. The test was performed in 96-well microtiter plates containing yeast peptone dextrose medium supplemented with drugs in serial concentrations. Fungal suspensions were inoculated at a cell density of 1 × 10⁵ cells/mL. Plates were read after incubation for 18–24 h at 30°C. Each test was performed in triplicate.

Resazurin cell viability assay
Susceptibility tests with C. albicans were performed according to the Clinical and Laboratory Standards Institute document M60. Briefly, C. albicans was grown on Sabouraud glucose agar for 24 h at 35°C, washed with 0.85% saline, and diluted in RPMI 1640 to 5 × 10³ cells/mL. Diluted cells and 3% DMSO (with or without drugs) were added to 96-well round-bottom microplates and incubated at 35°C in a static incubator. The drug concentrations were the same as those used in the checkerboard assay. Fungal cell killing was determined by the resazurin cell viability assay as described previously [28]. After incubation of C. albicans ATCC24433 with drugs at various concentrations for 24 h, 10 μL of resazurin (2.1 mg/mL) was added to each well, and the plates were incubated for a further 24 h. The color of each well was determined visually; a blue or purple color was interpreted as the absence of metabolic activity (dead cells), whereas pink indicated the presence of living fungal cells.

Fluorescence staining
Fluorescence staining was performed as described previously [19]. Briefly, yeast cells were cultivated until the early log phase (< 5.0 × 10⁶ cells/mL) and fixed in medium containing 3.7% (w/v) formaldehyde (Fujifilm Wako Pure Chemical Corporation). The fixed cells were triple-stained with fluorescein isothiocyanate-conjugated concanavalin A (Sigma-Aldrich Co. LLC, St. Louis, MO, USA) for the cell wall, rhodamine–phalloidin (Invitrogen, Carlsbad, CA, USA) for the actin cytoskeleton, and 4′,6-diamidino-2-phenylindole (Sigma-Aldrich Co. LLC) for nuclear DNA.

Microscopy and image acquisition
Fluorescence images of the cells were acquired using a microscope (Axioplan 2, Carl Zeiss, Oberkochen, Germany) equipped with a ×100 Ecplan-Neofluar lens (Carl Zeiss), a cooled charge-coupled device camera (CoolSNAP HQ; Roper Scientific Photometrics, Tucson, AZ, USA), and AxioVision software (Carl Zeiss).

Image quantification and analysis
CalMorph (version 1.2) was used for quantitative analysis of the images.
Only experiments containing at least 123 cells were considered for subsequent analysis. CalMorph generates 501 morphological parameters related to the cell cycle phase, actin cytoskeleton, cell wall, and nuclear DNA (see descriptions of each parameter at http://www.yeast.ib.k.u-tokyo.ac.jp/CalMorph/index.html). Appropriate probability distributions were reported previously for the unimodal parameters only (490 parameters) using the UNImodal MOrphological (UNIMO) data pipeline [23]. Briefly, CalMorph parameters were first categorized according to their data type (non-negative, ratio, coefficient of variation, and proportion parameters). Then, the best probability distribution was defined for each parameter using the Akaike information criterion (AIC). In this analysis, we used these 490 unimodal morphological parameters because their optimal probability distributions have been defined, allowing for highly accurate statistical analysis.

Statistical modeling to assess the effects of drugs
Two data sets were used to analyze the morphological information of cells treated with AMF, FCZ, and TBF. Data Set 1 has relatively high correlation because it consists of data from 10 repeated experiments. Data Set 2, on the other hand, has relatively low correlation because it consists of data from five repeated experiments. All statistical analyses were performed using R (http://www.r-project.org). To assess the effects of each drug on cell morphology, a generalized linear model (GLM) [29] was built. In each model, the cell morphological traits were compared with the corresponding wild-type (WT) distribution (i.e., the null distribution). The two-sided Wald test was used to transform each morphological parameter to a Z value using the summary function in the gamlss package [30].
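As a rough illustration of how a Wald test turns a treatment effect into a Z value, the following Python sketch fits the simplest possible case: a Gaussian model with a single drug-versus-WT indicator. It is an assumption-laden sketch and not the gamlss analysis used in this study, which applies a parameter-specific probability distribution to each trait. The function name `wald_z` and the example values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def wald_z(wt_values, treated_values):
    """Wald Z and two-sided P for the drug effect in y = b0 + b1 * treated.

    Assumes a Gaussian error model with a common variance (an illustrative
    simplification, not the distribution chosen per parameter in the study).
    """
    n_wt, n_tr = len(wt_values), len(treated_values)
    mean_wt, mean_tr = fmean(wt_values), fmean(treated_values)

    # b1 is the difference between the treated mean and the WT (null) mean.
    beta = mean_tr - mean_wt

    # Residual variance pooled over both groups.
    rss = sum((v - mean_wt) ** 2 for v in wt_values)
    rss += sum((v - mean_tr) ** 2 for v in treated_values)
    sigma2 = rss / (n_wt + n_tr - 2)

    # Standard error of b1, then the Wald statistic: estimate / standard error.
    se = math.sqrt(sigma2 * (1 / n_wt + 1 / n_tr))
    z = beta / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical values of one morphological parameter.
z_value, p_value = wald_z([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A positive Z indicates that the parameter is larger in drug-treated cells than in WT cells, and swapping the two groups flips the sign.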
The false discovery rate (FDR) was finally estimated using the qvalue package [31].

Projection of morphological data
To normalize the morphological data, PCA was performed on the Z values (Wald test) of 109 WT replicates [23] for the 490 unimodal parameters using the prcomp function in the stats package (Supplementary Fig. 2A). Then, to obtain orthogonal morphological data, the Z values of the unimodal parameters of 4708 non-essential mutants (Supplementary Fig. 2B) [23], ergosterol synthesis inhibitors (Supplementary Fig. 2C), and glucan synthase inhibitors (Supplementary Fig. 2D) were projected onto the PCA of the 109 WT replicates. The orthogonal PC space was used to check the similarity of the morphological profiles induced by the inhibitors by calculating Pearson’s correlation coefficients (r) of the PC scores.

Selection of important parameters
To select important parameters, reliable PC spaces were first defined. For this purpose, the ratio of the variance of the 4708 non-essential mutants to that of the 109 WT replicates was used to extract biologically meaningful signals (Supplementary Fig. 3A). A locally estimated scatterplot smoothing (LOESS) regression was fitted to the ratio values (Supplementary Fig. 3B). The best smooth span for the regression model was defined as follows [30]: gamlss(y ~ lo(~x, span = f), data = data, family = NO), where y is the ratio of the variance of the 4708 non-essential mutants to that of the 109 WT replicates, x is the ordinal number of the PC space, and f is the smooth span of the LOESS regression (ranging from 0.10 to 0.99). The best fit was chosen using the AIC. Then, the slope of the LOESS regression was calculated between successive PC spaces. Finally, the first 40 PCs (CCR = 83.64%) were considered reliable PC spaces in which the slope of the regression line was not constantly positive (Supplementary Fig. 3C). That is, the variance of the 4708 non-essential mutants was not always higher than that of WT between PC1 and PC40, but was always higher than that of WT from PC41 onward.
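The projection and variance-ratio steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the centre and rotation (loading) matrix are taken as already estimated from the WT replicates, the data are centred but not rescaled, and all function names and numbers are hypothetical rather than taken from this study.

```python
import math


def project(z_values, center, rotation):
    """Project one Z-value vector onto WT-derived PCs.

    rotation[i][k] is the loading of parameter i on PC k, so the score on
    PC k is sum_i (z_i - center_i) * rotation[i][k].
    """
    centered = [z - c for z, c in zip(z_values, center)]
    n_pcs = len(rotation[0])
    return [
        sum(centered[i] * rotation[i][k] for i in range(len(centered)))
        for k in range(n_pcs)
    ]


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def variance_ratio(mutant_scores, wt_scores):
    """Per-PC ratio of the mutant score variance to the WT score variance."""
    n_pcs = len(wt_scores[0])
    return [
        sample_variance([s[k] for s in mutant_scores])
        / sample_variance([s[k] for s in wt_scores])
        for k in range(n_pcs)
    ]


def pearson_r(a, b):
    """Pearson's correlation coefficient between two PC-score profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


# Hypothetical two-parameter example with an orthonormal rotation matrix.
rotation = [[0.6, -0.8], [0.8, 0.6]]
scores = project([1.0, 2.0], center=[0.0, 0.0], rotation=rotation)
```

Because every data set is projected with the same WT-derived rotation, scores from mutants and from drug-treated cells lie in one coordinate system and can be compared directly.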
Morphological similarity in fact increased after removal of noise (Supplementary Fig. 4). Next, a set of PC spaces that best distinguished drug-treated cells from WT cells was extracted. To avoid overfitting, each observation was considered as an independent entry (n = 130 and 70 for ergosterol synthesis inhibitors and 1,3-β-glucan synthase inhibitors, respectively), and Z values were calculated for each, as described above in the "Statistical modeling to assess the effects of drugs" and "Projection of morphological data" subsections.
To determine the importance of each PC space for differentiating drug-treated cells from WT cells, two logistic regression models (one for ergosterol synthesis inhibitors and one for 1,3-β-glucan synthase inhibitors) were built using the brglm2 package [32], with 60% of the data used to train the model and 40% used to test it. The process was repeated 5000 times with random sampling in each model, and the best model was selected using the AIC in a stepwise algorithm (step function in R) in each iteration. The frequency of each PC space (Fig. 4A) and the accuracy of the model (Supplementary Fig. 5) were calculated across these 5000 final models. Finally, the best frequency was chosen by defining various thresholds (t) and fitting a logistic model with the PCs that were observed at least t times. The best threshold was selected based on the AIC of the fitted logistic regression. The selected threshold was 1930 times for ergosterol synthesis inhibitors, which retained 19 PCs (CCR = 41.51%; Supplementary Fig. 6A), and 750 times for 1,3-β-glucan synthase inhibitors, which retained 7 PCs (CCR = 30.91%; Supplementary Fig. 6B).

Canonical correlation analysis
Canonical correlation analysis (CCA) was used to explore the relationships between two multivariate sets of variables (the morphological profile and the functional profile) and to find linear combinations of phenotypic traits and gene function features, as described previously [22,23].
CCA also helps to avoid possible overfitting that may occur through PC space selection.

Morphological profile
To examine common defects in the ergosterol pathway without bias from drug-specific effects, we chose erg28∆. First, 10 replicates of erg28∆ (ERGosterol biosynthesis gene) and 10 replicates of WT cells were stained (see Fluorescence staining). Then, microscopy images (see Microscopy and image acquisition) were quantified to extract morphological traits (see Image quantification and analysis). Z values of these replicates were estimated as described in the "Statistical modeling to assess the effects of drugs" subsection. Finally, the morphological profiles of 3917 non-essential mutants (see below) were subjected to PCA to provide phenotype principal components (pPCs), and the Z values of the 10 erg28∆ replicates and the WT replicates were projected using the PCA of the non-essential mutants. The non-essential data set also contains one replicate of erg28∆; thus, in total, 11 replicates of the erg28∆ mutant were used.

Functional profile
The basic Gene Ontology (GO) files were first downloaded from the GO Consortium, and gene annotations were downloaded from the Saccharomyces Genome Database (SGD). Then, a Boolean matrix of GO terms was generated, with a value of TRUE if a gene was annotated with a GO term and FALSE otherwise. Next, a GO slimmer process was performed as follows: removal of global GO terms (i.e., annotated for more than 200 genes); removal of GO terms with identical sets of annotated ORFs; removal of unique GO terms (i.e., annotated for fewer than two genes); and exclusion of genes with no annotations. Finally, 3,568 GO terms with unique annotations for 3,917 genes were obtained. Next, PCA was performed, with some modifications, to reduce the dimensionality of the functional profiles [33] and to provide GO term principal components. This method preserved the structure of the functional relationships among the genes while reducing the dimensionality [33].
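The GO term filtering steps described above can be sketched as follows. This Python example is illustrative only and makes two assumptions that the text does not state: when several GO terms share an identical gene set, one representative term is kept, and the size filters are applied before the duplicate filter. The function name, GO identifiers, and gene names are hypothetical.

```python
def slim_go_terms(annotations, all_genes, max_genes=200, min_genes=2):
    """Filter GO terms and build a Boolean gene-by-term matrix.

    annotations maps a GO term ID to the set of genes annotated with it.
    Returns (kept_terms, matrix), where matrix[gene] is a list of booleans
    aligned with kept_terms.
    """
    kept = {}
    seen_gene_sets = set()
    for term in sorted(annotations):
        genes = frozenset(annotations[term])
        if len(genes) > max_genes:   # global term: too many annotated genes
            continue
        if len(genes) < min_genes:   # unique term: fewer than min_genes genes
            continue
        if genes in seen_gene_sets:  # same gene set as a term already kept
            continue
        seen_gene_sets.add(genes)
        kept[term] = genes

    # Exclude genes that no longer carry any annotation.
    annotated = sorted(
        g for g in all_genes if any(g in genes for genes in kept.values())
    )
    matrix = {g: [g in kept[t] for t in kept] for g in annotated}
    return list(kept), matrix


# Hypothetical toy input; max_genes is lowered so the global filter triggers.
toy_annotations = {
    "GO:A": {"g1", "g2", "g3"},
    "GO:B": {"g1", "g2"},
    "GO:C": {"g1", "g2"},
    "GO:D": {"g4"},
}
terms, go_matrix = slim_go_terms(
    toy_annotations, ["g1", "g2", "g3", "g4", "g5"], max_genes=2
)
```

In this toy input only one term survives, and genes left without any remaining annotation are dropped from the matrix.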
Finally, a zero matrix of the WT replicates was projected using the obtained PCA. CCA was then performed to compress the phenotype principal components (CCR = 95%) and GO term principal components (CCR = 99%) of the 3,917 genes into linear combinations using the cancor function of the R stats package. Bartlett’s chi-squared test was used to check the significance of the canonical correlation coefficients. Ultimately, 28 morphological features (pCVs) and 28 gene function features (gCVs) were obtained (P < 0.05; Supplementary Fig. 7).
To obtain the best set of CVs for differentiating the 11 replicates of the erg28∆ mutant from WT replicates and other non-essential mutants, a logistic regression model was built using the brglm2 package [32], as mentioned above. The model was trained on the morphological data of the ergosterol synthesis inhibitors: the inhibitor data were randomly sampled 6000 times to form training data sets, and a stepwise algorithm (step function in R) was applied in each iteration to select the best set of CVs. Finally, the best set of 7 CVs was chosen by the AIC (Supplementary Fig. 8). The precision was 0.905 with 11 replicates of erg28∆ and 10 replicates of WT as a test data set (Supplementary Fig. 8). For visual presentation, pCV1 and pCV2 were selected from among the 7 CVs because they clearly separated the erg28∆ replicates in this seven-dimensional space (Fig. 5A).
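For readers unfamiliar with Bartlett’s chi-squared test for canonical correlations, the following Python sketch computes the test statistic and its degrees of freedom from a list of canonical correlation coefficients. It uses the common textbook form of Bartlett’s approximation; whether this study used exactly this variant is an assumption, and the function name and example numbers are hypothetical. The P value, which requires a chi-squared distribution function, is omitted.

```python
import math


def bartlett_statistic(canonical_correlations, n_samples, p, q, k=0):
    """Bartlett's chi-squared statistic for canonical correlations.

    Tests H0: the canonical correlations after the first k are all zero.
    p and q are the numbers of variables in the two sets and n_samples is
    the number of observations. Returns (statistic, degrees_of_freedom).
    """
    remaining = canonical_correlations[k:]
    # log of Wilks' lambda: sum of log(1 - r_i^2) over remaining correlations.
    log_lambda = sum(math.log(1.0 - r * r) for r in remaining)
    statistic = -(n_samples - 1 - (p + q + 1) / 2) * log_lambda
    degrees_of_freedom = (p - k) * (q - k)
    return statistic, degrees_of_freedom


# Hypothetical example: two canonical correlations from 100 observations.
stat_all, df_all = bartlett_statistic([0.9, 0.5], n_samples=100, p=2, q=2)
stat_rest, df_rest = bartlett_statistic([0.9, 0.5], n_samples=100, p=2, q=2, k=1)
```

Applying the test sequentially for k = 0, 1, 2, ... and stopping at the first non-significant result gives the number of canonical variates to retain.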
