AutoGater: a weakly supervised neural network model to gate cells in flow cytometric analyses

Experiments to generate flow cytometry and CFU training data

All experiments used the S288C wild-type strain of S. cerevisiae [23]. To train the model, we exposed cell populations to two stresses capable of killing cells: high ethanol concentration and high temperature. We varied the intensity of each stressor, expecting the fraction of dead cells in the population to increase with the level of stress. Before treatment, yeast cells were grown in rich liquid YEP medium (1% yeast extract, 2% peptone, 0.012% adenine, 0.006% uracil) containing 2% dextrose (YEPD) in a shaking water bath at 30 °C until they reached a density of ~3 × 10⁶ cells/mL. Cell density was estimated with a hemocytometer under a light microscope. Once this density was reached, cells were treated with one of seven ethanol concentrations (0%, 5%, 10%, 12%, 15%, 20%, and 80%) or one of eight temperatures (25 °C, 30 °C, 35 °C, 40 °C, 45 °C, 50 °C, 55 °C, and 65 °C). Ethanol-treated cells were kept at 30 °C. Ethanol-treated cultures were sampled at 0, 0.5, 3, and 6 h, and temperature-treated cultures at 0, 0.5, 1, 2, 3, 4, 5, and 6 h. This design yielded 28 ethanol conditions (7 concentrations × 4 time points) and 64 temperature conditions (8 temperatures × 8 time points) (Fig. 2A), each a sample with a potentially different level of cell death that we expected to include both dead and dying yeast cells. Given the fine resolution of these gradients, all experiments were run as a single replicate, to test whether the model could capture trends across conditions.

Figure 2. AutoGater labeling and training framework. (A) Gradients of two modes of killing (heat and ethanol) were used to generate training data. Four conditions were used for training and all remaining conditions were used for testing. (B) AutoGater's two-stage framework first predicts labels for the held-out conditions from the weak labels using a random forest classifier (RFC), and then adjusts those predictions with a neural network using the CFU data for each condition. (C) The three methods give very different estimates of the percentage of live cells in the same sample, so a method is needed to harmonize them. (D) Architecture of the neural network, which takes the non-fluorescence channels of a flow cytometer (FSC, SSC) as input and is trained to jointly optimize cell-based and population-based notions of death.

To compare multiple methods of assessing cell death, each sample was split into three aliquots: one plated at ~300 cells for CFU counting, one stained with SYTOX Orange Nucleic Acid Stain (Invitrogen), and one left unstained. For both the stained and unstained aliquots, 1 mL of culture was transferred from the main culture into a microcentrifuge tube. These two aliquots were then analyzed on an Attune NxT Flow Cytometer (ThermoFisher Scientific) equipped with 488 nm and 561 nm lasers and a fluorescent protein filter kit. SYTOX Orange fluorescence was acquired on the YL1 detector with a 585/16 band-pass filter. To plate ~300 cells on a YEPD plate for CFU counting, the cell density at the time of sampling was measured as described above and used to determine the volume to plate, or whether an aliquot of the main culture first had to be diluted. Plates were incubated at 30 °C for 48 h, after which colonies were counted.

Two stage training of neural network with custom objective function

The neural network model to gate cells is trained in two stages.
The first stage learns weights from a set of weak labels on events, while the second stage updates those weights using information from the CFU assay (Fig. 2B). Put another way, the first stage assigns preliminary labels to events, and the second stage modifies them using information from the additional assay.

Stage 1: Weakly supervised model for annotating dead cells

The first stage of training relies on a weak label attached to every event. For the live/dead phenotype, we selected a subset of experimental conditions to serve as "live" and "dead" conditions. Most of the 28 ethanol and 64 temperature conditions contained a mixture of live and dead cells, which made labeling them by hand infeasible. However, a subset of the conditions contained mostly, if not entirely, live or dead cells, and we used CFU, the "gold standard," to identify them. The conditions containing all live cells were the 6-h time points of 25 °C, 30 °C, and 35 °C for the temperature treatments and the 0-, 0.5-, 3-, and 6-h time points of the 0% ethanol treatment; these are typically permissive conditions for yeast growth. The conditions containing mostly dead cells (as judged by CFU) were the 6-h time points of 55 °C and 65 °C for the temperature treatments and the 6-h time points of the 20% and 80% ethanol treatments. Both the 20% and 80% ethanol conditions were used as dead labels because we wanted to include both an extreme (80%) and a milder (20%) death-inducing condition. The final time points of 0% and 5% ethanol were selected as LIVE labels, and the final time points of 20% and 80% ethanol as DEAD (Fig. 2A). Heat-killed samples were labeled similarly: the final time points of the two lowest and the two highest temperatures were selected as LIVE and DEAD, respectively.
These conditions were chosen to give the model very healthy and most likely healthy cells, as well as clearly dead and dying cells. The labels are called weak because the training regimen assumes only that the majority of events are labeled correctly, not that every event is. In other words, we select control conditions in which the majority of cells are expected to be alive or dead, respectively. We use the two most severe levels of each treatment to expose the model to potentially different forms of death: ones that alter the cell's morphology and ones that leave it intact. A model is then trained on six input channels (FSC-A, FSC-H, FSC-W, SSC-A, SSC-H, SSC-W) and applied to all other conditions to give every event a preliminary label. This model can be a neural network or any other machine learning model that can produce preliminary labels for the remaining conditions; we chose a random forest classifier (RFC) for this first stage because it was significantly faster for both training and inference. The Stage 1 model can be compared to stain-based approaches for identifying live and dead cells.

Stage 2: Training neural network with custom CFU-based objective

The second stage of training uses a fully connected neural network to update the labels based on a CFU measurement per condition (Fig. 2D).
Specifically, the loss function L of the model has two parts:

$$ L(y, y'; \mathrm{CFU}, \mathrm{condition}) = \mathrm{BCE}(y, y') + \left| \frac{\sum_i y'_i}{\text{number\_of\_events}} - \mathrm{CFU} \right| $$

where y is the weak label for an event, y′ is the model's predicted label, CFU is the CFU measurement for the corresponding condition, BCE is the binary cross-entropy loss, and number_of_events is the number of events in the batch. The first term seeks to reproduce the labels learned by the Stage 1 model, while the second term updates the weights using the CFU measurement for each condition. Because the objective is evaluated and a step is taken per batch, the batch size must be larger than usual (we used 2048, compared with common batch sizes of 32 or 64) so that each batch contains enough events per condition. We do not require every condition to appear in each batch, but we want about 30 events from a condition when it does appear, because the model computes standard statistics such as means and variances over each batch; we found that a batch size of 2048 ensured this. The trained model can then be used to annotate live and dead events in other data.

Evaluation of AutoGater with held out experiments

Given the two stages of training and the large volume of data generated across experiments, we evaluated the model in two ways:

Event-based comparison with stained data: the model's per-event calls are compared with manual gating on the SYTOX stain. In this test, the model was trained on stained samples, but the fluorescence channel measuring the stain was withheld as an input feature. This let us identify the events the model predicts as live or dead and correlate them with the stain channel.

Population-based comparison with unstained data: another goal of the framework was to harmonize the different criteria by which cells are considered alive or dead. Here, we trained a model on unstained data and compared its percentages of live and dead cells with the stain-based and CFU-based percentages for the same experimental conditions. Because the loss function jointly optimizes both terms, we expected AutoGater's predictions to fall between the stain-based and CFU percentages.
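The two comparisons above can be illustrated with a short, dependency-free Python sketch. It is not the authors' analysis code: it assumes labels are encoded as 1 = live and 0 = dead, and the `threshold` used to call SYTOX-positive events on the YL1 channel is a hypothetical parameter standing in for the analyst's manual gate. All function names and the toy numbers are illustrative only.

```python
from typing import List, Sequence

LIVE, DEAD = 1, 0  # assumed label encoding


def stain_labels(yl1: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical manual gate: SYTOX Orange only enters cells with compromised
    membranes, so events brighter than `threshold` on YL1 are called dead."""
    return [DEAD if v > threshold else LIVE for v in yl1]


def event_agreement(model: Sequence[int], stain: Sequence[int]) -> float:
    """Event-based comparison: fraction of events where model and stain agree."""
    if len(model) != len(stain) or not model:
        raise ValueError("need two non-empty label vectors of equal length")
    return sum(m == s for m, s in zip(model, stain)) / len(model)


def percent_live(labels: Sequence[int]) -> float:
    """Population-based summary: percentage of events labeled live."""
    if not labels:
        raise ValueError("no events")
    return 100.0 * sum(1 for x in labels if x == LIVE) / len(labels)


def falls_between(model_pct: float, stain_pct: float, cfu_pct: float) -> bool:
    """True if the model's percent live lies between the stain and CFU estimates."""
    low, high = sorted((stain_pct, cfu_pct))
    return low <= model_pct <= high


# Toy events (illustrative values, not data from the study).
model_calls = [1, 1, 0, 1, 0, 1, 1, 0]
yl1 = [120.0, 90.0, 5000.0, 300.0, 8000.0, 150.0, 4000.0, 6000.0]
stain_calls = stain_labels(yl1, threshold=1000.0)
print(event_agreement(model_calls, stain_calls))  # 0.875
print(falls_between(percent_live(model_calls), percent_live(stain_calls), 70.0))  # True
```

Here the model and the stain gate disagree on one of eight events, and the model's 62.5% live falls between the stain-based 50% and the (toy) CFU-based 70%, which is the pattern the population-based test looks for.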

To summarize, the framework selects a subset of conditions to label as live and dead, trains a model on those conditions, and evaluates it on all held-out conditions. The best-performing model was selected as the Stage 1 model, which was then fine-tuned with the CFU measurements collected for all conditions. Fine-tuning updates the labels to harmonize the definition of live (viable) cells across stain-based and CFU measurements. No condition with a CFU measurement was held out of fine-tuning. The final Stage 2 model predicts a live/dead label for each event from the FSC and SSC channels alone. Although the heat and ethanol experiments were sampled over time, time was never an input feature to the model. Any temporal trends the model discovers therefore arise indirectly from the FSC and SSC features and from the CFU measurement at each time point.
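The Stage 2 objective from the training section can be made concrete with a small, framework-free sketch that only evaluates the loss for one batch. Several details are assumptions not stated in the source. First, y′ is taken to be the network's predicted probability that an event is live. Second, CFU is expressed as a live fraction, for example colonies counted divided by the ~300 cells plated. Third, the CFU term is computed separately for each condition present in the batch and then averaged, because the formula itself does not say how per-condition terms are combined. The function names are hypothetical. A real training loop would express the same arithmetic in an autodiff framework so that gradients flow through both terms.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def bce(y: Sequence[float], y_pred: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between weak labels y (0/1) and predictions y_pred."""
    total = 0.0
    for t, p in zip(y, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y)


def cfu_term(
    y_pred: Sequence[float],
    conditions: Sequence[Hashable],
    cfu: Dict[Hashable, float],
) -> float:
    """|mean predicted live fraction - CFU live fraction| for each condition in the
    batch, averaged over conditions (the averaging is an assumption)."""
    by_condition: Dict[Hashable, List[float]] = defaultdict(list)
    for p, c in zip(y_pred, conditions):
        by_condition[c].append(p)
    gaps = [abs(sum(ps) / len(ps) - cfu[c]) for c, ps in by_condition.items()]
    return sum(gaps) / len(gaps)


def autogater_loss(y, y_pred, conditions, cfu) -> float:
    """Stage 2 objective: BCE against the weak labels plus the CFU agreement term."""
    return bce(y, y_pred) + cfu_term(y_pred, conditions, cfu)


# Toy batch (illustrative values only): two events from a mostly-live condition "A"
# and two from a mostly-dead condition "B".
loss = autogater_loss(
    y=[1, 1, 0, 0],
    y_pred=[0.9, 0.8, 0.2, 0.1],
    conditions=["A", "A", "B", "B"],
    cfu={"A": 0.95, "B": 0.05},
)
print(round(loss, 4))  # 0.2643
```

Grouping by condition keeps the second term meaningful when a batch mixes conditions. It also shows why a large batch helps: a condition represented by only a handful of events would yield a noisy mean.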
