METASPACE-ML: Context-specific metabolite annotation for imaging mass spectrometry using machine learning

Metadata curation of public datasets

In order to better classify and contextualize the datasets for selection, existing metadata fields were curated and additional classifications were introduced. First, the metadata of a total of 9251 public datasets was downloaded using the METASPACE API (https://metaspace2020.readthedocs.io) on September 13, 2023. The metadata associated with each dataset includes sample information, sample preparation, MS analysis settings, annotation configuration, publication status, and project association. Using the sample information for each dataset, the organism field was manually standardized to a genus/species-level classification (e.g., Human to Homo sapiens, Mouse to Mus musculus) whenever possible. In total, 275 different organism names were filtered down to 143 genus/species (Supplementary Data 1). Moreover, a kingdom metadata field was added based on the organism classification to further classify each organism into one of six groups (Animal, Plant, Bacteria, Fungi, Protista, Undefined) whenever possible (Supplementary Data 1). For datasets that are part of a project, using information from sample information, sample preparation, and manual inspection, an additional metadata field (Sample type) was added to classify datasets into one of six categories (Tissue, Cells, Whole Organism, Spheroids, Environmental, Spots). Keywords such as biopsy or tissue section were used to classify datasets as Tissue, while keywords such as single cells, cell monolayer, co-culture, and cultured cells were used to classify datasets as Cells (Supplementary Data 2). Sample type was mostly curated for datasets associated with a project; however, since most datasets submitted to METASPACE are tissues, we used the acquisition geometry of the ion images as a filter to classify non-project-associated datasets as Tissue, based on the assumption that images with irregular geometries are mostly tissues. Ion images where the total pixel count is not equal to the product of the number of pixels in the x and y coordinates are classified as irregular (Supplementary Data 3). Datasets that are not project-associated and do not have an irregular geometry were classified as uncurated. Finally, datasets were also classified based on their m/z range: datasets with maximum m/z ≤ 400 are classified as "Small molecules", and those with minimum m/z > 500 are classified as "Lipids". The remaining datasets, with minimum m/z < 500 and maximum m/z > 400, are classified as "Lipids and small molecules" (Supplementary Data 2).

Exclusion of non-centroided datasets

METASPACE requires users to submit centroided datasets. However, some users mistakenly submit non-centroided (profile-mode) datasets, which can lead to overselection of peaks. Such cases lead to suboptimal annotation because of the increased likelihood of isotopes of decoy ions matching shoulders of peaks or noise peaks. We have therefore developed a strategy to identify and exclude such datasets. To avoid having to check the spectra of every pixel in each of thousands of public datasets, for each dataset we considered the top n = 50 pixels with the highest number of non-zero-intensity m/z peaks. For each of those 50 pixels, we consider a vector of non-zero m/z values denoted by m = [m1, m2, m3, …, mN], where mi represents the i-th non-zero m/z value and N is the total number of peaks in the spectrum.
Given the m/z tolerance m/z_tol_ppm used in METASPACE-ML (here, 3 ppm), we can calculate the proportion of consecutive m/z peaks that fall within this tolerance by comparing the differences in m/z between consecutive peaks according to Eq. (1) below:
$${proportion}\_{overlap}=\frac{1}{N-1}{\sum }_{i=1}^{N-1}I\left(\left({m}_{i+1}-{m}_{i}\right)\le {m}_{i}\cdot \frac{m/z\_{tol}\_{ppm}}{{10}^{6}}\right)$$
(1)
where I(·) is the indicator function returning 1 if the argument is true and 0 otherwise, and N is the number of non-zero m/z peaks. Then, for each dataset, we take the average proportion_overlap over the aforementioned 50 pixels as a heuristic score to flag datasets for exclusion. Based on the distribution of these scores over all public datasets, we considered datasets with a score > 0.5 for exclusion.

Exclusion of low-quality datasets

We have developed the following criteria to exclude low-quality datasets from the training and testing sets: (1) the number of annotations at FDR 20% is less than 10 for every possible combination of target adduct and annotation database, which resulted in the exclusion of 1418 datasets; (2) the median (across pixels) number of m/z peaks with non-zero intensity is > 50,000, which led to the exclusion of 6 datasets; (3) the proportion_overlap score is > 0.5 (see "Methods"), which led to the exclusion of 127 datasets. More information about the quality filters for all considered datasets can be found in Supplementary Data 4.
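To make the exclusion heuristic concrete, the following minimal Python sketch computes proportion_overlap for a single centroided spectrum and averages it over the 50 densest pixels of a dataset; the function and variable names are illustrative and not part of the METASPACE codebase.

```python
import numpy as np

def proportion_overlap(mz, tol_ppm=3.0):
    """Fraction of consecutive m/z peaks lying within the ppm tolerance (Eq. 1)."""
    mz = np.sort(np.asarray(mz, dtype=float))
    if mz.size < 2:
        return 0.0
    diffs = np.diff(mz)                    # m_{i+1} - m_i
    tol = mz[:-1] * tol_ppm / 1e6          # ppm tolerance as an absolute m/z difference
    return float(np.mean(diffs <= tol))

def dataset_overlap_score(pixel_peak_lists, n_pixels=50):
    """Average proportion_overlap over the n_pixels spectra with the most non-zero peaks."""
    densest = sorted(pixel_peak_lists, key=len, reverse=True)[:n_pixels]
    return float(np.mean([proportion_overlap(p) for p in densest]))

# Datasets with dataset_overlap_score(...) > 0.5 would be flagged as likely profile-mode data.
```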
Context-dependent selection of training and testing datasets

To ensure a representative selection of datasets from the METASPACE knowledge base, we first categorized all public datasets into contexts based on existing and newly added metadata (see "Metadata curation of public datasets"). The datasets were classified based on possible combinations of the following six variables: polarity, ionization source, mass analyzer, m/z class, sample type, and species. Ideally, a given context is defined by all six variables; however, in a few cases where the sample type is not well defined, a dataset might match multiple contexts. In such cases, the matching considers combinations of the five fixed variables with all possible sample types (see "Context Explorer Shiny app" for more information). Moreover, the selection of training and testing datasets was performed separately for animal- and plant-based datasets, which together constitute 92% of all public datasets. For each kingdom, the species variable was first grouped so that species with a cumulative frequency in the bottom 10% were grouped as "OTHER". Also, we only considered the following sample types: Tissue, Whole organism, Cells, and Uncurated. To ensure the statistical reliability of the predictions, we only selected contexts with at least 45 datasets, where 30 would be used for testing and the rest for training and cross-validation. Given that the number of training datasets might be scarce for the model, and to test the effect of underfitting vs overfitting, 5 different models were trained with varying numbers of datasets per context, from a minimum of 10 up to a maximum of 50 datasets per context. Accordingly, for some models, certain contexts are not represented because the available datasets were insufficient to cover both testing and training.

Once the number of datasets for each context was determined, the next step was to optimize the sampling procedure to maximize the diversity of datasets within each context. Using information about the institute/lab (group) that submitted the dataset, the project ID (for project-associated datasets), and the submission day, we tailored the sampling procedure based on the following assumptions: (1) datasets submitted by the same group and project are more homogeneous; (2) datasets submitted on the same day are more likely to be biological/technical replicates; and (3) datasets from different groups are more heterogeneous. For datasets not associated with a project, a pseudo-label for the project was created based on the submitter name and day of submission. Given the number of required datasets per context, a recursive sampling procedure iteratively samples datasets so that at least one dataset per project-group combination is selected from the context-specific datasets. In cases where the size of the sampling pool exceeds the required number of datasets, the datasets are weighted and ranked by both the relative size of the project-group combination and its Shannon entropy. The remaining datasets are then selected from the top-ranked list so that project-group combinations with maximum entropy and highest relative size are used to randomly select the remaining datasets. Based on this sampling procedure, 30 testing datasets per context were first selected for each kingdom, and those datasets were then removed from the selection pool for the training datasets to ensure that training and testing datasets were mutually exclusive. The choice of 30 datasets per context strikes a balance: including more datasets could reduce the diversity of contexts, whereas including fewer datasets might compromise statistical reliability due to a smaller sample size, potentially leading to a less robust evaluation. The final list of selected datasets for each context and kingdom can be found in Supplementary Data 5.

Processing training data using the rule-based METASPACE and CoreMetabolome

The selected training datasets were then reprocessed on the METASPACE server using the rule-based approach as previously described6, with minor modifications to their configuration file (Supplementary Note S1). The datasets were annotated against the CoreMetabolome database. CoreMetabolome is a molecular database specifically designed and developed by us for METASPACE annotation. It was designed to be large enough for untargeted spatial metabolomics, to include only chemically plausible molecules, to prioritize endogenous over exogenous molecules, and to include molecules from primary metabolic pathways that are commonly abundant in different types of samples. HMDB (version 4) and KEGG were used as input databases and were curated manually by an experienced chemist and mass spectrometrist. Supplementary Note S2 contains detailed information on the curation process.

New ion scores quantifying the m/z error from the centroided data

In addition to the scores used in the Metabolite Signal Match score (MSM) (spatial isotope rho_spatial, spectral isotope rho_spectral, and spatial chaos rho_chaos), we have introduced two new scores: m/z_error_abs (absolute m/z error) and m/z_error_rel (relative m/z error). These scores quantify the error in estimating the m/z value for an ion of interest compared to its theoretically defined value as follows:
$$\underline{m}=\frac{{\sum }_{p=1}^{n}{m}_{p}{I}_{p}}{{\sum }_{p=1}^{n}{I}_{p}}$$
(2)
$$m/z\_{error}\_{abs}=1-\left|{\underline{m}}_{1}-{\hat{m}}_{1}\right|$$
(3)
$$m/z\_{error}\_{rel}=1-\left|\frac{{\sum }_{i=2}^{T}\left(\left({\underline{m}}_{i}-{\hat{m}}_{i}\right)-\left({\underline{m}}_{1}-{\hat{m}}_{1}\right)\right){\hat{I}}_{i}}{{\sum }_{i=2}^{T}{\hat{I}}_{i}}\right|$$
(4)
where, for a given ion, mp is the m/z value in pixel p of the respective ion image, Ip is the corresponding intensity in pixel p, and n is the total number of pixels in the ion image. Moreover, \({\hat{m}}_{i}\), \({\underline{m}}_{i}\), and \({\hat{I}}_{i}\) are the theoretical m/z value, the observed mean m/z (across pixels), and the theoretical relative intensity of the i-th isotopic peak of that ion, respectively, and T is the total number of isotopic ion peaks considered (in METASPACE-ML we use T = 4). While m/z_error_abs quantifies the m/z error between observed and theoretical values only for the first isotopic peak, m/z_error_rel quantifies the difference between observed and theoretical values for the second to T-th isotopic peaks relative to the difference between observed and theoretical values for the first isotopic peak.
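As a worked illustration of Eqs. (2)-(4), the following Python sketch computes the intensity-weighted mean m/z and the two error scores from per-isotope ion images and a theoretical isotope pattern; the function names and input layout are assumptions made for illustration only.

```python
import numpy as np

def observed_mean_mz(mz_image, intensity_image):
    """Intensity-weighted mean observed m/z across pixels (Eq. 2)."""
    mz = np.asarray(mz_image, dtype=float).ravel()
    ints = np.asarray(intensity_image, dtype=float).ravel()
    return np.sum(mz * ints) / np.sum(ints)

def mz_error_scores(observed_means, theoretical_mz, theoretical_intensity):
    """m/z_error_abs (Eq. 3) and m/z_error_rel (Eq. 4) over T isotopic peaks."""
    delta = np.asarray(observed_means, float) - np.asarray(theoretical_mz, float)
    weights = np.asarray(theoretical_intensity, float)[1:]   # isotopes 2..T
    mz_error_abs = 1 - abs(delta[0])                         # first isotopic peak only
    mz_error_rel = 1 - abs(np.sum((delta[1:] - delta[0]) * weights) / np.sum(weights))
    return mz_error_abs, mz_error_rel
```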
FDR estimation

In the rule-based approach6, the target-decoy strategy was formulated so that, for an MSM threshold, the ratio of positive decoys to all positives provides an estimate of the FDR. To reduce the variability introduced by the random choice of decoy adducts, the decoy adducts were sampled SD = 20 times for each target adduct, taking the median value across the SD rankings for each formula before applying monotonicity adjustments. In METASPACE-ML, we also sample decoy adducts SD times, yet we propose to use a single weighted ranking where decoys are weighted with \(1/{S}_{D}\). First, for a database and each target adduct, both target ions and all sampled decoy ions are sorted in descending order based on the model prediction score. For each rank threshold i of the sorted ions, we calculate Ti and Di, which are the numbers of targets and decoys, respectively, with ranks smaller than or equal to i. The FDR value for an ion with rank threshold i was defined as
$${FDR}_{i}=\frac{\left({D}_{i}+1\right)/{S}_{D}}{\left({T}_{i}+1\right)+\left({D}_{i}+1\right)/{S}_{D}}$$
(5)
A pseudocount of 1 was added to both Ti and Di, as per the rule of succession, to avoid misleading 0% FDR values and to better estimate the means of the target and decoy counts, which have binomial-like distributions. In summary, we introduced the following changes to the FDR estimation compared to the rule-based approach6: (1) using a single weighted ranking, where decoys are given 1/SD of the weight; (2) using a single selection of SD random decoys per formula, which are shared between all FDR rankings; (3) allowing the calculation of continuous FDR values for each ion instead of snapping FDRs to fixed thresholds (5%, 10%, 20%, 50%); and (4) introducing a rule of succession, where a pseudocount of 1 is added to the numbers of targets and decoys. These changes increase the computational performance. In the rule-based approach, SD random decoys would be sampled from a set of implausible adducts for each formula and target adduct (e.g., +H, +Na, +K). In METASPACE-ML, we propose instead to randomly sample SD decoy adducts per target formula and share them across all possible target adducts for that formula. This change should not affect the FDR rankings, as they are statistically independent, yet it reduces the score calculations by producing fewer decoy ions overall. To show an example of calculated FDRs alongside the scores, we provide Supplementary Data 10, which presents the scores for both MSM and METASPACE-ML for targets and decoys, along with their corresponding FDRs. For a given FDR threshold, such as 10%, a true positive is defined as a target ion with an FDR < 10%, while a false positive is a decoy ion with an FDR < 10%. This table includes the input data used to estimate the FDR for a specific group (+Na adducts) within a particular dataset (https://metaspace2020.eu/dataset/2018-12-14_16h34m31s).
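The weighted-ranking FDR of Eq. (5) can be sketched in a few lines of Python; this is an illustrative reimplementation under simplifying assumptions (it omits, for example, any monotonicity adjustment) rather than the production METASPACE code.

```python
import numpy as np

def weighted_fdr(target_scores, decoy_scores, s_d=20):
    """FDR per target ion from a single decoy-weighted ranking (Eq. 5).

    target_scores: prediction scores of target ions
    decoy_scores:  prediction scores of all S_D-times-sampled decoy ions,
                   each contributing a weight of 1 / s_d
    """
    scores = np.concatenate([target_scores, decoy_scores])
    is_target = np.concatenate([np.ones(len(target_scores), bool),
                                np.zeros(len(decoy_scores), bool)])
    order = np.argsort(-scores)              # sort all ions by descending score
    sorted_is_target = is_target[order]
    T = np.cumsum(sorted_is_target)          # targets with rank <= i
    D = np.cumsum(~sorted_is_target)         # decoys with rank <= i
    fdr = ((D + 1) / s_d) / ((T + 1) + (D + 1) / s_d)
    # map FDR values back to the original order of the target ions
    inverse = np.argsort(order)
    return fdr[inverse[:len(target_scores)]]
```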
Reliability score for target-decoy annotations

In order to assess the reliability of the results of the target-decoy-based annotation and to help select the optimal FDR threshold (e.g., from 5%, 10%, 20%, and 50%), we formulated a reliability score (reliability_score). This score can help minimize false positives and maximize annotation coverage. Accordingly, we considered the concept of the F-score, which estimates the balance between precision and recall. The most popular F-score is the F1-score, the harmonic mean of precision and recall. However, given the inherent imbalance between target and decoy ions, as we sample SD = 20 decoy ions for each target ion for robustness, there are more negative than positive instances in our data to be ranked, and precision is therefore more important than recall given the higher number of false positives. Thus, we chose to optimize the F-beta score, which weights the contribution of precision vs recall based on the value of beta. Here, we chose beta = 0.5 to put more weight on precision while still considering recall; see Eq. (6). Then, using the Cutpointr R package12, we calculated the F-beta score for the annotation results, where target and decoy ions are considered positive and negative instances, respectively. For each dataset, we consider a vector of all possible FDR cutoffs denoted by f = [f1, f2, f3, …, fK], where fi represents the i-th FDR cutoff sorted in ascending order for K possible cutoffs. For each FDR cutoff, we get a corresponding F-beta score, denoted b = [b1, b2, b3, …, bK], where bi represents the i-th F-beta score for the K possible FDR cutoffs. Then, for each of the four default FDR thresholds considered in METASPACE (5%, 10%, 20%, and 50%), we calculate the reliability score in Eq. (8) as the product of two terms: (1) the ratio between the F-beta score at the FDR cutoff closest to the chosen fixed threshold and the maximum F-beta score, and (2) the complement of the optimal FDR cutoff (the FDR cutoff at the maximum F-beta score). The first term measures how far the F-beta score at the chosen FDR threshold is from the maximum possible F-beta score and lies on a [0,1] scale. The second term penalizes the overall score based on the optimal FDR value: the lower the optimal FDR, the lower the penalty. Finally, since we calculate the reliability score for all four METASPACE-default FDR thresholds, we can select the minimum FDR threshold that has the maximum reliability score as the optimal FDR threshold at which the annotations are most reliable.
$$F{beta}=\frac{\left(1+{\beta }^{2}\right)\cdot {precision}\cdot {recall}}{{\beta }^{2}\cdot {precision}+{recall}}\,;\ \beta=0.5$$
(6)
$${optim}_{{fdr}}=f\left[{{argmax}}_{i}\,b[i]\right]$$
(7)
$${reliability}\_{score}=\frac{b\left[{{argmin}}_{i}\left|f[i]-{FDR}_{{METASPACE}\_{thresh}}\right|\right]}{\max (b)}\cdot \left(1-{optim}_{{fdr}}\right)$$
(8)
where f is a vector of all possible FDR cutoffs sorted in ascending order, and b is a vector of the corresponding F-beta scores. FDR_METASPACE_thresh is one of the four METASPACE-default FDR thresholds (5%, 10%, 20%, and 50%).
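A minimal Python sketch of Eqs. (7)-(8) is given below; in the actual workflow the F-beta scores come from the Cutpointr R package, so this standalone function is an illustrative assumption rather than the implementation used.

```python
import numpy as np

def reliability_score(fdr_cutoffs, fbeta_scores, metaspace_threshold):
    """Reliability of annotations at one fixed METASPACE FDR threshold (Eqs. 7-8).

    fdr_cutoffs:  all possible FDR cutoffs, sorted ascending (vector f)
    fbeta_scores: F-beta score at each cutoff (vector b)
    """
    f = np.asarray(fdr_cutoffs, dtype=float)
    b = np.asarray(fbeta_scores, dtype=float)
    optim_fdr = f[np.argmax(b)]                       # Eq. (7): cutoff maximizing F-beta
    closest = np.argmin(np.abs(f - metaspace_threshold))
    return (b[closest] / b.max()) * (1 - optim_fdr)   # Eq. (8)

# Choosing the smallest default threshold with the highest reliability score:
# thresholds = [0.05, 0.10, 0.20, 0.50]
# best = min(thresholds, key=lambda t: (-reliability_score(f, b, t), t))
```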
Training and cross-validating the model

We employed a ranking-based model using gradient-boosted decision trees implemented in the CatBoost framework10. As input features, we used the original MSM features (spatial isotope, spectral isotope, and spatial chaos) in addition to the newly introduced features (relative and absolute m/z error), five features in total. Two independent CatBoost models were built for animal and plant datasets. The models were initialized with PairLogit as the loss function and fitted on 600 and 180 training datasets for 1000 iterations for the animal and plant datasets, respectively. In each iteration, a decision tree is built so that it improves the output of the previous trees with respect to the loss function. For each dataset, a decision tree at a specific iteration scores all decoy and target ions based on their feature values, using combinations of pairwise objects where one is considered a winner (target ion) and the other a loser (decoy ion). The PairLogit loss function13 selects the best tree that maximizes the positive difference between the tree score for the target vs the decoy ion. To evaluate the model and ensure that it was not overfitting, we performed cross-validation in which the training datasets were split into 5 splits, where 80% of the datasets were used for training and the remaining 20% for evaluation (see "Evaluation metrics and annotation database comparison"), while keeping the relative size of each context constant. The final prediction score of the model was scaled to [0,1] using min-max scaling based on the leaf values of the decision trees, so that scores closer to 1 denote high-confidence annotations and vice versa.
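The core of this training step can be sketched with the public CatBoost Python API as below; the feature-matrix layout, function name, and use of the dataset as the query group are illustrative assumptions, not the exact METASPACE-ML training script.

```python
from catboost import CatBoost, Pool

def train_ranker(features, labels, dataset_ids, iterations=1000):
    """Fit a PairLogit ranking model on target (label 1) vs decoy (label 0) ions.

    features:    one row per ion with the five input scores
                 (rho_spatial, rho_spectral, rho_chaos, mz_error_abs, mz_error_rel)
    dataset_ids: query-group id per row; rows must be grouped by dataset
    """
    pool = Pool(data=features, label=labels, group_id=dataset_ids)
    model = CatBoost({"loss_function": "PairLogit",   # pairwise target-vs-decoy objective
                      "iterations": iterations})
    model.fit(pool, verbose=False)
    return model

# Raw predictions can then be min-max scaled to [0, 1] as confidence scores:
# raw = model.predict(Pool(new_features, group_id=new_dataset_ids))
# scaled = (raw - raw.min()) / (raw.max() - raw.min())
```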
Evaluation metrics and annotation database comparison

In order to evaluate the performance of METASPACE-ML compared to the rule-based approach, we compared how well their respective scores rank target ions relative to decoy ions. We used Mean Average Precision (MAP), a commonly used metric that provides a comprehensive evaluation of ranking accuracy and precision14. Precision is defined as the number of targets in the top k ions of a ranked list, divided by k. The average precision is then calculated for each ion in the ranked list, followed by taking the mean of these average precisions over all datasets. MAP scores were calculated for each cross-validated dataset for each of the 5 splits (see "Training and cross-validating the model") and for the testing datasets. While MAP and PR-AUC evaluate the ranking quality of METASPACE-ML, they do not necessarily quantify the desired increase in the number of annotations. We therefore calculated the relative fold change and the log10 difference in the number of target ions captured relative to the rule-based approach at specific FDR thresholds (5%, 10%, 20%, 50%). In addition, we calculated ion coverage for the testing datasets to determine the percentage of ions in a testing dataset that the model has encountered in any training dataset. Supplementary Note S3 provides detailed information on the calculation of coverage, and the scores are provided for all testing datasets in Supplementary Data 5.

Selection of datasets for database comparison

In order to appropriately evaluate the performance of METASPACE-ML and ensure its ability to generalize to new, unseen data that aligns with the types of datasets represented in METASPACE and annotated with different databases, we selected 389 datasets (Supplementary Data 6) in a context-independent manner, using the same configuration as mentioned in Supplementary Note S1. All 389 datasets were annotated against three different databases: CoreMetabolome, SwissLipids (2018-02-02), and HMDB (version 4), and 206 of the 389 were also annotated using KEGG (version 1).

Separation of target-decoy ions and feature importance

An important aspect of an ideal annotation score is its ability to optimally differentiate between targets and decoys. In order to examine this separation per dataset, we calculated the three original rule-based scores plus the new m/z error features for each ion and projected both target and decoy ions onto a two-dimensional space using UMAP, as implemented in the M3C R package15, with the default configuration parameters except for n_neighbors, which we set to 20 to align with the fixed decoy sampling size of 20 used during FDR estimation. In addition, to evaluate which features are most important in driving the METASPACE-ML model's predictions, we used SHAP values from the Shap Python package16. SHAP values quantify the contribution of each feature to the final prediction score for a given ion in each dataset. Given the additive property of SHAP values, the contribution of each feature to the final prediction was taken as the ratio of the absolute SHAP value per feature to the sum of absolute SHAP values across all features for each ion. This was performed on 720 testing datasets, and the results were aggregated per dataset using the median SHAP contribution and visualized as either a density heatmap using the ComplexHeatmap R package17 or a ridge plot using the ggridges R package18.

Testing for differences in intensities of annotations

To further investigate the characteristics of the ions newly captured by the METASPACE-ML model, we compared the intensity distributions of those ions to the ones captured by both METASPACE-ML and the rule-based approach for animal datasets. Using the METASPACE API (https://metaspace2020.readthedocs.io), we retrieved the ion image for each target ion in a given dataset and calculated the 99th percentile intensity across all pixels, followed by log10 transformation and taking the median across all ions captured at a specific FDR threshold (default 10%). The distributions of the median transformed intensities per dataset for the two approaches (METASPACE-ML and MSM) were compared using the Wilcoxon test. In addition, we used the ggbetweenstats function in the ggstatsplot R package19 to perform pairwise Mann-Whitney tests between the intensity distributions of ions captured only by METASPACE-ML and either all ions captured by METASPACE-ML or only the ions captured by the rule-based approach for a given dataset. P-values were adjusted using the Benjamini-Hochberg correction20.

Enrichment analysis

To learn more about the types of metabolites that were only picked up by METASPACE-ML, we performed a hypergeometric test to identify the molecular classes enriched in those metabolites. Accordingly, we retrieved the class and subclass information for all annotated metabolites from the HMDB database (version 4) and used the HMDB "subclass" as the background for enrichment. Then, for each dataset and subclass, we first filtered annotations at 10% FDR and then performed a two-tailed Fisher exact test, considering the log fold enrichment as described in ref. 21 as a proxy for the enrichment score. Finally, we filtered significantly enriched terms (p-value < 0.05), and only terms that were enriched in at least 10% of the total number of input datasets were considered for visualization purposes. The complete enrichment results per dataset, context, and term can be found in Supplementary Data 7.

LC-MS/MS bulk validation of METASPACE-ML annotations

LC-MS/MS analysis was performed on an Agilent 1260 liquid chromatography (LC) system (Agilent, CA, USA) coupled to a Q Exactive Plus Orbitrap high-resolution mass spectrometer (Thermo Scientific, MA, USA) in positive ESI (electrospray ionization) mode. Chromatographic separation was carried out on an Ascentis Express C18 column (Supelco, PA, USA; 100 × 2.1 mm; 2.7 µm) at a flow rate of 0.25 mL/min. The mobile phase consisted of water:ACN (40:60, v/v; mobile phase A) and IPA:ACN (9:1, v/v; mobile phase B), both modified with a total buffer concentration of 10 mM ammonium formate + 0.1% formic acid. The following gradient was applied (min/%B): 0/10, 1/10, 5/50, 10/70, 18/97, 23/97, 24/10, 28/10. The column temperature was maintained at 25 °C, the autosampler was set to 4 °C, and the sample injection volume was 10 µL. Analytes were recorded via a full scan with a mass resolving power of 70,000 over a mass range of m/z 150–900. MS/MS fragment spectra were acquired at 35,000 resolving power with stepped collision energies [%] of 10/20/30. Ion source parameters were set to the following values: spray voltage: 4000 V, sheath gas: 30 psi, auxiliary gas: 10 psi, ion transfer tube temperature: 280 °C, vaporizer temperature: 280 °C. Data were processed using MS-DIAL (version 4.9.221218)22. Feature identification was based on the MS-DIAL LipidBlast V68 library by matching accurate mass (m/z tolerance: 0.005), isotope pattern, and MS/MS fragmentation (m/z tolerance: 0.02) data (matching score threshold: 90%). To remove misannotations and to increase confidence in lipid identification, intra-class elution patterns of lipid species were checked for consistency by relying on the expected chromatographic behavior on reversed-phase columns within homologous lipid series, considering carbon chain length and degree of saturation as the main factors23. For each of the 10 datasets (Supplementary Data 8) from ref. 6, we matched the formula and adduct captured by either METASPACE-ML or the rule-based approach at 10% FDR with the LC-MS/MS-matched formula + adduct. Only positive adducts (+H, +Na, and +K) were considered. Accordingly, for each dataset, we obtained a 2 × 2 contingency table where true positives (TP) are defined as the set of ions that have < 10% FDR and are bulk-validated, false positives (FP) are those that have < 10% FDR but are not bulk-validated, false negatives (FN) are those that have > 10% FDR and are bulk-validated, and true negatives (TN) are those that have > 10% FDR and are not bulk-validated. We calculated these metrics either in an exclusive manner for each approach (i.e., considering ions that are < 10% FDR for one approach but not the other) or in an inclusive manner. Finally, we report the common diagnostic metrics, the true positive rate (TPR), false positive rate (FPR), and false negative rate (FNR), for each dataset.
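As an illustration of how these per-dataset rates follow from the contingency table, a small Python sketch is shown below; the set names and the function are hypothetical and not part of the published analysis code.

```python
def diagnostic_rates(annotated_fdr10, bulk_validated, all_ions):
    """TPR, FPR, and FNR for one dataset from the 2 x 2 contingency table.

    annotated_fdr10: set of (formula, adduct) with FDR < 10% for one approach
    bulk_validated:  set of (formula, adduct) confirmed by LC-MS/MS
    all_ions:        all (formula, adduct) pairs considered for the dataset
    """
    tp = len(annotated_fdr10 & bulk_validated)
    fp = len(annotated_fdr10 - bulk_validated)
    fn = len((all_ions - annotated_fdr10) & bulk_validated)
    tn = len((all_ions - annotated_fdr10) - bulk_validated)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (tp + fn) if (tp + fn) else float("nan")
    return tpr, fpr, fnr
```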
Hardware configuration for model training

Model training was executed on the EMBL in-house high-performance computing cluster. The training utilized AMD EPYC CPU nodes, each equipped with up to 128 cores, 256 threads, and a maximum of 384 GB of memory. Each training and evaluation job required a maximum of 50 and 30 GB of memory per core, respectively, and was parallelized by splitting the job into 4 and 2 tasks, with each task employing 4 and 2 processors on the same node, respectively.

Context Explorer Shiny app

To assist the user in assessing the reliability of the prediction results per context and to provide more information about coverage, we developed the "METASPACE-ML: Context Explorer" web app (https://t.ly/q-nb5) using the R Shiny framework. The interactive web app allows the user to match the context closest to their dataset, or simply to choose one or more contexts to explore further, based on the six variables defined in "Context-dependent selection of training and testing datasets". Once the selected context(s) have been defined, the user can view and download the corresponding training and testing datasets covering the selected context(s) and can view all the evaluation and enrichment plots pertaining to datasets in those context(s). The app is only intended for viewing and interacting with the results of the model on testing datasets in the specified context(s); it is not designed to predict outcomes for new datasets in similar contexts.

Statistics and reproducibility

All statistical analyses were performed using R version 4.1.2 and Python version 3.8.13. The specific tests used are detailed in the respective figure legends. Unless otherwise stated, data are presented as the median with interquartile range. For comparisons between two groups, a two-tailed Wilcoxon signed-rank test was used. For comparisons involving more than two groups, the Kruskal-Wallis test was employed. For repeated measurements, a paired Wilcoxon signed-rank test was applied. P-values were corrected for multiple comparisons using the Benjamini-Hochberg procedure unless stated otherwise. A p-value of less than 0.05 was considered statistically significant unless stated otherwise. No statistical method was used to predetermine the sample size. No data were excluded from the analyses unless specified otherwise. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
