Retention time dataset for heterogeneous molecules in reversed-phase liquid chromatography

Repository and data overview

The dataset is publicly accessible through the Science Data Bank25 at https://doi.org/10.57760/sciencedb.15823. It is organized into 30 .xlsx files, each corresponding to a unique CM run. Each file contains two worksheets. The first worksheet in each file is dedicated to RT data, where molecules are identified using isomeric SMILES strings encoded to represent their molecular structures. To ensure consistency, all SMILES strings adhere to the PubChem standardization procedure26. RT data for all observed molecules were recorded in MCMRT, including those with RTs close to the dead time. The RT values provided are the averages of three replicate analyses. Additionally, the relative standard deviation (RSD) between the three replicates is included to indicate method variability and support data quality. Furthermore, the repository offers extensive molecular data, including InChI codes, IUPAC names, MCMRT numbers, CAS numbers, PubChem numbers, and chemical formulas. The second worksheet provides comprehensive chromatographic information, including details on data sources, instruments used, analytical columns, temperatures, mobile phases, gradient profiles, runtimes, flow rates, and dead times used to calculate retention factors. Retention factors are also provided in the first worksheet. This thorough documentation ensures the dataset’s robustness and utility for researchers.

Data description

The MCMRT repository currently houses 10,073 RT entries, encompassing 343 unique molecules and 30 different CMs. These CMs utilized RP columns, specifically six different C18 columns with varying dimensions (50–150 × 2.1–4.6 mm) and particle sizes (1.7–5 μm). Except for the Thermo Hypersil GOLD column (100 × 2.1 mm, 1.9 μm) and the Acclaim 120 C18 column (150 × 4.6 mm, 5 μm), all columns were new at the time of use. To ensure proper equilibration, two blank gradient runs were performed prior to each CM run.
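The per-molecule quantities described in the repository overview (mean RT of three replicates, RSD between replicates, and retention factor derived from the dead time) can be illustrated with a minimal Python sketch. It assumes the conventional definition of the retention factor, k = (tR − t0)/t0, and the sample standard deviation for the RSD; the source does not state which formulas were used. The function names and all numerical values are hypothetical and are not taken from the dataset.

```python
import statistics


def summarize_replicates(rts_min):
    """Return (mean RT in min, RSD in %) for a list of replicate RTs.

    Assumption: RSD = 100 * sample standard deviation / mean.
    """
    mean_rt = statistics.mean(rts_min)
    rsd_percent = 100.0 * statistics.stdev(rts_min) / mean_rt
    return mean_rt, rsd_percent


def retention_factor(rt_min, dead_time_min):
    """Retention factor k = (t_R - t_0) / t_0 (conventional definition, assumed)."""
    if dead_time_min <= 0:
        raise ValueError("dead time must be positive")
    return (rt_min - dead_time_min) / dead_time_min


# Hypothetical example values (not from MCMRT): three replicate RTs and a dead time.
replicates = [5.00, 5.10, 4.90]
dead_time = 1.00

mean_rt, rsd = summarize_replicates(replicates)
k = retention_factor(mean_rt, dead_time)
print(f"mean RT = {mean_rt:.2f} min, RSD = {rsd:.1f} %, k = {k:.2f}")
```

A retention factor close to zero flags a molecule that elutes near the dead time, which is why the dead time of each CM is reported alongside the RTs.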
Among the published datasets22, the most frequently utilized columns were the Waters ACQUITY UPLC BEH C18 and Waters ACQUITY UPLC HSS T3, both included in MCMRT. The gradient profiles were designed with both single and multiple slopes, employing either constant or gradient flow rates ranging from 0.2 to 1 mL/min. While constant flow rates are more common in RPLC, gradient flow rates were included to explore their potential effects on RTs. This approach was inspired by the work of Gago-Ferrero et al.24, who introduced flow rate variations in their CMs, creating a widely used dataset for suspect and non-target screening of environmental samples10,11,27. Total run times for these methods varied from 10 to 100 min. The column temperatures were varied between 30 °C and 45 °C to optimize separation efficiency. Regarding the mobile phases, 18 CMs utilized a water/MeOH (90:10, v/v) mixture for mobile phase A, 12 utilized water for mobile phase A, 24 used MeOH for mobile phase B, and 6 used ACN for mobile phase B. While ACN generally offers higher efficiency, we used MeOH in most CMs based on initial experiments indicating that RT variations were influenced more by additives than by the solvent itself. This choice was also guided by the work of Gago-Ferrero et al.24, who used MeOH in their CMs. Preferred mobile phases included water with 0.1% formic acid (weak phase) and either ACN or MeOH with 0.1% formic acid (strong phase)22. MCMRT also explores various mobile phase compositions, optimized with different additives such as 0.01% formic acid with 5 mM ammonium formate, 0.1% formic acid with 4 mM ammonium formate, 0.1% formic acid, 5 mM ammonium formate, and 5 mM ammonium acetate. These mobile phase compositions were taken from existing published datasets11,14,23,24,28, which facilitates the comparison and integration of new data with historical data.
An analysis of representative chromatographic parameters in the repository highlights the significant influence of column selection and mobile phase composition on RTs and peak orders29. Detailed information about the instrumental and chromatographic conditions is provided in Table 1 and Table S2 (see supplementary xlsx file).

The molecules in MCMRT span diverse chemical classes and exhibit a broad range of octanol/water partition coefficients (log Kow −8.1 to 11.6) and molecular weights (89 to 1449 Da) (Fig. 1a). They encompass 11 ClassyFire groups at the superclass level30, including benzenoids (27.7%), organic acids and derivatives (20.4%), organoheterocyclic compounds (18.7%), lipids and lipid-like molecules (9.9%), phenylpropanoids and polyketides (7.6%), organohalogen compounds (7.3%), organic oxygen compounds (3.5%), organosulfur compounds (1.2%), organic nitrogen compounds (1.2%), organophosphorus compounds (1.2%), and other compounds (1.5%). Figure 1b,c provide an overview of the elemental composition within these molecules, showcasing a diversity of elements (C, H, O, N, P, S, Cl, Br, F, and I). The METLIN dataset contains 80,038 molecules and covers seven similar superclasses23. Additionally, Gago-Ferrero et al.’s dataset (referred to as CM 03 P) includes retention time data for 1820 emerging pollutants, such as pesticides, pharmaceuticals from different therapeutic categories, illicit drugs, industrial chemicals, and transformation products, representing a diverse set of chemical structures24. However, compared to these datasets, MCMRT includes some unique compound classes, such as organophosphorus flame retardants and perfluoro and polyfluoro organic compounds, which are absent in both the METLIN and CM 03 P datasets. METLIN focuses on metabolomics and aims to include molecules likely to be found in human samples, which explains the absence of certain classes.
In contrast, MCMRT aims to provide broad coverage of chemical structures, including those not typically found in human samples. MCMRT also includes several pairs of isomers, further enhancing its utility in various analytical applications. A full list of these molecules is provided in Table S3 (see supplementary xlsx file), with their common name, IUPAC name, InChI, SMILES, PubChem number, CAS number, formula, molecular weight, predicted log Kow, and superclass.

Fig. 1 Chemical diversity of molecules in MCMRT. (a) Molecular weight and log Kow predicted by EPISuite for each molecule. Each data point corresponds to one molecule from the mixture; its color indicates the superclass defined by ClassyFire; its size indicates the adduct ion detected by ESI-HRMS. Panels (b,c) show the elemental composition of each molecule. Columns are aligned vertically for each individual molecule. The left axis represents the relative abundance of each element, while the right axis represents the absolute number of carbon atoms.

Among the 343 diverse molecules in MCMRT, eight environmental hormones were detected exclusively in non-acidic mobile phases (CMs 25–30). These hormones include bisphenol A, bisphenol B, bisphenol F, 4-octylphenol, 4-nonylphenol, diethylstilbestrol, hexestrol, and estriol. These compounds primarily ionize in negative ion mode, exhibiting significant responses. The presence of acidic additives in mobile phases likely suppresses their ionization efficiency, so that detection limits were not met at the concentrations used in acidic mobile phases (CMs 01–24). Additionally, five molecules were undetected in mobile phases containing solely acidic additives (CMs 20–24). Among these, one is an environmental hormone whose ionization efficiency may have been further reduced by the high concentration of 0.1% formic acid. The other four molecules (bromopropylate, permethrin, halfenprox, and bifenthrin) primarily responded as [M + NH4]+ or [M + Na]+ ions.
In acidic mobile phases, their [M + NH4]+ peaks were not detected, and their [M + H]+ and [M + Na]+ peaks were too weak to be detected. In contrast, the remaining 330 molecules were consistently detected across all CMs (Table S4, see supplementary xlsx file). This significant overlap enables cross-comparison and the study of retention behavior under various chromatographic conditions. Furthermore, MCMRT includes CMs that systematically vary a single chromatographic parameter, providing valuable insights into the effects of these variations. For instance, there are variations in column type between CM 04 and CM 05, mobile phase composition between CM 03, CM 19, and CM 30, running time between CM 01 and CM 13, and gradient profile between CM 09 and CM 10.

Overall, MCMRT serves as a crucial resource for exploring the complex relationship between LC setups and molecular RTs. With its comprehensive coverage of LC setups and systematic variations in chromatographic parameters, this resource is poised to significantly enhance the work of researchers who are exploring the optimization of LC methods or the development of predictive models that incorporate these chromatographic conditions. While replicating all setups may not be practical, MCMRT allows researchers to select the most relevant setups for their studies. This flexibility enables the evaluation of model performance across different chromatographic conditions, thereby enhancing the robustness and applicability of their models. This dataset is expected to play a crucial role in the methodological transition across diverse LC setups, providing valuable references for molecular behavior under various conditions. Such insights are crucial for making customized adjustments to methodologies.
Furthermore, MCMRT is positioned to improve the accuracy and reliability of scientific work by enabling the cross-validation of methods, ensuring that the RTs of known compounds are consistent with those recorded in the dataset across different CMs. In its contribution to the broader field, MCMRT aims to promote methodological consistency and uniformity in data reporting by providing a benchmark for RTs across a range of CMs. This initiative is a step toward fostering a more integrated and collaborative scientific community, where shared knowledge leads to collective advancement.
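The cross-comparison described in this section (identifying molecules detected in every CM, then comparing their RTs between two CMs that differ in a single parameter) can be sketched in Python as follows. This is an illustrative outline only: the in-memory layout (a dictionary mapping each CM label to a molecule-to-RT mapping), the molecule identifiers, and all RT values are hypothetical placeholders, not the worksheet structure or data of MCMRT.

```python
def common_molecules(rt_by_cm):
    """Return the set of molecule IDs that have an RT entry in every CM."""
    id_sets = [set(rts) for rts in rt_by_cm.values()]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


def rt_shift(rt_by_cm, cm_a, cm_b):
    """RT difference (cm_b minus cm_a, in min) for molecules present in both CMs."""
    shared = set(rt_by_cm[cm_a]) & set(rt_by_cm[cm_b])
    return {mol: rt_by_cm[cm_b][mol] - rt_by_cm[cm_a][mol] for mol in sorted(shared)}


# Hypothetical toy data: RTs in minutes, keyed by placeholder molecule IDs.
# "M004" mimics a molecule detected only in a non-acidic mobile phase.
rt_by_cm = {
    "CM 04": {"M001": 3.20, "M002": 7.85, "M003": 12.40},
    "CM 05": {"M001": 2.95, "M002": 8.10, "M003": 12.10},
    "CM 25": {"M001": 3.05, "M002": 7.60, "M003": 11.90, "M004": 9.30},
}

shared_ids = common_molecules(rt_by_cm)
shifts = rt_shift(rt_by_cm, "CM 04", "CM 05")

print(sorted(shared_ids))
for mol, delta in shifts.items():
    print(f"{mol}: {delta:+.2f} min")
```

Restricting the comparison to molecules present in both CMs avoids treating a non-detection as an RT of zero, which matters for compounds that only ionize under particular mobile phase conditions.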
