Defining a core breath profile for healthy, non-human primates

Defining the breath volatilome of non-human primates
In this pilot study, we collected and analyzed the breath of 30 healthy NHPs and found 2,017 volatile features in their collective breath samples (Fig. 1). Of these, 125 were consistently found in the breath from every animal in this study, which we define as the core breathprint (supplemental Tables S2, S3, and S4). We define their accessory breath volatilome (i.e., features present in more than 10% of samples but less than 100%) as having 1,426 features. We define sparse features present in less than 10% of the samples as the rare breathprint, consisting of 466 features. We conducted an accumulation and rarefaction curve analysis (Fig. 2) to assess whether 30 animals were enough to capture the full diversity of molecules in this macaque species under our experimental conditions, an approach commonly used in genetic studies22. Each curve’s asymptote indicates that the pan and core breathprint sizes converged (Fig. 2). Thus, the 30 animals from which breath was sampled capture the full variation of the pan and core breathprint size, and breath samples from additional animals would likely not significantly change the results22,23. The core molecules comprise aliphatic hydrocarbons (primarily alkanes, alkenes, and terpenes), accounting for 40% of the core profile. Aromatic compounds (benzene derivatives, naphthalene, and furan) represented 28% of the core profile. Although classified as aromatic, some compounds in this class may also fall into the third largest category, carbonyl compounds. This group comprised 16% of the core profile and included non-aromatic aldehydes, ketones, and esters. A complete list of core molecules, retention times, linear retention index (LRI), and putative annotations are shown in Tables S2, S3, and S4. Broadly speaking, healthy human breath contains over 1,000 separate molecules, including hydrocarbons, benzene derivatives, aldehydes, ketones, esters, and alcohols24,25,26,27,28. 
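As an illustration only, the prevalence-based definitions and the accumulation and rarefaction analysis described above can be sketched as follows. This is a minimal sketch and not the pipeline used in this study: the function names, the toy data, and the representation of each sample as a set of feature labels are all assumptions. The text leaves a feature present in exactly 10% of samples unassigned, so the sketch arbitrarily counts it as accessory.

```python
import random
from statistics import mean


def classify_by_prevalence(samples, rare_cutoff=0.10):
    """Split features into core, accessory, and rare sets by prevalence.

    samples: list of sets, one set of detected feature labels per sample.
    Core = present in every sample; rare = present in fewer than
    rare_cutoff of samples; accessory = everything in between
    (assumption: exactly rare_cutoff counts as accessory).
    """
    n = len(samples)
    counts = {}
    for features in samples:
        for f in features:
            counts[f] = counts.get(f, 0) + 1
    core = {f for f, c in counts.items() if c == n}
    rare = {f for f, c in counts.items() if c / n < rare_cutoff}
    accessory = set(counts) - core - rare
    return core, accessory, rare


def rarefaction(samples, n_draws=500, seed=0):
    """Mean pan and core sizes for random subsets of each size k.

    For each k, draws n_draws subsets without replacement and records
    the size of the union (pan) and intersection (core) of their features.
    """
    rng = random.Random(seed)
    curve = []
    for k in range(1, len(samples) + 1):
        pan_sizes, core_sizes = [], []
        for _ in range(n_draws):
            subset = rng.sample(samples, k)
            pan_sizes.append(len(set.union(*subset)))
            core_sizes.append(len(set.intersection(*subset)))
        curve.append((k, mean(pan_sizes), mean(core_sizes)))
    return curve


# Hypothetical toy data: 20 samples, one feature in all of them,
# one in half of them, and one in a single sample.
toy_samples = [
    {"f_core"}
    | ({"f_acc"} if i % 2 == 0 else set())
    | ({"f_rare"} if i == 0 else set())
    for i in range(20)
]

core, accessory, rare = classify_by_prevalence(toy_samples)
print(sorted(core), sorted(accessory), sorted(rare))
for k, pan_size, core_size in rarefaction(toy_samples, n_draws=50):
    print(k, pan_size, core_size)
```

If the mean pan and core sizes stop changing as k approaches the full cohort, the curves have reached the kind of plateau that Fig. 2 is used to assess.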
Hydrocarbons represent the largest fraction of volatiles in human breath, which is consistent with what we found in macaques. Healthy human breath also contains trace amounts of compounds bearing amino, nitrile, acid, and sulfide groups26,27, which we also observed in the accessory and rare breathprints of healthy macaques. However, some high-prevalence features in human breath were not reported as core molecules in macaque breath. An untargeted GC method aims to detect as many volatile organic compounds as possible, but an unoptimized method may cause peaks to split, overlap, or go undetected. For example, 2-pentanone fell into the accessory volatilome because it was absent from a single sample. Acetone eluted early in the chromatogram with broad, split peaks caused by moisture in the breath sample, so its frequency of observation was low. Isoprene was not detected by our method because of its low molecular weight and early elution time, which prevented the software from identifying the peak. Therefore, optimizing the GC method will be vital in the future to increase the reliability of peak identification, especially for targeted and quantitative analysis.

Figure 1 (a) The data analysis process and the reduction in analytes as they are removed or classified according to core, critical core, accessory, or rare status. (b) The breakdown of the proportion of molecules in the pan volatilome that belong to each subclass.

Figure 2 Accumulation and rarefaction curves generated from random sampling without replacement 500 times from the training dataset and examining the size of the resulting pan and core volatilomes as a function of sample size.

Validation of the core, accessory, and rare volatilomes
As an initial step in validating these core, accessory, and rare volatilome results, we examined the frequency with which we observed these features in an additional 19 samples, collected from the same 30 macaques on different sampling days and used as a validation set.
For the core breathprint, and in accordance with our S/N cut-offs, we initially found that one core molecule was not observed in four validation samples, four core molecules were not observed in two validation samples, and 22 core molecules were not observed in one validation sample. This meant that 27 core molecules (based on the training set) were missing, with varying frequency, across seven samples. However, a careful visual comparison of the mass spectra of these “missing” peaks (using 1D and 2D retention times that fall within 6.0 s and 0.3 s, respectively, of known core molecules) revealed that many were in fact present but had not been assigned a putative identification by ChromaTOF. Following these checks, we found that:

(a) Only 12 core molecules in the training set were not found in the validation set

(b) Each missing molecule was undetected in only one sample

(c) One breath sample, denoted S26, accounted for seven of the 12 missing molecules

Ultimately, 113 of the 125 core molecules defined in the training set were observed in 100% of the validation samples. Table S6 shows the 12 missing core molecules and their distribution across samples. Of the five validation samples for which only one core molecule was missing, four (S14, S24, S40, and S42) came from male Chinese cynomolgus macaques, while one (S4) came from a female Chinese cynomolgus macaque (Table S1). Given the small sample size (only 4 females to 26 males), the narrow age range (6–8 years), and the detection of these molecules in the other sample from each of these animals, we cannot meaningfully attribute these missing observations to any of these factors.

Additionally, the pattern in Table S6 suggests something unusual about sample S26. The frequency of missing core molecules in S26 (7/12), relative to every other sample with missing core molecules (1/12), implies that the features are “Missing Not at Random” for S26. Because of this anomaly, we checked the retention times and peak areas of the core molecules that were present in S26 and did not find anything that would allow us to classify it as an outlier. We also note that the animal from which sample S26 was taken was very similar to 16 other animals in the study in terms of species, origin, age, and sex (i.e., Macaca fascicularis, Chinese, six years old, and male). Therefore, these factors also did not provide any information that would allow us to confidently call this sample an outlier. As such, although removing this sample would yield a higher Validation Accuracy, we have kept it in our calculation. Using Equation (1), we determined a Core Validation Accuracy of 99.5% (Fig. 1).

It is important to note that this number does not reflect the proportion of core molecules from the training set that were found in 100% of the validation set; instead, it reflects the number of times core molecules were observed in validation samples relative to the expected number of observations.
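The arithmetic behind this figure can be checked with a short sketch. Equation (1) is not reproduced in this section, so the formula below (observed core observations divided by expected core observations) is an assumption based on the description above, and the function name is hypothetical. The inputs come from the text: 125 core molecules, 19 validation samples, and 12 missing observations (12 molecules, each missing from exactly one sample).

```python
def core_validation_accuracy(n_core, n_validation_samples, n_missing):
    """Percentage of expected core observations that were actually made.

    Assumed form of Equation (1): every core molecule is expected once
    in every validation sample, and each missing molecule-sample pair
    counts as one failed observation.
    """
    expected = n_core * n_validation_samples
    observed = expected - n_missing
    return 100.0 * observed / expected


accuracy = core_validation_accuracy(
    n_core=125, n_validation_samples=19, n_missing=12
)
print(f"Core Validation Accuracy: {accuracy:.1f}%")  # prints 99.5%
```

Under this reading, 2,363 of 2,375 expected observations were made, which rounds to the reported 99.5%.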
This Core Validation Accuracy reveals that core molecules defined by the training set were observed in the validation set in 99.5% of expected instances.

This represents a very high validation accuracy despite the inclusion of two female Indian rhesus macaques (animals A5 and A6) in the core-defining training set. This suggests that the core breath volatilome, for this cohort at least, may be common across members of the same genus, regardless of origin, sex, and species. Confirming this will require thorough evaluation with a larger cohort that includes a more heterogeneous distribution of macaque species, origin, age, and sex, as well as multiple samples from individuals in each of these categories. This would allow the use of more varied training and validation sets.

An evaluation of the accessory breathprint showed that, of the 1,426 accessory features, 1,199 were found in between 10 and 100% of the samples in the validation data. Thirty-one accessory features were observed in 100% of the validation samples, 124 were found in only one validation sample (5.26%), and 72 were not found in the validation data. Of these accessory features, 142 differed significantly in frequency of observation between the training and validation data according to Fisher’s exact test with \(\alpha =0.05\). For the rare features, 382 of the 466 features did not differ significantly between the training and validation data at a significance threshold of \(\alpha =0.05\). Given the large number of features in these sets, as well as the relatively high level of confidence in the spectral matching of the core molecules, no mass spectral checks were done to confirm the absence of “missing” accessory and rare peaks.

Identification of the critical core breathprint
Routine animal health monitoring that requires measuring 125 molecules is impractical for most settings.
Therefore, we identified a subset of the core molecules that have the smallest variance for easier translation to more standard GC–MS systems or equivalent (Fig. S1). These 23 molecules, which have a normalized peak area variance < 0.05 across the entire cohort of macaques, are termed the ‘critical core’ breathprint for healthy macaques. Notably, all but one of these molecules have a higher-than-average normalized peak area compared to the molecules in the accessory and rare volatilomes, indicating that the 23 critical core components are abundant as well as invariant. These molecules are reliably present in each of the 49 macaque breath samples analyzed here.

We applied the proposed Metabolomics Standards Initiative criteria to the 23 critical core peaks29. On this basis, we can provide putative names for two molecules (Level 2), and putative formulae (without name) for nine molecules and a putative class (without name or formula) for ten molecules (Level 3). No putative identification could be confidently given for two of the peaks, though their presence and reproducibility are confirmed (Level 4). Table 1 lists these core molecules along with their mean retention times on both columns. Aliphatic hydrocarbons constitute the majority of this critical core (52.2%; 11 alkanes and one terpene), while aromatic compounds (21.7%; four benzene derivatives and one furan) represent the second largest compound class. Figure 3 illustrates the distribution of the compound classes that comprise the critical core alongside the distribution of the entire core subset of 125 compounds. A comparison of the molecular formulae of compounds identified in the current study with those in Bishop et al.30, which reported the exhaled volatiles associated with cardio-metabolic health in baboons, revealed 15 matching formulae. Further, of the seven compounds in this study to which putative names could be assigned, two matched putative names given in the Bishop et al. study.
These 17 compounds (15 matched by formula and two by putative name) are listed in supplemental Table S5.

Table 1 List of 23 critical core compounds. Names, formulae, and class assignments are putative.

Figure 3 The outer chart shows the compound class distribution of the 125 core molecules, while the inner chart represents the compound class distribution of the 23 critical core molecules. Note that compounds that showed aromaticity along with other functionalities (e.g., acetophenone, an aromatic ketone) were grouped as aromatics. Aldehydes, ketones, and esters were grouped collectively as carbonyl compounds.

To investigate the difference between the critical core molecules in breath and in the environment (room air), we independently identified and aligned 18 room air samples. All critical core molecules were present in all room air samples, but the average peak area ratios (breath sample/room air sample) varied across compounds (Table 1). A total of 16 compounds were enriched in the breath samples, with average peak areas 1.7 to 6.8 times higher than in room air samples, suggesting that they are potentially genuine baseline compounds. Three of the 23 compounds (unknown 48, 50, 55) had a ratio of less than one, ranging from 0.6 to 0.8, suggesting that they may be enriched in the room air samples. The remaining four compounds had a ratio of one, indicating that similar amounts were present in both breath and room air samples. This is plausible, since exogenous compounds are exchanged consistently when a monkey breathes in a fixed room. However, this comparison has some limitations.
Because breath and room air samples were aligned and processed separately, we could only compare average peak areas between breath and background. Average peak areas have higher variability than normalized peak areas, and the software may also have selected different quantitative masses for peak area calculation in the two datasets (as may be the case for the three compounds enriched in the room air samples). Aligning all samples together may solve this issue in the future and would allow all compounds to be compared automatically, rather than focusing manually on critical core molecules. In addition, breath VOCs are highly variable, so the 5% normalized peak area variance threshold may be a restrictive criterion for selecting critical core molecules, and comparing average peak areas between breath and room air would further reduce the number of selected critical core molecules. However, we respectfully suggest that it is not clear how best to use room air samples. In this BSL3 facility, for example, the air exchange is substantial, and we recognize that air is also exchanged between the room and the monkeys’ lungs, which produces high variance in the peak area intensities. Therefore, even though some compounds were enriched in room air samples, we kept them as critical core molecules in this pilot study.

In addition, the linear retention index (LRI) is commonly used for reporting chemical compounds. Since no reference standards were analyzed along with our samples, we cannot calculate the LRI directly. However, the breath compounds were predominantly hydrocarbons, indicating that a series of n-alkanes could be present in the samples. Therefore, we checked all compounds using the putative names provided by the software, mass spectral information characteristic of alkanes (such as a series of fragment ions spaced 14 m/z apart), and retention times compared with alkane reference standards analyzed in other studies under the same GC conditions on different days.
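As an illustration, an LRI can be derived from n-alkane retention times by linear interpolation between the two alkanes that bracket a peak. The sketch below assumes the van den Dool and Kratz form commonly used for temperature-programmed GC, which may differ from the calibration actually used here. The function name and all retention times are hypothetical and are not data from this study.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Interpolate an LRI from bracketing n-alkane retention times.

    rt: retention time of the peak of interest.
    alkane_rts: dict mapping n-alkane carbon number to retention time,
        with retention times strictly increasing with carbon number.
    By definition an n-alkane with n carbons has LRI = 100 * n.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the calibrated range")
    # Index of the alkane eluting at or just before rt (clamped so that
    # the last alkane is handled by the final interval).
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n_low, n_high = carbons[i], carbons[i + 1]
    fraction = (rt - times[i]) / (times[i + 1] - times[i])
    return 100.0 * (n_low + (n_high - n_low) * fraction)


# Hypothetical retention times (s) for C6 to C18 n-alkanes.
hypothetical_times = [300, 420, 560, 700, 850, 1000, 1160,
                      1320, 1480, 1650, 1820, 1990, 2160]
hypothetical_alkane_rts = dict(zip(range(6, 19), map(float, hypothetical_times)))

print(linear_retention_index(420.0, hypothetical_alkane_rts))  # a C7 alkane
print(linear_retention_index(490.0, hypothetical_alkane_rts))  # between C7 and C8
```

A peak that co-elutes with one of the calibration alkanes returns exactly 100 times its carbon number, which is why round LRI values are a useful hint that an unknown peak is itself an n-alkane.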
As such, we can generate a calibration curve (C6–C18 in this study) to calculate the LRI indirectly. The LRI of each critical core molecule is shown in Table 1, along with its four most abundant mass spectral fragments (sorted in descending order of abundance). Based on the LRI and mass spectral data, we have more confidence in identifying some of the compounds. For example, the alkane with formula C7H16 and LRI 700 could be heptane; unknown 50, with LRI 1300 and its four m/z fragments, is consistent with n-tridecane; and unknown 56, with LRI 1400 and its four m/z fragments, is consistent with n-tetradecane. Because of the column type (Rxi-624Sil MS, mid-polar), the LRIs of the compounds could not be matched to the NIST library. However, they were expected to fall between the LRIs obtained from non-polar and polar GC columns. Hence, the annotation of core molecules was still defined only by the criteria described in S2 of the supplementary material.

The work in this pilot study describes a methodology for analyzing exhaled volatiles in NHPs and represents an initial effort to establish distinctive breath signatures that characterize healthy NHPs. The core breathprint mainly comprises aliphatic hydrocarbons, aromatic compounds, and carbonyl compounds. The critical core breathprint consists of 23 highly abundant and invariant molecules, which can serve as a basis for further validation studies. Research based on these data will need to incorporate external validation of these analytes. This involves using internal or external authentic chemical standards to validate the identity of the breathprint molecules, especially the core and critical core molecules. It also involves adding quality control samples to monitor the quality of breath collection (such as the impact of sedation), shipping and storage, and the stability of the breath samples.
In addition, to improve future studies, it will be necessary to increase the heterogeneity of the animals and the sample size to achieve better statistical power, for example by including rhesus macaques that are age- and sex-matched to the animals in this study. Collecting breath samples longitudinally is also an option for demonstrating intra-primate reproducibility. For these molecules to serve as baseline compounds, their presence and abundance should remain steady or within a reasonable range. Therefore, future studies aimed at expanding the repertoire of NHP VOCs, assessing analytical and biological sensitivity, and quantifying the range of these compounds under various conditions (e.g., diet and disease status) will provide valuable insights.
