Detecting cardiac states with wearable photoplethysmograms and implications for out-of-hospital cardiac arrest detection

PPG dataset

In this study, we obtained and examined PPG recordings under three conditions: normal cardiac, occlusion, and off-body states. Normal cardiac measurements represented physiologic data during normal blood circulation, while occlusion recordings captured cardiac characteristics in the absence of pulsatile blood flow (i.e., simulating a cardiac arrest). Additionally, off-body PPG measurements were obtained to represent conditions where the wearable device is not worn or is not properly secured to the skin. This consideration is particularly relevant to OHCA detection, as it can help reduce the number of false positive cardiac arrest detections in real-world scenarios that arise when the device is removed or when sensor placement or contact is poor.

We performed four experiments to obtain data representing the above-mentioned cardiac and non-cardiac states. These experiments involved collecting time-synchronized reference ECG and four PPGs at 50 Hz. Thirty-one healthy volunteers with no known cardiac conditions were recruited and provided written informed consent to participate in this study. Participants were aged between 19 and 58 years (mean ± sd: 27.3 ± 7.3 years, n = 31) and included 14 females (45%). In the first experiment (Fig. 1a, Experiment 1), participants wore PPGs on common wearable sensor locations, including the fingertip (the most common form factor in clinical settings), finger base (recently adopted in wearables with a ring form factor), and wrist (the most common form factor for consumer wearables)30. Baseline systolic blood pressure was recorded for all participants and ranged between 90 and 140 mmHg (mean ± sd: 117 ± 11 mmHg, n = 31). We applied pressure on the upper arm (> 20 mmHg above the participant’s systolic pressure) to induce arterial occlusion for 60 s. We used both manual pulse palpation and a stethoscope to confirm complete occlusion.
Infrared (IR) PPG data at full pressure were labeled as Occlusion. Additional data without applied pressure were collected for 30 s before and after occlusion and were labeled as Normal Cardiac. This procedure was repeated 3 to 5 times for each participant. In the second experiment (Fig. 1a, Experiment 2), we collected 2 min of continuous and uninterrupted IR PPG data at the same anatomical locations as those in Experiment 1 (i.e., fingertip, finger base, and wrist). All measurements from this experiment were labeled Normal Cardiac.

In the third experiment (Fig. 1a, Experiment 3), we alternated attaching and detaching PPGs at the same anatomical locations for 30 s at a time over 5 min. While attached, sensors remained on the skin (PPG data labeled as Normal Cardiac), and while detached, sensors were removed fully from the body (PPG data labeled as Off-Body). This was done to simulate normal wearable device usage as individuals put on and remove devices. We further supplemented the Off-Body data by collecting PPG recordings while sensors were not attached to any participant. This was done over a range of conditions, with sensors facing surfaces with various reflection and absorption characteristics, both during dynamic movement and while motionless. All of these recordings were labeled Off-Body. A comprehensive list of all tests conducted by each participant, as well as detailed information about the sensor placement in each test, is available in the study’s documentation repository in Dataverse.

We collected 34 h and 44 min of Normal Cardiac and 5 h and 13 min of Occlusion measurements on the fingertip, finger base, and wrist, as well as 10 h and 22 min of Off-Body data. All PPG recordings were pre-processed (Fig. 1b), following which we extracted common time-domain (time: n = 10), PSD-domain (psd: n = 11), and frequency-domain (freq: n = 5) feature sets for classification.
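As an illustration of the kind of time-domain features referred to here, the sketch below computes the Mean and Mean Crossing of one PPG window, two features named later in the results. It is a minimal sketch, not the study's implementation: the exact feature definitions, the window length, the function name `time_domain_features`, and the synthetic signal are all assumptions made for the example.

```python
import math


def time_domain_features(window):
    """Return two simple time-domain features for one PPG window.

    Assumed definitions (not taken from the study):
      mean           - arithmetic mean of the samples
      mean_crossings - number of times consecutive samples fall on
                       opposite sides of the window mean
    """
    n = len(window)
    mean = sum(window) / n
    centered = [x - mean for x in window]
    # A sign change between neighbouring samples counts as one crossing.
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return {"mean": mean, "mean_crossings": crossings}


# Hypothetical 2 s window sampled at 50 Hz: a 1 Hz "pulse" riding on a DC offset.
FS = 50
demo_window = [5.0 + math.sin(2 * math.pi * 1.0 * i / FS + 0.3) for i in range(2 * FS)]
demo_features = time_domain_features(demo_window)
print(demo_features)
```

In this toy signal the mean reflects the DC level of the PPG, while the crossing count grows with the pulse frequency, which is one intuition for why such features can help separate pulsatile from non-pulsatile recordings.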
The choice of selected features was informed by several studies focused on PPG signal analysis24,34,59,60.

Classification model training

For PPG recordings obtained from the fingertip, finger base, and wrist, a multi-class classification pipeline with grid search was developed (Fig. 1c). This included the following steps: (1) Standardization: each feature in the training set was centered by subtracting its mean and scaled to unit standard deviation; (2) Feature selection: a random forest classifier was trained with 5-fold cross-validation (cv) and a forward sequential feature selection algorithm to identify the feature combinations that maximize classification performance. A total of six feature sets were evaluated, including independent time-, frequency-, and PSD-domain features, as well as combinations of these features, namely timefreq (n = 15), timepsd (n = 21), and timefreqpsd (n = 26); (3) Model training: 70% of randomly selected PPG measurements were used to train random forest classifiers with 10-fold cv and hyperparameter tuning (number of estimators and maximum depth of the trees), with the objective of maximizing the macro average F1-score; and (4) Model evaluation: the 30% hold-out testing dataset was standardized using the means and standard deviations calculated from the training dataset, and classification performance was assessed on the standardized testing set. For each anatomical location, all models were trained on the same 70% of data and evaluated on the same 30% hold-out dataset (an overview of the classification pipeline is presented in Supplementary Materials, Classification Pipeline).

Fig. 1 Overview of the experimental procedure, signal processing, and classification pipeline. (a) Experiment 1 involved collecting 30 s baseline Normal Cardiac measurements, a 60 s Occlusion measurement, followed by 30 s Normal Cardiac recordings. Experiment 2 involved collecting 120 s of continuous Normal Cardiac measurements.
Experiment 3 involved collecting Normal Cardiac and Off-Body measurements at 30 s intervals for 5 min. (b) Data analysis pipeline, including raw PPG signal segmentation, low-pass (LP) and band-pass (BP) filtering, and time-/frequency-/PSD-domain feature extraction. (c) Classification pipeline, including feature transformation, random forest classification model training with sequential forward feature selection, and evaluation. OCC: Occlusion; NC: Normal Cardiac; OB: Off-Body.

Classification performance: dependence on PPG feature characteristics

For each anatomical location (i.e., fingertip, finger base, and wrist), we compared classification performance across the six feature sets (Fig. 2a). Across all three anatomical sites, models using both time- and PSD-/frequency-domain features (i.e., timefreq, timepsd, timefreqpsd) performed better than models trained on time-, frequency-, or PSD-domain features alone (i.e., time, freq, psd). Classifiers trained on the timefreq, timepsd, and timefreqpsd features had identical or very similar performance at each anatomical location. For the remainder of the results section, we present only the output of classification models trained on the highest-performing timefreqpsd feature set.

Various time-, frequency-, and PSD-domain features were selected and contributed to achieving optimal classification performance. For instance, both the PPG Mean and the 1.17 Hz component of the PPG PSD (equivalent to a heart rate of ~ 70 beats per minute (BPM)) were common among classification models across all anatomical sites. Several other features were common to at least two classifiers. These included the PPG signal Mean Crossing on the fingertip and wrist, the signal Total Power on the fingertip and finger base, and the 2.34 Hz PSD (the second harmonic of 1.17 Hz) on the finger base and wrist.
The Kruskal-Wallis test for independent samples revealed a statistically significant difference (P < 0.001) in the selected features across at least two classes (for further detail refer to Statistical Analysis in Supplementary Materials). For instance, the Total Power of PPG measurements showed significant differences between the Normal Cardiac, Occlusion, and Off-Body recordings (Fig. 2b). Another example is the PPG Mean, which showed significant differentiation between Off-Body and the other two states (Normal Cardiac and Occlusion) (Fig. 2b,c). Similarly, the 1.17 Hz PSD feature demonstrated significant differentiation between Normal Cardiac and the other two states (Occlusion and Off-Body) (Fig. 2b,c).

Classification performance: dependence on anatomical location

Across the classification models trained and evaluated on the three anatomical locations (i.e., fingertip, finger base, wrist), higher classification performance was observed on the finger (macro average F1-score of 0.964 on the fingertip and 0.954 on the finger base) compared to the wrist (macro average F1-score of 0.837) (Table 1). Across all anatomical sites, Normal Cardiac and Off-Body states had higher sensitivity and fewer false negative predictions compared to Occlusion (Table 1; Fig. 3a). Conversely, Occlusion had higher specificity and fewer false positive predictions compared to Normal Cardiac and Off-Body states (Table 1; Fig. 3a). Occlusion detection had the lowest performance on the wrist, with 56.2% false negative predictions (Fig. 3a) and an area under the curve of 0.91 (Fig. 3b).
Analysis of the receiver operating characteristic (ROC) curve revealed that when eliminating all false positive Occlusion predictions (i.e., no potential false cardiac arrest prediction), the classifiers trained on the fingertip, finger base, and wrist achieved Occlusion sensitivity (true positive rate) of 0.518, 0.825, and 0.155, respectively (Table 1).

Table 1 Classification performance of models trained and evaluated on all PPG recordings and the timefreqpsd feature set.

Fig. 2 Classification performance and associated optimal features used in classification. (a) Classification performance across different feature sets and anatomical locations. (b) Distribution of sample time-domain (Mean and Window Change), frequency-domain (Total Power), and PSD-domain (1.17 Hz) features across different cardiac states. Each boxplot shows the quartiles of the dataset (center line: median; box limits: upper and lower quartiles), and the whiskers extend to show the 1.5x interquartile range of the rest of the distribution. (c) Sample time- and PSD-domain PPG characteristics across different cardiac states. OCC: Occlusion; NC: Normal Cardiac; OB: Off-Body.

Fig. 3 Classification performance and representative cardiac measurements on different anatomical sites. (a) Confusion matrix for multi-class classification performance at each anatomical location for classifiers trained and evaluated on all PPG recordings and the timefreqpsd feature set. The rows of each confusion matrix represent the true labels of the dataset and the columns represent the predicted labels. (b) The ROC curve for One-vs-the-Rest cardiac state classification at each anatomical location for classifiers trained and evaluated on all PPG recordings. (c) Sample PPG waveforms on different anatomical locations for Normal Cardiac and Occlusion states.
OCC: Occlusion; NC: Normal Cardiac; OB: Off-Body; TPR: True Positive Rate; FPR: False Positive Rate; AUC: Area Under the Curve.

We found that Normal Cardiac measurements taken at the fingertip and finger base had a distinct pulsatile component (PPG AC), differentiating them from pulseless Occlusion measurements on those sites (Fig. 3c). However, Normal Cardiac measurements on the wrist exhibited a weaker pulsatile component, potentially making it more challenging to distinguish them from pulseless Occlusion measurements (Fig. 3c). Considering the impact of PPG signal quality on the accuracy of cardiac characteristics, as highlighted by previous research31,32, and the recommendations for implementing signal quality assessment strategies to enhance the accuracy and reliability of wrist-worn PPG devices33, we hypothesized that the poor classification performance of the model trained on wrist PPG recordings might be associated with the poor signal quality of the Normal Cardiac measurements on this site. Consequently, the secondary objectives of this study were to (1) determine the impact of PPG signal quality on the classification performance at different anatomical sites; and (2) determine the real-world implications of the anatomical performance.

Classification performance: dependence on PPG signal quality

To explore the secondary objectives, we evaluated the performance of classifiers that were trained and tested with only high-quality signals. Signal quality index measures or threshold-based approaches are commonly used to evaluate PPG signal quality62,63 and have previously been used to separate high- from low-quality PPG data59,64. We used a frequency-domain signal quality index known as the PPG Power Ratio34. We calculated this index by dividing the sum of PPG PSD components within the 0.83–1.7 Hz frequency range by the sum of PPG PSD components for the entire frequency range (0.5–5 Hz).
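The Power Ratio calculation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the study's code: the PSD is estimated here with a plain periodogram computed by a naive DFT (the text does not say which estimator was used), the 10 s synthetic windows are made up for the example, and the names `periodogram` and `power_ratio` are hypothetical.

```python
import cmath
import math


def periodogram(signal, fs):
    """One-sided periodogram of a real signal via a naive DFT (assumed PSD estimator)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC level before the transform
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def power_ratio(signal, fs=50.0, cardiac_band=(0.83, 1.7), full_band=(0.5, 5.0)):
    """Sum of PSD in the resting cardiac band divided by the sum over the full band."""
    freqs, power = periodogram(signal, fs)

    def band_sum(low, high):
        return sum(p for f, p in zip(freqs, power) if low <= f <= high)

    total = band_sum(*full_band)
    return band_sum(*cardiac_band) / total if total > 0 else 0.0


# Hypothetical 10 s windows at 50 Hz (the sampling rate reported in the study).
FS = 50
N = 10 * FS
clean = [math.sin(2 * math.pi * 1.2 * i / FS) for i in range(N)]     # 72 BPM pulse
artifact = [math.sin(2 * math.pi * 3.5 * i / FS) for i in range(N)]  # out-of-band energy
mixed = [c + a for c, a in zip(clean, artifact)]

ratio_clean = power_ratio(clean, FS)
ratio_artifact = power_ratio(artifact, FS)
ratio_mixed = power_ratio(mixed, FS)
print(ratio_clean, ratio_artifact, ratio_mixed)
```

A window dominated by a resting-rate pulse yields a ratio near one, while energy outside the cardiac band pulls the ratio toward zero. In practice a library estimator such as Welch's method would normally replace the naive DFT used here for self-containment.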
The 0.83–1.7 Hz frequency range corresponds to the healthy cardiac frequency at rest (50–100 BPM), as is the case in our study. Consistent with this assumption, the average heart rate of our study participants (mean ± sd, n = 30) was 70 ± 11 BPM (ECG measurements were not available for one participant due to sensor malfunction).

To create the High-Quality datasets for each anatomical location, we retained PPG recordings with a Power Ratio above the 25th percentile of all Normal Cardiac PPG measurements (Fig. 4a). Consequently, Normal Cardiac measurements that passed the quality threshold contained the highest signal energy within the resting heart rate frequency range, while those categorized as low-quality contained artifacts affecting both the PPG waveform and its frequency-domain characteristics (Fig. 4b).

For each anatomical location, we trained and evaluated a classification model using the High-Quality PPG dataset. Specifically, the High-Quality training and testing sets retained the Normal Cardiac data from the original training and testing sets that met the quality threshold. The High-Quality datasets comprised the same Occlusion and Off-Body data as the original training and testing sets. Overall, classifiers trained and evaluated on High-Quality datasets outperformed those trained and evaluated on Mixed-Quality datasets, which include all PPG recordings (i.e., both high- and low-quality data) (Fig. 4c). The largest improvement in classification performance was observed for models trained and evaluated on PPG data obtained from the wrist, with increases of 3.79% in macro average precision, 14.73% in macro average recall, and 11.64% in macro average F1-score (Table 2). Furthermore, the Occlusion detection sensitivity (true positive rate) at each anatomical location was consistently higher for models trained and evaluated on High-Quality datasets compared to Mixed-Quality datasets (fingertip: 0.937 vs. 0.899; finger base: 0.934 vs. 0.876; wrist: 0.779 vs. 0.438) (Fig. 4d). However, Occlusion specificity was either unchanged or exhibited a minor improvement. When eliminating all false positive Occlusion predictions (Occlusion False Positive Rate = 0), the true positive Occlusion prediction rates were 0.788, 0.752, and 0.730 for classifiers trained on High-Quality fingertip, finger base, and wrist PPG recordings, respectively (Table 2; Fig. 4e).

Table 2 Classification performance of models trained and evaluated on High-Quality PPG recordings and the timefreqpsd feature set.

While training and evaluating classifiers on High-Quality PPG datasets led to higher classification performance, this involved discarding a portion of the PPG recordings. The percentage of high-quality PPG measurements varied across anatomical locations (Fig. 4f). Approximately 90.7% of PPG measurements taken at the fingertip and 86.5% of the recordings on the finger base met the quality threshold and were considered high quality. However, PPG recordings on the wrist exhibited the lowest quality, with only 53.2% of the measurements meeting the quality threshold. Notably, the substantial classification improvements observed on the wrist were achieved by discarding nearly half of the PPG recordings obtained from this site (additional information regarding the breakdown of high- vs. low-quality measurements is available in Supplementary Materials).

Fig. 4 PPG signal quality characteristics and classification performance for High-Quality PPG datasets. (a) Distribution of Power Ratio for all and location-specific Normal Cardiac measurements; the 25th percentile of all PPG measurements is marked as the quality assessment threshold. Each boxplot shows the quartiles of the dataset (center line: median; box limits: upper and lower quartiles), and the whiskers extend to show the 1.5x interquartile range of the rest of the distribution.
(b) Sample time series and short-time Fourier transform PPG characteristics of high- and low-quality cardiac measurements at different anatomical locations. (c) Classification performance across different anatomical locations for classifiers trained and evaluated on Mixed-Quality and High-Quality PPG measurements. (d) Confusion matrix for multi-class classification performance at each anatomical location for classifiers trained and evaluated on High-Quality PPG recordings and the timefreqpsd feature set. The rows of each confusion matrix represent the true labels of the dataset and the columns represent the predicted labels. (e) The receiver operating characteristic (ROC) curve for One-vs-the-Rest cardiac state classification at each anatomical location for classifiers trained and evaluated on High-Quality PPG recordings. (f) Percentage of high- and low-quality PPG measurements at different anatomical locations. OCC: Occlusion; NC: Normal Cardiac; OB: Off-Body; TPR: True Positive Rate; FPR: False Positive Rate; AUC: Area Under the Curve.

Classification performance: practical implications

Training classifiers on high-quality PPG data provides valuable insight into theoretically optimal classification performance; however, consistently obtaining high-quality PPG recordings in real-world settings may not always be practical. Therefore, it is crucial to validate classification robustness in more practical sensor deployment scenarios. In addition to the previously described evaluation conditions, we investigated a third condition in which the classifier was trained on High-Quality PPG data (training set) and evaluated on Mixed-Quality PPG recordings (testing set). This scenario represents a case where the model is trained on High-Quality PPG data, such as those obtained during controlled lab experiments and those that pass the quality threshold, but its performance is evaluated on a mix of high- and low-quality PPG recordings (Fig. 5a).
This is aimed specifically at evaluating the effectiveness of a theoretically optimal model in more practical settings.

Consistent with previous models, higher classification performance was observed on the finger, with a macro average F1-score of 0.914 on the fingertip and 0.867 on the finger base, compared to the wrist, which had a macro average F1-score of 0.658 (Table 3). Across all three anatomical locations, models trained on High-Quality data and evaluated on Mixed-Quality data (HQ-MQ condition) had the lowest overall classification performance compared to classifiers trained and evaluated on either High-Quality (HQ-HQ condition) or Mixed-Quality (MQ-MQ condition) datasets. While the HQ-MQ condition had similar Occlusion sensitivity (i.e., correctly identifying occlusions when they occur) to HQ-HQ, it had lower specificity (i.e., it more often predicted an occlusion when none was occurring) at all anatomical locations. Notably, the HQ-MQ condition had markedly higher false positive Occlusion prediction rates (fingertip: 0.022; finger base: 0.048; wrist: 0.226) compared to the HQ-HQ condition (fingertip: 0.004; finger base: 0.004; wrist: 0.005) (Fig. 5b). Moreover, compared to the HQ-HQ condition, the HQ-MQ condition had substantially lower true positive Occlusion predictions when eliminating all false positives (Fig. 5c; Table 3). While the HQ-HQ and MQ-MQ conditions showed high sensitivity (> 0.970) for Normal Cardiac detection across all three anatomical locations, the HQ-MQ condition had markedly lower Normal Cardiac sensitivity, particularly on the wrist (fingertip: 0.904; finger base: 0.849; wrist: 0.524).

Table 3 Classification performance of classifiers trained on High-Quality PPG recordings and evaluated on Mixed-Quality PPG recordings with the timefreqpsd feature set.

Fig. 5 Classification performance. (a) Classifiers are trained on High-Quality data and are evaluated on Mixed-Quality data (HQ-MQ condition).
(b) Confusion matrix for multi-class classification performance at each anatomical location for classifiers trained on High-Quality PPG data and evaluated on Mixed-Quality PPG recordings (HQ-MQ condition) with the timefreqpsd feature set. The rows of each confusion matrix represent the true labels of the dataset and the columns represent the predicted labels. (c) The receiver operating characteristic (ROC) curve for One-vs-the-Rest cardiac state classification at each anatomical location for classifiers trained on High-Quality PPG and evaluated on all Mixed-Quality recordings (HQ-MQ condition). OCC: Occlusion; NC: Normal Cardiac; OB: Off-Body; TPR: True Positive Rate; FPR: False Positive Rate; AUC: Area Under the Curve.

Potential impact on survival outcomes of unwitnessed OHCAs

The primary beneficiaries of a cardiac arrest detection system are the subgroup of OHCA cases that are unwitnessed at the time of onset. These cases experience delays in recognition and reporting, leading to a reduced likelihood of receiving timely treatment. To estimate the potential impact of improved detection, Hutton et al.7 conducted a simulation analysis based on over 11,000 EMS-treated and untreated OHCAs and estimated the potential survival improvements in previously unwitnessed cases if the event were promptly recognized (e.g., either through a bystander witness or via a biosensor technology). We used similar simulation models to estimate potential survival improvements (relative to baseline survival) based on the Occlusion sensitivity of our classification models (Table 4). The estimated survival rate varied depending on the evaluation condition and anatomical location. Overall, training and evaluating models on High-Quality datasets (HQ-HQ condition) could theoretically offer the highest survival improvement, at least doubling OHCA survival in unwitnessed cases.
This remains consistent whether considering optimal classification performance or when eliminating all false positive cardiac arrest alarms (i.e., false Occlusion predictions). It is crucial to acknowledge that the actual efficacy of these models depends strictly on the availability of high-quality PPG recordings, a factor that is yet to be determined in a practical real-world setting. For classifiers trained and evaluated on Mixed-Quality datasets (MQ-MQ condition), we observed survival enhancements on the finger comparable to those of the HQ-HQ condition, but the potential survival improvements were markedly lower for models trained on the wrist (63.9% for the optimal classifier and 22.2% when eliminating all false positive cardiac arrest predictions). Classifiers trained on High-Quality datasets and evaluated on Mixed-Quality measurements (HQ-MQ condition) demonstrated potential for more than doubling the OHCA survival rate (> 100%); however, the rate dropped substantially for classifiers with zero false positive Occlusion predictions. For these models, the additional survival improvements were 73%0.1 on the fingertip, 3.4% on the finger base, and 2.0% on the wrist.

Table 4 Estimated additional survivors compared to the 620 survivors at baseline if all unwitnessed OHCA events were recognized with a biosensor (with a specified sensitivity) to alert EMS.
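
Several results in this section are reported at the operating point where all false positive Occlusion predictions are eliminated. The sketch below shows one way to read that point off a set of one-vs-rest classifier scores: the decision threshold is placed just above the highest score given to any non-Occlusion window, and the fraction of true Occlusion windows still detected is the sensitivity at a false positive rate of zero. The function name `tpr_at_zero_fpr` and the scores are hypothetical and are not taken from the study.

```python
def tpr_at_zero_fpr(scores, is_occlusion):
    """Highest Occlusion true positive rate achievable with no false positives.

    scores       - classifier score for the Occlusion class, one per window
                   (higher means more likely Occlusion)
    is_occlusion - matching booleans, True where the window is truly Occlusion
    """
    positives = [s for s, y in zip(scores, is_occlusion) if y]
    negatives = [s for s, y in zip(scores, is_occlusion) if not y]
    if not positives:
        raise ValueError("at least one Occlusion window is required")
    # The threshold must exceed every non-Occlusion score so that no
    # Normal Cardiac or Off-Body window is flagged as a cardiac arrest.
    strictest = max(negatives) if negatives else float("-inf")
    detected = sum(1 for s in positives if s > strictest)
    return detected / len(positives)


# Hypothetical scores: four Occlusion windows followed by four non-Occlusion windows.
demo_scores = [0.95, 0.90, 0.60, 0.40, 0.70, 0.20, 0.10, 0.05]
demo_labels = [True, True, True, True, False, False, False, False]
demo_tpr = tpr_at_zero_fpr(demo_scores, demo_labels)
print(demo_tpr)
```

A single confidently misclassified non-Occlusion window raises the threshold for every Occlusion window, which is one reason sensitivity at this operating point can fall far below the sensitivity of the default classifier.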
