Accuracy of smartwatches for the remote assessment of exercise capacity

Participant characteristics

The characteristics of the 16 study participants are summarized in Table 1. Of the 16 adults recruited, 15 completed both study visits, resulting in the following numbers of observations: 6MWT-standard n = 45, 6MWT-continuous lap n = 45, 3MST n = 45 and 10CRT n = 48. All 10CRT were completed in full; that is, a full chair rise was performed 10 times in succession in each test. Similarly, in all 3MST, 24 steps (up and down) per minute were completed for 3 min (72 steps in total). A summary of the number of participants, the observations for each test and the median [IQR] for the reference measure of each parameter assessed is provided in Table 2.

Table 1 Study participant characteristics. Data expressed as median [interquartile range, IQR] or frequency (%). BMI (body mass index).

Table 2 A summary of the number of participants, observations, and reference results for pooled 6-min walk tests (6MWT all), 6-min walk test standard (6MWT-standard), 6-min walk test continuous lap (6MWT-continuous lap), 3-min step tests (3MST) and 10 chair rise tests (10CRT).

6-min walk tests

Distance

Participants walked further during the 6MWT-continuous lap than during the 6MWT-standard (6MWT-standard: 649 m [604, 694 m]; 6MWT-continuous lap: 679 m [638, 746 m], p < 0.001) (Table 2). Distance covered during the 6MWT-continuous lap was measured more accurately than during the 6MWT-standard for both Garmin (6MWT-continuous lap: MAPE = 6.4% [3.0, 10.4%]; 6MWT-standard: MAPE = 20.1% [13.9, 28.4%], p < 0.001) and Fitbit (6MWT-continuous lap: MAPE = 8.0% [2.9, 10.1%]; 6MWT-standard: MAPE = 18.8% [15.2, 28.1%], p < 0.001), indicating that the 6MWT-continuous lap protocol is more suitable for remote monitoring (Table 3). MAPE for distance did not differ between Garmin and Fitbit in either the 6MWT-standard (20.1% [13.9, 28.4%] versus 18.8% [15.2, 28.1%], p = 0.935) or the 6MWT-continuous lap (6.4% [3.0, 10.4%] versus 8.0% [2.9, 10.1%], p = 0.678) protocol. Bland–Altman plots showed that both Garmin and Fitbit smartwatches underestimated the distance walked in the walk tests, although this bias was greater for the 6MWT-standard than for the 6MWT-continuous lap (Fig. 2).

Table 3 Distance covered (metres) and step count (n) from the Garmin and Fitbit smartwatches during standard and non-standard 6-min walk tests (6MWT-standard and 6MWT-continuous lap), compared with the reference measure values.

Fig. 2 Bland–Altman plots demonstrating levels of agreement between smartwatch GPS distance and meter-wheel distance during 6-min walk test standard (6MWT-S) and 6-min walk test continuous lap (6MWT-CL).

Step count

Participants took a similar number of steps during the 6MWT-continuous lap and the 6MWT-standard (6MWT-standard: 800 steps [760, 840 steps]; 6MWT-continuous lap: 800 steps [760, 840 steps], p = 0.83) (Table 2). Step count errors did not differ between the two protocols (p > 0.9). Compared with the Fitbit, the Garmin device showed smaller errors for step count in both the 6MWT-standard (Garmin: MAPE = 1.8% [0.9, 2.9%] and Fitbit: MAPE = 6.8% [3.2, 12.9%], p < 0.001) and the 6MWT-continuous lap (Garmin: MAPE = 0.9% [0.4, 2.2%] and Fitbit: MAPE = 8.0% [2.6, 12.3%], p < 0.001), despite the same number of steps being performed (Tables 2 and 3). Bland–Altman plots for smartwatch step count and hand tally counter are shown in Fig. 3 and illustrate the better agreement for Garmin.

Fig. 3 Bland–Altman plots demonstrating levels of agreement between smartwatch step count and hand tally count during 6-min walk test standard (6MWT-standard) and 6-min walk test continuous lap (6MWT-continuous lap).

Heart rate

HR results are presented as a pooled analysis for the 6MWT protocols (Tables 2 and 4). Both devices showed small median errors when measuring HR at rest (Garmin: MAPE = 3.1% [1.1, 6.1%] and Fitbit: MAPE = 2.3% [1.4, 5.3%], p = 0.180), peak exercise (Garmin: MAPE = 0.8% [0.3, 6.0%] and Fitbit: MAPE = 2.5% [1.1, 7.3%], p = 0.003) and recovery (Garmin: MAPE = 3.1% [1.1, 5.6%] and Fitbit: MAPE = 3.2% [1.3, 5.3%], p = 0.503). Bland–Altman plots are shown in Fig. 4. These showed little evidence of bias under any condition, but the limits of agreement (LOA) widened noticeably during peak exercise.

Table 4 Lin’s concordance correlation coefficient (CCC), Bland–Altman analysis for repeated measures presented as mean difference [limits of agreement], absolute percentage error (APE) presented as median [interquartile range] and number of outliers (%) for heart rate (HR) at rest (Rest.), peak exercise (Ex.) and 1-min recovery (Rec.) during pooled 6-min walk tests (6MWT), 3-min step tests (3MST) and 10 chair rise tests (10CRT).

Fig. 4 Bland–Altman plots demonstrating levels of agreement between smartwatch heart rate (HR) and ECG HR during pooled 6-min walk tests (6MWT).

3-min step tests

Both devices showed small errors in measuring HR at rest (Garmin: MAPE = 2.4% [1.1, 5.7%] and Fitbit: MAPE = 3.0% [1.2, 5.0%], p = 0.898) and recovery (Garmin: MAPE = 2.6% [1.1, 9.6%] and Fitbit: MAPE = 3.2% [1.7, 6.2%], p = 0.946). Error during peak exercise was MAPE = 0.5% [0.2, 22.5%] for Garmin and MAPE = 10.3% [5.2, 15.5%] for Fitbit, p < 0.001 (Table 4). The relevant Bland–Altman plots are shown in Fig. 5.

Fig. 5 Bland–Altman plots demonstrating levels of agreement between smartwatch heart rate (HR) and ECG HR during 3-min step tests (3MST).

10-chair rise tests

Both devices showed small errors in measuring HR at rest (Garmin: MAPE = 1.9% [1.0, 3.9%] and Fitbit: MAPE = 2.1% [1.0, 4.0%], p = 0.533) and recovery (Garmin: MAPE = 1.4% [0.5, 3.8%] and Fitbit: MAPE = 2.2% [1.5, 3.6%], p = 0.115). Median error during peak exercise was MAPE = 7.1% [1.4, 12.6%] for Garmin and MAPE = 12.1% [9.1, 17.4%] for Fitbit, p < 0.001 (Table 4). Bland–Altman plots illustrating the limits of agreement between smartwatch HR and ECG HR are shown in Fig. 6.

Fig. 6 Bland–Altman plots demonstrating levels of agreement between smartwatch heart rate (HR) and ECG HR during 10 chair rise tests (10CRT).

Lin’s CCC, Bland–Altman analysis results, APE and number of outliers for HR measured in all tests are summarized in Table 4.

Comparison of HR inaccuracies across devices

A total of 1098 observations were pooled across two devices (Garmin and Fitbit), 15 participants, three test types (6MWT, 3MST, 10CRT; six repetitions for 6MWT, three repetitions each for 3MST and 10CRT) and three test phases (rest, peak exercise, 1-min recovery). After accounting for pseudo-replication using a two-way ANOVA model with nested random effects, Garmin was found to have lower APE than Fitbit (p < 0.001), with the difference driven by lower APE during peak exercise.

Sources of inaccuracy

We found no convincing evidence of associations between absolute errors in HR or distance and participant characteristics (Table 5). A weak association between height and step count errors for the Fitbit device may have been a chance finding given the number of comparisons examined.

Table 5 Univariate associations between absolute errors and selected participant characteristics for heart rate, step count and distance.
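
As an illustration of the agreement metrics reported in these results, the following is a minimal Python sketch; it is not the authors' analysis code. It assumes that MAPE is summarized as the median [IQR] of per-observation absolute percentage errors (as the Table 4 caption describes for APE), and it uses the simple Bland–Altman formulation (mean difference ± 1.96 SD) rather than the repeated-measures variant used in the study. The function names and the distance values are hypothetical.

```python
import statistics


def absolute_percentage_errors(device, reference):
    """Absolute percentage error (%) for each paired observation."""
    return [abs(d - r) / r * 100.0 for d, r in zip(device, reference)]


def median_iqr(values):
    """Return (median, Q1, Q3) using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


def bland_altman(device, reference):
    """Simple bias and 95% limits of agreement (no repeated-measures correction)."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired distances in metres (not study data):
# meter-wheel reference versus smartwatch GPS estimate.
reference_m = [600.0, 650.0, 700.0, 640.0, 680.0]
watch_m = [570.0, 617.5, 630.0, 640.0, 646.0]

apes = absolute_percentage_errors(watch_m, reference_m)
med, q1, q3 = median_iqr(apes)
bias, lower, upper = bland_altman(watch_m, reference_m)

print(f"APE median [IQR]: {med:.1f}% [{q1:.1f}, {q3:.1f}%]")
print(f"Bias: {bias:.1f} m, LOA: [{lower:.1f}, {upper:.1f}] m")
```

In this sketch a negative bias means the watch reads lower than the reference, which is the direction of error described above for GPS distance in the walk tests.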
