Engagement analysis of a persuasive-design-optimized eHealth intervention through machine learning

Study design
This secondary analysis primarily examines participant engagement with the procrastination intervention and its associated eHealth platform, with a particular emphasis on key engagement measures. The focus is on patterns of engagement and on how various factors, including demographics, affect these patterns within the context of a previously conducted RCT [7]. To comprehensively understand engagement dynamics, assessments were conducted at multiple stages of the RCT: baseline (t0), four weeks (t1), eight weeks (t2), 12 weeks (t3), six months (t4), and 12 months (t5). Data collection began in June 2021 and concluded in August 2022. The majority of the data was gathered between June 2021 and January 2022, though some participants completed the intervention over an extended period, so data collection continued for several months. During the data collection period, no widespread external factors, such as COVID-19 lockdowns, uniformly impacted all participants. The staggered recruitment and participation timelines may have introduced some variability in data collection timing across individuals.

Intervention
The intervention aimed to assist students in managing their procrastination tendencies. It consisted of six lessons: the first lesson served as an introduction, and an optional lesson followed the fifth. The lessons were offered in German and were released sequentially, one lesson per week; however, participants could complete the lessons without any specific deadlines. Each lesson included a weekly challenge, and participants could choose to track their progress using the diary feature.

Each module was structured using a variety of formats, including written texts, videos, audio recordings, images, and interactive elements, to facilitate effective learning and application. The first module educated participants about procrastination, outlining its causes and symptoms, and introduced the Rubicon Model of Action Phases [13]; it also incorporated exercises for self-monitoring. The second module concentrated on developing time management skills, covering goal setting, task prioritization, and planning. The third module addressed motivational techniques, allowing participants to examine their individual motivation patterns. The fourth module provided strategies for self-regulation and self-control and included mindfulness exercises aimed at relaxation. The fifth module reviewed all the theories and strategies presented in the previous modules, included reflective exercises, and offered guidance on managing relapses. An optional sixth module addressed self-worth, perfectionism, and fear of failure, and provided exercises to assist participants in managing these challenges. For additional information regarding the intervention, refer to [14].

While working through the intervention, participants generate two distinct types of answer sheets: intervention and diary answer sheets. A new intervention answer sheet is created upon the completion of each lesson, and every interaction with the daily diary creates a new diary answer sheet.

To enhance participant adherence and engagement, the intervention was optimized using PD techniques as outlined in [9]. The intervention included several PD strategies.
A personalization strategy was implemented by incorporating the participant's name throughout the app. Self-monitoring was facilitated through progress bars and daily diaries, which helped participants track their progress in each lesson. To increase the intervention's credibility, the study team was introduced to the users and evidence-based materials were provided. A tunneling strategy was applied by releasing the lessons sequentially. This step-by-step approach was designed to engage participants more effectively with each piece of content, reinforcing the learning objectives of the current lesson before progressing to the next; it also aimed to prevent information overload and maintain participants' focus throughout the intervention. Finally, to provide social support, participants were paired in teams of two with other participants from their respective groups, with the aim of offering motivational feedback and reminders for lesson completion. The feedback and reminder content, which was predefined and unchangeable, was exchanged between participants via email.

eSano platform
The eSano eHealth platform was used to deliver the intervention. The platform was designed to be adaptable, allowing new interventions to be created, customized, and delivered as required. eSano is an eHealth platform for Internet- and mobile-based interventions and is composed of three sub-platforms: a Content Management System (CMS), an eCoach platform, and a Participant app. The CMS helps researchers develop interventions without any background in IT. The eCoach platform enables the supervision of individuals who sign up for an intervention. The Participant app is a cross-platform application that allows individuals to access their interventions through any internet-connected device, including desktop computers, smartphones, and tablets. Intervention content on eSano is delivered as text as well as in multimedia formats such as images and videos. Furthermore, various interactive tools are available to help participants engage with the content and respond to the different lessons of an intervention. Study participants accessed the intervention via the Participant app. Additional details regarding the platform, its requirements, and its architecture are available in [15]; [16] provides further details about eSano with a focus on its backend.

Recruitment and randomization
Recruitment for the RCT was conducted between May and September 2021. Participants were enrolled on a staggered basis, meaning they did not all begin the intervention simultaneously. Participants were recruited through multiple channels, including the trial management system of Ulm University, cooperating universities in Germany, Austria, and Switzerland, and flyers and posters. Participation was voluntary; no incentives, financial or otherwise, were provided for taking part in the study or the intervention. To be eligible, candidates had to be at least 18 years old, proficient in German, registered as students, and have access to an Internet-connected device such as a smartphone or notebook. Candidates who met these criteria were additionally screened for a procrastination level of 32 or higher on the Irrational Procrastination Scale (IPS ≥ 32) [17].
Participants were randomized in a 1:1 ratio between the two arms by a researcher not otherwise involved in the trial.

Guidance
In the IG, participants received guidance from a digital coach. They could choose their coach's avatar, which could be male or female. The coach provided feedback based on each participant's input: for example, participants received motivational feedback upon completing a lesson. If they stopped using the intervention, a reminder email was sent after a pre-selected amount of time; in this case, participants received a reminder email 12 days after their self-selected date of intervention continuation. Additionally, participants received a standardized module summary and motivational feedback two days after completing each module. At the end of each module, participants scheduled their next module appointment; if they missed this appointment, they received a reminder email 12 days later. Throughout each module, the digital coach provided immediate, standardized feedback based on the participant's responses. In contrast, the CG received human guidance from a trained guide with at least a bachelor's degree in psychology. Feedback was given only after completing each lesson, and the next lesson became available after the feedback had been read. This feedback was a mix of standardized content and personalized text based on the participants' entries. In addition, CG participants received reminders three, seven, ten, and twelve days after the start of each lesson.

Ethics approval
The methods used in this trial were approved by the Ethics Committee of Ulm University (Ulm, Germany; 502/20) and preregistered at the German Clinical Trials Register (DRKS00025209). The trial was conducted according to the CONSORT guidelines for noninferiority and equivalence trials [18]. Detailed informed consent was obtained from all participants to ensure that they understood the study procedures and their rights, including the option to withdraw at any time without consequences. No personally identifiable information or images are included in this paper.

Demographics
At the start of the study, we gathered demographic information about participants, including age, gender, field of study, state, number of completed semesters, current exam preparation, semester break, and experience with psychotherapy.

Data analysis
To answer our research question, we employed the following data analysis methods using Python (version 3.9) in an Anaconda environment, with key libraries including pandas, numpy, optuna, matplotlib, scikit-learn, imbalanced-learn, and SHapley Additive exPlanations (SHAP) [19] (https://shap.readthedocs.io/en/latest).

Target variable
In this study, the 'late_engagement' variable describes participant engagement in the eight-week intervention, focusing on engagement from week five onwards because of observed decreases in usage during the later weeks. Engagement status, classified as 'engaged' (1) or 'disengaged' (0), is predicted from a variety of features drawn from the first four weeks of participant interaction. These features include interaction metrics such as the number of logins, the number of submitted intervention answer sheets, the number of submitted diary answer sheets, and social support interactions. Participants were classified as "engaged" based on a holistic assessment of their interaction with the platform. Specifically:

A participant who logged in multiple times but did not complete any new lessons or daily tasks was not classified as engaged, as mere logins without substantive interaction do not reflect meaningful engagement.

Similarly, participants who spent significant time on the platform but failed to complete any lessons were also considered disengaged, as time spent without achieving key milestones indicates a lack of active participation.

Conversely, participants who completed several lessons but spent only minimal time on each were not considered engaged either. This behavior suggests a focus on quickly progressing through the content rather than thorough engagement with the material, which typically requires adequate time to absorb and reflect on the lessons.

Participants who demonstrated a balanced interaction by completing lessons, spending a sufficient amount of time on each, engaging with daily tasks, and actively participating in social support interactions were classified as engaged.
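
Taken together, these rules imply a composite labeling function. The following is a minimal sketch of such a function, not the study's actual implementation: the threshold constants are illustrative assumptions rather than the authors' exact cut-offs, while the column names follow the interaction features described later in this section.

```python
import pandas as pd

# Illustrative thresholds -- assumptions, not the study's exact cut-offs.
MIN_LESSONS = 1            # at least one completed lesson
MIN_MINUTES_PER_LESSON = 10.0  # minimum average time per completed lesson
MIN_DIARIES = 1            # at least one diary answer sheet
MIN_SOCIAL = 1             # at least one social support interaction

def label_late_engagement(row: pd.Series) -> int:
    """Return 1 ('engaged') or 0 ('disengaged') for one participant."""
    lessons = row["number_intervention_answersheets"]
    minutes = row["total_minutes"]
    diaries = row["number_diary_answersheets"]
    social = row["number_sent_reminders"] + row["number_times_adherence_reminders"]

    # Logins or time spent alone are not enough: lessons must be completed ...
    if lessons < MIN_LESSONS:
        return 0
    # ... and rushing through lessons does not count as engagement.
    if minutes / lessons < MIN_MINUTES_PER_LESSON:
        return 0
    # Balanced interaction also requires diary use and social support activity.
    if diaries < MIN_DIARIES or social < MIN_SOCIAL:
        return 0
    return 1

# df["late_engagement"] = df.apply(label_late_engagement, axis=1)
```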

This classification approach was developed in consultation with experts in the field, aligning it with realistic expectations of participant behavior and with the objectives of the intervention. It also helps identify meaningful engagement that contributes to the intervention's efficacy, rather than merely superficial interaction with the platform. Additional features encompass demographic information (age, gender, state), study-related variables (study subject, semester break), therapy status (current, past, waiting list, none), and scores on the Irrational Procrastination Scale (IPS) at baseline and after four weeks.

Features
The dataset included a diverse array of features, broadly categorized into demographic data, therapy status, and interaction metrics:

1. Demographic Data: This included age, gender, country of residence, and the study subject/major, with all participants being students.

2. Therapy Status: The dataset captured various aspects of therapy engagement:
- current_therapy: Indicates ongoing therapy participation.
- past_therapy: Denotes previous therapy experience without current engagement.
- waiting_therapy: Reflects an anticipation or plan to commence therapy.

3. Interaction Features: These features provided insights into the participants' interaction with the intervention platform:
- Number of logins (logins_first_4_weeks): The total number of times a participant logged into the platform during the first four weeks.
- Total time spent on the platform (total_minutes): The cumulative time a participant spent on the platform.
- Number of intervention answer sheets (number_intervention_answersheets): The number of intervention lessons completed by a participant during the first four weeks.
- Number of social support activities (number_sent_reminders and number_times_adherence_reminders): The number of social support-related activities a participant engaged in.
- Number of completed daily diaries (number_diary_answersheets): The total number of daily diaries completed by a participant.
- First login's day of the week (first_login_day_of_week): The day of the week on which the participant first logged into the platform.
- Age band categorization (age_band): The categorization of participants into different age bands.
- Interaction of age band and gender (age_gender_interaction): A composite feature combining age band and gender to study their joint effect on intervention interaction.

In total, 13 features were considered in the analysis: 4 demographic features, 1 therapy status feature, and 8 features describing user interaction with the intervention.

Feature engineering
To enhance the dataset's analytical utility, several derived features were formulated:

Age band: This categorical feature was derived from the continuous 'age' variable. Age groups were systematically categorized (e.g., '18–24', '25–30', etc.), simplifying the analysis of age-related trends.

Age band and gender interaction: This composite feature integrates age bands with gender, creating a unique identifier for each combination. It aims to capture the nuanced interplay between age and gender in participant engagement.

Therapy status: The therapy status of participants was encoded through "current_therapy", "past_therapy", and "waiting_therapy". These features were derived to distinguish between the different phases of therapy engagement.

First login's day of the week: This feature captures the day of the week of each participant's first login.
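
To illustrate how such derived features can be computed, a minimal pandas sketch follows. The derived column names match the features described above; the age-band edges and the raw input columns ('age', 'gender', 'therapy_status', 'first_login') are assumptions about the underlying data layout, not the study's documented schema.

```python
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive age band, age-gender interaction, therapy flags, and first-login weekday."""
    out = df.copy()

    # Age band: bin the continuous 'age' variable into categories such as '18-24'.
    bins = [18, 24, 30, 36, 120]                      # illustrative band edges
    labels = ["18-24", "25-30", "31-36", "37+"]
    out["age_band"] = pd.cut(out["age"], bins=bins, labels=labels, include_lowest=True)

    # Composite age band x gender feature, e.g. '18-24_female'.
    out["age_gender_interaction"] = out["age_band"].astype(str) + "_" + out["gender"]

    # One indicator per therapy phase, derived from a single status column (assumed).
    for status in ["current_therapy", "past_therapy", "waiting_therapy"]:
        out[status] = (out["therapy_status"] == status).astype(int)

    # Day of the week of the first login (0 = Monday ... 6 = Sunday).
    out["first_login_day_of_week"] = pd.to_datetime(out["first_login"]).dt.dayofweek

    return out
```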

Initial analysis showed that 'total_minutes' was a strongly dominant predictor. To understand the impact of the other variables on engagement, a subsequent analysis was performed excluding 'total_minutes'. This exclusion serves as an ablation study, aimed at uncovering the relative importance of the other features and providing a more comprehensive understanding of user interaction beyond the primary metric of time spent on the application.

Machine learning models
Following the classification of participants as engaged or disengaged, we explored machine learning models to forecast user engagement. We employed several models: Logistic Regression, Decision Trees, Random Forests, Support Vector Machines (SVM), and Gradient Boosting. This choice was motivated by their wide use in machine learning for binary classification problems such as ours. An additional aim was to investigate whether the selection of the model itself significantly impacts the prediction of user engagement; the side-by-side comparison shows whether some models provide more accurate or consistent predictions in the context of this study. For each model, the following steps were performed (an end-to-end sketch of this pipeline is given after the list):

1. Data preprocessing: The dataset consisted of a mixture of numerical and categorical variables. Numerical variables were standardized using StandardScaler, while categorical variables were one-hot encoded with OneHotEncoder, to normalize their range and format, respectively.

2. Data splitting: A nested cross-validation approach was employed using StratifiedKFold with 5 splits to ensure a balanced representation of classes in each fold. This method divided the data into separate training and testing sets in each iteration.

3. Oversampling: Due to the imbalanced class distribution, SMOTE [20] was applied to generate synthetic samples for the minority class in the training data.

4. Model training: Each machine learning model was trained using hyperparameters optimized with Optuna (an open-source framework for automatic hyperparameter optimization [21]), tuned to each model's specific requirements:

1. Decision tree: The hyperparameters max_depth (1 to 40), min_samples_split (2 to 10), and min_samples_leaf (1 to 10) were adjusted.

2. Gradient boosting: Adjustments included n_estimators (50 to 200), learning_rate (0.01 to 0.2), max_depth (1 to 7), min_samples_split (2 to 10), min_samples_leaf (1 to 10), and subsample (0.5 to 1).

3. Logistic regression: The regularization parameter C (1e-4 to 1e4) and the penalty type ('l1' or 'l2') were adjusted, with the solver selected to match the penalty type.

4. Random forests: Adjustments included n_estimators (2 to 150), max_depth (1 to 40), min_samples_split (2 to 10), and min_samples_leaf (1 to 10).

5. Support vector machine (SVM): Adjustments included C (1e-4 to 1e4), kernel type ('linear', 'poly', 'rbf', 'sigmoid'), degree (1 to 5 for the 'poly' kernel), gamma ('scale' or 'auto'), and coef0 (−1.0 to 1.0).

5. Model evaluation: Model performance was assessed using accuracy and the Precision-Recall Area Under the Curve (PR-AUC). PR-AUC provides a reliable assessment in the face of class imbalance: it balances precision (the model's ability to identify true positives) against recall (the model's ability to capture all actual positives), which is often more informative than accuracy alone on imbalanced datasets. SHAP values were also computed to interpret the models' decision-making process, offering insights into feature importance. To compare performance between the first and second iterations of the models, the percentage difference in accuracy and PR-AUC was calculated using the following formula:

$$\text{Diff (\%)} = \frac{\text{Value (1st)} - \text{Value (2nd)}}{\text{Value (1st)}} \times 100 \qquad (1)$$

This formula expresses the relative change between the first and second iterations, allowing a standardized comparison of model performance; for example, a PR-AUC of 0.80 in the first iteration and 0.72 in the second yields a difference of 10%.

6. Statistical analysis: Accuracy, classification reports, and confusion matrices were computed for each cross-validation fold to evaluate model performance. A cumulative precision-recall curve was plotted to visualize the trade-off between precision and recall across different thresholds.
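
As referenced above, the following is a minimal sketch of one tuning-and-evaluation pipeline (shown for the random forest). It assumes a feature DataFrame X, binary labels y, and lists num_cols/cat_cols of numeric and categorical column names; it is an illustrative reconstruction of the steps described, not the authors' exact code.

```python
import optuna
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # applies SMOTE to training folds only
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def make_pipeline(params, num_cols, cat_cols):
    # Step 1: scale numeric features, one-hot encode categorical ones.
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), num_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
    ])
    # Step 3: SMOTE inside the pipeline so synthetic samples are generated
    # from training data only, never from the held-out fold.
    return Pipeline([
        ("prep", preprocess),
        ("smote", SMOTE(random_state=42)),
        ("model", RandomForestClassifier(**params, random_state=42)),
    ])

def objective(trial, X, y, num_cols, cat_cols):
    # Step 4: search space mirroring the ranges reported for random forests.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 2, 150),
        "max_depth": trial.suggest_int("max_depth", 1, 40),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 10),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 10),
    }
    # Step 2: stratified 5-fold splits; an outer StratifiedKFold loop around
    # this tuning procedure would complete the nested cross-validation.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    # Step 5: 'average_precision' is scikit-learn's PR-AUC scorer.
    pipe = make_pipeline(params, num_cols, cat_cols)
    return cross_val_score(pipe, X, y, cv=cv, scoring="average_precision").mean()

# Usage (X, y, num_cols, cat_cols assumed to exist):
# study = optuna.create_study(direction="maximize")
# study.optimize(lambda t: objective(t, X, y, num_cols, cat_cols), n_trials=100)
```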
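
The SHAP computation mentioned in step 5 might, for a fitted tree ensemble, look roughly as follows; fitted_model and X_test_transformed are assumed names for the trained classifier and the preprocessed held-out features, not identifiers from the study.

```python
import shap

# Assumed inputs: 'fitted_model' is a trained tree ensemble (e.g. the random
# forest above) and 'X_test_transformed' the preprocessed evaluation features.
explainer = shap.TreeExplainer(fitted_model)
shap_values = explainer.shap_values(X_test_transformed)

# Global summary of feature importance across the evaluation set.
shap.summary_plot(shap_values, X_test_transformed)
```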

All images presented in this paper were generated by the authors as part of this data analysis and are not reproductions from other sources.
