Development, validation and use of custom software for the analysis of pain trajectories

Technical informationTAMS was written using the scripting language AWK combined with Korn shell scripts17,18. The software runs on macOS, Linux and other Unix-type operating systems.Input dataTAMS uses the comma-separated or CSV file format as the input because it is the most common format in which datasets are stored. Furthermore, TAMS was developed to deal with ‘long’ file formats, whereby each patient encounter with all recorded parameters of that encounter is placed on a separate line in the file. Commercial statistical software can typically export data in this format; therefore, converting any existing dataset into a format (long and CSV) that TAMS can use is easy.Pattern recognitionCurrently, pain trajectories within cohorts of patients with LBP are mainly identified using data-driven analyses with longitudinal data (e.g., latent class modelling)9. All these analyses show a number of unstandardised trajectories that appear similar but have consistent differences in profile. Hence, comparisons between cohorts and settings using these trajectories can be problematic. To remedy this situation, a group of pain trajectory researchers suggested standards for naming and defining pain trajectories, as shown in Fig. 1.11Figure 1Suggested standard pain trajectories (reproduced from Kongsted et al.11).Using these pre-defined definitions to identify pain trajectories has a major advantage over pure data-driven analyses identifying latent classes, as each analysis will result in the same defined pain trajectories, independent of the dataset used as input.TAMS uses longitudinal patient data to build a trend line for each unique patient in a dataset, using the week number of the follow-up appointment as an index and pain intensity as parameter for time and y-axis, respectively. After creating a trendline, TAMS calculates extra parameters (metadata parameters) needed for the pattern detection algorithm to function, such as the mean perceived pain of the patient or if there were periods of four or more pain-free weeks. TAMS then uses the created trendline to calculate extra parameters to determine whether any pain trajectory definitions match the trendline for that patient.The flow diagram of the novel algorithm is shown in Fig. 2. As presented in the flow diagram, all trajectories specified in the suggested standards can be further stratified based on the severity of the mean pain intensity of a patient into minor, mild, moderate, and severe categories, creating 28 unique trajectory-severity groups. The trajectory ‘fluctuating pain’ is the last pain trajectory defined in the algorithm because it has the broadest definition. Consequently, the fluctuating pain trajectory serves as a collection tray for patients who do not fall into the other definitions.Figure 2Pain trajectory detection algorithm flow diagram. 1 ‘NoBaselineMean’ is the mean pain calculated over all the pain points of a patient, except for the baseline pain score. 2 A change in pain of ≥ 30 points on a 0–100 point Numeric Rating pain scale (NRS) is considered clinically significant19. 3 ‘Episodic behaviour’ is defined as ‘the detection of a pain-free period of at least four weeks in between periods with pain’20. Once the pain trajectories of all individual patients in the dataset is determined, the patient distribution across each pain trajectory is calculated. This distribution will be used as the result parameter in subsequent analyses. In most LBP studies, the result parameter used in the analyses is a change in symptoms at a specific time post-treatment, for instance, at one and three months after the patient was seen by a physician for the first time. When dealing with trends over time, such as pain trajectories, it is potentially meaningless to use a patient’s pain intensity at a certain time post-baseline. This is because information about the trend is not factored into this analysis. If a patient experiences less pain after one month of a certain treatment, how do we know if this reflects improvement or merely fluctuating pain? Hence, we decided to use the distribution of patients over different pain trajectories as a result parameter.Subgroup analysisRecursive partitioning is the process of creating a decision tree for subgroups based on a set of rules21. Any numeric parameters or parameters that can be represented numerically in a dataset, such as the age or gender of the patients, can be used to create subgroups. The software further allows for dichotomised groups based on the mean value of the population, or groups can be created using population percentiles. Tailored rules can also be created for specific age groups or BMI with predetermined group limits. By specifying more than one parameter from the dataset for subgroup creation, TAMS combines the subgroups based on the specified parameters. This process of combining subgroups can continue to infinity in datasets with many parameters; however, dimensionality becomes an issue as much data would be needed to have enough patients per subgroup to perform statistically meaningful analysis across groups. For this reason, a maximum limit of three parameters was specified for TAMS to be used in recursive partitioning.The goal of introducing a subgroup analysis capability in TAMS was to help identify specific characteristics that directly influence a patient’s pain trajectory. For instance, let us say that age has a major influence on a patient’s pain trajectory. By creating subgroups based on age, we expect the distribution of patients over different trajectories to be very different from that of the entire patient population.Episode data extractionMost pain trajectory studies indicate that LBP is episodic in most cases9,10,14,22. Many patients experience episodes of pain separated by a minimum of four weeks, where the patient has zero pain20. Most treatment efficacy studies on LBP assume that LBP is a self-resolving disease11. Pain intensity data at predetermined moments post-baseline, for instance, at one and three months, were compared against the pain intensity at baseline to determine if a certain treatment option is more beneficial for a group of patients than another23,24,25. In the case of episodic diseases, it is probably more important to determine whether a certain treatment option affects future episodes. For instance, an effect on the duration of future episodes, an effect on the mean pain of future episodes, or an effect on the number of episodes a patient experiences during the year. To enable these types of analyses, TAMS collects, calculates, and stores episode-specific information per patient per episode in a separate output file for future analysis.Testing and validationThe functional features of TAMS were successfully tested using simulated datasets created specifically for this purpose. This type of testing guaranteed essential workings of the software, such as the creation of subgroups. The next validation step to prove that the algorithm can recognise pain trajectory patterns was to redo a previously performed analysis using known trajectory definitions so that the results of the two analyses could be compared.A study performed in Denmark in 201710 was the only study that used a part of the standard definitions, as shown in Fig. 1, to identify the pain trajectories of a group of patients in a dataset. For the purpose of this paper, this study will be referred to as ‘the Danish study’. The dataset used in the Danish study, which contains data from 1077 LBP patients, was previously analysed for pain trajectories using a latent class analysis (LCA)12. Having a sufficient number of patients with known trajectories, the Danish Study was used to validate the output of TAMS using real-life data.The researchers from the Danish study decided to exclude the first ten weeks of the recorded data because they were irrelevant to their analysis. Consequently, certain pain trajectories could not be detected, mainly the rapidly and gradually improving trajectories. These specific pain trajectories rely heavily on what happens in the first few weeks after the baseline measurement, and they become impossible to detect once the first ten weeks of data are removed.To replicate the Danish analysis and prove the functionality of our pattern recognition software, we had to create the same conditions. The first ten weeks of data were removed, and our pattern detection algorithm was altered to remove the rapidly and gradually improving and progressing pain trajectories. The resulting algorithm, which was more or less identical to the Danish study, is shown in Fig. 3.Figure 3Recreation of the pain trajectory detection algorithm used in the Danish study.The trajectory assessment was compared with the original study to calculate the accuracy scores obtained by constructing a truth table per trajectory per patient. This table was subsequently used to calculate the F1-score, which is an accuracy score based on the harmonic mean between precision and recall26. F-scores are frequently used as accuracy measures in algorithm testing for machine learning and data mining. The F1 score is a number between 0 and 1 or 0 and 100%; the higher the F1 value, the higher the accuracy. The F1-score is calculated using the following equation26:$${\text{F}}1{\text{-score }} = { }\left( {\frac{{{\text{recall}}^{ – 1} + {\text{Precision}}^{ – 1} }}{2}} \right)^{ – 1} = { }2{ * }\frac{{\left( {\text{Precision*Recall}} \right)}}{{\left( {{\text{Precision}} + {\text{Recall}}} \right)}},$$where\({\text{Precision }} = { }\frac{{\text{True Positive}}}{{\left( {{\text{True Positive }} + {\text{ False Positive}}} \right)}}\) and \({\text{recall }} = { }\frac{{\text{True Positive}}}{{\left( {{\text{True Positive }} + {\text{ False Negative}}} \right)}}\).Output filesTAMS can generate several different output files. These files are in the CSV format, similar to the input files, so that they can be easily imported into any statistical software. The first output file contains the distribution data of patients over the different pain trajectories, which are calculated after the trajectories of all patients in the dataset are identified. The second output file was almost identical to the input file, with the addition of an extra column containing the detected trajectory. The third output file contains episode information per patient, how many episodes were detected, and information per episode, specifically the duration and mean pain intensity for that episode. The fourth output file contains the extra parameters that TAMS calculates per patient and the metadata parameters that it needs to be able to run the pattern recognition algorithm. The final output file contains a condensed version of the first output file with distribution information. This file is meant to hold the output of a data mining run; therefore, if a thousand different subgroup combinations are tested, the user of TAMS does not have a thousand individual files but instead has one file with a thousand lines, all representing one run of TAMS.

Hot Topics

Related Articles