Machine-learning model to predict the tacrolimus concentration and suggest optimal dose in liver transplantation recipients: a multicenter retrospective cohort study

Study approval
This study was conducted in accordance with the tenets of the Declaration of Helsinki. The Institutional Review Board of Seoul National University Hospital approved the study proposal (approval number: H-2007-083-1141) and waived the requirement for written informed consent owing to the retrospective study design. After obtaining approval, we retrospectively collected data from patients who underwent liver transplantation between January 2017 and October 2020. Patients aged < 15 years and those without any record of tacrolimus concentrations were excluded. We followed the recommendations of the article "STROCSS 2021: Strengthening the Reporting of Cohort, Cross-sectional and Case–control Studies in Surgery"37.

Data collection
The twice-daily tacrolimus doses up to 14 days postoperatively and the whole blood tacrolimus concentrations measured by chemiluminescence immunoassay were collected from the electronic medical records of Seoul National University Hospital for model training and internal validation. In addition, each patient's age, sex, height, weight, Model for End-Stage Liver Disease (MELD) score, type of donor, indication for transplantation, and other immunosuppressants were recorded. Blood test results for alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin, International Normalized Ratio (INR), serum albumin, serum creatinine, and hematocrit were collected daily38.

During the study period, patients received oral tacrolimus twice daily from the first day after liver transplantation. Doses were decided empirically by the attending intensivists based on the patient's weight, laboratory results related to liver and renal function, and the whole blood tacrolimus concentration measured before the morning dose. Dose adjustment and drug concentration monitoring were repeated until the tacrolimus concentration reached a steady state within the target range of 8–10 ng/mL.

Model development
A machine-learning model was developed to predict the next whole blood tacrolimus concentration from the history of oral tacrolimus doses, measured whole blood tacrolimus concentrations, and time-dependent covariates (weight, ALT, AST, total bilirubin, INR, serum albumin, serum creatinine, and hematocrit) over the previous n days, together with time-independent covariates (age, sex, and height). Each sample in the dataset comprised these variables for n + 1 consecutive days, with the first n days used as inputs and the last day as the output. Missing values were imputed using multiple imputation, and tacrolimus doses and concentrations before the first administration were substituted with zeros.

A long short-term memory (LSTM) model was developed with input nodes for the tacrolimus dose, the measured tacrolimus concentration, and the time-dependent covariates. The LSTM outputs were concatenated with the time-independent covariates and passed to a fully connected layer. This structure was inspired by the study of Lee et al.19.

Gradient-boosted regression tree (GBRT) and linear regression (LR) models were also developed for comparison. These models received the same inputs as the final LSTM model, based on data from the previous n days. GBRT hyperparameters, such as the number of estimators and the maximum depth, were optimized using a similar method.
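The network code itself is not reproduced here; the following is a minimal Keras sketch of the architecture described above, with hypothetical layer sizes and feature counts (the actual values were selected by the grid search described later in this section).

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical dimensions; the actual values were chosen by grid search.
n_days = 4             # length of the input history (2-7 days were explored)
n_seq_features = 10    # dose, measured concentration, and 8 time-dependent covariates
n_static_features = 3  # age, sex, and height

# Sequential branch: daily dose, concentration, and time-dependent covariates.
seq_in = keras.Input(shape=(n_days, n_seq_features), name="daily_history")
lstm_out = layers.LSTM(64)(seq_in)  # 8-256 LSTM nodes were explored

# Static branch: time-independent covariates concatenated with the LSTM output.
static_in = keras.Input(shape=(n_static_features,), name="static_covariates")
merged = layers.Concatenate()([lstm_out, static_in])
hidden = layers.Dense(32, activation="relu")(merged)  # 8-128 nodes were explored
output = layers.Dense(1, name="next_tacrolimus_concentration")(hidden)

model = keras.Model(inputs=[seq_in, static_in], outputs=output)
model.compile(optimizer="adam", loss="mse")
```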
We also employed a one-compartment pharmacokinetic (PK) model with first-order absorption developed for patients in the first 2 weeks after liver transplantation39. The PK parameters were adjusted according to the post-transplant period and the AST, serum albumin, and hematocrit measurements: between 0 and 3 days post-transplantation, the apparent clearance (CL/F) was 8.93 L/h for AST ≥ 500 U/L and 11.0 L/h for AST < 500 U/L, with an apparent volume of distribution (V/F) of 328 L. After 4 days, the apparent clearance was set to 25.1 L/h for serum albumin < 2.5 g/dL or hematocrit < 28% and to 17.1 L/h otherwise, with an apparent volume of 568 L.
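For illustration, the covariate rules above can be written as a small helper function. This is a sketch of the parameter selection only, not the full one-compartment prediction code; the function and variable names are our own, and the cut-off between the early and late post-transplant periods is assumed to fall at day 4.

```python
def pk_parameters(post_tx_day, ast_u_l, albumin_g_dl, hematocrit_pct):
    """Return (CL/F in L/h, V/F in L) for the one-compartment model.

    Sketch of the covariate rules described above; the boundary between
    the early and late post-transplant periods is assumed to be day 4.
    """
    if post_tx_day <= 3:   # 0-3 days post-transplantation
        cl_f = 8.93 if ast_u_l >= 500 else 11.0
        v_f = 328.0
    else:                  # after 4 days
        cl_f = 25.1 if (albumin_g_dl < 2.5 or hematocrit_pct < 28) else 17.1
        v_f = 568.0
    return cl_f, v_f


# Example: day 5, AST 120 U/L, albumin 2.2 g/dL, hematocrit 30%
print(pk_parameters(5, 120, 2.2, 30))  # -> (25.1, 568.0)
```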
We compared the accuracy of the models across all combinations of the abovementioned variables for feature selection. Among the various combinations, the one with the highest performance and fewer variables in the five-fold cross-validation was selected. A grid search was performed to determine the optimal combination of hyperparameters, with 8, 16, 32, 64, 128, or 256 nodes in the LSTM layer; 8, 16, 32, 64, or 128 nodes in the fully connected layer; and 2–7 days of input history. Once the best combination of features and hyperparameters was identified, multiple random sampling was performed to evaluate the models' internal and external validation performance. Training and validation of the models were performed with programs written by the authors in Python (version 3.10.5) using the Keras library (version 2.10.0).

To enhance model transparency and reveal the effects of the input features on the next tacrolimus concentration, we applied the SHapley Additive exPlanations (SHAP) algorithm (SHAP version 0.39.0 in Python) to visualize explanations at the feature level40. Briefly, the SHAP summary plot was used to illustrate the strength and direction of the associations between the features and the tacrolimus concentration.

Internal validation
Multiple random-sample validations were conducted. The samples in the derivation cohort were split into training (80%) and test (20%) sets using 10 random seeds, and model training was repeated for each split to estimate the mean performance and its 95% confidence interval41. Predictive performance was evaluated using the root-mean-squared error (RMSE), median absolute error (MAE), median performance error (MDPE), and median absolute performance error (MDAPE). The agreement between the predicted and measured tacrolimus concentrations was evaluated for each model.

External validation
For external validation, we analyzed data from the eICU-CRD dataset, which includes over 200,000 intensive care unit (ICU) stays from 208 hospitals in the United States between 2014 and 201521. The "patientunitstayid" of patients whose admission diagnosis was "liver transplantation" was extracted from the "admissiondx" table. Patients aged < 15 years were excluded. Whole blood tacrolimus concentration, ALT, AST, total bilirubin, INR, serum albumin, serum creatinine, and hematocrit measurements, together with their measurement times ("labresultoffset"), were queried from the "lab" table. Tacrolimus doses were retrieved from the "medication" table and aligned with the laboratory results based on "drugstartoffset," "drugstopoffset," and "labresultoffset." Cases in which the route of administration was sublingual or intravenous rather than oral were excluded. Age, sex, height, and weight were obtained from the "patient" table. Records with missing drug doses or concentrations were excluded to ensure consistency with the training dataset. The LSTM, GBRT, and LR models then predicted tacrolimus concentrations in this dataset to confirm the external validity of the model performance.

Dose recommendation
The model suggested tacrolimus doses by first predicting the tacrolimus concentration for every hypothetical dose between the minimum (0.5 mg) and maximum (20 mg) doses. The doses predicted to achieve the target concentration range (8–10 ng/mL) were then identified as the suggested doses. A 3 × 3 contingency table was produced by juxtaposing the administered dose relative to the suggested doses against the measured concentration relative to the therapeutic range, and these frequencies were examined using the chi-square test. We further evaluated whether dose adjustments aligned with the suggested tacrolimus doses were associated with earlier ICU discharge by comparing the duration of ICU stay between patients who received tacrolimus doses within and outside the suggested range.

Clinical outcome
We investigated whether tacrolimus concentrations outside the target range or high intra-patient variability, defined as a standard deviation of the tacrolimus concentration greater than 2 ng/mL, significantly affected prognosis during the first 14 days post-transplant42. The clinical outcomes evaluated were transplant rejection, renal failure, and cytomegalovirus (CMV) infection. Transplant rejection was assessed by transplant surgeons based on laboratory findings, biopsy results, and imaging examinations43. Acute kidney failure was defined as an increase in serum creatinine of 0.3 mg/dL or more within 48 h or an increase to 1.5–1.9 times baseline within the previous 7 days44. CMV infection was diagnosed using PCR assays45. The chi-square test was used to analyze the association between tacrolimus concentration and clinical outcomes during the early post-transplant period.

Sensitivity analysis
Sensitivity analyses were performed to confirm the robustness of the LSTM model. Specifically, we trained the models without any drug concentration results and evaluated their performance.

Statistical analysis
A formal sample size calculation was not performed because of the retrospective study design; instead, the study used the available data from tertiary hospitals and a large open dataset to develop and test the prediction model. Patient demographics and tacrolimus doses and concentrations are described as means (± standard deviations) or medians (interquartile ranges), depending on the results of the Shapiro–Wilk test, and categorical variables are presented as numbers (percentages). Continuous variables, such as tacrolimus doses and concentrations, age, weight, height, AST, ALT, total bilirubin, INR, serum albumin, serum creatinine, and hematocrit, were compared using Student's t-test or the Mann–Whitney U-test. Categorical variables, such as patient sex, were compared using Pearson's chi-square test.

Model performance was evaluated using the internal test and external validation datasets. The RMSE, MAE, MDPE, and MDAPE were compared using analysis of variance, followed by post-hoc t-tests with Bonferroni correction. An MDPE of < 20% or an MDAPE of < 30% was considered clinically acceptable based on previous studies22,23,24. Statistical analyses were performed using Python and IBM SPSS for Windows, version 21 (IBM, Armonk, NY, USA), and statistical significance was defined as P < 0.05. The code used for the analysis is provided in Supplementary Table S4.
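As a concrete reference for the performance metrics above, the following is a minimal sketch using their conventional definitions; the percentage-based performance error is an assumed implementation, and the function and variable names are our own.

```python
import numpy as np

def performance_metrics(measured, predicted):
    """RMSE, median absolute error (MAE), MDPE, and MDAPE.

    MDPE and MDAPE follow the conventional definition of the performance
    error as a percentage of the measured concentration; this is an
    assumed implementation, not code taken from the study.
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pe = (predicted - measured) / measured * 100.0  # performance error (%)
    return {
        "RMSE": float(np.sqrt(np.mean((predicted - measured) ** 2))),
        "MAE": float(np.median(np.abs(predicted - measured))),  # median absolute error
        "MDPE": float(np.median(pe)),                            # bias
        "MDAPE": float(np.median(np.abs(pe))),                   # inaccuracy
    }

# Example with made-up concentrations (ng/mL):
print(performance_metrics([8.2, 9.5, 7.1], [8.0, 10.1, 6.5]))
```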
