A two-tier feature selection method for predicting mortality risk in ICU patients with acute kidney injury

The conceptual framework of our two-tier feature selection prediction model is presented in Fig. 7.

Figure 7. A conceptual model for predicting outcomes in AKI patients within the ICU using a limited set of features. The conceptual model is designed for ongoing prediction of AKI-related hospitalization outcomes. First, we gather data on the patient's laboratory tests, surgeries, and medication usage. Second, relevant features are identified for prediction through feature selection. Third, we introduce a stacking ensemble model, employing fivefold cross-validation, to assess patient outcomes. Finally, the model is analyzed using several interpretability methods.

Study population

Data for this study were retrieved from three distinct critical care databases: MIMIC-III35, MIMIC-IV36, and eICU-CRD37. The prediction models were developed using the publicly accessible MIMIC-III database. The data were divided into two sets: 30% were reserved for internal validation, and the remaining 70% were used for model construction. The predictive performance of these models was then validated on two entirely independent datasets, MIMIC-IV and eICU-CRD. MIMIC-III includes critical care data from 46,520 ICU patients admitted to Beth Israel Deaconess Medical Center in Boston between June 1, 2001, and October 31, 2012. The dataset comprises 26 tables covering demographics, admission records, discharge summaries, ICD-9 diagnostic records, vital signs, laboratory measurements, and medication usage. MIMIC-IV includes data from more than 190,000 patients and roughly 450,000 hospital admissions to Beth Israel Deaconess Medical Center (BIDMC) between 2008 and 2019, collected in collaboration with the Massachusetts Institute of Technology (MIT). It offers a broader array of information, covering demographics, laboratory tests, medication usage, vital signs, surgical procedures, and disease diagnoses. Although MIMIC-III and MIMIC-IV share similar types of medical information, their data collection, processing, and dissemination methodologies differ; MIMIC-IV is broader in scope, covering more patients and a longer timeframe. The eICU Collaborative Research Database (eICU-CRD) is a large public database created in collaboration with the MIT Laboratory for Computational Physiology (LCP). It is a completely independent, multi-center dataset that brings together data from many hospitals across the United States, thereby expanding the scope of the study. It covers routine data on more than 200,000 patients admitted to intensive care units in 2014 and 2015 and includes a wealth of high-quality clinical information such as physiological parameters, laboratory results, medication records, and diagnostic information.
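For clarity, the sketch below illustrates one way the 70/30 development/internal-validation split described above could be performed. The file name, the `mortality` column, and the random seed are illustrative assumptions, not the authors' code; MIMIC-IV and eICU-CRD are treated as external validation sets only.

```python
# Minimal sketch of the development / internal-validation split (assumed file and column names).
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical pre-assembled AKI cohort extracted from MIMIC-III.
mimic3 = pd.read_csv("mimic3_aki_cohort.csv")

X = mimic3.drop(columns=["mortality"])   # candidate predictors
y = mimic3["mortality"]                  # in-ICU death label (0/1)

# 70% for model construction, 30% held out for internal validation,
# stratified so both sets keep the same mortality rate.
X_dev, X_int, y_dev, y_int = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# MIMIC-IV and eICU-CRD cohorts would be loaded separately and used
# only for external validation of the final model.
```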
The data are available in both structured and unstructured forms and are collected automatically from monitoring equipment, electronic medical records, and other healthcare information systems. For each patient, the following information was collected: (1) demographic characteristics: gender, age in years, and survival status; (2) vital signs: heart rate (HR, beats/min), respiratory rate (Resp, breaths/min), body temperature (Temp, °C), and pain score; (3) laboratory parameters: blood urea nitrogen (BUN, mg/dL), creatinine (mg/dL), glucose (GLU, mg/dL), bicarbonate (HCO3, mmol/L), international normalized ratio (INR), potassium (K, mmol/L), sodium (Na, mmol/L), partial pressure of carbon dioxide (PCO2, mmHg), prothrombin time (PT, s), white blood cell count (WBC, 10³/μL), chloride (Cl, mmol/L), Glasgow Coma Scale (GCS), hematocrit (HCT, %), hemoglobin (HB, g/dL), pH, platelet count (PLT, 10³/μL), partial pressure of oxygen (PO2, mmHg), peripheral oxygen saturation (SpO2, %), and fraction of inspired oxygen (FiO2, %). Blood samples were taken before and after dialysis, following an 8-h fast, for routine biochemical testing.

Determination of outcome variables: mortality and AKI

Mortality was defined as death among patients with AKI during their ICU hospitalization. AKI diagnosis followed the Kidney Disease: Improving Global Outcomes (KDIGO)1 guidelines, which consider serum creatinine concentration (Scr) and urine output (UO). In line with previous studies30,31,32,33,38, serum creatinine concentration (Scr) was used as the primary criterion in this experiment. AKI was defined as a 1.5-fold increase in serum creatinine concentration within the prior 7 days, a rise of ≥ 0.3 mg/dL within 48 h, or a sustained urine output of < 0.5 mL/kg/h for ≥ 6 h (a minimal sketch of these criteria is given after Fig. 8). When no pre-admission baseline serum creatinine was available, the first serum creatinine measured at admission served as the baseline. Patients with AKI in the ICU were identified via departmental codes. ICU duration was then computed from admission and discharge times, and data from the 24 h preceding admission were extracted26,39,40. For patients with multiple ICU admissions, only data from the first admission were used; for examinations repeated within 24 h, the average value was calculated41.

Inclusion and exclusion criteria

To ensure data safety and emphasize the model's effectiveness for early prediction, we developed the predictive model using medical data from the 24 h prior to a patient's hospital admission to screen patients diagnosed with AKI. The final dataset for the experiment was selected from these data (Fig. 8). During data selection, we excluded patients who met any of the following criteria: (1) age < 18 years; (2) ICU admission for > 24 h; (3) chronic renal replacement therapy prior to admission; and (4) records with > 20% missing values or missing outcome information. These exclusion criteria were designed to ensure the quality and accuracy of the experimental data, allowing a better exploration of the relationship between early patient status and AKI.

Figure 8. Flow chart of the study population selection. ICU-AKI: patients with acute kidney injury (AKI) treated in the intensive care unit (ICU).
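To make the creatinine-based KDIGO rules above concrete, the following sketch applies them to one patient's creatinine series. The DataFrame layout and column names (`charttime`, `creatinine`) and the simplified handling of the 7-day window are assumptions for illustration, not the authors' implementation; the urine-output criterion is omitted for brevity.

```python
# Hedged sketch of the serum-creatinine KDIGO criteria described above.
import pandas as pd

def flag_aki_by_creatinine(scr: pd.DataFrame, baseline: float) -> bool:
    """scr: one patient's measurements with columns 'charttime' (datetime)
    and 'creatinine' (mg/dL); baseline: pre-admission (or first) SCr."""
    scr = scr.sort_values("charttime")
    for _, row in scr.iterrows():
        t, value = row["charttime"], row["creatinine"]

        # Criterion 1 (simplified): SCr rises to >= 1.5 x baseline,
        # the increase being presumed to occur within the prior 7 days.
        if value >= 1.5 * baseline:
            return True

        # Criterion 2: absolute rise of >= 0.3 mg/dL within 48 h.
        recent = scr[(scr["charttime"] >= t - pd.Timedelta(hours=48)) &
                     (scr["charttime"] < t)]["creatinine"]
        if not recent.empty and value - recent.min() >= 0.3:
            return True

    return False
```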
Data processing

Records with more than 20% missing values were excluded, and outliers were identified using box-and-whisker plots and subsequently removed. Remaining missing values were handled by multiple imputation with the random forest (RF) algorithm, which is known for its effectiveness in imputing missing data42. RF offers several advantages, including the ability to handle mixed types of missing data, adaptability to interactions and nonlinearities, and scalability to large datasets43, while preserving the distribution of the data after imputation. The data then underwent Min–Max normalization, transforming each feature into a common range to ensure uniform scaling. This normalization maintained a relative weight balance between features; by mitigating model bias toward features with larger scales, it improved the performance and interpretability of the machine learning models and ensured consistent contribution weights of individual features.

Statistical analyses

Descriptive statistics were used to assess the distribution and inherent patterns of the numerical characteristics in the dataset; measures such as the mean, median, mode, range, variance, and standard deviation were examined as appropriate. Pearson's correlation coefficient was employed to quantify the degree of linear correlation between variables. Continuous variables are summarized as mean ± standard deviation or median (interquartile range), and categorical variables as frequencies. Normality of each variable was evaluated with the Kolmogorov–Smirnov test. Student's t-test was used to compare continuous variables, and Fisher's exact test was used to assess associations between categorical variables. Statistical analysis was performed using R version 4.3.1 for Windows.

Feature selection

This study employs a two-tier feature selection approach to improve both the performance and interpretability of the prediction model: the Boruta algorithm in the first tier and the XGBoost algorithm in the second tier. Boruta6,44 is an RF-based feature selection method that evaluates feature importance by comparing the original features against randomly permuted shadow features. In the first tier, Boruta screens the initial set for features with significant predictive power for the target variable. XGBoost23, an efficient gradient boosting tree algorithm known for its excellent predictive performance and automatic feature screening, serves as the second tier. Feature selection within the XGBoost model further refines the initially selected features, yielding a final subset with enhanced predictive power and stability.
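The sketch below illustrates one way the preprocessing and two-tier feature selection described above could be implemented in Python. The use of `IterativeImputer` with a random-forest estimator as a stand-in for RF-based multiple imputation, the community `boruta` package, the hyperparameters, and the mean-importance cut-off in the XGBoost tier are all assumptions for illustration, not the authors' exact pipeline.

```python
# Hedged sketch: RF-style imputation, Min-Max scaling, then two-tier
# feature selection with Boruta (tier 1) and XGBoost importances (tier 2).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler
from boruta import BorutaPy
from xgboost import XGBClassifier

# X_dev / y_dev: development split from the earlier sketch (assumed).
feature_names = X_dev.columns.to_numpy()

# Approximate RF-based multiple imputation, then Min-Max normalization.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=42),
    max_iter=10, random_state=42)
X_imp = MinMaxScaler().fit_transform(imputer.fit_transform(X_dev))

# Tier 1: Boruta keeps features that beat their randomly permuted shadows.
boruta = BorutaPy(
    RandomForestClassifier(n_jobs=-1, class_weight="balanced",
                           max_depth=5, random_state=42),
    n_estimators="auto", random_state=42)
boruta.fit(X_imp, y_dev.to_numpy())
tier1 = feature_names[boruta.support_]

# Tier 2: XGBoost refines the Boruta subset via its feature importances.
xgb = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                    eval_metric="logloss", random_state=42)
xgb.fit(X_imp[:, boruta.support_], y_dev)
keep = xgb.feature_importances_ > xgb.feature_importances_.mean()  # assumed cut-off
selected_features = tier1[keep]
print(selected_features)
```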
Model construction

In this study, we employed the stacking ensemble method (SEM) to build our model, with the goal of further enhancing overall performance by integrating the outputs of multiple base learners (single classifiers) as inputs to a meta-learner. Extensive prior research has demonstrated the substantial superiority of the SEM over independent classifiers45. To further optimize performance, a voting ensemble method was used in a preliminary stage to selectively craft the base models for the SEM according to the characteristics of the data and the principle of model diversity. Ultimately, logistic regression (LR), support vector machine (SVM), naive Bayes (NB), Light Gradient Boosting Machine (LGBM), eXtreme Gradient Boosting (XGBoost), and random forest (RF) were identified as the base models for the stacking ensemble, with LR chosen as the meta-model. This decision reflects the fact that the variables output by the base models form linear inputs to the meta-learner and aligns with the pursuit of model interpretability. The selection aims to strike a balance between the diversity of the base models and the performance of the overall model, providing a more comprehensive and reliable analytical foundation for this study.
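A minimal sketch of such a stacking ensemble, using scikit-learn's `StackingClassifier` with fivefold cross-validation and LR as the meta-learner, is shown below. The hyperparameters are illustrative defaults rather than the tuned values used in the study, and `X_sel`, `X_val_sel`, and `y_dev` are assumed to come from the earlier sketches.

```python
# Hedged sketch of the stacking ensemble described above.
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC(probability=True)),            # probabilities needed for stacking
    ("nb", GaussianNB()),
    ("lgbm", LGBMClassifier()),
    ("xgb", XGBClassifier(eval_metric="logloss")),
    ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
]

# Fivefold cross-validated out-of-fold predictions feed the LR meta-learner.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
    n_jobs=-1,
)

# X_sel: development data restricted to the selected features; y_dev: labels.
stack.fit(X_sel, y_dev)
risk = stack.predict_proba(X_val_sel)[:, 1]   # predicted mortality risk
```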
Evaluation metrics

To assess the performance of the models comprehensively, we use a diverse set of metrics: the area under the receiver operating characteristic curve (AUROC) with its 95% confidence interval (CI), the area under the precision–recall curve (AUC-PRC), precision, accuracy, recall, F1 score, calibration curves, and Brier scores. This framework is designed to provide a holistic understanding of the model's performance across several dimensions. The evaluation uses the following formulas, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, and TPR and FPR denote the true-positive and false-positive rates:

$$AUC-ROC={\int }_{0}^{1}TPR\; d(FPR)$$

(1)
$$AUC-PRC={\int }_{0}^{1}Precision\; d(Recall)$$
(2)
$$Recall=\frac{TP}{TP+FN}$$
(3)
$$Precision=\frac{TP}{TP+FP}$$
(4)
$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
(5)
$$F1\ score=\frac{2\times Precision\times Recall}{Precision+Recall}$$
(6)
$$Brier\ Score=\frac{1}{N}\sum_{i=1}^{N}{({f}_{i}-{o}_{i})}^{2}$$
(7)
where N is the total number of samples, \({f}_{i}\) is the predicted probability for the i-th sample, and \({o}_{i}\) is the actual outcome of the i-th sample (0 or 1).

Model interpretability

The SEM combines multiple base models to generate predictions. When interpreting the model, the feature weights of each base model are therefore first assessed using the permutation importance (PI) technique. The feature weights of the individual models are then combined mathematically and used as the feature weights of the stacked model. Features with higher weights are selected according to this importance ranking, and a causal diagram is constructed using a causal inference framework46. In this framework, confounders are defined as variables that directly influence both the predicted outcome and the predictors; these confounders are pivotal factors contributing to AKI mortality47. Finally, Local Interpretable Model-Agnostic Explanations (LIME) is employed to analyze how specific values of different characteristics affect the model's predicted outcomes across categories. This elucidation of the clinical parameters associated with high patient mortality facilitates targeted interventions for potentially critical illness in clinical practice.
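As a rough illustration of the interpretability workflow just described, the sketch below computes permutation importances for the fitted stacking model and a LIME explanation for a single patient. The variable names (`stack`, `X_sel`, `X_val_sel`, `y_int`, `selected_features`) refer to the earlier sketches and are assumptions; the aggregation of base-model weights and the causal-diagram step are not shown.

```python
# Hedged sketch: permutation importance and a LIME explanation for the
# stacking model from the earlier sketch (assumed variable names).
import numpy as np
from sklearn.inspection import permutation_importance
from lime.lime_tabular import LimeTabularExplainer

# Permutation importance on the internal-validation set, scored by AUROC.
pi = permutation_importance(stack, X_val_sel, y_int,
                            scoring="roc_auc", n_repeats=10, random_state=42)
ranking = np.argsort(pi.importances_mean)[::-1]
for idx in ranking[:10]:
    print(f"{selected_features[idx]}: {pi.importances_mean[idx]:.4f}")

# LIME explanation for one patient, showing which feature values push
# the prediction toward death vs. survival.
explainer = LimeTabularExplainer(
    training_data=np.asarray(X_sel),
    feature_names=list(selected_features),
    class_names=["survived", "died"],
    mode="classification",
)
exp = explainer.explain_instance(np.asarray(X_val_sel)[0],
                                 stack.predict_proba, num_features=10)
print(exp.as_list())
```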
