Development of a machine learning approach for prediction of red blood cell transfusion in patients undergoing Cesarean section at a single institution

Study design and parturients

This retrospective study was approved by the Institutional Review Board of the Asan Medical Center (protocol number 2021-0812) and was conducted in accordance with the Declaration of Helsinki. The need for written informed consent was waived. Parturients who underwent a Cesarean section (CS) from January 1, 2010 to December 31, 2020 were included. Parturients with incomplete data or missing laboratory values were excluded.

Data collection and study outcomes

For the input features of predictive modeling, we incorporated preoperative laboratory test results, risk factors highlighted in previous studies, and perioperative variables suggested by clinical experts. All parturient data, including demographic data, perioperative variables, and preoperative laboratory values, were collected from an electronic medical record system.

The demographic data included age, weight, height, body mass index (BMI), parity, gestational diabetes mellitus (DM), placenta previa totalis/partialis/marginalis, placenta accreta/increta/percreta, placental abruption, pre-eclampsia, and twin or triplet pregnancy. Perioperative variables included the type of anesthesia, midazolam use, and intraoperative red blood cell (RBC) transfusion. Preoperative laboratory values were the most recent results obtained in the ward within two days before surgery. They included white blood cell (WBC) count, hemoglobin, platelet count, neutrophil percent, lymphocyte percent, red cell distribution width (RDW), international normalized ratio (INR), neutrophil to lymphocyte ratio (NLR), platelet to lymphocyte ratio (PLR), prognostic nutritional index (PNI), estimated glomerular filtration rate (eGFR), creatinine, uric acid, albumin, aspartate transaminase (AST), alanine aminotransferase (ALT), total bilirubin, sodium, potassium, and chloride.
The NLR was calculated as the ratio of the absolute neutrophil count to the absolute lymphocyte count, and the PLR as the ratio of the absolute platelet count to the absolute lymphocyte count. The PNI was calculated as 10 × serum albumin (g/dL) + 0.005 × total lymphocyte count (per mm³).

This study focused exclusively on RBCs, the transfusion product we aimed to predict; other blood products such as platelets and fresh frozen plasma were not considered. The primary aim was to select the ML model with the best performance in predicting the need for an intraoperative RBC transfusion during a CS. The secondary aim was to compare prediction performance when the eight prediction algorithms were applied to the five datasets (the 1:1, 1:2, 1:3, and 1:4 model datasets and the raw data). In addition, to investigate how different combinations of input variables (feature combinations) affect predictive performance, we constructed several training datasets based on these combinations and compared the performance of the models trained on them.

Analysis and preprocessing of the dataset

Of the 16,137 parturients initially enrolled, 1,883 were excluded because of incomplete demographic data, including missing height, weight, or comorbidity information (n = 962), or incomplete laboratory values (n = 921). The parturients excluded for missing laboratory values accounted for approximately 5% of the total participants. Hence, 14,254 parturients were included in the analysis. Of these, 1,020 (7.16%) received an RBC transfusion during surgery. A dataset for predictive modeling was constructed by sampling data from parturients who received and those who did not receive an RBC transfusion.
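As an illustration of the derived indices defined above (NLR, PLR, and PNI), the following minimal Python sketch computes them from raw laboratory values. The function names and the example values are hypothetical and are not study data; absolute cell counts are assumed to be given per mm³ and albumin in g/dL.

```python
def lymphocyte_ratio(numerator_count, lymphocyte_count):
    """Ratio of an absolute cell count to the absolute lymphocyte count."""
    if lymphocyte_count <= 0:
        raise ValueError("lymphocyte count must be positive")
    return numerator_count / lymphocyte_count


def prognostic_nutritional_index(albumin_g_dl, lymphocytes_per_mm3):
    """PNI = 10 x serum albumin (g/dL) + 0.005 x total lymphocyte count (per mm3)."""
    return 10 * albumin_g_dl + 0.005 * lymphocytes_per_mm3


# Hypothetical example values (illustration only).
neutrophils = 6000      # absolute neutrophil count, per mm3
lymphocytes = 2000      # absolute lymphocyte count, per mm3
platelets = 250000      # absolute platelet count, per mm3
albumin = 3.5           # serum albumin, g/dL

nlr = lymphocyte_ratio(neutrophils, lymphocytes)
plr = lymphocyte_ratio(platelets, lymphocytes)
pni = prognostic_nutritional_index(albumin, lymphocytes)
print(f"NLR={nlr:.2f}, PLR={plr:.2f}, PNI={pni:.2f}")
```

The guard against a non-positive lymphocyte count is a design choice of this sketch; the source does not describe how such values were handled.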
Specifically, data from 1,020 parturients randomly sampled from the 13,234 who did not receive an RBC transfusion were combined with data from the 1,020 who did, forming a balanced dataset that was used as the training dataset of the 1:1 model. Likewise, for the 1:2, 1:3, and 1:4 models, multiples of 1,020 non-transfused parturients (two, three, and four times as many) were sampled from the 13,234 and combined with the data from the 1,020 transfused parturients.

We used the bootstrap method to address the selection bias that arises when sampling nonevent data. By repeating the resampling of the training data, the bootstrap allowed model performance to be evaluated robustly despite the class imbalance. In this study, the average performance of each model was evaluated by resampling the training data 50 times and training on each resampled dataset. Records with missing values were removed before modeling because no specific mechanism of missingness was identified and the correlation between the missing variables was low (Supplementary Figure S1). All continuous input variables used in the predictive modeling were standardized using the StandardScaler provided by the Scikit-learn package [18]. Categorical variables were one-hot encoded.

ML models

The algorithms used for predictive modeling were ML techniques such as k-nearest neighbors (KNN), decision tree (DT), multilayer perceptron (MLP), support vector machine (SVM), and logistic regression (LR); tree-based ensemble algorithms such as random forest (RF) and XGBoost; and a simple five-layer deep neural network (DNN) [19,20,21,22,23,24,25,26]. The entire dataset was divided into training, validation, and test datasets at a ratio of 6:2:2 to build predictive models with the eight ML algorithms.
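The construction of the 1:1 to 1:4 datasets, the 6:2:2 split, and the standardization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the data are synthetic (only the class counts 1,020 and 13,234 come from the text), the helper names `undersample` and `split_and_scale` are hypothetical, and the order of operations (undersample first, then a stratified split, with the scaler fitted on the training portion only) is an assumption, since the source does not specify it.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def undersample(X, y, ratio, rng):
    """Keep every event (y == 1) and draw ratio * n_events non-events without replacement."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_neg = min(len(neg), ratio * len(pos))
    keep = np.concatenate([pos, rng.choice(neg, size=n_neg, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]


def split_and_scale(X, y, seed):
    """Stratified 6:2:2 split; the scaler is fitted on the training part only (assumption)."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    scaler = StandardScaler().fit(X_train)
    return (scaler.transform(X_train), y_train,
            scaler.transform(X_val), y_val,
            scaler.transform(X_test), y_test)


# Synthetic stand-in data with the cohort's class counts (random values, not study data).
rng = np.random.default_rng(0)
y = np.concatenate([np.ones(1020, dtype=int), np.zeros(13234, dtype=int)])
X = rng.normal(size=(y.size, 5))

for ratio in (1, 2, 3, 4):
    X_s, y_s = undersample(X, y, ratio, rng)
    X_tr, y_tr, X_va, y_va, X_te, y_te = split_and_scale(X_s, y_s, seed=0)
    print(f"1:{ratio} dataset: total={len(y_s)}, train={len(y_tr)}, "
          f"val={len(y_va)}, test={len(y_te)}")
```

Repeating `undersample` with a fresh random draw (50 times in the study) and averaging the resulting model scores corresponds to the resampling procedure described in the text.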
The hyperparameters of all algorithms were tuned using grid search to achieve the best predictive performance for each model (Supplementary Methods).

Predictive performances

The predictive performance of each algorithm was evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC); results across the repeated bootstrap runs were expressed as means and confidence intervals. The AUROC and AUPRC of the models were compared both numerically and statistically. The predictive performance of the models trained on the resampled datasets was compared with that of the model trained on all raw data. Shapley additive explanation (SHAP) values were used to assess feature importance in the predictive model. A SHAP value quantifies the direction and magnitude of a feature's contribution to the prediction [27].

Statistical analysis

Continuous variables were expressed as means and standard deviations, whereas categorical variables were expressed as numbers and percentages. Categorical data were analyzed using the chi-square test or Fisher's exact test, and continuous data using an independent t-test or the Mann–Whitney U test. Two-tailed p-values < 0.05 were considered statistically significant. ML modeling was conducted in Python 3.9 using the Scikit-learn and TensorFlow packages.

Ethical approval

This retrospective research was approved by the Institutional Review Board (IRB) of Asan Medical Center. Written informed consent was waived by the IRB (Asan Medical Center, No. 2021-0812).
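The evaluation described under Predictive performances (AUROC and AUPRC summarized as means and confidence intervals over 50 resampling runs) can be sketched as follows. Everything here is an assumption made for illustration: the data are synthetic, logistic regression stands in for the eight algorithms, the test set is held fixed across runs, AUPRC is estimated with scikit-learn's average precision, and the interval is a 2.5th to 97.5th percentile interval, since the source does not state how its confidence intervals were computed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic imbalanced data (random values, not study data).
n = 4000
X = rng.normal(size=(n, 4))
logit = -3.2 + 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Hold out a fixed test set; resample only the training pool.
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
pos = np.flatnonzero(y_pool == 1)
neg = np.flatnonzero(y_pool == 0)

aurocs, auprcs = [], []
for _ in range(50):
    # 1:1 undersampling of non-events, redrawn on every run.
    idx = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
    model = LogisticRegression(max_iter=1000).fit(X_pool[idx], y_pool[idx])
    scores = model.predict_proba(X_test)[:, 1]
    aurocs.append(roc_auc_score(y_test, scores))
    auprcs.append(average_precision_score(y_test, scores))


def summarize(values):
    """Mean and percentile interval of a metric across resampling runs."""
    values = np.asarray(values)
    low, high = np.percentile(values, [2.5, 97.5])
    return values.mean(), low, high


for name, values in (("AUROC", aurocs), ("AUPRC", auprcs)):
    mean, low, high = summarize(values)
    print(f"{name}: mean={mean:.3f}, interval=({low:.3f}, {high:.3f})")
```

Reporting AUPRC alongside AUROC is useful for a cohort in which events are rare, because the precision-recall curve is sensitive to the event rate whereas the ROC curve is not.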
