Development of a machine learning model to identify intraventricular hemorrhage using time-series analysis in preterm infants

Study setting and data collection

We retrospectively analyzed time-series data for the first 2 weeks after birth in preterm infants born at less than 32 weeks of GA who were admitted to the neonatal intensive care unit (NICU) of a single tertiary university-affiliated hospital between January 2013 and June 2022. The study was approved by the Institutional Review Board of the Seoul National University Bundang Hospital (approval no. B-2207-771-102) in accordance with the Declaration of Helsinki. The requirement for informed consent was waived for this de-identified retrospective analysis.

The following variables were extracted from our institution’s data warehouse: baseline demographics (birth weight, gender, GA, Apgar score at 1 min after birth [AS1], and Apgar score at 5 min after birth [AS5]), prenatal history (maternal age, in vitro fertilization, the presence of premature rupture of membranes, pregnancy-induced hypertension, gestational diabetes mellitus, histological chorioamnionitis, and the administration of antenatal steroids), resuscitation in the delivery room, surfactant use, medication use (inotropes, antibiotics, sedatives, neuromuscular blockers, and systemic steroids), respiratory support with ventilator parameters (ventilator modes, mean airway pressure, and fraction of inspired oxygen), laboratory findings (pH, hemoglobin, potassium, chloride, and bicarbonate levels), and vital signs (systolic blood pressure [SBP], diastolic blood pressure [DBP], mean blood pressure [MBP], heart rate [HR], respiratory rate [RR], body temperature [BT], and percutaneous saturation of oxygen [SpO2]). Time-series data recorded after IVH diagnosis, including data from the re-admission of the same patient, were excluded. Thirty patients whose observation period was shorter than 24 h, and therefore less informative, were also excluded.

Serial cranial ultrasounds were performed in accordance with the study institution’s protocols. Initial screening was performed within 24 h after birth.
Routine follow-up was repeated weekly until 32 weeks postmenstrual age and every 2 weeks thereafter. An additional follow-up ultrasound was conducted if clinically necessary, particularly after unexpected events that compromised perfusion status. In the study cohort, the initial IVH diagnosis was made at a median of 2.75 days after birth. Experienced radiologists specializing in pediatric radiology reported the presence of IVH based on Papile’s classification [16].

Data preprocessing

Only time-series data were used for model development and validation, as categorical variables showed substantial differences in distribution between the groups with and without IVH. Laboratory findings and clinical information recorded at large sampling intervals, such as the mode of respiratory support, were excluded from the input variables used to build the machine-learning model. Linear interpolation was used to resample the time-series data onto a common 1-hour grid. The administration of medication was assumed to be maintained for 24 h once initiated. The observation period was set from birth to the first diagnosis of IVH in each patient. Because this period differed between patients, the measurements of each patient were zero-padded to a common input duration of 317 h, the maximum observation period among the patients, to minimize missing values. Because the features were not verified to follow a normal (Gaussian) distribution, min-max normalization was applied, scaling each feature by its minimum and maximum values.

The patient group without IVH was approximately 10-fold larger than the group with IVH.
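
The preprocessing steps described in this section (hourly linear interpolation, min-max normalization, and zero-padding to 317 h) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the toy heart-rate record, the choice to hold the nearest observed value outside the sampled range, the right-side padding, and the order of scaling before padding are all assumptions made for illustration. In practice the minimum and maximum would presumably be taken per feature across the dataset rather than from a single record.

```python
from bisect import bisect_left

MAX_HOURS = 317  # maximum observation period reported in the text


def interpolate_hourly(times_h, values, n_hours):
    """Linearly interpolate irregular samples onto hours 0..n_hours-1.

    times_h must be sorted ascending. Hours outside the sampled range take
    the nearest observed value (an assumption of this sketch).
    """
    out = []
    for t in range(n_hours):
        if t <= times_h[0]:
            out.append(values[0])
        elif t >= times_h[-1]:
            out.append(values[-1])
        else:
            j = bisect_left(times_h, t)  # first index with times_h[j] >= t
            t0, t1 = times_h[j - 1], times_h[j]
            w = (t - t0) / (t1 - t0)
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out


def min_max_scale(series, lo, hi):
    """Scale values to [0, 1] using the given minimum and maximum."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in series]


def zero_pad(series, length=MAX_HOURS):
    """Right-pad with zeros so every patient has the same input length."""
    return series[:length] + [0.0] * max(0, length - len(series))


# Toy heart-rate record (illustrative values, not study data).
times_h = [0.0, 2.5, 5.0]
heart_rate = [150.0, 160.0, 140.0]

hourly = interpolate_hourly(times_h, heart_rate, n_hours=6)
scaled = min_max_scale(hourly, lo=min(hourly), hi=max(hourly))
padded = zero_pad(scaled)
```

Here `hourly` holds six hourly values reconstructed from three irregular samples, and `padded` has the fixed length of 317 expected by the model input.
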
We employed the synthetic minority over-sampling technique (SMOTE), an oversampling technique that balances the class distribution by synthesizing additional minority-group samples, to avoid overfitting problems caused by the imbalanced data [17].

Model development and validation

Automated machine learning (AutoML) has recently attracted attention because it automates the process from feature extraction and model selection to model training with hyperparameter optimization [18]. We built a pipeline for model development and validation using the AutoML method and TabularPredictor from the AutoGluon package in Python 3.10.12. A hold-out strategy with a data fraction of 0.2 was applied using the AutoML method, and nine models were fitted. Temperature scaling was performed to calibrate the models. The following 14 features were selected for model development: SBP, DBP, MBP, HR, RR, BT, SpO2, mean airway pressure, fraction of inspired oxygen (FiO2), and medication use (inotropes, antibiotics, sedatives, neuromuscular blockers, and systemic steroids). The entire dataset was divided into a training set for model training and a test set for validation at a 7:3 ratio (Fig. 1).

Fig. 1. The architecture of machine-learning model development and validation. A total of 20 time-series data features derived from clinical information were used for model training. The entire dataset was split into a training set for model training and a test set for validation at a 7:3 ratio. Given the relatively small proportion of neonates with intraventricular hemorrhage (IVH), the synthetic minority oversampling technique was performed to improve classification performance. An automated machine-learning method was used to build a pipeline from feature selection to model training and hyperparameter tuning.
IVH, intraventricular hemorrhage; SMOTE, synthetic minority oversampling technique; AutoML, automated machine learning.

The models were developed using the following algorithms: K-nearest neighbors, random forest, Extra Trees, weighted ensembles, and neural networks. Python version 3.10.12 (Python Software Foundation, Beaverton, OR, USA; https://www.python.org) and open-source libraries, including pandas, Keras, PyTorch, NumPy, and scikit-learn, were used for data preprocessing and machine learning.

Statistical analyses

Categorical variables are expressed as frequencies with proportions and were compared using the chi-squared test, whereas continuous variables are presented as medians with interquartile ranges (IQRs) and were compared using the Mann-Whitney U test. Statistical significance was set at a P-value of < 0.05. Precision scores were the main metric used to compare model performance. The area under the receiver operating characteristic (ROC) curve was also calculated. An ROC curve, a precision-recall curve, and a confusion matrix were generated for the best-performing model. R software version 4.3.1 and Python version 3.10.4 were used to analyze the baseline characteristics.
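
The performance metrics named above (confusion matrix, precision, and area under the ROC curve) can be illustrated with a small standard-library sketch. This is not the authors' evaluation code, which presumably relied on the listed libraries. The labels, the predicted scores, and the 0.5 decision threshold are made-up assumptions. The AUC is computed through its pairwise interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tp = pairs.count((1, 1))
    return tn, fp, fn, tp


def precision_score(y_true, y_pred):
    """Precision = TP / (TP + FP); returns 0.0 if nothing is predicted positive."""
    _, fp, _, tp = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: 1 = IVH, 0 = no IVH (illustrative values, not study data).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
y_pred = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 threshold

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
prec = precision_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

With a minority class this small, precision depends strongly on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason to report both.
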
