Identifying severe community-acquired pneumonia using radiomics and clinical data: a machine learning approach

Radiomics feature extraction

To quantify the grey-scale features extracted for each ROI22, an open-source tool, Pyradiomics23 (v3.0, compliant with the Image Biomarker Standardisation Initiative guidelines24), was used to extract features from the CT scans automatically; the extracted features were then analyzed further. All radiomics features fell into seven categories: (i) shape-based features; (ii) first-order features; (iii) gray-level dependence matrix features; (iv) gray-level size zone matrix features; (v) neighboring gray-tone difference matrix features; (vi) gray-level run-length matrix features; and (vii) gray-level co-occurrence matrix features.

We extracted features not only from the original CT scans but also from derived images produced by filtering. Specifically, the original images were processed with a Laplacian of Gaussian (LoG) filter, whose parameter sigma controls the coarseness of the emphasized texture: a low sigma emphasizes fine textures, while a high sigma emphasizes coarse textures. We set sigma to {1, 2, 3}. Ultimately, 126 features were extracted from the original images and 258 from the LoG-filtered images, for a total of 384 features.

Construction of radiomic feature set

The large number of extracted features quickly increases the risk of overfitting during the subsequent statistical analysis and machine learning modeling, so reducing the number of features is crucial to building effective and generalizable models.

To reduce the dimensionality of the features, we considered several feature selection methods. First, to eliminate highly similar, redundant features, Pearson's correlation coefficient was used to measure the similarity between features25. The Pearson correlation coefficient measures the correlation between two features and ranges over [−1, 1].
When it is close to −1, the two features are negatively correlated; when it is close to 1, they are positively correlated; when it is 0, they are uncorrelated. The choice of threshold depends on the task at hand; a coefficient above 0.7 is generally taken to indicate a strong correlation, so we set the threshold to 0.7 in this work26.

Next, two filter-based methods were used: the Mann-Whitney U test and maximal relevance and minimal redundancy (mRMR). Both select features on the basis of statistical measures and offer high computational efficiency and robustness. The Mann-Whitney U test does not require the data to follow any particular distribution, making it more robust than the t-test: it compares medians rather than means27. mRMR uses mutual information to quantify both the relevance of a feature subset to the output class and the redundancy among features.

Finally, to reduce the risk of overfitting during model training and to improve generalization, the feature set was constructed by evaluating the features with three model-based screening methods (Random Forest, XGBoost, and Lasso) before training the classifier. Each model yields an importance score for every feature; the three scores were summed, and the 15 features with the highest combined ranking were selected as the imaging feature set.

Clinical feature set construction

Because clinical characteristics also influence the target variable, this study collected the clinical indicators recorded at hospital admission as a clinical dataset.
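A minimal sketch of the redundancy-filtering step described above, using the study's 0.7 threshold. The function name and the toy data are ours, not from the study; a greedy pass is only one way to resolve correlated pairs:

```python
import numpy as np

def drop_redundant_features(X, threshold=0.7):
    """Greedily keep each feature only if its absolute Pearson
    correlation with every already-kept feature is <= threshold.

    X: array of shape (n_samples, n_features).
    Returns the indices of the retained columns.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        # keep column j only if it is not strongly correlated
        # with any column retained so far
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

# Toy example: feature 1 is a near-copy of feature 0, feature 2 is independent
rng = np.random.default_rng(0)
f0 = rng.normal(size=100)
X = np.column_stack([f0,
                     2.0 * f0 + 0.01 * rng.normal(size=100),
                     rng.normal(size=100)])
print(drop_redundant_features(X))  # feature 1 is dropped -> [0, 2]
```

The Mann-Whitney U test and mRMR steps would then be applied to the retained columns.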
Patients' age, gender, smoking status, alcohol consumption, and CURB-65 scores were collected, together with laboratory test results: blood urea nitrogen (BUN), procalcitonin (PCT), white blood cell count (WBC), neutrophil count (NEU), lymphocyte count (LYM), total plasma protein (TP), serum albumin (ALB), serum creatinine (Scr), alanine aminotransferase (ALT), aspartate aminotransferase (AST), fibrinogen, and D-dimer.

We performed a correlation analysis between clinical and imaging features to investigate whether exploitable relationships exist between the two. Specifically, Pearson's correlation coefficient was computed between each pair of clinical and imaging features, and the results were visualized with heat maps. Univariate and multivariate logistic regression analyses were then performed on the clinical features to identify those that were statistically significant.

Establishment of models and performance comparison

Given the small dataset and high feature dimensionality, eight mainstream machine learning models were selected: AdaBoost Classifier, Logistic Regression, Random Forest, SVM (radial kernel), XGBoost, KNN, Light Gradient Boosting, and Naive Bayes. Each model was trained with the imaging feature set, the clinical feature set, and the combination of the two as inputs. For model validation, we employed 10-fold cross-validation: the data were divided into 10 subsets, with 9 subsets used for training and the remaining subset for testing; this process was repeated 10 times, and the results were averaged to obtain the final model performance.

For performance evaluation, the AUC is defined as the area enclosed by the ROC curve and the horizontal axis and is mainly used to assess accuracy in binary classification problems.
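The 10-fold cross-validation with AUC scoring described above can be sketched as follows. The synthetic data and the two-model subset are stand-ins for illustration; the study uses its 15 selected imaging features plus the clinical indicators and all eight models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the combined radiomics + clinical feature set
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    # AUC averaged over the 10 folds, as in the study
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```

Stratified folds keep the class ratio stable across splits, which matters when severe cases are the minority class.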
Even when the data are imbalanced, the AUC remains stable and evaluates model performance accurately; we therefore used the AUC as the primary evaluation metric.

Interpretability analysis

Machine learning models are black boxes to medical professionals, who cannot follow the model's decision-making process. Interpretability analysis provides a deeper understanding of how the models work and can help identify incorrect decisions, which is particularly important in the medical field. It can also yield valuable insights for medical research: for example, in identifying SCAP, which characteristic indicators matter most for the classification task, and which clinical or imaging changes tend to signal a worsening condition. It is therefore vital to analyze the model's interpretability with appropriate tools.

Statistical analysis

The clinical characteristics of the patients were described using univariate analysis. Indicators that were statistically significant in the training set were entered into a multivariate logistic regression analysis to screen for independent risk factors for SCAP. Statistical significance was defined as a p value < 0.05. AdaBoost, Logistic Regression, Random Forest, SVM, XGBoost, and KNN models were constructed with the "sklearn" package. ROC curves were plotted, and AUC values were calculated to evaluate the models' discrimination. These analyses were performed in Python (version 3.9.6).

Code availability

The core code is openly available at https://github.com/COOk921/Identifying-SCAP-Using-Radiomics.
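The interpretability analysis above does not name a specific tool; permutation importance is one model-agnostic option consistent with it, sketched here on synthetic stand-in data (the real inputs would be the selected radiomics and clinical features):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the selected radiomics + clinical features
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
# Importance of a feature = mean AUC drop when that feature is shuffled
result = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                                n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("features ranked by AUC drop:", ranking)
```

Features whose shuffling barely changes the AUC contribute little to the model's decision; large drops flag the indicators most influential in identifying SCAP.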
