Explainable machine learning models for early gastric cancer diagnosis

Model evaluation analysis

Our study evaluated several advanced machine learning models to find the best one for early gastric cancer diagnosis. As shown in Fig. 1, the WeightedEnsemble model performed best on most metrics, especially Balanced Accuracy, F1 Score, and Recall, indicating strong potential for clinical use.

Figure 1. Multiple model performance radar chart: Early gastric cancer diagnosis machine learning model comparison.

Balanced Accuracy: This metric is the average of sensitivity (the true positive rate) and specificity (the true negative rate), providing a balanced view of model performance.
F1 Score: This is the harmonic mean of precision (how many of the predicted positives are correct) and recall (how many actual positives are detected), giving a single score that balances both concerns.
Recall: This measures the model's ability to find all actual positive cases in the dataset.
Precision: This measures the proportion of the model's positive predictions that are correct.

The CatBoost and RandomForest models were also strong on specific metrics such as discriminative ability (ROC-AUC) and diagnostic precision (Precision). These results show that choosing and optimizing the right model can greatly improve the accuracy and efficiency of early gastric cancer diagnosis.

In this study, we used lift charts, ROC curves, and Precision-Recall curves to comprehensively evaluate the performance of various machine learning models in early gastric cancer diagnosis. Fig. 2 presents these evaluation graphs, which are analyzed in detail below.

Figure 2. Comprehensive performance evaluation: Lift chart, ROC curve, and Precision-Recall curve comparing various machine learning models in early gastric cancer diagnosis.

Lift Chart: The WeightedEnsemble_L2 model showed a lift of more than 2.4 in the top 5% of data, highlighting its excellent predictive capability on the cases it scores with the highest confidence.
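As a rough illustration of how a lift value like the one above is obtained, the sketch below ranks cases by predicted score and compares the positive rate in the top fraction with the overall positive rate. The function name `lift_at` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
def lift_at(y_true, y_score, fraction):
    """Lift in the top `fraction` of cases ranked by predicted score.

    Lift = (positive rate among the top-ranked cases) / (overall positive rate).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(y_true)
    # Rank case indices from highest to lowest predicted score.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)
    k = max(1, round(n * fraction))  # keep at least one case in the top slice
    top_rate = sum(y_true[i] for i in order[:k]) / k
    base_rate = sum(y_true) / n
    return top_rate / base_rate


# Hypothetical toy data: 20 cases, 5 of them positive, scores already descending.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [1 - i / 20 for i in range(20)]

lift_top_5 = lift_at(labels, scores, 0.05)
lift_top_20 = lift_at(labels, scores, 0.20)
print(lift_top_5, lift_top_20)
```

A lift of 1 means the top slice contains positives at the same rate as the whole dataset; values above 1 mean the model concentrates positives among its highest-scored cases.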
Moreover, models such as CatBoost, RandomForest, ExtraTreeGini, LightGBM, XGBoost, and NeuralNet also showed elevated lift in the top 20% of data, emphasizing the importance of considering early predictive performance in high-risk decision-making.

ROC Curve: The ROC curve assesses the overall performance of a model by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) across decision thresholds. In this study, the WeightedEnsemble_L2 model had the highest Area Under Curve (AUC) at 0.94, showing excellent classification capability. The CatBoost, RandomForest, and LightGBM models also exhibited high diagnostic accuracy, with AUC values between 0.92 and 0.93. High AUC values indicate that the models achieve high true positive rates while maintaining low false positive rates, which is crucial for diagnostic reliability.

Precision-Recall Curve: This curve is a powerful tool for assessing model performance on imbalanced datasets. In this assessment, the WeightedEnsemble and CatBoost models demonstrated excellent balance, with areas under the Precision-Recall curve of 0.92 and 0.91, respectively. These models maintain high precision while ensuring substantial recall, providing reliable decision support for early diagnosis.

In summary, through detailed multi-dimensional assessments, we confirmed that models such as WeightedEnsemble, CatBoost, and RandomForest have high potential in early gastric cancer diagnosis. These models not only perform effectively in data-limited scenarios for disease screening but also satisfy clinical needs for high precision and recall. Additionally, the high AUC values of the ROC curves further support these models' advantages in diagnostic reliability, providing a solid scientific basis for future clinical applications.

Key feature discovery

Figure 3 illustrates the importance of various features in the WeightedEnsemble_L2 model for diagnosing gastric cancer, using bars for importance and error bars for standard deviations.
The results highlight that the “Gastric Disease” feature far exceeds the other variables in importance and has a smaller standard deviation, indicating high stability in the model. This finding underscores the predictive value of gastric disease symptoms in early gastric cancer diagnosis and points to a direct association with the onset of the disease.

Figure 3. WeightedEnsemble_L2 model feature importance assessment.

“Night Sweats” has lower importance and a lower standard deviation than “Gastric Disease,” but it still ranks high among all features. As a manifestation of the systemic inflammatory response, its significance in the model suggests potential links with metabolic and immune changes associated with gastric cancer.

Blood markers such as HGB, NEUT%, and CRP are identified as important features in the model. Their significance reflects their role in describing the inflammatory state and immune response of patients, which are closely related to the development of gastric cancer. In particular, CRP, as an acute-phase protein, is widely recognized for its value in predicting inflammatory diseases and malignancies.

Tumor markers CA72-4 and CA199 show moderate importance. They are commonly used for monitoring and prognostic assessment of gastric cancer, which is consistent with their diagnostic value in the model. Additionally, blood parameters such as XWBC (peripheral white blood cell count) and PH (blood pH level) show lower importance but still reflect a potential contribution to gastric cancer diagnosis.

By analyzing the importance of these features, this study not only reveals key biomarkers and clinical features in early gastric cancer diagnosis but also highlights the potential of integrated models in enhancing diagnostic accuracy and explainability.
These findings are likely to promote the acceptance of machine learning in clinical applications, providing support for optimizing early diagnosis and treatment strategies.

This study employed four methods to comprehensively assess the importance of variables related to gastric cancer diagnosis: LGBM importance, XGB importance, Permutation importance, and RFECV ranking. As illustrated in Fig. 4, these methods provided insights into the most influential factors for accurate diagnosis.

Figure 4. Multi-method comprehensive assessment of variable importance.

LGBM and XGB Importance: These methods calculate the importance of variables based on their contribution in gradient boosting decision trees. Notably, “HGB” demonstrated high importance in both methods, emphasizing its predictive value in gastric cancer diagnosis. “Night Sweats” and “Gastric Disease” also scored highly in both methods, underlining their central role in diagnosis.

Permutation Importance: This analysis evaluates the importance of each feature by randomly permuting its values and observing the change in model performance. Features such as “XWBA,” “ca72,” “HGB,” “PLT,” “GGT,” “GA,” and “A/G” showed high importance scores, indicating their significant impact on the model's predictive accuracy.

RFECV Ranking: By progressively eliminating the least important features and using cross-validation to assess the remaining feature set, this method identified “APTT,” “AST,” “CA199,” and “CEA” as highly ranked features, indicating their indispensability in model construction.

This multi-method assessment provides a comprehensive view of variable importance, supporting the robustness and reliability of the analysis. By comparing results across different methods, we more accurately identified the biomarkers and clinical features crucial for early gastric cancer diagnosis.
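The permutation importance procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `toy_model`, the two-feature dataset, and the use of accuracy as the performance measure are all hypothetical, and the study's own implementation and metric are not specified here.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean and standard deviation of the accuracy drop per shuffled feature."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    results = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        mean = sum(drops) / n_repeats
        std = (sum((d - mean) ** 2 for d in drops) / n_repeats) ** 0.5
        results.append((mean, std))
    return results


def toy_model(row):
    """Hypothetical classifier: uses feature 0 only and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


# Hypothetical data: feature 0 determines the label, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
     [0.6, 3.0], [0.7, 5.0], [0.8, 1.0], [0.9, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

importances = permutation_importance(toy_model, X, y)
for j, (mean, std) in enumerate(importances):
    print(f"feature {j}: mean drop = {mean:.3f}, std = {std:.3f}")
```

A feature the model never uses shows an accuracy drop of exactly zero, while shuffling a feature the model depends on degrades performance; the standard deviation over repeats gives a sense of how stable the estimate is.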
These integrated findings can help optimize diagnostic models, enhance prediction accuracy, and guide more effective clinical decision-making, promoting the development of personalized medicine.

Our study employed multi-model SHAP value analysis to explore feature importance across various machine learning models in diagnosing gastric cancer. By comparing models such as CatBoost, NeuralNet, Extra Trees, Random Forest, LightGBM, and XGBoost, we aimed to uncover how different models rely on key features and how those features affect prediction outputs. This method enhanced model transparency, aiding clinical decision-making. As depicted in Fig. 5, the comparative analysis highlights the variability and significance of feature contributions across models.

Figure 5. Multi-model SHAP value analysis: comparative influence of features in gastric cancer diagnosis.

Commonality analysis

All models consistently identified “Gastric Disease,” “Night Sweats,” “HGB,” and “RBP” as having significant impacts on model prediction outputs. This finding underscores the critical role these clinical features play in the early diagnosis of gastric cancer, indicating that their predictive value is substantial regardless of the algorithm used.

Differential analysis

While most models showed high consistency in key features, there were notable differences in sensitivity to certain features. For instance, features such as “GA,” “GLU,” “Age,” “Radiating Pain,” and “UA” had higher impacts in some models than in others. Additionally, the Neural Network and LightGBM models demonstrated greater variability in their SHAP value distributions, possibly reflecting their more complex or adaptive handling of features.

Information content and value

The SHAP value analysis not only enhanced the transparency of the model decision-making processes but also provided a crucial basis for model selection and optimization.
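SHAP values are based on Shapley values from cooperative game theory, and the idea can be illustrated without any library. The sketch below computes exact Shapley values for one instance by enumerating feature subsets, with absent features replaced by a single baseline value. This is a deliberate simplification: the function `shapley_values`, the toy `risk_score` model, and the zero baseline are assumptions for illustration and do not reproduce how the study's SHAP values were computed.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for instance `x` relative to a single `baseline`."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the instance's values; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def risk_score(z):
    """Hypothetical model: one additive term plus an interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(risk_score, x, baseline)
print(phi)
```

In this toy case the additive feature receives its full effect (about 2.0), the two interacting features split their joint effect equally (about 0.5 each), and the contributions sum to the difference between the prediction for `x` and the prediction for the baseline. Exhaustive enumeration grows exponentially with the number of features, which is why practical tools rely on model-specific or approximate algorithms.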
By comparing feature importance across different models, we gained deeper insights into each model's performance and limitations in diagnosing gastric cancer. This in-depth feature importance analysis supports clinical decisions to prioritize specific biomarkers and clinical features, helping to optimize diagnostic workflows and improve precision and efficiency.

In summary, multi-model SHAP value analysis highlighted the consistency and differences among algorithms in handling the same clinical data, offering scientific and practical guidance for the application of machine learning models in gastric cancer diagnosis. This method not only deepened our understanding of model predictive behavior but also, by showing the combined impact of features, bolstered confidence in the models' reliability for practical medical applications.

Model explanation examples

In this research, we compared the impact of features in gastric cancer diagnosis using both a logistic regression model and SHAP analysis of complex machine learning models. The logistic regression model provides a direct, intuitive explanation of the influence of features, whereas SHAP analysis reveals non-linear effects and feature interactions, offering deeper insight into the data. This comparison not only highlights the statistical significance of features but also informs model selection and feature explanation in clinical settings.

Logistic regression model analysis

Figure 6 illustrates how biomarkers and clinical features influence the log odds of a gastric cancer diagnosis and thus the predicted risk of gastric cancer. Each subplot details how changes in a feature's value affect the model output, revealing the trend in gastric cancer risk associated with that feature.
Key findings include:

Positive Impact Features: “PL-CR,” “RDW-CV,” “PLT,” “AST,” “A/G,” “ca72-4,” “CEA,” and others, where an upward slope indicates that an increase in the indicator is associated with an increased risk of gastric cancer.
Negative Impact Features: “EOM,” “ADA,” “GLU,” “APTT,” “CRP,” “RBP,” “TBIL,” and others, where a downward slope indicates that higher levels of the marker are associated with a reduced predicted risk of gastric cancer.

Figure 6. Logistic regression model feature impact analysis.

SHAP analysis (complex models)

Figure 7 uses SHAP values from complex machine learning models to demonstrate the impact of different features on the prediction outputs of the gastric cancer diagnosis model. This analysis helps assess each feature's contribution and deepens the understanding of the decision-making process. Each subplot shows the distribution of SHAP values for a feature, where red indicates increased risk and blue indicates decreased risk.

Figure 7. Complex machine learning model SHAP value analysis.

Major findings include:

Gastric Disease and Night Sweats are clinical features that have a significant positive impact across multiple models, markedly increasing the predicted risk of gastric cancer and highlighting their importance in gastric cancer diagnosis.

Gender and Age, as demographic variables, exhibit different impact patterns across models. Gender shows significant positive impacts in some models, while age displays a broad distribution from positive to negative, with the age range 65 to 75 standing out as a high-risk period for gastric cancer.

“HGB” (Hemoglobin) and “RBP” (Retinol Binding Protein) are biochemical markers whose SHAP values indicate complex impacts.
HGB typically shows a positive impact, whereas RBP's impacts are more varied, reflecting its diverse role in assessing gastric cancer risk.

Through SHAP value analysis, we gain a deeper understanding of how machine learning models handle clinical and biochemical features. These insights provide a scientific basis for optimizing early diagnosis and treatment strategies for gastric cancer, helping to improve patient survival rates and quality of life.

Comparative analysis of scientific and clinical significance

In comparing these two types of models, we find that the logistic regression model, owing to its simplicity and high interpretability, is particularly suited to clinical applications, especially in scenarios requiring quick identification of key risk factors and rapid decision-making. The directness and transparency of logistic regression make it a powerful tool for assessing and interpreting risk factors.

Conversely, SHAP analysis, by providing deep insights into the decision-making processes of complex machine learning models, is particularly suitable for studying complex disease biomarkers and discovering potential new therapeutic targets. Its granular interpretability facilitates the development of personalized medicine, helping physicians tailor more precise treatment plans for each patient.

Overall, the combination of logistic regression and SHAP analysis provides a comprehensive set of tools for understanding, predicting, and treating gastric cancer. By increasing model transparency and interpretability, it helps optimize diagnostic strategies and treatment outcomes and lays a solid scientific foundation for precise diagnosis and personalized treatment of gastric cancer.
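To make the log-odds reading of a logistic regression model concrete, the following sketch shows how a positive or negative coefficient shifts the log odds and the predicted probability. The feature names `marker_a` and `marker_b`, the intercept, and the coefficients are hypothetical values chosen for illustration; they are not fitted values from this study.

```python
import math


def log_odds(intercept, coefs, x):
    """Linear predictor of a logistic regression: intercept + sum(coef * value)."""
    return intercept + sum(coefs[name] * x[name] for name in coefs)


def probability(z):
    """Logistic (sigmoid) function mapping log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters, NOT estimates from the study.
intercept = -1.0
coefs = {"marker_a": 0.8, "marker_b": -0.5}

patient = {"marker_a": 1.0, "marker_b": 2.0}
z = log_odds(intercept, coefs, patient)
p = probability(z)

# Raising marker_a by one unit shifts the log odds by exactly its coefficient,
# which multiplies the odds by exp(coefficient).
raised = dict(patient, marker_a=patient["marker_a"] + 1.0)
delta = log_odds(intercept, coefs, raised) - z
odds_ratio = math.exp(coefs["marker_a"])

print(f"log odds = {z:.2f}, probability = {p:.3f}")
print(f"change in log odds = {delta:.2f}, odds ratio = {odds_ratio:.3f}")
```

A positive coefficient corresponds to an upward slope in a log-odds plot and a negative coefficient to a downward slope, and the effect of each feature is the same regardless of the values of the other features. That additivity is what makes the model easy to read, and it is also why it cannot express the interactions that SHAP analysis of more complex models can reveal.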
