Visualization obesity risk prediction system based on machine learning

Accurate and interpretable risk assessment is crucial for obesity prevention and intervention [21,22]. The obesity prediction system proposed in this study is a health management tool that can help physicians decide whether to intervene and develop personalized intervention plans.

We first assessed the performance of ten machine learning models in predicting obesity from health examination data and established an obesity risk assessment model with good predictive performance. In contrast to similar studies [23,24,25,26], we did not simply classify the population into obese and non-obese groups; we further divided the obese population into class 1 and class 2 obesity. Additionally, the predictive accuracy of our trained machine learning models on the test set was higher than that reported in similar studies. This may be because our best model was selected from a larger pool of machine learning models and because we employed a Monte Carlo cross-validation algorithm during training. The ten models included tree models, deep learning models, and traditional statistical models, and the tree model XGB demonstrated the best predictive performance. This may be because traditional statistical models such as LR are better suited to linear or normally distributed problems, while deep learning models are better suited to image or natural language processing tasks and perform worse on small-sample tabular data [27].

On the basis of the XGB model, we constructed a visual obesity risk prediction system using the SHAP algorithm, making the output of the machine learning model interpretable. In addition to features such as age, gender, lifestyle, and blood routine, we also included body composition data such as total body water, protein content, and basal metabolic rate as variables. According to the SHAP interpretation results (Fig. 5C), bone mineral content was an important predictor of class 2 obesity. This is consistent with the findings of Hwaung et al., which indicate that obesity is characterized not only by excessive adipose tissue but also by changes in characteristics such as protein content in skin and visceral organs [28]. Among the blood biochemistry variables, elevated triglyceride levels increased the risk of class 1 obesity, while elevated glycated hemoglobin and uric acid levels increased the risk of class 2 obesity. Conversely, decreased triglyceride and glycated hemoglobin A1 levels increased the probability of non-obesity (Fig. 5). These findings are in line with the research by Jeon et al., which identified triglycerides, glycated hemoglobin, and uric acid as important features for assessing obesity risk [24]. The importance of triglycerides and glycated hemoglobin may reflect the close relationship between high triglyceride levels and insulin resistance: insulin resistance can lead to fluctuations in blood sugar levels, which further stimulate excessive insulin secretion, and elevated insulin levels can promote fat synthesis and storage [29,30]. Therefore, when predicting obesity risk, it is necessary to consider not only common features such as lifestyle but also body composition and blood biochemical indicators, in order to provide early intervention for individuals at high risk of obesity.

In this study, we constructed an obesity risk prediction system based on the XGB model and SHAP, which is accessible on a webpage. To explain its usage, we present an example in Fig. 6. After information is entered into the left-hand input interface, the system indicates that the BMI of the examined individual does not currently reach the obesity level, but that the risk probability of class 1 obesity is 34.77%.
In the SHAP plot below, the length of each feature bar indicates the strength of its influence on the risk probability, where red represents a positive influence and blue a negative influence. Factors contributing positively to the risk of class 1 obesity include hip circumference, alcohol consumption, and glycated hemoglobin A1, while factors contributing negatively include triglycerides, lymphocyte percentage, red blood cell distribution width, and diet. Therefore, targeted control of factors such as hip circumference, alcohol consumption, and glycated hemoglobin A1 can reduce the risk of class 1 obesity. The system accepts input through individual entries and file reading, providing personalized obesity risk assessment for people undergoing health examinations in a more convenient and user-friendly manner and laying the foundation for the practical application of future prediction systems.

We are aware that our study has some limitations. First, although we randomly divided the dataset into a training set and a test set, all data came from a single source, so the results may still be influenced by that source. Second, the class 2 obese population was relatively small in the dataset; although we applied the SMOTE algorithm for oversampling during model training, the best model's F1-score for class 2 obesity was still lower than for the other two classes, indicating relatively lower predictive ability for class 2 obesity. Lastly, the obesity risk prediction system developed in this study was based on the XGB model. While the XGB model demonstrated the best overall performance, its recall for the non-obese class was lower than that of the LGB and BPNN models, suggesting a potential need to improve the model's predictive performance for the non-obese class through optimization algorithms or other approaches.
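The per-class F1-score and recall discussed in the limitations can be illustrated with a minimal sketch. It assumes the standard one-vs-rest definitions of precision, recall, and F1; the function name `per_class_metrics` and the toy labels are hypothetical and are not data or code from the study.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical toy example (NOT results from the study): an imbalanced
# three-class problem in which class 2 is the smallest class.
LABELS = ["non-obese", "class 1", "class 2"]
y_true = ["non-obese"] * 5 + ["class 1"] * 3 + ["class 2"] * 2
y_pred = [
    "non-obese", "non-obese", "non-obese", "non-obese", "class 1",
    "class 1", "class 1", "non-obese",
    "class 2", "class 1",
]

metrics = per_class_metrics(y_true, y_pred, LABELS)
for label in LABELS:
    precision, recall, f1 = metrics[label]
    print(f"{label:>9}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting these values per class, rather than a single overall accuracy, is what makes a weaker class visible: in the toy data, one missed case out of two already halves the recall for class 2.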
