Explainable artificial intelligence (XAI) to find optimal in-silico biomarkers for cardiac drug toxicity evaluation

This study focuses on substructural biomarkers and uses Shapley Additive exPlanations (SHAP) values to determine the influence of each biomarker on the prediction of drug-related risks such as Torsades de Pointes (TdP). Using feature importance analysis, the method systematically assesses how each biomarker contributes to the overall predictive model. This approach not only enhances the accuracy of toxicity prediction but also clarifies the role of specific electrophysiological features and deepens our understanding of the biological mechanisms behind drug-induced effects.

In this research, ANN, XGBoost, RF, SVM, KNN, and RBF models are used to identify in-silico biomarkers that significantly affect drug toxicity evaluation. Twelve biomarkers were selected as input features: action potential (AP) morphology features, namely \(\frac{dVm}{dt}_{repol}\), \(\frac{dVm}{dt}_{max}\), \(V{m}_{resting}\), \(AP{D}_{90}\), \(AP{D}_{50}\), and \(AP{D}_{tri}\); calcium transient morphology features, namely \(Ca{D}_{90}\), \(Ca{D}_{50}\), \(C{a}_{tri}\), and \(C{a}_{Diastole}\); and features related to charge movement, namely qNet and qInward.

Using the grid search (GS) method, we determined the optimal model and then applied the SHAP approach to identify the biomarkers that contribute most to drug risk classification. SHAP is based on cooperative (coalition) game theory: the SHAP value of an attribute is its marginal effect on the prediction, averaged over all possible combinations of the other inputs. Unlike the heuristic approaches used in previous studies, SHAP therefore accounts for the model’s predicted probabilities across all input combinations. Because a SHAP value can be positive (pushing the prediction toward a class) or negative (pushing it away), both signs indicate influence of equal standing; we therefore averaged the absolute SHAP values to identify the biomarkers with the largest contributions.

The SHAP feature importance ranking of the ANN model (Fig. 5a) highlights qInward, \(AP{D}_{50}\), and \(AP{D}_{90}\) as major influences on TdP risk assessment, in line with the findings of Yoo et al., which emphasize the complexity of action potentials and calcium transients in proarrhythmic potential15. This agreement supports the validity of our model and shows that it captures the electrophysiological dynamics that are critical for accurate drug toxicity prediction. Specifically, the prominence of qInward reflects the role of inward ion currents, especially calcium and sodium currents, in initiating and maintaining arrhythmogenic activity, consistent with the emphasis on charge movement in drug-induced TdP risk in Dutta et al. and Li et al.10,11. Furthermore, the substantial importance of the duration markers \(AP{D}_{50}\) and \(AP{D}_{90}\) agrees with traditional electrophysiological theory and supports the CiPA initiative’s shift toward more comprehensive in silico assessment strategies.

This analysis highlights the interaction between electrophysiological parameters and indicates a more complex relationship with TdP risk than previously assumed, when the focus was often solely on AP prolongation or on changes in individual ionic currents. Our ANN model, trained on a comprehensive set of biomarkers, both reinforces existing electrophysiological insights and reveals the complexity of proarrhythmic potential in drug-induced cardiotoxicity. The SHAP feature importance analysis thus supports the robustness of the ANN approach for TdP risk assessment. ANNs tend to capture nonlinear interactions between features: their multilayer architecture, with many neurons per layer, can model these relationships in detail and detect very specific patterns in the data.

The SHAP feature importance results for our XGBoost model (Fig. 5b) provide further insight into the electrophysiological determinants of TdP risk. The prominence of qInward and \(\frac{dVm}{dt}_{repol}\) confirms the importance of inward ion flux and repolarization dynamics in modulating cardiac risk, in line with Yoo et al. and Dutta et al.11,15, who describe proarrhythmic potential as arising from a complex balance of ion currents. The model’s emphasis on \(AP{D}_{90}\) together with \(\frac{dVm}{dt}_{max}\) points to broad electrophysiological interactions affecting arrhythmogenicity and to the need for a holistic approach to cardiac safety evaluation, as also advocated by the CiPA in silico modeling framework.

The XGBoost model’s prioritization of qNet underscores the critical role of channel block in cardiac toxicity, consistent with Li et al.55, in whose work qNet is an integral part of TdP risk prediction. This agreement underlines the relevance of these biomarkers to the multifactorial nature of drug-induced arrhythmia. The presence of \(V{m}_{resting}\) in the ranking, albeit with a lower importance score, suggests a subtle influence of the resting membrane potential on cardiac electrophysiology that warrants further investigation.

Overall, the SHAP feature importance ranking for the XGBoost model aligns with established electrophysiological paradigms and shows in detail how these biomarkers interact within the predictive model, improving current understanding of TdP risk. Integrating comprehensive biomarker analysis into a machine learning framework is a step toward more refined drug safety assessment, with XGBoost well suited to dissecting complex biomarker interactions in predicting proarrhythmic risk.
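To make the ranking procedure concrete, the following is a minimal sketch of exact Shapley values for a model with only a few inputs, together with the mean absolute SHAP score used for the global ranking. It assumes a baseline-replacement value function (features outside a coalition are held at a reference value) and a made-up three-biomarker risk score; `toy_risk`, the sample values, and the zero baseline are hypothetical and are not the study’s models or data. Libraries such as `shap` approximate the same quantity for larger models, because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature for a single input x.

    The value of a coalition S is the model output when features in S take
    their values from x and all other features are held at `baseline`.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_ranking(model, X, baseline, names):
    """Global importance: mean |SHAP| per feature, sorted from high to low."""
    all_phi = [shapley_values(model, x, baseline) for x in X]
    scores = {
        name: sum(abs(phi[k]) for phi in all_phi) / len(all_phi)
        for k, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy risk score with an interaction between qInward and APD90.
NAMES = ["qInward", "APD90", "qNet"]


def toy_risk(z):
    q_inward, apd90, q_net = z
    return 0.8 * q_inward + 0.5 * apd90 - 0.3 * q_net + 0.4 * q_inward * apd90


X = [[1.0, 0.5, 0.2], [0.2, 1.0, -0.5], [-0.6, -0.4, 0.9]]
BASELINE = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    for name, score in mean_abs_ranking(toy_risk, X, BASELINE, NAMES):
        print(f"{name:8s} {score:.3f}")
```

The per-sample values returned by `shapley_values` are what a waterfall plot displays for one prediction; they sum to the model output minus the baseline output, and the interaction term is split equally between the two features involved.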
Beyond pointing toward the use of sophisticated analytical models for the complex landscape of cardiac arrhythmogenicity, the XGBoost SHAP results show that the model gives higher priority to features with greater influence on the target, reflecting XGBoost’s effectiveness at feature selection and at handling nonlinear data through gradient boosting.

The SHAP feature importance results for the RF model (Fig. 5c), by contrast, identify qInward and qNet as the most influential factors in determining TdP risk, consistent with the critical role of these features in drug-induced arrhythmogenicity discussed by Yoo et al. and Li et al.10,15,55. The importance of qInward underscores the fundamental role of inward ion currents, particularly through calcium and sodium channels, in shaping the cardiac action potential and inducing arrhythmias. This is consistent with the CiPA initiative’s emphasis on the multifactorial nature of TdP risk, beyond the traditional focus on hERG channel block.

In addition, the importance of \(AP{D}_{90}\) and \(C{a}_{Diastole}\) in the RF model highlights the interaction between action potential duration and calcium homeostasis, reflecting the insights of Dutta et al. into how regulated ion channel activity affects cardiac repolarization11. Small variations in these parameters can substantially alter a drug’s proarrhythmic potential, reinforcing the need to integrate a broad spectrum of biomarkers for accurate, holistic risk assessment.

The RF model uses an ensemble approach, combining the predictions of many decision trees to improve accuracy and reliability.
Because each tree considers only a random subset of features at each split, RF tends to spread weight across features rather than concentrating it in a few, and its ensemble structure provides a natural basis for estimating feature importance.

The SHAP feature importance results for the SVM model (Fig. 5d) show a relatively even spread of importance across biomarkers for assessing TdP risk. The importance of \(Ca{D}_{50}\) and qInward reaffirms the roles of calcium dynamics and inward ion flow, respectively, in mediating arrhythmic potential, reflecting the physiological basis emphasized by the CiPA initiative. Notably, the SVM model’s emphasis on \(AP{D}_{90}\) and qNet aligns with the electrophysiological criteria discussed by Dutta et al. and Li et al.11,55, in which these parameters capture the proarrhythmic tendencies of drugs through action potential profiles and ionic charge characteristics.

The lower ranking of \(\frac{dVm}{dt}_{max}\) and \(AP{D}_{50}\) challenges conventional assumptions about their roles and suggests that the TdP risk of a drug may not correspond directly to typical markers of excitability or repolarization duration. This points to more complex interactions in arrhythmogenesis and calls for a broader perspective in risk stratification.

Furthermore, the SVM analysis shows how multiple biomarkers collectively inform proarrhythmic risk, supporting a multifaceted approach to cardiac safety evaluation, as also advocated in the ANN-based work of Yoo et al.15.
Considering such diverse electrophysiological features, beyond standard APD measurements, may therefore improve the precision of TdP risk assessment.

In short, the SHAP feature importance results for the SVM model add to the evidence that drug-induced TdP risk depends on a balance of ionic and electrophysiological parameters. Mechanistically, SVM seeks an optimal hyperplane that separates the classes in feature space and therefore focuses on the features that define the decision margin. Kernels such as the RBF kernel allow SVM to separate classes that are not linearly separable by implicitly mapping the inputs into a higher-dimensional space.

The SHAP feature importance analysis for our KNN model (Fig. 5e) identifies \(Ca{D}_{90}\), \(AP{D}_{90}\), \(AP{D}_{50}\), \(Ca{D}_{50}\), and other electrophysiological parameters as critical for assessing drug-induced TdP risk. The importance of calcium transient durations (\(Ca{D}_{90}\) and \(Ca{D}_{50}\)) together with action potential durations (\(AP{D}_{90}\) and \(AP{D}_{50}\)) agrees with Dutta et al. and Yoo et al.11,15 and underscores the relevance of these parameters for cardiac toxicity prediction. It also reaffirms the role of the temporal dynamics of calcium handling and the action potential in arrhythmic vulnerability, as proposed by the CiPA initiative. The KNN model’s emphasis on these biomarkers reflects the multifactorial nature of TdP risk, which involves not only ionic current changes but also the broader electrophysiological landscape.

Furthermore, the SHAP analysis reveals contributions from less emphasized features such as \(\frac{dVm}{dt}_{max}\), qNet, and \(\frac{dVm}{dt}_{repol}\), suggesting that proarrhythmic risk is better understood through integrative assessment than through any single parameter.
This perspective is consistent with real-world cardiac electrophysiology, in which multiple interdependent factors jointly influence arrhythmia risk. By capturing interactions among electrophysiological biomarkers, the KNN model can reveal subtle aspects of drug-induced arrhythmia risk, in line with the broader goal of improving the accuracy and reliability of preclinical cardiac safety assessment. The analysis both reinforces the electrophysiological foundations identified in previous research and shows the KNN model’s capacity to integrate and prioritize a wide range of biomarkers. KNN predicts the class of a sample from its k nearest neighbors in feature space, so the features determine how proximity (distance) is measured. The model is intuitive and non-parametric, relying on the local structure of the data.

Figure 5f shows the feature importance ranking of the RBF model, ordering features by their impact on the model’s output. Here, \(\frac{dVm}{dt}_{max}\) and \(\frac{dVm}{dt}_{repol}\) are the top two features, followed by \(C{a}_{tri}\), \(AP{D}_{tri}\), \(AP{D}_{90}\), and others. In the context of drug-induced proarrhythmia evaluation, as discussed by Yoo et al.15, these features (especially \(\frac{dVm}{dt}_{max}\) and \(AP{D}_{90}\)) are critical for characterizing the action potential morphology and calcium transients of cardiomyocytes and hence for understanding drug-induced arrhythmias.
The emphasis on \(\frac{dVm}{dt}_{max}\) (the maximum rate of voltage change during depolarization) and \(AP{D}_{90}\) (the action potential duration at 90% repolarization) is consistent with their physiological importance in cardiac electrophysiology and their recurrence as significant predictors in arrhythmogenic risk assessment. Both parameters govern the formation and propagation of cardiac action potentials, and changes in them are indicative of proarrhythmic risk. This analysis supports the use of machine learning models to identify key physiological indicators of drug-induced arrhythmias and improves understanding of the underlying mechanisms of these adverse effects. The presence of calcium-related features such as \(Ca{D}_{50}\) and \(Ca{D}_{90}\) in the ranking further reinforces the importance of calcium dynamics in cardiac function and of their disruption in drug-induced proarrhythmia.

Overall, this study shows that qInward strongly influences the prediction of every risk level (high, intermediate, and low) in the ANN, XGBoost, RF, and SVM models. This agrees with Li et al.10, who identified qInward as an effective biomarker for differentiating levels of drug-induced TdP risk. However, work from the Food and Drug Administration (FDA) suggests that qNet is the more important feature for distinguishing drug risk levels, specifically when a dynamic hERG model is used55.

Unlike those studies, we used a conventional hERG model without dynamic parameter modifications. The primary goal of this choice was to preserve the integrity of the experimental data before applying Markov chain Monte Carlo (MCMC) methods.
We kept the experimental conditions unchanged before the MCMC process so that the results could be compared directly with existing empirical data. This approach aims to give a clearer, more measurable account of each biomarker’s contribution to drug risk classification without altering the underlying experimental conditions of the study.

In the other classification models (SVM, KNN, and RBF), different features emerged as important. In the SVM and KNN models, biomarkers describing calcium transient morphology (such as \(Ca{D}_{50}\), \(Ca{D}_{90}\), and \(C{a}_{tri}\)) play a crucial role in classifying drug risk levels, although the SVM model still retains qInward as an important feature. In the KNN model, features describing both action potential and calcium transient durations are key to predicting the TdP risk level. In contrast, the most critical features in the RBF model are \(\frac{dVm}{dt}_{max}\) and \(\frac{dVm}{dt}_{repol}\). This diversity in feature importance across models illustrates the difficulty of accurately predicting drug-induced TdP risk.

SHAP waterfall plots show how each feature contributes to an individual prediction at each risk level (high, intermediate, and low). To interpret them, we examine the characteristics of the key features within each machine learning model (ANN, XGBoost, RF, SVM, KNN, and RBF); these plots are essential for understanding how features drive drug toxicity risk predictions in each model.

For high-risk prediction with the ANN model, \(AP{D}_{50}\) and \(AP{D}_{90}\) carry substantial weight. \(AP{D}_{50}\) is the action potential duration at 50% repolarization, and \(AP{D}_{90}\) is the duration at 90% repolarization.
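The duration definitions above can be made concrete with a short sketch. It assumes a common convention that may differ in detail from the study’s pipeline: APD at x% is measured from the time of maximum upstroke velocity to the first time after the peak at which Vm falls below the x% repolarization level, and triangulation is \(AP{D}_{90}\) − \(AP{D}_{50}\). The piecewise-linear trace is synthetic (`synthetic_ap` is hypothetical); applying the same function to a calcium transient would, under the same assumption, give \(Ca{D}_{50}\) and \(Ca{D}_{90}\).

```python
def synthetic_ap(dt=1.0, t_end=400.0):
    """Hypothetical piecewise-linear action potential (ms, mV).

    Rest at -85 mV, upstroke from 10 to 12 ms up to +40 mV, then linear
    repolarization back to rest at 300 ms.
    """
    t = [i * dt for i in range(int(t_end / dt) + 1)]
    v = []
    for ti in t:
        if ti < 10:
            v.append(-85.0)
        elif ti < 12:
            v.append(-85.0 + 125.0 * (ti - 10) / 2)
        elif ti < 300:
            v.append(40.0 - 125.0 * (ti - 12) / 288)
        else:
            v.append(-85.0)
    return t, v


def apd(t, v, percent):
    """Duration from maximum upstroke velocity to `percent` % repolarization."""
    # Upstroke time: start of the steepest rising segment between samples.
    slopes = [(v[i + 1] - v[i]) / (t[i + 1] - t[i]) for i in range(len(v) - 1)]
    i_up = max(range(len(slopes)), key=slopes.__getitem__)
    i_peak = max(range(len(v)), key=v.__getitem__)
    v_rest = v[0]  # assumes the trace starts at rest
    level = v[i_peak] - percent / 100.0 * (v[i_peak] - v_rest)
    for i in range(i_peak, len(v) - 1):
        if v[i] >= level > v[i + 1]:
            # Linear interpolation of the crossing time between two samples.
            frac = (v[i] - level) / (v[i] - v[i + 1])
            return t[i] + frac * (t[i + 1] - t[i]) - t[i_up]
    raise ValueError("trace does not repolarize to the requested level")


if __name__ == "__main__":
    t, v = synthetic_ap()
    apd50, apd90 = apd(t, v, 50), apd(t, v, 90)
    print(f"APD50={apd50:.1f} ms  APD90={apd90:.1f} ms  APDtri={apd90 - apd50:.1f} ms")
```

Interpolating between samples keeps the estimate from being quantized to the sampling step, which matters when drug effects shift repolarization by only a few milliseconds.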
The emphasis on \(AP{D}_{50}\) and \(AP{D}_{90}\) suggests that the ANN model treats repolarization duration as a key indicator of high toxicity risk: these parameters describe how a drug affects the ability of a cardiac cell to recover after depolarization, and longer durations indicate a higher potential for arrhythmia.

For intermediate risk, the ANN model shifts its focus to dynamic features such as \(\frac{dVm}{dt}_{max}\), the maximum rate of voltage change during depolarization, and calcium-related features such as \(C{a}_{tri}\), the triangulation of the calcium transient (the difference between \(Ca{D}_{90}\) and \(Ca{D}_{50}\)). The importance of these features indicates that voltage dynamics and calcium handling become more pivotal in assessing intermediate toxicity risk, and that the ANN prioritizes the dynamic, responsive aspects of the cell under these conditions.

For low risk, the ANN model places greater emphasis on calcium dynamics, particularly \(Ca{D}_{50}\) and \(C{a}_{Diastole}\). \(Ca{D}_{50}\) is the calcium transient duration at 50% decay, and \(C{a}_{Diastole}\) is the calcium concentration during diastole. The focus on these features suggests that the ANN model regards changes in calcium homeostasis as a critical factor in identifying low toxicity potential.
The model’s adaptation to the subtler electrophysiological changes under low-risk conditions highlights its sensitivity to small shifts in calcium parameters, which may indicate a greater safety margin against toxic effects.

For high-risk classification, XGBoost places strong emphasis on qInward and qNet, indicating high sensitivity to changes in inward and net ionic charge, which can be key indicators of high toxicity. qInward, which quantifies the change in charge carried by the inward late sodium and L-type calcium currents relative to drug-free control, is central to assessing high-risk potential; XGBoost associates large changes in this parameter with increased toxicity.

For intermediate-risk predictions, XGBoost focuses on calcium dynamics, with \(Ca{D}_{90}\) and \(C{a}_{tri}\) becoming more prominent. \(Ca{D}_{90}\), the calcium transient duration at 90% decay, and \(C{a}_{tri}\), the calcium transient triangulation, are decisive for intermediate-risk toxicity. This indicates that XGBoost is sensitive to how changes in cellular calcium handling reflect a shift in risk level from high to intermediate, emphasizing the importance of calcium signaling in toxicity assessment.

For low-risk classification, the model emphasizes action potential duration and calcium dynamics, mainly through \(AP{D}_{90}\) and \(C{a}_{Diastole}\). \(AP{D}_{90}\) is the time required for 90% repolarization of the action potential, and \(C{a}_{Diastole}\) is the diastolic calcium concentration. These features become more relevant in identifying low toxicity risk, suggesting that XGBoost treats subtle changes in electrophysiological and calcium dynamics as indicators of lower toxicity.

The RF model takes a balanced approach to high risk, giving substantial weight to qInward and \(AP{D}_{90}\).
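Because qNet and qInward are easily blurred with “total net current”, here is a minimal sketch of how they are computed, following our reading of Dutta et al. and Li et al.: qNet is the net charge carried by INaL, ICaL, IKr, IKs, IK1, and Ito over one beat, and qInward averages the drug-to-control ratios of the charge carried by INaL and ICaL. The constant current traces (`constant_currents`, `CONTROL`, `DRUG`) are hypothetical placeholders rather than cell-model output, and charge units simply follow the units of the inputs.

```python
def trapezoid(t, y):
    """Trapezoidal-rule integral of samples y over times t."""
    return sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(t) - 1))


# Currents whose charge enters qNet (assumed to follow Dutta et al.).
QNET_CURRENTS = ("INaL", "ICaL", "IKr", "IKs", "IK1", "Ito")


def q_net(t, currents):
    """Net charge carried by the six currents over one beat."""
    return sum(trapezoid(t, currents[name]) for name in QNET_CURRENTS)


def q_inward(t, drug, control):
    """Average drug/control ratio of the charge carried by INaL and ICaL."""
    ratio_nal = trapezoid(t, drug["INaL"]) / trapezoid(t, control["INaL"])
    ratio_cal = trapezoid(t, drug["ICaL"]) / trapezoid(t, control["ICaL"])
    return 0.5 * (ratio_nal + ratio_cal)


def constant_currents(t, **levels):
    """Hypothetical placeholder traces: each current held at a constant level."""
    return {name: [level] * len(t) for name, level in levels.items()}


T = [float(ms) for ms in range(11)]  # 0..10 ms, 1 ms step
CONTROL = constant_currents(T, INaL=-0.2, ICaL=-1.0, IKr=0.5, IKs=0.1, IK1=0.3, Ito=0.2)
# Hypothetical drug effect: stronger INaL, weaker ICaL, halved IKr.
DRUG = constant_currents(T, INaL=-0.3, ICaL=-0.8, IKr=0.25, IKs=0.1, IK1=0.3, Ito=0.2)

if __name__ == "__main__":
    print("qNet control:", q_net(T, CONTROL))
    print("qNet drug:   ", q_net(T, DRUG))
    print("qInward:     ", q_inward(T, DRUG, CONTROL))
```

Inward currents are negative by convention here, so a drug that strengthens inward current or blocks outward current lowers qNet, while qInward is a dimensionless ratio equal to 1 when the drug leaves INaL and ICaL unchanged.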
qInward, which reflects inward ionic charge, and \(AP{D}_{90}\), the action potential duration at 90% repolarization, are crucial for identifying high-risk drugs. This indicates that RF recognizes both ionic currents and action potential duration as key indicators of high toxicity potential and integrates diverse electrophysiological inputs in its assessment.

For intermediate risk, RF focuses on features associated with repolarization and calcium dynamics, such as \(\frac{dVm}{dt}_{repol}\) and \(Ca{D}_{50}\). \(\frac{dVm}{dt}_{repol}\) measures the rate of voltage change during repolarization, and \(Ca{D}_{50}\) is the calcium transient duration at 50% decay. The model thus prioritizes how cells return to electrical stability and how calcium is regulated, which is crucial for assessing moderate toxicity.

For low-risk predictions, RF takes a holistic approach, drawing on features that cover different aspects of cellular function, such as qInward, \(Ca{D}_{50}\), \(AP{D}_{90}\), and others. RF thus combines multiple sources of electrophysiological information to form balanced predictions of low-risk potential.

The SVM model explicitly emphasizes qInward and \(C{a}_{Diastole}\) when assessing high risk, indicating that it treats changes in inward ionic charge (qInward) and diastolic calcium levels (\(C{a}_{Diastole}\)) as essential indicators of high-risk potential.
By prioritizing these features, SVM links them to significant toxicity events, reflecting its sensitivity to changes in ionic and calcium dynamics that can lead to severe cardiotoxic effects.

For intermediate-risk predictions, SVM shifts its focus to \(Ca{D}_{50}\) and qNet. \(Ca{D}_{50}\), the calcium transient duration at 50% decay, becomes important in this context, indicating that SVM considers how calcium transient duration contributes to moderate toxicity. qNet, the net charge carried by the major ionic currents during a beat, emerges as another key feature, showing that SVM integrates ionic current information into its risk assessment.

For the low-risk category, SVM takes a subtler approach, focusing on features associated with action potential dynamics.

For the KNN model, high-risk predictions are dominated by \(Ca{D}_{90}\) and \(AP{D}_{90}\). \(Ca{D}_{90}\) is the calcium transient duration at 90% decay, and \(AP{D}_{90}\) is the action potential duration at 90% repolarization. The emphasis on these features shows that KNN treats prolonged calcium activity and prolonged action potentials as key indicators of high toxicity risk, reflecting its sensitivity to potentially dangerous cardiac dynamics.

For intermediate risk, KNN weighs \(\frac{dVm}{dt}_{max}\) and qInward in a balanced way. \(\frac{dVm}{dt}_{max}\) is the maximum rate of voltage change during depolarization, and qInward reflects the inward charge; the attention given to both shows how KNN evaluates several aspects of electrophysiological dynamics to classify intermediate risk.
This reflects the complex interactions among electrophysiological factors in determining toxicity risk.

For low risk, KNN takes a comprehensive approach: rather than relying on one or two features, it integrates many electrophysiological signals, including calcium dynamics, into a holistic view of toxicity potential. This allows KNN to differentiate between risk profiles using a broad picture of the electrophysiological dynamics associated with low toxicity.

For the RBF model, \(\frac{dVm}{dt}_{repol}\) and \(AP{D}_{90}\) are particularly prominent for high-risk drugs. \(\frac{dVm}{dt}_{repol}\) measures the rate of membrane potential change during repolarization, and \(AP{D}_{90}\) is the action potential duration at 90% repolarization. The RBF model thus focuses on repolarization dynamics and action potential duration in the high-risk context, associating significant changes in these parameters with increased toxicity potential.

For intermediate risk, RBF emphasizes \(\frac{dVm}{dt}_{repol}\) and qNet. This reflects the model’s adaptation to moderate risk, where repolarization dynamics (\(\frac{dVm}{dt}_{repol}\)) and the net charge (qNet) play a crucial role.
Combining repolarization dynamics with ionic charge information appears to help the RBF model differentiate between high and intermediate risk, highlighting the significance of these features in characterizing toxicity levels.

For low risk, RBF draws on a broad spectrum of features, from \(\frac{dVm}{dt}_{repol}\) to \(C{a}_{Diastole}\). \(C{a}_{Diastole}\), the diastolic calcium level, becomes relevant together with other parameters, reflecting RBF’s use of a diverse feature set to identify low toxicity risk.

In this study, ANN, XGBoost, RF, SVM, KNN, and RBF models were trained on datasets from 12 drugs, each containing 12 in-silico biomarkers as input features (\(\frac{dVm}{dt}_{repol}\), \(\frac{dVm}{dt}_{max}\), \(V{m}_{resting}\), \(AP{D}_{90}\), \(AP{D}_{50}\), \(AP{D}_{tri}\), \(Ca{D}_{90}\), \(Ca{D}_{50}\), \(C{a}_{tri}\), \(C{a}_{Diastole}\), qInward, and qNet). Training used fivefold cross-validation, and the model with the smallest validation loss was retained for testing. Following the Comprehensive in vitro Proarrhythmia Assay (CiPA) criteria, testing involved 10,000 iterations, each comprising datasets from 16 drugs in different sample combinations15,29,55. Each test combination yielded a performance estimate of the classification system for each model.
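The repeated testing described above yields a distribution of AUC values rather than a single number, which is how interval estimates arise. A minimal sketch of that idea, assuming each test drug has a pool of predicted probabilities for the target class (for example, one per simulated parameter sample), that each iteration draws one sample per drug, and that AUC is computed one-vs-rest; `drug_pools`, `drug_labels`, and the simple percentile interval are hypothetical choices, not the study’s exact implementation.

```python
import random
import statistics


def auc(pos_scores, neg_scores):
    """Rank-based (Mann-Whitney) AUC; ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))


def repeated_auc(drug_pools, drug_labels, target, n_iter=10_000, seed=0):
    """Median one-vs-rest AUC and a 95% interval over repeated draws.

    drug_pools:  drug name -> list of predicted probabilities of `target`,
                 one per simulated sample of that drug (hypothetical input)
    drug_labels: drug name -> true risk class ("high", "intermediate", "low")
    At least one drug must belong to `target` and one to another class.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_iter):
        pos, neg = [], []
        for drug, pool in drug_pools.items():
            score = rng.choice(pool)  # draw one sample per drug
            (pos if drug_labels[drug] == target else neg).append(score)
        aucs.append(auc(pos, neg))
    aucs.sort()
    # Simple order-statistic interval (2.5th and 97.5th percentiles).
    lower = aucs[int(0.025 * (n_iter - 1))]
    upper = aucs[int(0.975 * (n_iter - 1))]
    return statistics.median(aucs), (lower, upper)


if __name__ == "__main__":
    pools = {"drugA": [0.9, 0.6], "drugB": [0.8, 0.4],
             "drugC": [0.5, 0.2], "drugD": [0.1, 0.3]}
    labels = {"drugA": "high", "drugB": "high", "drugC": "low", "drugD": "low"}
    print(repeated_auc(pools, labels, "high", n_iter=1000))
```

Fixing the random seed makes the interval reproducible; with few drugs per class, the AUC can only take a small set of values, which helps explain why intervals can be wide.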
Subsequently, guided by the feature importance ranking, biomarkers were eliminated one by one in order of their overall importance.

The ANN model achieved its highest area under the curve (AUC) when \(V{m}_{resting}\) was removed, with AUCs of 0.92 (0.88–0.96) for high risk, 0.83 (0.73–0.9) for intermediate risk, and 0.98 (0.95–0.98) for low risk (Table 2). The AUC for intermediate risk was more variable, owing to the similarity between the high- and intermediate-risk groups. The feature analysis also reflected this overlap, with feature contributions to high-risk prediction lower than those to intermediate-risk prediction. In the ANN model, subsequently removing \(Ca{D}_{90}\) reduced performance to 0.79, indicating that optimal accuracy was achieved with eleven biomarkers.

Results for the other classification models, including random forest (RF) and XGBoost, are also given in Table 2. The RF model performed best after eliminating six biomarkers (\(C{a}_{tri}\), \(AP{D}_{tri}\), \(\frac{dVm}{dt}_{repol}\), \(Ca{D}_{90}\), \(Ca{D}_{50}\), and \(\frac{dVm}{dt}_{max}\)), with AUCs of 0.75 (0.5–0.96) for high risk and 0.71 (0.62–0.81) for low risk (Table 2). The XGBoost model showed comparable AUCs of 0.71 (0.5–0.96) for high risk and 0.71 (0.62–0.81) for low risk (Table 2). In XGBoost, accuracy remained unchanged even when the six lowest-ranked biomarkers were removed; the best XGBoost configuration was therefore the one with only one biomarker (\(C{a}_{tri}\)) removed.

The support vector machine (SVM) model with seven biomarkers (\(Ca{D}_{50}\), qInward, \(AP{D}_{90}\), qNet, \(\frac{dVm}{dt}_{max}\), \(Ca{D}_{90}\), \(\frac{dVm}{dt}_{repol}\)) showed varying performance across risk levels.
For high risk, SVM achieved an AUC of 0.71, indicating good accuracy, although lower than that of the ANN. For intermediate risk, SVM’s AUC was also lower than the ANN’s, suggesting that SVM may struggle to distinguish high from intermediate risk; the relatively low positive likelihood ratio (LR+) for high-risk samples supports this. For low risk, however, SVM performed well, with an AUC nearly equal to that of the ANN.

The k-nearest neighbors (KNN) model, after eliminating six biomarkers (\(\frac{dVm}{dt}_{max}\), qNet, \(\frac{dVm}{dt}_{repol}\), \(C{a}_{Diastole}\), \(V{m}_{resting}\), qInward), performed similarly to SVM in AUC for high and intermediate risk but slightly better for low risk. KNN is known to be sensitive to the selected features because its predictions depend on distances in feature space. This result suggests that feature reduction may have helped the model identify patterns associated with low risk.

The RBF model showed AUCs similar to those of SVM and KNN for high risk but performed better for low risk, suggesting that its radial basis function mapping into a higher-dimensional space is more effective at distinguishing low risk from the other risk levels. Good positive and negative likelihood ratios (LR+ and LR−) also indicate that the RBF model is fairly reliable for classifying high and low risk.

These findings highlight the importance of feature selection and the effect of the choice of classification model on the accuracy of drug risk prediction. Reducing the number of biomarkers used as input features does not always improve classification, because interactions among features can be nonlinear or not directly apparent.
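The elimination procedure (drop the lowest-ranked biomarker, re-evaluate, keep the best subset) and KNN’s dependence on which features enter the distance can be sketched together. This is a minimal illustration under stated assumptions: leave-one-out accuracy stands in for the study’s AUC-based evaluation, the ranking is supplied by the caller (for example, from mean |SHAP|), features are not standardized, and the two-feature data set with its “signal” and “noise” features is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest neighbours, using only `features`."""
    dists = sorted(
        (math.dist([row[f] for f in features], [x[f] for f in features]), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k, features):
    """Leave-one-out accuracy of KNN restricted to `features`."""
    hits = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k, features)
        hits += pred == y[i]
    return hits / len(X)


def backward_elimination(X, y, ranking, k=1):
    """Drop the lowest-ranked feature one at a time and keep the best subset.

    `ranking` lists feature names from most to least important.
    Returns (best_score, best_features) and the full history.
    """
    features = list(ranking)
    best = (loo_accuracy(X, y, k, features), list(features))
    history = [best]
    while len(features) > 1:
        features = features[:-1]  # remove the currently lowest-ranked feature
        score = loo_accuracy(X, y, k, features)
        history.append((score, list(features)))
        if score > best[0]:  # ties keep the larger, earlier subset
            best = (score, list(features))
    return best, history


# Hypothetical toy data: "signal" separates the classes, while the
# large-valued "noise" feature dominates the Euclidean distance.
X = [{"signal": 1.0, "noise": 0.0}, {"signal": 1.1, "noise": 5.0},
     {"signal": 0.0, "noise": 5.1}, {"signal": 0.1, "noise": 0.1}]
y = ["high", "high", "low", "low"]

if __name__ == "__main__":
    best, history = backward_elimination(X, y, ranking=["signal", "noise"])
    for score, feats in history:
        print(feats, score)
    print("best:", best)
```

In this toy case, removing the noisy feature raises leave-one-out accuracy from 0 to 1 because it no longer distorts the neighbourhoods; with other data, a removed feature can just as easily carry information the remaining features lack, so every candidate subset has to be re-evaluated rather than assumed to improve.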
In some cases, eliminating features that seem irrelevant or contribute minimally can reduce the model’s ability to capture essential patterns in the data. Features that individually appear insignificant can participate in complex interactions with other features, and removing them can deprive the model of information it needs for accurate predictions. When developing classification models, it is therefore important to consider not only each feature’s individual contribution but also how the features interact.

System performance was also evaluated in terms of diagnostic accuracy, as suggested by Li et al. and Šimundić55,32. For the ANN model, the AUCs for the high- and low-risk groups (0.92 and 0.98, respectively) fall within the “outstanding” range, whereas the intermediate-risk AUC of 0.83 falls within the “good” range. In addition, the ANN model’s positive likelihood ratios (LR+) exceed 5 for the high- and low-risk groups, which is classified as “outstanding” under Li’s criteria and indicates good performance under the criteria proposed by Šimundić32.

A more detailed analysis against the same diagnostic accuracy criteria55,32 shows that the XGBoost and RF models consistently meet the specified benchmarks for high and low risk, as evidenced by their AUC, LR+, and negative likelihood ratio (LR−) values. The SVM model achieves respectable AUCs for high and low risk, but its performance is weaker for intermediate risk, indicating possible limitations in differentiating between high and intermediate risk. This could relate to the choice of hyperparameters or to characteristics of the kernel used.
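The likelihood-ratio criteria above follow directly from sensitivity and specificity, with LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity for each risk class treated one-vs-rest. A minimal sketch with hypothetical confusion counts; how the study aggregated counts across its test iterations is not reproduced here.

```python
import math


def diagnostic_ratios(tp, fn, fp, tn):
    """Sensitivity, specificity, LR+ and LR- for one risk class (one-vs-rest)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    # LR- = (1 - sensitivity) / specificity
    lr_neg = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_pos, lr_neg


if __name__ == "__main__":
    # Hypothetical confusion counts for the "high risk" class vs. all others.
    sens, spec, lr_pos, lr_neg = diagnostic_ratios(tp=9, fn=1, fp=1, tn=9)
    print(f"sens={sens:.2f} spec={spec:.2f} LR+={lr_pos:.1f} LR-={lr_neg:.2f}")
    print("LR+ > 5:", lr_pos > 5)
```

Because LR+ divides by the false-positive rate, a class that is rarely confused with the others (as low risk often is) can reach a high LR+ even with moderate sensitivity.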
SVM’s strength lies in its ability to transform the feature space through kernel functions, but this can also obscure which features are most influential, especially when features are strongly correlated or intricately intertwined.

The KNN model, with its intuitive approach based on feature proximity, produces commendable AUC scores for high and low risk but shows decreased performance for intermediate risk. The effectiveness of this model depends on the choice of the number of neighbors and the distance metric, and suboptimal tuning can reduce its sensitivity to the subtle gradations between risk levels. KNN may also struggle if the data distribution is uneven or imbalanced across risk categories.

The RBF model, whose kernel function is often used within the SVM framework, shows consistent performance across all risk categories and is particularly good at identifying low-risk cases. This suggests that the RBF model can handle the complexity of the data and effectively separate low-risk cases. However, as with SVM, the RBF kernel can obscure the interpretability of the model and the significance of individual features.

Although these models do not surpass the ANN classification model, our findings align with previous research by Yoo et al.15, who developed an ANN model integrating nine in-silico features. That model, which captures morphological information from action potential traces, calcium transient traces, and net ionic charge features, achieved an AUC of 0.92 for high risk, 0.83 for intermediate risk, and 0.98 for low risk15.
Although those authors reported slightly better efficacy in terms of intermediate-risk AUC, their analysis did not use XAI to examine the contribution of in-silico features to drug risk level classification. This leaves an area for further exploration in model-based diagnostics.

Integrating the AUC, accuracy, and likelihood ratio (LR) values reported in the table for each model gives a more detailed picture of drug risk assessment. In the ANN model, removing \(V{m}_{resting}\) improved the high-risk AUC to 0.92, but performance for intermediate risk decreased when \(Ca{D}_{90}\) was excluded. In contrast, in the XGBoost, RF, SVM, KNN, and RBF models, eliminating the six lowest-ranked biomarkers did not change accuracy. This indicates either that these features contribute little or that complex interactions among them remain undetected.

Compared with the studies by Yoo et al.15 and Li et al.55, the ANN model shows superior diagnostic performance, even with varying feature sets. This evaluation shows success in identifying high- and low-risk cases but also reveals challenges in the intermediate-risk scenario. It emphasizes the crucial role of accurate feature selection and shows how modifying specific features can affect the overall performance of the classification model.

Optimal feature selection aims both to improve accuracy and to clarify how features contribute to, and interact in producing, classification outcomes. A careful evaluation of each feature is therefore essential, considering both its individual significance and its synergistic role within the model, to produce the most accurate predictions.

Extensive testing involving 10,000 iterations across various classification models using drug datasets revealed critical insights into the predictive performance and feature relevance of in-silico biomarkers.
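The elimination of the lowest-ranked biomarkers can be sketched as follows. The sketch assumes that features are ranked by the mean absolute SHAP value for one risk class and then removed one at a time from the bottom of the ranking. The function names, the generic feature names, the SHAP matrix, and the `evaluate` callback are all hypothetical. The callback stands in for training a model on a subset and returning, for example, its cross-validated AUC.

```python
def rank_by_mean_abs_shap(shap_values, names):
    """Sort feature names by mean |SHAP| (one row per sample), largest first."""
    n = len(shap_values)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(names)
    }
    return sorted(names, key=lambda name: mean_abs[name], reverse=True)


def elimination_schedule(ranked, max_drop):
    """Subsets obtained by dropping the 0, 1, ..., max_drop lowest-ranked features."""
    if not 0 <= max_drop < len(ranked):
        raise ValueError("max_drop must leave at least one feature")
    return [ranked[: len(ranked) - d] for d in range(max_drop + 1)]


def best_subset(ranked, max_drop, evaluate):
    """Evaluate each subset; return (score, subset), preferring larger subsets on ties."""
    best = None
    for subset in elimination_schedule(ranked, max_drop):
        score = evaluate(subset)  # hypothetical: e.g. AUC of a model trained on `subset`
        if best is None or score > best[0]:
            best = (score, subset)
    return best


# Hypothetical SHAP values for one risk class: 3 samples x 4 features.
names = ["feat_A", "feat_B", "feat_C", "feat_D"]
shap_values = [
    [0.40, -0.05, 0.20, -0.10],
    [-0.30, 0.02, -0.25, 0.05],
    [0.35, -0.01, 0.15, 0.12],
]

ranked = rank_by_mean_abs_shap(shap_values, names)
print(ranked)  # ['feat_A', 'feat_C', 'feat_D', 'feat_B']
for subset in elimination_schedule(ranked, max_drop=2):
    print(subset)
```

Taking absolute values before averaging means positive and negative contributions both count toward a feature's importance. As the discussion above notes, a low rank does not guarantee that removing the feature leaves performance unchanged, which is why each subset is re-evaluated rather than assumed to be equivalent.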
Specifically, this analysis shows how feature reduction affects each model’s ability to classify different levels of drug risk accurately.

The ANN model reached its best AUC, particularly for high-risk classification, using only seven features. This substantial reduction in the number of features, supported by the corresponding p-values, indicates that less influential features were successfully filtered out, yielding a leaner yet more accurate model. However, performance varied for intermediate- and low-risk classification, suggesting that some eliminated features may carry information important for accurate predictions at these risk levels.

For XGBoost, the high-risk AUC increased with feature reduction, indicating that feature selection removed redundancy without sacrificing performance. However, performance declined for intermediate and low risk, suggesting that prediction at these levels depends on more detailed feature interactions.

The RF model showed consistent AUC improvements for high risk with feature reduction, indicating its robustness in mitigating overfitting and capturing essential patterns with a smaller feature set. For intermediate and low risk, however, a broader set of features may be necessary for accurate prediction.

Conversely, SVM exhibited a consistent decrease in AUC with feature reduction, indicating that this model needs a comprehensive feature set to maintain good predictive performance. This may reflect SVM’s reliance on a higher-dimensional feature space in which to identify an optimal separating hyperplane.

KNN showed mixed results. Its peak AUC for high risk was achieved with eleven features, while for low risk the model remained robust even with fewer features.
This indicates that KNN may require careful feature selection for high risk but remains resilient to feature reduction for low risk.

Lastly, the RBF model showed increased high-risk AUC with fewer features, suggesting that it can capture complex patterns in the dataset. However, as with the other models, its performance for intermediate and low risk decreases when features are reduced.

Overall, these findings underscore the importance of careful, strategic feature selection in developing efficient classification models. Each model responds to feature reduction in its own way, and no single rule applies to all of them. Understanding the individual contributions of features and the synergistic interactions among them is therefore crucial in designing robust and reliable classification systems for predicting drug risk in clinical practice.

This study has limitations. The feature importance results from the ANN model are not always clearly interpretable. In some cases, the ANN does not provide an intuitive explanation of how features contribute to prediction outcomes, which makes it difficult to derive deeper insights from the model and connect them to robust scientific explanations. The XGBoost model is more complex than methods such as linear regression, which makes it a “black box” that is hard to interpret quickly. Even with techniques such as feature importance and SHAP values, interpretation can remain complex, especially in highly intricate cases. The feature importance method used in Random Forest tends to be biased toward features with more categories or distinct values, which can lead to underestimating important features with lower variation. These limitations arise because each method has its own approach to, and characteristics in, modeling data.
Hence, these limitations underscore the importance of a thorough understanding of the data and its context when selecting and applying a particular classification method.

In conclusion, this study used the ANN model to analyze the impact of optimal biomarker selection on predicting drug risk. qInward showed significant influence on predictions at all drug risk levels. At the high-risk level, a prolonged action potential duration (\(AP{D}_{50}\) and \(AP{D}_{90}\)) was associated with decreased risk. At the intermediate-risk level, by contrast, increased \(AP{D}_{50}\) and \(AP{D}_{90}\) contributed positively. A higher calcium concentration at the diastolic stage (\(C{a}_{Diastole}\)) correlated with low and intermediate risk, and \(C{a}_{tri}\) negatively impacted the prediction of intermediate risk. The XGBoost model also emphasized several biomarkers, particularly qInward, in risk prediction. Removing less relevant input features, or those with minimal contributions to the classification model, can diminish the model’s ability to capture significant patterns. Feature selection is crucial, but reducing the number of input features does not always improve model performance. Testing with various classification models using fivefold cross-validation yielded different results, with the ANN model showing the highest AUC and emphasizing the \(V{m}_{resting}\) feature. However, accuracy varied for intermediate-risk predictions because of similarities between the high- and intermediate-risk groups. Feature interpretation is also limited. Complex ANNs do not always yield intuitive explanations of feature contributions, and Random Forest feature importance tends to be biased toward features with more categories or values. All these limitations highlight the importance of a thorough understanding of the data and context when selecting and implementing classification models.
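The fivefold cross-validation mentioned in the conclusion can be sketched as a stratified split. This assumes, although the text does not state it, that each fold should preserve the proportions of high-, intermediate-, and low-risk drugs, which matters when risk categories are imbalanced. The function names, the round-robin assignment, and the toy labels are illustrative choices, not the study's implementation.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k test folds that roughly preserve class proportions."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    start = 0  # carry the fold offset across classes so fold sizes stay balanced
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[(start + pos) % k].append(idx)
        start = (start + len(idxs)) % k
    return [sorted(fold) for fold in folds]


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    for test in stratified_kfold(labels, k, seed):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


# Hypothetical risk labels for 15 drugs (5 per category).
labels = ["high"] * 5 + ["intermediate"] * 5 + ["low"] * 5
for train, test in train_test_splits(labels, k=5, seed=42):
    print(len(train), [labels[i] for i in test])  # 12 train, one test drug per class
```

Each sample appears in exactly one test fold, so every drug is scored once by a model that did not see it during training. Per-fold AUCs can then be averaged or pooled.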
