An expert rule-based approach for identifying infantile-onset Pompe disease patients using retrospective electronic health records

The study comprises 6 IOPD patients and 93,365 screened subjects. The IOPD group consisted of three males and three females, whereas the screened subjects comprised 48,789 males and 44,404 females. Figure 2 presents a world map showing the population distribution of patients in this study, where the star marker represents IOPD patients. As can be seen from the map, the patient population is largely homogeneous, as the majority of the study population appears to be from the West Asia region.

We reviewed past literature to understand the clinical manifestations of IOPD as described by medical experts. Our findings show that IOPD has a broad spectrum of symptoms. However, the following symptoms were the most commonly identified and discussed in the past literature [19-28]: cardiomegaly, hypotonia, feeding difficulties, respiratory infections, ventilator dependence (breathing difficulties), cardiomyopathy, muscle weakness, and elevated levels of CK.

We explored the data to analyze the distribution of the symptoms mentioned above for patients in our data. Our findings are summarized in Figs. 3 and 4. Figure 3 compares the CK levels in IOPD patients versus screened patients, which revealed significant differences, underscoring the profound impact of this genetic disorder on muscular integrity. The mean CK level in IOPD patients was markedly higher (418.0 U/L) than in the screened group (245.17 U/L), indicating an elevated baseline of muscle damage or turnover in the IOPD patients. This is consistent with the known pathophysiology of IOPD, where lysosomal dysfunction leads to glycogen accumulation in muscle tissues, resulting in cellular damage [29].

Figure 4 presents the prevalence in our dataset of the symptoms identified in the reviewed literature.
Cardiomegaly and cardiomyopathy are the most prevalent symptoms in IOPD patients, which aligns with the hypertrophic cardiomyopathy that is characteristic of IOPD. Feeding difficulties, hypotonia, and respiratory infections are also significantly more common in IOPD patients compared to screened subjects. Muscle weakness and ventilator dependence were not observed among the IOPD patients, because none of the patients diagnosed with IOPD had these specific symptoms recorded in our data during the research period.

Figure 4. Prevalence of symptoms in IOPD and screened patients.

In addition to the \(\chi^2\) test, we further analyzed the associations between IOPD and various symptoms, presented in ICD-10 code format, using Cramer’s V statistic, a measure derived from the \(\chi^2\) statistic that quantifies the strength of association between categorical variables. The results, presented as a heatmap of Cramer’s V values in Fig. 5, visualize the potential associations between clinical features and Pompe disease. The values range from 0 to 1, where 0 indicates no association and 1 indicates a perfect association.

Figure 5. Cramer’s V correlation matrix of symptoms. The heatmap displays the strength of association between various symptoms and IOPD, with values ranging from 0 (no association) to 1 (perfect association). Asterisks (*) indicate statistically significant associations (\(p \le 0.05\)).

Although the symptoms of Pompe disease are well documented in the existing literature, we wanted to explore the strength of the relationship between symptoms and Pompe diagnosis within our dataset by employing the Cramer’s V statistic. As shown in Fig. 5, there is a strong association between Pompe disease and congenital hypotonia [P94.2], unspecified cardiomyopathy [I42.9], and hypertrophic cardiomyopathy [I42.2].
These findings affirm the dataset’s alignment with the established clinical understanding of Pompe disease. Some relationships may not appear as strong as expected, despite their known clinical significance in Pompe disease, due to limitations in the dataset’s comprehensiveness.

Interestingly, our analysis revealed that several symptoms exhibit statistically significant associations (\(p \le 0.05\), denoted by asterisks in the figure) with Pompe disease, even when their Cramer’s V values are relatively small (< 0.1). This phenomenon is not uncommon in large datasets or complex medical conditions and underscores the importance of considering both effect size (Cramer’s V) and statistical significance (p-value) in interpretation. For instance, cardiomegaly [I51.7] shows a Cramer’s V of 0.052 but is statistically significant, indicating a weak but non-random association with Pompe disease.

Building upon the insights from the Cramer’s V analysis, we progressed to a more focused feature selection phase to identify potential markers strongly associated with Pompe disease. We used Random Forest (RF) feature importance to find the symptoms (features) most strongly linked to IOPD. RF feature importance is well suited to ranking the most informative features without being misled by the less informative ones.
The most important features identified through this method were: Cardiomyopathy, Unspecified [I42.9]; Other Hypertrophic Cardiomyopathy [I42.2]; Pneumonia Due to Other Specified Bacteria [J15.8]; Congenital Hypotonia [P94.2]; Enterovirus Infection, Unspecified [B34.1]; Other Viral Infections of Unspecified Site [B34.8]; Chronic Systolic (Congestive) Heart Failure [I50.22]; Acute On Chronic Combined Systolic (Congestive) And Diastolic (Congestive) Heart Failure [I50.43]; Dilated Cardiomyopathy [I42.0]; and Unspecified Lack of Expected Normal Physiological Development in Childhood [R62.50].

Figure 6 presents the symptoms associated with IOPD as identified by the RF feature importance analysis. The symptoms are represented by bars, with the intensity of the green color distinguishing symptoms that are also common in the literature from those identified by the RF analysis alone. The dark green bars denote symptoms that were identified by the Random Forest feature importance method and are also widely reported in the literature, affirming their established presence in IOPD clinical profiles. Conditions such as “Cardiomyopathy” and “Other Hypertrophic Cardiomyopathy” are the most prominent, underscoring their importance in the disease’s symptomatology.

Figure 6. Top 20 important features identified by Random Forest feature selection.

We further applied the RF feature importance test to each unique symptom identified through RF to explore its associations with other symptoms. Figures 7, 8 and 9 present network diagrams that illustrate the relationships between the symptoms identified in the feature importance analysis and all other symptoms available in the dataset. The dark blue nodes, as central nodes, represent the primary symptoms of IOPD as identified through feature importance analysis. The yellow nodes are secondary symptoms associated with the primary IOPD symptoms, as identified through the same analysis.
Their connection to the central nodes suggests a clinical association that may be consequential or contributory to the presentation of the primary symptoms. These associations may not be causal but could indicate common comorbidities or complications arising from the primary symptoms of IOPD. The complete network diagram can be accessed at https://zahir2000.github.io/pompe.github.io/network.

The network diagram depicted in Fig. 7 represents the complex cardiac complications that can arise due to the progressive nature of this lysosomal storage disorder. IOPD often leads to cardiac hypertrophy and progressively impairs cardiac function. The central node, “Unspecified Combined Systolic and Diastolic Heart Failure”, indicates the co-occurrence of both systolic (the heart’s ability to pump blood) and diastolic (the heart’s ability to fill with blood) dysfunction, which is particularly challenging to manage in IOPD due to the glycogen accumulation in the heart muscle. Adjacent to this is “Acute on Chronic Combined Systolic and Diastolic Heart Failure”, denoting an acute decompensation superimposed on chronic heart failure. The progression to “Chronic Systolic Heart Failure” reflects a deterioration of the heart’s pumping ability, a common consequence of the disease’s impact on cardiac muscle. Moreover, “Secondary Pulmonary Arterial Hypertension” can develop as a result of chronic heart failure, leading to increased pressure in the pulmonary arteries. It is linked to both chronic systolic heart failure and the central node, suggesting it is a common sequela in IOPD patients. This condition can further exacerbate the burden on the heart and complicate the clinical management of patients.
Lastly, the node “Acute on Chronic Systolic Heart Failure” captures the episodic worsening of heart function that children with IOPD may experience.

Figure 7. Symptomatic networks in infantile-onset Pompe disease: a Random Forest feature importance analysis identifying core symptoms and related conditions (Cluster 1).

The network diagram presented in Fig. 8 underscores the susceptibility of IOPD patients to respiratory infections due to compromised diaphragmatic and intercostal muscle function. The central node, “Other Viral Infections of Unspecified Site”, could indicate the various respiratory viruses that patients with IOPD are at risk for, considering their already weakened respiratory systems. Connections to specific conditions such as adenovirus infection, which can lead to acute upper respiratory infections, and enterovirus infection highlight the complexity and severity of managing IOPD patients who may develop these infections. The further association with conditions such as “Unspecified Asthma with Exacerbation” and “Unspecified Bacterial Pneumonia” demonstrates the potential for viral infections to exacerbate underlying respiratory conditions or lead to secondary bacterial infections.

Figure 8. Symptomatic networks in infantile-onset Pompe disease: a Random Forest feature importance analysis identifying core symptoms and related conditions (Cluster 2).

The central node in the network diagram shown in Fig. 9, “Cardiomyopathy”, is significant because IOPD often involves cardiac issues such as hypertrophic cardiomyopathy due to glycogen accumulation in the heart muscle. This can progress to dilated cardiomyopathy, as represented in the diagram, and be associated with conditions like myocarditis, which can complicate cardiomyopathy. The link to “Congenital Hypotonia” underscores the muscle weakness seen in IOPD, which can lead to motor function delays and other developmental issues.
The diagram also includes respiratory conditions like pneumonia due to Streptococcus pneumoniae and other specified bacteria, for which patients with IOPD are at increased risk due to compromised respiratory muscles. Acute bronchitis and acute respiratory failure are connected, highlighting the respiratory complications that are a common cause of morbidity in IOPD. The presence of “Wheezing” further denotes respiratory distress. Furthermore, the nodes related to family indicate the hereditary nature of IOPD, and the inclusion of gastrointestinal appliance fitting suggests the feeding difficulties and nutritional support often required by these patients.

Acknowledging the rapidly progressive nature of IOPD, which often results in mortality within the first year of life due to cardiac and ventilatory failure, our study developed a new set of expert-derived rules. The expert rules are designed to augment the efficiency of physicians by refining the patient review process within EHRs. By preemptively identifying and flagging potential cases of IOPD, these rules enable healthcare providers to concentrate their review efforts on a targeted subset of patients. This alleviates the workload for clinicians by diminishing the need for extensive manual chart reviews and conserves time, a critical factor given the urgent need for rapid treatment initiation in IOPD. Therefore, implementing these expert rules is not merely an administrative convenience but a vital measure to expedite the diagnostic process, enhancing the likelihood of timely therapeutic intervention for IOPD patients. Algorithm 1 presents the logical flow of the expert rules.
For a patient to fall under consideration based on these rules, they must be 12 months of age or younger, which aligns with the “infantile onset” classification of Pompe disease, marking a period where the disease’s severe symptoms typically emerge. Rule 1 focuses on the co-occurrence of cardiomyopathy and hypotonia, which are hallmark features of IOPD, reflecting the disease’s profound impact on cardiac and skeletal muscle function. Rule 2 combines delayed physiological development with cardiomyopathy, recognizing how Pompe disease can affect overall infant growth and development. Rule 3 leverages the diagnostic value of elevated creatine kinase levels (a marker of muscle damage) combined with cardiomyopathy to identify potential cases based on biochemical evidence of muscle dysfunction. Rules 4 and 5 extend the diagnostic framework to include feeding difficulties and recurrent respiratory or chest infections, respectively, alongside the aforementioned conditions. These additions acknowledge the multi-systemic nature of IOPD, where feeding and respiratory challenges often compound the primary muscular and cardiac symptoms.

Figure 9. Symptomatic networks in infantile-onset Pompe disease: a Random Forest feature importance analysis identifying core symptoms and related conditions (Cluster 3).

Algorithm 1. Expert rules to identify high-risk IOPD patients.

The efficacy of the expert rules is further evident in the associations presented in the network diagrams in Figs. 7, 8 and 9, which are based on the data used in this study. As shown in Fig. 9, there is a direct association between cardiomyopathy and congenital hypotonia. This combination of conditions is the first rule in the expert-derived rules. The second rule corresponds to the indirect association between an unspecified lack of expected normal physiological development in childhood and cardiomyopathy conditions.
Next, pneumonia due to other specified bacteria, a respiratory infection, is directly associated with other hypertrophic cardiomyopathy, which corresponds to the last rule in the expert-derived rules.

The study employed a carefully designed dashboard to expedite the identification of IOPD patients using the set of expert-derived rules. The dashboard’s layout allows complex patient data to be visualized clearly, facilitating quick and informed decision-making. As presented in Fig. 10, the dashboard features an organized table that displays patient identifiers, associated diagnostic codes, creatine kinase levels, and the outcomes predicted by the expert rules. Data categories within the dashboard are deliberately chosen for relevance to IOPD diagnosis, including a range of symptoms and biochemical markers such as creatine kinase levels. The expert rules, previously delineated, are integrated within the dashboard, providing a structured approach to patient screening. The performance of these rules is quantified using accuracy, specificity, and sensitivity, which are critical in evaluating the reliability of the diagnostic criteria. Descriptive analytics enrich the dashboard, offering a count of patients meeting each symptom criterion and a confusion matrix that breaks down true positives, false positives, true negatives, and false negatives. The dashboard is also interactive: users can filter patient data by age or specific conditions, toggle the display to focus on various predictive outcomes, and identify which patients satisfy any given rule. This adaptability makes the dashboard a valuable asset in navigating the data.
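As a minimal sketch of how accuracy, sensitivity, and specificity follow from the four confusion matrix counts, the Python snippet below computes them directly. The function name `screening_metrics` is hypothetical and is not taken from the dashboard; it only illustrates the standard definitions.

```python
from typing import Dict


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive standard screening metrics from confusion matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        # Share of all patients that the rules classified correctly.
        "accuracy": (tp + tn) / total,
        # Share of true IOPD patients that the rules flagged.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of non-IOPD patients that the rules left unflagged.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }
```

For a rare disease, sensitivity is driven by very few patients: a rule set that flags five of six confirmed patients has a sensitivity of 5/6, or about 0.83, regardless of how many subjects were screened.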
As detailed in the methodology, the dashboard was developed using Power BI for its robust data visualization capabilities and SQL for data manipulation. The dashboard streamlines the data analysis process and provides a flexible platform for generating insights and flagging patients for review, offering a means to leverage expert knowledge in the pursuit of improved patient outcomes.

Figure 10. Developed dashboard with integrated expert rules.

Upon implementing the expert rules in the dashboard, our study identified five true positive (TP) instances of IOPD, one false negative (FN), and four patients at high risk for Pompe disease among the patients within the extracted EHRs. The expert physician manually reviewed the flagged patients. The true positive cases reaffirm the efficacy of the expert rules in accurately detecting IOPD. Conversely, the false negative is a case that the rules did not flag. It pertained to a patient with concomitant Pompe disease and congenital heart disease (CHD), manifesting as ventricular septal defect, patent ductus arteriosus, and atrial septal defect; the clinical emphasis on CHD overshadowed the Pompe disease diagnosis. Notably, the absence of a CK measurement further contributed to the oversight. The high-risk cases for Pompe disease were evaluated and found to have the following diseases: Mitochondrial DNA depletion syndrome 12-A linked to the SLC25A4 gene, Immunodeficiency-71 characterized by a mutation in ARPC1B c.392+1G>C, Niemann-Pick disease type C resulting from a mutation in the NPC1 gene (c.2972_2973delAG) leading to a frameshift (p.Gln99Argfs*15), and Group B Streptococcus (GBS) meningitis. Next, using the dashboard, we identified a single patient with elevated CK levels, hypotonia, and delayed physiological development.
Upon reviewing the patient’s records, it was found that the patient had suffered severe hypoxic-ischemic encephalopathy, indicative of significant oxygen deprivation and consequent brain injury. Afterward, we used the dashboard to identify patients with cardiomyopathy (true negative cases only), as cardiomyopathy is the most common and obvious marker. The expert physician manually reviewed the records of the identified patients, and the review results are presented in Table 2. The review shows that none of the (true negative) patients with cardiomyopathy alone had IOPD, further supporting the efficacy of the expert rules.

Our expert rule-based method has significant potential to enhance clinical practice through its integration into existing EHR systems. By embedding these rules using programming languages such as Python or SQL, the system can automatically analyze patient data against predefined criteria for identifying IOPD. When patient data meets these criteria, the system generates alerts within the EHR, notifying the attending physician or clinical team to review the case. This automation not only streamlines the identification process but also supports timely intervention. Furthermore, the expert rules can be incorporated into Clinical Decision Support (CDS) tools, which provide real-time feedback and suggestions to clinicians based on patient data. These tools can link to relevant clinical guidelines and research articles, enhancing the clinician’s understanding and supporting informed decision-making. Training sessions, which can be integrated into routine clinical education programs, will ensure clinicians and staff can interpret and act on these alerts effectively. The successful implementation of this method depends significantly on the quality and comprehensiveness of the hospital’s data. The expert rules are most effective when the hospital’s EHR contains detailed patient records, including the symptoms specified in the expert rules.
There is no predefined method for applying these rules; what matters is implementing the logic that underpins them correctly. Each healthcare institution may need to tailor the rules to fit its specific data structures and clinical workflows. Implementing this method requires a collaborative effort between IT staff, clinical informaticians, and healthcare providers to ensure accurate integration and clinical relevance. Continuous monitoring, feedback collection, and regular audits will be crucial for refining the rules and improving their performance. This iterative process will help optimize the system’s effectiveness in identifying and managing IOPD cases. By leveraging this expert rule-based approach, healthcare institutions can improve the accuracy and timeliness of IOPD diagnosis and treatment, ultimately leading to better patient outcomes and more efficient use of healthcare resources.

As discussed above, our study’s findings highlight significant insights into the early detection and diagnosis of IOPD using EHR-based expert rules. The concluding section summarizes these results and their implications.

Table 2. List of patients with cardiomyopathy and their actual diagnosis.
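To illustrate how the expert rules described in this section might be embedded in Python, the sketch below encodes the age gate and the five symptom combinations. It is an illustration under stated assumptions, not the authors' Algorithm 1: the `PatientRecord` fields and the `matched_rules` function are hypothetical names, the CK cut-off is not given in this section and must be supplied by the caller, and Rules 4 and 5 are assumed to require cardiomyopathy in the same way as Rules 1 to 3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatientRecord:
    """Hypothetical flattened view of one patient's EHR data."""
    age_months: float
    cardiomyopathy: bool = False
    hypotonia: bool = False
    developmental_delay: bool = False
    feeding_difficulties: bool = False
    recurrent_respiratory_infections: bool = False
    ck_u_per_l: Optional[float] = None  # None when CK was never measured


def matched_rules(
    patient: PatientRecord,
    ck_upper_limit: float,  # assumption: institution-specific CK cut-off in U/L
    max_age_months: float = 12,
) -> List[int]:
    """Return the numbers of the expert rules that the patient satisfies."""
    # Age gate: only infants of 12 months or younger are considered.
    if patient.age_months > max_age_months:
        return []
    # Assumption: every rule pairs cardiomyopathy with one companion finding.
    if not patient.cardiomyopathy:
        return []
    elevated_ck = (
        patient.ck_u_per_l is not None and patient.ck_u_per_l > ck_upper_limit
    )
    companions = [
        (1, patient.hypotonia),
        (2, patient.developmental_delay),
        (3, elevated_ck),
        (4, patient.feeding_difficulties),
        (5, patient.recurrent_respiratory_infections),
    ]
    return [rule for rule, present in companions if present]
```

Under these assumptions, a 4-month-old with cardiomyopathy and hypotonia but no CK measurement matches Rule 1 only; a missing CK value leaves Rule 3 unmatched rather than raising an error. A patient would be flagged for physician review whenever the returned list is non-empty.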
