Salmonella enterica virulence databases and bioinformatic analysis tools development

Not all of the existing 2,600 Salmonella serovars are equally pathogenic to humans. Specific serovars or strains of Salmonella, especially those in subspecies enterica, are more apt to cause invasive infections in humans and/or animals27. This feature suggests that these isolates may harbor specific VFs crucial for infection. In this project, WGS data from S. Typhi and 13 different NTS serovars, ranging from common causes of human illness (e.g., Enteritidis, Typhimurium and Newport) to those that are less common (e.g., Schwarzengrund and Indiana), were analyzed using a database with a non-redundant, comprehensive list of Salmonella VFs and two accompanying tools, the VF Profile Assessment and VF Profile Comparison tools. The database was created by compiling existing datasets and conducting an extensive literature review to account for VFs that were not represented in these databases. The current version of the database contains 594 VFs or putative VFs, including approximately 157 predicted to be located in an SPI-19 on SGI-1 and 21 that are commonly located on plasmids (Supplemental Table 3). Some of these plasmid-associated genes, such as the sit operon, can also be located on the chromosome. Genes from all 24 currently identified SPIs are included, and more details about their functions were recently reviewed15. To establish the Virulence and Plasmid Transfer Factor Database and facilitate the prediction of virulence genes, the nucleotide and amino acid sequences of reference genes, and other related information such as the predicted product, locus tag, and accession numbers, were extracted from GenBank to create the backend reference VF dataset accessed by the analysis tools.

Table 6 Genes that are missing in all the isolates analyzed in the serotypes. *The number of serotypes missing the genes.
**Abbreviations for serotypes: I,4,[5],12:i:-; AG-Agona; DU-Dublin; ET-Enteritidis; HA-Hadar; HD-Heidelberg; ID-Indiana; IF-Infantis; JV-Javiana; NP-Newport; SP-Saintpaul; SW-Schwarzengrund; TY-Typhimurium; TP-Typhi. ***Genes that are missing are highlighted in red.

The VF Profile Assessment tool was developed to predict the presence of VFs in an uploaded sequence and to provide detailed information on the nucleotide percent identity and matching location relative to the reference virulence genes in the database. The results can be viewed online in the program and/or downloaded and exported into a spreadsheet to facilitate further data analysis. To evaluate the utility of the tool, WGS data from 810 strains from 14 different serovars were combined and analyzed using the Profile Assessment tool (Supplemental Table 3). The sequence diversity observed among individual virulence genes present in strains/serovars could offer valuable insights into their effects on host and/or tissue specificity, gene expression, and other related factors. For example, differences in the percent identities to reference genes among the various fimbrial genes (e.g., the bcf operon, Supplemental Table 3) across different serovars may influence their tropism for host epithelial cells28. Fimbriae can play key roles in adhesion to host tissues. By binding preferentially to various glycans, fimbriae enhance Salmonella’s ability to adhere to different host cell surfaces, whether within the same host (tissue tropism) or across different hosts (host tropism), aiding the colonization and infection process28,29. Figure 4 shows that many VF similarity profiles within specific serovars cluster closely together while remaining distinct from those of other serovars. The pattern of serovar dispersion is similar to that observed when examining the presence or absence of genes but reveals a greater diversity of genotypes.
These data highlight that, beyond the simple presence or absence of particular VFs, their genetic diversity is important in shaping the pathogenicity and virulence of Salmonella strains.

The VF Profile Comparison tool allows users to upload multiple sequence files at once, which facilitates the simultaneous comparison of multiple sequences. The results are displayed as a binary matrix indicating the presence or absence of VF genes across the uploaded sequences. The resulting comparison data can be visualized online in the program window or downloaded for further analysis using other software programs. In the current assessment, WGS data of 43,853 Salmonella isolates from 14 different serovars were analyzed, and the results were extracted, collated in a spreadsheet, and uploaded into BioNumerics for further analyses. The resulting PCA output demonstrated that isolates belonging to the same serovar predominantly clustered together (Fig. 5), suggesting a high degree of similarity in VF profiles within individual serovars but diversity between the different serovars, with some exceptions noted above. Some of the key factors that drive the diversity among the serovars are the fimbrial operons present in the respective serovars (Supplemental Table 3). These phylogenetic results were consistent with the clustering generated by the Profile Assessment tool.

To assess the utility of the database for Salmonella characterization, differences in the VF profiles of the strains from the different serovars were compared in detail. Not surprisingly, the PCA results show that the S. Typhi isolates (n = 1,536) separated from the NTS serovars (n = 42,317). Twenty-eight genes belonging to three different gene clusters were present in the majority (more than 97%) of S. Typhi isolates but absent from the majority of other serovars; their detailed functions are listed in Table 3.
These included genes encoding S. Typhi-specific fimbriae, a type VI pilus, and the Vi antigen, which are important for pathogenesis30,31,32. Indeed, Vi antigen production distinguishes S. Typhi from NTS Salmonella32.

Table 7 Genes that are present in the majority (more than 97%) of S. Typhi isolates but in less than 5% of isolates of the other serotypes studied.

The differences in the VF profiles of the representative isolates with the predominant virulence profiles from S. Typhimurium and S. I,4,[5],12:i:- were analyzed, and the genes unique to each isolate are listed in Table 3, along with their overall prevalence in these two serovars. Although 31 virulence genes are listed as different between the predominant virulence profiles of these two serovars, the overall presence rates of these genes, with the exception of fljA, are not significantly different between the serovars. This further confirmed that the monophasic variant of S. Typhimurium, S. I,4,[5],12:i:-, is closely related to S. Typhimurium. While four genes (allD, gip, hyi, and STM0520) are absent from the predominant virulence profile of S. Typhimurium, their presence rates among all the S. Typhimurium isolates analyzed in this study are more than 65%. Meanwhile, the other 27 VFs are absent from the predominant profile of S. I,4,[5],12:i:- and present in the predominant virulence profile of S. Typhimurium, but their presence rates among all the S. Typhimurium isolates analyzed are relatively low. Except for fljA, the presence rate of each of the other 26 genes in the S. Typhimurium isolates is less than 60% (Table 3). This occurs because the VF profiles of S. Typhimurium are notably diverse, with a total of 227 distinct VF profiles identified among the isolates analyzed (n = 1,081), and the predominant profile of S. Typhimurium is present in only 27.94% (302/1,081) of the strains. Notably, the majority of the 27 genes that are absent from the predominant VF profile of S.
I,4,[5],12:i:- are located on the pSLT virulence plasmid or SGI-1. The spv locus (genes spvABCD and spvR), which is strongly associated with strains that cause NTS bacteremia and is not present in typhoid strains, is missing in the majority (around 70%) of S. I,4,[5],12:i:- isolates and in 40% of S. Typhimurium isolates. The spv operon is associated with the survival and proliferation of Salmonella spp. within macrophages33. It encodes the primary virulence factors associated with serovar-specific virulence plasmids in S. enterica. The loss of the spv region eliminates the virulence phenotype of the serovars in their animal hosts and frequently in the mouse model. Introducing a pSV (Salmonella virulence plasmid) into a serovar naturally lacking it does not enhance the virulence properties of the strain, which implies that other chromosomally encoded factors are essential for the virulence phenotype19. The low presence rate of the spv locus in Salmonella shown in this study is consistent with earlier research indicating that only a small fraction of Salmonella serovars contain this virulence operon34,35. Another operon that is missing from the predominant virulence profile of S. I,4,[5],12:i:- and has a low presence rate in both serovars is the pef (plasmid-encoded fimbriae) operon, which is responsible for the adhesion of Salmonella spp. to the surface of various cell lines15,36. Since most plasmids impose fitness costs on their hosts, the loss of the plasmid-encoded VFs in Salmonella isolates may have evolutionary advantages that have resulted in the emergence of S. I,4,[5],12:i:- over the past decade. Also, the genes located on SGI-1, a genomic island containing an antibiotic resistance gene cluster, are missing in 99% of the S. I,4,[5],12:i:- isolates and are present in only around 31% of S. Typhimurium isolates. Other VFs that have lower presence rates in S. I,4,[5],12:i:- include fljA, rck, and traT.
fljA encodes an inhibitor of fliC, which encodes the phase 1 flagellin protein, FliC, that is important for flagellar motility and biofilm formation37. This result is consistent with the previous finding that S. I,4,[5],12:i:- is closely related to S. Typhimurium but lacks the expression of fljA and fljB (encoding phase 2 flagellin) common to all Typhimurium isolates24. rck is located close to the pef operon on pSV and encodes a protein that confers resistance to complement killing, can recruit various complement inhibitors to resist attack by the innate immune system, and has been implicated in the invasion of epithelial cells15. traT encodes a 27 kDa protein that imparts weak resistance to serum killing and is a component of the plasmid transfer region15.

The major difference between the VF profiles of S. I,4,[5],12:i:- and S. Saintpaul is the presence of a fimbrial gene cluster (stkABCDEFG) that occurs in about 67% of S. Saintpaul isolates but in only about 3% of S. I,4,[5],12:i:- isolates (Supplemental Table 4). The stk fimbrial operon encodes putative Stk fimbriae and was initially reported to be specific for S. Paratyphi A38. However, subsequent studies revealed the presence of this operon in NTS serovars such as S. Heidelberg and S. Kentucky38. Our results showed that this operon has high presence rates in serovars Hadar, Indiana, and Heidelberg, with a presence rate of more than 99% in S. Hadar and S. Indiana and more than 97% in S. Heidelberg (Supplemental Table 4). The presence rate in the rest of the serovars analyzed in this study was around or less than 1%. The genes that are missing in almost all S. Saintpaul isolates but present in the great majority of S. I,4,[5],12:i:- isolates are the T3SS effector genes sseK1 and sseK3. These genes encode SseK proteins, which are reported to help inhibit antibacterial and inflammatory host responses15,39. While the presence rate of sseK1 and sseK3 in S.
Saintpaul is only about 3%, another gene variant, sseK2, is detected in more than 99% of isolates in all serovars analyzed in this study, including S. Saintpaul (Supplemental Table 4). Across all the isolates analyzed in this study, the presence rates of sseK1 and sseK3 are concordant: both are either more than 99% or less than 3% in a given serovar. This pattern is consistent with the collaborative inhibition of the NF-κB signaling pathway by SseK1 and SseK3 during Salmonella infection in macrophages19. Although SseK2 can inhibit TNF-α-induced NF-κB reporter activation, its impact on the NF-κB pathway during Salmonella infection in macrophages is minimal15. Further research is needed to explore the role SseK2 plays in Salmonella virulence, considering its high prevalence in S. enterica strains. The other two genes, gogB and STM2585, encode T3SS effectors that are involved in the inflammatory response40.

The differences in the major VF profiles between S. Enteritidis and S. Dublin are due to 12 genes (Table 5). The plasmid-encoded fimbriae genes (pefACD) and the resistance to complement killing gene (rck) are missing from more than 99% of the S. Dublin isolates. Conversely, the genes sciR, sciS, tssA, and xis are missing from the majority of S. Enteritidis isolates but present in greater than 97% of S. Dublin isolates. The genes sciR, sciS, and tssA encode components of a T6SS, which is a contact-dependent contractile apparatus that contributes to Salmonella competition with the host microbiota and its interaction with infected host cells41,42,43,44. These findings indicate that these S. Enteritidis isolates lack the T6SS, which is consistent with the previous finding that several genomic islands appear absent or degenerate in S. Enteritidis45.

Salmonella virulence systems are very complex, as many genes contribute to virulence.
Numerous VFs, including adhesion molecules, invasins, lipopolysaccharides, polysaccharide capsules, iron acquisition factors, host defense-subverting mechanisms, and toxins, have been identified in Salmonella, and these VFs play different roles during infection to enable the bacterial cells to colonize the host, disseminate, and cause disease15. The difference in the presence/absence of the virulence genes in each isolate/serovar might indicate their relative virulence to humans or other animal species. Therefore, the development of enhanced tools to identify Salmonella VFs can help to predict virulence potential and explain the observed clinical disparities in disease pathogenesis, which is important to understand risks associated with different Salmonella genotypes.
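
The serovar-level comparisons described above (presence rates per serovar, and genes found in more than 97% of isolates of one serovar but in less than 5% of the others) can be illustrated with a short Python sketch. This is not the implementation of the VF Profile Comparison tool, which is not described at the code level here. The function names, the input layout, the toy isolates and the placeholder gene names (gene_a, gene_b, gene_c) are all hypothetical, and applying the 5% cut-off to each other serovar individually, rather than to the pooled isolates, is an assumption.

```python
from collections import defaultdict


def prevalence_by_serovar(profiles):
    """Compute per-serovar presence rates from presence/absence calls.

    profiles: iterable of (serovar, set_of_detected_genes) pairs, one per isolate
              (hypothetical layout; equivalent to rows of a binary matrix).
    Returns {serovar: {gene: fraction of that serovar's isolates with the gene}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for serovar, genes in profiles:
        totals[serovar] += 1
        for gene in genes:
            counts[serovar][gene] += 1
    return {
        serovar: {gene: n / totals[serovar] for gene, n in counts[serovar].items()}
        for serovar in totals
    }


def serovar_markers(prevalence, target, min_target=0.97, max_other=0.05):
    """List genes present in more than min_target of the target serovar's isolates
    and in less than max_other of the isolates of every other serovar.

    Assumption: the upper cut-off is checked serovar by serovar, not on pooled data.
    """
    others = [s for s in prevalence if s != target]
    markers = []
    for gene, rate in prevalence[target].items():
        rare_elsewhere = all(
            prevalence[other].get(gene, 0.0) < max_other for other in others
        )
        if rate > min_target and rare_elsewhere:
            markers.append(gene)
    return sorted(markers)


# Toy input with placeholder gene names; these are not real isolates or results.
profiles = [
    ("Typhi", {"gene_a", "gene_b"}),
    ("Typhi", {"gene_a", "gene_b"}),
    ("Typhimurium", {"gene_b", "gene_c"}),
    ("Typhimurium", {"gene_b"}),
]

prev = prevalence_by_serovar(profiles)
markers = serovar_markers(prev, "Typhi")
print(markers)
```

In this toy input only gene_a satisfies both cut-offs for Typhi, because gene_b is also carried by every Typhimurium isolate. With real data, the same two functions could be run on the binary matrix exported from the comparison tool after converting each row into a set of detected genes.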
