In silico comparative analysis of cestode and human NPC1 provides insights for ezetimibe repurposing to visceral cestodiases treatment

Identification of NPC1 orthologs annotated in public databasesParalog NPC1 sequences from Echinococcus multilocularis (EmuJ_001107700 and EmuJ_001107950) and from Hymenolepis microstoma (HmN_003003490 and HmN_003003500), previously annotated as such, were downloaded from the WormBase ParaSite 15.0, and were used as probes to search for orthologs from other tapeworms. Two ortholog NPC1 sequences were recovered for Mesocestoides corti (MCU_005849_RA and MCU_005850_RB), E. granulosus (EgrG_001107700 and EgrG_001107950), Echinococcus canadensis (EcG7_07448 and EcG7_07485), Taenia multicepis (Tm5G011420 and Tm5G011422), Taenia saginata (TSAs00030g04723 and TSAs00030g04725) and Hymenolepis diminuta (HDID_0000769801 and HDID_0000850401). Three ortholog NPC1 sequences were recovered for Taenia asiatica (TASs00067g06239, TASs00067g06241 and TASs01471g12258) and Taenia solium (TsM_000003800, TsM_000245600 and TsM_001088500).All probe and recovered tapeworm NPC1 ortholog sequences were revised based on available RNA-seq data. As shown in Fig. 1, most of the tapeworm NPC1 sequences presented lengths between 878 to 1392 aa. However, eight of them (EcG7_07448, TASs00067g06239, TASs00067g06241, TASs01471g12258, Tm5G011420, TSAs00030g04725, TsM_000245600, TsM_001088500) presented noticeable differences in length and/or in domain composition. Among these orthologs, four (TASs00067g06239, TASs01471g12258, TsM_000245600, and TsM_001088500) missed some domains found in all other NPC1 ortholog sequences, with two of them (namely TASs00067g06239, and TsM_001088500) being shorter than expected, with lengths from 494 to 534 aa, respectively. These four NPC1 genes have their corresponding nucleotide sequences in the end of the assembled contigs, being likely incomplete, and, therefore, were excluded from further analyses. On the other hand, the other four (namely EcG7_07448, TASs00067g06241, Tm5G011420 and TSAs00030g04725) were longer than expected, with lengths from 1679 to 1795 aa. These longer orthologs, presented extra domains in their N-terminal ends, corresponding to domains found in a protein encoding gene located upstream to NPC1 gene in other related species. Assuming that this was due to misannotation, the sequence from the extra domains were excluded, while the remaining (corrected) sequences of these four longer orthologs were kept for further analyses (see Fig. 1).Fig. 1Domain composition of cestodes and human NPC1 orthologs. Full-length proteins after correction based on RNA-seq data are represented by horizontal black lines, with indication of their N-terminal and C-terminal amino acids; original amino acid numbering prior to the correction is shown between angle brackets, for those proteins (marked by an asterisk) that required correction due to the presence of extra domains in the N-terminus of the sequence (see text). Predicted InterPro matches are represented by colored bars, with NPC1_N domains shown in blue; SSD domains shown in green; NPC1-like family (IPR004765) in red; and protein patched/dispatched family (IPR003392) shown in magenta. Corresponding WormBase IDs are listed on the right. Domain representations marked with asterisk (*) indicate the presence of extra N-terminal domains, which were excluded from our analyses.Cestode NPC1 ortholog sequences, along with reference human NPC1 and NPC1L1 proteins (O15118 and Q9UHC9, respectively), comprehending an overall set of 20 NPC1 sequences, were functionally analyzed for family and domain predictions, and the generated results are summarized in Fig. 1. Two characteristic domains of NPC1, namely the NTD (NPC1_N, Pfam: PF16414) and the SSD (Pfam: PF12349), were predicted for all assessed orthologs. These results suggest that the 18 cestode sequences may be true NPC1 orthologs providing reliability to our initial dataset. Further corroborating that, subcellular localization analyses (see Supplementary Table S1) performed for NPC1A and NPC1B cestode orthologs have essentially predicted their association to the cytoplasmic membrane, as expected for true cholesterol transporting proteins.In line with the functional predictions, a local NPC1 database with 3114 proteins was built based on the presence of the NTD for following evolutive analyses. This NPC1 database was filtered, keeping the sequences with lengths between 1000 and 1500 amino acids, and with ≥ 30% identity with at least one of the reference NPC1 proteins (i.e., those from E. multilocularis, H. microstoma, or H. sapiens). Overall, a filtered subset of 1142 non-redundant NPC1 sequences was established, representing 15 phyla from Animalia kingdom, and the Plantae and Fungi kingdoms.Phylogenetic analysesThe established filtered subset of 1142 non-redundant NPC1 sequences was aligned based on protein structure information, and the resulting alignment was analyzed and edited to exclude non-informative sites. As shown in the Supplementary Fig. S1, the performed analyses and edition decreased the overall length of the alignment in 79.75%, based on a threshold that excluded sites with less than 90% of informative characters. The final alignment of NPC1 sequences, post exclusion of non-informative sites, was then compared to the original dataset retrieved from HMMER (3114 sequences), as well to the filtered subset of non-redundant NPC1 sequences (1142 sequences). The comparison between the identity profiles of the six reference NPC1 sequences (two from E. multilocularis, two from H. microstoma and two from H. sapiens) (Supplementary Fig. S2) demonstrated the representativeness of the resulting final alignment against the precursor datasets. The established final alignment of NPC1 sequences were then used to generate the corresponding nucleotide retroalignment and they were then used for the subsequent evolutionary analyses.The best fit evolutionary models for our datasets were determined as WAG + I + G + F, for the final amino acid aligned subset, and as GTR + I + G, for the nucleotide retroalignment. The construction of phylogenetic trees was carried out using the maximum likelihood method, and branches with support lower than 0.8 were collapsed. Two good quality trees, one based on amino acid alignments, and one based on nucleotide alignments, were reconstructed. Virtually the same phylogenetic relations, with well resolved clades, were observed in both trees at the kingdom and phylum levels (Supplementary Fig. S3) and a representative overview of these established relationships based on the amino acid tree is shown in Fig. 2. Most of the NPC1 sequences were resolved in well-defined monophyletic clades at the phyla. An apparent monophyletic clade, of basal position in the tree, was formed by NPC1 sequences from the Plantae and Fungi kingdoms and was regarded as an outgroup. Within the large clade comprehending the NPC1 sequences from the Animalia kingdom, two similar paraphyletic clades (regarding lower taxonomic levels) were formed for the Chordata phylum, with the Echinodermata NPC1 sequences appearing basal to both. The organization of Chordata NPC1 sequences (and those from Echinodermata) in two clades separated their NPC1 and NPC1L1 sequences. The two species of Annelida included in our analysis grouped separately and unspecifically with other phyla, likely due to the weak Annelida sampling representation in our dataset.Fig. 2Overall architecture of the NPC1 amino acid based phylogenetic tree. (A) Cladogram showing the overview of the predicted clades at the phylum and kingdom levels. Branches with support value below 0.8 were collapsed. Total numbers of sequences (leaves) in each represented taxon are indicated. Branch lengths are arbitrary. (B) Phylogram, with branch lengths drawn to scale, representing the complete NPC1 amino acid tree topology, with branches colored according to the tree in A.The representation of the phylogenetic relationships within the Platyhelminthes clade can be seen in Fig. 3. Free-living species are more basal in this clade. The trematode species formed a monophyletic clade, but it did not present a well-resolved relationship with the other Platyhelminthes subclades. Two clades were formed for the cestode species, each essentially formed by one of the two paralogous NPC1 sequences found in these species. The cestode groups were named NPC1A and NPC1B as shown in Fig. 3. These results suggest a duplication event in a common ancestor of cestodes, resulting in two paralogous proteins that may undergone some degree of functional specialization.Fig. 3Phylogenetic relationships within the Cestoda clade. The amino acid based phylogeny followed the WAG + I + G + F evolutionary model, and the nucleotide based phylogeny followed the GTR + I + G evolutionary model. Branches with support value below 0.8 were collapsed. The inferred NPC1A and NPC1B subclades are colored accordingly to the legend. WormBase IDs are shown between brackets, and UniProtKB ids between square brackets. Trematoda and free-living cestodes clades relationships are also indicated.Selection pressure analyses NPC1 orthologsIt would be expected that the existence of functional divergence between NPC1A and NPC1B would be the consequence of different selective pressures acting on them since duplication. To investigate this selection pressures, several methods from Datamonkey server were used. The BUSTED method provided evidence of the occurrence of diversifying selection in the cestode NPC1A and NPC1B clades. For both clades the method predicted that at least one site has undergone diversifying selection. To identify which NPC1 sites are under selection, three different statistical methods were used to analyze the whole tree: FUBAR, SLAC and FEL. Overall, 1151 out of 1157 NPC1 sites were predicted as under purifying selection by FUBAR, while the other two methods predicted 1156 sites in this condition, being the remaining ones under neutral selection. No evidence of diversifying selection was pointed out by any of the used methods.The FEL method, which allows the analysis of individual clades, was then used to separately analyze the whole Platyhelminthes clade, as well as the cestode NPC1A and NPC1B subclades. The FEL results are summarized in Fig. 4. In the whole Platyhelminthes clade, the performed FEL analysis predicted one site under diversifying selection, and 944 under purifying selection. In the NPC1A subclade, 6 sites were predicted as under diversifying selection, and 803 sites were predicted as under purifying selection. Finally, in the NPC1B subclade, 9 sites were predicted as under diversifying selection, and 721 were predicted as under purifying selection.Fig. 4Darwinian selection analyses through Datamonkey tools. (A) Representation of whole alignment generated by PROMALS3D to NPC1 sequences. Excluded regions are shown in grey, while retained ones are in red. Final edited alignment used in our analyses, containing only informative sites, is represented below. (B) Semi-circular view of FEL charts, showing synonymous (α) and non-synonymous estimated rates (β) at each site as bars. The plotted line indicates the estimates under the null model (α = β). Estimates above 5 are censored at this value. Predicted type of evolutive selection pressures are also indicated for each site, accordingly to the legend. The FEL charts related to four analyzed groups are represented, namely the whole tree (ALL), Platyhelminthes clade (PLATY), cestode NPC1A subclade (NPC1A) and cestode NPC1B subclade (NPC1B). Alignment retained sites from (A) are represented by the red circle, and predicted domains are shown. Internal connections in the red circle represents coevolving interactions between sites of the alignment, predicted by Bayesian Graphical Model (BGM). SSD, sterol-sensing domain; Ptc/Disp, protein patched/dispatched family. The semi-circular view was generated with Circos 0.69–9 software package (https://circos.ca).The BGM method was used to verify the occurrence of co-envolving sites in the whole NPC1 amino acid tree. As also shown in Fig. 4, 492 pairs of sites were found under coevolution by the performed analysis. However, it was not possible to identify a clear correlation between these sites and the different functional domains predicted for the aligned sequences. The RELAX method was then used to evaluate relaxation of the predicted selective strengths (diversifying and purifying) for NPC1 sequences. In the comparison between the NPC1B subclade against the NPC1A as a reference subclade, the performed analysis indicated relaxation in the selections acting on the NPC1B (K = 1.16, with p = 0.05 and LR = 3.85), indicating that NPC1A is under the effect of stronger selective forces. These two subclades were also individually tested against the trematode clade as reference, but, in both comparisons (NPC1A vs. trematodes, and NPC1B vs. trematodes), there were no statistically significant prediction of relaxation (p < 0.05). The NPC1A subclade was then tested against the whole NPC1 phylogenetic tree as reference, and, again, there was no statistically significant prediction of relaxation (p < 0.05). On the other hand, the comparison of the NPC1B subclade against the whole NPC1 amino acid tree indicated relaxation of the predicted selective strengths acting on NPC1B sequences (K = 0.82, with p = 0.005 and LR = 7.89).Overall, these results suggest that the proteins of the NPC1A and NPC1B subclades are under different selective pressures, corroborating the separation of these two subclades in the NPC1 tree topology and suggesting functional specialization.Comparative structural analyses between E. multilocularis NPC1A and NPC1B, and human NPC1L1To investigate whether the proteins of the NPC1A and NPC1B subclades underwent functional divergence, we performed comparative structural analyses. We took the E. multilocularis NPC1A and NPC1B (henceforth EmNPC1A and EmNPC1B) proteins as references for these structural analyses. Among available structures in PDB, the NPC1L1 structure of Rattus norvegicus (PDB ID: 6V3H21; henceforth RnNPC1L1) was one of the few that presented the whole protein structure, and it was also conveniently experimentally determined in complex with an ezetimibe analog. It is also in agreement with previous results24 that demonstrated that the interactions with Phe532 and Met543 are critical to the high-affinity binding of ezetimibe with NPC1L1. Moreover, it presented enough homology with the EmNPC1A (89% of query cover and 37.75% of identity) and EmNPC1B (94% of query cover and 34.43% of identity), as well as with the human NPC1L1 (henceforth HsNPC1L1; 98% of query cover and 76.36% of identity), being suitable to use as a template for homology modelling with good quality according to MolProbity validation (Supplementary Data 3–5). Overall, the models generated for EmNPC1A and EmNPC1B and for HsNPC1L1, adopted the expected NPC1 folding (Fig. 5). As expected, deviations in the structures were seen in the cytosolic moieties of EmNPC1A and EmNPC1B, which showed a higher degree of disorder. Such deviations were likely caused by two small structural gaps in RnNPC1L1 and a third larger stretch corresponding to an insertion in the E. multilocularis proteins, with fewer amino acids in EmNPC1B (Asn697 to Leu729), and longer and more evident in EmNPC1A (Val686 to Leu758). This additional stretch could be identified in the sequences of all cestode NPC1A and NPC1B orthologs included in our phylogenetic analyses.Fig. 5Homology modelling generated structures. Overall comparison of the structures indicated that all models adopted the expected fold. The cytosolic moieties of E. multilocularis structures were more unstructured than expected, mainly because of an insertion in this region that is consistent through all cestode sequences included in this work. Structure of the R. norvegicus template (RnNPC1L1) and the generated structures through modelling to HsNPC1L1 and EmNPC1A and EmNPC1B are shown in ribbon and surface representations. Ribbon colors indicate the three luminal domains: In blue the N-terminal domain, in green the middle luminal domain and in pink the C-terminal luminal domain. Cytosolic and transmembrane domains are colored in cyano. The membrane region is represented by the dotted highlighted box.To assess the degree of conservation of the cholesterol delivering tunnel among generated models, cross-sections views focusing on this feature were generated (Fig. 6). The tunnel structure was very conserved in HsNPC1L1 model in comparison to the RnNPC1L1 template, with minor local structural variations. The EmNPC1B model presented less conservation, with variations distributed along the whole tunnel structure in comparison to the same template. The HsNPC1L1 and EmNPC1B models, presented both a continuous delivering tunnel, in structures that seem to be functional, with enough inner space to cholesterol transport. The EmNPC1A model, however, presented major changes, with an interruption of the delivering tunnel that could restrain cholesterol transport. MOLEonline was used to measure the tunnel radius in this region (Supplementary Fig. S4), confirming a significant alteration in EmNPC1A (EmNPC1A radius: 1.6 Å, EmNPC1B radius: 3.6 Å, HsNPC1L1 radius: 3.8 Å) This interruption occurred in the same region of the expected ezetimibe binding site and is apparently caused mainly by two amino acid substitutions among the proteins. The first substitution is in the N-terminal domain of the proteins, being a change from an alanine in RnNPC1L1 template and in HsNPC1L1 model (A180 in both), to a serine in EmNPC1B model (S165) and to a phenylalanine in EmNPC1A model (F148). The second one is in the C-terminal luminal domain, being a change from an alanine in RnNPC1L1 template, in the HsNPC1L1 model (A1032 in both) and also in EmNPC1B model (A1066), to a glutamine in the EmNPC1A model (Q1079). Therefore, in both cases EmNPC1A undergoes substitutions of simple and small amino acids for larger and more complex ones, which would explain the observed tunnel interruption. The same substitution pattern was observed for NPC1A and NPC1B of most of the cestodes included in our analyses (data not shown). These results are suggestive that EmNPC1A has undergone some degree of functional divergence when compared to the EmNPC1B, HsNPC1L1 and RnNPC1L1 proteins, possibly losing its ability to transport cholesterol.Fig. 6Conservation of the cholesterol delivering tunnel of NPC1 models. The cross-section visualization of the NPC1 proteins allows us to analyze the tunnel whereby cholesterol molecules are transported through, from the extracellular environment to the plasma membrane. The RnNPC1L1 template and the three modelled NPC1 whole structures are shown in ribbons and surface representation, with the cholesterol delivering tunnel structure showed in detail. The ezetimibe analog present in RnNPC1L1 experimental determined template structure is shown, and its location is indicated in the modelled NPC1 protein structures. HsNPC1L1 and EmNPC1B generated model structures presented an apparently functional delivering tunnel, while EmNPC1A model presented an interruption in it, that may restrain the cholesterol transportation.Molecular docking analyses were than carried out to verify whether ezetimibe could interact with EmNPC1A, EmNPC1B, HsNPC1L1 proteins. The RnNPC1L1 template, with structure already determined in a complex with an ezetimibe analog (Fig. 7A) was used as reference. The HsNPC1L1 docking with ezetimibe produced a very similar interaction pose (Fig. 7B) to that of the RnNPC1L1 template, with an affinity prediction of − 10.189 kcal/mol. Most of the amino acids involved in the interaction with ezetimibe in RnNPC1L1 complex (9 out of 12) were also observed interacting with the ligand in the resulting HsNPC1L1-ezetimibe docked complex, including the interactions with Phe532 and Met543, which have been described as critical to the high-affinity between RnNPC1L1 and ezetimibe.Fig. 7Molecular interactions between ezetimibe and NPC1 models submitted to docking. NPC1 proteins are shown in ribbon representation, with ezetimibe, ezetimibe analog and protein interacting amino acids shown as sticks. (A) Interactions between RnNPC1L1 template, shown in cyano, and the ezetimibe analog, shown in yellow, as available on the PDB databank (PDB ID: 6V3H). Interactions between the NPC1 protein and regions of the analog that are not present in the ezetimibe molecule were omitted. (B) Interactions between HsNPC1L1 protein, shown in blue, and the docked ezetimibe molecule, shown in green (affinity prediction = − 10.189 kcal/mol). Interactions are very conservated in respect to the template (A). (C) Interactions between EmNPC1B protein, shown in pink, and the docked ezetimibe, in green (affinity prediction = − 9.376 kcal/mol). Less interactions were observed in this complex, and it presented some relevant amino acids changes when compared to the template (A) and human (B) complexes. A simplified alignment of the sequence, predicted secondary structures, and interactions showed in the regions depicted in this figure is shown in Supplementary Fig. S5.The docking of EmNPC1B with ezetimibe also resulted in a similar interaction pose (Fig. 7C), with an affinity prediction of − 9.376 kcal/mol. However, fewer ezetimibe-amino acid interactions were observed, but still sharing 7 out of the 12 interactions observed in the complexes with the RnNPC1L1 template or HsNPC1L1. Also, some changes were observed in the interacting amino acid. The interacting amino acids Met543, Ala548 and Asn1022 in the RnNPC1L1 template changed to a Leu510, an Ile515 and an Asp1057 in EmNPC1B-ezetimibe complex, respectively. This could have some impact in the affinity between EmNPC1B and ezetimibe, since, as previously mentioned, the Met543 from the RnNPC1L1 template is known to be critical in the interaction with ezetimibe. Moreover, the also critical RnNPC1L1 Phe532 has no counterpart in the EmNPC1B-ezetimibe complex.It was not possible to perform ezetimibe docking with the EmNPC1A, as its ezetimibe binding site was blocked by the amino acids that interrupt the cholesterol delivering tunnel.

In silico comparative analysis of cestode and human NPC1 provides insights for ezetimibe repurposing to visceral cestodiases treatment

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Multi-output prediction of dose–response curves enables drug repositioning and biomarker discovery

Quick chemistry crossword #055

High-throughput identification of repurposable neuroactive drugs with potent anti-glioblastoma activity

Hot Topics

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Multi-output prediction of dose–response curves enables drug repositioning and biomarker discovery

Related Articles

Balancing Act: Pregnancy and Bipolar Disorder

Cohesion at the cellular level: flexible yet stable

Gut bacteria influence responses to immunotherapy in patients with asbestos related cancer

Quick Links

Must Read

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Multi-output prediction of dose–response curves enables drug repositioning and biomarker discovery

Quick chemistry crossword #055

Popular Articles

Semi-supervised recognition for artificial intelligence assisted pathology image diagnosis

Chemistry wordoku #062 | Puzzle

Multi-output prediction of dose–response curves enables drug repositioning and biomarker discovery