Identifying vital nodes for yeast network by dynamic network entropy | BMC Bioinformatics

Specific network time division based on time series plateau intervals

The time series of the 8 selected genes are segmented into plateau intervals. With a p value threshold of 0.01, the time-series data set is segmented into 7 plateaus. Because [60, 80] is merged with [80, 100] into a single plateau interval, gene expression over [60, 80] is treated as part of [60, 100]. The time course is therefore divided into seven periods: [0, 20], [20, 40], [40, 60], [60, 100], [100, 120], [120, 140], and [140, 160]. The activity rates of the genes stabilized between 60 and 90% within each plateau interval (Fig. 2C), indicating that all of the partitioned plateaus satisfied the requirements. The genes expressed in each of the seven periods are shown in red in Fig. 2D. Because the cell-cycle stage at the start of the experiment is not known, the cell-cycle phase of each period cannot be determined directly. In the three intervals [0, 20], [20, 40], and [40, 60], the active genes are mainly CLN1, CLN2, and CDC28, suggesting that these periods probably span the G1 phase to the S phase [30]. In [60, 100] and [100, 120], regulation is complex and almost all genes are expressed. CDC5 expression in these two intervals indicates that they fall in the S and G2 phases [22], and because SWI5 is not expressed in them, they likely correspond to the S phase through the early-to-middle G2 phase [30]. In [120, 140] and [140, 160], CDH1, SWI5, and SIC1 are more strongly expressed, suggesting the G2 phase, M phase, and early G1 phase [31].

K2 algorithm to construct the specific network

Based on the plateau-interval segmentation of the expression data for CLN1, CLN2, CDC28, SWE1, CDC5, CDH1, SWI5, and SIC1, the corresponding programs were written with the Bayes Net Toolbox (BNT) for MATLAB.
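The improvements made to K2 and the way expression values were discretised are not described in this section, so what follows is only a minimal Python sketch of the standard K2 greedy search with the Cooper-Herskovits score, not the authors' MATLAB/BNT implementation. It assumes that expression has already been discretised (for example into active/inactive states), that a gene ordering is given, and that each gene has at most `max_parents` parents; the function names are hypothetical. Running it separately on the samples of each plateau interval would be one way to obtain a time-specific network, but that is an assumption rather than a detail stated in the text.

```python
import math
from collections import Counter, defaultdict


def k2_log_score(data, child, parents, arity):
    """Log Cooper-Herskovits (K2) score of `child` given a candidate parent set.

    data:  list of rows, each a dict mapping gene -> discrete state 0..arity-1
    arity: dict mapping gene -> number of discrete states r_i
    """
    r = arity[child]
    # Count child states N_ijk for every observed parent configuration j.
    counts = defaultdict(Counter)
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[config][row[child]] += 1
    score = 0.0
    for child_counts in counts.values():  # unobserved configurations add 0
        n_ij = sum(child_counts.values())
        # log[(r - 1)! / (N_ij + r - 1)!] + sum_k log(N_ijk!)
        score += math.lgamma(r) - math.lgamma(n_ij + r)
        score += sum(math.lgamma(n + 1) for n in child_counts.values())
    return score


def k2(data, order, arity, max_parents):
    """Greedy K2 search: parents of a gene must precede it in `order`."""
    parents = {}
    for idx, child in enumerate(order):
        current = []
        best = k2_log_score(data, child, current, arity)
        while len(current) < max_parents:
            candidates = [c for c in order[:idx] if c not in current]
            if not candidates:
                break
            # Add the single predecessor that raises the score the most.
            new_score, new_parent = max(
                (k2_log_score(data, child, current + [c], arity), c) for c in candidates
            )
            if new_score <= best:
                break  # no improvement: stop adding parents for this gene
            best = new_score
            current.append(new_parent)
        parents[child] = current
    return parents
```

Because K2 only considers parents that come earlier in the ordering, the learned structure depends on the chosen gene order; this is a general property of K2, not something specific to this study.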
The time-specific network is shown in Fig. 2D. The period 0–60 min corresponds mainly to the mid-to-late G1 phase. This process mainly involves activation of the cyclin-dependent kinase CDC28 by the G1 cyclins CLN1 and CLN2 and the accumulation of CDC28-associated proteins. Once CDC28 activity passes a certain threshold, the genes it regulates are activated, promoting transcription of CLN1, CLN2, and other genes required for progression into S phase. At the same time, CLN1 and CLN2 interact with CDC28 to promote activation of B-type cyclin-associated CDKs, which bind to CDC28 protein and promote the transition of the cell cycle from the G1 phase to the S phase [30, 31]. SWE1 expression also begins in late G1 [22].

The period 60–120 min corresponds mainly to the S phase and early G2 phase, during which multiple genes are continuously expressed. During S phase, SWE1 continues to be expressed and accumulates; the protein is then sequentially hyperphosphorylated, giving rise to multiple isoforms, and undergoes ubiquitin-mediated degradation. Defective septin filament assembly at the bud neck leads to hypophosphorylation and stabilization of SWE1 and, as a result, SWE1-dependent inhibition of CLB-CDC28. In parallel, CDC5-associated genes are expressed and accumulate to a certain level in G2; CDC5-mediated phosphorylation then downregulates SWE1, promoting its efficient degradation and thereby allowing efficient activation of CLB-CDC28 [21, 22].

The period 120–160 min spans mainly the mid-to-late G2 phase and M phase through the early G1 phase. SWI5 begins to be expressed during G2, and its mRNA level peaks in G2/M; the newly synthesized protein enters the nucleus and promotes transcription of SIC1 and many other periodically expressed genes. This produces an M/G1-specific transcriptional burst of SIC1, which encodes a potent inhibitor of B-type cyclin-dependent kinase.
SWI5, SIC1, and CDH1 are subsequently dephosphorylated, leading to inhibition of CDC28 and degradation of cyclins, both required for mitotic exit. SIC1 and APC activities persist through G1, producing a state deficient in B-type cyclin-dependent kinase activity that is required to establish the pre-replication complex on genomic DNA [30]. The network construction process therefore coincides with the known progression of the cell cycle, which supports the validity of our method to some extent.

The resulting network, constructed by the improved K2 algorithm, is shown in Fig. 2E and contains 34 regulatory relationships in total. Using the protein interaction relationships in KEGG and the corresponding literature as prior information, and fusing them with the results of EVEX text mining, we obtained a deterministic reference network (Fig. 3C) [32, 33]. Comparison with this reference network shows that 17 of the inferred regulatory relationships have been confirmed by biological experiments, while the remaining 17 have not, giving an accuracy of 50% (17/34). This is higher than the accuracies of the REVEAL algorithm (36%) [33] and the DBCMC algorithm (29%) [34], indicating that the proposed method is effective.

Fig. 3 A Normality test results for gene expression. B Specific network after selection by network entropy. C Network from KEGG, EVEX, and the relevant literature. D Heat map of correlation coefficients between the expression levels of individual genes. E Comparison of relative errors of the simulated networks

Selecting networks using network entropy

First, gene expression levels are tested for normality. From the normality test results (Fig. 3A), we selected all data with p values greater than 0.05 (95% confidence level); that is, all selected data are consistent with a normal distribution. We then calculate the entropy of each gene using Eq. (1) and select the specific network following the specificity network screening algorithm. Based on the number of unproven edges in the network, we divide these edges into two groups. The first group contains eight uncertain edges, with threshold a set to 0.3 and threshold b set to 100. The second group contains the other nine edges, with threshold a set to 0.3 and threshold b set to 200. The specific network after selection is shown in Fig. 3B. After network entropy selection, eight relationships are added: CDH1 regulates CLN2, CLN1 regulates SWE1, CDC28 regulates SWE1, CDH1 regulates SWE1, CLN1 regulates SWI5, SWE1 regulates SWI5, CDC5 regulates SIC1, and SWE1 regulates SIC1. Among them, Skotheim et al. [35] demonstrated that CDH1 mutations can partially rescue the G2 arrest of CLN1/CLN2 double mutants, indicating that CDH1 may regulate CLN2. Ahn et al. [36] demonstrated that, with wild-type CDC28, filamentous growth induced by CLN1 overexpression is significantly reduced in SWE1-deficient cells, suggesting a regulatory relationship among CDC28, SWE1, and CLN1. The other relationships have not been experimentally verified, so they cannot yet be confirmed. These results indicate that the network selected by our method is biologically meaningful and can help elucidate gene regulatory relationships.

Numerical simulation by partial least squares (PLS) of the selected specific network

To verify that the selected specific network is mathematically sound, the resulting network is simulated by partial least squares (PLS).
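The text does not state the number of PLS components, whether the error is computed on the fitting data, or the exact relative-error formula, so the following is a minimal pure-Python sketch under explicit assumptions rather than the authors' code. Each gene is regressed only on its upstream genes in the selected network (so every other coefficient β_j is 0, as in the regression equation of this subsection), using single-response PLS (PLS1) fitted by the NIPALS algorithm. The data are mean-centred, which adds an intercept that the equation omits; the number of components defaults to the number of upstream genes; and the network error is taken, as an assumption, to be the in-sample mean of |x̂ − x| / |x| averaged over regulated genes. All function names are hypothetical.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with NIPALS.

    X: list of samples, each a list of upstream-gene expression values.
    y: list of target-gene expression values, one per sample.
    """
    n, p = len(X), len(X[0])
    x_mean = [_mean([row[j] for row in X]) for j in range(p)]
    y_mean = _mean(y)
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred predictors
    f = [v - y_mean for v in y]                                 # centred response
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # w ∝ E^T f
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loads = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * loads[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                                  # deflate y
        components.append((w, loads, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict one target value by replaying the centring and deflation steps."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, loads, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, loads)]
    return y_hat


def mean_relative_error(data, upstream, gene, n_components=None):
    """Mean |x_hat - x| / |x| for `gene` regressed on its upstream genes only."""
    ups = upstream[gene]
    k = n_components or len(ups)
    X = [[data[u][s] for u in ups] for s in range(len(data[gene]))]
    model = pls1_fit(X, data[gene], k)
    return _mean([abs(pls1_predict(model, row) - v) / abs(v)
                  for row, v in zip(X, data[gene])])


def network_relative_error(data, upstream):
    """Average the per-gene errors over genes that have at least one upstream gene."""
    return _mean([mean_relative_error(data, upstream, g) for g in upstream if upstream[g]])


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy network C <- {A, B} with normally distributed expression.
    a = [random.gauss(10, 1) for _ in range(50)]
    b = [random.gauss(10, 1) for _ in range(50)]
    c = [0.5 * x + 0.3 * y + random.gauss(0, 0.1) for x, y in zip(a, b)]
    data = {"A": a, "B": b, "C": c}
    print(network_relative_error(data, {"A": [], "B": [], "C": ["A", "B"]}))
```

With as many components as linearly independent upstream genes, PLS1 reproduces the ordinary least-squares fit; fewer components give a shrunken fit, which is the usual reason to prefer PLS when upstream genes are strongly correlated. The relative error assumes expression values stay well away from zero.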
PLS relates the expression of each gene to that of the other genes through a linear equation:

$$x_{i}(t) = \beta_{1} x_{1}(t) + \beta_{2} x_{2}(t) + \ldots + \beta_{i-1} x_{i-1}(t) + \beta_{i+1} x_{i+1}(t) + \ldots + \beta_{n} x_{n}(t)$$

where $x_{i}(t)$ is the expression level of gene $i$ at time $t$; $\beta_{j}$ is the coefficient of gene $j$, which is 0 if gene $j$ is not upstream of gene $i$; and $n$ is the total number of genes in the specific network. Because gene expression levels follow a normal distribution, we generate a data set from a normal distribution, feed it into the established PLS model, and average the relative errors of the genes to obtain the network error. Comparing this error with that of the standard network (Fig. 3C) shows that the entropy-screened network has a lower relative error than the other two networks (Fig. 3E), implying that the screened network is more robust than the other two.

Node sensitivity ranking based on network entropy

For the screened network, we use network entropy to rank genes by sensitivity. The greater a network's entropy, the lower its stability. The node whose removal reduces the network entropy the most is the most sensitive; equivalently, the more the network entropy increases when a gene is included, the more sensitive that gene is. The network genes are sorted by a sensitive-gene ranking algorithm for the specific network, in which genes are deleted one at a time as the specific network evolves based on network entropy (Fig. 4A). Starting from gene 1, each gene is deleted in turn while the rest are retained, and the network entropy is recalculated each time; the ranking results are shown in Fig. 4B.
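Eq. (1) is not reproduced in this section, so the sketch below takes the network entropy function as a parameter. Purely as a hypothetical stand-in, it uses the Shannon entropy of the normalised degree distribution; Eq. (1) would be substituted in practice. Following the deletion procedure just described, each gene is removed in turn while the others are kept, the entropy is recomputed, and the entropy drop is taken as that gene's sensitivity, so the gene whose removal lowers the entropy most ranks as most sensitive. The names `degree_entropy` and `rank_by_deletion` are hypothetical.

```python
import math


def degree_entropy(nodes, edges):
    """HYPOTHETICAL stand-in for the paper's Eq. (1): Shannon entropy of the
    normalised degree distribution (in-degree + out-degree) of the network."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = sum(degree.values())
    if total == 0:
        return 0.0
    return -sum(d / total * math.log(d / total) for d in degree.values() if d > 0)


def rank_by_deletion(nodes, edges, entropy=degree_entropy):
    """Delete each node in turn (keeping all others), recompute the entropy, and
    return nodes sorted from least to most sensitive plus the entropy drops."""
    base = entropy(nodes, edges)
    drops = {}
    for n in nodes:
        rest = [m for m in nodes if m != n]
        kept = [(u, v) for u, v in edges if n not in (u, v)]  # drop edges touching n
        drops[n] = base - entropy(rest, kept)  # larger drop = more sensitive
    return sorted(nodes, key=lambda n: drops[n]), drops
```

Sorting by the entropy drop returns genes from least to most sensitive, the same direction as the ranking reported in this section; any other entropy definition can be passed through the `entropy` argument without changing the deletion loop.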
From these experiments, the genes ranked from least to most sensitive are CDC5, SWI5, SIC1, CDC5, CLN1, CDC28, and CLN1.

Fig. 4 A Evolution of the specific network based on network entropy. B Results of gene sensitivity ranking based on network entropy

It follows that, to inhibit the activity of this network, the gene CDH1 should be repressed first, which minimizes the entropy of this specific network. CDH1 activates the anaphase-promoting complex/cyclosome (APC/C) in late mitosis and serves as an antagonist of the spindle checkpoint, directing the ubiquitination of cell cycle proteins and thereby driving mitotic exit. Its specific substrates include CDC20p, ASE1p, CIN8p, FIN1p, and CLB5p [15, 37,38,39]. CDH1 plays a crucial role throughout the cell cycle, which supports our results to some extent.
