Target control of linear directed networks based on the path cover problem

Theoretical resultsIn this work, we propose a novel algorithm for target controllability. The SimpleTarget algorithm introduces several innovations compared to that of the standard maximum matching-based algorithm1, particularly tailored for achieving target controllability in networks. One distinctive feature is the inclusion of self-loops: SimpleTarget adds self-loops to all non-target nodes during a preprocessing step. Unlike the maximum matching-based algorithm1 that focus on controlling entire networks, SimpleTarget specifically identifies the minimum number of driver nodes required for target controllability. This approach not only relies on a simple implementation but also operates efficiently in polynomial time, making it a practical solution for target controllability.In this section, we summarize our theoretical findings, comprising a mathematical theorem (see Methods section for details). This theorem furnishes a rigorous mathematical proof for the correctness and computational complexity of the proposed algorithm. Our algorithm was developed based on the concept of the path cover problem. Before presenting the path cover problem definition, we will introduce the necessary mathematical notation.Let G (V, E) be a directed graph where V is a set of nodes and E is a set of directed edges. A sequence of nodes \(\left( {v_{{i_{1} }} ,v_{{i_{2} }} , \ldots ,v_{{i_{k} }} } \right)\) is called a path if \(v_{{i_{p} }} \ne v_{{i_{q} }}\) holds for all \(p \ne q\) and \(\left( {v_{{i_{p} }} , v_{{i_{p + 1} }} } \right)\) ∈ E holds for all \(p = 1, \ldots , k – 1.\) Similarly, a sequence of nodes \(\left( {v_{{i_{1} }} ,v_{{i_{2} }} , \ldots ,v_{{i_{k} }} , v_{{i_{1} }} } \right)\) is called a cycle if \(v_{{i_{p} }} \ne v_{{i_{p + 1} }}\) holds for all \(p \ne q, _{ }\) \(\left( {v_{{i_{p} }} , v_{{i_{p + 1} }} } \right)\) ∈ E holds for all \(p = 1, . .., k – 1_{ }\), and \(\left( {v_{{i_{k} }} , v_{{i_{k + 1} }} } \right)\) ∈ E holds. Then, the path cover problem is defined as below22.
Definition 1 (Path Cover Problem)
Given a directed graph G(V, E) and a set of target node S ⊆ V , find a set of disjoint cycles and paths with the minimum number of paths that include all nodes in S.
Note that initial nodes in distinct paths should be controlled by distinct external nodes whereas all cycles can be controlled by one external node, as assumed in Refs. 1,20,22. An example of the path cover problem is shown in Fig. 1. The network consists of eight target nodes that can be controlled by two external control nodes \(u_{1}\) and \(u_{2} .\) In this particular example, the target nodes are covered by a combination of two cycles and two paths. Following the methodology described in Ref.1, we assume that all cycles and one path can be controlled by a single external node. Consequently, in this example we only require two external control nodes \(u_{1}\) and \(u_{2} .\)Figure 1Example of the path cover problem. In this figure, filled (red) nodes represent target nodes and dotted nodes (i.e., \(u_{1}\), and \(u_{2}\)) represent external control nodes. In this case, target nodes are covered by two cycles and two paths. As in Refs. 1,20,22, it is assumed that all cycles and one path are controlled by one external node. Therefore, we only need two external control nodes.By utilizing the aforementioned path cover concept, we have devised a novel algorithm named SimpleTarget, which offers a simpler and efficient solution to the target structural controllability problem. The full details of the SimpleTarget algorithm are outlined in the Methods section. Then, by combining this path cover concept with the maximum matching-based controllability approach1, we establish the following theorem:
Theorem 1
Algorithm SimpleTarget solves the target structural controllability problem (in the sense of the path cover problem shown in  Ref.22) in polynomial time.
A detailed proof of the theorem can be found in the Methods section.We have computed the simulation results (state evolution trajectories) to show that the network can indeed be target controlled. This demonstrates that the nodes identified by the proposed SimpleTarget algorithm are the ones that drive the target node states to the desired final state. The detailed analysis is provided in the Supplementary Information file (Supplementary Fig. S1).Computational results derived from data analysisMetrics to evaluate target controlGao et al. primarily focused their analysis of target controllability on artificially generated networks, rather than on experimental biological networks20. Consequently, they adopted two distinct methodologies for selecting a set of target nodes. In one approach, they randomly selected a fraction f of nodes. In a different scenario, they implemented a so-called local schema, which still involved random selection of a fraction f of nodes, but with the additional condition that these nodes must be adjacent. Here f is defined as f = NT/N where NT is the number of target nodes and N the total number of nodes. To gauge the efficiency of target control for a specific fraction f, they introduced the target controllability parameter, denoted as \(\alpha_{D}\). This parameter is defined as \(\alpha_{D}\) = PD/ND, where PD represents the minimum number of driver nodes required to control a fraction f of target nodes, while ND represents the minimum number of driver nodes needed to control the entire network20. As such, \(\alpha_{D}\) serves as a measure of how efficiently target control operates in comparison to full control over the entire network. It quantifies the ratio of driver nodes needed for targeted control to those needed for controlling the entire network. This parameter was employed to evaluate both random and local control schemas in the analyzed scale-free (SF) and Erdős-Rényi (ER) networks 20. In our data analysis, we adopt \(\alpha_{D}\) to assess target control in biological networks. Additionally, we introduce \(\beta_{D}\), defined as \(\beta_{D}\) = \(\alpha_{D}\)/f, to assess how target control efficiency compares to a neutral expectation. When \(\beta_{D}\) is less than 1, it signifies that target control is more efficient than neutral expectation. If target control operates as efficiently as the neutral expectation (\(\beta_{D}\) = 1), it implies that$$P_{D} = fN_{D} ,\,{\text{and}}\,\alpha_{D} = f.$$In our research, our objective is to apply the target controllability algorithm to real biological networks. Specifically, we analyze neural networks from the C. elegans worm12,24,25 and the Drosophila fly insect26, as well as metabolic networks associated with 70 different plant organisms (further details are available in the Methods section)27,28. To achieve this, we must carefully select sets of target nodes related to specific biological functions or pathways of interest.Results for the analysis of connectome networksThe initial segment of our analysis focuses on connectome networks. The neuronal network of the C. elegans worm is composed of 378 nodes and 5256 directed links, and consists of various functional neuron classes, including motor, interneuron, poly-modal, sensory, and muscle neurons12,24,25. As part of our analysis, we designate each of these classes as a set of target nodes for control purposes. Another dataset is derived from the recently discovered connectome of the larval brain of the Drosophila melanogaster insect, comprising a comprehensive network of 2952 neurons and 110,140 directed links26. Similar to the approach taken with the C. elegans worm, we establish specific sets of target nodes based on the functional neuron classification in this network. These classes are further categorized into three major groups: brain inputs, interneurons, and brain outputs. From a target control perspective, it is intuitive to consider the brain outputs as potentially more desirable for control.In the case of C. elegans, when we consider muscles as the target control set, the SimpleTarget algorithm yields a result of \(\alpha_{D}\) = 0.989 (as shown in Fig. 2). Shifting our focus to the Drosophila brain network, we select brain output neuron classes as the target control sets, resulting in the following outcomes: Ring Gland Neurons (RGN) yield \(\alpha_{D}\) = 0.391, Descending Neurons to Subesophageal Zone (DN-SEZ) result in \(\alpha_{D}\) = 0.043, Descending Neurons to Ventral Nerve Cord (DN-VNC) yield \(\alpha_{D}\) = 0.115 (see Fig. 3). These results suggest that the Drosophila connectome appears to be more optimized for brain output neuron control when compared to the muscle class for C. elegans. When assessed in the context of the neutral expectation, DN-SEZ gives a \(\beta_{D}\) = 0.782, indicating a level of control efficiency below one. In the case of C. elegans, the result for muscles a as target control is \(\beta_{D}\) = 3.857, quite above the neutral efficiency threshold of 1. On the other hand, DN-VNC and RGN neuronal classes generate values exceeding one, with \(\beta_{D}\) = 1.88 and 21.39, respectively. While a precise quantitative comparison between the connectomes of both organisms proves challenging due to their intrinsic biological differences, a qualitative assessment suggests that C. elegans demonstrates superior efficiency in controlling input neurons related to sensory classes compared to Drosophila (see brain inputs and sensory classes in Figs. 2 and 3). Intuitively, it seems reasonable that the \(\alpha_{D}\) is for sensory neurons in Drosophila is large since these neurons primary receive stimuli from the external environment and may not have internal input edges. However, in C. elegans the observation of a small value of \(\alpha_{D}\) can be attributed to an information transfer between sensory-sensory connections, which develop a higher abundance of loops or between sensory nodes and other neurons. The prevalence of loops in this particular class may contribute to the enhanced efficiency of input neuron control observed in C. elegans as compared to Drosophila. In contrast, optimization in interneuron control is observed in both Drosophila and C. elegans, requiring minimal external control nodes (see Fig. 4), which is originated by the existence of loops. Furthermore, as discussed above, when evaluating output neurons, the Drosophila connectome appears to exhibit greater efficiency than its C. elegans counterpart.Figure 2Computed \(\alpha_{D}\) and \(\beta_{D}\) metrics for each functional neuron class considered as controllability target set. Notably, muscle neurons display a discernible trend of being accessible to targeting by a set of driver nodes, which is plausible for real-world applications.Figure 3Computed \(\alpha_{D}\) and \(\beta_{D}\) metrics for each functional neuron class considered as controllability target set in the brain of the fruit fly insect (Drosophila). All classes can be categorized into three main groups: input neurons, interneurons, and output neurons. Many classes exhibit controllability by a single external control node (PD = 1) or few nodes, reflecting the existence of loops in the internal network structure. Notably, all three classes belonging to the output neuron group (RGN, DN-SEZ, DV-VNC) can be target-controlled by a larger set of driver nodes. The dashed red line represents \(\beta_{D}\) = 1.Figure 4Visualization of the entire Drosophila connectome with highlighted target control sets (red) and the driver node set (blue). (a) The target control set of interneurons. (b) The target control set of brain output neurons. Notably, the brain output neurons are mapped into more distributed locations and require more driver nodes. In contrast, interneurons are more interconnected, which favors the presence of loops.The elevated \(\alpha_{D}\) observed in sensory nodes of Drosophila is due to their limited number of input edges. Conversely, in C. elegans, the high observed in muscles \(\alpha_{D}\) is a consequence of their sparse output edges. Although the result, a large \(\alpha_{D}\), is the same, the underlying mechanisms differ between the two scenarios. This nuanced contrast, where a high \(\alpha_{D}\) may arise from either a scarcity of input edges or a shortage of output edges (see Fig. 5), adds an interesting dimension to the comparative analysis.Figure 5In the example, both (a) and (b) have PD = 5, ND = 5, so \(\alpha_{D} =\) 1. However, (a) has output nodes as the target, while (b) has input nodes as the target.Notably, it is intriguing to observe that several target node sets linked to functional categories, such as interneurons, within C. elegans and Drosophila exhibit PD = 1. This observation suggests that these target neuronal sets themselves inherently possess controllability. That is, these subsets of neurons contain loops and, therefore, can be controlled by a single external node.To further investigate the topological features of the identified driver nodes, we compared the mean degree of driver nodes < \(k_{D}\) > to the mean degree of each functional neuron class (i.e., target node set) < \(k_{T}\) > in the C. elegans neuronal network and the Drosophila brain connectome. The results shown in Fig. S2 (Supplementary Information) indicate that relatively low-degree nodes are used to control specific target systems. In other works, the driver nodes tend to avoid hubs.Results for the analysis of metabolic networks from 70 plant speciesBy using plants metabolic networks, we may also set as target nodes specific pathways or enzyme classes27,28. Figure 6 displays the \(\alpha_{D}\) and \(\beta_{D}\) metrics computed for plant metabolic networks. The target sets represent specific functional pathways and the enzymes/reactions associated with them. An intriguing trend is observed, suggesting an evolutionary tendency. The \(\alpha_{D}\) and \(\beta_{D}\) metrics appear to increase as we move from the pathways of eudicots/monocots (more modern) to those of basal plants and green algae (more primitive). Moreover, overall, the values of \(\alpha_{D}\) are much smaller compared to those of connectome networks, suggesting an abundance of loops.Figure 6Computed \(\alpha_{D}\) and \(\beta_{D}\) for plant metabolic networks. The target sets are specific functional pathways and the enzymes/reactions involved in them. Colors denote the four major lineages. It seems there is an evolutionary tendency because \(\alpha_{D}\) and \(\beta_{D}\) metrics tend to increase in most pathways from eudicots/monocots (most modern) plants to basal plants and green algae (more primitive).On the other hand, Fig. 7 shows a scatter plot that illustrates the relationship between the \(\alpha_{D}\) metric and f calculated for the full set of plant metabolic networks. In the left panel, each dot corresponds to a pathway, resulting in a total of 70 × 14 data points, with colors representing the main lineage types. The right panel, on the other hand, uses colors to differentiate the data points based on the functional class of the pathways. While it may not be apparent that plants from major lineage groups follow a distinct pattern (Fig. 7(left), the scatter plot that includes indications for functional pathways reveals a more clustered trend (Fig. 7(right)). In other words, pathways with the same biological functions tend to exhibit similar \(\alpha_{D}\) versus f values. Moreover, most of them tend to be more efficient than the neutral expectation \(\left( {\alpha_{D} < f} \right)\), with some samples of specialized metabolism being exceptions.Figure 7Scattered plot between \(\alpha_{D}\) metric and f computed for plant metabolic networks. (Left) Each dot corresponds to a pathway and there are 70 plants so we have 70 × 14 data points. They are colored by main lineage types. (Right) The dots are colored according to the pathway functional class indicated in legend.To get deeper insights into the network features of the identify driver nodes, we compared the mean degree of driver nodes < \(k_{D}\) > to the mean degree of functional pathways (i.e., target node sets) < \(k_{T}\) > in the analyzed plant metabolic networks. Similar to the results observed in neural networks (Fig. S2), the findings shown in Fig. S3 (Supplementary Information) suggest that high-degree nodes are not commonly selected as driver nodes for controlling specific plant metabolic pathways. This observation becomes clearer when comparing the degree distributions of driver nodes with those of the target nodes associated with energy metabolism. As shown in Fig. S4 (Supplementary Information file), the degree distribution of plant metabolic networks across main lineages follows a power-law distribution. This implies that a small fraction of nodes are highly connected (i.e., hubs). Interestingly, our results show that these nodes are typically not chosen as driver nodes (see Figs. S5–S7 in the Supplementary Information). Instead, most sets of driver nodes are small in size and tend to exhibit low or medium-degree values.

Hot Topics

Related Articles