In the subsequent sections, we present the detailed description of the architectures used in the experiments, selection of problem sizes and benchmark application.Description of the architectures used in the experimentsIn our research, we conducted experiments using a set of diverse computing architectures representative of the hardware commonly found in modern high-performance computing environments. Selected architectures were chosen based on their popularity, computational power, and widespread use in research and industry. Centos 7.9 is the running operating system, and Interconnection is IB (QDR) InfiniBand.The selected architectures include:Central Processing Units (CPUs)CPU architecture consists of a total number of 20 nodes. Each CPU node has specifications as in Table 1.Graphics processing units (GPUs)We employ GPU node in Table 2 with many processing cores and high memory bandwidth to accelerate compute-intensive applications. These GPUs support CUDA and OpenCL programming models, allowing for efficient execution of parallel tasks.Many integrated cores (MIC)We deploy MIC node in Table 3. By integrating multiple cores onto a single chip, MIC processors offer massive parallelism and high computational efficiency, making them valuable tools for tackling complex computational challenges in various domains.Selection of problem sizes and benchmark applicationTo ensure the comprehensiveness and realism of our experiments, we carefully selected a range of problem sizes and benchmark applications that encompass a variety of computational complexities and reflect real-world scenarios.Problem sizesWe chose problem sizes spanning a wide range, from small-scale to large-scale, to represent various computational workloads commonly encountered in high-performance computing environments. Small-scale problems provide insights into the performance of architectures when handling lightweight tasks, while large-scale problems challenge the capabilities of the computing resources for more intensive computations. The problem sizes were carefully chosen to be representative of typical data sizes and task complexities found in scientific simulations, data analytics, and machine learning applications.Benchmark applicationDNA Motif Finding Problem (MFP) was selected as a benchmark application as the number of independent tasks grows exponentially. The motif finding problem is a computational biology challenge focused on identifying short, recurring patterns called motifs within a set of related sequences, such as DNA, RNA, or proteins. These motifs are biologically significant and play essential roles in various cellular processes. The problem involves finding a motif of fixed length that appears in each sequence of the input set with few or no mismatches, subject to additional constraints like the maximum allowed number of errors. In this study we use classical MFP as a bioinformatics problem that represents different computational domains and that leverage the strengths of the architectures under study. By using MFP, we aimed at evaluating the performance of each architecture. In motif finding problem, given a set of sequences, each of the same length, the goal is to find a motif of fixed length (usually relatively short) that occurs in each sequence of the set with few or no mismatches9,10,11,12,13,14.Motif finding algorithms can be categorized into two major groups, exact and approximate solutions10,11,13.
Exact solution15,16,17,18:
Apply exhaustive enumerations.
Guarantee global optimality.
Examples: Brute force, Skip Brute force, Recursive Brute force (R-BF).
Approximate solution:
Based on probability.
Apply some form of local search.
Doesn’t guarantee global optimal solution.
Examples: Gibbs sampling and Expected maximization (EM).
Table 1 Regular CPU-based Compute Node.Table 2 NVIDIA CUDA compute Node.Table 3 Xeon Phi Compute Node.Key components of the motif finding problema. Input: The input to the motif finding problem consists of a set of sequences (\(\:S\)), usually represented as strings of (N) characters over an alphabet (A, C, G, T for DNA sequences or A, C, G, U for RNA sequences, etc.), permitted mutation (d) and the desired motif length (L). The motif length indicates the number of characters (nucleotides or amino acids) that the motif should consist of.b. Output: The output of the motif finding problem is the discovered motif, which is a string of characters of the specified length (L) that appears as a subsequence in every sequence of the input set (\(\:S\)), allowing for a certain number of (d) mismatches or errors (permitted mutation).
Accordingly, MFP has 5 dimensions D1, D2, D3, D4 and D5:
• \(\:{D}_{1}\) is a set of sequences (Input sequence number) (\(\:S\)): This parameter represents the total number of sequences in the input dataset. As the number of sequences increases, the search space for potential motifs also grows, making the problem more complex and computationally demanding.• \(\:{D}_{2}\) is the length of each string (Input sequence length) (N): The sequence length refers to the number of characters (e.g., nucleotides or amino acids) in each individual sequence. Longer sequences can increase the problem size, as the potential number of motif occurrences grows with the sequence length.• \(\:{D}_{3}\) is a Motif length (L): The motif length is the number of characters in the motif pattern that algorithm aims to discover. Larger motif length generally increases the computational challenge, as longer motifs are less likely to occur frequently in the sequences.• \(\:{D}_{4}\) is permitted mutation (d): Larger permitted mutation generally increases the computational challenge. This parameter has no effect in case of using Brute Force, where all \(\:\left({4}^{{D}_{3}}\right)\) Lmers are extracted.• \(\:{D}_{5}\) is a set of DNA alphabet (A, C, G, T)c. Scoring: In some variants of the problem, scoring functions are used to quantify the quality of a candidate motif. The objective is to find the motif that maximizes the scoring function.d. Computational Complexity: The motif finding problem is NP-hard, which means that finding an exact solution for large input sets is computationally infeasible in polynomial time19,20,21,22,23,24,25.In this research, Brute force is applied, as an exact solution, to solve the Motif finding problem with the following dimensions: \(\:{D}_{1}=20\:\)strings, \(\:{D}_{2}=600\:\)characters/string, \(\:{D}_{3}=15\) characters/motif, \(\:{D}_{4}=4\) permitted mutations and \(\:{D}_{5}=4\) in case of DNA alphabet (A, C, G, T) set. Brute force extracts all \(\:\left({D}_{5}^{{D}_{3}}\right)\) that results in (415) possible L-mers to be compared with all extracted motifs of each string sequence to find the motif that optimizes the scoring function.Accordingly, MFP solver has a total number of comparison operations as in Eq. (1):$$Total{\text{ }}Number{\text{ }}of{\text{ }}comparison{\text{ }}operations{\text{ }}=~\,\left( {D_{5}^{{{D_3}}}} \right)\,\, * {D_1}\,\, * \,\left( {{D_2} – {D_3}\,+\,1} \right)$$
(1)
$$= \left( {4^{{15}} } \right)\, * \,20\, * \,\left( {600 – 15 + 1} \right)$$Where \(\:{D}_{3}=15\) characters per each L-mer, \(\:{D}_{2}=600\:\) characters per each string, and \(\:{D}_{1}=20\:\) stringsReal-world relevanceThe selected benchmark application has real-world relevance and is widely used in research and industry. It is used to evaluate the performance of high-performance computing systems. Using this widely recognized benchmark application allows us to draw meaningful comparisons with other studies and assess the practical implications of our architecture-aware scheduling approach.Replicability and comparabilityThe choice of problem sizes and benchmark application ensures that our experiments are replicable by other researchers, enabling direct comparisons and validation of our findings. Replicability is crucial for establishing the reliability and generalizability of our results.In the following section, we present measurement results for each selected problem size to deduce the minimum sample size of tasks and actual execution time of a single task for each architecture. The collected data serves as the foundation for our architecture-aware scheduling approach, enabling us to optimize resource utilization and minimize job completion times for large-scale problems in high-performance computing environments. The comprehensive evaluation of the proposed approach against the selected benchmark provides valuable insights into its effectiveness and practicality for real-world applications.