BioPipeline Creator—a user-friendly Java-based GUI for managing and customizing biological data pipelines

The BioPipeline Creator
BioPipeline Creator is based on a client/server architecture (Fig. 1). In this design, tools are installed and configured once, on the BioPipeline Creator server. BioPipeline Creator client users therefore do not have to install tools on their own devices or customize their operating systems in order to use the tools required for their analyses.

Figure 1 Processing mode used in BioPipeline Creator: in this architecture, a message exchange takes place in which data and processing instructions are sent to the server and, once processing is completed, the result is sent back to the client.

An important aspect of this processing methodology is that it can be configured for remote server access, enabling off-site use. This makes its use more flexible since, in exceptional circumstances, users do not need to be physically present on the same local network to use the functionalities of BioPipeline Creator. More comprehensive information about this subject is available in the user guide.

Figure 2 shows the main interface of BioPipeline Creator, which allows users to design a customized version of their pipeline. In the example given, four tools are registered in BioPipeline Creator: NGSReadsTreatment, which removes redundancies in the reads obtained during sequencing; SPAdes, which assembles the genome; Mauve, which sorts the contigs; and Prokka, which annotates the genome.

Figure 2 BioPipeline Creator main window: this screen displays the main tasks performed by the software according to the user settings.

In the “Tools” section, the tools available in the user's repertoire are listed. To use them in the pipeline, the user simply selects them and drags them into the “Pipeline” workflow section, as shown in Fig. 2. In this way, the user has complete control over the order in which the tools are used to process their data.
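
An ordered workflow of this kind can be pictured as a list of steps in which the output format of each tool has to match the input format of the next. The following Java sketch is only an illustration under stated assumptions, not BioPipeline Creator's actual code: the class `PipelineSketch`, the nested `Step` type, the method `findMismatches`, and the single input/output format assigned to each tool (a simplification) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of an ordered pipeline; not BioPipeline Creator's real classes. */
final class PipelineSketch {

    /** One pipeline step with a single assumed input and output format. */
    static final class Step {
        final String tool;
        final String input;
        final String output;

        Step(String tool, String input, String output) {
            this.tool = tool;
            this.input = input;
            this.output = output;
        }
    }

    /** Returns one message for every adjacent pair of steps whose formats do not match. */
    static List<String> findMismatches(List<Step> steps) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i + 1 < steps.size(); i++) {
            Step current = steps.get(i);
            Step next = steps.get(i + 1);
            if (!current.output.equals(next.input)) {
                problems.add(current.tool + " outputs " + current.output
                        + " but " + next.tool + " expects " + next.input);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Order taken from the example in Fig. 2; the formats are simplified assumptions.
        List<Step> ordered = new ArrayList<>();
        ordered.add(new Step("NGSReadsTreatment", "FASTQ", "FASTQ"));
        ordered.add(new Step("SPAdes", "FASTQ", "FASTA"));
        ordered.add(new Step("Mauve", "FASTA", "FASTA"));
        ordered.add(new Step("Prokka", "FASTA", "GFF"));
        System.out.println("Mismatches in Fig. 2 order: " + findMismatches(ordered));

        // Swapping two steps produces a mismatch that a GUI could report to the user.
        List<Step> swapped = new ArrayList<>();
        swapped.add(new Step("Prokka", "FASTA", "GFF"));
        swapped.add(new Step("SPAdes", "FASTQ", "FASTA"));
        System.out.println("Mismatches after swapping: " + findMismatches(swapped));
    }
}
```

Checking only adjacent pairs keeps the sketch linear in the number of steps and mirrors the idea that the user decides the order while the software validates it.
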
If, during processing, a tool produces output that is incompatible with the input of the subsequent step, the user is notified of this output-input incompatibility.

Within the same window, there is a data input area to receive data for processing. The format of the input data can vary depending on the tool selected. In the scenario shown in Fig. 2, a paired-end read library in FASTQ format is to be processed. Note that the user must enter a reference file in the “Reference File” field if the tool requires one.

BioPipeline Creator provides two options for obtaining the results of the analysis: “FullResult” and “LastResult”. The user can thus decide whether to receive the results of all steps, including all intermediate files generated in the constructed pipeline, or only the result of the last tool selected for processing.

Figure 3 illustrates the window for adding tools to the BioPipeline Creator repertoire. In this window, the user can add tools that are installed on their workstation or server to make them available to other users. Here the user can specify the executable file of the tool, e.g. “/opt/SPAdes/bin/spades.py”, along with its parameters and default values. This configuration is saved in the BioPipeline Creator database, so the setup step does not have to be repeated later.

Figure 3 Adding tools to the BioPipeline Creator repertoire: the top screen shows details about adding tools to be available to the user and setting parameters and their values.

Access to the repertoire window for adding new tools to BioPipeline Creator is protected by an encrypted password. This protection aims to maintain the integrity of the information already added by preventing unauthorized access and accidental changes.
When accessing this menu, the user is prompted to enter the administrator password for the tool, which is initially set to “admin” and can be changed by the user when using BioPipeline Creator for the first time. This does not make the rest of BioPipeline Creator inaccessible: a guest user can add parameters and change parameter values without access to the administrator session. However, these added parameters or changed parameter values are valid only for the current project.

On the project creation screen, the user can create a new project in which all added tools can be listed and integrated. The parameters and values entered on the previous screen can also be customized for a specific project. In addition, the user can enter new parameters and values that are specific to the project in question.

An additional feature implemented in BioPipeline Creator is the ability to monitor the progress of the user’s pipeline steps directly from a mobile device via the Telegram messenger. This adds flexibility and eliminates the need for the user to be physically present in front of their device to follow the processing phase of their analysis. Further details can be found in the BioPipeline Creator user manual.

Because the tools within BioPipeline Creator remain unchanged, the results of the analysis are consistent with those obtained by running the tools independently of BioPipeline Creator.
Nonetheless, the introduction of BioPipeline Creator has several advantages: (1) it allows users on a local network to use bioinformatics tools without local installations on each device, and it centralizes software and library management; (2) it enables both beginners and advanced users to create and run custom pipelines; (3) it improves the accessibility of tools for users with limited computer skills but with omics analysis needs; (4) it facilitates long-term analysis customization and reproducibility by allowing new tools to be added and parameter values to be changed for each tool; (5) it provides server configuration options for remote BioPipeline Creator runs, if required; (6) it offers the option to add new tools to a personalized and shareable repertoire, thereby expanding execution and application possibilities within a research group; and (7) it enables convenient monitoring of analysis progress, either through BioPipeline Creator’s graphical user interface or through real-time updates on a smartphone via Telegram. Compared to currently available web-based tools, BioPipeline Creator (8) does not require the user to compete with a long queue of jobs, which reduces the overall time required to complete the analysis, and does not impose storage or processing restrictions, the only limitation being the availability of local resources; and (9) allows the user to edit pipelines, adding new tools, parameters, and parameter values directly to the database via the configuration menu (config). When adding parameters in this way, the user must have the administrator password, which prevents accidents such as the deletion of previously added parameters or tools.
Parameters added to the database can be shared in full with other users, whereas parameters added directly to the current project are valid only for that specific project/user.

It is important to emphasize that the BioPipeline Creator Client is compatible with the Windows, Linux, and macOS operating systems. The server requires a Linux system, and the client can run on simpler devices or workstations with any of the operating systems mentioned, allowing users to keep their operating system without the need for a dual-boot installation or migration to another compatible system.

With BioPipeline Creator, users gain access to a personal toolbox of bioinformatics tools that is robust, easy to use, and versatile. This opens up numerous possibilities for the creation of data analysis pipelines.

Scalability test
In order to evaluate the scalability of running BioPipeline Creator, ten workstations with the client version were used, five of which used the Windows 10 operating system and two of which used the Ubuntu 22.10 operating system. The server, which is responsible for running BioPipeline Creator_Server and processing the pipelines, is equipped with an Intel® Xeon® Silver 4214R CPU @ 2.40 GHz × 24, has a hard disk capacity of 3.5 TB, and runs the Ubuntu 20.04.4 LTS operating system.

The extended Docker version of BioPipeline Creator_Server was used, which allows the number of clients and the number of parallel tools to be customized. These settings can be changed in the “config.properties” file via the parameters “maxClients” and “maxParallelTools”. The aim of the test was to process simultaneously a pipeline that comprised removing duplicate reads from the input dataset (NGSReadsTreatment), assembling the treated reads (SPAdes), and annotating the resulting genome (Prokka).
While the same dataset was used on all available workstations, each pipeline was treated as a separate processing instance to ensure that the repetition of datasets did not affect the test results.

Details regarding the SRA number, genomic library, and dataset size can be found in Tables 1 and 2. The entire test was completed in 52 min, 15 s, and 19 ms.

Table 1 Summary of tools: list of all tools that have been added to the BioPipeline Creator version 1.0 database.

Table 2 List of organisms: the list presents information regarding the validation datasets, such as the SRA number, the dataset size (in the case of paired data, the sum of the two tags), and the genomic library type.

It is worth mentioning that BioPipeline Creator_Server can be installed on any workstation or server, as long as it has the appropriate hardware configuration to run the tools selected by the user. The resource requirements, such as RAM, hard disk space, and processor, vary for each tool, are usually described in the respective manuals, and can be set by the user. Consequently, resource limitations or requirements are not directly related to BioPipeline Creator, as its purpose is to give users the flexibility to aggregate omics tools and create personalized pipelines. If BioPipeline Creator_Client is used in a local network, it can also be installed on other users’ machines, simplifying access to the repertoire tools without the need to install the tools separately on each local machine.

To determine the minimum hardware requirements for running the BioPipeline Creator Client, tests were performed on a laptop with the following configuration: Intel i5 processor, 4 GB RAM, 320 GB disk space, and either Windows 10+ or Linux as the operating system.
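
The “maxClients” and “maxParallelTools” parameters mentioned in the scalability test could be read on the server side roughly as sketched below. This is a minimal Java illustration, not the actual BioPipeline Creator_Server code: only the two parameter names and the properties file format come from the text above, while the class `ServerLimits`, the example values, the fallback value of 1, and the use of a fixed thread pool to cap parallel tool runs are assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical holder for the two limits named in config.properties. */
final class ServerLimits {
    final int maxClients;
    final int maxParallelTools;

    ServerLimits(int maxClients, int maxParallelTools) {
        this.maxClients = maxClients;
        this.maxParallelTools = maxParallelTools;
    }

    /** Parses properties-formatted text; a missing key falls back to 1 (assumed default). */
    static ServerLimits parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ServerLimits(positiveInt(props, "maxClients"),
                                positiveInt(props, "maxParallelTools"));
    }

    /** Reads one key as an integer and rejects values below 1. */
    private static int positiveInt(Properties props, String key) {
        int value = Integer.parseInt(props.getProperty(key, "1").trim());
        if (value < 1) {
            throw new IllegalArgumentException(key + " must be >= 1, got " + value);
        }
        return value;
    }

    /** One possible way to cap concurrency: a pool with maxParallelTools worker threads. */
    ExecutorService newToolExecutor() {
        return Executors.newFixedThreadPool(maxParallelTools);
    }

    public static void main(String[] args) {
        // Example values only; the real file contents are not given in the text.
        ServerLimits limits = ServerLimits.parse("maxClients=10\nmaxParallelTools=3\n");
        System.out.println(limits.maxClients + " clients, "
                + limits.maxParallelTools + " parallel tools");
        ExecutorService pool = limits.newToolExecutor();
        pool.shutdown();
    }
}
```

In a real deployment the text would be loaded from the “config.properties” file rather than from a string; parsing from a string here simply keeps the sketch self-contained.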
