ENCORE: a practical implementation to improve reproducibility and transparency of computational research

We presented ENCORE as a practical implementation, based on eight main requirements (Methods) and four practical principles (Supplementary Method 1, Section 2), that guides researchers in structuring and documenting computational projects according to published guidelines5,19,31,32 in order to improve transparency and reproducibility and to improve harmonization within and across research groups. ENCORE does not consider replicability (sometimes referred to as repeatability), which is about strengthening scientific evidence through replication studies by other research groups using independent data and independent experimental and computational methods43,44,45,46. An important aspect of ENCORE is the integration of all project information (concepts, data, code, results, documentation) in a single directory structure that can easily be shared and archived. Although we did not have a pre-ENCORE baseline measurement of reproducibility in our group, ENCORE harmonized the way we work across a broad range of projects and represented a major step forward in the organization, transparency, and reproducibility of our projects. Integration and documentation to achieve transparency have also been referred to as the third dimension of open science47 and are, in our view, key to the reproducibility of computational research.

The most important lessons learned from ENCORE are summarized by the following five points:

A significant barrier to enhancing transparency and reproducibility is the lack of incentives for researchers to invest sufficient time and effort in these areas.

Successful implementation requires incremental steps to minimize disruption to the current ways of working of individual researchers, along with regular discussions and evaluations to monitor and maintain progress.

Harmonizing the approach within a research group facilitates the joint development of best practices for reproducibility and makes it much easier to inspect and use colleagues' projects.

The next version of ENCORE should explicitly incorporate best practices for software engineering and methods to preserve the computing environment.

Further development of ENCORE should involve research groups from diverse domains for further evaluation, improvement, and extension.

Although ENCORE has been developed and tested from the perspective of bioinformatics and computational modeling in the cellular and molecular biology field, it can be applied in other scientific disciplines to virtually any type of computational project: it is agnostic to the computational infrastructure and can be used with any programming language or software tool the researcher prefers. We emphasize, however, that ENCORE does not disregard existing tools that contribute to further improving reproducibility. ENCORE users are encouraged to utilize complementary tools that enhance reproducibility, including those for (i) preservation of the compute environment, (ii) software development, (iii) workflow management, (iv) (software) documentation, and (v) project management. ENCORE does not impose these tools on researchers; instead, it allows them to choose the most suitable options for their specific needs (Supplementary Method 4). However, for many researchers, changing their individual and favored project organization to the ENCORE sFSS may present a barrier. Currently, we are taking initial steps to evaluate and further develop ENCORE in collaboration with other research groups. For this, the ENCORE documentation is essential for introducing new ENCORE users to the underlying philosophy and approach and for providing background on the structure and the desired level of project documentation. Over the past few years, we have learned that it is crucial to use ENCORE from the start of a project and to have sufficient self-discipline to keep everything up to date. For project documentation, we follow the guiding principle that it should be detailed enough for a peer or supervisor to understand all aspects of the project (concepts, data, code, results). Early versions of (prototype) code, trial results from data analyses, and other material that is not expected to be kept in the final compendium can be excluded from documentation to minimize overhead. Nevertheless, in practice, the desired level of documentation detail often leads to discussion and over the years has led to changes in the instructions and templates in the README files. In the future, emerging AI-based tools might assist in automatically generating project documentation from, for example, rough notes or audio/video recordings of project meetings.

Different areas of reproducibility have been distinguished by Stodden40. (i) Empirical reproducibility refers to physical experiments and the (reporting) standards associated with these. (ii) Computational reproducibility is concerned with the reproduction of results using the same data, computational methodology, and software versions. (iii) Statistical reproducibility focuses on the correct use of experimental design (including sample size calculations) and statistical analyses. (iv) Ethical reproducibility refers to reporting ethics methods in biomedical research48. ENCORE focuses on computational and statistical reproducibility, while recognizing that documentation about the physical experiments can be important for the computational analyses. Ethical reproducibility may come into play for specific (artificial intelligence) applications, e.g.,49 but is currently not explicitly considered by ENCORE.

ENCORE promotes a pre-defined directory structure, the integration of data, methods, and results, and detailed project documentation.
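To make the idea of a single, self-contained project compendium more concrete, the sketch below creates a minimal directory skeleton using only the Python standard library. The directory names are purely illustrative assumptions for this example and do not reproduce the actual ENCORE sFSS template (see Supplementary Method 1 for the real structure); the sketch only shows how concepts, data, code, results, and documentation can live side by side in one shareable tree.

```python
from pathlib import Path

# Hypothetical top-level layout of a self-contained project compendium.
# These directory names are illustrative only; the actual ENCORE sFSS differs.
SKELETON = [
    "0_Concepts",            # project description, hypotheses, meeting notes
    "1_Data/Raw",            # immutable input data (or instructions to retrieve it)
    "1_Data/Processed",      # derived data produced by the code
    "2_Code",                # analysis scripts, notebooks, helper functions
    "3_Results/Figures",     # figures referenced in the documentation
    "3_Results/Tables",      # tables referenced in the documentation
    "4_Documentation",       # README files describing every part of the project
]

def create_compendium(root: str) -> None:
    """Create the (hypothetical) compendium skeleton with a README stub per directory."""
    for relative in SKELETON:
        directory = Path(root) / relative
        directory.mkdir(parents=True, exist_ok=True)
        readme = directory / "README.md"
        if not readme.exists():
            readme.write_text(f"# {relative}\n\nDescribe the contents of this directory here.\n")

if __name__ == "__main__":
    create_compendium("MyProject")
```

Because everything lives under a single root directory, the whole project can be shared, archived, or assigned a DOI as one unit.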
Such a self-contained, pre-defined structure supports the continuity of research lines, which is sometimes compromised by the continuous flux of scientific personnel in academic groups. ENCORE contributes to transparency and, as such, helps (external) researchers, supervisors, and reviewers to detect software errors and conceptual or methodological flaws. Transparency also allows project supervisors to provide timely and more constructive feedback. Co-authors of a manuscript can more easily inspect details of the project before manuscript submission to, for example, comply with the ICMJE authorship rules50. A recent editorial in Nature Human Behaviour51 proposed that software be part of the peer review process. ENCORE would facilitate such efforts since, in principle, it provides the user documentation needed to install and use the software and datasets. Generally, it would be very time-consuming and difficult to check the code itself, but reviewers could at least check whether the results presented in a paper can be reproduced by executing the software.

Increasing reproducibility: what is the problem? The main hurdle to increasing the reproducibility of computational projects is neither a lack of guidelines nor substantial technical barriers. Yet, despite all discussions and initiatives concerning reproducibility, and despite our personal observation that many researchers agree on its importance, there is still a long way to go. During the development of ENCORE, group members regularly brought forward arguments for not following the (ENCORE) guidelines. One argument is that it consumes time, while we are virtually never asked by peers, reviewers, or funding agencies whether our research is reproducible. In fact, there is often no penalty for being non-reproducible and, moreover, no clear reward for working reproducibly. Markowetz gives several examples of the benefits of working in a reproducible manner35, but he also encountered resistance from researchers when advocating reproducible research, such as “I’d rather do real science than tidy up my data”. Indeed, an often-heard argument concerns the amount of overhead that comes along with (ENCORE) reproducibility approaches. However, for a typical research project, the time spent on following the ENCORE approach (e.g., documentation, structuring) is in our view negligible compared to the time spent on the actual research. There are clear advantages to working reproducibly, as has also been argued by other groups (e.g.,4,19,52). For example, reproducibility is important for trust in science, it helps when writing a publication, it saves time in the long run, and it supports the sustainability of the research. Although all of this is true, for some researchers these arguments do not seem to provide enough incentive to improve their practices. The bottom line is that reproducibility requires intrinsic motivation, a dedicated working attitude, and self-discipline; otherwise, it can only be enforced with penalties and/or rewards. Indeed, it is well recognized that the way in which we reward science should change. For example, the Declaration on Research Assessment (DORA) is a global initiative that proposes to change the evaluation of researchers and scholarly research output by funding agencies, academic institutions, and other parties53. At a broader level, UNESCO is also discussing the costs and benefits of open science and the incentives for open science54. Complementary initiatives have emerged at the national level.
For example, the Dutch public knowledge institutes and research funders wrote a position paper that, among other things, encourages open science55. The recently initiated Dutch OpenScience.nl organization will contribute to the implementation of some of the recommendations in this position paper. In addition, projects like Osiris aim to identify stakeholder incentives for reproducibility and to embed reproducibility in research design56. Owing to such initiatives, we are slowly witnessing a change in the reward system, and we expect that this will contribute substantially to more reproducible research. Stodden and co-workers suggested that journals and/or professional societies establish a yearly award for (young) investigators for excellent reproducible practice52. Another proposal is to use scientific reproducibility as a litmus test for deciding between correction and retraction of a paper57. Adoption of this guideline by journals would strongly support scientific reproducibility. ENCORE aligns well with these suggestions and contributes to building a researcher’s scientific track record. An ENCORE project compendium can be shared through public repositories and assigned a DOI. We recently submitted the first ENCORE project to Zenodo42, and most of our future publications will be accompanied by an ENCORE project. In addition, we are preparing a publication describing specific challenges encountered when using ENCORE for a benchmark study of spatial transcriptomics deconvolution methods.

We consider ENCORE to be a step towards reproducible science, but it has several limitations and weaknesses. First, ENCORE is a compromise based on previous ways of working and, therefore, may not always fit the preferred way of working of an individual researcher. However, we believe ENCORE provides sufficient flexibility to accommodate most researchers and research practices. Second, the current sFSS Navigator has limited functionality, limited ways to present information, and limited configuration options. We are in the process of improving the Navigator, which includes (i) configurable panel locations (including the possibility to undock a panel to a full window), (ii) improved presentation of figures and tables, (iii) improved possibilities to browse results and to link results with code and data, and (iv) proper formatting of text-based information (e.g., Markdown, code) while ensuring that all relative hyperlinks work in the context of the Navigator. Third, a shortcoming of ENCORE is the lack of a clear and easy approach to specify explicit links between results, code, data, and concepts. Currently, such links are imposed by the sFSS structure, the paths in the code, and/or manually added links specified in the documentation. The documentation has an important function in gluing the project parts together, but it requires effort from the researcher to specify relations between parts of the project and to maintain this information. ENCORE would benefit from improved integration approaches to enhance transparency and reproducibility. One possible approach is the use of JSON58 or YAML59 to annotate links between items in a project compendium, as demonstrated by Spreckelsen and co-workers31. Fourth, a challenge that is only partially addressed by ENCORE is the preservation of the full computing environment, which is defined by the (interdependencies of the) operating system, software tools with their versions and dependencies, programming language libraries, etc.
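Even without a full solution, a lightweight snapshot of the computing environment is valuable provenance that can be stored inside the compendium. The minimal sketch below, which uses only the Python standard library, records the interpreter version, the operating system, and the installed package versions to a JSON file; it is an illustrative assumption of how such a snapshot could look and is not a substitute for the Conda environments, containers, and virtual machines discussed next.

```python
import json
import platform
import sys
from importlib.metadata import distributions

def snapshot_environment(path: str = "environment_snapshot.json") -> None:
    """Write a minimal, human- and machine-readable record of the runtime environment."""
    snapshot = {
        "python_version": sys.version,
        "platform": platform.platform(),
        "packages": sorted(
            f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()
        ),
    }
    with open(path, "w") as handle:
        json.dump(snapshot, handle, indent=2)

if __name__ == "__main__":
    snapshot_environment()
```

Storing such a file next to the results at least documents which software versions produced them, even when the environment itself cannot be preserved.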
Grüning and co-workers proposed a software stack of interconnected technologies to preserve the computing environment33. This stack comprises (i) (Bio)Conda60,61 to provide virtual execution environments that address software versions and dependencies, (ii) container platforms such as Docker62, Apptainer (formerly Singularity)63, and Podman64 to preserve other aspects of the runtime environment, and (iii) virtual machines, using cloud systems or dedicated applications such as VMware, to overcome dependencies on the operating system and hardware. We are currently investigating how to best approach this within the context of ENCORE, but we already actively use Anaconda and renv65 to manage Python libraries and R packages, respectively. Reproducibility can be further improved by using scientific workflow systems, which have been developed to modularize and execute steps in computations. Many workflow systems are available, including Galaxy66, KNIME67, Snakemake68, and Nextflow69. These workflow systems improve reproducibility and help to maintain and share computational analyses. They also allow the incorporation of steps that would otherwise have been performed manually. Our group has used workflow systems in the past (e.g.,70,71), but we decided not to make a workflow system an obligatory part of the ENCORE approach since we believe this may be too disruptive for some researchers. Yet, we encourage our group members to use a workflow system of their choice. However, workflow systems are not a holy grail, since they do not solve the versioning problem of their (remote) components. In addition, the workflow system itself may become outdated, remote web services may no longer be available, or older workflows may not run with newer versions of a workflow management system. Fifth, ENCORE projects can easily be shared, but we currently do not have a good mechanism to remove parts that should not be shared, such as sensitive (patient) data, controlled-access data obtained from public repositories, or PDF copies of copyright-protected scientific publications. Finally, ENCORE requires that all data used by the computations reside within the sFSS structure. For large datasets, this implies that sufficient storage space must be available on the computer that hosts the project, which can be a local (private) desktop computer, computer servers of the research institute, or remote (cloud) compute systems (e.g., as provided by Amazon Web Services; Fig. 3). If data storage within the sFSS is not possible, then documentation and/or software should be available to retrieve the data from another persistent location, and this step should not break reproducibility.

Over the past two decades, an increasing number of biomedical researchers have become involved in computational research. Many of these researchers have never been formally trained in scientific computing and software engineering (e.g., design, programming, documentation)72,73, software version control25,27, the use of high-performance computing infrastructures, the use of Unix/Linux (which is still the major platform for scientific computing), algorithm design, the use of (Jupyter, R) notebooks74, etc. A lack of such skills may negatively affect the reliability and transparency of software and, consequently, reproducibility. For example, software may be poorly designed and documented, making it difficult to understand, use, modify, and debug.
One resulting problem is that we have no way of knowing whether the code being used to generate the computational results is doing what the researchers think it is doing. This is one reason why ENCORE proposes to organize and document from the very beginning of a project, since this increases the chance that conceptual errors or software bugs are detected at an early stage by the researchers or their supervisors. Software engineering is a discipline in its own right and includes the design, implementation, documentation, testing, and deployment of software. Following best practices for scripting, functional programming, or object-oriented programming may significantly improve the quality of the code but requires training and experience. The use of integrated development environments, automated quality checks, and (unit) testing would also help to improve software75. Furthermore, large language models will increasingly play a role in software development, testing, and documentation76,77. Software documentation often leaves much to be desired. A recent report concluded that researchers are generally not aware of whom they write documentation for and of what documentation is required73. Currently, ENCORE does not provide specific instructions for coding style (e.g., PEP 8 for Python and the tidyverse style guide for R78,79) or documentation design, because it is probably more effective to train scientists in the art of software engineering. Instead, general guidelines are provided in a README file (Supplementary Method 3). Awareness of guidelines (e.g.,80) and of tools to (automatically) generate documentation, such as Sphinx81 for Python and r2readthedocs82 and roxygen283 for R, will also help to improve reproducibility. We used Sphinx for the documentation of the sFSS Navigator42. Another useful resource is the software management plan developed by the Netherlands eScience Center and the Dutch Research Council (NWO)84. In general, appropriate training on reproducibility approaches could already significantly improve the current situation and will at least create awareness of the large body of literature about sound scientific computing practices20,39,85,86,87,88. In addition, senior researchers should strongly promote reproducibility and should support, guide, and instruct junior researchers.

To facilitate and improve reproducibility throughout the complete research lifecycle (Fig. 1), numerous guidelines, policies, and standards have been developed89 to support the detailed documentation of all steps. Many reporting guidelines provide structured tools specifying the minimum information required for specific study types and contribute substantially to the transparency, understanding, and reproducibility of a study90. Examples include guidelines for clinical trials (CONSORT)91, diagnostic accuracy studies (STARD)92, and observational studies (STROBE)93. Virtually all of these minimum information standards and reporting guidelines require the specification of the statistical and computational methods that were used in a study. However, precise requirements on how to specify such methods are often lacking. In addition, an increasing number of scientific journals have their own guidelines. One example is the Nature Reporting Summary94, which partially relies on existing reporting guidelines and the FAIR principles. Nature also has a specific software and algorithm policy with requirements about availability (e.g., using GitHub or Zenodo), use of an open-source license10, and code review95.
Nature Computational Science and several other Nature journals have adopted Code Ocean, a cloud-based platform for sharing executable code to enable review96,97. The journal PLOS Computational Biology requires that editors and reviewers can access the software, reproduce the results, and run the software on a deposited dataset with the provided control parameters98. The TOP guidelines adopted by the journal Science require data and computer code to be available99. Interestingly, a study published in 2018 showed that, despite these guidelines, many computational studies were not reproducible17. It has also been suggested to rethink the concept of a “journal” as a community-driven information repository that contains, among other things, the data and software, to enable reproducibility, reuse, and comparisons100. In this scenario, academic publishers would have a key role in stimulating (standards-based) approaches to research dissemination. We believe that such an effort should be a joint undertaking of research communities and publishers aiming to improve reproducibility. Computational projects require their own specific guidelines and standards to guarantee transparency and reproducibility. This was also recognized by the artificial intelligence community, which started initiatives to develop specific AI-oriented guidelines101,102,103. To the best of our knowledge, there are few (practical) reporting guidelines or standards for computational research that are routinely used in practice. Nevertheless, there are various initiatives to improve this situation. For example, the use of directory structures to organize research projects has been proposed in the past. However, in general, these initiatives do not provide specific templates or are limited to a specific programming language20,36,104. In particular, the approach presented by Marwick (2018) is useful for R-based projects and uses the ‘rrtools’ package to set up a project compendium suitable for writing a reproducible manuscript. It supports Quarto (an open-source scientific and technical publishing system), Docker, package versioning using renv, and integration with GitHub Actions. ENCORE is inspired by the standardized file system structure that was proposed by Spreckelsen and co-workers31. Their file system structure comprises four top-level directories denoted as categories (Experimental data, Simulations, Data analysis, and Publication). Each of these categories contains subdirectories that hold specific projects, within which other subdirectories may exist. This implies that, in practice, each of the four top-level directories will contain subdirectories and files related to multiple projects. The subdirectories and files are manually annotated and connected using YAML headers (key-value pairs) in README Markdown files, which makes it possible, for example, to trace the simulations and data belonging to a specific project. The ENCORE sFSS, in contrast, is project-oriented, which facilitates sharing with peers without first having to reassemble a project. At the same time, this reduces the need to manually annotate and define relationships, since these are implied by the sFSS structure. Nevertheless, ENCORE would also benefit from integration approaches such as YAML to improve transparency and reproducibility. However, ENCORE does not yet require the use of YAML, since it may require too much effort from researchers to specify and maintain. Like ENCORE, the approach of Spreckelsen et al.
has a strong focus on transparently organizing and linking data, code, and publications to support reproducibility. However, there are no further requirements except that one should be able to re-run code from within their file system structure. In addition, by design, there are no requirements for further user and code documentation or for the use of GitHub. Compared to the approach of Spreckelsen et al., ENCORE also aims to translate published guidelines into specific instructions and templates in, for example, the README Markdown files to guide researchers in making their computational work reproducible. This is supported by providing our sFSS as a template together with predefined files. In contrast to the file system structure proposed by Spreckelsen, the ENCORE sFSS is much more detailed. ENCORE goes beyond a structured file system, and through regular internal and external evaluations we aim to gradually improve the ENCORE approach and thereby reproducibility. Other examples of guidelines and standards for computational projects include the application of the FAIR principles to software105,106, the ICERM implementation and archiving criteria for software52, the Adaptive Immune Receptor Repertoire (AIRR) software guidelines107, the Software Ontology to describe software used to store, manage, and analyze data108, and the EDAM ontology to describe bioinformatics operations109. For simulation-based research, there are initiatives such as the Minimum Information About a Simulation Experiment (MIASE) guidelines110, the corresponding Simulation Experiment Description Markup Language (SED-ML)111, COMBINE/OMEX to share and reproduce modeling projects112, and a range of others113. For the further development of ENCORE, we will need to consider which of these standards are relevant and how to incorporate them in the ENCORE approach. This may require the development of ENCORE specifications for different types of computational projects by different specialized working groups. However, the main challenge we see is the development of software tools that support and use ontologies and standards in the context of ENCORE without introducing much overhead while providing clear benefits.

Currently, the focus of ENCORE is on human readability rather than machine readability, since most documentation will be provided in any of the common file types (e.g., plain text, Markdown, Word, PowerPoint, and PDF), which has obvious advantages for the researcher. However, the additional use of text-based standards such as JSON, YAML, and RDF114 would allow projects to be (partially) documented in a machine-readable manner and would provide several benefits. For example, one could more easily search for specific information within a project compendium, specify links between items in a compendium, and evaluate compliance of the project organization and documentation with the ENCORE requirements. To advance ENCORE towards a machine-readable compendium, one approach would be to associate subdirectories and/or files with JSON annotation (metadata) and, where possible, to provide part of the documentation directly in a JSON file. This approach would still allow the researcher to use the file type of their choice. Alternatively, one may integrate YAML annotation directly into Markdown files, as proposed by Spreckelsen115.
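As a purely illustrative sketch of what such machine-readable annotation might look like, the example below stores the links between a result, the code that produced it, and its input data as a small YAML document and parses it with the third-party PyYAML package. The field names and file paths are assumptions made for this example and are not part of the ENCORE specification.

```python
import yaml  # PyYAML, a third-party package assumed to be installed

# Hypothetical machine-readable annotation linking a result to its code and data.
# Field names and paths are illustrative and not part of the ENCORE specification.
ANNOTATION = """
result: 3_Results/Figures/figure2.png
produced_by: 2_Code/plot_expression.py
inputs:
  - 1_Data/Processed/expression_normalized.csv
parameters:
  normalization: quantile
"""

def load_annotation(text: str) -> dict:
    """Parse the YAML annotation so that links can be queried or validated by scripts."""
    return yaml.safe_load(text)

if __name__ == "__main__":
    links = load_annotation(ANNOTATION)
    print(f"{links['result']} was generated by {links['produced_by']}")
```

Such annotations could live next to the files they describe or be embedded as YAML front matter in the README Markdown files.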
Finally, since such machine-readable formats can be rendered by front-end GUI applications (e.g., as text fields for users to fill in), yet another approach would be to develop interactive tools with which users set up and populate the documentation of their projects. This would remove much of the burden on the researcher to specify and maintain the JSON/YAML/RDF annotation throughout a project.

To promote reproducibility practices using ENCORE, automating specific steps is invaluable for lightening the load on researchers. AI-based tools can play a pivotal role in tasks such as code documentation and testing, summarizing project meetings, checking compliance with the ENCORE specifications, and using machine-readable files effectively. These automation efforts streamline workflows, enhance accuracy, and ultimately contribute to the robustness of research endeavors. We have created a separate GitHub repository (https://github.com/EDS-Bioinformatics-Laboratory/ENCORE-AUTOMATION) to which we will gradually add tools that automate specific tasks. Currently, we provide scripts to automatically set up an ENCORE project. In addition, we provide a script that generates an ENCORE template for B-cell/T-cell repertoire sequencing experiments116, which is prefilled with code and documentation. An example of such a generated template can be found on GitHub (https://github.com/EDS-Bioinformatics-Laboratory/ENCORE-AIRRseq-TEMPLATE). A similar template is currently being developed for lipidomics analyses117.
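As an illustration of the kind of automated compliance check mentioned above, the sketch below verifies that a compendium contains a set of expected directories, each with a README file. The required names reuse the hypothetical skeleton shown earlier and are assumptions for this example only; they do not correspond to the actual ENCORE specification or to the scripts in the ENCORE-AUTOMATION repository.

```python
from pathlib import Path

# Directories a (hypothetical) compendium is expected to contain. These mirror the
# illustrative skeleton shown earlier and are not the actual ENCORE specification.
REQUIRED = ["0_Concepts", "1_Data", "2_Code", "3_Results", "4_Documentation"]

def check_compendium(root: str) -> list[str]:
    """Return a list of human-readable problems found in the project compendium."""
    problems = []
    base = Path(root)
    for name in REQUIRED:
        directory = base / name
        if not directory.is_dir():
            problems.append(f"missing directory: {name}")
        elif not (directory / "README.md").is_file():
            problems.append(f"missing README.md in: {name}")
    return problems

if __name__ == "__main__":
    for problem in check_compendium("MyProject"):
        print(problem)
```

Such a check could eventually run as a pre-commit hook or continuous-integration step to flag missing structure or documentation before a project is shared.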
