Technologies and Methods

Container Technologies

Addressed challenges: Availability & Accessibility, Documentation, Dependency Hell & Software Evolution

Container technologies provide operating system-level virtualization. In contrast to virtual machines, containers are bound to the host operating system's kernel. While this approach creates a dependency on the host operating system, it incurs less overhead and therefore offers faster startup times and better performance than virtual machines. In the context of SURESOFT, we employ the container technologies Docker and Singularity [35], [36]. We use Docker primarily in combination with continuous integration, while Singularity is used in high performance computing (HPC) environments such as HPC clusters due to its easier integration with the Message Passing Interface (MPI) and general purpose graphics processing units (GPGPUs). Both container technologies play an important role in the context of reproducibility. By introducing an image format that allows the definition of environment templates containing all the components required to run a piece of software, they provide a solution to the challenge of missing dependencies. Since these images can be exported to simple file archives, they can also be uploaded to appropriate archiving platforms, allowing other researchers to easily find and access the software. Moreover, the environment definitions for images are plain text files and can therefore also serve as a basic form of documentation regarding the installation and execution of the software [24].
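
As an illustration, the following minimal Dockerfile sketches such an environment template; the base image, package names, and build commands are placeholders rather than those of an actual SURESOFT project:

    # Environment template: all components required to build and run the software.
    FROM ubuntu:22.04

    # Install pinned system dependencies (package list is hypothetical).
    RUN apt-get update && apt-get install -y --no-install-recommends \
            g++ make cmake libopenmpi-dev \
        && rm -rf /var/lib/apt/lists/*

    # Copy the source code into the image and build the application.
    COPY . /opt/solver
    WORKDIR /opt/solver
    RUN cmake -B build && cmake --build build

    # Default command executed when the container is run.
    CMD ["./build/solver"]

Because every installation step is spelled out in plain text, the same file also documents how to set up and execute the software.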

Version Control

Addressed challenges: Availability & Accessibility, Documentation, Collaboration & Versioning

Version control systems allow developers to track and manage changes made to their source code files and commit them to a source code repository. With each commit, the current state of the software is recorded and given a unique identifier, documenting the changes made to the source code over time and allowing developers to roll back to previous versions if a change turns out to be faulty. In addition, collaborative development is supported by using a centralized source code repository as a means of synchronization. Popular choices for hosting repositories are platforms such as GitLab or GitHub, which help make the software easily available and accessible for contributors and users. At TU Braunschweig, the IT center hosts our own GitLab instance.
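
A typical session with Git, the version control system underlying both platforms, might look as follows (the commit hash 3f2a9c1 is purely illustrative):

    git add solver.cpp
    git commit -m "Refine boundary conditions"  # record a snapshot; Git assigns it a unique hash
    git log --oneline                           # list all recorded versions with their identifiers
    git checkout 3f2a9c1                        # restore the project state of an earlier commit
    git push origin main                        # synchronize with the central repository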

Continuous Integration

Addressed challenges: Documentation, Software Quality & Design, Collaboration & Versioning

Continuous integration (CI) is a development practice introduced as part of Kent Beck's Extreme Programming [37]. As the name suggests, it aims at integrating newly developed code into the main production code frequently and at short intervals. With every integration, the entire system is tested by an extensive suite of automated tests that provide rapid feedback to the developers if an integration has compromised the application's functionality. The short integration cycle plays an important role in this context: keeping the cycle short ensures that the number of changes committed to the main code line remains small, so if the test suite fails, locating the defect is easier. For this procedure to be truly effective, an extensive and reliable test suite is an essential requirement. Such a test suite also serves as low-level documentation. Moreover, since easy testability presupposes good software design, continuous integration can indirectly foster good design. Although not part of the initial definition of continuous integration, its usage nowadays implies the employment of a continuous integration service that automates the build process and the execution of the test suite on every integration.
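
As a sketch, a minimal GitLab CI configuration (.gitlab-ci.yml) that builds the software and runs its test suite on every integration could look like this; the container image and build commands are placeholders:

    test:
      image: gcc:13          # containerized build environment (placeholder)
      script:
        - make               # build the application
        - make test          # run the automated test suite; a failure fails the pipeline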

Continuous Analysis

Addressed challenges: Availability & Accessibility, Dependency Hell & Software Evolution

The term continuous analysis was coined by Beaulieu-Jones and Greene [32]. It combines containerization with the continuous integration approach with the goal of improving the reproducibility of research. In order to run the test suite of an application, the continuous integration pipeline must provide a runtime environment that is able to execute it. Continuous analysis suggests providing this environment in the form of a Docker container. When the continuous integration pipeline for a computational analysis completes successfully, the environment is extended with the compiled application and published as a new Docker image. Since this image contains all the required dependencies and configuration alongside the application, it serves as an executable package that can be distributed to other researchers, who can use it to reproduce the computational analysis run in the test pipeline.
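
A hypothetical GitLab CI job following this idea might, once the test stage has passed, package the application into a Docker image and publish it to the registry (the variables starting with CI_ are predefined by GitLab; stage and job names are placeholders):

    publish:
      stage: deploy
      image: docker:24
      services:
        - docker:24-dind     # Docker-in-Docker service for building images
      script:
        - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
        - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
        - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

Tagging the image with the commit hash ties each published environment to the exact source code version it was built from.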

Archiving

Addressed challenges: Availability & Accessibility, Dependency Hell & Software Evolution

Archiving refers to collecting and indexing digital copies of published papers or datasets in public repositories in an accessible and usable format. This ensures long-term availability and makes them citable as well as findable via meaningful metadata, including a persistent unique identifier such as a DOI, following the FAIR principles [10], [11], [12]. Furthermore, to enable researchers to reproduce computational results, it is also essential to archive the software assets, including the input data and the research software itself. Ideally, archiving the entire runtime environment, for example in the form of the aforementioned containers, as a ready-to-use appliance eases reproducibility and avoids struggles with getting the software to run.
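
For instance, a Docker image can be exported to a plain file archive and deposited on an archiving platform such as Zenodo, which assigns a DOI to the deposited record (image and file names below are placeholders):

    docker save -o my-solver-1.0.tar my-solver:1.0   # export the image to a file archive
    gzip my-solver-1.0.tar                           # compress before uploading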