General Guidelines
The availability of software is essential to reproducible research. We therefore encourage all software projects to be open source and available under a free license, so that they can be reused and extended by others. A closed source solution is still possible if agreements with other parties (e.g. industry partners) require it. However, the open source approach should be the default for all software projects.
Quality Standards
Quality standards for the source code of a project, as well as additional quality assurance measures, should be chosen depending on the project's complexity and use cases. Higher complexity implies higher quality standards. Likewise, publicly available projects that are intended to be used by other researchers are encouraged to follow high quality standards. The Suresoft Project proposes selecting the application class from the following table that best matches the project criteria.
Application Class | Description |
---|---|
0 | Small scripts only intended for personal use |
1 | Software intended to be used and extended by others |
2 | Software with long-term development and maintainability requirements |
3 | Mission-critical software |
The chosen application class determines the quality standards that the project should fulfill.
License
All Research Software Projects with application class 1 or higher must be assigned a license. Otherwise, the software is considered to be under exclusive copyright by default. In other words, it cannot be used, distributed or modified by third parties. Conversely, the same restrictions apply to the original authors for any external contribution to the code [1]. It is recommended to use a recognized Open Source license to facilitate the reuse of the software project.
Documentation
Documentation plays an important role for users and developers of software. Detailed documentation describes the purpose of the project, enables users to install and run the software, and provides potential contributors with instructions on how they can help with further development. Moreover, the documentation should contain metadata that makes the software discoverable and citable. It is therefore encouraged to provide this metadata in a machine readable format. The Suresoft Project recommends following the DataCite schema. The following table provides an overview of what the documentation should cover:
Application Class | Recommendations |
---|---|
>= 0 | Installation and usage instructions for the software*. Reference to used third party assets. |
>= 1 | Metadata following the DataCite, CodeMeta or CFF schema |
>= 2 | Documentation of relevant concepts and software architecture |
* Documentation for installation and usage should include:
- Necessary environment variables
- Supported platforms
- Specific hardware requirements
- Step by step guide from build to usage
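As an illustration of machine readable metadata, the following minimal sketch writes a small CodeMeta-style JSON file. The project name, author, version and license values are placeholders for a hypothetical project, and the selection of fields is an assumption; the schema documentation should be consulted for the full list of properties.

```python
import json

# All values below are placeholders for a hypothetical project.
# The chosen subset of CodeMeta properties is an assumption, not a
# requirement stated in these guidelines.
metadata = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-solver",
    "description": "Short statement of what the software does.",
    "version": "1.0.0",
    "license": "https://spdx.org/licenses/MIT",
    "author": [
        {"@type": "Person", "givenName": "Jane", "familyName": "Doe"}
    ],
}


def write_metadata(path, data):
    """Serialize the metadata dictionary as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)
        handle.write("\n")
```

Calling `write_metadata("codemeta.json", metadata)` in the repository root would produce a file that can be kept under version control next to the source code.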
Version Control
Using a version control system (VCS) is highly recommended for all projects regardless of complexity. A VCS is a system where source code is managed in a central repository while developers work on local copies of the code. The VCS tracks the changes made to the local copies. The changes can then be committed to the main repository. This creates a version history that provides multiple advantages. Commits to the repository record the name of the developer and the changes that were made, so it is always clear which parts of the code base were contributed by which developer. The commit history also documents the changes made to the software over time. For developers, the most relevant aspect is that changes can be rolled back if they break some functionality of the software. Modern VCS usually offer the possibility of working on different development branches, so new features or experiments don’t conflict with the main source code until they are ready to be merged.
The list below provides an overview of what should be included in the repository besides the source code.
- A Readme file with a short description as well as documentation about installation and usage. It is recommended to name this file README.md as this will be displayed by modern hosting platforms like GitLab automatically on the repository page.
- A metadata file following the DataCite schema.
- Explicit markers for stable versions of the software, for easy identification and citation
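The first two items of this list can be checked automatically. The following is a minimal sketch, assuming the metadata file uses one of the illustrative names listed in `METADATA_NAMES` (these names are not prescribed by the guidelines). It does not check whether stable versions are marked, since that depends on the VCS in use.

```python
from pathlib import Path

# README.md is the name recommended in the text above. The metadata file
# names are illustrative assumptions, not names required by the guidelines.
README_NAME = "README.md"
METADATA_NAMES = ("codemeta.json", "CITATION.cff", "datacite.json")


def missing_repository_items(repo_root):
    """Return a list of recommended items that are absent from repo_root."""
    root = Path(repo_root)
    missing = []
    if not (root / README_NAME).is_file():
        missing.append(README_NAME)
    # One metadata file in any of the assumed formats is sufficient.
    if not any((root / name).is_file() for name in METADATA_NAMES):
        missing.append("metadata file")
    return missing
```

A call such as `missing_repository_items(".")` returns an empty list when both files are present in the current directory.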
Continuous Integration and Continuous Analysis
Continuous Integration (CI) is a development practice that builds on the usage of a VCS. The basic idea is that developers build, test and integrate their work with the main source code frequently. Frequent integration reduces the number of changes included in a single merge, which limits the possible sources of conflict. Moreover, building and testing the software on every merge provides rapid feedback and allows developers to fix breaking changes quickly. CI is usually automated using a CI Server that performs the build and test actions.
Continuous Analysis (CA) extends the idea of CI by utilizing modern container technologies. This approach proposes that the CI pipeline runs inside a container that provides an environment in which the software can be compiled and tested. For software releases, a new container image based on the CI environment should be created that includes the produced software binaries. Releasing this newly created image ensures that end users always have access to an environment able to run the software, which enables reproducibility of results.
Recommendations for CI:
- Every change to the repository triggers a CI Pipeline that builds and tests the Software
- For every stable release of the Software the artifacts produced in the CI Pipeline are published automatically
Recommendations for CA:
- For every stable release of the Software a Container Image containing the compiled Software and all its dependencies is published
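The build-then-test behaviour of a CI pipeline can be sketched in a few lines. The example below is an illustration only and is not tied to any particular CI server: the function `run_pipeline` and the step commands are hypothetical, and a real project would call its own build tool and test runner instead of the placeholder commands.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run (name, command) steps in order and stop at the first failure.

    Returns the name of the failing step, or None if every step passed.
    """
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Later steps are skipped, so a broken build is never tested.
            return name
    return None


# Placeholder steps: each one just starts a Python process that prints
# a message. Real pipelines would invoke the build and test commands.
example_steps = [
    ("build", [sys.executable, "-c", "print('building')"]),
    ("test", [sys.executable, "-c", "print('testing')"]),
]
```

Stopping at the first failing step mirrors the rapid feedback described above: the developer learns which stage broke without waiting for the remaining stages.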
Code Standards
For every project it is recommended to decide on a set of code standards. The standards should cover at least a code style guide and the usage of automated tests.
A code style guide regulates the formatting of source code as well as the naming of variables, classes and functions. It keeps the code style consistent, which helps developers understand code that was contributed by their peers. As different programming languages follow different conventions for formatting and naming, it is recommended to base the code style on the style guide of the language the project is written in.
Automated tests are pieces of code that perform actions on the whole application or on isolated parts of it and assert the correctness of the results of these actions. Most programming languages offer testing frameworks that help set up automated tests.
Application Class | Recommendations |
---|---|
0 | Automated tests are recommended but not required |
>=1 | The software should have unit tests that verify the most important features |
>=2 | The software should have an extensive test suite including unit, integration and acceptance tests |
3 | The previous recommendations are mandatory for applications of this class |
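The unit-test recommendation for application class 1 and above can be illustrated with a minimal sketch using the `unittest` framework from the Python standard library. The function `mean` is a hypothetical stand-in for an important feature of a project; it is not taken from any existing code base.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    """Unit tests covering the normal case and the error case."""

    def test_mean_of_known_values(self):
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            mean([])
```

If the code is saved in a module, the tests can be run with `python -m unittest`, which is also a natural command for a CI pipeline to execute on every change.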
Development Culture
As pure functionality is not enough to create sustainable research software, the development culture in a software project should enable developers to produce high quality code. While the algorithms at the core of research software are usually of mathematical nature, developing the software structure and architecture is a creative and iterative process. Therefore, to create a system that produces the desired results while being extendable and maintainable it is important that software development is recognized as part of the scientific activity. This means that developers must be given the time needed to create such a system. To ensure that the code base meets the desired quality standards new code should be reviewed regularly by the development team or even a third party. Moreover, as the field of software development evolves rapidly it is important that developers are given the opportunity to continuously educate themselves on best practices and new technologies.
- The code is reviewed regularly to ensure a high code quality
- Developers are given the development time needed to ensure a high code quality
- The team recognizes development and maintenance of Research Software as scientific activity
- Developers are given the opportunity to continuously educate themselves on best practices and new technologies
Releases & Publications
In order to make it clear which versions of research software are intended for general public use, stable versions (releases) of software should be considered publications. Software publications must include documentation as outlined above and at least an executable version of the software. Publishing the source code is not strictly required, but is encouraged in order to allow other scientists to understand how results of that software were produced. Like published articles, stable software versions should be made available through a searchable repository that complies with the FAIR Principles. For possible choices see section 2.5. While it is highly recommended to allow public access to the software, its source code and its results, the mentioned publication services also allow restricted access to the research.