Towards Sustainable Research Software
The increasing number of publications that base their findings on scientific software indicates its continuously growing importance. However, scientific software is usually developed by the researchers themselves, who seldom have the necessary training to ensure high software quality. At the same time, reproducibility is generally not a focus of the software development process. Part of the reason for these problems is the publish-or-perish mentality in research, combined with the still prevalent lack of recognition for software development. In the past, these factors have led to the production and publication of incorrect or irreproducible scientific results. The situation has grown to the point that some scientists even speak of a reproducibility or credibility crisis.
According to a study by Collberg et al., the problems concerning reproducibility primarily manifest themselves in lacking documentation, unavailable environments, and missing packages, often ending in systems that cannot be compiled. This implies that sharing the codebase is not enough to ensure reproducibility; it is also important to enable other researchers to run research software with minimal effort. It has also been observed that researchers are often reluctant to publish the code of their tools or neglect to publish the raw data. Indeed, authors often see reproducibility as extra effort without benefit for their submission, because publishing reproducible work takes time and must be planned for from the earliest stages. In contrast, to enable other researchers to reuse research software and reproduce results, the software should comply with the FAIR principles (Findable, Accessible, Interoperable and Reusable). To drive the scientific discovery process forward, however, scientists not only need to be able to reproduce prior results but also need to be able to build upon them to answer new scientific questions. In other words, the reproducibility of scientific results in itself is not sufficient. Instead, the results as well as the process that produces them must be accessible and adaptable. For scientific software, this means that it must not only be available and executable, but other scientists must also be able to fix bugs, add features, and port existing implementations to new environments later on. The reproducibility of results, together with the ability of software to endure and evolve over time, can be summarized as software sustainability.
The question of how to design sustainable software systems is one of the grand challenges in the field of software engineering. Many decades of research and experience have made it clear that there is neither a magical tool nor an easy path to achieve it. However, there is agreement on certain fundamentals of software engineering, such as cohesion and coupling, modularisation, abstraction, information hiding, and separation of concerns, as well as striving for simplicity. Various guidelines have been proposed to achieve sustainability in general and reproducibility in particular, but their implementation remains difficult. This work focuses on supporting the sustainability of research software by providing practical methods and tools that help sustain essential and large software from different fields of study. We address the key concerns researchers face when looking to extend existing research software: the effort required to build the software, the difficulty of reproducing the results, and the long-term maintenance of the software. In Section III, this paper describes SURESOFT, a conceptual approach to developing sustainable software that is applicable across a wide range of projects. To demonstrate the SURESOFT approach, we applied it to five research projects from different scientific fields at Technische Universität Braunschweig (TU Braunschweig) and Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU):
- elPaSo: a vibroacoustic simulation tool which utilizes popular numerical methods to provide acoustic and structural analyses of various complex material and element types
- PyADF: a software package from the field of theoretical chemistry, which is characterized by large amounts of numerical data and complex computational tasks
- SiMoNe: a system-level simulator for simulating and modeling realistic mobile networks
- THEMIS: a fault-tolerant distributed framework
- VirtualFluids: a computational fluid dynamics research software that provides fast and reliable numerical solutions for various kinds of flow problems
You can learn more about these projects here