Challenges

As previously discussed, there is no general agreement on the definition of software sustainability [14], [15]. Still, we have identified the reproducibility of results and the capability of software to endure and evolve over time as the most important aspects from our perspective. Implementing these criteria, however, presents conceptual, technical, and organizational challenges. Some of these challenges, especially those concerning reproducibility, have been addressed, for example, by Boettiger [24]. Our discussion covers a wider context.

Availability & Accessibility

Publications in the field of scientific computing usually present models, methods and result data. However, the software itself is usually not published alongside the paper, as it is considered nothing more than a highly advanced calculator and does not add to the scientist’s reputation [25]. This poses a problem for reproducibility. Without access to the source code, other scientists cannot trace how results were calculated or verify that the underlying models are implemented correctly. Furthermore, without access to the binaries and a compatible computing environment, results cannot be reproduced. To ensure long-term availability, the software and its source code need to be archived on an appropriate platform. Availability and accessibility also raise the question of licensing. Frequently, software is published under a free or open-source software license, but a proprietary software license can also be an option for making the software and its source code available.

Documentation

Providing thorough documentation is essential to enable others to reuse and extend research software as well as to reproduce results. However, documentation requires commitment from the early development stages onward and consumes more time than most researchers can afford. Since scientific software undergoes heavy modifications as part of the discovery and development process, writing the documentation is often pushed to the very end of the project, at which point there is not enough time to do it thoroughly. Thus, the documentation is either neglected or left incomplete. Moreover, most scientific software is never published, so writing documentation that would enable other scientists to understand, reuse and extend the software is often not even considered. As a consequence, even the installation or build processes become hard or even impossible to reproduce [7].

Software Quality & Design

The majority of research software is developed by scientists themselves rather than by professional software developers [26], [27]. The main reason is that developing the software requires in-depth domain knowledge. Hence, being domain experts themselves, scientists usually acquire the necessary software development knowledge through self-study or from colleagues [27], [28]. Unfortunately, this self-education usually extends only as far as is needed to achieve their primary short-term goal of building a scientific reputation. In other words, the software itself is of limited value to them and only serves as a tool to gain new insights as quickly as possible [29]. Therefore, scientists often follow a quick and dirty software development approach as opposed to focusing on high-quality, long-term sustainable software [30]. Accordingly, they have little motivation to learn the corresponding skill set. Nevertheless, code that is easy to read, understand, change and reuse needs to be structured appropriately [31]. Due to the lack of software engineering knowledge among scientists as well as time pressure, scientific code often suffers from poor quality and design. The result is software that is tightly coupled, unstructured, hard to understand and not well tested.

Collaboration & Versioning

Research software often starts out as the project of a single scientist but eventually ends up being used, and maybe even developed further, by their peers as well. These collaborators can either work at the same institution or come from the wider research community. As software development progresses, it becomes harder to track the changes in code and documentation over time. This is especially true if multiple developers collaborate while each working on their own copy of the code. Moreover, integrating these copies is challenging in two respects. On the one hand, they could contain conflicting changes. On the other hand, there is no guarantee that the software still works as intended after integration. In addition, untraceable changes complicate the reproduction of scientific results, as it is not clear how to match a version of the software with the corresponding result data.
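To illustrate what matching a software version with its result data can look like in practice, the following minimal sketch stores a small provenance record next to each result. It assumes that the code is kept in a Git repository. The function names `current_commit` and `provenance_record`, the record fields, and the `grid_size` parameter are hypothetical and chosen for illustration only.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone


def current_commit(repo_dir="."):
    """Return the Git commit hash of repo_dir, or "unknown" if unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not a Git repository.
        return "unknown"


def provenance_record(result_bytes, commit, parameters):
    """Build a metadata record that ties result data to a code version.

    The field names are illustrative assumptions, not an established schema.
    """
    return {
        "commit": commit,
        "parameters": parameters,
        "result_sha256": hashlib.sha256(result_bytes).hexdigest(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    data = b"0.1,0.2,0.3\n"  # stand-in for real result data
    record = provenance_record(data, current_commit(), {"grid_size": 64})
    print(json.dumps(record, indent=2))
```

Storing such a record alongside the result file lets a reader check out the recorded commit later and verify via the checksum that the data at hand is the data that was produced. Uncommitted local changes are not captured by this sketch.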

Dependency Hell & Software Evolution

Many software systems build on existing solutions in the form of third-party libraries. These libraries reduce the effort required to solve a problem. However, they also represent dependencies that must be available in order to run the software. For users, this can become a problem with respect to reproducibility if dependencies are not documented, no longer available, or not compatible with certain platforms. Another aspect of this issue arises when the dependencies evolve. Further development of a library, whether to implement new features or to fix bugs from previous versions, may eventually change its behavior and make the new version incompatible with older ones. As a result, results obtained with older versions cannot be reproduced with the new version. In addition, evolving dependencies may even prevent developers from compiling and running the software, making it unusable.
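As a minimal sketch of how undocumented or silently changing dependencies can at least be made visible, the following example records the installed versions of a project's Python dependencies and compares two such snapshots. It relies on the standard library module `importlib.metadata`. The helper names `snapshot_dependencies` and `changed_dependencies` are hypothetical, and the approach only covers Python packages, not system libraries or compilers.

```python
from importlib import metadata


def snapshot_dependencies(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The dependency is not installed in this environment.
            versions[name] = None
    return versions


def changed_dependencies(recorded, current):
    """Return the sorted names whose version differs between two snapshots."""
    return sorted(
        name
        for name in recorded.keys() | current.keys()
        if recorded.get(name) != current.get(name)
    )
```

A snapshot taken when results are produced can be archived with them. Comparing it against a snapshot of a later environment shows which dependencies have changed and may therefore explain differing results.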