Popis: |
Domain-specific Data Management Plans and Cross-Disciplinary Interoperability

Sebastian Netscher, sebastian.netscher@gesis.org, GESIS – Leibniz-Institute for the Social Sciences
Anna Schwickerath, anna.schwickerath@gesis.org, GESIS – Leibniz-Institute for the Social Sciences
Anja Perry, anja.perry@gesis.org, GESIS – Leibniz-Institute for the Social Sciences
Reiner Mauer, reiner.mauer@gesis.org, GESIS – Leibniz-Institute for the Social Sciences

Abstract

While the relevance of research data management (RDM) and data sharing is increasing, processing FAIR data is still challenging for researchers. Guidance intended to support researchers in good data management, such as templates for data management plans, quite often fails to do so because it is too general and provides little information relevant to particular research domains. In order to overcome this challenge, Science Europe (2018a) suggests developing so-called Domain Data Protocols (DDPs), as intended by the project Domain Data Protocols for Educational Research, described in the proposed paper. DDPs not only support RDM in a given research domain. They also provide an opportunity to identify differences as well as common practices in research data management across research domains, which fosters interoperability and re-usability of data, as required by the so-called FAIR data principles.

Introduction

As a prerequisite of their financial support, funding institutions increasingly demand the re-usability of research data in order to make efficient use of tax money. At the same time, the research community in general, and journals in particular, require research findings to be replicable as an easily verifiable form of quality control. For researchers, these developments are not without challenges, as they need to engage in various forms of research data management (RDM) in order to meet these requirements.
Not every researcher has the appropriate knowledge of RDM techniques, particularly in the early stages of their career (Whitmire et al. 2015). Depending on the design of their research projects, they are faced with several time-consuming RDM activities and a large variety of templates and guidance. One example of such guidance is the range of templates for data management plans (DMPs), which aim to assist researchers in doing sufficient research data management. However, it can be challenging for researchers to choose an appropriate template for their particular research project. Because most DMP templates and related guidance are quite unspecific, they do not take the specific needs of a research domain into account. Consequently, there is an increasing demand for more domain-specific advice to better assist researchers in their RDM activities, especially regarding specific data types or research populations.

The challenges described here were the starting point of the project Domain Data Protocols for Educational Research in Germany (DDP project). Starting in June 2019 with twelve German partner institutions, it sets out to develop so-called domain data protocols, following a concept of Science Europe (2018a:8). The main aim of the DDP project is to support researchers in creating Open Data following the FAIR Data Principles, in order to increase the re-usability of data and transparency in research.

In this paper, we argue for a more domain-specific approach in research data management. We begin by stating the reasons for this approach. In a second step, the paper gives an overview of the current developments in the DDP project and thereby illustrates the benefits of providing researchers with tailored guidance for their domain.

The need for Domain-Specific Guidance

Requirements of funders and journals regarding RDM pose several challenges for researchers.
This is especially so because not all of them are sufficiently familiar with the implications and good-practice examples of RDM needed to produce high-quality data for re-use. According to Whitmire et al. (2015:382), researchers’ “skills are crucial to ensuring data quality, integrity, shareability, discoverability, and reuse over time”. Although various DMP templates exist (some focussing on the re-usability of research data for the benefit of the research community, others on meeting the demands of funding institutions or journals), the challenge remains that most of these templates offer relatively little information on what FAIR data should look like. Usually, they are quite unspecific, providing a set of questions on different aspects of RDM and listing activities that should be undertaken, formulated in a superficial style. Grootveld et al. (2018:9) state that researchers need “much more tailored guidance and domain-specific examples to help them apply the DMP questions to their context”. Moreover, many of these templates have been developed by a particular research institution with a special orientation that is difficult to transfer to other contexts. For instance, the Research Data Management Organizer (RDMO 2021) currently lists ten different templates, six of them from an institutional context.

Of course, such DMP templates have several advantages. Not only does creating cross-disciplinary templates make it easier to provide guidance, but general templates also ensure that RDM activities are comparable between projects as well as across domains, at least to some degree. In this regard, general templates increase the interoperability of DMPs, and possibly also other researchers’ understanding of the research data, and can thus foster the use of data across domains. However, many such DMP templates include RDM activities that are inapplicable to many research projects.
For example, while almost all templates contain questions on data protection (and the GDPR), a highly relevant topic for the social sciences, these are much less relevant, if not obsolete, for most projects in the natural sciences, as they are unlikely to deal with humans and human behaviour. Finally, many DMP templates, such as the EU-Horizon2020 DMP template (2021; Smale et al. 2020), do not cover the entire data management process, namely the data life cycle, but focus on specific topics, e.g., processing FAIR data that can be shared with others. Consequently, Smale et al. (2020) conclude in their study that they “have not been able to find any evidence that DMP use directly leads to improved data management practices”. In short, such general templates tend to be ineffective in assisting researchers to do sufficient RDM.

This conclusion is in line with findings from the OpenAIRE DMP survey (Grootveld et al. 2018), which examined the needs of researchers regarding DMP templates. Asked about desirable improvements for the H2020 DMP template, respondents ranked the following among their top five answers: more domain-specific standards (highest priority), best-practice advice (third highest priority) and domain-specific examples (fourth highest priority). In addition, researchers regarded the “level of expected knowledge” on RDM as “too high” (Grootveld et al. 2018:17). The authors of the study thus conclude that “one of the most frequent requests for support was domain-specific guidance. Projects would really welcome examples answers or ranges of options based on good practice for their field” (Grootveld et al. 2018:37). They therefore suggest a need for more domain-specific DMPs and RDM guidance instead of various general DMP templates by different institutions.
Developing Domain-Data-Protocols

In order to provide better guidance on RDM for researchers and to develop more domain-specific DMP templates, Science Europe suggested developing so-called Domain Data Protocols (DDPs): “generally agreed-upon guidelines, or predefined written procedural methods. One might also conceive a DDP as a ‘model DMP’ for a given domain or community that shares common methods” (Science Europe 2018a:8; see also Science Europe 2018b). Unlike traditional DMP templates, DDPs are predefined data management plans that provide answers to the questions listed in DMP templates and include best-practice examples and domain-specific standards on how to achieve sufficient RDM. Moreover, DDPs are adapted to a given research domain and address its specific needs, e.g., regarding the types of data used or produced, the research method or the research objects analysed.

In the above-mentioned DDP project, a consortium of twelve German research institutes, most of them directly involved in educational research, began developing DDPs for this domain in 2019. The project aims to better assist researchers in doing sufficient RDM and to support the educational research community in processing data of high quality according to the FAIR Data Principles (see Wilkinson et al. 2016; Force11 2021). DDPs provide a minimal set of requirements for FAIR data, combined with domain-specific standards, use cases and further resources. Thus, they not only assist researchers in their RDM activities and in processing FAIR data but also support them in their attempts to acquire funding.

The reasons for developing such DDPs for educational research in Germany are manifold. First, research projects in educational research are characterised by common methods of data collection and analysis, as well as a similar research population, which is, at least to some degree, highly sensitive, e.g., when examining the behaviour of children.
These projects involve a large variety of data types and methods of data collection, e.g., conducting standardised surveys, observing classes, analysing exams, and doing expert interviews or group discussions, with data gathered online, face-to-face, or by video or audio recording. Second, the most important funder in this field in Germany, the Federal Ministry of Education and Research, made data sharing mandatory a couple of years ago. Third, an infrastructure for data archiving and sharing was established, the German Network for Educational Research Data (VerbundFDB 2021), which also serves as a source of templates, guidance and best-practice examples on various RDM activities in the context of educational research.

Discussion

As the need for more domain-specific guidance on RDM increases, DDPs offer a solution for the benefit of a particular domain and the research community in general. Developing domain-specific data protocols might lead to more diversity in RDM, as different domains will consequently use different DDPs. Some may thus argue that, compared to universal DMP templates, the use of DDPs may decrease the comparability of RDM activities between domains, resulting in research data being less interoperable and less easy to re-use across domains. But this does not necessarily have to be the case. We argue that DDPs will instead increase understanding across domains. This is because processing FAIR data is central to the protocols, which improve the findability, accessibility, interoperability and re-usability of data and make RDM clearer and more transparent. In contrast to traditional DMP templates, which usually consist of lists of open questions on RDM activities, DDPs provide clear guidance and support for their particular domains.
It is this domain specificity that makes it easier for researchers to implement various RDM activities and to process shareable data that can be understood and re-used by others in new (research) contexts. In short, DDPs increase interoperability and re-usability of data, making such data somewhat FAIRer. DDPs also differ from traditional approaches in that they focus on the data, i.e., the FAIR outcome of the research process, rather than on the RDM activities. The idea is to give answers rather than to ask more questions.

DDPs not only make data FAIRer; they also foster comparability and interoperability of RDM activities. For example, data protocols for educational research can be used in other social science domains dealing with similar types of data and research objects. Once developed and tested for educational research, the protocols can be adapted to fit other domains, e.g., by modifying domain-specific terminology. This applies not only to the social sciences but to the entire scientific community. For example, data documentation, which enables other researchers to understand and evaluate the data, is a common requirement for all research data, regardless of domain. Other domains can thus use the domain data protocols for educational research to review their own requirements for data documentation in contrast to the guidance provided in them. In turn, DDPs for educational research would benefit from data protocols for other domains, e.g., in the natural sciences, by adopting best-practice advice on handling and storing high-volume data. In short, developing DDPs for different domains will highlight similarities as well as differences in RDM activities between these domains, with a spill-over effect for all domains and the entire research community.
In sum, DDPs provide an opportunity not only to better assist researchers in doing sufficient RDM and processing FAIR data but also to identify common practices and differences across domains. This in turn can increase the understanding of data across domains and thus contribute to interoperability and re-usability as intended by the FAIR data principles.

Literature

Brandt, D. S. (2007). Librarians as partners in e-research: Purdue University Libraries promote collaboration. College & Research Libraries News, Vol. 68.6, pp. 365-396. DOI: https://doi.org/10.5860/crln.68.6.7818.
EU-Horizon2020 (2021). Horizon 2020 FAIR Data Management Plan (DMP) Template. Available at: https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-data-management/data-management_en.htm#A1-template, latest access: 2021/01/14.
Federal Ministry of Education and Research (2021). Available at: https://www.bmbf.de/de/bildungsforschung-76.html, latest access: 2021/01/14.
Force11 (2021). The FAIR Data Principles. Available at: https://www.force11.org/group/fairgroup/fairprinciples, latest access: 2021/01/14.
Grootveld, M., E. Leenarts, S. Jones, E. Hermans and E. Fankhauser (2018). OpenAIRE and FAIR Data Expert Group survey about Horizon 2020 template for Data Management Plans (Version 1.0.0). DOI: http://doi.org/10.5281/zenodo.1120245.
RDMO - Research Data Management Organizer (2021). Available at: https://rdmo.aip.de/, latest access: 2021/01/14.
Science Europe (2018a). Science Europe Guidance Document Presenting a Framework for Discipline-Specific Research Data Management. D/2018/13.324/1.
Science Europe (2018b). Practical Guide to the International Alignment of Research Data Management. D/2018/13.324/4.
VerbundFDB (2021). Verbund Forschungsdaten Bildung - German Network for Educational Research Data. Available at: https://www.forschungsdaten-bildung.de/index.php?la=en, latest access: 2021/01/14.
Whitmire, A. L., M. Boock and S. C. Sutton (2015). Variability in academic research data management practices: Implications for data services development from a faculty survey. Program: Electronic Library and Information Systems, Vol. 49.4, pp. 382-407. DOI: https://doi.org/10.1108/PROG-02-2015-0017.
Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data, Vol. 3, 160018. DOI: https://doi.org/10.1038/sdata.2016.18. |