Description: |
Recent discussions of research data practices at relevant conferences and workshops, and in the respective publications, suggest substantially different foci of problems and solutions in managing data across scientific disciplines. There appears to be a particularly profound gap between the natural sciences and the humanities, with the social and life sciences placed somewhere in between. Indeed, data centers tailored to the specific needs of a single discipline (physics, chemistry, climate studies) are numerous in the natural sciences, yet nearly absent for individual humanities subjects. While the former ask for and report solutions for scaling up (larger quantities of data can be processed by the same application) and scaling out (larger quantities of data can use the same infrastructure), the latter are concerned with the heterogeneity of relatively small amounts of data (the long-tail problem) and a divergence of agreed standards; something we may term cross scaling. In either case, an efficiency problem has to be solved: on the one hand, huge amounts of data have to be handled within an acceptable time frame; on the other, many different applications with diverse functionalities have to be supported with an acceptable amount of resources. We argue here that, independently of the discipline, either optimization problem may have to be addressed. Throughout the last decade we have also observed that projects in the natural sciences diversify and prefer individualized solutions, which hints at increasing data heterogeneity there as well, while at the same time some humanities projects produce petabytes of data. To show the necessity of a differentiated approach, the research data center of Universität Hamburg is offered as a case in point.
The evolution of the center from one specialized in humanities projects to a research data center offering services for the whole university, while other disciplinary data centers continue to exist side by side, illustrates the entire range of tasks of data stewardship. It includes the continuous development of services while becoming increasingly involved in natural science projects, as well as task sharing and communication with other data institutions. A core asset for understanding the requirements of each discipline is a multidisciplinary team. The main organizing principle of the offered services, however, is the data life cycle (1. data creation and deposit, 2. managing active data, 3. data repositories and archives, 4. data catalogs and registries). The interlocking of these stages is paramount in the long-term strategy. Starting with the first stage, which includes planning and designing a data project, we offer a tool (RDMO) that produces data management plans in accordance with the standards of funding agencies or, if required, disciplinary standards. Users follow a questionnaire that adjusts all items according to the answers already given. In the future, users of small-scale projects will be able to transfer their metadata via an API to a collaborative web database service such as Heurist. This system belongs to the second phase of the life cycle and is a first attempt at solving the cross-scaling optimization problem mentioned above, since it allows users to define arbitrarily complex data models and offers a wide variety of functions for recording, managing, analyzing, visualizing, publishing and archiving richly interlinked and heterogeneous research data in one application. This road to a common understanding of data stewardship provides a good platform to make the data as open as possible and thus creates a model for good scientific practice.
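The adaptive questionnaire described above can be sketched as a set of questions whose visibility depends on earlier answers. This is only an illustrative sketch of the principle: all names and the condition mechanism are assumptions for this example, not RDMO's actual data model or API.

```python
# Minimal sketch of a condition-driven questionnaire for data management
# plans: each question may carry a condition over earlier answers and is
# shown only when that condition holds. All identifiers are illustrative.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    key: str
    text: str
    # Condition over the answers collected so far; None means always shown.
    condition: Optional[Callable[[dict], bool]] = None


QUESTIONS = [
    Question("funder", "Which funding agency supports the project?"),
    Question("dfg_checklist", "Apply the funder's research data checklist?",
             condition=lambda a: a.get("funder") == "DFG"),
    Question("volume", "Expected data volume (GB)?"),
    Question("archive", "Which long-term archive is planned?",
             condition=lambda a: float(a.get("volume", 0)) > 100),
]


def visible_questions(answers: dict) -> list:
    """Return the keys of all questions applicable to the given answers."""
    return [q.key for q in QUESTIONS
            if q.condition is None or q.condition(answers)]
```

With no answers yet, only the unconditional items appear; once the funder is named and a large volume is entered, the follow-up items become visible, so the plan always matches the standards the given answers call for.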
At the boundary between the second and the third phase of the research data life cycle, the databasing-on-demand approach, whose steadily growing number of applications in various projects augments the existing service portfolio, calls for large datasets in machine-readable formats. Such formats make the research data usable across disciplines and allow for interaction between them. It is, however, important to note that databasing-on-demand is an ongoing process that is applied step by step, whereas the very specific requirements of larger collections, which are usually not subject to the sudden changes of ongoing research projects, have to be taken care of right away. These scientific collections represent an essential part of the research data, especially in the natural sciences, but also in other disciplines. Making digital representations of these (mostly) physical objects available, visible and searchable, as well as supporting their systematic further development, networking, sustainable digitization and secure storage, are among the key issues in the digital transformation of science and culture and are thus part of the services of our center. The pivotal role in the third stage of the life cycle is played by a Zenodo-based data repository connected to an object store technology. Data that is archived and published here receives a registered digital object identifier (DOI), unless the data is organized in registered communities with restricted access. As a prioritized future objective, all research data from finished projects (e.g. from Heurist) or held in one of the other solutions described above will be transferred via API to this repository. The services offered in the third stage of the data life cycle (and the overall strategy of the center) are driven by the idea of open access and open data. Openness is demanded by research funders and increasingly also by society, but in many cases it is difficult to implement.
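A deposit-and-publish workflow against such a Zenodo-style repository might look as follows. The endpoint paths follow Zenodo's public REST API (create deposition, upload to the bucket, attach metadata, publish); the token, file name and metadata values are placeholders, and the center's own instance may differ, so treat this as a sketch rather than a definitive client.

```python
# Sketch of depositing and publishing a dataset via a Zenodo-style REST API.
# Publishing assigns the registered DOI mentioned above. Token, file and
# metadata values are placeholders.
import json
import urllib.request
from typing import Optional

BASE = "https://zenodo.org/api"


def make_metadata(title: str, creators: list, description: str) -> dict:
    """Build the deposition metadata payload."""
    return {"metadata": {
        "upload_type": "dataset",
        "title": title,
        "creators": [{"name": c} for c in creators],
        "description": description,
        "access_right": "open",  # e.g. "restricted" for closed communities
    }}


def _request(method: str, url: str, token: str,
             payload: Optional[dict] = None) -> dict:
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def deposit(token: str, path: str, meta: dict) -> str:
    """Create a deposition, upload one file, attach metadata, publish."""
    dep = _request("POST", f"{BASE}/deposit/depositions", token, {})
    with open(path, "rb") as fh:  # file content goes to the deposition bucket
        urllib.request.urlopen(urllib.request.Request(
            f"{dep['links']['bucket']}/{path.rsplit('/', 1)[-1]}",
            data=fh.read(), method="PUT",
            headers={"Authorization": f"Bearer {token}"}))
    _request("PUT", f"{BASE}/deposit/depositions/{dep['id']}", token, meta)
    done = _request(
        "POST", f"{BASE}/deposit/depositions/{dep['id']}/actions/publish",
        token)
    return done["doi"]  # the registered digital object identifier
```

The planned API transfer from Heurist and the other solutions would amount to driving exactly this sequence programmatically for each finished project.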
Open access publishing interferes with established publication channels and in some cases requires considerable financial resources, especially in the humanities when publishing books. Open data reaches its limits when research data are protected by copyright or restrictive conditions, and is almost impossible when the data relate to individual persons. Openness is a challenge across the university, and we are meeting it with pragmatic solutions. Although not explicitly mentioned among the main stages of the data life cycle, and often displayed as an inner circle, we have identified data training and knowledge transfer to researchers as one of the most important success factors for raising data awareness. It directly impacts our daily work: researchers with a solid knowledge of RDM avoid a great deal of unnecessary work and prevent most miscommunication. At a university with eight faculties and researchers at all career levels, knowledge of research data management also varies widely. Since its relevance is increasing and it is also crucial for research funding, the center has built up a help desk; workshops for students, doctoral candidates and senior scientists are offered regularly as well. If supervisors are familiar with RDM and know the services of the center, they can in turn be good role models for their students and doctoral candidates. Finally, every research institution is expected to present specific scientific research outcomes to the public, but also to produce thorough internal reports. As this is part of the final stage of the research data life cycle, we offer a current research information system (CRIS). To keep track of those outcomes, the aim of the CRIS is to reflect the actual state of research at UHH by storing, managing and exchanging metadata.
In addition to manual data entry, existing IT systems of the university are connected to the CRIS as data providers by linking data spaces such as HR, financial systems, or online publication databases. The CRIS not only eases the regular reporting of the individual scientific institution but also improves communication between scientists and the public, e.g. by providing a broad range of easily accessible research information.
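The linking of several provider systems into one research-information record can be sketched as a simple merge keyed by a shared person identifier. All field and system names here are hypothetical illustrations of the idea, not the schema of any particular CRIS product.

```python
# Illustrative sketch: merge records from several source systems (HR,
# finance, publication databases) into one research-information record per
# person, keyed by a shared ID. All field names are hypothetical.
def merge_cris_records(sources: dict) -> dict:
    """Combine per-system record lists into one record per researcher ID."""
    merged = {}
    for system, records in sources.items():
        for rec in records:
            person = merged.setdefault(rec["person_id"],
                                       {"person_id": rec["person_id"]})
            for key, value in rec.items():
                if key == "person_id":
                    continue
                # Namespace each attribute by its providing system and
                # collect repeated values, so that fields from different
                # providers never collide or overwrite each other.
                person.setdefault(f"{system}.{key}", []).append(value)
    return merged
```

Namespacing by source system is one way to keep provenance visible in the merged record; a production connector would additionally handle identifier reconciliation and update cycles.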