Open Access and Research Data

Intro

Figure: The research data management cycle

The key takeaways from this article are:

  1. Research data and their independent publication are becoming increasingly important.
  2. The FAIR principles are a quasi-standard for research data.
  3. Organizational, legal and infrastructural hurdles before publication can be overcome.

Research Data

Scientific findings in text form are based, as a rule, on research data. Research data come in a wide variety of forms and types. They comprise all (digital) data generated during the scientific process, for example, through measurements, simulations, interviews, or source work. In recent years, the management of these research data has increasingly attracted the attention of scholars and scientists as well as research institutions and infrastructure facilities. Whereas in the past, research data were often treated somewhat dismissively as a mere accessory to publications, made available as a matter of form, or disclosed only upon request, a strong trend towards the independent and prominent publication of research data in open formats (open data) is now apparent.

Reasons for Publishing Research Data

Research data facilitate the reproducibility and transparency of scientific results, the reusability and re-analysis of data, the merging of data from different sources, and thus the opportunity to conduct further research with existing data and to generate new knowledge. Ideally, reusability includes the right to download, copy, disseminate, and automatically process the data and to use them without financial, technical, or legal restrictions. The publication of research data facilitates their citability and thus enhances the scientific reputation of the authors.

Positions and Drivers

In 2016, the European Union (EU) integrated the Open Research Data Pilot into the Horizon 2020 funding program. This provides for the publication of research data under the premise of:

"as open as possible, as closed as necessary".

Participation is voluntary. In the successor program Horizon Europe, Open Science is established as the modus operandi: open access is mandatory for text and data publications, and data must be provided in accordance with the FAIR principles.

The Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) refers to the provision of data in accordance with the FAIR principles both in its guidelines for ensuring good scientific practice and in its separately issued guidelines for handling research data. The German Federal Ministry of Education and Research (BMBF) and the Volkswagen Foundation, for example, also require applicants to provide information on the further use and exploitation of data. A further driver is the National Research Data Infrastructure (NFDI), in which consortia are being built up over a period of ten years for the

"sustainable, qualitative and systematic securing, indexing and utilization of research data via regional and networked knowledge repositories."

Since 2019, the Österreichische Wissenschaftsfonds (FWF, Austrian Science Fund) has expected open access to research data for the projects it approves:

"For research data on which the scientific publications of the project are based, open access is mandatory. [...] If [...] open access to these data is not possible or only partially possible, this must be justified in the data management plan (DMP)".

The Schweizerische Nationalfonds (SNF, Swiss National Science Foundation) also considers open access to research data to be an essential contribution and titles its policy statement on Open Research Data as follows:

"Research data should be open and accessible to all - to science as well as to society".

As in the case of research literature, one argument for making research data available in open access is that their production was financed with public funds. At an early stage in the history of open access, the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities recognised data as objects that should be made openly available.
Researchers have an intrinsic motivation to publish data: good data management lets them work more efficiently in increasingly data-driven research, and they benefit from open data themselves. The main drivers of data publication, however, are the research funders.

Practical tip

The slides of the presentation Open Data in EU Projects (in German) highlight the European Commission's legal requirements for research data in the projects it funds.

FAIR Principles

The guidelines of various national and international research funders – for example, the European Union and the German Research Foundation (DFG) – are aimed at encouraging compliance with the FAIR data principles. The German National Research Data Infrastructure (NFDI) has set itself the aim of making data “FAIRfügbar”. This is a play on the German word verfügbar, which means “available”. The acronym FAIR stands for Findable, Accessible, Interoperable, and Reusable. The “FAIR” concept was developed by the FORCE11 community and published in the journal Scientific Data on 15 March 2016 (Wilkinson, Dumontier, Aalbersberg et al., 2016). Support for the FAIR principles can be found inter alia in the G20 Leaders’ Communiqué issued at the end of the Hangzhou Summit in 2016. The FAIR principles are well on the way to becoming an internationally recognised standard for the handling of research data. “FAIR data” does not necessarily mean that all the data are openly available.

The four individual elements of FAIR mean:

  • Findable: In order for the data to be reusable, they must be easily findable. To render them findable, the data are described with rich human- and machine-readable metadata.
  • Accessible: Access to the data found must be possible according to clear rules; authentication and authorisation must be defined.
  • Interoperable: In order to use data and integrate them with other data, an accessible, shared, and broadly applicable language is needed for knowledge representation. Metadata use standardised vocabularies.
  • Reusable: The description of the data and the metadata facilitates their use in different contexts. Suitable data usage licences are used, and the data meet domain-relevant community standards.
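As a rough illustration of what rich human- and machine-readable metadata can look like in practice, the following Python sketch describes a dataset as a plain dictionary and reports which FAIR-related fields are still empty. The field names, their grouping by FAIR element, and all example values (including the DOI and the repository URL) are assumptions made for this example; they are not an official schema or checklist, and real repositories prescribe their own metadata standards.

```python
import json

# Hypothetical mapping of FAIR elements to metadata fields (illustrative only,
# not an official FAIR checklist or repository schema).
FAIR_FIELDS = {
    "findable": ["identifier", "title", "creators", "keywords"],
    "accessible": ["access_url", "access_conditions"],
    "interoperable": ["file_format", "vocabulary"],
    "reusable": ["license", "description"],
}


def missing_fair_fields(record):
    """Return, per FAIR element, the fields that are absent or empty."""
    gaps = {}
    for element, fields in FAIR_FIELDS.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[element] = missing
    return gaps


# Example record with made-up values.
example_record = {
    "identifier": "https://doi.org/10.1234/example-dataset",
    "title": "Interview transcripts on data sharing practices",
    "creators": ["Doe, Jane"],
    "keywords": ["research data", "interviews"],
    "access_url": "https://repository.example.org/datasets/42",
    "access_conditions": "restricted; registration required",
    "file_format": "text/csv",
    "vocabulary": "",  # no standardised vocabulary recorded yet
    "description": "Anonymised transcripts with a codebook.",
    # "license" is deliberately left out to show how a gap is reported
}

if __name__ == "__main__":
    print(json.dumps(missing_fair_fields(example_record), indent=2))
```

Note that the example record describes restricted data and still counts as complete for the "accessible" element: what matters in this sketch is that the access conditions are stated, not that the data are open.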

Publication

When publishing research data, a suitable repository should be chosen – where possible one that provides open access to the data. A disciplinary repository that is well established in the community in question should always be the preferred choice, because it places the data in an appropriate specialised context and makes them easier to find. The Registry of Research Data Repositories, re3data, can be used to select a suitable data repository. If no suitable disciplinary repository can be found, general or institutional repositories can be used.
To guarantee the long-term provision and findability of the data, a permanent address must be assigned. This persistent identifier also ensures the citability of the datasets. Preferred identifiers are the Digital Object Identifiers (DOIs) provided by the DataCite consortium.
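To make the idea of a persistent, citable address more concrete, the sketch below normalises a DOI written in different ways and turns it into a resolver link via https://doi.org/. The function names are hypothetical, the regular expression is only a loose plausibility check rather than a full validation of DOI syntax, and lowercasing is a normalisation choice made here on the basis that DOIs are case-insensitive. The example uses the DOI of the FAIR principles article cited in this text.

```python
import re
from urllib.parse import quote

# Loose plausibility check: "10.", a 4-9 digit registrant code, "/", a suffix.
# This is an assumption for illustration, not a complete DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways in which a DOI is written in citations and web pages.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip resolver or 'doi:' prefixes and return the bare DOI in lowercase."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {raw!r}")
    return doi.lower()


def doi_url(raw):
    """Build a resolver link that can be used when citing the dataset."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/")


if __name__ == "__main__":
    print(doi_url("doi:10.1038/SDATA.2016.18"))
```

Storing the bare, normalised DOI and generating the link on demand keeps citations stable even if the dataset's landing page moves to a different address.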
As the reusability of research data is greatly limited if they lack adequate descriptions and metadata, it is imperative that they be curated in accordance with the FAIR principles before publication. Metadata should be assigned at the earliest possible point in the research process. They comprise both technical metadata (e.g., when and by whom was the dataset collected?) and substantive metadata (e.g., what is the content of the individual variables?). Curation consists primarily of technical checks: the data format, basic access to the data, and formal accuracy. With regard to the data format, open formats that remain accessible in the long term should be used. Checking the content of the data is mainly the responsibility of the researchers themselves. Curation ends with the choice of a suitable licence. The Creative Commons licences have proved their worth (more information can be found on the English-language pages of the research data portal forschungsdaten.info).

Practical tip

Tips and tricks on how to make research data openly available to the community can be found in the slides for the Open Access Talk Research Data & Open Access - How to Publish Your Data (in German).

Challenges

Three major challenges arise in research data management (RDM) and the publication process, all of which are surmountable:

Organizational

The management and curation of data requires additional skills. New job descriptions such as data curator, data steward, or data scientist are emerging, and research and infrastructure institutions must provide appropriate resources and train and educate employees accordingly.

Legal

Research data can be personal and very sensitive. Such data must be anonymized before publication, or access to them must be restricted in such a way that no data protection rights are violated. Copyright aspects of research data should not be neglected either. Legal advice at an early stage in the research process is strongly recommended.

Infrastructure

The resulting data volumes can become very large very quickly, especially in the natural sciences, and past experience suggests that the amount of data will continue to grow. Handling data in the petabyte range places considerable demands on the storage, backup, archiving, and transmission of these data.
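A back-of-the-envelope calculation shows why transmission alone becomes a planning issue at this scale. The sketch below estimates how long it takes to move a given volume over a network link; the link speeds and the efficiency factor are assumed values chosen for illustration, not measurements, and a petabyte is taken here as 10^15 bytes.

```python
PETABYTE = 10**15          # bytes (decimal definition, an assumption of this sketch)
SECONDS_PER_DAY = 86_400


def transfer_time_days(size_bytes, link_bits_per_second, efficiency=1.0):
    """Estimate the transfer duration in days.

    efficiency is the usable fraction of the nominal link speed (0 < efficiency <= 1);
    it is a hypothetical knob for protocol overhead and shared use of the link.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed link speeds of 1, 10 and 100 Gbit/s at full nominal throughput.
    for gbit in (1, 10, 100):
        days = transfer_time_days(PETABYTE, gbit * 10**9)
        print(f"1 PB over {gbit} Gbit/s: {days:.1f} days")
```

Under these assumptions, one petabyte occupies a fully used 10 Gbit/s link for a little more than nine days, and proportionally longer if only part of the bandwidth is available.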

Reservations

One criticism frequently voiced by researchers stems from their concern that others will benefit excessively from their wealth of data, and that they will not achieve the reputation they need for their scientific careers. It should be made clear in this connection that publication should be sought as early and as comprehensively as possible. However, the data may still be published at a later point in time – after the analysis or subject to an embargo period. Sovereignty over the data remains with the data producer.

The additional costs associated with the required handling of the data are also frequently mentioned. Funding for these costs can already be requested when applying for research funding. Data management must be regarded as a key element of scientific research and be adequately staffed and funded.

Outlook

The process observed in the last few years will further intensify. The publication, reuse, and linking of research data will become standard scientific practice. There is good reason to believe that the boundaries between open access, text publications, and research data will become increasingly blurred; that the topics and tasks will overlap; and that a change of mentality in science will be brought about under the umbrella term open science. In order to achieve acceptance in science, data and their handling must be recognised as a scientific achievement.

References

  • Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., Bonino da Silva Santos, L., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3. https://doi.org/10.1038/sdata.2016.18

Further Links

  • forschungsdaten.info – Link to the English-language pages of the German-language information platform about research data
  • go-fair.org/ – GO FAIR initiative
  • re3data.org – Registry of Research Data Repositories

Content editor of this page: Matthias Landwehr (Last updated: March 2021)