This document was developed as part of The Long Tail of Research Data Interest Group of the Research Data Alliance.

It was written by Wolfram Horstmann, Amy Nurnberger, Kathleen Shearer, and Malcolm Wolski.

  1. Recognize and understand the diversity of data created at your organization, or through your funding support, and develop appropriate frameworks for managing those data.

Because data sets vary along many dimensions (e.g. size, subject, provenance, funding, format, longevity, location, and complexity), managing them is highly context-sensitive. When drafting policies, designing funding programmes, producing data, or building technical infrastructure, it is paramount to understand the nature of the data being produced, along with their inherent opportunities and limitations. The use of data management plans, together with local institutional support for data management, will help ensure that long tail data are managed and shared appropriately.

  1. Scale existing funding mechanisms to support research data management for small research projects.

Funding for data management is often available for large research activities, but much less so for the data produced by smaller-scale research projects. Additionally, some disciplines have subject-specific data services, but such services are not available to less well-established fields. Funding for data management needs to be allocated across all fields and project scales in order to support the management of long tail data.

  1. Expand and strengthen the institutional role in managing research data.

Many long tail datasets are at risk of being lost because they are not managed appropriately. Local support for researchers generating data will increase the adoption of standards and best practices earlier in the research process, improving the likelihood that data are preserved, understood, and reused by others. We encourage universities and institutions to offer support services for research data management (RDM). In particular, RDM services should become part of the standard service provision of research libraries, with libraries supplying expertise in information management from the initial stages of data management planning, through the challenges of active data management, to careful consideration of the requirements for longer-term data management, such as repositories.

  1. Develop and apply common standards across institutions and domains to ensure greater interoperability across datasets.

The integration of disparate datasets offers tremendous potential for new discoveries. A distributed network of research data management services has many advantages, including greater support for local needs and requirements, more comprehensive coverage, and increased resilience against loss. These advantages, however, come with corresponding challenges around the coherence and integration of research data, one of the major objectives of open science. Many of the current standards for research data are discipline-specific and therefore cannot readily be applied to make the diversity of long tail data interoperable or to integrate it. We recommend the development of common, high-level metadata elements that will support data integration across diverse types of research data and disciplines.
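As an illustration of what common, high-level metadata elements could make possible, the following is a minimal sketch rather than a proposed standard. The element names in `CommonRecord`, the two disciplines, their local field names, and the sample records are all hypothetical and were chosen only for this example; none of them comes from the full document.

```python
from dataclasses import dataclass

# Hypothetical set of common, high-level elements (assumption for illustration).
@dataclass(frozen=True)
class CommonRecord:
    identifier: str
    title: str
    creators: tuple
    discipline: str
    year: int

# Hypothetical crosswalks: common element name -> field name used locally.
CROSSWALKS = {
    "ecology": {
        "identifier": "dataset_id",
        "title": "survey_name",
        "creators": "observers",
        "year": "season",
    },
    "linguistics": {
        "identifier": "corpus_id",
        "title": "corpus_title",
        "creators": "compilers",
        "year": "release_year",
    },
}

def to_common(record: dict, discipline: str) -> CommonRecord:
    """Map a discipline-specific record onto the shared elements."""
    mapping = CROSSWALKS[discipline]
    return CommonRecord(
        identifier=str(record[mapping["identifier"]]),
        title=str(record[mapping["title"]]),
        creators=tuple(record[mapping["creators"]]),
        discipline=discipline,
        year=int(record[mapping["year"]]),
    )

# Made-up sample records in two different local formats.
ecology_record = {
    "dataset_id": "eco-0001",
    "survey_name": "Pond invertebrate counts",
    "observers": ["A. Example"],
    "season": 2014,
}
linguistics_record = {
    "corpus_id": "ling-0042",
    "corpus_title": "Regional dialect interviews",
    "compilers": ["B. Example", "C. Example"],
    "release_year": "2012",
}

# Once mapped, records from both disciplines can be handled together.
combined = sorted(
    [to_common(ecology_record, "ecology"),
     to_common(linguistics_record, "linguistics")],
    key=lambda r: r.year,
)
for rec in combined:
    print(rec.year, rec.discipline, rec.title)
```

The point of the design is that each discipline keeps its own detailed schema and only has to maintain one small mapping onto the shared elements, after which records from different sources can be searched or sorted together.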

  1. Support reproducibility and transparency of research by linking data, software, and literature.

One of the great opportunities of the digital environment is the improved capacity to use research data and methods to reproduce research findings. Reliably linking the literature to the underlying data and tools that support research conclusions, such as software and code (as well as the physical samples that are the sources of data), will make it easier for others to verify claims, whilst also facilitating greater reproducibility of research. We encourage the community to work together to identify best practices for linking research data with related literature and associated tools.

  1. Establish governance structures that reflect the diverse dimensions of research data.

To ensure that appropriate mechanisms are in place to support long tail data, RDM governance should reflect the diversity of data. The diversity of long tail data, in terms of both scope and discipline, needs to be well represented in the evolving RDM governance structures. This can be accomplished by ensuring greater involvement of subject specialists from both novel and well-established disciplines, technology experts, and research data managers from diverse institutions.

  1. Develop coherent principles and policies for the collection and preservation of long tail data.

In the context of the long tail, not all data may have value for future use, and budget restrictions may limit how much data can be collected and preserved. Institutions and funders need guidance on good practices for assessing the potential value of research data, and data repositories need to develop policies for the selection, collection, curation, and stewardship of data and for evaluating which data have long-term value. Relatedly, better-established tools are needed for calculating the costs of long-term data stewardship and curation.

The full document is available on the RDA website.