COAR and SPARC have published a joint response to the OSTP Request for Public Comment on Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research.

Good data management is critical for validating research findings, making them transparent, and maximizing the impact and value of publicly funded research through data reuse. Repositories provide crucial services that manage and provide access to data, articles, and a wide array of other scholarly content, and they are essential community tools for good data management.

Our response seeks a balance: it promotes best practices for managing data in repositories while ensuring that requirements are not so onerous that they exclude a large number of repositories.

We propose a framework of essential practices for repositories, organized by specific objectives. Our proposal draws on input from the repository community in the US and internationally, and takes into account the recommended characteristics already outlined in a number of other contexts: Data Citation Roadmap for scholarly data repositories, Core Trust Seal, FAIR data principles, PLOS “Criteria that Matter”, TRUST, and COAR Next Generation Repositories Technologies.

Current initiatives and assessment models for repositories focus on different objectives (for example, the FAIR criteria focus on discovery and reuse, while the Core Trust Seal focuses on sustainability and preservation). COAR would like to bring these criteria together into a comprehensive framework of best practices for repositories, with a tiered approach that includes “essential”, “highly recommended”, and “nice to have” criteria.

Over the next several months, an international working group at COAR will refine, expand and validate the initial framework below, bringing together community-accepted norms and practices across all key areas. Widespread community input will be a critical aspect of this process.

Objectives and their Essential Characteristics

Discoverability of data
  • High-quality metadata (using a discipline-based or general metadata schema, e.g. DataCite or Dublin Core) with an OAI-PMH feed
  • Repository has well documented APIs
  • Repository assigns a citable, persistent, unique and universal identifier (PUID) that points to the landing page of the dataset (even in cases where the data are no longer available or are withheld for security purposes)
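As an illustration of the discoverability characteristics above, the following minimal sketch shows how a harvester might build an OAI-PMH `ListRecords` request and read Dublin Core fields from the response. The `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL, the record identifier, the DOI, and the function names are hypothetical and do not refer to any real repository. The response is a hand-written sample so that the sketch runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Extract the OAI identifier, titles, and dc:identifier values of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "oai_identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            # In this sample, dc:identifier carries the citable PUID.
            "identifiers": [e.text for e in record.iterfind(".//dc:identifier", NS)],
        })
    return records


# Hand-written sample response; every value in it is hypothetical.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
        <datestamp>2020-03-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example survey dataset</dc:title>
          <dc:identifier>https://doi.org/10.1234/example</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["oai_identifier"], rec["titles"], rec["identifiers"])
```

A real harvester would also need to follow `resumptionToken` values to page through large result sets, which this sketch omits.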
Equitable, free and ongoing access to data
  • There is no cost to the user for accessing data once it is published
  • Repository ensures ongoing access to data for a publicly stated time frame
  • Repository has a contingency plan to ensure data are available and maintained during and after unforeseen events
Reuse of data
  • Repository supports the use of machine-readable licenses (e.g. Creative Commons licenses)
  • Repository provides citable PUIDs
Data integrity and authenticity
  • Repository provides information about data provider(s), including contact information of the person(s) responsible for the data
  • Repository provides a record of all changes to metadata and data in the repository
  • Repository provides documentation of its practices that prevent unauthorized access/manipulation of data
Quality assurance
  • Repository undertakes basic curation of metadata and data
  • Repository provides documentation about what curation processes are applied to the data and metadata
Privacy of sensitive data (e.g. human subjects data)
  • In cases where the repository is collecting sensitive research data, the repository provides tiered access based on the different levels of security requirements of data
  • In cases where the repository is collecting sensitive research data, the repository has mechanisms that allow data owners to limit access to authorized users only
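The two characteristics above, tiered access and owner-controlled restriction to authorized users, could be modeled along the following lines. This is only a sketch under stated assumptions: the three tier names, the `Dataset` fields, and the `can_access` function are hypothetical and are not prescribed by the framework, and a production system would rely on an established authentication and authorization service rather than in-memory sets.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet, Optional


class AccessTier(IntEnum):
    """Hypothetical tiers, ordered from least to most restrictive."""
    PUBLIC = 0       # anyone may access the data
    REGISTERED = 1   # any authenticated user may access the data
    RESTRICTED = 2   # only users authorized by the data owner


@dataclass(frozen=True)
class Dataset:
    puid: str
    tier: AccessTier
    # Users the data owner has explicitly authorized (used for RESTRICTED data).
    authorized_users: FrozenSet[str] = frozenset()


def can_access(dataset: Dataset, user_id: Optional[str]) -> bool:
    """Return True if the user may access the data; None means anonymous."""
    if dataset.tier == AccessTier.PUBLIC:
        return True
    if user_id is None:
        return False
    if dataset.tier == AccessTier.REGISTERED:
        return True
    return user_id in dataset.authorized_users


if __name__ == "__main__":
    clinical = Dataset(
        puid="https://doi.org/10.1234/restricted-example",  # hypothetical PUID
        tier=AccessTier.RESTRICTED,
        authorized_users=frozenset({"alice"}),
    )
    print(can_access(clinical, "alice"))  # authorized by the data owner
    print(can_access(clinical, "bob"))    # authenticated but not authorized
    print(can_access(clinical, None))     # anonymous
```

Note that the landing page and metadata for a restricted dataset can remain publicly discoverable; only access to the data itself is gated in this sketch.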
Sustainability and preservation
  • Repository (or the organization that manages it) has a long-term plan for managing and funding the data repository
  • Repository has a public data retention policy that defines how long the data will be preserved, and provides documentation about its preservation practices
Other
  • Repository has a contact point or helpdesk to assist data depositors and data users
  • Repository provides documentation about the scope of data accepted into the repository