What should be the essential baseline practices for repositories that manage research data?

COAR and SPARC have published a joint response to the OSTP Request for Public Comment on Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research.

Good data management is critical for ensuring the validation and transparency of research findings, as well as for maximizing the impact and value of publicly funded research through data reuse. Repositories provide crucial services that manage and provide access to data, articles, and a wide array of other types of scholarly content, and they are essential community tools for good data management.

Our response seeks to strike a balance between applying best practices for managing data in repositories and ensuring that the requirements are not so onerous that they exclude a large number of repositories.

We propose a framework that provides essential practices for repositories, based on specific objectives. Our proposal is based on input from the repository community in the US and internationally, and takes into account the recommended characteristics outlined in a number of other contexts: the Data Citation Roadmap for scholarly data repositories, Core Trust Seal, the FAIR data principles, PLOS “Criteria that Matter”, TRUST, and COAR Next Generation Repositories Technologies.

Currently there are initiatives and assessment models for repositories that focus on different objectives (for example, FAIR criteria are focused on discovery and reuse, while the Core Trust Seal is focused on sustainability and preservation). COAR would like to bring these various criteria together into a comprehensive framework for best practices in repositories, one that would also provide a tiered approach with “essential”, “highly recommended”, and “nice to have” criteria.

Over the next several months, an international working group at COAR will refine, expand and validate the initial framework below, bringing together community-accepted norms and practices across all key areas. Widespread community input will be a critical aspect of this process.

Objective: Essential Characteristics

Discoverability of data
- High-quality metadata (discipline-based or general metadata schema, e.g. DataCite or Dublin Core metadata) with an OAI-PMH feed
- Repository has well-documented APIs
- Repository assigns a citable, persistent, unique and universal identifier (PUID) that points to the landing page of the dataset (even in cases where data is no longer available or data is not available for security purposes)

Equitable, free and ongoing access to data
- There is no cost to the user for accessing data once it is published
- Repository ensures ongoing access to data for a publicly stated time frame
- Repository has a contingency plan to ensure data are available and maintained during and after unforeseen events

Reuse of data
- Repository supports the use of machine-readable licenses (e.g. Creative Commons Licenses)
- Repository provides citable PUIDs

Data integrity and authenticity
- Repository provides information about data provider(s), including contact information of the person(s) responsible for the data
- Repository provides a record of all changes to metadata and data in the repository
- Repository provides documentation of its practices that prevent unauthorized access/manipulation of data

Quality assurance
- Repository undertakes basic curation of metadata and data
- Repository provides documentation about what curation processes are applied to the data and metadata

Privacy of sensitive data (e.g. human subjects, etc.)
- In cases where the repository is collecting sensitive research data, the repository provides tiered access based on the different levels of security requirements of data
- In cases where the repository is collecting sensitive research data, the repository has mechanisms that allow data owners to limit access to authorized users only

Sustainability and preservation
- Repository (or organization that manages repository) has a long-term plan for managing and funding the data repository
- Repository has a public data retention policy that defines the duration of time the data will be preserved and documentation about preservation practices

Other
- Repository has a contact point or helpdesk to assist data depositors and data users
- Repository provides documentation about the scope of data accepted into the repository
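
To make the discoverability and reuse characteristics more concrete, here is a minimal sketch of how a harvester might request Dublin Core records from a repository's OAI-PMH feed and read out the persistent identifier and license of each dataset. It is an illustration only, not part of the COAR/SPARC response: the endpoint URL, the sample record, and the function names are all invented for the example, and a real repository's feed may look different.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; substitute a real repository's OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"

# Standard XML namespaces for OAI-PMH and unqualified Dublin Core.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Invented sample response, trimmed to one record (not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:identifier>https://doi.org/10.1234/example</dc:identifier>
          <dc:rights>https://creativecommons.org/licenses/by/4.0/</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_records(xml_text):
    """Return title, identifier and rights for each Dublin Core record."""
    root = ET.fromstring(xml_text)
    records = []
    for dc in root.iterfind(".//oai_dc:dc", NS):
        records.append({
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "identifier": dc.findtext("dc:identifier", default="", namespaces=NS),
            "rights": dc.findtext("dc:rights", default="", namespaces=NS),
        })
    return records


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    for record in extract_records(SAMPLE_RESPONSE):
        print(record)
```

In this sketch the identifier stands in for the citable PUID that resolves to the dataset's landing page, and the rights field carries a license URL that software can interpret without human reading, which is what a machine-readable license is meant to allow.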

READ THE FULL OSTP RESPONSE


