The potential to create a unified body of scholarly materials relies on interoperability – specifically, that repositories follow consistent guidelines, protocols, and standards that allow them to communicate with each other and with other systems in order to transfer information, metadata, and digital objects. COAR is working with the repository community to improve the quality and comprehensiveness of metadata in repositories.
Below are the metadata requirements for a number of user stories developed by the COAR Metadata Working Group. In addition, the Working Group will develop and publish a number of strategies – such as shared curation models and automated metadata enhancement – that repositories can use to improve the quality and comprehensiveness of their metadata.
User stories and related metadata requirements
User Story: “As a researcher, I want my publications and other research outputs to be available to users via Google and Google Scholar.”
- Author name / ID
- Publication date
- Permanent URL (e.g. DOI or Handle)
- Domain subject headings or keywords
User Story: “As a teacher, I want to be able to find permissions or licensing information about an article so I can determine whether I can reuse it for my course.”
- reuse license (e.g. https://creativecommons.org/platform/toolkit/#license-metadata)
User Story: “As a researcher, I want to be able to reuse a data set found in a repository.”
- link to article
- link to related protocol / software from the research study
- reuse license
- domain metadata requirements
User Story: “As a research funder, I want to monitor compliance with our open access policy.”
- Author name / ID
- Publication date
- Funder name / ID
- Open access status
- Article version
- Journal name and publisher name
User Story: “As a research institution, I want to document and track the publications of our affiliated researchers.”
- Author name / ID
- Institution name / ID
User Story: “As a repository manager, I want to ensure my repository content is included in national research assessment activities.”
- Author name / ID
- Institution name / ID
- Funder name / ID
User Story: “As a researcher, I want the metadata for my article to be visible and findable in other indexing services.”
- PIDs: ORCID, DOIs, etc.
- Standard vocabularies (e.g. COAR Controlled Vocabularies)
- Mapping to meta-schemas (e.g. schema.org)
User Story: “As a repository manager with content in non-English languages, I want my records to be available to local users in their local language as well as through international indexes and discovery services.”
These requirements are pending while COAR examines best practice recommendations for managing and exposing content in non-English languages.
User Story: “As a scholar, I want my research outputs to be available over the long term and remain as a permanent part of the scholarly record.”
- Format information
- Size of bitstream
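One way a repository might act on these lists is to check each record against the fields a given user story needs. The sketch below assumes a flat record dictionary; the story keys and field names are hypothetical labels for the requirements listed above, not COAR-defined element names, and only a subset of the stories is covered.

```python
# Hypothetical field names derived from the user stories above; a real repository
# would map these to its own schema (e.g. Dublin Core or DataCite elements).
REQUIREMENTS = {
    "search_engines": {"author", "publication_date", "permanent_url", "subjects"},
    "teaching_reuse": {"license"},
    "dataset_reuse": {"related_article", "related_software", "license", "domain_metadata"},
    "funder_compliance": {
        "author", "publication_date", "funder", "oa_status",
        "version", "journal", "publisher",
    },
    "institution_tracking": {"author", "institution"},
    "research_assessment": {"author", "institution", "funder"},
    "preservation": {"format", "size"},
    # The non-English-content story is omitted: its requirements are still pending.
}


def missing_fields(record: dict, story: str) -> list:
    """Return, sorted, the required fields for `story` that are absent or empty in `record`."""
    return sorted(f for f in REQUIREMENTS[story] if not record.get(f))


# Illustrative record; the Handle URL is a placeholder.
record = {
    "author": "Doe, Jane",
    "publication_date": "2021-05-04",
    "permanent_url": "https://hdl.handle.net/0000/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
print(missing_fields(record, "search_engines"))     # ['subjects']
print(missing_fields(record, "funder_compliance"))  # ['funder', 'journal', 'oa_status', 'publisher', 'version']
```

The same record can satisfy one story and fail another, which is why the requirements are grouped per story rather than as a single checklist.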
The use of controlled vocabularies for bibliographic metadata “ensures that everyone is using the same word to mean the same thing”. Continuously revising, updating and maintaining the COAR Controlled Vocabularies, and having them adopted by the most commonly used open repository software, enhances interoperability across repositories and with related systems such as harvesters, CRIS systems, data repositories and publishers.
The COAR Controlled Vocabularies are governed and maintained by an Editorial Board. To define the controlled vocabularies, the Editorial Board analyzes existing vocabularies and dictionaries and adopts the most appropriate existing terms and definitions whenever possible. Where the community identifies gaps, the Board defines new terms. The Editorial Board also translates vocabulary terms into numerous languages.
COAR Controlled Vocabularies
The Resource Type vocabulary defines concepts to identify the genre of a resource. Such resources, like publications, research data, audio and video objects, are typically deposited in institutional and thematic repositories or published in e-journals.
This vocabulary supports a hierarchical model that relates narrower and broader concepts. Multilingual labels take regional distinctions in language and terminology into account. Concepts in this vocabulary are mapped to terms and concepts in similar vocabularies and dictionaries.
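Regional labels let a repository show the term that best matches a user's locale. A minimal sketch, assuming labels are keyed by BCP 47 language tags (e.g. `en-US`) and that falling back from a regional tag to its base language and then to English is acceptable; both the fallback order and the labels are assumptions for illustration, not taken from the COAR vocabulary.

```python
from typing import Dict, Optional


def best_label(labels: Dict[str, str], lang: str, default: str = "en") -> Optional[str]:
    """Choose a label for a BCP 47 language tag such as 'en-US' or 'pt-BR'.

    Tries the exact tag, then its primary language subtag ('en-US' -> 'en'),
    then the `default` language. Returns None if none of these is available.
    """
    by_tag = {tag.lower(): text for tag, text in labels.items()}  # tags are case-insensitive
    tag = lang.lower()
    for candidate in (tag, tag.split("-")[0], default.lower()):
        if candidate in by_tag:
            return by_tag[candidate]
    return None


# Illustrative labels for a hypothetical concept (not from the COAR vocabulary).
labels = {
    "en": "conference programme",
    "en-US": "conference program",
    "es": "programa de conferencia",
}
print(best_label(labels, "en-US"))  # conference program
print(best_label(labels, "en-AU"))  # conference programme
print(best_label(labels, "es-AR"))  # programa de conferencia
print(best_label(labels, "de"))     # conference programme
```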
The Version Type vocabulary defines concepts to declare the version of a resource. Multilingual labels take regional distinctions in language and terminology into account. The concepts are adopted from “Journal Article Versions (JAV): Recommendations of the NISO/ALPSP JAV Technical Working Group”.
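Declaring versions consistently matters for user stories such as funder compliance monitoring. The sketch below models the seven JAV versions as a Python enum and normalizes free-text values found in a repository. The alias table is hypothetical, and a real implementation should record the COAR Version Type concept-URIs taken from the vocabulary itself rather than these Python names.

```python
from enum import Enum
from typing import Optional


class JavVersion(Enum):
    """The seven journal article versions named in the NISO/ALPSP JAV recommendations."""
    AO = "Author's Original"
    SMUR = "Submitted Manuscript Under Review"
    AM = "Accepted Manuscript"
    P = "Proof"
    VOR = "Version of Record"
    CVOR = "Corrected Version of Record"
    EVOR = "Enhanced Version of Record"


# Hypothetical local labels a repository might already use; not part of JAV or COAR.
LOCAL_ALIASES = {
    "postprint": JavVersion.AM,
    "publisher's version": JavVersion.VOR,
}


def parse_version(value: str) -> Optional[JavVersion]:
    """Map a JAV abbreviation, a full JAV name or a known local alias to a JavVersion."""
    key = value.strip().lower()
    for version in JavVersion:
        if key in (version.name.lower(), version.value.lower()):
            return version
    return LOCAL_ALIASES.get(key)


print(parse_version("AM"))                 # JavVersion.AM
print(parse_version("Version of Record"))  # JavVersion.VOR
print(parse_version("draft"))              # None
```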
The Editorial Board manages the COAR Controlled Vocabularies and comprises the following members:
- Jochen Schirrwagen, University of Bielefeld, Germany – (Co-Chair)
- Isabel Bernal, Consejo Superior de Investigaciones Científicas (CSIC), Spain – (Co-Chair)
- Alberto Apollaro, Ministerio de Ciencia, Tecnología e Innovación Productiva (MinCyT), Argentina
- Dom Fripp, Jisc, United Kingdom
- Gültekin Gürdal, Izmir Institute of Technology Library, Turkey
- Hilary Jones, Jisc, United Kingdom
- Ilkay Holt, COAR, United Kingdom
- Ku (Alan) Liping, The National Science Library, CAS, China
- Laurence Le Borgne, ADBS, France
- Liu Dan, Peking University Library, China
- Milan Ojsteršek, University of Maribor, Slovenia
- Nie Hua, Peking University Library, China
- Paola Azrilevich, Ministerio de Ciencia, Tecnología e Innovación Productiva (MinCyT), Argentina
- Pedro Príncipe, Universidade do Minho, Portugal
- Sawsan Habre, Lebanese American University, Lebanon
- Susanna Mornati, 4Science, Italy
- Tomoko Kataoka, JPCOAR, Japan
- Wilko Steinhoff, Data Archiving and Networked Services (DANS), Netherlands
- Yutaka Hayashi, JPCOAR, Japan
Past Editorial Board members:
- Imma Subirats, Food and Agriculture Organization of the United Nations, Italy
- Sandor Kopacsi, University of Vienna, Austria
- Shenghui Wang, OCLC (Online Computer Library Center), Netherlands
- Ilaria Fava, State and University Library, University of Göttingen, Germany
- Iryna Solodovnik, Food and Agriculture Organization of the United Nations, Italy
- Sophie Aubin, INRA – the French National Institute for Agricultural Research, France
- Nathalie Vedovotto, Inist-CNRS, France
COAR provides the repository community with an Implementation Guide for the controlled vocabularies, which is also available on GitHub. The guide covers implementing the vocabularies on different repository platforms and in Open Journal Systems (OJS), and includes a list of repositories that have implemented the COAR Controlled Vocabularies. If you would like to contribute to the guide for a new repository platform or add your repository to the list of use cases, please create an issue on GitHub or email us.
A controlled vocabulary is an organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching. It typically includes preferred and variant terms and has a defined scope or describes a specific domain. Controlled vocabularies capture the richness of variant terms and promote consistency in preferred terms and the assignment of the same terms to similar content.
Controlled vocabularies are beneficial in the indexing process because data providers and repositories apply the same term to refer to the same concept (e.g., a person, place or thing) consistently, which helps with search and discovery of content. Controlled vocabularies also help end users formulate better searches, since they may not know the correct term for a given concept. In fact, the most useful function of controlled vocabularies is to gather variant terms and synonyms for concepts together and to link concepts in a logical order or organize them into categories. Consolidating many different synonyms into one controlled term therefore increases the number of useful hits returned by a search.
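The consolidation of synonyms described above can be illustrated with a small synonym ring. This is a minimal sketch, assuming a hand-written mapping from variant terms to one preferred term; the terms are illustrative and not drawn from the COAR vocabularies. Both the indexed records and the user's query are normalized, so a search for one variant finds records tagged with any other.

```python
# Illustrative synonym ring: every variant term points to one preferred term.
PREFERRED = {
    "thesis": "thesis",
    "dissertation": "thesis",
    "phd thesis": "thesis",
    "research data": "research data",
    "dataset": "research data",
    "data set": "research data",
}


def normalize(term: str) -> str:
    """Return the preferred term for `term`, or the cleaned term itself if it is unknown."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)


def search(index: list, query: str) -> list:
    """Return record IDs whose indexed (preferred) term matches the query's preferred term."""
    wanted = normalize(query)
    return [record_id for record_id, term in index if term == wanted]


# Records as deposited, each tagged with whatever term the depositor used.
records = [("rec1", "Dissertation"), ("rec2", "PhD thesis"), ("rec3", "data set")]
# Normalizing at indexing time means the index holds only preferred terms.
index = [(record_id, normalize(term)) for record_id, term in records]

print(search(index, "thesis"))   # ['rec1', 'rec2']
print(search(index, "dataset"))  # ['rec3']
```

Without the mapping, a literal search for "thesis" would match none of these records, even though two of them are theses.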
There are different types of controlled vocabularies including subject heading lists, controlled lists, synonym ring lists, authority files, taxonomies, alphanumeric classification schemes, thesauri, and ontologies.
The aim of the Resource Type Controlled Vocabulary is to provide concepts that describe the genre of a digital resource.
The Resource Type Controlled Vocabulary uses the SKOS (Simple Knowledge Organization System) standard. Each term has properties for the concept-URI, the definition of the concept and labels in multiple languages, and may have relations to terms in other controlled vocabularies. Moreover, the concepts in this vocabulary are organized hierarchically.
To describe the genre of a digital resource, choose the most appropriate concept. It is not necessary to include broader concepts, as they are already logically related in the vocabulary. When referring to a concept from the controlled vocabulary, the concept-URI must be included, optionally with one or more labels associated with the concept.
It is up to a concrete application profile to decide whether a resource may be tagged with only one concept or with several.
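These rules (pick the most appropriate concept, omit broader ones, always include the concept-URI, let the application profile set cardinality) can be sketched as follows. The hierarchy fragment, the compact `ex:` identifiers and the `max_concepts` parameter are hypothetical; a real implementation would read the hierarchy from the SKOS broader relations of the published vocabulary.

```python
from typing import Dict, List, Optional

# Hypothetical hierarchy fragment: concept-URI -> broader concept-URI (None at the top).
# Compact "ex:" identifiers stand in for real concept-URIs.
BROADER: Dict[str, Optional[str]] = {
    "ex:text": None,
    "ex:thesis": "ex:text",
    "ex:doctoral_thesis": "ex:thesis",
    "ex:dataset": None,
}
LABELS = {
    "ex:text": "text",
    "ex:thesis": "thesis",
    "ex:doctoral_thesis": "doctoral thesis",
    "ex:dataset": "dataset",
}


def ancestors(uri: str) -> List[str]:
    """All broader concepts of `uri`, nearest first; derivable, so not stored on records."""
    chain: List[str] = []
    parent = BROADER.get(uri)
    while parent is not None and parent not in chain:  # guard against accidental cycles
        chain.append(parent)
        parent = BROADER.get(parent)
    return chain


def most_specific(candidates: List[str]) -> List[str]:
    """Drop duplicates and any candidate that is a broader concept of another candidate."""
    unique = list(dict.fromkeys(candidates))  # keeps first-seen order
    implied = {a for c in unique for a in ancestors(c)}
    return [c for c in unique if c not in implied]


def resource_type_elements(candidates: List[str], max_concepts: Optional[int] = 1) -> List[dict]:
    """Build record values: the concept-URI (required) plus an optional label.

    `max_concepts` stands for the application profile's cardinality rule (None = unlimited).
    """
    chosen = most_specific(candidates)
    if max_concepts is not None and len(chosen) > max_concepts:
        raise ValueError(f"profile allows {max_concepts} concept(s), got {len(chosen)}")
    return [{"uri": c, "label": LABELS.get(c)} for c in chosen]


print(resource_type_elements(["ex:text", "ex:thesis", "ex:doctoral_thesis"]))
# [{'uri': 'ex:doctoral_thesis', 'label': 'doctoral thesis'}]
print(ancestors("ex:doctoral_thesis"))  # ['ex:thesis', 'ex:text']
```

Because consumers can recompute `ancestors` from the vocabulary, storing only the most specific concept loses no information.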
The proposed hierarchy is an attempt to structure all the concepts from a generic down to a granular level. It is, however, not without contradictions, e.g. having ‘thesis’ under ‘text’. The vocabulary will be recommended in repository metadata guidelines, and it is up to those guidelines to decide whether to include all concepts or only a subset, as long as the original concept-URIs and labels are used.
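Because guidelines may adopt only a subset, a repository or aggregator might want to confirm that the subset reuses the original concept-URIs and labels unchanged. A minimal sketch, assuming the vocabulary has already been loaded into a dictionary; the URIs are placeholders and `check_subset` is a hypothetical helper, not part of any COAR tooling.

```python
from typing import Dict, List

# Hypothetical excerpt of the full vocabulary: concept-URI -> {language tag: label}.
VOCABULARY: Dict[str, Dict[str, str]] = {
    "http://example.org/resource_type/text": {"en": "text"},
    "http://example.org/resource_type/thesis": {"en": "thesis"},
    "http://example.org/resource_type/dataset": {"en": "dataset"},
}


def check_subset(profile: Dict[str, str], lang: str = "en") -> List[str]:
    """Report concepts in a guideline's subset whose URI or label departs from the vocabulary.

    `profile` maps each concept-URI a guideline adopts to the label it uses.
    """
    problems = []
    for uri, label in profile.items():
        original = VOCABULARY.get(uri)
        if original is None:
            problems.append(f"unknown concept-URI: {uri}")
        elif original.get(lang) != label:
            problems.append(f"label for {uri} changed to {label!r}")
    return problems


good_profile = {
    "http://example.org/resource_type/thesis": "thesis",
    "http://example.org/resource_type/dataset": "dataset",
}
bad_profile = {
    "http://example.org/resource_type/thesis": "dissertation",  # relabelled
    "http://example.org/resource_type/report": "report",        # not in the vocabulary
}
print(check_subset(good_profile))     # []
print(len(check_subset(bad_profile)))  # 2
```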