Metadata and Vocabularies


Project Description

The potential to create a unified body of scholarly materials relies on interoperability – specifically, that repositories follow consistent guidelines, protocols, and standards that allow them to communicate with each other and with other systems in order to transfer information, metadata, and digital objects. COAR is working with the repository community to improve the quality and comprehensiveness of metadata in repositories.

Metadata

Below are the metadata requirements for a number of user stories developed by the COAR Metadata Working Group. In addition, the Working Group will be developing and publishing a number of strategies – such as shared curation models and automated metadata enhancement – that repositories can use to improve the quality and comprehensiveness of their metadata.


User stories and related metadata requirements

User Story: “As a researcher, I want my publications and other research outputs to be available to users via Google and Google Scholar.”

Metadata Requirements

  • Title
  • Author name / ID
  • Publication date
  • Permanent URL (e.g. DOI or Handle)
  • Abstract
  • Domain subject headings or key words
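
One common way to expose fields like these to Google Scholar is through meta tags on an item's landing page. The following is a minimal sketch, assuming the Highwire Press style tag names (citation_title, citation_author, citation_publication_date) described in Google Scholar's inclusion guidelines. The record layout, the sample values and the function name scholar_meta_tags are hypothetical, and the remaining fields in the list are left out for brevity.

```python
from html import escape


def scholar_meta_tags(record):
    """Render a record as Highwire Press style <meta> tags (illustrative only)."""
    tags = [("citation_title", record["title"])]
    # One citation_author tag per author, kept in publication order.
    tags += [("citation_author", name) for name in record["authors"]]
    tags.append(("citation_publication_date", record["publication_date"]))
    # Escape values so that characters such as & and " stay valid in HTML.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


# Hypothetical sample record, not taken from any real repository.
record = {
    "title": "Open Repositories & Interoperability",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "publication_date": "2021/10/22",
}

print(scholar_meta_tags(record))
```

A repository platform would normally emit these tags from its own templates; the sketch only shows how the listed fields map onto tag names.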

User Story: “As a teacher, I want to be able to find permissions, or licensing information about an article so I can determine if I can reuse it for my course.”

Metadata Requirements

User Story: “As a researcher, I want to be able to reuse a data set found in a repository.”

Metadata Requirements

  • Format
  • Link to article
  • Link to related protocol / software from research study
  • Reuse license
  • Domain metadata requirements

User Story: “As a research funder, I want to monitor compliance with our open access policy.”

Metadata Requirements

  • Author name / ID
  • Publication date
  • Funder name / ID
  • Open access status
  • Article version
  • Journal name and publisher name
  • DOI
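
A repository could check records against a requirement list like this one before reporting to a funder. The sketch below is one possible completeness check under stated assumptions: the field names, the sample record and the function missing_fields are all hypothetical and do not correspond to any particular metadata schema.

```python
# Hypothetical field names standing in for the requirements listed above.
REQUIRED_FOR_FUNDER_COMPLIANCE = (
    "author",
    "publication_date",
    "funder",
    "open_access_status",
    "article_version",
    "journal",
    "publisher",
    "doi",
)


def missing_fields(record, required=REQUIRED_FOR_FUNDER_COMPLIANCE):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


# Hypothetical sample record with two gaps.
record = {
    "author": "Doe, Jane",
    "publication_date": "2021-10-22",
    "funder": "Example Research Council",
    "open_access_status": "open access",
    "article_version": "",  # empty values are treated as missing
    "journal": "Journal of Examples",
    "publisher": "Example Press",
}

print(missing_fields(record))  # ['article_version', 'doi']
```

The same function can be reused for the other user stories by passing a different tuple of required field names.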

User Story: “As a research institution, I want to document and track the publications of our affiliated researchers.”

Metadata Requirements

  • Author name / ID
  • Institution name / ID

User Story: “As a repository manager, I want to ensure my repository content is included in national research assessment activities.”

Metadata Requirements

  • Title
  • Author name / ID
  • Date
  • Institution name / ID
  • Funder name / ID
  • DOI

User Story: “As a researcher, I want the metadata for my article to be visible and findable in other indexing services.”

Metadata Requirements

  • OAI-PMH
  • PIDs: ORCID, DOIs, etc.
  • Standard vocabularies (e.g. COAR)
  • Mapping to meta-schemas (e.g. schema.org)
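
As an illustration of the last item, a repository record can be mapped to a schema.org description and serialized as JSON-LD. This is a minimal sketch, assuming the schema.org type ScholarlyArticle and the properties name, datePublished, identifier and author. The input record layout and the function to_schema_org are hypothetical, and the DOI and ORCID values are placeholders.

```python
import json


def to_schema_org(record):
    """Map a simple repository record to a schema.org JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "name": record["title"],
        "datePublished": record["date"],
        "identifier": record["doi"],
        # Each author becomes a Person carrying a persistent identifier.
        "author": [
            {"@type": "Person", "name": a["name"], "identifier": a["orcid"]}
            for a in record["authors"]
        ],
    }


# Placeholder values for illustration only.
record = {
    "title": "Open Repositories and Interoperability",
    "date": "2021-10-22",
    "doi": "https://doi.org/10.1234/example",
    "authors": [
        {"name": "Jane Doe", "orcid": "https://orcid.org/0000-0002-1825-0097"},
    ],
}

print(json.dumps(to_schema_org(record), indent=2))
```

The resulting JSON can be embedded in a landing page so that indexing services that understand schema.org can read the same metadata that is exposed through OAI-PMH.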

User Story: “As a repository manager with content in non-English language, I want my records to be available to local users in their local language as well as through international indexes and discovery services.”

Metadata Requirements

These requirements are pending while COAR examines best-practice recommendations for managing and exposing content in languages other than English.

User Story: “As a scholar, I want my research outputs to be available over the long term and remain as a permanent part of the scholarly record.”

Metadata Requirements

  • Format information
  • Size of bitstream

Controlled Vocabularies

The use of controlled vocabularies for bibliographic metadata “ensures that everyone is using the same word to mean the same thing”. The continuous revision, updating and maintenance of the COAR Controlled Vocabularies, together with their adoption by the most commonly used open repository software, enhances interoperability across repositories and with related systems such as harvesters, CRIS systems, data repositories and publishers.

The COAR Controlled Vocabularies are governed and maintained by an Editorial Board. To define the controlled vocabularies, the Editorial Board analyzes existing vocabularies and dictionaries and adopts the most appropriate existing terms and definitions whenever possible. Where the community identifies gaps, the group defines new terms. The COAR Controlled Vocabulary Editorial Board also translates vocabulary terms into numerous languages.

COAR Controlled Vocabularies

The Resource Type vocabulary defines concepts to identify the genre of a resource. Such resources, including publications, research data, and audio and video objects, are typically deposited in institutional and thematic repositories or published in ejournals.

This vocabulary supports a hierarchical model that relates narrower and broader concepts. Multilingual labels take regional distinctions in language and terminology into account. Concepts of this vocabulary are mapped to terms and concepts of similar vocabularies and dictionaries.

The Access Rights vocabulary defines concepts to declare the access status of a resource. Multilingual labels take regional distinctions in language and terminology into account. The Access Rights vocabulary builds on access rights defined in info:eu-repo/semantics.

The Version Type vocabulary defines concepts to declare the version of a resource. Multilingual labels take regional distinctions in language and terminology into account. The concepts are adopted from the “Journal Article Versions (JAV): Recommendations of the NISO/ALPSP JAV Technical Working Group”.

Resources

The Editorial Board manages the COAR Controlled Vocabularies and comprises the following members:

  • Jochen Schirrwagen, University of Bielefeld, Germany – (Co-Chair)
  • Isabel Bernal, Consejo Superior de Investigaciones Científicas (CSIC), Spain – (Co-Chair)
  • Alberto Apollaro, Ministerio de Ciencia, Tecnología e Innovación Productiva (MinCyT), Argentina
  • Brigit Nonó, Universitat de Girona, Spain
  • Cristina Azorín, Universitat Autònoma de Barcelona, Spain
  • Dom Fripp, Jisc, United Kingdom
  • Gültekin Gürdal, Izmir Institute of Technology Library, Turkey
  • Hilary Jones, Jisc, United Kingdom
  • Ilkay Holt, COAR, United Kingdom
  • Irina Razumova, NEICON, The Russian Federation
  • Juha Hakala, The National Library of Finland, Finland
  • Ku (Alan) Liping, The National Science Library, CAS, China
  • Laurence Le Borgne, ADBS, France
  • Liu Dan, Peking University Library, China
  • Marina Losada, Universitat Pompeu Fabra, Barcelona, Spain
  • Milan Ojsteršek, University of Maribor, Slovenia
  • Nie Hua, Peking University Library, China
  • Paola Azrilevich, Ministerio de Ciencia, Tecnología e Innovación Productiva (MinCyT), Argentina
  • Pedro Príncipe, Universidade do Minho, Portugal
  • Sawsan Habre, Lebanese American University, Lebanon
  • Susanna Mornati, 4Science, Italy
  • Tomoko Kataoka, JPCOAR, Japan
  • Wilko Steinhoff, Data Archiving and Networked Services (DANS), Netherlands
  • Yutaka Hayashi, JPCOAR, Japan

Editorial Board members who served in the past:

  • Imma Subirats, Food and Agriculture Organization of the United Nations, Italy
  • Sandor Kopacsi, University of Vienna, Austria
  • Shenghui Wang, OCLC (Online Computer Library Center), Netherlands
  • Ilaria Fava, State and University Library, University of Göttingen, Germany
  • Iryna Solodovnik, Food and Agriculture Organization of the United Nations, Italy
  • Sophie Aubin, INRA – the French National Institute for Agricultural Research, France
  • Nathalie Vedovotto, Inist-CNRS, France

COAR provides the repository community with an Implementation Guide for the controlled vocabularies. The guide covers implementation of the vocabularies on different repository platforms and Open Journal Systems (OJS), and includes a list of repositories that have implemented the COAR Controlled Vocabularies.

FAQs

What is a controlled vocabulary?

A controlled vocabulary is an organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching. It typically includes preferred and variant terms and has a defined scope or describes a specific domain. Controlled vocabularies capture the richness of variant terms and promote consistency in preferred terms and the assignment of the same terms to similar content.

What is the benefit of controlled vocabularies?

Controlled vocabularies are beneficial during indexing because data providers and repositories apply the same term to the same concept (e.g., person, place or thing) in a consistent way. This helps with search and discovery of content. Controlled vocabularies also help end-users formulate better searches, as they may not know the correct term for a given concept. In fact, the most useful function of controlled vocabularies is to gather together variant terms and synonyms for concepts, and to link concepts in a logical order or organize them into categories. Consolidating many different synonyms into one controlled term thus increases the number of useful hits returned by a search.
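
The effect of consolidating synonyms can be sketched in a few lines. In this illustrative example the variant terms, the preferred term, the sample records and the function names are all made up for demonstration and are not taken from any COAR vocabulary.

```python
# Hypothetical mapping from variant terms to one preferred term.
VARIANT_TO_PREFERRED = {
    "phd thesis": "doctoral thesis",
    "dissertation": "doctoral thesis",
    "doctoral thesis": "doctoral thesis",
}


def normalize(term):
    """Lower-case the term, collapse whitespace and map it to its preferred form."""
    key = " ".join(term.lower().split())
    return VARIANT_TO_PREFERRED.get(key, key)


def search(records, query):
    """Return the ids of records whose normalized type matches the normalized query."""
    wanted = normalize(query)
    return [r["id"] for r in records if normalize(r["type"]) == wanted]


# Records indexed by different people using different words for the same concept.
records = [
    {"id": 1, "type": "PhD thesis"},
    {"id": 2, "type": "Dissertation"},
    {"id": 3, "type": "doctoral  thesis"},
    {"id": 4, "type": "journal article"},
]

print(search(records, "dissertation"))  # [1, 2, 3]
```

Without the mapping, a plain string comparison on "dissertation" would match only record 2.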

What types of controlled vocabularies exist?

There are different types of controlled vocabularies including subject heading lists, controlled lists, synonym ring lists, authority files, taxonomies, alphanumeric classification schemes, thesauri, and ontologies.

Which “controlled vocabularies” are the most relevant for repositories?

Subject heading lists, authority files, taxonomies, alphanumeric classification schemes and ontologies.

What does the resource type vocabulary describe?

The aim of this Controlled Vocabulary is to provide concepts that describe the genre of a digital resource.

How can the resource type controlled vocabulary be implemented?

The Resource Type Controlled Vocabulary uses the SKOS standard. Each term has properties for the concept-URI, the definition of the concept, and labels in multiple languages, and may have relations to terms in other controlled vocabularies. Moreover, concepts in this vocabulary are organized hierarchically.
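
The properties described above can be modelled with a small data structure. The sketch below is an assumption-laden illustration: the attribute names loosely follow the SKOS properties skos:prefLabel, skos:definition, skos:broader and skos:exactMatch, while the URIs under example.org, the definitions and the function broader_chain are placeholders rather than entries from the published vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    uri: str                         # the concept-URI
    definition: str                  # cf. skos:definition
    pref_labels: Dict[str, str]      # language code -> label, cf. skos:prefLabel
    broader: Optional[str] = None    # URI of the parent concept, cf. skos:broader
    exact_match: List[str] = field(default_factory=list)  # cf. skos:exactMatch


BASE = "http://example.org/resource_type/"  # placeholder namespace

# A three-level excerpt with made-up definitions.
VOCAB = {
    c.uri: c
    for c in [
        Concept(BASE + "text", "A resource made up mainly of words.",
                {"en": "text", "de": "Text"}),
        Concept(BASE + "thesis", "A work submitted for an academic degree.",
                {"en": "thesis", "es": "tesis"}, broader=BASE + "text"),
        Concept(BASE + "doctoral_thesis", "A thesis submitted for a doctorate.",
                {"en": "doctoral thesis"}, broader=BASE + "thesis"),
    ]
}


def broader_chain(uri, vocab=VOCAB):
    """Walk up the hierarchy and return all broader concept URIs, nearest first."""
    chain = []
    current = vocab[uri].broader
    while current is not None:
        chain.append(current)
        current = vocab[current].broader
    return chain


print(broader_chain(BASE + "doctoral_thesis"))
```

In practice the concepts would be loaded from the published SKOS files rather than typed in by hand.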

How can the controlled vocabulary be used in a metadata record?

To describe the genre of a digital resource, the most appropriate concept should be chosen. It is not necessary to include broader concepts, as they are already logically related in the vocabulary. When referring to a concept from the controlled vocabulary, the concept-URI must be included; one or more labels associated with the concept may optionally be added.
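
These rules can be sketched as two small helpers: one that drops broader concepts when a narrower one is already present, and one that builds a reference which always carries the concept-URI and only optionally carries labels. The URIs under example.org, the BROADER table, the output layout and the function names are hypothetical.

```python
BASE = "http://example.org/resource_type/"  # placeholder namespace

# Hypothetical excerpt of the hierarchy: concept URI -> broader concept URI.
BROADER = {
    BASE + "doctoral_thesis": BASE + "thesis",
    BASE + "thesis": BASE + "text",
    BASE + "text": None,
}


def ancestors(uri):
    """Return every broader concept reachable from the given concept."""
    found = set()
    parent = BROADER.get(uri)
    while parent is not None:
        found.add(parent)
        parent = BROADER.get(parent)
    return found


def drop_broader(uris):
    """Keep only the most specific concepts; broader ones are already implied."""
    redundant = set()
    for uri in uris:
        redundant |= ancestors(uri)
    return [uri for uri in uris if uri not in redundant]


def type_reference(uri, labels=None):
    """Build a reference to a concept: the URI is mandatory, labels are optional."""
    if not uri:
        raise ValueError("a concept URI is required")
    reference = {"uri": uri}
    if labels:
        reference["labels"] = dict(labels)
    return reference


chosen = drop_broader([BASE + "text", BASE + "doctoral_thesis"])
print([type_reference(uri, {"en": "doctoral thesis"}) for uri in chosen])
```

How the reference is finally serialized depends on the metadata schema and application profile in use.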

Can I tag a resource with two concepts?

It is up to each application profile to decide whether a resource can be tagged with only one concept or with several.

Why does the resource type vocabulary have a complexity in terms of hierarchy?

The proposed hierarchy is an attempt to structure all the concepts from a generic level down to a granular level. It is, however, not free of contradictions; placing ‘thesis’ under ‘text’ is one example. The vocabulary will be recommended in repository metadata guidelines, and it is up to those guidelines to decide whether to include all concepts or only a subset, as long as the original concept-URIs and labels are used.