Multilingualism is critical for a healthy, inclusive, and diverse research communications landscape. After decades of policies that have directed researchers to publish in English, we are starting to see a reversal of this trend. The UNESCO Recommendation on Open Science, for example, calls on member states to encourage “multilingualism in the practice of science, in scientific publications and in academic communications”. In China, Europe, and other jurisdictions, policy makers are introducing new measures that encourage researchers to publish in local languages.
In August 2022, COAR launched the COAR Task Force on Supporting Multilingualism and non-English Content in Repositories to develop and promote good practices for managing multilingual and non-English content in repositories. Based on 17 use cases contributed by different stakeholder communities (repository managers and users, authors and translators, aggregators and discovery systems), the Task Force identified three areas for its work:
- Enhancing discoverability of non-English content
- Curating multilingual content in a repository
- Supporting translations
In June 2023, the Task Force published an initial set of draft recommendations for community review. The consultation generated a wide range of input, which the Task Force reviewed and incorporated into the recommendations document. This document presents the recommendations as updated on the basis of that community input. They identify good practices for repository managers and repository software developers, and focus on metadata, multilingual keywords, user interfaces, formats, and licences, with the aim of improving the visibility, discovery, and reuse of repository content in a variety of languages.
Summary of Recommendations
Creating and Curating Metadata
- 1
Declare the language of the resource at the item level
- 2
Declare the language of the metadata (e.g. using the xml:lang attribute)
- 3
Use standard (two-letter or three-letter) language codes (ISO 639)
- 4
Enable UTF-8 support in your repository and use the original alphabet or writing system whenever possible. If metadata must be transliterated, use a recognized standard (e.g. ISO)
- 5
If the repository software supports multiple interface languages, offer the user interface in the native language(s) of the target group as well as in English
- 6
Write personal names in the writing system used in the deposited document, and provide a persistent identifier that enables unambiguous identification, such as an ORCID iD
- 7
Include keywords in multiple languages, and use multilingual vocabularies and thesauri where possible
- 8
Recommendations for repository managers on translated content
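As an illustration of the metadata recommendations above (item-level language, metadata language, ISO 639 codes, original script, multilingual keywords), the following minimal Python sketch builds a simple Dublin Core (oai_dc) record for a hypothetical Ukrainian-language article. `dc:language` declares the language of the resource itself, each title and keyword carries an `xml:lang` attribute declaring the language of that particular value, and the Ukrainian title stays in Cyrillic. The function names, the sample record, and the regular-expression check (which tests only the shape of a two- or three-letter code, not whether it is actually registered in ISO 639) are illustrative assumptions, not part of any repository platform's API.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces used by oai_dc records (the Dublin Core format required by OAI-PMH).
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
# ElementTree serialises this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# Shape check only: two or three lowercase ASCII letters, as in ISO 639-1 and
# ISO 639-2/3. It does NOT verify that the code is actually registered.
LANG_CODE = re.compile(r"[a-z]{2,3}")


def check_code(code: str) -> str:
    """Return the code unchanged, or raise ValueError if it is not code-shaped."""
    if not LANG_CODE.fullmatch(code):
        raise ValueError(f"not a two- or three-letter language code: {code!r}")
    return code


def build_record(resource_lang, titles, keywords):
    """Build an oai_dc record as a Unicode string (hypothetical helper).

    resource_lang -- language of the resource itself (item level, dc:language)
    titles, keywords -- lists of (text, language of that text) pairs
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for tag, values in (("title", titles), ("subject", keywords)):
        for text, lang in values:
            # Each metadata value declares its own language via xml:lang.
            el = ET.SubElement(root, f"{{{DC_NS}}}{tag}", {XML_LANG: check_code(lang)})
            el.text = text  # kept in the original script, not transliterated
    # Item-level declaration of the resource's language.
    ET.SubElement(root, f"{{{DC_NS}}}language").text = check_code(resource_lang)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_record(
        "uk",
        titles=[("Багатомовність у наукових комунікаціях", "uk"),
                ("Multilingualism in scholarly communication", "en")],
        keywords=[("відкрита наука", "uk"), ("open science", "en")],
    ))
```

Serialising with `encoding="unicode"` returns a Python string; when the record is written to disk or served over HTTP it should be encoded as UTF-8 so that Cyrillic and other non-Latin scripts are stored without loss.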
Repository Software / Platform Developers
- 1
Ensure that language codes can be used consistently across all repository collections.
- 2
Expose the language of the metadata via metadata exchange protocols and APIs, e.g. OAI-PMH or a GraphQL API.
- 3
Improve support for ISO 639 language codes, e.g. the three-letter codes needed for languages that have no two-letter code.
- 4
Ensure that persistent identifiers are exposed via OAI-PMH. The PIDs in Dublin Core™ Working Group has developed recommendations that make it possible to expose persistent identifiers, including ORCID iDs, via OAI-PMH.
- 5
Provide support for multilingual keywords to increase the discoverability of multilingual repository content. For example, enable real-time integration with Wikidata, so that when a user starts typing in the relevant metadata field, matching Wikidata terms appear in a drop-down list for the user to select.
- 6
Enable automatic assignment of controlled vocabulary terms based on the existing metadata.
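Consistent language codes and an exposed metadata language are what let a harvester or discovery service make use of language information downstream. The following minimal Python sketch, written from the consumer's side, reads an oai_dc record, normalises `xml:lang` values through a small alias table, groups titles by language, and picks a title in the user's preferred language with a fallback. The sample record, the `ALIASES` table, and the fallback rule are illustrative assumptions; they are not defined by OAI-PMH or by any particular repository software.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical alias table: variants found in different collections mapped to a
# single ISO 639 code, so that one language is always recorded the same way.
ALIASES = {"english": "en", "eng": "en", "french": "fr", "fra": "fr", "fre": "fr"}

# Illustrative record: one title lacks xml:lang, so its language is unknown.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="fra">La science ouverte en pratique</dc:title>
  <dc:title xml:lang="en">Open science in practice</dc:title>
  <dc:title>Untagged title</dc:title>
  <dc:language>fra</dc:language>
</oai_dc:dc>"""


def normalise(code):
    """Lower-case a language code and map known aliases; None stays None."""
    if code is None:
        return None
    code = code.strip().lower()
    return ALIASES.get(code, code)


def values_by_language(record_xml, element="title"):
    """Group the values of one Dublin Core element by their (normalised) xml:lang."""
    grouped = {}
    for el in ET.fromstring(record_xml).iter(DC + element):
        grouped.setdefault(normalise(el.get(XML_LANG)), []).append(el.text)
    return grouped


def pick(grouped, preferred):
    """Return the first value in a preferred language, else the first value found."""
    for lang in preferred:
        if grouped.get(lang):
            return grouped[lang][0]
    for values in grouped.values():
        if values:
            return values[0]
    return None


if __name__ == "__main__":
    titles = values_by_language(SAMPLE)
    print(titles)
    print(pick(titles, ["en", "fr"]))
```

Values without `xml:lang` end up grouped under `None`, which shows what a harvester loses when the metadata language is not exposed. Mapping three-letter codes to their two-letter equivalents is one possible design choice for consistency; a repository could equally standardise on three-letter codes throughout.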