Multilingualism is a critical characteristic of a healthy, inclusive, and diverse research communications landscape. The Helsinki Initiative on Multilingualism in Scholarly Communication asserts that the disqualification of local or national languages in academic publishing is the most important – and often forgotten – factor that prevents societies from using and taking advantage of the research done where they live.
While the dominant position of English as a lingua franca is useful for the widespread dissemination of ideas across the world, it also impedes the use of research results at the local level. After decades of policies that have directed researchers to publish in English, this trend is beginning to reverse. The UNESCO Recommendation on Open Science, for example, calls on member states to encourage “multilingualism in the practice of science, in scientific publications and in academic communications”. In China, Europe, and other jurisdictions, policy makers are introducing new measures that encourage researchers to publish in local languages.
In August 2022, COAR launched the COAR Task Force on Supporting Multilingualism and non-English Content in Repositories to develop and promote good practices for repositories in managing multilingual and non-English content. The task force is focusing on identifying good practices for metadata, multilingual keywords, user interfaces, translations, formats, licenses, and indexing that will improve the visibility of multilingual and non-English content across the world.
The COAR Task Force is pleased to announce its first recommendation for improving the discovery of repository content in a variety of languages:
All records in the repository should include a tag in the language metadata field that identifies the language of the resource, and a tag that identifies the language of the metadata (even if the resources are in English).
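As an illustration only, the sketch below shows one way a record could carry both tags. It assumes a Dublin Core style record in which the resource language goes in a `dc:language` element and the metadata language is marked with an `xml:lang` attribute on the text field. The recommendation above does not prescribe these element names, the two-letter codes, or the `build_record` helper; all of them are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: Dublin Core elements and the reserved XML namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("dc", DC_NS)


def build_record(description, metadata_lang, resource_lang):
    """Build a minimal record (hypothetical helper) with both language tags."""
    record = ET.Element("record")

    # Language of the metadata: declared on the descriptive text itself.
    desc_el = ET.SubElement(record, f"{{{DC_NS}}}description")
    desc_el.set(f"{{{XML_NS}}}lang", metadata_lang)
    desc_el.text = description

    # Language of the resource: declared in the language metadata field.
    lang_el = ET.SubElement(record, f"{{{DC_NS}}}language")
    lang_el.text = resource_lang
    return record


# An English-language description of a resource written in Spanish.
record = build_record(
    "A study of open science policy.",
    metadata_lang="en",
    resource_lang="es",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

In this sketch the two tags differ on purpose: the description is in English while the resource is in Spanish, which is exactly the case a single language field cannot express.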
Why? This is a very simple but extremely powerful recommendation. When the language of the metadata and the language of the resource are correctly attributed, discovery and indexing services can properly process and parse the text. Indexing involves text analysis practices such as stemming, lemmatization (grouping together the inflected forms of a word so they can be analysed as a single item), and the appropriate treatment of stop-words, all of which are language-specific. Including the language tag enables information seekers, aggregators, and other discovery services to correctly identify the language of the metadata and full text and treat items accordingly.
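To make the point about language-specific processing concrete, here is a minimal sketch of stop-word handling driven by a language tag. The `STOP_WORDS` table and the `index_terms` function are hypothetical and deliberately tiny; real indexing services use much larger word lists and also apply stemming or lemmatization, which are omitted here.

```python
# Tiny illustrative stop-word lists (assumed for this example, not exhaustive).
STOP_WORDS = {
    "en": {"the", "of", "in", "and", "a"},
    "es": {"el", "la", "de", "en", "y", "los", "las"},
}


def index_terms(text, lang):
    """Return the tokens worth indexing, given the declared language tag."""
    tokens = text.lower().split()
    # An unknown language tag falls back to no stop-word removal.
    stop = STOP_WORDS.get(lang, set())
    return [token for token in tokens if token not in stop]


title = "La ciencia en abierto"

# Correctly tagged as Spanish: "la" and "en" are dropped as stop-words.
print(index_terms(title, "es"))  # ['ciencia', 'abierto']

# Wrongly treated as English: the Spanish stop-words are kept as index terms.
print(index_terms(title, "en"))  # ['la', 'ciencia', 'en', 'abierto']
```

The same title yields different index terms depending on the tag, so a missing or incorrect tag leaves noise words in the index and can weaken search results for that record.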