Multilingualism is a critical characteristic of a healthy, inclusive, and diverse research communications landscape. Publishing in a local language ensures that the public in different countries has access to the research they fund, and also levels the playing field for researchers who speak different languages. The Helsinki Initiative on Multilingualism in Scholarly Communication asserts that the disqualification of local or national languages in academic publishing is the most important – and often forgotten – factor that prevents societies from using and taking advantage of the research done where they live.

Multilingualism presents a particular challenge for the discovery of research outputs. Although researchers and other information seekers may only be able to read in one or two languages, they want to know about all the relevant research in their area, regardless of the language in which it is published. Yet, discovery systems such as Google Scholar and other scholarly indexes tend to provide access only to the content available in the language of the user. In addition, the language of a scholarly resource is often not labelled appropriately, meaning a large portion of non-English resources are excluded from search results. Furthermore, many scholarly communications infrastructures are sub-optimal in their support for a variety of languages since little attention was paid to this issue during their design process.

Jagadish Aryal, Social Science Baha, Nepal
Aysen Binen, Izmir Institute of Technology İYTE, Türkiye
Andreas Czerniak, Bielefeld University – Library, Germany
Claudia Córdova Yamauchi, CONCYTEC, Peru
Christophe Dony, ULiège Library, Belgium
Joe Cera, Berkeley Law Library, University of California, USA
Sebastiano Giorgi-Scalari, Open University of Catalonia, Spain
Güssün Güneş, Marmara University Libraries, Türkiye
Gultekin Gurdal, Izmir Institute of Technology İYTE, Türkiye
Johanna Havemann, AfricArXiv, Germany
Nie Hua, Peking University, China
Libio Huaroto Pajuelo, Universidad Peruana de Ciencias Aplicadas, Peru
Alan Ku (Gu Liping), National Science Library, Chinese Academy of Sciences, China
Iryna Kuchma, EIFL (chair), Lithuania
Pierre Lasou, Bibliothèque de l’Université Laval, Canada
Norma Aída Manzanera Silva, Centro de Investigaciones sobre América del Norte, Universidad Nacional Autónoma de México
Lautaro Matas, LA Referencia, Spain/Latin America
Ayako Mikami, Hokkaido University, Japan
Tomoki Nagase, National Institute of Informatics, Japan
Andrea Mora Campos, University of Costa Rica, Costa Rica
Tomasz Neugebauer, Concordia University, Canada
Jean-François Nomine, INIST, France
Milica Ševkušić, ITS SASA, Serbia
Kathleen Shearer, COAR, Canada
Freddy Sumba, CEDIA, Ecuador
Ben Trettel, Translate Science

In August 2022, COAR launched the COAR Task Force on Supporting Multilingualism and non-English Content in Repositories to develop and promote good practices for repositories in managing multilingual and non-English content. The task force is focusing on identifying good practices for metadata, multilingual keywords, user interfaces, translations, formats, licenses, and indexing that will improve the visibility of multilingual and non-English content across the world. Some of the use cases that are driving the recommended practices are as follows.

  • I want to find all the articles that are relevant to my interest, regardless of the language in which they are published
  • I would like to know whether a translation of an article exists or whether this document is a translation of another document
  • I want to know how best to label articles, theses, or dissertations that are written in more than one language so readers are aware of the various languages
  • I want to offer metadata in both my local language and in English so the content is part of the international scholarly record and visible to everyone
  • I would like to expose the language of the item in OAI-PMH
  • I want to know the language of the full-text document I am indexing, so I can assist users in finding content in their preferred language
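
Several of these use cases come down to labelling language explicitly in the metadata a repository exposes. The following is a minimal sketch, not the task force's prescribed implementation, of an unqualified Dublin Core record of the kind served over OAI-PMH. It carries the title in two languages and states the language of the item itself. The function name `build_record`, the sample titles, and the two-letter language codes are illustrative assumptions; the choice of code scheme should follow the task force's published recommendation.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for unqualified Dublin Core as exposed via OAI-PMH.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
# Qualified name of the built-in xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_record(titles, item_language):
    """Build an oai_dc record (hypothetical helper, for illustration only).

    titles: dict mapping a language code to the title in that language.
    item_language: language code of the full-text document itself.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for lang, text in titles.items():
        title = ET.SubElement(root, f"{{{DC}}}title")
        title.set(XML_LANG, lang)  # says which language this metadata value is in
        title.text = text
    language = ET.SubElement(root, f"{{{DC}}}language")
    language.text = item_language  # says which language the item is written in
    return root


# Illustrative data: a Spanish-language article with Spanish and English titles.
record = build_record(
    {
        "es": "Acceso abierto en América Latina",
        "en": "Open access in Latin America",
    },
    item_language="es",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The sketch keeps two things separate: `xml:lang` on each title describes the language of that metadata value, while `dc:language` describes the language of the document. A harvester can then index the English title for international discovery and still tell users that the full text is in Spanish.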

On November 1, 2022, the COAR Task Force published its initial recommendation towards improving the discovery of repository content in a variety of languages, along with implementation guidance for the repository community.

Further recommendations that address the different use cases will be released in the coming months. Please stay tuned!

Is there a case for accepting machine translated scholarly content in repositories?

May 8, 2023 | Christophe Dony, Iryna Kuchma, Tomasz Neugebauer, Jean-François Nomine, Milica Ševkušić, and Kathleen Shearer (Photo by Romain Vignes on Unsplash)
Multilingualism is a critical characteristic of a healthy, inclusive, and diverse research …