Linguistic data citation in Slovene scientific publications: Analysis and recommendations

Jakob Lenardič

Tomaž Erjavec

Darja Fišer

SUMMARY

Open science is based on freely and openly available scientific publications and data. Open data enable the verification and improvement of previous research and, in the context of language technologies and manually annotated language resources, the training of new text processing tools. However, just like scientific publications, research data need to be properly cited: only proper citation makes experiments reproducible, and citations are the most important indicator of how interesting and useful researchers' work is to the community, playing a major role in the success of their research grant proposals and in their career trajectories. In this paper, we survey how linguistic data (mainly language corpora) are cited in six leading Slovene scientific journals (Jezik in slovstvo, Slavistična revija, Slovenščina 2.0, Linguistica, Slovene Linguistic Studies and Jezikoslovni zapiski) and in the proceedings of two scientific conferences focused on linguistics (Jezikovne tehnologije in digitalna humanistika and Obdobja) over a seven-year period, from 2013 to 2019. We consider 1,074 papers and analyse the results both quantitatively and qualitatively. From the quantitative perspective, we show that, overall, only about a fourth of the papers use language resources, and that in the later period (2018–2019) the use of language resources is over twice as frequent as in the earlier period (2013–2017). We classify the manner of language resource citation into five categories (e.g. citing the hyperlink in the text or citing the key paper about the resource) and show that how a resource is cited depends, to a large extent, on the instructions for authors of the particular publication. Our qualitative analysis focuses mainly on resources deposited in the repository of the CLARIN.SI research infrastructure, where we show that they are, with few exceptions, incorrectly cited.
We summarise the findings using the so-called Austin Principles, show what has already been achieved within the CLARIN.SI infrastructure, and propose guidelines for citing linguistic research data together with ways to implement them.

KEYWORDS

Open Science - research data citation - language resources - Austin Principles - Slovenian journals and conference proceedings

Free Access

PAGES

pp. 1 - 34

NUMBER

Volume: 8 Number: 1 Part: 0 (2020)
