Linguistic data citation in Slovene scientific publications: Analysis and recommendations

SUMMARY

Open science is based on freely and openly available scientific publications and data. Open data enable the verification and improvement of previous research and, in the case of language technologies and manually annotated language resources, also the training of new text processing tools. However, just like scientific publications, research data need to be properly cited. Only proper citation makes experiments reproducible; it is also the most important indicator of how interesting and useful researchers' work is to the community, and it plays a major role in the success of their research grant proposals and in their career progression. In this paper, we survey how linguistic data (mainly language corpora) are cited in six leading Slovene scientific journals (Jezik in slovstvo, Slavistična revija, Slovenščina 2.0, Linguistica, Slovene Linguistic Studies and Jezikoslovni zapiski) and in the proceedings of two scientific conferences focused on linguistics (Jezikovne tehnologije in digitalna humanistika and Obdobja) over the seven years from 2013 to 2019. We consider 1,074 papers and analyse the results both quantitatively and qualitatively. Quantitatively, we show that only about a fourth of the papers overall make use of language resources, and that in the later period (2018–2019) language resources are used over twice as frequently as in the earlier period (2013–2017). We classify the ways in which language resources are cited into five categories (e.g. giving the resource's hyperlink in the text or citing the key paper describing the resource) and show that the manner of citation depends, to a large extent, on the instructions for authors of the particular publication. Our qualitative analysis focuses mainly on resources deposited in the repository of the CLARIN.SI research infrastructure, and shows that, with few exceptions, they are cited incorrectly.
We summarise the findings using the so-called Austin principles, show what has already been achieved within the CLARIN.SI infrastructure, and propose guidelines for citing linguistic research data, together with ways of implementing them.
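The summary does not spell out the proposed guidelines, so the following is only a minimal sketch, assuming that a complete data citation should at least name the creators, year, title, version, repository and persistent identifier of a resource (elements commonly required by data citation recommendations). The `DatasetRecord` type, the `format_data_citation` function, the output layout and all example values are hypothetical; they are not the format recommended in the paper or by CLARIN.SI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record for a deposited language resource."""
    creators: tuple[str, ...]  # each entry as "Surname, Given name"
    year: int
    title: str
    version: str
    repository: str
    pid: str  # persistent identifier, e.g. a Handle URL


def format_data_citation(rec: DatasetRecord) -> str:
    """Build one reference-list entry; the layout is an illustrative assumption."""
    if not rec.creators:
        raise ValueError("a data citation needs at least one creator")
    authors = "; ".join(rec.creators)
    # The version and PID identify the exact release that was used,
    # which a bare hyperlink or a citation of the key paper does not.
    return (f"{authors} ({rec.year}). {rec.title} (version {rec.version}) "
            f"[Data set]. {rec.repository}. {rec.pid}")


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    example = DatasetRecord(
        creators=("Novak, Ana", "Kranjc, Miha"),
        year=2019,
        title="Example corpus of Slovene",
        version="1.0",
        repository="CLARIN.SI repository",
        pid="https://hdl.handle.net/0000/0000",
    )
    print(format_data_citation(example))
```

Making the version and the persistent identifier mandatory fields is one way to serve the reproducibility argument made in the summary: a hyperlink in the text or a citation of the key paper alone does not pin down which release of a corpus was actually used.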

Related articles

Darja Fišer, Maciej Piasecki, Bartosz Broda    

Wordnets can be translated from another language or can be built from corpus evidence. The transfer approach is easier and quicker, which is why it has been most widely used. However, it has a big disadvantage that the created resource does not necessari...


Nikola Ljubešić, Marija Stupar, Tereza Jurić, Željko Agić

The paper presents efforts in developing freely available models for named entity recognition and classification in Croatian and Slovene text. Our experiments focus on the most informative set of linguistic features taking into account the availability o...


Darinka Verdonik    

The paper discusses theoretical and methodological principles developed by the circle of linguists known as the neo-Firthians, i.e. a group of linguists that argue for a purely empirical approach to corpus data, ignoring linguistic probabilistic theories a...


Chusni Hadiati    

Inferiority is a state in which one part is lower than another. It is deliberately found in our society, which consists of females and males, because language choices reflect it. Utterances produced by female and male speakers carry both superiority and infer...


Khaerani Maulida Fitri Az-Zahra    

Linguistic intelligence affects an individual's ability to analyze the information received and translate it to other individuals. Individuals who do not have adequate linguistic intelligence will always fail to grasp the meaning in the information o...