ARTICLE
TITLE

Domain-aware Evaluation of Named Entity Recognition Systems for Croatian

SUMMARY

We provide an evaluation of the currently available named entity recognition systems for Croatian, with special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tagset covering personal names, locations and organizations. We give insight into feature selection, domain sensitivity and the effects of increasing training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian with respect to domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an F1-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system achieves consistently state-of-the-art scores for detecting names of persons, locations and organizations.
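As an illustration of how mixed-domain and per-domain F1-scores such as those reported above can be computed, the following is a minimal sketch and not the evaluation procedure used in the article. It assumes exact-match, entity-level scoring, which the summary does not specify. The entity tuple layout `(doc_id, start, end, label)`, the `domain_of` mapping and the function names are all hypothetical.

```python
from collections import defaultdict


def precision_recall_f1(gold, predicted):
    """Exact-match scores over two collections of entity tuples.

    Assumption: an entity counts as correct only if its document,
    span boundaries and label all match a gold entity.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_by_domain(gold, predicted, domain_of):
    """Return {domain: F1} plus a 'mixed' entry over all documents.

    domain_of maps a (hypothetical) document id to its domain name.
    """
    gold_by_domain = defaultdict(set)
    pred_by_domain = defaultdict(set)
    for entity in gold:
        gold_by_domain[domain_of[entity[0]]].add(entity)
    for entity in predicted:
        pred_by_domain[domain_of[entity[0]]].add(entity)

    scores = {}
    for domain in set(gold_by_domain) | set(pred_by_domain):
        scores[domain] = precision_recall_f1(
            gold_by_domain[domain], pred_by_domain[domain]
        )[2]
    scores["mixed"] = precision_recall_f1(gold, predicted)[2]
    return scores


# Toy data, invented purely for illustration.
gold_entities = [
    ("d1", 0, 2, "PER"),
    ("d1", 5, 6, "LOC"),
    ("d2", 1, 3, "ORG"),
]
predicted_entities = [
    ("d1", 0, 2, "PER"),
    ("d1", 5, 6, "ORG"),  # right span, wrong label: counted as an error
    ("d2", 1, 3, "ORG"),
]
domains = {"d1": "domain_a", "d2": "domain_b"}

scores = f1_by_domain(gold_entities, predicted_entities, domains)
```

Scoring each domain separately alongside the pooled set makes domain sensitivity visible: a single mixed-domain figure can hide a large gap between domains.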
