Lexikos, Vol. 7 (1997)

A 38 Million Words Dutch Text Corpus and its Users

SUMMARY

The use of text corpora has increased considerably in the past few years, not only in lexicography but also in computational linguistics and language technology. Consequently, corpus data and expertise developed by lexicographical institutions have gained a broader scope of application. In the European context this has led to a revised view of corpus design. In line with these developments, the Institute for Dutch Lexicology (INL) has, since 1994, been providing external access to steadily improving corpora via the Internet. In August 1996, the <i>38 Million Words Corpus</i> became available for consultation by the international research community. The present paper reports on the characteristics of this corpus (design, text classification, linguistic annotation) and on its use, both in dictionary projects and in linguistic research. In spite of limitations with respect to corpus design, the INL corpora accessible via the Internet have proved to meet external needs. By providing these facilities, the INL has gained much broader experience in corpus building than before, which is essential for new internal dictionary projects. Giving external access to corpus data that was developed primarily for internal purposes may benefit all parties involved.