Entropy, Vol. 18, No. 10, October 2016
ARTICLE
TITLE

Entropy Rate Estimates for Natural Language—A New Extrapolation of Compressed Large-Scale Corpora

SUMMARY

One of the fundamental questions about human language is whether its entropy rate is positive. The entropy rate measures the average amount of information communicated per unit time. The question about the entropy of language dates back to experiments by Shannon in 1951, but in 1990 Hilberg questioned whether these experiments had been interpreted correctly. This article provides an in-depth empirical analysis, using 20 corpora of up to 7.8 gigabytes across six languages (English, French, Russian, Korean, Chinese, and Japanese), to conclude that the entropy rate is positive. To obtain the estimates for data length tending to infinity, we use an extrapolation function given by an ansatz. Several ansatzes have been proposed previously; here we use a new stretched exponential extrapolation function that has a smaller error of fit. Thus, we conclude that the entropy rates of human languages are positive but approximately 20% smaller than the estimates obtained without extrapolation. Although the entropy rate estimates depend on the type of script, the exponent of the ansatz function turns out to be constant across different languages and governs the complexity of natural language in general. In other words, in spite of typological differences, all languages seem equally hard to learn, which partly confirms Hilberg’s hypothesis.
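The idea of estimating an entropy rate from compressed text and then extrapolating to infinite data length can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the summary names neither the compressor nor the exact form of the ansatz, so `zlib` stands in for the compressor, and the function `stretched_exponential` with parameters `a`, `beta`, and `h` is one possible stretched exponential form, not necessarily the one used in the article.

```python
import math
import zlib


def encoding_rate(data: bytes, n: int) -> float:
    """Bits per input byte that zlib needs for the first n bytes of data.

    zlib is only a stand-in compressor (an assumption); the encoding rate
    is an upper-bound style estimate of the entropy rate at length n.
    """
    compressed = zlib.compress(data[:n], 9)
    return 8 * len(compressed) / n


def stretched_exponential(n: float, a: float, beta: float, h: float) -> float:
    """Hypothetical ansatz r(n) = exp(a * n**(beta - 1) + h), with beta < 1.

    For a > 0 the value decreases as n grows and tends to exp(h).
    """
    return math.exp(a * n ** (beta - 1) + h)


def extrapolated_rate(h: float) -> float:
    """Limit of the assumed ansatz as n tends to infinity."""
    return math.exp(h)


def rates_at_lengths(data: bytes, lengths: list[int]) -> list[tuple[int, float]]:
    """Encoding rate at each prefix length that fits inside data."""
    return [(n, encoding_rate(data, n)) for n in lengths if 0 < n <= len(data)]
```

In practice one would compute `rates_at_lengths` on a large corpus, fit `a`, `beta`, and `h` to the resulting points, and read off `extrapolated_rate(h)` as the estimate for infinite data length; the fitting step is omitted here.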
