Natural Language Processing Using Neighbour Entropy-based Segmentation

Jianfeng Qiao

Xingzhi Yan

Shuran Lv

SUMMARY

In natural language processing (NLP) of Chinese hazard text collected during hazard identification, Chinese word segmentation (CWS) is the first step in extracting meaningful information from such semi-structured texts. This paper proposes a new neighbor entropy-based segmentation (NES) model for CWS. The model considers the segmentation benefits of neighbor entropies, adopting the concept of "neighbor" from optimization research. It is defined by the benefit ratio of text segmentation, which accounts for both the benefits and the losses of combining segmentation units and therefore uses more information than other popular statistical models. In the experiments performed, the NES model, together with the maximum-based segmentation algorithm, achieves 99.3% precision, 98.7% recall, and a 99.0% F-measure for text segmentation; these results are higher than those of existing tools based on seven other popular statistical models. The results show that the NES model is a valid CWS model, especially when the segmentation task calls for longer segments. The text corpus was recorded in the fourth quarter of 2018 and comes from the Beijing Municipal Administration of Work Safety.

To cite this article: J. Qiao, X. Yan, and S. Lv, “Natural Language Processing Using Neighbour Entropy-based Segmentation” in CIT. Journal of Computing and Information Technology, vol. 29, no. 2, pp. 113–131, 2021, doi: 10.20532/cit.2021.1005393
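The two quantities the summary relies on can be illustrated with a minimal sketch. It is not the authors' NES model: the summary does not give the benefit-ratio formula, so the code assumes only the commonly used Shannon entropy of the characters adjacent to a candidate string, plus the standard F-measure (harmonic mean of precision and recall). The function names, the `side` parameter, and the toy corpus are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def neighbor_entropy(corpus: str, candidate: str, side: str = "right") -> float:
    """Shannon entropy (in nats) of the characters adjacent to `candidate`.

    Assumption: this is the generic left/right neighbor entropy used in
    statistical segmentation, not the NES benefit ratio from the paper.
    """
    neighbors = Counter()
    start = corpus.find(candidate)
    while start != -1:
        # Index of the character just before or just after this occurrence.
        idx = start - 1 if side == "left" else start + len(candidate)
        if 0 <= idx < len(corpus):
            neighbors[corpus[idx]] += 1
        start = corpus.find(candidate, start + 1)
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log(total / n) for n in neighbors.values())


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy_corpus = "abxabyabz"  # hypothetical toy data, not the hazard corpus
    # "ab" is followed by three different characters: high right entropy,
    # which suggests a likely segment boundary after "ab".
    print(neighbor_entropy(toy_corpus, "ab", side="right"))
    # "a" is always followed by "b": zero right entropy, so "a" and "b"
    # are better kept together.
    print(neighbor_entropy(toy_corpus, "a", side="right"))
    # Precision and recall as reported in the summary.
    print(round(f_measure(0.993, 0.987), 3))
```

A high neighbor entropy on one side of a string means many different characters can follow (or precede) it, which is the usual statistical cue for a word boundary. The last call shows that the reported precision and recall are consistent with the reported F-measure of about 99.0%.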

KEYWORDS

Text Mining - Text Segmentation - Chinese Word Segmentation - Safety Management - Hazard Analysis

Free Access

PAGES

pp. 113 - 131

NUMBER

Volume: 29 Number: 2 Part: 0 (2021)

COLLECTIONS

Computing
