ARTICLE
TITLE

Evaluation of data balancing techniques. Application to CAD of lung nodules using the LUNA16 framework

SUMMARY

Due to the high incidence of lung cancer all over the world, computer-aided detection (CAD) systems play an important role in screening. Classification in CAD systems has to deal with highly imbalanced datasets composed of actual nodules and non-nodule structures. Applying data balancing techniques helps the training of the classifiers by making the generation of classification rules more effective. The purpose of this paper is to compare the performance of different data balancing techniques applied to the classification of lung nodules. According to the reviewed literature, this is the first time that different data balancing methods have been evaluated on the problem of lung nodule detection using a large data set. A web-based framework was used to evaluate the different methods applied to a classical CAD system (ETROCAD) presented in the LUNA16 Challenge. In the experiments, balancing the data with SMOTE and SMOTE-TL led to the best results, with scores of 0.760 and 0.759, respectively, compared with 0.748 when the data were not balanced. At the time of writing this paper, the SMOTE-based ETROCAD system has the best score among all the classical systems using handcrafted features on the LUNA16 website.
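To make the balancing idea concrete, the following is a minimal sketch of SMOTE-style oversampling, not the ETROCAD implementation. It assumes the usual formulation in which each synthetic minority sample is interpolated between a minority sample and one of its nearest minority neighbours. The function name `smote`, the parameter `k`, and the toy two-feature candidates are illustrative assumptions; the cleaning step that SMOTE-TL adds is not shown.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic samples, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (base excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy two-feature candidates (hypothetical values, not LUNA16 data).
nodules = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
non_nodules = [(float(i % 5), float(i // 5) + 3.0) for i in range(20)]

# Oversample the minority class until both classes have the same size.
new_nodules = smote(nodules, n_new=len(non_nodules) - len(nodules))
balanced_nodules = nodules + new_nodules
print(len(balanced_nodules), len(non_nodules))
```

Because every synthetic sample lies on a segment between two real minority samples, the oversampled class stays within the region already occupied by nodules rather than duplicating existing points exactly.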
