Simple Stopping Criteria for Information Theoretic Feature Selection

SUMMARY

Feature selection aims to select the smallest feature subset that yields the minimum generalization error. In the rich literature on feature selection, information theory-based approaches seek a subset of features that maximizes the mutual information between the selected features and the class labels. Despite the simplicity of this objective, several open problems in optimization remain. These include, for example, the automatic determination of the optimal subset size (i.e., the number of features), or equivalently a stopping criterion when a greedy search strategy is adopted. In this paper, we suggest two stopping criteria that only require monitoring the conditional mutual information (CMI) among groups of variables. Using the recently developed multivariate matrix-based Rényi’s α-entropy functional, which can be estimated directly from data samples, we show that the CMI among groups of variables can be computed easily without any decomposition or approximation. This makes our criteria easy to implement and to integrate seamlessly into any existing information theoretic feature selection method that uses a greedy search strategy.
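
As a rough illustration of how such a criterion might be monitored, the following Python sketch estimates the CMI from normalized Gram matrices using the matrix-based Rényi’s α-entropy and stops a greedy search once the best candidate's CMI no longer exceeds a threshold. The Gaussian kernel, its width `sigma`, the value `alpha=1.01`, the `threshold`, and all function names are assumptions made for this example; it is a simplified stand-in and not the paper's exact two criteria.

```python
import numpy as np


def gram_matrix(x, sigma=1.0):
    """Gaussian Gram matrix of the samples in x, normalized to unit trace."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    return k / np.trace(k)


def renyi_entropy(a, alpha=1.01):
    """Matrix-based Renyi entropy: log2(tr(A^alpha)) / (1 - alpha)."""
    eig = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    return np.log2(np.sum(eig ** alpha)) / (1.0 - alpha)


def joint_entropy(mats, n, alpha=1.01):
    """Joint entropy via the trace-normalized Hadamard product of Gram matrices.

    An empty list yields the all-ones matrix divided by n, whose entropy is 0.
    """
    h = np.ones((n, n))
    for m in mats:
        h = h * m
    return renyi_entropy(h / np.trace(h), alpha)


def cmi(kx, ky, kz_list, alpha=1.01):
    """I(x; y | z) = S(x, z) + S(y, z) - S(z) - S(x, y, z)."""
    n = kx.shape[0]
    return (joint_entropy([kx] + kz_list, n, alpha)
            + joint_entropy([ky] + kz_list, n, alpha)
            - joint_entropy(kz_list, n, alpha)
            - joint_entropy([kx, ky] + kz_list, n, alpha))


def greedy_select(X, y, threshold=0.01, alpha=1.01, sigma=1.0):
    """Greedy forward selection; stop when the best CMI is <= threshold.

    The threshold rule is an assumed, simplified stopping criterion.
    """
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    feats = [gram_matrix(X[:, j], sigma) for j in range(d)]
    ky = gram_matrix(y, sigma)
    selected, remaining = [], list(range(d))
    while remaining:
        kz = [feats[j] for j in selected]
        scores = {j: cmi(feats[j], ky, kz, alpha) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= threshold:
            break  # remaining features add too little information about y
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `greedy_select(X, y)` on an `(n_samples, n_features)` array returns the indices of the selected features in the order they were added. In practice the kernel width and threshold would need to be tuned to the data.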

KEYWORDS

feature selection - stopping criterion - conditional mutual information - multivariate matrix-based Rényi’s α-entropy functional

Free Access

NUMBER

Volume: 21 Number: 1 Part: January (2019)

COLLECTIONS

Communication
Research

RELATED JOURNALS

JMMR (Jurnal Medicoeticolegal dan Manajemen Rumah Sakit)
South African Journal of Science
Entropy

DOI

https://doi.org/10.3390/e21010099

Related articles

Efficient Multi-Label Feature Selection Using Entropy-Based Label Selection

Jaesung Lee and Dae-Won Kim

Multi-label feature selection is designed to select a subset of features according to their importance to multiple labels. This task can be achieved by ranking the dependencies of features and selecting the features with the highest rankings. In a multi-...

Journal: Entropy

Open Access

A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen–Shannon Divergence

Truong Khanh Linh Dang, Cornelia Meckbach, Rebecca Tacke, Stephan Waack and Mehmet Gültas

The knowledge of protein-DNA interactions is essential to fully understand the molecular activities of life. Many research groups have developed various tools which are either structure- or sequence-based approaches to predict the DNA-binding residues in...

Journal: Entropy

Open Access

Information Landscape and Flux, Mutual Information Rate Decomposition and Connections to Entropy Production

Qian Zeng and Jin Wang

We explored the dynamics of two interacting information systems. We show that for the Markovian marginal systems, the driving force for information dynamics is determined by both the information landscape and information flux. While the information lands...

Journal: Entropy

Open Access

Assessing the Relevance of Specific Response Features in the Neural Code

Hugo Gabriel Eyherabide and Inés Samengo

The study of the neural code aims at deciphering how the nervous system maps external stimuli into neural activity, the encoding phase, and subsequently transforms such activity into adequate responses to the original stimuli, the decoding...

Journal: Entropy

Open Access

Information Guided Exploration of Scalar Values and Isocontours in Ensemble Datasets

Subhashis Hazarika, Ayan Biswas, Soumya Dutta and Han-Wei Shen

Uncertainty of scalar values in an ensemble dataset is often represented by the collection of their corresponding isocontours. Various techniques such as contour-boxplot, contour variability plot, glyphs and probabilistic marching-cubes have been propose...

Journal: Entropy

Open Access