ARTICLE
TITLE

The Problem of Redundant Variables in Random Forests

SUMMARY

Random forests are currently among the supervised learning methods most preferred by practitioners. Their popularity stems partly from the fact that they can be applied without a time-consuming pre-processing step. Random forests handle mixed feature types, irrespective of their distributions. The method is robust to outliers, and feature selection is built into the learning algorithm. However, classification accuracy can decrease in the presence of redundant variables. In this paper, we discuss two approaches to the problem of redundant variables. We consider two strategies for searching for the best feature subset, as well as two formulas for aggregating the features within clusters. In the empirical experiment, we generate collinear predictors and add them to real datasets. Dimensionality reduction methods usually improve the accuracy of random forests, but none of them clearly outperforms the others.
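The experimental setup described above (adding collinear predictors, then grouping redundant features and aggregating each group) can be illustrated with a minimal sketch. It is not the paper's procedure: the noise model, the greedy correlation-threshold grouping, the 0.9 threshold, and the two aggregation rules shown (sign-aligned mean and first principal component) are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def add_collinear(X, n_copies, noise_sd, rng):
    """Append noisy copies of randomly chosen columns (assumed noise model)."""
    n, p = X.shape
    src = rng.integers(0, p, size=n_copies)          # source column of each copy
    scale = X[:, src].std(axis=0)                    # noise relative to column spread
    noise = rng.normal(0.0, noise_sd, size=(n, n_copies)) * scale
    return np.hstack([X, X[:, src] + noise]), src

def correlation_clusters(X, threshold=0.9):
    """Greedy grouping (an assumption): each unassigned column seeds a cluster
    and absorbs every unassigned column with |r| >= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    labels = np.full(X.shape[1], -1)
    next_label = 0
    for j in range(X.shape[1]):
        if labels[j] == -1:
            members = (labels == -1) & (corr[j] >= threshold)
            labels[members] = next_label
            next_label += 1
    return labels

def aggregate(X, labels, how="mean"):
    """Replace each cluster of standardized features with one synthetic feature."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    out = []
    for k in np.unique(labels):
        block = Z[:, labels == k]
        if how == "mean":
            # Align signs with the first member so opposite-signed features do not cancel.
            signs = np.sign(block.T @ block[:, 0])
            out.append((block * signs).mean(axis=1))
        elif how == "pca":
            # Scores on the first principal component of the cluster.
            _, _, vt = np.linalg.svd(block, full_matrices=False)
            out.append(block @ vt[0])
        else:
            raise ValueError(f"unknown aggregation rule: {how}")
    return np.column_stack(out)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                        # stand-in for a real dataset
X_aug, src = add_collinear(X, n_copies=3, noise_sd=0.05, rng=rng)
labels = correlation_clusters(X_aug, threshold=0.9)
X_mean = aggregate(X_aug, labels, how="mean")
X_pca = aggregate(X_aug, labels, how="pca")
```

The reduced matrices `X_mean` and `X_pca` could then be passed to any random forest implementation and compared against `X_aug` by cross-validated accuracy, which is the kind of comparison the summary reports.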
