ARTICLE
TITLE

SAMPLE FORMATION AND REDUCTION FOR DATA MINING

SUMMARY

Solving data mining problems requires operating on large data samples, which makes processing time-consuming. Reducing the dimensionality of data samples is therefore an urgent task. The aim of the paper is to provide a method for sample formation and reduction that makes it possible to handle a large original sample. The problem of sample formation and reduction for data mining is solved. The scientific novelty of the work is that a method of sample formation and reduction is proposed for the first time. The method preserves the most important topological properties of the original sample in the formed sub-sample, without loading the original sample into computer memory and without multiple passes over it. This reduces both the size of the sample and the computing resources required. The practical significance of the work lies in the development of software that implements the proposed method, and in experiments that study the method on practical problems; the results allow the developed method to be recommended for practical use in data mining. With the proposed method, the size of a sample can be reduced significantly (by a factor of 7.7–12.5) without loading the original sample into computer memory, while the generated sub-sample preserves the topological properties of the original sample that matter most for analysis.
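
The summary does not describe the algorithm itself, so the following is only an illustrative sketch of the general idea of single-pass sample reduction, not the method proposed in the paper. It assumes that the feature ranges are known in advance and swaps in a simple grid technique: each feature is quantized into a fixed number of bins, and the first instance seen in each occupied cell (per class) is kept. The names `reduce_stream`, `lows`, `highs`, and `bins` are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# One instance: (feature vector, class label). Hypothetical representation.
Instance = Tuple[Sequence[float], int]


def reduce_stream(
    stream: Iterable[Instance],
    lows: Sequence[float],
    highs: Sequence[float],
    bins: int = 4,
) -> List[Instance]:
    """Keep one representative per occupied (grid cell, class) pair.

    The stream is read exactly once, and only the kept representatives
    are held in memory, never the whole original sample.
    Assumes highs[j] > lows[j] for every feature j (an assumption of
    this sketch, not something stated in the source).
    """
    kept: Dict[Tuple[Tuple[int, ...], int], Instance] = {}
    for features, label in stream:
        # Map each feature value to a bin index in [0, bins - 1].
        cell = tuple(
            min(bins - 1, max(0, int((x - lo) / (hi - lo) * bins)))
            for x, lo, hi in zip(features, lows, highs)
        )
        # The first instance that falls into a cell represents it.
        kept.setdefault((cell, label), (features, label))
    return list(kept.values())
```

Because instances from every occupied region of the feature space survive, the coarse layout of the classes is retained, while the reduction factor depends on `bins` and on how densely the data fill the grid. The stream can be any iterator, for example a generator that reads rows from a file one at a time.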
