ARTICLE
TITLE

USE OF LENGTH-BASED SIMILARITY MEASURE IN CLUSTERING PROBLEMS

SUMMARY

Context. The study is devoted to developing a flexible mathematical apparatus that offers a sufficiently wide range of means for grouping objects by different types of similarity measures. Within the developed approach, this makes it possible to efficiently solve broad classes of applied problems from different subject areas and to partition objects into clusters of different geometric forms.

Objective. The aim of the study is to improve the efficiency of solving clustering problems by applying a length-based similarity measure to the vector characteristics of objects.

Method. A fuzzy binary relation and its membership function are described; they express the similarity of objects according to the level of similarity of their vector attributes. The single-level clustering method based on fuzzy binary relations is modified to use this similarity measure. Clusterization thresholds are set, which characterize the degree of similarity of objects within a cluster. By changing the thresholds, one can analyze the dynamics of cluster formation, investigate the structure of clusters and the interrelationships between objects, determine the ultimate objects, and analyze the obtained results thoroughly. The proposed approach does not require the number of clusters to be determined in advance and allows data to be clustered into concentric spheres without additional a priori information, so it can be used at the stage of preliminary data analysis.

Results. The developed approach is implemented as a software system, which is used to solve an actual applied problem: investigating the intensity of population migration by regions of Ukraine.

Conclusions. The experiments show that the similarity measure is convenient and efficient for solving applied problems that require clustering in the form of concentric spheres. The presented approach provides an opportunity to conduct new meaningful studies of input data. Prospects for further research are: development of a decision support system for grouping objects into clusters shaped as concentric spheres, cones, ellipses and their intersections; implementation of parallel multi-level clustering carried out simultaneously by several similarity criteria, and its application; and study of the partitioning of a single sample of input data into clusters of different geometric forms, with a meaningful interpretation of the obtained results.
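
The threshold idea described under Method can be illustrated with a minimal Python sketch. The summary does not give the membership function or the exact clustering procedure, so everything here is an assumption made for illustration: the function `length_similarity` (one minus the relative difference of Euclidean lengths), the helper `threshold_clusters`, and the choice to merge objects that are linked by a chain of pairwise similarities at or above the threshold `alpha`.

```python
import math


def length(v):
    """Euclidean length (norm) of a feature vector."""
    return math.sqrt(sum(x * x for x in v))


def length_similarity(a, b):
    """Hypothetical membership function (not taken from the article):
    1.0 when the two vectors have equal length, falling toward 0.0
    as their lengths diverge."""
    la, lb = length(a), length(b)
    longest = max(la, lb)
    if longest == 0.0:
        return 1.0  # two zero vectors are treated as fully similar
    return 1.0 - abs(la - lb) / longest


def threshold_clusters(points, alpha):
    """Group point indices that are connected by a chain of pairwise
    similarities >= alpha (assumed single-level threshold clustering)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if length_similarity(points[i], points[j]) >= alpha:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the representative.
                    parent[max(ri, rj)] = min(ri, rj)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Six points lying on two concentric circles of radius 1 and radius 5.
points = [(1, 0), (5, 0), (0, 1), (0, 5), (-1, 0), (3, 4)]
high = threshold_clusters(points, alpha=0.9)  # strict threshold
low = threshold_clusters(points, alpha=0.1)   # permissive threshold
```

With the strict threshold the points separate by radius, regardless of direction, which is the concentric-sphere behavior the summary refers to; with the permissive threshold they merge into a single cluster. The number of clusters is never passed in: it follows from the threshold alone.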
