ARTICLE
TITLE

Frequent Patterns Algorithm of Biological Sequences based on Pattern Prefix-tree

SUMMARY

In bioinformatics applications, existing algorithms cannot directly and efficiently perform sequence pattern mining. This paper proposes two fast and efficient pattern mining algorithms, one for a single biological sequence and one for multiple biological sequences. The concept of a basic pattern is introduced: frequent basic patterns are mined first, and frequent patterns are then mined by constructing prefix trees over the frequent basic patterns. In this way, the proposed algorithms rapidly mine frequent patterns of biological sequences based on pattern prefix trees. In experiments, family sequence data from the Pfam protein database are used to evaluate the performance of the proposed algorithms. The results confirm that the proposed algorithms not only obtain mining results of meaningful biological significance but also reduce the running time of biological sequence pattern mining.
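The summary describes the approach only at a high level, so the following is a minimal sketch of prefix-tree-based frequent pattern mining, not the authors' algorithm. It assumes that patterns are contiguous substrings, that the "basic patterns" are simply single residues, and that support is the number of sequences containing a pattern (multiple-sequence case) or the number of overlapping occurrences (single-sequence case). The names `Node`, `build_prefix_tree`, `frequent_patterns`, `min_support`, `per_sequence` and `max_len` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Prefix-tree node; the symbols on the path from the root spell a pattern."""
    support: int = 0
    occurrences: list = field(default_factory=list)  # (sequence id, end index)
    children: dict = field(default_factory=dict)     # next symbol -> Node


def _support(occurrences, per_sequence):
    # Multiple sequences: number of distinct sequences containing the pattern.
    # Single sequence: number of (possibly overlapping) occurrences.
    if per_sequence:
        return len({sid for sid, _ in occurrences})
    return len(occurrences)


def _grow(node, sequences, min_support, per_sequence, depth, max_len):
    """Extend the pattern at `node` by one symbol, keeping only frequent children."""
    if max_len is not None and depth >= max_len:
        return
    extensions = defaultdict(list)
    for sid, end in node.occurrences:
        nxt = end + 1
        if nxt < len(sequences[sid]):
            extensions[sequences[sid][nxt]].append((sid, nxt))
    for sym, occ in extensions.items():
        sup = _support(occ, per_sequence)
        # Support never increases when a pattern is extended, so an
        # infrequent child is pruned together with its whole subtree.
        if sup >= min_support:
            child = Node(sup, occ)
            node.children[sym] = child
            _grow(child, sequences, min_support, per_sequence, depth + 1, max_len)


def build_prefix_tree(sequences, min_support, per_sequence=True, max_len=None):
    """Build a prefix tree of all contiguous patterns with support >= min_support."""
    # The root is the empty pattern; it "ends" just before every position,
    # so its one-symbol extensions (the assumed basic patterns) cover all starts.
    root = Node(occurrences=[(sid, i - 1)
                             for sid, seq in enumerate(sequences)
                             for i in range(len(seq))])
    _grow(root, sequences, min_support, per_sequence, 0, max_len)
    return root


def frequent_patterns(node, prefix=""):
    """Flatten the tree into {pattern: support} by depth-first traversal."""
    result = {}
    for sym, child in node.children.items():
        pattern = prefix + sym
        result[pattern] = child.support
        result.update(frequent_patterns(child, pattern))
    return result


if __name__ == "__main__":
    tree = build_prefix_tree(["ACGT", "ACGA", "CGT"], min_support=2)
    print(sorted(frequent_patterns(tree).items()))
    # [('A', 2), ('AC', 2), ('ACG', 2), ('C', 3), ('CG', 3), ('CGT', 2), ('G', 3), ('GT', 2), ('T', 2)]
```

Because every pattern shares its prefix node with its extensions, each occurrence list is computed once from its parent's list rather than by rescanning the sequences. For long protein families with long repeated regions, the recursion depth can grow with pattern length, so capping `max_len` is a sensible safeguard in this sketch.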

Related articles

Filippo Galli, Marco Vannucci, Valentina Colla (TeCIP Institute, Scuola Superiore Sant’Anna, via Moruzzi 1, Italy)

Classification of imbalanced datasets is a critical problem in numerous contexts. In these applications, standard methods are not able to satisfactorily detect rare patterns due to multiple factors that bias the classifiers toward the frequent class. Thi...


Dewi Sartika, Dana Indra Sensuse    

Data mining is the process of analyzing large data sets in a database so that the information obtained can be used in the next stage. One commonly used data mining technique is classification. Classification is an engineering mode...


Hamideh Iraj, Babak Sohrabi

The use of data-driven decision making and data scientists is on the rise in Iran as companies have rapidly been focusing on gathering data and analyzing it to guide corporate decisions. In order to facilitate the process and understand the nature and ch...


Daniela Elena Popescu, Madalina Lonea, Doina Zmaranda, Codruta Vancea, Cristian Tiurbe

Based on the available information (e.g., multiple functional faults or sensor errors give rise to similar alarm patterns or outcomes), some states in the behaviour of a network cannot be distinguished from one another. So, the computer network’s fault tre...


J. A. Vazquez-Lopez, I. Lopez-Juarez, M. Peña-Cabrera

Time-series statistical pattern recognition is of prime importance in statistics, especially in quality control techniques for manufacturing processes. A frequent problem in this application is the complexity when trying to determine the behaviour (patte...