ARTICLE
TITLE

Development of Computational Pipeline Software for Genome/Exome Analysis on the K Computer

SUMMARY

Pipeline software, which chains tools and applications together for a specific data-processing task, is used extensively in bioinformatics research to analyze several data types, such as genome data. Recent trends in genome analysis require pipeline software that makes optimal use of computational resources, so that the large-scale biological data accumulated daily can be handled efficiently. However, using pipeline software in bioinformatics tends to be problematic owing to its large memory and storage requirements, the growing number of job submissions, and a wide range of software dependencies. This paper presents massively parallel genome/exome analysis pipeline software that addresses these difficulties and can be executed on a large number of K computer nodes. The proposed pipeline incorporates workflow management functionality that extends a dynamic task distribution framework to take into account the task-dependency graph of the pipeline's internal executions. Performance results for the core pipeline functionality, obtained in evaluation experiments on an actual exome dataset, demonstrate good scalability on over a thousand nodes. This study also proposes several approaches that use domain knowledge of the pipeline's internal executions to resolve performance bottlenecks, which are a major challenge in pipeline parallelization.
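The idea of distributing tasks dynamically while respecting a task-dependency graph can be illustrated with a minimal sketch. This is not the pipeline's implementation, which the summary does not describe in detail. It is a generic single-process illustration that assumes a thread pool in place of compute nodes, and the function `run_task_graph` and the stage names (`align_s1`, `call_s1`, `merge`, and so on) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_task_graph(tasks, deps, max_workers=4):
    """Run the callables in `tasks`, honoring `deps` (name -> prerequisite names).

    A task is handed to a free worker as soon as all of its prerequisites
    have finished. Returns the task names in completion order.
    """
    # Prerequisites each task is still waiting for.
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    # Reverse edges: which tasks become candidates when `name` finishes.
    dependents = {name: [] for name in tasks}
    for name, prereqs in remaining.items():
        for p in prereqs:
            dependents[p].append(name)

    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start with every task that has no prerequisites.
        running = {pool.submit(tasks[n]): n
                   for n, prereqs in remaining.items() if not prereqs}
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()  # re-raise any exception from the task
                finished.append(name)
                for child in dependents[name]:
                    remaining[child].discard(name)
                    if not remaining[child]:
                        running[pool.submit(tasks[child])] = child

    if len(finished) != len(tasks):
        raise ValueError("cycle in task graph: some tasks never became ready")
    return finished


# Hypothetical two-sample workflow: per-sample alignment, then per-sample
# variant calling, then a final merge. The task bodies are placeholders.
stage_names = ["align_s1", "align_s2", "call_s1", "call_s2", "merge"]
tasks = {name: (lambda: None) for name in stage_names}
deps = {
    "call_s1": {"align_s1"},
    "call_s2": {"align_s2"},
    "merge": {"call_s1", "call_s2"},
}
order = run_task_graph(tasks, deps)
```

In this sketch the two samples proceed independently, so `call_s1` can start before `align_s2` has finished. Only `merge` waits for both branches. The exact completion order of independent tasks is not deterministic.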
