Performance optimization in DNA short-read alignment


Wilton Richard 1, Szalay Alexander S 1,2


1. Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218, USA

2. Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA


Abstract

Summary: Over the past decade, short-read sequence alignment has become a mature technology. Optimized algorithms, careful software engineering and high-speed hardware have contributed to greatly increased throughput and accuracy. With these improvements, many opportunities for performance optimization have emerged. In this review, we examine three general-purpose short-read alignment tools (BWA-MEM, Bowtie 2 and Arioc) with a focus on performance optimization. We analyze the performance-related behavior of the algorithms and heuristics each tool implements, with the goal of arriving at practical methods of improving processing speed and accuracy. We indicate where an aligner's default behavior may result in suboptimal performance, explore the effects of computational constraints such as end-to-end mapping and alignment scoring threshold, and discuss sources of imprecision in the computation of alignment scores and mapping quality. With this perspective, we describe an approach to tuning short-read aligner performance to meet specific data-analysis and throughput requirements while avoiding potential inaccuracies in subsequent analysis of alignment results. Finally, we illustrate how this approach avoids easily overlooked pitfalls and leads to verifiable improvements in alignment speed and accuracy.

Supplementary information: Appendices referenced in this article are available at Bioinformatics online.


National Institutes of Health

Johns Hopkins Department of Physics and Astronomy

Lieber Institute for Brain Development

Extreme Science and Engineering Discovery Environment

UCSD Expanse and Purdue Anvil, XSEDE

National Science Foundation


Oxford University Press (OUP)


Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability


Cited by 5 articles.






