Co-linear Chaining with Overlaps and Gap Costs

Published: 2021-02-03

Authors:

Chirag Jain, Daniel Gibney, Sharma V. Thankachan

Abstract

Co-linear chaining has proven to be a powerful heuristic for finding near-optimal alignments of long DNA sequences (e.g., long reads or a genome assembly) to a reference. It is used as an intermediate step in several alignment tools that employ a seed-chain-extend strategy. Despite this popularity, efficient subquadratic-time algorithms for the general case, where chains support anchor overlaps and gap costs, are not currently known. We present algorithms that solve the co-linear chaining problem with anchor overlaps and gap costs in Õ(n) time, where n denotes the number of anchors. We also establish the first theoretical connection between co-linear chaining cost and edit distance. Specifically, we prove that for a fixed set of anchors, under a carefully designed chaining cost function, the optimal ‘anchored’ edit distance equals the optimal co-linear chaining cost. Finally, we demonstrate experimentally that the optimal co-linear chaining cost under the proposed cost function can be computed orders of magnitude faster than edit distance, and that it achieves a correlation coefficient above 0.9 with edit distance for both closely and distantly related sequences.
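To make the problem statement concrete, here is a minimal sketch of co-linear chaining with overlaps and gap costs. It is not the authors' method: it is a plain O(n²) dynamic program, whereas the paper's contribution is reaching Õ(n) time. The cost function is a hypothetical one chosen for illustration: a gap between consecutive anchors costs max(gap_q, gap_r), and an overlap costs |ovl_q - ovl_r| (the diagonal shift it implies). The names `Anchor`, `kmer_anchors`, `precedes`, `connect`, and `chain`, the k-mer seeding step, and the two sentinel anchors that charge for unaligned sequence ends are all assumptions, not the paper's API.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Anchor(NamedTuple):
    """Exact match: query[x:x+length] == ref[y:y+length]."""
    x: int
    y: int
    length: int


def kmer_anchors(query: str, ref: str, k: int) -> List[Anchor]:
    """Hypothetical seeding step: every shared k-mer becomes an anchor."""
    index = defaultdict(list)
    for y in range(len(ref) - k + 1):
        index[ref[y:y + k]].append(y)
    return [Anchor(x, y, k)
            for x in range(len(query) - k + 1)
            for y in index.get(query[x:x + k], [])]


def precedes(a: Anchor, b: Anchor) -> bool:
    """a may precede b: starts and ends strictly increase in both sequences.
    Overlaps between a and b are allowed; containment is not."""
    return (a.x < b.x and a.y < b.y
            and a.x + a.length < b.x + b.length
            and a.y + a.length < b.y + b.length)


def connect(a: Anchor, b: Anchor) -> int:
    """Hypothetical cost of placing b directly after a in a chain."""
    gap_q = max(0, b.x - (a.x + a.length))
    gap_r = max(0, b.y - (a.y + a.length))
    ovl_q = max(0, (a.x + a.length) - b.x)
    ovl_r = max(0, (a.y + a.length) - b.y)
    # Gap characters need at most max(gap_q, gap_r) edits to align;
    # unequal overlaps imply a diagonal shift of |ovl_q - ovl_r|.
    return max(gap_q, gap_r) + abs(ovl_q - ovl_r)


def chain(query: str, ref: str, anchors: List[Anchor]) -> Tuple[int, List[Anchor]]:
    """Quadratic DP for a minimum-cost co-linear chain between two sentinels."""
    start = Anchor(-1, -1, 1)              # sentinel ending just before position 0
    end = Anchor(len(query), len(ref), 1)  # sentinel starting just past both sequences
    nodes = [start] + sorted(set(anchors)) + [end]
    cost = [0] + [float("inf")] * (len(nodes) - 1)
    back = [-1] * len(nodes)
    for i in range(1, len(nodes)):
        # Sorting by x places every possible predecessor of nodes[i] before it.
        for j in range(i):
            if cost[j] < float("inf") and precedes(nodes[j], nodes[i]):
                c = cost[j] + connect(nodes[j], nodes[i])
                if c < cost[i]:
                    cost[i], back[i] = c, j
    # Walk back from the end sentinel, stopping at the start sentinel (index 0).
    path, i = [], back[-1]
    while i > 0:
        path.append(nodes[i])
        i = back[i]
    return cost[-1], path[::-1]
```

For example, `chain("ACGTTGCA", "ACGTAGCA", kmer_anchors("ACGTTGCA", "ACGTAGCA", 3))` chains the three shared 3-mers at cost 1, which here coincides with the single substitution separating the strings. With no anchors at all, the cost falls back to max(|query|, |ref|). Under this hypothetical cost the chaining score is only a heuristic proxy for edit distance; the exact correspondence stated in the abstract holds for the paper's own, more carefully designed cost function.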

Publisher

Cold Spring Harbor Laboratory

Cited by 5 articles

1. Accelerating minimap2 for long-read sequencing applications on modern CPUs; Nature Computational Science; 2022-02-28

2. Chaining for Accurate Alignment of Erroneous Long Reads to Acyclic Variation Graphs; 2022-01-07

3. Co-linear Chaining with Overlaps and Gap Costs; Lecture Notes in Computer Science; 2022

4. Accelerating long-read analysis on modern CPUs; 2021-07-23

5. Accurate spliced alignment of long RNA sequencing reads; 2020-09-03