SSAHA: A Fast Search Method for Large DNA Databases

Published:2001-10-01 Issue:10 Volume:11 Page:1725-1729
ISSN:1088-9051
Container-title:Genome Research
language:en
Short-container-title:Genome Res.

Author:

Ning Zemin, Cox Anthony J., Mullikin James C.

Abstract

We describe an algorithm, SSAHA (Sequence Search and Alignment by Hashing Algorithm), for performing fast searches on databases containing multiple gigabases of DNA. Sequences in the database are preprocessed by breaking them into consecutive k-tuples of k contiguous bases and then using a hash table to store the position of each occurrence of each k-tuple. Searching for a query sequence in the database is done by obtaining from the hash table the “hits” for each k-tuple in the query sequence and then performing a sort on the results. We discuss the effect of the tuple length k on the search speed, memory usage, and sensitivity of the algorithm and present the results of computational experiments which show that SSAHA can be three to four orders of magnitude faster than BLAST or FASTA, while requiring less memory than suffix tree methods. The SSAHA algorithm is used for high-throughput single nucleotide polymorphism (SNP) detection and very large scale sequence assembly. Also, it provides Web-based sequence search facilities for Ensembl projects.
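
The indexing and lookup steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `build_index` and `search`, the `min_hits` threshold, the choice of non-overlapping database tuples, and the grouping of hits by shift (database offset minus query position) are all assumptions made for illustration; the abstract itself only states that k-tuple positions are stored in a hash table and that the hits are sorted.

```python
from collections import defaultdict
from itertools import groupby


def build_index(sequences, k):
    """Map each k-tuple to the (sequence index, offset) pairs where it occurs.

    Assumption: database tuples are taken at non-overlapping offsets 0, k, 2k, ...
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(0, len(seq) - k + 1, k):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def search(index, query, k, min_hits=2):
    """Return (seq_id, shift, hit_count) for every shift with at least min_hits hits.

    Assumption: hits that share a sequence and a shift are treated as one
    candidate match; min_hits is a hypothetical filter, not a SSAHA parameter.
    """
    hits = []
    # Look up every overlapping k-tuple of the query in the hash table.
    for t in range(len(query) - k + 1):
        for seq_id, offset in index.get(query[t:t + k], ()):
            hits.append((seq_id, offset - t, offset))
    # Sorting brings hits from the same sequence and shift next to each other.
    hits.sort()
    matches = []
    for (seq_id, shift), run in groupby(hits, key=lambda h: (h[0], h[1])):
        count = len(list(run))
        if count >= min_hits:
            matches.append((seq_id, shift, count))
    return matches


database = ["ACGTTGCAAGCT", "TTTTGGGGCCCC"]
index = build_index(database, k=2)
print(search(index, "GTTGCA", k=2))  # [(0, 2, 3)]
```

In this toy run the query is a substring of the first sequence starting at offset 2, so three of its 2-tuples hit that sequence with the same shift, while the two stray hits on the second sequence fall on different shifts and are filtered out. A larger k shrinks the hit lists and speeds up the search, at the cost of missing matches shorter than the tuple spacing allows, which is the trade-off the abstract refers to.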

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

References: 16 articles.

1. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

2. An SNP map of the human genome generated by reduced representation shotgun sequencing

3. Tandem repeats finder: a program to analyze DNA sequences

4. Alignment of whole genomes

Cited by 755 articles.

1. Design, synthesis and mechanistic anticancer activity of new acetylated 5-aminosalicylate-thiazolinone hybrid derivatives;iScience;2024-01

2. Creating and Using Minimizer Sketches in Computational Genomics;Journal of Computational Biology;2023-12-01

3. Infection pressure in apes has driven selection for CD4 alleles that resist lentivirus (HIV/SIV) infection;2023-11-13

4. A Multi-FPGA Implementation of FM-Index Based Genomic Pattern Search;IEICE Transactions on Information and Systems;2023-11-01

5. Roadmap to the study of gene and protein phylogeny and evolution—A practical guide;PLOS ONE;2023-02-24