FINDEL: A Deep Learning Approach to Efficient Artifact Removal From Cancer Genomes
Authors:
Tan Denis, Zhou Pengfei, Zhang Shaoting, Wong VicPearly, Zhang Jie, Long Edwin
Abstract
Next-generation sequencing technologies have increased sequencing throughput 100- to 1000-fold and, as a result, reduced the cost of sequencing a human genome to approximately US$1,000. However, sequencing artifacts can cause variants to be identified erroneously and can adversely affect downstream analyses. Manual inspection of variants is therefore still necessary to refine them into high-quality variant calls. This inspection is usually performed on large binary alignment map (BAM) files, which makes it labor-intensive and time-consuming, and it lacks standardization and reproducibility. Here we show that mutational signatures coupled with deep learning can replace the current standard of manual inspection in the bioinformatics workflow. Our software, FINDEL, efficiently removes sequencing artifacts from cancer samples. It queries the variant call format (VCF) file, which is much more compact than a BAM file, automates the variant refinement process, and produces high-quality variant calls.
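To illustrate what "mutational signatures" can mean as classifier input, the sketch below maps a single-nucleotide variant (SNV) and its trinucleotide reference context to one of the 96 standard single-base-substitution (SBS) channels, and counts variants per channel. This is a minimal sketch under stated assumptions, not FINDEL's actual feature encoding: the function names `sbs96_channel` and `sbs96_counts` are hypothetical, and it assumes the 3-base context around each variant has already been extracted from the reference genome using the VCF position.

```python
# Hypothetical helpers (not part of FINDEL): map SNVs to the 96 SBS
# mutational-signature channels, written as e.g. "A[C>T]G".
BASES = "ACGT"
PYRIMIDINES = ("C", "T")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Return the SBS-96 label for an SNV.

    context: 3-base reference context centred on the mutated base (assumed input).
    alt: alternate base from the VCF ALT column.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or any(b not in BASES for b in context) or alt not in BASES:
        raise ValueError("expected a 3-base ACGT context and an ACGT alt base")
    ref = context[1]
    if ref == alt:
        raise ValueError("alt base must differ from the reference base")
    # Convention: express the substitution relative to the pyrimidine strand.
    if ref not in PYRIMIDINES:
        context = revcomp(context)
        alt = alt.translate(COMPLEMENT)
        ref = context[1]
    return f"{context[0]}[{ref}>{alt}]{context[2]}"


# All 96 labels in a fixed order: 6 substitution types x 16 flanking contexts.
CHANNELS = [
    f"{left}[{ref}>{alt}]{right}"
    for ref in PYRIMIDINES
    for alt in BASES
    if alt != ref
    for left in BASES
    for right in BASES
]


def sbs96_counts(variants):
    """Count (context, alt) pairs per channel; returns a 96-length list."""
    counts = dict.fromkeys(CHANNELS, 0)
    for context, alt in variants:
        counts[sbs96_channel(context, alt)] += 1
    return [counts[c] for c in CHANNELS]
```

For example, `sbs96_channel("AGT", "A")` returns `"A[C>T]T"`, because a G>A change is reported on the opposite (pyrimidine) strand. Collapsing strands this way keeps the feature space at 96 channels rather than 192.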
Publisher
Cold Spring Harbor Laboratory