Exploration of whole genome amplification generated chimeric sequences in long-read sequencing data

Author:

Lu Na 1,2, Qiao Yi 1,2, An Pengfei 1,2,3, Luo Jiajian 1,2, Bi Changwei 4, Li Musheng 5, Lu Zuhong 1,2, Tu Jing 1,2

Affiliation:

1. State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Nanjing 210096, China

2. Southeast University, School of Biological Science and Medical Engineering, Nanjing 210096, China

3. Monash University-Southeast University Joint Research Institute, Suzhou 215123, China

4. College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China

5. Department of Physiology and Cell Biology, University of Nevada, Reno School of Medicine, Reno, NV 89511, USA

Abstract

Motivation: Multiple displacement amplification (MDA) has become the most commonly used method of whole genome amplification, generating large amounts of DNA with higher molecular weight and greater genome coverage. Coupled with long-read sequencing, it makes it possible to sequence amplicons over 20 kb in length. However, the formation of chimeric sequences (chimeras, which appear as structural errors in sequencing data) during MDA seriously interferes with bioinformatics analysis, and its influence on long-read sequencing data is unknown.

Results: We sequenced phi29 DNA polymerase-mediated MDA amplicons on the PacBio platform and analyzed the chimeras in the resulting data. We constructed 3rd-ChimeraMiner, a pipeline that recognizes chimeras in long-read sequencing data and restores them to their original structures, improving the efficiency of using third-generation sequencing (TGS) data. Five long-read datasets and one high-fidelity long-read dataset with various amplification folds were analyzed. The results reveal that mis-priming events during amplification occur more frequently than widely perceived, and the proportion gradually accumulates from 42% to over 78% as amplification continues. In total, 99.92% of recognized chimeric sequences were shown to be artifacts whose structures were formed erroneously during MDA rather than existing in the original genomes. Restoring chimeras to their original structures recovers the vast majority of supplementary alignments that would otherwise introduce false-positive structural variants, removing 97% of inversions on average and aiding the analysis of structural variation in MDA-amplified samples. The impact of chimeras on long-read sequencing data analysis should be emphasized, and 3rd-ChimeraMiner can help to quantify and reduce their influence.

Availability and implementation: 3rd-ChimeraMiner is available on GitHub at https://github.com/dulunar/3rdChimeraMiner.
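To make the role of supplementary alignments concrete, the following is a minimal sketch of how one might count reads that align in more than one piece, and how many of those switch strand between pieces. It is not the 3rd-ChimeraMiner algorithm, which the abstract does not describe in detail. The function name `summarize_split_reads` and the counting rules are illustrative assumptions; only the SAM FLAG bits (0x4 unmapped, 0x10 reverse strand, 0x100 secondary, 0x800 supplementary) come from the SAM format itself.

```python
from typing import Dict, Iterable, Set

# FLAG bits defined by the SAM format.
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_split_reads(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Count mapped reads, reads with a supplementary alignment, and
    split reads whose segments map to both strands.

    Hypothetical helper for illustration only; a real chimera caller
    would also inspect coordinates, overlaps, and mapping quality.
    """
    strands: Dict[str, Set[str]] = {}  # read name -> strands seen
    split_reads: Set[str] = set()      # reads with >= 1 supplementary record

    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, flag = fields[0], int(fields[1])
        if flag & (UNMAPPED | SECONDARY):
            continue  # keep only primary and supplementary records
        strands.setdefault(name, set()).add("-" if flag & REVERSE else "+")
        if flag & SUPPLEMENTARY:
            split_reads.add(name)

    strand_switch = sum(1 for name in split_reads if len(strands[name]) == 2)
    return {
        "mapped_reads": len(strands),
        "split_reads": len(split_reads),
        "strand_switch_reads": strand_switch,
    }
```

Under these assumptions, the ratio `split_reads / mapped_reads` gives a rough upper bound on candidate chimeric reads in an alignment file, and `strand_switch_reads` flags the subset that a structural variant caller could misreport as inversions. Genuine structural variants produce the same signal, so these counts alone cannot separate artifacts from real events.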

Funder

Natural Science Foundation of Jiangsu Province

National Natural Science Foundation of China

Fundamental Research Funds for the Central Universities and Qinglan Project of Jiangsu Province of China

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

