Reference Genome Choice and Filtering Thresholds Jointly Influence Phylogenomic Analyses

Author:

Rick Jessica A (1), Brock Chad D (2), Lewanski Alexander L (3), Golcher-Benavides Jimena (4), Wagner Catherine E (5, 6)

Affiliation:

1. School of Natural Resources & the Environment, University of Arizona, Tucson, AZ 85719, USA

2. Department of Biological Sciences, Tarleton State University, Stephenville, TX 76401, USA

3. Department of Integrative Biology and W.K. Kellogg Biological Station, Michigan State University, East Lansing, MI 48824, USA

4. Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA 50011, USA

5. Program in Ecology and Evolution, University of Wyoming, Laramie, WY 82071, USA

6. Department of Botany, University of Wyoming, Laramie, WY 82071, USA

Abstract

Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that the processing of genomic data can introduce significant biases into downstream phylogenetic analyses; however, it remains unclear whether bioinformatic parameters interact with one another, or whether the choice of reference genome for sequence alignment and variant calling introduces biases of its own. We address these knowledge gaps by employing a combination of simulated and empirical data sets to investigate the extent to which the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, as well as the way that reference genome choice interacts with bioinformatic filtering choices and phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and have a higher center of gravity than the true trees. We find the greatest topological accuracy when filtering sites for minor allele count (MAC) >3–4 in our 51-taxon data sets, while tree center of gravity was closest to the true value when filtering for sites with MAC >1–2. In contrast, filtering for missing data increased accuracy in the inferred topologies; however, this effect was small in comparison to the effect of minor allele filters and may be undesirable due to a subsequent mutation spectrum distortion. The bias introduced by these filters differs based on the reference genome used in short-read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses. These results demonstrate that attributes of the study system and data set (and their interaction) add important nuance for how best to assemble and filter short-read genomic data for phylogenetic inference.
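To make the two filter types discussed in the abstract concrete, the following is a minimal Python sketch of site-level filtering by minor allele count and by missing data. It is an illustration only, not the authors' pipeline: the genotype encoding (diploid alternate-allele counts 0/1/2, with None for a missing call), the function names, the default thresholds, and the choice to treat the MAC threshold as inclusive are all assumptions made for this example.

```python
from typing import List, Optional

# Hypothetical encoding: one genotype per sample, given as the number of
# alternate alleles in a diploid call (0, 1, or 2), or None if missing.
Site = List[Optional[int]]


def minor_allele_count(site: Site) -> int:
    """Count copies of the rarer allele among called (non-missing) genotypes."""
    called = [g for g in site if g is not None]
    alt = sum(called)                # copies of the alternate allele
    total = 2 * len(called)          # diploid: two alleles per called sample
    return min(alt, total - alt)


def missing_fraction(site: Site) -> float:
    """Fraction of samples with no genotype call at this site."""
    if not site:
        return 1.0
    return sum(g is None for g in site) / len(site)


def filter_sites(sites: List[Site], min_mac: int = 3,
                 max_missing: float = 0.5) -> List[int]:
    """Return indices of sites passing both filters.

    Assumption: a site is kept when MAC >= min_mac (inclusive) and its
    missing fraction is <= max_missing. Both defaults are illustrative.
    """
    return [
        i for i, site in enumerate(sites)
        if minor_allele_count(site) >= min_mac
        and missing_fraction(site) <= max_missing
    ]


# Toy data: four sites scored in five samples.
example_sites = [
    [0, 0, 1, 2, None],        # MAC 3, 20% missing
    [0, 0, 0, 1, 0],           # MAC 1 (singleton), no missing data
    [None, None, None, 1, 1],  # MAC 2, 60% missing
    [2, 2, 2, 1, 2],           # MAC 1 (reference allele is the minor one)
]

kept_strict = filter_sites(example_sites, min_mac=3, max_missing=0.5)
kept_lenient = filter_sites(example_sites, min_mac=1, max_missing=0.5)
```

In this toy example the stricter MAC threshold discards the singleton sites, which shows how raising the threshold removes rare variants from the data set regardless of how completely they were genotyped.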

Funder

National Science Foundation

Publisher

Oxford University Press (OUP)

Subject

Genetics, Ecology, Evolution, Behavior and Systematics


