Why are rare variants hard to impute? Coalescent models reveal theoretical limits in existing algorithms

Author:

Si Yichen (1), Vanderwerff Brett (1), Zöllner Sebastian (1, 2)

Affiliation:

1. Department of Biostatistics, School of Public Health, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA

2. Department of Psychiatry, University of Michigan, 1420 Washington Heights, Ann Arbor, MI 48109, USA

Abstract

Genotype imputation is an indispensable step in human genetic studies. Large reference panels with deeply sequenced genomes now allow interrogating variants with minor allele frequency < 1% without sequencing. Although it is critical to consider the limits of this approach, the accuracy of rare-variant imputation has so far been assessed only empirically; its theoretical basis has not been explored. To provide a theoretical treatment of imputation accuracy under the current imputation framework, we develop a coalescent model of imputing rare variants that leverages the joint genealogy of the sample to be imputed and the reference individuals. We show that broadly used imputation algorithms include model misspecifications about this joint genealogy that limit their ability to correctly impute rare variants. We develop closed-form solutions for the probability distribution of this joint genealogy and quantify the inevitable error rate resulting from the model misspecification across a range of allele frequencies and reference sample sizes. We show that the probability of a falsely imputed minor allele decreases with reference sample size, but the proportion of falsely imputed minor alleles depends mostly on the allele count in the reference sample. We summarize the effect of this imputation error on association tests by calculating the r² between imputed and true genotypes, and show that even when other sources of error are modeled, the model misspecification has a significant impact on the r² of rare variants. To evaluate these predictions in practice, we compare the imputation of the same dataset across imputation panels of different sizes. Although this empirical imputation accuracy is substantially lower than our theoretical prediction, model misspecification seems to further decrease imputation accuracy for variants with low allele counts in the reference. These results provide a framework for developing new imputation algorithms and for interpreting rare variant association analyses.
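
As a rough illustration of the r² summary mentioned in the abstract, the sketch below computes the squared Pearson correlation between true and imputed alleles for a toy set of haplotypes. It is not the authors' coalescent model: the helper names `r_squared` and `toy_haplotypes`, the panel size, and the carrier counts are all hypothetical, and the only error modeled is a fixed number of falsely imputed minor alleles.

```python
def r_squared(truth, imputed):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(truth)
    if n == 0 or n != len(imputed):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(truth) / n
    mean_i = sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return float("nan")  # correlation is undefined for a monomorphic site
    return cov * cov / (var_t * var_i)


def toy_haplotypes(n, true_carriers, false_positives):
    """Build toy 0/1 allele vectors (hypothetical setup, not real data).

    All true carriers are imputed correctly; `false_positives` additional
    non-carriers are wrongly imputed as carrying the minor allele.
    """
    if true_carriers + false_positives > n:
        raise ValueError("more imputed carriers than haplotypes")
    truth = [1] * true_carriers + [0] * (n - true_carriers)
    imputed_ones = true_carriers + false_positives
    imputed = [1] * imputed_ones + [0] * (n - imputed_ones)
    return truth, imputed


# One falsely imputed minor allele among 10,000 haplotypes:
rare = r_squared(*toy_haplotypes(10_000, 5, 1))     # 5 true carriers, roughly 0.83
common = r_squared(*toy_haplotypes(10_000, 50, 1))  # 50 true carriers, roughly 0.98
```

In this toy setting the same single false positive costs far more r² at an allele count of 5 than at 50, which is consistent in spirit with the abstract's point that errors matter most for variants with low allele counts.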

Funder

National Institutes of Health

NIGMS Human Genetic Cell Repository

Coriell Institute for Medical Research

NHGRI

Publisher

Oxford University Press (OUP)

Subject

Genetics

