A deep catalogue of protein-coding variation in 983,578 individuals
Author:
Sun Kathie Y., Bai Xiaodong, Chen Siying, Bao Suying, Zhang Chuanyi, Kapoor Manav, Backman Joshua, Joseph Tyler, Maxwell Evan, Mitra George, Gorovits Alexander, Mansfield Adam, Boutkov Boris, Gokhale Sujit, Habegger Lukas, Marcketta Anthony, Locke Adam E., Ganel Liron, Hawes Alicia, Kessler Michael D., Sharma Deepika, Staples Jeffrey, Bovijn Jonas, Gelfman Sahar, Di Gioia Alessandro, Rajagopal Veera M., Lopez Alexander, Varela Jennifer Rico, Alegre-Díaz Jesús, Berumen Jaime, Tapia-Conyer Roberto, Kuri-Morales Pablo, Torres Jason, Emberson Jonathan, Collins Rory, Abecasis Gonçalo, Coppola Giovanni, Deubler Andrew, Economides Aris, Ferrando Adolfo, Lotta Luca A., Shuldiner Alan, Siminovitch Katherine, Beechert Christina, Brian Erin D., Cremona Laura M., Du Hang, Forsythe Caitlin, Gu Zhenhua, Guevara Kristy, Lattari Michael, Manoochehri Kia, Challa Prathyusha, Pradhan Manasi, Reynoso Raymond, Schiavo Ricardo, Padilla Maria Sotiropoulos, Wang Chenggu, Wolf Sarah E., Averitt Amelia, Banerjee Nilanjana, Li Dadong, Malhotra Sameer, Mower Justin, Sarwar Mudasar, Staples Jeffrey C., Yu Sean, Zhang Aaron, Bunyea Andrew, Punuru Krishna Pawan, Sreeram Sanjay, Eom Gisu, Sultan Benjamin, Lanche Rouel, Mahajan Vrushali, Austin Eliot, O’Keeffe Sean, Panea Razvan, Polanco Tommy, Rasool Ayesha, Zhang Lance, Edelstein Evan, Guan Ju, Krasheninina Olga, Zarate Samantha, Mansfield Adam J., Maxwell Evan K., Sun Kathie, Ferreira Manuel Allen Revez, Burch Kathy, Campos Adrian, Chen Lei, Choi Sam, Damask Amy, Gaynor Sheila, Geraghty Benjamin, Ghosh Arkopravo, Martinez Salvador Romero, Gillies Christopher, Gurski Lauren, Herman Joseph, Jorgenson Eric, Kessler Michael, Kosmicki Jack, Lin Nan, Locke Adam, Nakka Priyanka, Landheer Karl, Delaneau Olivier, Ghoussaini Maya, Mbatchou Joelle, Moscati Arden, Pandey Aditeya, Pandit Anita, Paulding Charles, Ross Jonathan, Sidore Carlo, Stahl Eli, Suciu Maria, VandeHaar Peter,
Vedantam Sailaja, Vrieze Scott, Zhang Jingning, Wang Rujin, Wu Kuan-Han, Ye Bin, Zhang Blair, Ziyatdinov Andrey, Zou Yuxin, Watanabe Kyoko, Tang Mira, Hobbs Brian, Silver Jon, Palmer William, Guerreiro Rita, Joshi Amit, Baldassari Antoine, Willer Cristen, Graham Sarah, Mayerhofer Ernst, Haas Mary, Verweij Niek, Hindy George, De Tanima, Akbari Parsa, Sun Luanluan, Sosina Olukayode, Gilly Arthur, Dornbos Peter, Rodriguez-Flores Juan, Riaz Moeen, Tzoneva Gannie, Jallow Momodou W., Alkelai Anna, Ayer Ariane, Rajagopal Veera, Kumar Vijay, Otto Jacqueline, Parikshak Neelroop, Guvenek Aysegul, Bras Jose, Alvarez Silvia, Brown Jessie, He Jing, Khiabanian Hossein, Revez Joana, Skead Kimberly, Zavala Valentina, Mitnaul Lyndon J., Jones Marcus B., Chen Esteban, LeBlanc Michelle G., Mighty Jason, Nishtala Nirupama, Rana Nadia, Rico-Varela Jennifer, Hernandez Jaimee, Fenney Alison, Schwartz Randi, Hankins Jody, Hart Samuel, Perez-Beals Ann, Solari Gina, Rivera-Picart Johannie, Pagan Michelle, Siceron Sunilbe, Gwynne David, Rotter Jerome I., Weinreb Robert, Haines Jonathan L., Pericak-Vance Margaret A., Stambolian Dwight, Barzilai Nir, Suh Yousin, Zhang Zhengdong, Hong Elliot, Mitchell Braxton, Blackburn Nicholas B., Broadley Simon, Fabis-Pedrini Marzena J., Jokubaitis Vilija G., Kermode Allan G., Kilpatrick Trevor J., Lechner-Scott Jeanette, Leslie Stephen, McComish Bennet J., Motyer Allan, Parnell Grant P., Scott Rodney J., Taylor Bruce V., Rubio Justin P., Saleheen Danish, Kaufman Ken, Kottyan Leah, Martin Lisa, Rothenberg Marc E., Ali Abdullah, Raza Azra, Cohen Jonathan, Glassman Adam, Kraus William E., Newgard Christopher B., Shah Svati H., Craig Jamie, Hewitt Alex, Chalasani Naga, Foroud Tatiana, Liangpunsakul Suthat, Cox Nancy J., Dolan Eileen, El-Charif Omar, Travis Lois B., Wheeler Heather, Gamazon Eric, Sakoda Lori, Witte John, Lazaridis Kostantinos, Buchanan Adam, Carey David J., Martin Christa L., Meyer Michelle N.,
Retterer Kyle, Rolston David, Akula Nirmala, Besançon Emily, Detera-Wadleigh Sevilla D., Kassem Layla, McMahon Francis J., Schulze Thomas G., Gordon Adam, Smith Maureen, Varga John, Bradford Yuki, Damrauer Scott, DerOhannessian Stephanie, Drivas Theodore, Dudek Scott, Dunn Joseph, Haubein Ned, Judy Renae, Ko Yi-An, Kripke Colleen Morse, Livingstone Meghan, Naseer Nawar, Nerz Kyle P., Poindexter Afiya, Risman Marjorie, Santos Salma, Sirugo Giorgio, Stephanowski Julia, Tran Teo, Vadivieso Fred, Verma Anurag, Verma Shefali S., Weaver JoEllen, Wollack Colin, Rader Daniel J., Ritchie Marylyn, O’Brien Joan, Bottinger Erwin, Cho Judy, Bridges S. Louis, Kimberly Robert, Fejzo Marlena, Spritz Richard A., Elder James T., Nair Rajan P., Stuart Philip, Tsoi Lam C., Dent Robert, McPherson Ruth, Keating Brendan, Kershaw Erin E., Papachristou Georgios, Whitcomb David C., Assassi Shervin, Mayes Maureen D., Austin Eric D., Cantor Michael, Thornton Timothy, Kang Hyun Min, Overton John D., Shuldiner Alan R., Cremona M. Laura, Nafde Mona, Baras Aris, Abecasis Gonçalo, Marchini Jonathan, Reid Jeffrey G., Salerno William, Balasubramanian Suganthi
Abstract
Rare coding variants that substantially affect function provide insights into the biology of a gene [1–3]. However, ascertaining the frequency of such variants requires large sample sizes [4–8]. Here we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data come from individuals of African, East Asian, Indigenous American, Middle Eastern and South Asian ancestry. The catalogue includes more than 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss of function (LOF), we identify 3,988 LOF-intolerant genes, including 86 that were previously assessed as tolerant and 1,153 that lack established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions that are depleted of missense variants despite being tolerant of pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME dataset publicly accessible through a variant allele frequency browser.
Publisher
Springer Science and Business Media LLC
References: 63 articles.
1. Baxter, S. M. et al. Centers for Mendelian Genomics: a decade of facilitating gene discovery. Genet. Med. 24, 784–797 (2022).
2. Musunuru, K. et al. Exome sequencing, ANGPTL3 mutations, and familial combined hypolipidemia. N. Engl. J. Med. 363, 2220–2227 (2010).
3. Soutar, A. K. & Naoumova, R. P. Mechanisms of disease: genetic causes of familial hypercholesterolemia. Nat. Clin. Pract. Cardiovasc. Med. 4, 214–225 (2007).
4. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
5. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Cited by
2 articles.