GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species

Author:

Zhang Liubin,Yuan Yangyang,Peng Wenjie,Tang Bin,Li Mulin Jun,Gui Hongsheng,Wang Qiang,Li MiaoxinORCID

Abstract

AbstractWhole -genome sequencing projects of millions of subjects contain enormous genotypes, entailing a huge memory burden and time for computation. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods to access and manage compressed large-scale genotypes while maintaining a competitive compression ratio. We also showed that conventional analysis would be substantially sped up if built on GBC to access genotypes of a large population. GBC’s data structure and algorithms are valuable for accelerating large-scale genomic research.

Funder

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

全球学者库

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"全球学者库"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前全球学者库共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2023 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3