Feature selection with a genetic algorithm can help improve the distinguishing power of microbiota information in monozygotic twins' identification

Author:

Fu Guangping,Ma Guanju,Dou Shujie,Wang Qian,Fu Lihong,Zhang Xiaojing,Lu Chaolong,Cong Bin,Li Shujin

Abstract

IntroductionPersonal identification of monozygotic twins (MZT) has been challenging in forensic genetics. Previous research has demonstrated that microbial markers have potential value due to their specificity and long-term stability. However, those studies would use the complete information of detected microbial communities, and low-value species would limit the performance of previous models.MethodsTo address this issue, we collected 80 saliva samples from 10 pairs of MZTs at four different time points and used 16s rRNA V3–V4 region sequencing to obtain microbiota information. The data formed 280 inner-individual (Self) or MZT sample pairs, divided into four groups based on the individual relationship and time interval, and then randomly divided into training and testing sets with an 8:2 ratio. We built 12 identification models based on the time interval ( ≤ 1 year or ≥ 2 months), data basis (Amplicon sequence variants, ASVs or Operational taxonomic unit, OTUs), and distance parameter selection (Jaccard distance, Bray-Curist distance, or Hellinger distance) and then improved their identification power through genetic algorithm processes. The best combination of databases with distance parameters was selected as the final model for the two types of time intervals. Bayes theory was introduced to provide a numerical indicator of the evidence's effectiveness in practical cases.ResultsFrom the 80 saliva samples, 369 OTUs and 1130 ASVs were detected. After the feature selection process, ASV-Jaccard distance models were selected as the final models for the two types of time intervals. For short interval samples, the final model can completely distinguish MZT pairs from Self ones in both training and test sets.DiscussionOur findings support the microbiota solution to the challenging MZT identification problem and highlight the importance of feature selection in improving model performance.

Funder

National Natural Science Foundation of China

Natural Science Foundation of Hebei Province

Publisher

Frontiers Media SA

Subject

Microbiology (medical),Microbiology

全球学者库

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"全球学者库"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前全球学者库共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2023 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3