Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences

Author:

Kaden Marika, Bohnsack Katrin Sophie, Weber Mirko, Kudła Mateusz, Gutowska Kaja, Blazewicz Jacek, Villmann Thomas

Abstract

We present an approach to discriminate SARS-CoV-2 virus types based on their RNA sequence descriptions without requiring a sequence alignment. For that purpose, sequences are preprocessed by feature extraction, and the resulting feature vectors are analyzed by prototype-based classification so that the model remains interpretable. In particular, we propose to use variants of learning vector quantization (LVQ) based on dissimilarity measures for RNA sequence data. The respective matrix LVQ provides additional knowledge about the classification decisions, such as discriminant feature correlations, and can additionally be equipped with easy-to-implement reject options for uncertain data. Those options provide self-controlled evidence, i.e., the model refuses to make a classification decision if the model evidence for the presented data is not sufficient. The model is first trained on a GISAID dataset with given virus types, which were detected according to the molecular differences in coronavirus populations by phylogenetic tree clustering. In a second step, we apply the trained model to another, unlabeled SARS-CoV-2 virus dataset. For these data, we can either assign a virus type to the sequences or reject atypical samples. The rejected sequences allow speculation about new virus types with respect to nucleotide base mutations in the viral sequences. Moreover, this rejection analysis improves model robustness. Finally, the presented approach has lower computational complexity than methods based on (multiple) sequence alignment.
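The pipeline described in the abstract (alignment-free feature extraction, nearest-prototype classification, rejection of uncertain samples) can be illustrated with a minimal sketch. Several parts are assumptions and are not taken from the abstract: the use of k-mer frequency profiles as features, the squared Euclidean dissimilarity, the GLVQ-style relative distance difference as certainty score, the threshold value, and all function names. The prototypes are supplied by hand here; the training of the (matrix) LVQ model itself is not shown.

```python
from itertools import product


def kmer_profile(seq, k=3):
    """Alignment-free feature vector: relative frequencies of all 4**k k-mers.

    Assumption: k-mer counting stands in for the (unspecified) feature
    extraction step. 'U' is mapped to 'T'; windows containing other
    symbols (e.g. 'N') are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper().replace("U", "T")
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def sq_dist(x, w):
    """Squared Euclidean dissimilarity between a sample and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, w))


def classify_with_reject(x, prototypes, threshold=0.1):
    """Nearest-prototype decision with a reject option.

    prototypes: list of (vector, label) pairs.
    Returns the winning label, or None if the sample is rejected.
    Certainty (assumed here) is (d_other - d_best) / (d_other + d_best),
    where d_best is the distance to the closest prototype and d_other the
    distance to the closest prototype carrying a different label.
    """
    ranked = sorted(prototypes, key=lambda p: sq_dist(x, p[0]))
    best_vec, best_label = ranked[0]
    d_best = sq_dist(x, best_vec)
    others = [sq_dist(x, v) for v, lab in ranked if lab != best_label]
    if not others:  # only one class present, nothing to compare against
        return best_label
    d_other = min(others)
    denom = d_best + d_other
    certainty = (d_other - d_best) / denom if denom > 0 else 0.0
    return best_label if certainty >= threshold else None


if __name__ == "__main__":
    # Toy prototypes built from artificial sequences (hypothetical data).
    protos = [
        (kmer_profile("AAAAAAAAAA", 2), "A"),
        (kmer_profile("CCCCCCCCCC", 2), "B"),
    ]
    print(classify_with_reject(kmer_profile("AAAAAAAAAC", 2), protos))
    print(classify_with_reject(kmer_profile("AAAAACCCCC", 2), protos))
```

In this toy setting, a sample close to one prototype receives that prototype's label, while a sample lying between the two classes falls below the certainty threshold and is returned as `None`, which mirrors the idea of refusing a decision when the model evidence is insufficient.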

Funder

Laserinstitut Hochschule Mittweida

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Software

