Identification of potential riboswitch elements in Homo sapiens mRNA 5’UTR sequences using Positive-Unlabeled Machine learning

Author:

Raymond William S., DeRoo Jacob, Munsky Brian

Abstract

Riboswitches are a class of noncoding RNA structures that interact with target ligands to cause a conformational change that can then execute some regulatory purpose within the cell. Riboswitches are ubiquitous and well characterized in bacteria and prokaryotes, with additional examples also being found in fungi, plants, and yeast. To date, no purely RNA-small molecule riboswitch has been discovered in Homo sapiens. Several analogous riboswitch-like mechanisms have been described within the H. sapiens translatome within the past decade, prompting the question: Is there an H. sapiens riboswitch dependent on only small molecule ligands? In this work, we set out to train positive-unlabeled machine learning classifiers on known riboswitch sequences and apply the classifiers to H. sapiens mRNA 5’UTR sequences found in the 5’UTR database, UTRdb, in the hope of identifying a set of mRNAs to investigate for riboswitch functionality. In total, 67,683 riboswitch sequences were obtained from RNAcentral, sorted by ligand type, and used as positive examples, and 48,031 5’UTR sequences were used as unlabeled examples. Twenty positive-unlabeled classifiers were then trained on sequence and secondary structure features, each while withholding one or two ligand classes. Cross validation was then performed on the withheld ligand sets, yielding validation accuracies ranging from 75% to 99%. The joint sets of 5’UTRs identified as potential riboswitches by the 20 classifiers were then analyzed. In total, 15,333 sequences were identified as riboswitches by one or more classifiers, and 436 of the H. sapiens 5’UTRs were labeled as harboring potential riboswitch elements by all 20 classifiers. These 436 sequences were mapped back to the most similar riboswitches within the positive data and examined.
An online database of identified and ranked 5’UTRs, their features, and their most similar matches to known riboswitches is provided to guide future experimental efforts to identify H. sapiens riboswitches.

Author summary

Riboswitches are an important regulatory element in bacteria that have not been described in Homo sapiens. However, if human riboswitches exist and can be found, they could have far-reaching implications for human disease. We apply positive-unlabeled machine learning to combine known riboswitch sequences with H. sapiens 5’UTR sequences and to search for potential riboswitches. We analyze our ensemble predictions for likely H. sapiens 5’UTR riboswitches using GO analysis to determine their potential functional roles, and we rank and display our predicted sequences next to the most similar known riboswitches. We expect these analyses to be helpful to the scientific community in planning future experiments for laboratory discovery and validation.
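The ensemble step described in the abstract (counting how many classifiers flag each 5’UTR, then taking the set flagged by at least one classifier and the set flagged by all of them) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `consensus_sets`, the boolean-vote data layout, and the toy identifiers are all hypothetical, and the toy example uses three classifiers rather than the 20 in the study.

```python
from typing import Dict, List, Set, Tuple


def consensus_sets(
    predictions: Dict[str, List[bool]], utr_ids: List[str]
) -> Tuple[Dict[str, int], Set[str], Set[str]]:
    """Tally per-UTR votes across an ensemble of classifiers.

    predictions maps a (hypothetical) classifier name to one boolean call
    per entry of utr_ids: True means "flagged as a potential riboswitch".
    Returns the vote counts, the IDs flagged by at least one classifier,
    and the IDs flagged by every classifier.
    """
    votes = {utr: 0 for utr in utr_ids}
    for name, calls in predictions.items():
        if len(calls) != len(utr_ids):
            raise ValueError(f"{name}: expected {len(utr_ids)} calls, got {len(calls)}")
        for utr, call in zip(utr_ids, calls):
            if call:
                votes[utr] += 1

    n_classifiers = len(predictions)
    flagged_by_any = {utr for utr, v in votes.items() if v >= 1}
    flagged_by_all = {utr for utr, v in votes.items() if v == n_classifiers}
    return votes, flagged_by_any, flagged_by_all


# Toy data: three made-up classifiers voting on three made-up 5'UTR IDs.
utr_ids = ["utr_a", "utr_b", "utr_c"]
predictions = {
    "clf_1": [True, True, False],
    "clf_2": [True, False, False],
    "clf_3": [True, True, False],
}
votes, flagged_by_any, flagged_by_all = consensus_sets(predictions, utr_ids)

# Rank candidates by how many classifiers agree, highest first.
ranked = sorted(utr_ids, key=lambda utr: votes[utr], reverse=True)
```

In this layout, the 15,333 and 436 figures reported above would correspond to the sizes of `flagged_by_any` and `flagged_by_all`, respectively, when the input holds the calls of all 20 classifiers.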

Publisher

Cold Spring Harbor Laboratory
