Abstract
Background
Retrotransposons are one of the oldest evolutionary forces shaping mammalian genomes, with the ability to mobilize from one genomic location to another. This mobilization is also a significant factor in human disease. The only autonomous human retroelement, L1, has propagated to make up 17% of the human genome, accumulating over 500,000 copies. The majority of these loci are truncated or defective, with only a few reported to remain capable of retrotransposition. We have previously published a strand-specific RNA-Seq bioinformatics approach that uses cytoplasmic RNA to stringently identify, at the locus-specific level, the few expressed full-length L1s. With growing repositories of RNA-Seq data, there is potential to mine these datasets to identify and study expressed L1s at single-locus resolution, although many datasets are not strand-specific or were not generated from cytoplasmic RNA.
Results
We generated whole-cell, cytoplasmic and nuclear RNA-Seq datasets from 22Rv1 prostate cancer cells to test the influence of different preparations on the quality and effort needed to measure L1 expression. We found that there was minimal data loss in the identification of full-length expressed L1s using whole-cell, strand-specific RNA-Seq data compared to cytoplasmic, strand-specific RNA-Seq data. However, this was only possible with an increased amount of manual curation of the bioinformatics output to eliminate increased background. About half of the data was lost when the sequenced datasets were non-strand-specific.
Conclusions
The results of these studies demonstrate that, with rigorous manual curation, the use of stranded RNA-Seq datasets allows identification of expressed L1 loci from either cytoplasmic or whole-cell RNA-Seq datasets.
Funder
National Institute of General Medical Sciences
National Institute on Aging
National Center for Advancing Translational Sciences
Publisher
Springer Science and Business Media LLC
Cited by
12 articles.