Mining NCBI Sequence Read Archive Database: An Untapped Source of Organelle Genomes for Taxonomic and Comparative Genomics Research
Published: 2024-02-06
Issue: 2
Volume: 16
Page: 104
ISSN: 1424-2818
Container-title: Diversity
Language: en
Short-container-title: Diversity
Author:
Eldem Vahap1, Balcı Mehmet Ali1
Affiliation:
1. Department of Biology, Faculty of Science, Istanbul University, Istanbul 34134, Turkey
Abstract
The NCBI SRA database is constantly expanding because next-generation sequencing (NGS) generates large amounts of genomic and transcriptomic data from various organisms, and researchers worldwide regularly deposit new data into the database. This high-coverage genomic and transcriptomic information can be re-evaluated regardless of the original research subject. The deposited NGS data can offer valuable insights into the genomes of organelles, particularly for non-model organisms. Here, we developed an automated bioinformatics workflow called “OrgaMiner”, designed to recover high-quality mitochondrial and chloroplast genomes by mining the NCBI SRA database. OrgaMiner, a Python-based pipeline, automatically orchestrates various tools to extract, assemble, and annotate organelle genomes for non-model organisms that have data in the NCBI SRA but no available organelle genome sequence. To test the usability and feasibility of the pipeline, “mollusca” was selected as a keyword, and 76 new mitochondrial genomes were de novo assembled and annotated automatically without writing a single line of code. The pipeline can be extended to identify organelle genomes in diverse invertebrate, vertebrate, and plant species by simply specifying the taxonomic name. OrgaMiner provides an easy-to-use, end-to-end solution for biologists working mainly in taxonomy and population genetics.
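The keyword-driven search described in the abstract can be illustrated with a minimal sketch. It is not OrgaMiner's actual code: the abstract does not say how the pipeline queries the SRA, so this example assumes the public NCBI E-utilities `esearch` endpoint and the `[Organism]` search field. The helper names `build_sra_search_url` and `extract_ids` are hypothetical, and the sample response is made-up data used only to show the parsing step.

```python
import json
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (assumed here; the abstract does
# not state which interface OrgaMiner itself uses).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_search_url(taxon, retmax=20):
    """Build an esearch URL listing SRA records for a taxonomic name.

    Hypothetical helper: restricts the query to the Organism field.
    """
    params = {
        "db": "sra",                    # search the Sequence Read Archive
        "term": f"{taxon}[Organism]",   # taxonomic keyword, e.g. "mollusca"
        "retmax": retmax,               # maximum number of IDs to return
        "retmode": "json",              # request a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def extract_ids(esearch_json_text):
    """Return the list of record IDs from an esearch JSON response."""
    payload = json.loads(esearch_json_text)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    print(build_sra_search_url("mollusca"))
    # Illustrative response body only; the IDs are placeholders, not real hits.
    sample = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
    print(extract_ids(sample))
```

Fetching the URL and passing the returned record IDs on to download, assembly, and annotation tools would be the later stages of such a workflow; they are left out here because the abstract does not name the tools involved.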
Funder
the Scientific Research Projects Coordination Unit of Istanbul University; the National Center for High Performance Computing of Turkey