Affiliation:
1. Department of Pharmacological Sciences, Mount Sinai Center for Bioinformatics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, United States
2. Department of Biochemistry, University of Wisconsin, Madison, WI 53706, United States
Abstract
Motivation
The production of omics datasets by the diabetes research community is growing rapidly. However, these published data are underutilized for knowledge discovery. To make bioinformatics tools and published omics datasets from the diabetes field more accessible to biomedical researchers, we developed the Diabetes Data and Hypothesis Hub (D2H2).
Results
D2H2 contains hundreds of high-quality curated transcriptomics datasets relevant to diabetes, accessible via a user-friendly web-based portal. The datasets were collected from the Gene Expression Omnibus (GEO), then processed and curated. Each curated study has a dedicated page that provides data visualization, differential gene expression analysis, and single-gene queries. To enable the investigation of these curated datasets and to provide easy access to bioinformatics tools that serve gene and gene set-related knowledge, we developed the D2H2 chatbot. The chatbot, which utilizes GPT, prompts users to describe their data analysis needs in free text. The user prompt is parsed together with information specifying all available D2H2 tools and workflows, and the query is answered by invoking the most relevant tools through the tools’ APIs. D2H2 also has a hypothesis generation module in which gene sets are randomly selected from the precomputed bulk RNA-seq signatures. For each selected gene set, we then identify gene sets extracted from publications listed in PubMed Central that overlap highly with it while having dissimilar abstracts. With the help of GPT, we speculate about a possible explanation for the high overlap between the gene sets. Overall, D2H2 is a platform that provides a suite of bioinformatics tools and curated transcriptomics datasets for hypothesis generation.
Availability and implementation
D2H2 is available at: https://d2h2.maayanlab.cloud/ and the source code is available from GitHub at https://github.com/MaayanLab/D2H2-site under the CC BY-NC 4.0 license.
Funder
National Institutes of Health
Publisher
Oxford University Press (OUP)
Subject
Computer Science Applications, Genetics, Molecular Biology, Structural Biology