Species-agnostic transfer learning for cross-species transcriptomics data integration without gene orthology

Author:

Park Youngjun (1, 2), Muttray Nils P (3), Hauschild Anne-Christin (1, 4)

Affiliation:

1. Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany

2. International Max Planck Research School for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany

3. Applied Statistics, Georg-August-Universität Göttingen, Göttingen, Germany

4. Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany

Abstract

Model organisms such as mice and zebrafish play a crucial role in biomedical research, because novel hypotheses are often developed or validated in them. However, due to biological differences between species, translating these findings into human applications remains challenging. Moreover, commonly used orthologous gene information is often incomplete, and gene ID conversion entails a significant loss of information. To address these issues, we present a novel methodology for species-agnostic transfer learning with heterogeneous domain adaptation. We extended the cross-domain structure-preserving projection toward out-of-sample prediction. Our approach not only allows knowledge integration and translation across various species without relying on gene orthology, but also identifies similar Gene Ontology (GO) annotations among the most influential genes composing the latent space for integration. Because the aligned latent spaces are each composed of species-specific genes, the alignment also makes it possible to identify functional annotations of genes missing from public orthology databases. We evaluated our approach on four different single-cell sequencing datasets, focusing on cell-type prediction, and compared it against related machine-learning approaches. In summary, the developed model outperforms related methods that work without prior knowledge when predicting unseen cell types based on other species' data. The results demonstrate that our approach allows knowledge transfer beyond species barriers without depending on known gene orthology, while using the entire gene sets.
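The general idea of heterogeneous domain adaptation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method and not the cross-domain structure-preserving projection itself: as a simplifying assumption, each species' expression matrix is ridge-regressed onto one-hot cell-type codes, which yields one projection per species into a shared space without any gene-to-gene mapping. The data are synthetic, and all names (`fit_projection`, `simulate`, the gene counts 50 and 30) are hypothetical.

```python
import numpy as np

def fit_projection(X, y, n_classes, lam=1.0):
    """Ridge-regress expression onto one-hot class codes.

    Returns a (n_genes x n_classes) projection into the shared space.
    """
    Y = np.eye(n_classes)[y]
    n_genes = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_genes), X.T @ Y)

def simulate(rng, mixing, labels, noise=0.1):
    """Synthetic expression: class-specific latent signal mixed into genes."""
    latent_dim, n_genes = mixing.shape
    latent = 4.0 * np.eye(latent_dim)[labels]
    latent = latent + 0.3 * rng.standard_normal((labels.size, latent_dim))
    return latent @ mixing + noise * rng.standard_normal((labels.size, n_genes))

rng = np.random.default_rng(0)
n_classes, latent_dim = 3, 5

# Two "species" with different, unmatched gene sets (50 vs. 30 genes).
A_src = rng.standard_normal((latent_dim, 50))
A_tgt = rng.standard_normal((latent_dim, 30))

y_src = np.repeat(np.arange(n_classes), 40)        # well-annotated source species
y_tgt_train = np.repeat(np.arange(n_classes), 5)   # few labeled target cells
y_tgt_test = np.repeat(np.arange(n_classes), 20)   # held-out target cells

X_src = simulate(rng, A_src, y_src)
X_tgt_train = simulate(rng, A_tgt, y_tgt_train)
X_tgt_test = simulate(rng, A_tgt, y_tgt_test)

# One projection per species; both land in the same n_classes-dim space.
P_src = fit_projection(X_src, y_src, n_classes)
P_tgt = fit_projection(X_tgt_train, y_tgt_train, n_classes)

# Classify held-out target cells by the nearest source-species centroid.
Z_src = X_src @ P_src
centroids = np.stack([Z_src[y_src == c].mean(axis=0) for c in range(n_classes)])
Z_test = X_tgt_test @ P_tgt
dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
y_pred = dists.argmin(axis=1)

accuracy = float((y_pred == y_tgt_test).mean())
print(f"held-out target accuracy: {accuracy:.2f}")
```

The rows of `P_src` and `P_tgt` with the largest magnitudes indicate which genes contribute most to each shared dimension. In this simplified setting, that is loosely analogous to inspecting the most influential genes of an aligned latent space.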

Funder

German Ministry of Education and Research

International Max Planck Research School for Genome Science

Göttingen Graduate Center for Neurosciences, Biophysics, and Molecular Biosciences

Publisher

Oxford University Press (OUP)

