Similarity-based multimodal regression


Chen Andrew A (1), Weinstein Sarah M (2), Adebimpe Azeez (3, 4), Gur Ruben C (4, 5), Gur Raquel E (4, 5), Merikangas Kathleen R (6), Satterthwaite Theodore D (3, 4), Shinohara Russell T (7, 8), Shou Haochang (7, 8)


1. Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA

2. Department of Epidemiology and Biostatistics, Temple University College of Public Health, Philadelphia, PA 19122, USA

3. Penn Lifespan Informatics & Neuroimaging Center, Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA

4. Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA

5. Lifespan Brain Institute Penn Medicine and CHOP, University of Pennsylvania, Philadelphia, PA 19104, USA

6. Genetic Epidemiology Research Branch, Intramural Research Program, National Institute of Mental Health, Bethesda, MD 20892, USA

7. Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA

8. Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, USA


Summary

To better understand complex human phenotypes, large-scale studies have increasingly collected multiple data modalities across domains such as imaging, mobile health, and physical activity. The properties of each data type often differ substantially and require either separate analyses or extensive processing to obtain comparable features for a combined analysis. Multimodal data fusion enables certain analyses on matrix-valued and vector-valued data, but it generally cannot integrate modalities of different dimensions and data structures. For a single data modality, multivariate distance matrix regression provides a distance-based framework for regression accommodating a wide range of data types. However, no distance-based method exists to handle multiple complementary types of data. We propose a novel distance-based regression model, which we refer to as Similarity-based Multimodal Regression (SiMMR), that enables simultaneous regression of multiple modalities through their distance profiles. We demonstrate through simulation, imaging studies, and longitudinal mobile health analyses that our proposed method can detect associations between clinical variables and multimodal data of differing properties and dimensionalities, even with modest sample sizes. We perform experiments to evaluate several different test statistics and provide recommendations for applying our method across a broad range of scenarios.
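For readers unfamiliar with the single-modality framework mentioned above, the following is a minimal sketch of multivariate distance matrix regression with a permutation-based pseudo-F test. It is not the SiMMR method itself, whose details are not given in this summary. The function names (`gower_center`, `pseudo_f`, `mdmr_permutation_test`), the number of permutations, and the seed are illustrative assumptions, and the design matrix `X` is assumed to contain an intercept column.

```python
import numpy as np


def gower_center(D):
    """Return the Gower-centered matrix G = -1/2 * C @ D**2 @ C for a distance matrix D."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return -0.5 * C @ (D ** 2) @ C


def pseudo_f(G, X):
    """Pseudo-F statistic for a design matrix X (assumed to include an intercept column)."""
    n, p = X.shape
    H = X @ np.linalg.pinv(X)            # hat (projection) matrix
    R = np.eye(n) - H                    # residual projection
    explained = np.trace(H @ G)          # equals tr(HGH) because H is idempotent
    residual = np.trace(R @ G)
    return (explained / (p - 1)) / (residual / (n - p))


def mdmr_permutation_test(D, X, n_perm=999, seed=0):
    """Permutation p-value for the pseudo-F statistic, permuting the rows of X."""
    rng = np.random.default_rng(seed)
    G = gower_center(D)
    f_obs = pseudo_f(G, X)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(D.shape[0])
        if pseudo_f(G, X[perm]) >= f_obs:
            exceed += 1
    return f_obs, (exceed + 1) / (n_perm + 1)
```

Because the test only needs a subject-by-subject distance matrix `D`, any data type with a sensible distance can be used as the outcome, which is the property the summary highlights for a single modality.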


National Institute of Neurological Disorders and Stroke

National Multiple Sclerosis Society

National Institute of Mental Health

University of Pennsylvania Center for Biomedical Image Computing and Analytics


Oxford University Press (OUP)


Statistics, Probability and Uncertainty; General Medicine; Statistics and Probability






