ScaleFEx℠: a lightweight and scalable method to extract fixed features from single cells in high-content imaging screens

Authors:

Migliori, Bianca; Bose, Neeloy; Paull, Daniel

Abstract

High-content imaging (HCI) is a popular technique that leverages high-throughput datasets to uncover phenotypes of cell populations in vitro. When the differences between populations (such as a healthy and a disease state) are completely unknown, it is crucial to build very large HCI screens that account for individual (donor) variation and include enough replicates to create a reliable model. One approach to highlighting phenotypic differences is to reduce images to a set of features using unbiased methods, such as embeddings or autoencoders. These methods preserve the predictive power contained in each image while removing most unimportant image features and noise (e.g., background). However, they do not provide interpretable information about the features driving the decision process of the AI algorithm used. Tools such as CellProfiler have been developed to address this issue, but scaling them to large sample batches containing hundreds of thousands of images poses computational challenges. Additionally, the resulting feature vector, which is computationally expensive to generate, is very large (over 3,000 features) and contains many redundant features, making further analysis and the identification of truly relevant features challenging. Ultimately, the presence of too many non-meaningful features increases the risk of overfitting, which can skew downstream predictions.

To address this issue, we have developed ScaleFEx℠, a Python pipeline that extracts multiple generic fixed features at the single-cell level and can be deployed across large high-content imaging datasets with low computational requirements. This pipeline efficiently and reliably computes features related to shape, size, intensity, texture and granularity, as well as correlations between channels.
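The abstract does not detail how individual features are computed. As a rough illustration only, the sketch below computes a few fixed per-cell features (area, a boundary-pixel perimeter, per-channel intensity mean and standard deviation, and the Pearson correlation between each pair of channels) from a channel-first image and an integer label mask. It assumes segmentation has already been performed; the function name `single_cell_features`, the feature names and the boundary-pixel perimeter are hypothetical choices, not the ScaleFEx℠ implementation, and texture and granularity features are omitted.

```python
import numpy as np


def single_cell_features(image: np.ndarray, labels: np.ndarray) -> list[dict]:
    """Hypothetical sketch of fixed per-cell features (not the ScaleFEx code).

    image:  float array of shape (C, H, W), one plane per channel.
    labels: int array of shape (H, W); 0 is background, k > 0 marks cell k.
    """
    n_channels = image.shape[0]
    rows = []
    for cell_id in np.unique(labels):
        if cell_id == 0:
            continue  # skip background
        mask = labels == cell_id
        row = {"cell_id": int(cell_id), "area_px": int(mask.sum())}

        # Size/shape: count cell pixels that have at least one 4-neighbour
        # outside the cell (a crude perimeter in pixels).
        padded = np.pad(mask, 1, constant_values=False)
        all_neighbours_in = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                             & padded[1:-1, :-2] & padded[1:-1, 2:])
        row["perimeter_px"] = int((mask & ~all_neighbours_in).sum())

        # Intensity: statistics of each channel inside the cell mask.
        pixels = image[:, mask]  # shape (C, n_pixels)
        for c in range(n_channels):
            row[f"ch{c}_mean"] = float(pixels[c].mean())
            row[f"ch{c}_std"] = float(pixels[c].std())

        # Channel correlation: Pearson r for each channel pair (NaN if a channel is flat).
        for a in range(n_channels):
            for b in range(a + 1, n_channels):
                if pixels[a].std() == 0 or pixels[b].std() == 0:
                    r = float("nan")
                else:
                    r = float(np.corrcoef(pixels[a], pixels[b])[0, 1])
                row[f"corr_ch{a}_ch{b}"] = r
        rows.append(row)
    return rows
```

Each cell yields one flat dictionary, so rows from many images can be concatenated into a single table (for example a pandas DataFrame) for downstream analysis.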
ScaleFEx℠ also allows the measurement of additional features specific to the mitochondria and RNA channels, as these channels have characteristics worth measuring on their own. The measured features can be used not only to separate cell populations using AI tools, but also to highlight the specific interpretable features that differ between populations. We applied ScaleFEx℠ to identify the phenotypic shifts that multiple cell lines undergo when exposed to different compounds. We used a combination of recursive feature elimination, logistic regression, correlation analysis and dimensionality reduction representations to narrow down the features that best described the drug-induced shifts. We then used the best-scoring features to extract, for each class, images of the cells closest to the class average, visually highlighting the phenotypic shifts caused by the drugs. Using this approach, we identified features linked to the drug shifts that are in line with the literature, and we could visually validate their involvement in the morphological changes of the cells.

ScaleFEx℠ can be used as a powerful tool to understand the underlying phenotypes of complex diseases and subtle drug shifts at the single-cell level, bringing us a step closer to identifying disease-modifying compounds for the major diseases of our time.
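The feature-selection and cell-picking steps described in the abstract could be sketched with scikit-learn as follows. Several details are assumptions, since the abstract does not specify them: features are standardized first; a greedy filter drops any feature whose absolute Pearson correlation with an already kept feature reaches 0.9; recursive feature elimination (RFE) then ranks the remaining features by the coefficient magnitudes of a logistic-regression classifier; and the cell "closest to the average" of each class is taken to be the one with the smallest Euclidean distance to the class mean in the selected-feature space. The helper names `drop_correlated`, `select_features` and `closest_to_class_mean` and the 0.9 threshold are hypothetical, and the dimensionality-reduction step is omitted.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def drop_correlated(X: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Greedy filter (hypothetical): keep a feature only if its |Pearson r| with
    every already kept feature is below threshold; constant features are skipped."""
    variable = np.flatnonzero(X.std(axis=0) > 0)
    corr = np.abs(np.corrcoef(X[:, variable], rowvar=False))
    kept_pos: list[int] = []
    for p in range(len(variable)):
        if all(corr[p, q] < threshold for q in kept_pos):
            kept_pos.append(p)
    return [int(variable[p]) for p in kept_pos]


def select_features(X, y, n_features=5, threshold=0.9) -> list[int]:
    """Correlation filter, then RFE wrapped around logistic regression.
    Returns column indices of X for the selected features."""
    X_scaled = StandardScaler().fit_transform(X)
    kept = drop_correlated(X_scaled, threshold)
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_features)
    rfe.fit(X_scaled[:, kept], y)
    return [kept[i] for i in np.flatnonzero(rfe.support_)]


def closest_to_class_mean(X, y, features) -> dict[int, int]:
    """For each class, the row index of the cell whose selected (standardized)
    features are closest, by Euclidean distance, to that class's mean profile."""
    X_sel = StandardScaler().fit_transform(X)[:, features]
    picks = {}
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        centroid = X_sel[idx].mean(axis=0)
        dists = np.linalg.norm(X_sel[idx] - centroid, axis=1)
        picks[int(cls)] = int(idx[np.argmin(dists)])
    return picks
```

The returned row indices can be mapped back to the image and cell identifiers recorded during feature extraction to retrieve representative cell crops for visual inspection.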

Publisher

Cold Spring Harbor Laboratory
