Introduction

Mesenchymal stem cells (MSCs) are capable of self-renewal and multi-lineage differentiation, making them a promising tool in tissue engineering [1,2,3]. Directing MSCs toward osteogenic differentiation has therefore emerged as an active research field. Understanding the molecular mechanisms that govern MSC commitment to the osteogenic lineage is essential for advancing regenerative medicine and orthopedic treatment. The ability to manipulate and enhance the osteogenic potential of MSCs opens new therapeutic avenues for bone-related disorders, with potential applications in the treatment of bone injuries, bone defects, osteoporosis, and even bone tumors [4,5,6]. Insights gained from MSC osteogenic differentiation also help optimize the integration of biomedical implants, including the generation of personalized bone grafts for transplantation, and provide valuable models for drug development and screening [7, 8]. In summary, MSC osteogenic differentiation plays a pivotal role in addressing challenges in bone health and in fostering innovation in biomedical research and clinical practice [9, 10].

Early and accurate evaluation of the osteogenic differentiation of MSCs is crucial for the biomedical applications above. Conventional biological methods include immunochemical staining, polymerase chain reaction (PCR), and western blotting [11,12,13]. However, these methods are cumbersome, time-consuming, and costly, and they usually require several days to confirm the osteogenic trend of MSCs [12, 14]. Their results can also be affected by factors such as antibody specificity, limited repeatability, and the operational skills of the experimenters [15, 16]. Therefore, an accurate, early-stage, and low-cost method for assessing the osteogenic differentiation of MSCs is urgently needed for next-generation biomedical applications.

In recent years, with continuous progress in hardware and the growing availability of training data, artificial intelligence, represented by deep learning (DL), has developed rapidly. DL is a prominent branch of machine learning that uses artificial neural networks to learn from data and make inferences. It has been widely used in image and speech recognition, natural language processing, and other fields [17, 18]. The convolutional neural network (CNN) is a classic and effective DL model. Owing to its end-to-end training, high performance, and strong robustness, it has become one of the most commonly used models in computer vision, especially in medical image processing [19, 20]. A CNN is typically composed of multiple convolutional and pooling layers. Its flexible, stacked structure provides strong nonlinear fitting ability, allowing it to extract informative features (such as corners, curves, and contours) from input images. During training, a loss function measures the error between the prediction and the ground truth, and the model parameters are updated iteratively (by backpropagation and gradient descent) to minimize this error, ultimately enabling tasks such as object detection and classification.
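To make the description of convolutional and pooling layers concrete, the following minimal pure-Python sketch applies a single hand-set 2 × 2 kernel, a ReLU, and 2 × 2 max pooling to a toy image. The kernel values, image, and function names are illustrative assumptions; in a trained CNN the kernels are learned from data, and frameworks such as PyTorch implement these operations on tensors.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` without padding (stride 1).

    Like PyTorch's Conv2d, this computes a cross-correlation: the kernel is
    not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Matrix) -> Matrix:
    """Element-wise max(0, x), the usual nonlinearity after a convolution."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


# Toy 5x5 image: dark on the left, bright on the right (a vertical edge)
edge_image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-set kernel that responds to left-to-right intensity increases
vertical_edge = [[-1, 1], [-1, 1]]

feature_map = relu(conv2d_valid(edge_image, vertical_edge))
pooled = max_pool2x2(feature_map)
print(pooled)  # → [[2.0, 0.0], [2.0, 0.0]]
```

The kernel responds only where intensity rises from left to right, so the feature map is non-zero along the edge, and pooling keeps that response at half the spatial resolution.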

However, owing to the difficulty of interdisciplinary integration and data collection, relatively few studies have applied DL to cell biology [21]. An exploratory study in Nature Methods used a CNN to perform pixel-level analysis of cell images to classify and cluster cells [22]. Subsequent studies have shown that DL can be successfully applied to cell-biology problems such as fluorescence staining prediction [23], immune cell differentiation [24], and blood cell recognition [25], providing important reference and guidance for future research. In 2018, Kusumoto et al. showed that endothelial cells among iPSC-derived cells (iPSCs: induced pluripotent stem cells) can be identified based on cell morphology alone, indicating that iPSC differentiation is another area that can benefit significantly from progress in DL [26]; subsequent studies have begun to explore this direction further [26,27,28].

Herein, as illustrated in Fig. 1, we constructed and trained various DL models to test the hypothesis that a CNN can accurately predict the osteogenic differentiation of MSCs from bright-field images obtained with a phase contrast microscope. A controlled experiment covering multiple time points (0, 1, 3, 5, and 7 days) was performed. At each time point, the osteogenic differentiation of MSCs was assessed both by conventional biological methods and by DL. By training and testing CNNs on images from different days after the start of induction, we found that CNNs can predict osteogenic differentiation accurately (accuracy exceeding 0.95 at the crop-image level). In addition, we were pleasantly surprised to find that the CNN could identify differentiating cells as early as the first day of induction (accuracy exceeding 0.80), much earlier than conventional biological assays, which first detected osteogenic differentiation on day 3. Further interpretability analysis showed that the CNN paid more attention to global cytological features. Our results suggest that early and accurate detection of MSC osteogenic differentiation with a simple microscope and a well-trained CNN may transform cell detection methods in the near future.

Fig. 1

Overview of workflow. Schematic flow chart of the training, validation, comparison, and prediction of CNN in this study

Materials and Methods

Cell Culture

Human umbilical cord MSCs were purchased from the Cell Bank of the Chinese Academy of Sciences (Shanghai, China). MSCs were cultured in α-minimum essential medium (α-MEM, Gibco, Carlsbad, CA, USA) containing 10% fetal bovine serum (FBS, Gibco, Carlsbad, CA, USA) and 1% penicillin/streptomycin (Sigma) at 37 °C in a humidified atmosphere with 5% CO2. To determine the osteogenic differentiation capacity of MSCs, the cells were cultured in osteogenic medium (OM), which was replaced every three days. OM consisted of the 10% FBS growth medium supplemented with 100 nM dexamethasone, 10 mM β-glycerophosphate, and 50 μg/ml ascorbic acid (all from Beijing Solarbio Science & Technology Co., Ltd., Beijing, China). MSCs were cultured in OM for 0, 1, 3, 5, and 7 days, and MSCs cultured in basic medium (BM) served as the control.

Real-Time Quantitative PCR (RT-qPCR)

At days 0, 1, 3, 5, and 7 of culture, total RNA was extracted with TRIzol® reagent (Invitrogen; Thermo Fisher Scientific, USA) according to the manufacturer’s instructions. For each sample, 2000 ng of total RNA was reverse transcribed into cDNA using a PrimeScript™ RT reagent kit with gDNA Eraser (Takara, Tokyo, Japan), and RT-qPCR reactions were carried out in a 20 μl reaction volume with a TB Green PCR Core Kit (Takara) on a LightCycler® 96 System following the manufacturer’s instructions. The mRNA expression levels of alkaline phosphatase (ALP), osteopontin (OPN), and Runt-related transcription factor 2 (RUNX2) were measured and normalized to that of glyceraldehyde 3-phosphate dehydrogenase (GAPDH). The primer sequences used in this study are listed in Supplementary Table S1.
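The text states that target expression was normalized to GAPDH but does not state the quantification formula. A common choice is the 2^(-ΔΔCt) method; the sketch below assumes it (together with roughly 100% amplification efficiency and averaging of replicate Ct values), so it should not be read as the authors' exact pipeline. The function name and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(sample_target: Sequence[float], sample_ref: Sequence[float],
                     control_target: Sequence[float], control_ref: Sequence[float]) -> float:
    """Relative expression by the 2^(-ΔΔCt) method (an assumed method).

    Assumes roughly 100% amplification efficiency for target and reference
    genes; replicate Ct values are averaged before taking differences.
    """
    delta_sample = mean(sample_target) - mean(sample_ref)      # ΔCt, e.g. OM group
    delta_control = mean(control_target) - mean(control_ref)   # ΔCt, e.g. BM control
    ddct = delta_sample - delta_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values: target gene vs. GAPDH in an OM sample and a BM control
fc = fold_change_ddct([24.0, 24.2, 23.8], [18.0, 18.0, 18.0], [26.0, 26.0], [18.0])
print(round(fc, 3))  # → 4.0
```

Here the target gene crosses threshold two cycles earlier (relative to GAPDH) in the OM sample than in the control, which corresponds to a fourfold higher relative expression.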

MSCs Identification and Characterization

MSCs were identified based on specific surface markers and functional properties. Identification focused primarily on the expression of the surface markers CD29, CD44, and CD105, while confirming the absence of hematopoietic markers such as CD31, CD34, and HLA-DR. Flow cytometry showed that the cells closely met the International Society for Cellular Therapy (ISCT) criteria for defining human MSCs: 95% of the isolated cells expressed CD105, CD73, and CD90, and less than 2% expressed hematopoietic markers (Fig. S1A).

Furthermore, to confirm MSC identity, we performed differentiation assays under standardized in vitro conditions. These assays demonstrated the cells' capacity to differentiate into osteoblasts, adipocytes, and chondroblasts. Osteogenic differentiation was confirmed by mineralization, as shown by Alizarin Red staining; adipogenic differentiation by the formation of lipid droplets, visualized with Oil Red O staining; and chondrogenic differentiation by a proteoglycan-rich extracellular matrix, as shown by Alcian Blue staining (Fig. S1B).

Alkaline Phosphatase (ALP) Activity Assay and Staining

At days 0, 1, 3, 5, and 7 of culture, the ALP activity of MSCs was analysed using an ALP assay kit (Nanjing Jiancheng Bioengineering Institute, Nanjing, China) according to the manufacturer’s instructions. For ALP staining, MSCs were washed with PBS, fixed with 4% paraformaldehyde (PFA), and stained using a 5-bromo-4-chloro-3-indolylphosphate/nitro blue tetrazolium (BCIP/NBT) ALP staining kit (Beyotime Biotechnology, Shanghai, China). The stained cells were observed and photographed under a microscope (Olympus IX71) and quantified using ImageJ software (NIH Image, Bethesda, MD, USA).

Preparation of Datasets

To keep cell density and total culture time consistent at image acquisition, we added the osteogenic induction medium in a staggered, reverse-timed manner. First, the same number of cells (100,000) was seeded into each 6 cm culture dish from the same batch, and the dishes were divided into 6 groups (a control group and induction groups of 0, 1, 3, 5, and 7 days), with 10 dishes per group. As shown in Fig. S2A, the control group was cultured in BM for 7 days, while the other five groups were induced with OM for 7, 5, 3, 1, and 0 days, respectively. To verify and quantify the uniformity of cell density, we performed nuclear staining on the control group and on the day 0, 1, 3, 5, and 7 induction groups. The staining confirmed that there was no significant difference in cell density among the groups (Fig. S1B).

On day 7, bright-field images of all culture dishes were collected in a uniform manner: images were taken consecutively and manually at random positions under a phase contrast microscope (Olympus IX71) using a 10× objective lens with the light intensity set to 40%. The images were saved in .jpg format, with a standard size of 1024 × 1360 pixels and three (RGB) channels. Control and osteogenic differentiation images were acquired in parallel, and about 300 images were taken for each group. Images from 8 dishes were used for the training and validation sets, and images from the other 2 dishes were used as the independent test set.

Using Python scripts, each original image was cropped into square tiles with window sizes of 50, 100, 200, 300, 400, and 500 pixels. Before training, the cropped images were subjected to random data augmentation (horizontal and vertical flips and random rotation) and then resized to 224 × 224 pixels as input to the CNN models.
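The cropping and augmentation steps above can be sketched in pure Python. The text does not state whether tiles overlap, how leftover border pixels are handled, or whether rotations use arbitrary angles, so this sketch assumes non-overlapping tiles with borders discarded and rotations restricted to multiples of 90°. It treats 1024 as the image height (the tile count is the same either way). All function names are hypothetical, and the final 224 × 224 resize (for example with torchvision's `Resize`) is omitted.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def tile_boxes(height: int, width: int, window: int) -> List[Box]:
    """Non-overlapping square windows; leftover border pixels are discarded."""
    return [
        (top, left, top + window, left + window)
        for top in range(0, height - window + 1, window)
        for left in range(0, width - window + 1, window)
    ]


def hflip(img):
    """Mirror left-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror top-bottom."""
    return img[::-1]


def rot90(img):
    """Rotate a square tile 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def augment(tile, rng: random.Random):
    """Random horizontal/vertical flip plus a random multiple-of-90 rotation."""
    if rng.random() < 0.5:
        tile = hflip(tile)
    if rng.random() < 0.5:
        tile = vflip(tile)
    for _ in range(rng.randrange(4)):
        tile = rot90(tile)
    return tile


# Tiles per 1024 x 1360 image under these assumptions
boxes = tile_boxes(1024, 1360, 400)
print(len(boxes))  # → 6 (2 rows x 3 columns)
```

Restricting rotations to 90° steps keeps every pixel of a square tile without interpolation or empty corners; arbitrary-angle rotation would need padding or cropping, which the text does not describe.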

CNN Training

Neural network training was performed on a local server equipped with a CPU (Intel Xeon Platinum 8358, 32 cores) and an NVIDIA A100 Tensor Core GPU (40 GB). Computations were run in a pre-configured DL environment on Ubuntu 16.04. The code was written in Python 3.7.0, and the training framework was PyTorch 1.8.0.

Model Prediction Evaluation Indexes

The predictive performance of the CNN models was quantitatively assessed using the area under the receiver operating characteristic (ROC) curve (AUC), the area under the precision-recall curve (AUPRC), accuracy (ACC), precision, recall, and F1-score. Performance was also visualized with ROC curves and confusion matrices.

ROC curves were plotted using the false positive rate (FPR) and true positive rate (TPR). ACC, precision, recall, specificity, F1-score, FPR, and TPR are defined as:

$${\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}$$
(1)
$${\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}}$$
(2)
$${\text{Recall}} = {\text{Sensitivity}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
(3)
$${\text{Specificity}} = \frac{{{\text{TN}}}}{{{\text{FP}} + {\text{TN}}}}$$
(4)
$$F1 = 2 \times \frac{{{\text{Precision}} \times {\text{Sensitivity}}}}{{{\text{Precision}} + {\text{Sensitivity}}}}$$
(5)
$${\text{FPR}} = \frac{{{\text{FP}}}}{{{\text{FP}} + {\text{TN}}}}$$
(6)
$${\text{TPR}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}}$$
(7)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. The ROC curve plots TPR against FPR as the classification threshold varies, and the AUC is the area under this curve.
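Eqs. (1)–(7) and the AUC can be checked with a small pure-Python sketch. The helper names are hypothetical. Instead of integrating the curve explicitly, the AUC function uses the standard equivalence between ROC AUC and the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half). In practice, scikit-learn's `roc_auc_score` and `average_precision_score` (a common estimate of AUPRC) provide these quantities.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Eqs. (1)-(7) from confusion-matrix counts (no zero-division guards)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall = sensitivity = TPR
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "tpr": recall,
    }


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as 1/2; this equals the area under the empirical ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores for illustration
m = confusion_metrics(tp=8, tn=9, fp=1, fn=2)
print(m["accuracy"])  # → 0.85
# 8 of the 9 positive-negative pairs are ranked correctly, so AUC = 8/9
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

The pairwise form makes it clear why AUC is threshold-free: it depends only on how the model ranks induced versus control samples, not on the 0.5 cut-off used for ACC.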

Statistical Analysis

Data are expressed as means ± standard deviation (s.d.) from at least three replicates. Statistical analysis was performed using SPSS software (version 13.0; SPSS Inc., Chicago, IL, USA) with two-way analysis of variance (ANOVA) followed by Tukey's or Dunnett's post hoc tests. Differences were considered significant at p < 0.05. Details of sample sizes and significance levels are given in the figure legends.

Results

A DL-Based Model was Established Using Bright-Field Images Obtained from Osteogenic Differentiation Experiments of MSCs

To collect data for modeling the early stages of osteogenic differentiation, we cultured MSCs in osteogenic and basal media and took bright-field images on days 0, 1, 3, 5, and 7. To verify the osteogenic differentiation efficiency of the different groups with traditional laboratory methods, RT-qPCR, ALP staining, and ALP activity assays were employed. As shown in Fig. 2A, the osteogenic markers ALP, OPN, and RUNX2 were significantly upregulated on day 3 when MSCs were cultured in osteogenic medium (OM). ALP staining and ALP activity were also significantly increased after 3 days in OM (Fig. 2B–D). This biological evidence indicates that MSCs cultured in OM had undergone osteogenic differentiation by day 3, consistent with previous results [14].

Fig. 2

Differentiation of MSCs cultured in osteogenic medium (OM). A MSCs were cultured in OM for 0, 1, 3, 5, and 7 days, and qPCR confirmed the upregulation of ALP (n ≥ 3), OPN (n ≥ 3), and RUNX2 (n ≥ 3) after 3 days. B–D ALP activity (D, n ≥ 3), ALP staining, and ALP intensity (C, n = 43) increased after three days of culture in OM compared with BM. Scale bar: 200 μm. Data are from at least three independent biological replicate experiments. All error bars indicate mean ± SE. P values were obtained using one-way ANOVA followed by Tukey’s post hoc test

Optimal Window Size and Data Description

DL models require sufficient training to learn useful features, and therefore a large number of training images. The original images acquired with the phase contrast microscope (1024 × 1360 pixels) were too large to be fed into the CNN model directly, so they had to be cropped before input. Cropping also increases the number of training images. However, the optimal crop window size had to be determined. By default, we used the day 7 image data and the ResNet18 model to identify a suitable crop window [19]. As shown in Fig. 3A, the original images were cropped into square windows of 50, 100, 200, 300, 400, and 500 pixels. Table 1 shows the number of original images and cropped images for each window size on day 7.

Fig. 3

Search for the best window size. A Candidate window sizes on the original image. The original image size is 1024 × 1360 pixels, and the candidate window sizes are the square windows of 50, 100, 200, 300, 400, and 500 pixels. Scale bar: 200 μm. B The division of training set, validation set and independent test set. C The classification performances of different window sizes

Table 1 Data details under different window sizes

To reduce the influence of random noise in the data, 5-fold cross-validation was used, as shown in Fig. 3B. The data were randomly divided into 5 folds, of which 4 folds were used for training and the remaining fold for validation. In addition, a set of images taken from a separate culture dish was used as an independent test set, giving a final training:validation:test ratio of 4:1:1. Additional experiments evaluating training set size (Figs. S3–S5) show that reducing the number of training images by less than 80% has little effect on performance on the validation sets. The training images were further augmented by random horizontal and vertical flips and random rotation. The images were then resized to 224 × 224 pixels and input into the ResNet18 model; the classification results are shown in Fig. 3C. Classification was best when the original images were cropped into 400 × 400 pixel windows and weakest for 100 × 100 pixel windows. Overall, however, the results for the different crop windows differed little on the validation set.
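A hedged sketch of the 5-fold split described above, written over abstract item indices (tiles or images). The function name and seed are hypothetical. The independent test dishes are assumed to be set aside before this step, so they never enter any fold. If all tiles cut from one original image should stay in the same fold, the same function can be applied to image indices instead of tile indices; the text does not say which was done.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices once, cut them into k near-equal folds, and use each
    fold in turn as the validation set with the other k-1 folds for training."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    splits = []
    for v in range(k):
        val = sorted(folds[v])
        train = sorted(i for f, fold in enumerate(folds) if f != v for i in fold)
        splits.append((train, val))
    return splits


# Example: 103 hypothetical items -> 5 (train, validation) pairs
for train_idx, val_idx in k_fold_splits(103, k=5, seed=1):
    print(len(train_idx), len(val_idx))
```

Each item appears in exactly one validation fold, so averaging the five validation scores uses every training-pool image once for validation.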

Classification Performance of Different CNNs

After determining the optimal crop window (400 × 400 pixels) for predicting the osteogenic differentiation of MSCs with ResNet18, we further compared the performance of different classification models. Six CNN models were included in this study: VGG16, ResNet18, ResNet34, ResNet50, ResNet101, and DenseNet121 [29, 30]. All six models were trained and validated with the day 7 images, cropped into 400 × 400 pixel tiles. The classification accuracies on the training and validation sets are shown in Fig. 4A. Of the six models, DenseNet121 and ResNet18 showed the best performance, achieving a classification accuracy above 0.95 on the validation set. Their loss curves also converged quickly on the validation set, indicating that both models can be trained sufficiently and stably (Fig. 4B). VGG16, ResNet101, and ResNet50 performed relatively poorly, and their validation loss values were difficult to converge. Therefore, DenseNet121 and ResNet18 are recommended for predicting MSC osteogenic differentiation.
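The loss curves in Fig. 4B are cross-entropy values. As an illustration of what that number means for a two-class problem, the sketch below computes softmax cross-entropy for single samples, which is what PyTorch's `nn.CrossEntropyLoss` computes per sample before averaging over a batch (its default `reduction='mean'`). The function names and example logits are hypothetical.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of one sample: -log softmax(logits)[target].

    Uses the log-sum-exp trick (subtracting the max logit) for numerical
    stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mean_cross_entropy(batch_logits, targets) -> float:
    """Average per-sample loss over a batch."""
    losses = [softmax_cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Uninformative output (equal scores for control and induced): loss = ln 2
print(round(softmax_cross_entropy([0.0, 0.0], 0), 4))  # → 0.6931
# Confident and correct vs. confident and wrong
print(softmax_cross_entropy([4.0, -4.0], 0), softmax_cross_entropy([4.0, -4.0], 1))
```

For a binary task, ln 2 ≈ 0.693 is the loss of a model that cannot tell the classes apart, a useful reference level when reading validation loss curves: values that stay near or above it indicate no learned separation.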

Fig. 4

Classification performance of different models. A Comparison of accuracy curves of different CNN models on training set and validation set. B Comparison of cross-entropy loss values of different CNN models

Investigation for Predictable Times

Early identification of the osteogenic differentiation of MSCs is of great significance for biomedical applications. Therefore, after obtaining a DL model with good predictive performance on the day 7 data, we investigated the earliest time at which osteogenic differentiation could be predicted. Consistent with the time points of biological detection, we used a phase contrast microscope to photograph the cells of the control and induction groups at 0, 1, 3, 5, and 7 days. After the images from each time point were processed in the same way, the DenseNet121 model was trained and validated. Fig. 5 shows the accuracy, cross-entropy loss, and confusion matrix for each induction day. At day 0, the model clearly could not distinguish the control group from the induction group (the validation accuracy was very low, and the cross-entropy loss did not converge at all). This is expected: at this time point the induction-group cells had only just been switched to osteogenic medium before imaging, so all cells were still undifferentiated.

Fig. 5

Investigation for the earliest predictable induction time. Consistent with the time points of biological detection, the cells in the control and induction (0, 1, 3, 5, 7 days) groups were photographed by phase contrast microscopy. All images in the different groups were cropped into 400 × 400 pixels, and the DenseNet121 model was used for training and validation. The results show that the DL model could effectively predict the osteogenic differentiation of MSCs after only 1 day of induction

At day 1, the model could already distinguish osteogenically induced cells from the control group (classification accuracy about 0.8), a notably early prediction time. In contrast, the biological assays could not detect osteogenic differentiation at day 1. With longer induction, the DenseNet121 model gradually achieved higher accuracy with a smaller standard deviation on the validation set, and the cross-entropy loss converged more stably, consistent with the trend of the biological indicators. In summary, DL could predict the fate of stem cells only 1 day after induction, and predictions became more accurate with longer induction times.

Predictive Interpretability

We were interested in how the trained CNN model successfully predicted differentiation from images taken after only 1 day. A neural network transforms its input through a series of so-called ‘hidden’ layers [31], and the intermediate values computed in these layers can be extracted and used to construct intermediate images. By mapping layer activations back onto the pixels of the input image, the regions the CNN actually attends to can be visualized, giving a deeper understanding of how the CNN processes image content and performs classification. We used GradCAM (gradient-weighted class activation mapping) saliency maps of the day 1 classification to explore the basis on which the DL model distinguishes differentiating from undifferentiated cells (Fig. 6). As the network layers deepen (from left to right), the regions of interest of the DL model clearly become more global. At these deeper layers, the significant difference in global characteristics between differentiating and undifferentiated cells was the main basis for the CNN's successful prediction. In other words, accurate classification was based on the overall characteristics of the image (cellular morphology, arrangement direction, etc.) rather than on the features of individual local cells.
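As an illustration of the GradCAM computation referred to above, the sketch below follows the standard Grad-CAM formulation: each feature map of a chosen layer is weighted by the spatial average of the gradient of the class score with respect to that map, the weighted maps are summed, and a ReLU keeps only positive evidence. In practice the activations and gradients would be captured from the network (for example with PyTorch forward and backward hooks) and the heatmap upsampled to the input size; both steps are omitted here, and the function name and toy numbers are hypothetical.

```python
from typing import List

Map = List[List[float]]


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Grad-CAM heatmap for one layer and one target class.

    activations[k] is feature map A^k (H x W); gradients[k] holds the
    gradient of the class score y_c with respect to A^k.
    """
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient = importance of channel k
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for alpha, fmap in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only locations with a positive contribution to the class score
    cam = [[max(0.0, v) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Two toy 2x2 channels: channel 0 supports the class, channel 1 opposes it
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(acts, grads)
print(cam)  # → [[1.0, 0.0], [0.0, 0.0]]
```

Because the weights come from gradients of a specific class score, the same layer can yield different maps for the "induced" and "control" outputs; applying it at progressively deeper layers gives the layer-by-layer comparison described in the text.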

Fig. 6

GradCAM saliency map for different layers of the best model

Prediction Results on the Independent Test Set

Finally, we evaluated the performance of the trained optimal model (DenseNet121) on the independent test set. The test images were collected from culture dishes separate from those used for the training data, ensuring the independence of the test results. The test results are shown in Fig. 7 and Table 2. At day 0, the trained DenseNet121 showed essentially no classification ability on the test set. At day 1, DenseNet121 clearly exhibited preliminary classification ability, with an ACC of 0.804 and an AUC of 0.833. These results were obtained on cropped images of 400 × 400 pixels. When the crop-level predictions were aggregated by majority vote at the level of original images or culture dishes, the predictions were almost entirely correct. With longer induction times, the performance of DenseNet121 on the independent test set improved gradually; at day 7, the ACC and AUC were 0.961 and 0.981, respectively.
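The exact voting rule is not specified in the text. A minimal sketch, assuming each crop's predicted label counts as one vote and the most frequent label wins for its source image (or dish), might look like this; the identifiers and the tie-breaking rule are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def majority_vote(crop_predictions: Iterable[Tuple[Hashable, int]]) -> Dict[Hashable, int]:
    """Aggregate crop-level labels into one label per group (image or dish).

    crop_predictions yields (group_id, predicted_label) pairs. Ties are broken
    toward the smaller label value, an arbitrary choice made for determinism.
    """
    votes: Dict[Hashable, Counter] = {}
    for group, label in crop_predictions:
        votes.setdefault(group, Counter())[label] += 1
    result = {}
    for group, counter in votes.items():
        best = max(counter.values())
        result[group] = min(lbl for lbl, c in counter.items() if c == best)
    return result


# Hypothetical crop predictions (1 = induced, 0 = control)
preds = [("img1", 1), ("img1", 1), ("img1", 0), ("img1", 1), ("img1", 1), ("img1", 1),
         ("img2", 0), ("img2", 0), ("img2", 1), ("img2", 0),
         ("img3", 0), ("img3", 1)]
print(majority_vote(preds))  # → {'img1': 1, 'img2': 0, 'img3': 0}
```

Aggregation helps when crop-level errors are scattered across images rather than concentrated in a few: each image contributes several crops, so a minority of misclassified crops is outvoted.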

Fig. 7

The ROC curves of the best model on the independent test set

Table 2 The classification performance of the best model on the independent test sets

Discussion

In this paper, we demonstrate that a DL model can correctly identify the osteogenic differentiation of MSC samples by learning from bright-field images taken with a simple optical phase contrast microscope, a property that previously could only be detected by conventional biological methods. At the crop-image level of 400 × 400 pixels, the prediction accuracy of the optimal DenseNet121 model on the independent test set exceeded 0.95. In addition, we performed a multi-time-point (0, 1, 3, 5, and 7 days) controlled experiment to determine how early DL can make predictions. We found that the CNN could identify differentiating cells on the first day after osteogenic induction, with a crop-level prediction accuracy exceeding 0.80. This is considerably earlier than the biological methods, which identified osteogenic markers on day 3. CNN-based cell identification also has the advantages of continuous, automatic, real-time, and non-destructive detection. In conclusion, a simple microscope combined with a neural network identification system can detect MSC osteogenic differentiation more efficiently.

The application of DL to predicting MSC osteogenic differentiation emphasizes simplicity, convenience, and standardization. First, ordinary bright-field images were taken with a transmitted-light phase contrast microscope and a 10× objective lens, without complex and expensive imaging protocols or demanding acquisition techniques. Second, no cell pretreatment was required before imaging, so the cells were neither contaminated nor consumed, which substantially improves convenience and safety in biological experiments. Finally, detection takes very little time. Based on our experiments, captured images were cropped into standard 400 × 400 pixel tiles, the best detection window, which could be tested directly with the trained CNNs. With a GPU, testing was fast enough to process images and display results synchronously during image acquisition. This indicates that the DL detection system can support high-throughput, real-time detection of cell differentiation.

Furthermore, six DL models (VGG16, ResNet18, ResNet34, ResNet50, ResNet101, and DenseNet121) were used for differentiation detection. Among them, DenseNet121 and ResNet18 performed best, while ResNet34, ResNet50, ResNet101, and VGG16 performed relatively poorly. Although the performance of the six models differed, all achieved relatively high detection accuracy, indicating that our results are not contingent on a particular model. The differences between the models are, to some extent, interpretable. First, VGG16 lacks skip connections, which prevents it from being made deeper and limits its feature extraction ability [32]. ResNet34, ResNet50, and ResNet101 have more layers and more parameters; their lower performance suggests that an excessive number of parameters, which can easily cause parameter redundancy, is not suited to the MSC osteogenic differentiation prediction task. Although DenseNet121 is deep, its dense connectivity greatly reduces its parameter count and computational cost, bringing its parameter count close to that of ResNet18 while providing stronger feature extraction ability [33]. Therefore, DenseNet121, which combines a lightweight design with efficient feature reuse, is recommended as the optimal model for predicting MSC osteogenic differentiation.

In addition, the results at multiple time points showed that the predictive performance of the CNNs improved gradually with longer induction, consistent with the trend of the traditional biological indicators and in line with expectations. Notably, however, the CNN could already make fairly accurate predictions on day 1 (crop-level accuracy exceeding 0.80), whereas traditional biological methods could only detect differentiation from day 3. This earlier detection time point is very helpful for potential biomedical applications. To understand how the CNN predicted differentiation from day 1 images, we examined GradCAM maps of the model's classifications, which showed that global differences between differentiating and undifferentiated cells were the main basis for the CNN's successful predictions.

Global image features better reflect the collective changes of MSCs, such as changes in overall cell morphology and arrangement direction. This result suggests that when osteogenic differentiation begins, the overall morphology of MSCs may first undergo subtle changes that cannot be observed by the naked eye. Further quantification of cell morphology indicates a shift from an elongated to a more polygonal cell shape (Fig. S6), which is characteristic of cells undergoing osteogenic differentiation. Lan et al. also observed the same pattern of morphological change [34]. Therefore, global changes in cell morphology are most likely the basis of our DL models' predictions. Notably, as shown in Fig. S7, the DL model achieved a classification accuracy close to 1 (0.992) at a lower cell density, clearly exceeding the result under denser conditions (0.804). At lower densities, the changes in cell morphological features were more distinct (Fig. S6). The better results at lower cell density therefore support the interpretability analysis, which attributes classification to global changes in cell morphology.

The potential of DL to predict differentiation is rooted in the ability of DL algorithms to identify subtle yet significant changes in cellular morphology [35]. These early indicators are often imperceptible to conventional analysis but can be detected through the high-dimensional feature extraction capabilities of DL models. Zhu’s study demonstrated that, whereas traditional biological markers require 5–7 days to detect neural differentiation, their DL model, using unlabeled single-cell bright-field images, accurately predicted neural stem cell (NSC) differentiation within just 0.5 to 1 day [28]. Similarly, by training DL models on datasets that capture the early stages of differentiation, we can detect subtle morphological changes indicative of osteogenesis. As with Zhu’s research, this advance could change how researchers approach the detection of MSC differentiation, providing rapid and accurate predictions that enhance both research efficiency and clinical applicability.

Some limitations should also be noted. First, CNNs were applied to a single, limited case of stem cell differentiation. The internal validity of the trained networks was demonstrated through independent tests, but model performance under external validation (other cell types, microscope brands, and/or cell culture environments) remains to be evaluated. Second, the DL field is developing rapidly, and many other CNN architectures, as well as novel architectures beyond CNNs [36] such as Transformers [37] and diffusion models [38], deserve attention. We achieved good results with mature CNN models, but applying the latest DL models may yield better performance. Finally, our DL models could potentially be adapted to other differentiation pathways. The key lies in further exploring and comprehensively exploiting the cell morphological changes associated with different differentiation directions. Because these morphological changes may vary between pathways, adjustments to the model architecture and training dataset may be necessary. These research directions are interesting and valuable, and are worth exploring in future work.

Conclusion

In conclusion, we verified the hypothesis that CNNs can accurately predict the osteogenic differentiation of MSCs from bright-field images at a very early stage of induction. Among the trained CNN models, DenseNet121 is recommended as the optimal model, with excellent prediction accuracy (0.961 on day 7) at the cropped-image level. In addition, the time needed for CNNs to predict osteogenic differentiation can be shortened to 1 day. We are not aware of any other biological assay that can confirm differentiation so quickly, so precisely, and at such low cost. Interpretability analyses showed that the global cell morphological characteristics of MSCs were the main basis for successful classification. Our model effectively classified MSC images across a range of cell densities within 1 day of induction, and classification accuracy was higher at lower cell density owing to more distinct morphological changes. This study therefore provides preliminary evidence of the application value and promising prospects of DL-based techniques for predicting the osteogenic differentiation of MSCs.