
Using visual attention estimation on videos for automated prediction of autism spectrum disorder and symptom severity in preschool children

  • Ryan Anthony J. de Belen ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    r.debelen@unsw.edu.au

    Affiliation School of Computer Science and Engineering, University of New South Wales, New South Wales, Australia

  • Valsamma Eapen,

    Roles Data curation, Funding acquisition, Writing – review & editing

    Affiliation School of Psychiatry, University of New South Wales, New South Wales, Australia

  • Tomasz Bednarz,

    Roles Supervision, Writing – review & editing

    Affiliation School of Art & Design, University of New South Wales, New South Wales, Australia

  • Arcot Sowmya

    Roles Supervision, Writing – review & editing

    Affiliation School of Computer Science and Engineering, University of New South Wales, New South Wales, Australia

Abstract

Atypical visual attention in individuals with autism spectrum disorders (ASD) has been utilised as a unique diagnostic criterion in previous research. This paper presents a novel approach to automatic and quantitative screening of ASD, as well as symptom severity prediction, in preschool children. We develop a novel computational pipeline that extracts learned features from a dynamic visual stimulus to classify ASD children and predict the level of ASD-related symptoms. Experimental results demonstrate promising performance that is superior to that of handcrafted features combined with machine learning algorithms, in terms of evaluation metrics used in diagnostic tests. Using a leave-one-out cross-validation approach, we obtained an accuracy of 94.59%, a sensitivity of 100%, a specificity of 76.47% and an area under the receiver operating characteristic curve (AUC) of 96% for ASD classification. In addition, we obtained an accuracy of 94.74%, a sensitivity of 87.50%, a specificity of 100% and an AUC of 99% for ASD symptom severity prediction.

Introduction

Autism spectrum disorders (ASD) are currently diagnosed through visual observation and analysis of children’s natural behaviours. While a gold-standard observational tool is available, early screening of ASD in children remains a complex problem. It is often expensive and time-consuming [1] to conduct interpretative coding of child observations, parent interviews and manual testing [2]. In addition, differences in professional training, resources and cultural context may affect the reliability and validity of the results obtained from a clinician’s observations [3]. Furthermore, the behaviours of children in their natural environments (e.g., home) typically cannot be captured by clinical observation ratings. To reduce waiting periods for access to interventions, it is important to develop new methods of ASD diagnosis without compromising accuracy and clinical relevance. This is critical because early diagnosis and intervention can provide long-term improvements for the child and have a greater effect on clinical outcomes [4].

Recent advances in technology have allowed for the quantification of different biological and behavioural markers that are useful in ASD research (see [5,6] for reviews). Eye-tracking technology has shown promise in providing a non-invasive and objective tool for ASD research [7,8]. Several eye-tracking studies have identified unique visual attention patterns in ASD individuals. Gaze abnormalities in toddlers (<3 years old) include reduced attention to eye and head regions, reduced preference for biological motion, difficulties in response to joint attention behaviours [9] and scene monitoring challenges during explicit dyadic cues [10]. Pierce, et al. [11], Pierce, et al. [12] and Moore, et al. [13] developed a geometric preference (“GeoPref”) test that contains both geometric and social videos. It was found that a subset of ASD participants exhibited a visual preference for geometric motion. This finding has since been exploited by a growing number of studies that leverage atypical visual attention to identify individuals with ASD [14,15] and predict symptom severity [16].

Computational models that predict visual attention (i.e., saliency) have seen tremendous progress, from handcrafted features dating back to 1998 [17] to the resurgence of deep neural networks (DNNs) [18,19]. This breakthrough has generated great interest in utilising saliency prediction as a diagnostic paradigm for ASD. For example, there is a growing collection of eye movements of ASD children recorded during image-viewing [20–22] and video-viewing [22] tasks. Although the use of saliency detection models on image datasets has resulted in remarkable diagnostic performance, there is still a lack of diagnostic paradigms that utilise dynamic saliency detection. In fact, the most common approach of studies that utilise dynamic stimuli is to convert the eye-tracking data into an image and perform image classification to identify individuals with ASD. In this work, we present a novel pipeline that leverages the dynamic visual attention of humans for ASD diagnosis, as well as symptom severity prediction.

This paper makes three major contributions to the field. First, we implement a data-driven approach to learn the dynamic visual attention of humans on videos and extract spatiotemporal features for downstream tasks (e.g., ASD classification and symptom severity prediction). Second, we develop a novel computational pipeline to diagnose ASD based on the learned features from dynamic visual stimuli. Finally, we use a similar method to predict the level of ASD-related symptoms from eye-tracking data of children obtained during a free-viewing task. In the next section, we discuss published works that are related to ours. Despite the growing literature, comparison of results is challenging due to the lack of publicly available datasets and open-source code repositories. This is further complicated by differences in the participants, age groups and stimuli used in the experiments, making fair and straightforward performance comparisons more difficult. Nevertheless, we compare our work with a simple thresholding technique [11–13] and a machine learning (ML) classification approach using handcrafted features [23,24].

Related works

Over the last decade, different behavioural and biological markers have already been quantified, to some extent, using computer vision methods (a comprehensive review [5] is available). Various data modalities, such as magnetic resonance imaging (MRI)/functional MRI [25–30], eye-gaze data [14,31–36], stereotyped behaviours [37–42] and multimodal data [43] have been utilised in autism diagnosis. We first provide a review of publicly available datasets that utilise the eye-tracking paradigm. Afterwards, related works that utilise eye-tracking data for the following purposes are reviewed: (i) saliency prediction in ASD, (ii) ASD diagnosis using static stimuli, (iii) ASD diagnosis using dynamic stimuli and (iv) ASD risk and symptom severity prediction. Each purpose has a corresponding table that includes the following information about the published research: mean age of the participants, gender distribution, stimuli and input used, methodology and conclusion. While not as exhaustive and rigorous in inclusion criteria as a systematic review, we hope that our discussion below will help the readers navigate the research landscape and better situate our work in the literature. Readers are also encouraged to read systematic reviews [8,44] for additional reference.

Publicly available datasets

There is a growing number of publicly available datasets that capture the eye-tracking data of ASD participants. In Table 1, we provide a summary of these datasets by providing descriptions of their target application area, the mean age of the participants, sample size, stimuli used and data format provided by the authors. There are two datasets for saliency estimation [20,21] and two datasets for ASD classification [22,45].

Table 1. List of publicly available datasets and their corresponding application area, mean age, sample size, stimuli and data format provided by the authors.

https://doi.org/10.1371/journal.pone.0282818.t001

Saliency prediction in ASD

Accurately predicting the visual attention (i.e., saliency maps) of ASD individuals can boost prediction performance because classification models can better leverage the distinction between the visual attention of ASD and typically developing (TD) individuals. Table 2 shows the published research that aims to model the visual attention of ASD participants by developing different saliency models.

Duan, et al. [46] compared the performance of five state-of-the-art (SOTA) saliency prediction networks based on a deep neural network (DNN) architecture with pre-trained and fine-tuned weights on their dataset. Experimental results revealed that transfer learning provides a useful approach to modelling visual attention on images for individuals with ASD. Duan, et al. [47] combined high-level features (e.g., face size, facial features, face pose and facial expressions) and feature maps extracted from the SOTA saliency models to quantify visual attention on human faces in ASD. Their proposed approach reported higher performance when compared to other saliency models.

The remaining works used the Saliency4ASD dataset [20,21] for saliency estimation. For example, Fang, et al. [48] used U-net trained on a novel loss function for semantic feature learning, resulting in improved performance on some metrics. Wei, et al. [49] proposed a novel saliency prediction model for children with ASD. The fusion of multi-level features, deep supervision on attention maps and the single-side clipping operated on ground truths provided a boost in saliency prediction. Nebout, et al. [50] proposed a Convolutional Neural Network (CNN) with a coarse-to-fine architecture and trained using a novel loss function, achieving the best performance on most metrics when compared to general saliency models. Fang, et al. [51] proposed a model consisting of a spatial feature module and a pseudo-sequential feature module to generate an ASD-specific saliency map. Their model achieved the best performance on most metrics when compared to general saliency models and ASD-specific saliency models [48–50]. Finally, Wei, et al. [52] proposed a DNN architecture that enhances multi-level side-out feature maps using a scale-adaptive coarse-and-fine inception module. In addition, they designed a novel loss function to fit the atypical pattern of visual attention, resulting in SOTA performance.

This growing evidence suggests that researchers are starting to develop computational models that mimic the atypical visual attention on images of ASD individuals. However, there is still a huge gap in prediction performance as saliency prediction models trained on TD individuals do not generalise well on ASD individuals, as highlighted by Le Meur, et al. [22]. They revealed that current models trained on a TD dataset and fine-tuned on an ASD dataset perform well only on a small part of the ASD spectrum. To this end, they proposed two new eye-tracking datasets that cover a large part of the ASD spectrum.

Eye-tracking on static stimuli for ASD diagnosis

As discussed in the previous section, it has been found that ASD participants exhibit atypical visual attention. As shown in Table 3, researchers explored the possibility of using the eye-tracking paradigm during image-viewing tasks to identify individuals with ASD. The earliest works explored different handcrafted features and ML models for ASD diagnosis. For example, Wang, et al. [54] used features extracted from images followed by a Support Vector Machine (SVM), while Yaneva, et al. [55] explored logistic-regression classification algorithms for detecting high-functioning ASD in adults. Liu, et al. [34] proposed an ML framework based on the frequency distribution of eye movements recorded during a face recognition task to identify individuals with ASD. The recent advances in deep learning (DL) also helped researchers better extract discriminative features from images. For example, Jiang and Zhao [33] used a DL approach followed by an SVM to distinguish individuals with ASD.

Table 3. Eye tracking on static stimuli for ASD diagnosis.

https://doi.org/10.1371/journal.pone.0282818.t003

The succeeding works used the Saliency4ASD dataset [20,21]. Startsev and Dorr [56] and Arru, et al. [57] extracted features from the eye-tracking data and the input image and trained a random forest for ASD classification. Their analysis revealed that images that contain multiple faces provide significant differences in visual attention between ASD and TD individuals. Wu, et al. [58] proposed two machine learning approaches based on synthetic saccade generation and image classification with similar performance in terms of accuracy and AUC. Tao and Shyu [59] proposed a combination of CNN and long short-term memory (LSTM) networks to classify ASD and TD individuals. Exploiting a similar architecture, Chen and Zhao [43] proposed a multimodal approach to utilise information from behavioural modalities captured during photo-taking and image-viewing tasks, resulting in higher performance in both modalities. Using an additional dataset that contains people looking at other people/objects in the scene, Fang, et al. [60] proposed a DNN that achieved a higher accuracy when compared to a previous model [33]. Rahman, et al. [61] used several saliency prediction models and compared the performance of SVM and XGBoost. Observing that not all images highlight significant differences in visual attention between ASD and TD participants, Xu, et al. [62] used structural similarity between ASD and TD saliency maps to identify a subset of images in which a new bio-inspired metric was applied to identify ASD participants. Wei, et al. [63] proposed a dynamic filter and spatiotemporal feature extraction for ASD diagnosis, achieving the highest accuracy and similar specificity and AUC scores when compared to previous models [56–59]. Liaqat, et al. [64] proposed two ML approaches that include a branched MLP approach and an image-based approach for ASD classification and found that the latter approach resulted in slightly better performance. Mazumdar, et al. [65] extracted different handcrafted and DL features and compared 23 ML algorithms to identify individuals with ASD. Their results were among the top 4 performing models across different metrics when compared to previous models [56,59,64].

Eye-tracking on dynamic stimuli for ASD diagnosis

Prior research explored the possibility of using the eye-tracking paradigm during video-viewing tasks to identify specific neurological disorders. For example, Tseng, et al. [66] extracted low-level features from eye movement recorded from 15 minutes of videos and used an ML model to identify participants with attention deficit hyperactivity disorder, fetal alcohol spectrum disorder and Parkinson’s disease. Although this work did not include ASD classification, it accentuates the efficacy of using eye-tracking on dynamic stimuli to identify the mental states of participants.

As shown in Table 4, there are recent works that utilise dynamic stimuli to differentiate ASD from TD subjects. Wan, et al. [67] investigated the difference in fixation times between ASD and TD children watching a 10-second video of a female speaking. Their results revealed that fixation times at the mouth and body could significantly discriminate ASD from TD with a classification accuracy of 85.1%. Jiang, et al. [68] collected eye-tracking data during a dynamic affect recognition evaluation task, extracted handcrafted features and used a random forest classifier to identify ASD individuals. Zhao, et al. [69] collected eye-tracking data during a live interaction with an interviewer, extracted handcrafted features and employed four ML classifiers to identify individuals with ASD. These prior studies rely on handcrafted features that may provide less discriminative information between TD and ASD individuals.

Table 4. Eye tracking on dynamic stimuli for ASD diagnosis.

https://doi.org/10.1371/journal.pone.0282818.t004

Numerous studies employed an image classification approach based on a published dataset that contains the visualisation of eye-tracking data (i.e., scanpath images) of the participants during the experiment [45]. For example, Carette, et al. [45,70] used the raw pixel values as features and compared ML and DL algorithms for ASD classification. Their results revealed that DL algorithms achieved the highest performance when compared to ML models. Elbattah, et al. [71] trained a deep autoencoder and used a k-means clustering approach on the learned latent representation to identify clusters of participants. Their analysis revealed that an identified cluster contained a high percentage of ASD participants, suggesting that the algorithm can be used for ASD classification. Using a similar unsupervised learning approach, Akter, et al. [72] performed k-means clustering to divide the dataset into 4 groups and compared different ML models to identify participants with ASD. Cilia, et al. [73] used CNN and a fully-connected layer to predict ASD participants. Similarly, Kanhirakadavath and Chandran [74] compared Principal Component Analysis (PCA) and CNN for feature extraction and different ML and DL models for ASD classification. Gaspar, et al. [75] performed additional image augmentation to generate more training data. Afterwards, they used a kernel extreme learning machine optimised using the Giza Pyramids Construction metaheuristic algorithm to identify ASD individuals. Their approach achieved higher performance when compared to ML approaches. Ahmed, et al. [76] compared ML, DL and a combination of both approaches for ASD diagnosis. The results in these prior studies suggest that DL models for feature extraction and ASD classification perform better when compared to traditional ML approaches.

There are also prior studies that explored the use of dynamic stimuli that are effective in evoking significant differences in visual attention of ASD and TD participants. For example, de Belen, et al. [14] used the GeoPref Test [11,12] in EyeXplain Autism, a system for eye-tracking data analysis, automated ASD prediction and interpretation of deep learning network predictions. Recently, Oliveira, et al. [15] used similar video stimuli, trained a visual attention model and utilised an ML model to identify individuals with ASD. Fan, et al. [77] and Fang, et al. [78] used biological motion stimuli and different ML classifiers for ASD diagnosis. Using a stimulus for initiating joint attention, Carette, et al. [79] extracted features related to saccadic movement (e.g., amplitude, velocity, acceleration) and trained an LSTM network to predict three diagnostic groups (i.e., ASD, TD, unclassified). Putra, et al. [80] collected eye-tracking data during Go/No-Go tasks, identified spatial and auto-regressive temporal gaze-related features that differ significantly between ASD and TD participants and applied an AdaBoost meta-learning algorithm to identify participants with ASD.

Although previous studies utilised dynamic stimuli, the most common approach was to convert the participant’s eye-tracking data into an image, potentially losing spatiotemporal information that can be leveraged for classification. In addition, this approach disregards the pixel information around the fixation, a crucial insight into what part of the stimuli attracts human attention. In this paper, we propose a DNN approach that utilises dynamic saliency prediction to identify individuals with ASD.

While previous works have investigated the feasibility of leveraging visual attention in identifying individuals with ASD, limited research has explored the effectiveness of exploiting the participant’s dynamic visual attention for ASD classification. Our approach utilises eye-tracking data captured during a dynamic stimulus viewing task. It follows a deep learning framework similar to one reported in the literature [33]; however, it extends the diagnostic paradigm from static to dynamic stimuli.

Eye-tracking in ASD risk and symptom severity prediction

Although there has been a great deal of research on the use of eye-tracking in ASD diagnosis, relatively little research focuses on other applications, such as automatically predicting the risk of ASD (e.g., low, medium and high) and symptom severity, as shown in Table 5. Nevertheless, previous studies provide insights into the potential use of eye tracking in symptom severity prediction. For example, Kou, et al. [81] found that a reduction in visual preference for social scenes is significantly correlated with the ADOS social affect score, which may be useful in severity prediction. On the other hand, Bacon, et al. [82] found that a higher visual preference of toddlers for geometric scenes is significantly correlated with later symptom severity at school age, further suggesting the clinical utility of eye tracking for ASD symptom severity prediction.

Table 5. Eye tracking in ASD risk and symptom severity prediction.

https://doi.org/10.1371/journal.pone.0282818.t005

Recently, Revers, et al. [16] trained two computational models [83] to generate saliency maps of severe and non-severe groups and used the RELIEFF algorithm [84] to select the most important features for classification. Afterwards, a neural network was trained to identify symptom severity for each fixation made by the participant. The final prediction is considered to be severe if more than 20 fixations were classified as severe by the trained neural network. Their approach obtained an average accuracy of 88%, precision of 70%, sensitivity of 87% and specificity of 60% in predicting symptom severity.

In a slightly different problem, Canavan, et al. [23] and Fabiano, et al. [24] proposed a method for predicting ASD risk using eye gaze and demographic feature descriptors (e.g., age and gender). Handcrafted features, such as average fixation duration and average velocity, were tested on four different classifiers, namely random forests, decision trees, partial decision trees and a deep forward neural network. Although their results with a maximum classification rate of 93.45% are promising, it is crucial to compare their handcrafted features to features learned by modern deep learning models and determine if the latter improves the risk prediction accuracy. In this paper, we present the same DNN approach we used in ASD classification to predict the level of ASD-related symptoms.

Materials and methods

In this work, we used a data-driven approach to extract rich features learned from a dynamic stimulus to identify participants with autism and predict the level of ASD-related symptoms. In Fig 1, an overview of the proposed approach is provided. The method is divided into different stages, including eye-tracking data collection, dynamic saliency detection trained on the difference of fixations between ASD and TD individuals, and SVM-based classification and severity prediction. This study was approved by the Human Research Ethics Committee of the University of New South Wales. Written informed consent was obtained from the parents/legally authorised representatives of the participants. All methods were carried out in accordance with relevant guidelines and regulations.

Fig 1. Overview of the proposed feature learning/extraction, classification and symptom severity prediction approach.

(a) Given a video input, per-frame features are learned using an end-to-end approach to predict the difference of fixation (DoF) maps. (b) Extracted features at fixated pixels from each fixation stage are cascaded and passed on to an SVM to identify individuals with ASD and predict the level of ASD-related symptoms.

https://doi.org/10.1371/journal.pone.0282818.g001

Eye-tracking

Participants.

There were 57 children (9 females) in the ASD group and 17 children (9 females) in the TD group. Participants were matched by their age at the time of the study. In the ASD group, 24 children were recruited from an Autism Specific Early Learning and Care Centre (ASELCC) and 33 children from the Child Development Unit (CDU) of a Children’s Hospital. The TD children were recruited from a children’s services preschool. All participants in the ASD group met the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [85] criteria for ASD, and the diagnosis was confirmed using the Autism Diagnostic Observation Schedule (ADOS), Second Edition [86]. Of the 57 ASD children, 24 showed high ASD-related symptoms and 33 had moderate symptoms. There were no specific exclusion criteria for the ASD group in this study. The TD group’s exclusion criteria included known neurodevelopmental disorders, significant developmental delays and known visual/hearing impairments. No child had any visual acuity problems.

Dynamic stimulus.

We used the GeoPref Test [11,12] dynamic stimulus, which has been shown to be an effective stimulus for detecting ASD subgroups. This stimulus consists of dynamic geometric images (DGIs) on one side and dynamic social images (DSIs) on the other. The DGIs were constructed from recordings of animated screen-saver programs. The DSIs were produced from a series of short sequences of children performing yoga exercises. It included images of children performing a wide range of movements (e.g., waving arms and appearing as if dancing). The stimulus contained a total of 28 different scenes and was presented in order, based on the originally published stimulus [11,12]. It has a resolution of 1281 x 720 pixels and contains a total of 1,488 frames, which is equivalent to 61 seconds of video playback.

Eye-tracking apparatus and procedure.

Participants were tested using the Tobii X2-60 eye tracker and eye-tracking data was processed using Tobii Studio software to identify fixations and saccades. Eye movements were recorded at 60 Hz (with an accuracy of 0.5°) during the dynamic stimuli viewing. Each participant was seated approximately 60 cm in front of a 22” monitor with a video resolution of 1680 x 1050 pixels in a quiet room and shown the dynamic visual stimuli in full screen. A built-in five-point calibration in Tobii Studio was completed before administering the task to ensure accurate eye gaze tracking. The calibration procedure required gaze following on an image of an animal paired with auditory cues, starting with the centre of the screen and moving across the four corners of the screen. The eye-tracking procedure was conducted during a clinical assessment or the intake assessment for entry to an early intervention program. Multiple attempts were made to ensure that the eye tracker had been calibrated properly for accurate data collection. Multiple attempts were also made to ensure that the participants were engaged during the experiment. As a result, depending on the capacity of the child, the procedure was conducted over 2 to 3 sittings or with short breaks in between. The overall clinical assessment and eye-tracking procedure were completed in approximately 2.5 h per participant.

Data processing and statistical analysis.

Tobii Studio’s I-VT filter [87] was used to process the raw eye-tracking data, exclude random noise and define fixations for further analysis. More specifically, short fixations (<100 ms) were discarded and adjacent fixations (within 75 ms and 0.5°) were merged. Trials were excluded if the total fixation duration was less than 15 seconds; that is, to be included, a participant had to look at the stimulus for approximately 25% of the entire video duration. Once included, the eye-tracking data captured during the entire length of the stimulus were used for training and evaluation.
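To make the inclusion rule concrete, the sketch below illustrates it under our own assumptions: fixations are assumed to have already been exported from Tobii Studio’s I-VT filter (which also performs the merging of adjacent fixations) as (start_ms, duration_ms, x, y) tuples per participant; the variable names are illustrative, not the original implementation.

```python
# Minimal sketch of the trial inclusion criterion described above.
MIN_FIXATION_MS = 100        # short fixations are discarded
MIN_TOTAL_FIXATION_S = 15    # roughly 25% of the 61-second stimulus

def include_trial(fixations):
    """Return the retained fixations, or None if the trial is excluded."""
    kept = [f for f in fixations if f[1] >= MIN_FIXATION_MS]
    total_s = sum(f[1] for f in kept) / 1000.0
    return kept if total_s >= MIN_TOTAL_FIXATION_S else None
```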

A calibration quality assessment was performed to rule out the possibility of eye-tracking data quality as a confounding factor. In this assessment, a toy accompanied by a sound was used to attract the participants’ gaze to the calibration point in the middle of the screen. The mean distance between the detected fixation locations and the calibration point was calculated as a measure of accuracy. A t-test showed no significant difference between the groups, suggesting that data quality did not differ between the two groups: t(64) = -0.445, p = .658, ASD: 45.89 pixels (22.67), TD: 48.76 pixels (19.00).

An additional data quality assessment was performed to determine the overall nature of the visual attention of the participants to the stimuli. A t-test showed no significant difference in visual attention between groups: t(72) = 0.011, p = .991, ASD: 37.13 seconds (12.03), TD: 37.10 seconds (8.07). These analyses of quality suggest that it is unlikely that differences in data quality and general visual attention influenced the results.

An independent-samples t-test was used to investigate differences in visual attention across two groups for diagnosis (ASD vs. TD) and severity prediction (moderate vs. severe). All statistical analysis was performed in IBM SPSS Statistics Version 26.

Computation of per-frame saliency maps

Saliency detection models are typically optimised to detect salient features in a scene. They are trained on a probability distribution of eye fixations, called the fixation map. The per-frame fixation maps of each participant group were generated from the eye movement data collected in the study. For a given frame, all fixation points of the children in each group were overlaid in a binary map, in which the fixation points were set to 1 on a black background (value set to 0). The resulting per-frame fixation maps were smoothed with a Gaussian kernel (bandwidth = 1°) and normalised by the sum to generate per-frame visual attention heatmaps (labelled ASD and TD heatmaps in Fig 2).
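A minimal sketch of this per-frame heatmap computation is shown below. The list of group fixation positions, the frame size and the Gaussian bandwidth in pixels (which depends on the viewing distance and screen geometry reported above) are assumptions made for illustration only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def frame_heatmap(fixations, frame_shape=(720, 1281), sigma_px=30.0):
    """fixations: list of (x, y) pixel positions of one group for one frame."""
    fixation_map = np.zeros(frame_shape, dtype=np.float32)
    for x, y in fixations:
        if 0 <= int(y) < frame_shape[0] and 0 <= int(x) < frame_shape[1]:
            fixation_map[int(y), int(x)] = 1.0      # binary fixation map
    heatmap = gaussian_filter(fixation_map, sigma=sigma_px)  # ~1 deg bandwidth (assumed in pixels)
    total = heatmap.sum()
    return heatmap / total if total > 0 else heatmap          # normalise by the sum
```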

Fig 2. Overview of the computation of difference of fixation (DoF) maps.

On the left, the dynamic stimulus is analysed per frame. In the middle, the eye-tracking data in each participant group are aggregated and the difference is computed for each frame. On the right, the TD heatmaps are in white, while the ASD heatmaps are in black.

https://doi.org/10.1371/journal.pone.0282818.g002

Computation of per-frame difference of fixation (DoF) maps

Similar to Jiang and Zhao [33], our network was optimised on the difference of fixation (DoF) maps, highlighting the difference in visual attention between TD and ASD individuals. Since our approach uses a dynamic stimulus, we predict DoF maps on each frame. In particular, let I+ and I− be the fixation maps for the ASD and TD groups, respectively. The DoF map of a frame is computed as

DoF = I / σ_I,

where I = I− − I+ is a pixel-wise subtraction of fixation maps and σ_I represents the standard deviation of I.

The resulting DoF maps highlight the difference in visual attention between ASD and TD individuals (refer to Fig 2). The white regions of the DoF map illustrate the visual attention of TD individuals while the black regions are for ASD individuals. Note that this is the opposite of the DoF computation elsewhere [33]. This also resulted in better training performance compared to DoF maps that highlight more fixations of the ASD group.
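A minimal sketch of this per-frame DoF computation, following the formula above, is given below; the function and argument names are ours.

```python
import numpy as np

def dof_map(asd_fixation_map, td_fixation_map, eps=1e-8):
    """Pixel-wise difference normalised by its standard deviation.

    TD-dominant pixels become positive (white), ASD-dominant pixels negative (black).
    """
    diff = td_fixation_map - asd_fixation_map   # I = I- - I+
    return diff / (diff.std() + eps)            # DoF = I / sigma_I
```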

Per-frame prediction of difference of fixation maps

As shown in Fig 3, ACLNet [88], one of the best models available for dynamic saliency detection, is used for feature extraction. It consists of a CNN-LSTM network with an attention mechanism to enable fast, end-to-end saliency prediction. Since ACLNet already contains an attention network trained on TD individuals, we trained and fine-tuned our model with DoF maps that highlight more fixations of the TD group.

Fig 3. Overview of the approach for learning the difference of fixation (DoF) maps.

https://doi.org/10.1371/journal.pone.0282818.g003

Our model was optimised using the following loss function [89], which considers three different saliency evaluation metrics instead of the binary cross-entropy loss used before [33]. We denote the predicted difference of fixation map as Y ∈ [0,1]^(28×28) and the ground truth saliency map as Q ∈ [0,1]^(28×28). Our loss function combines Kullback-Leibler (KL) divergence, the Linear Correlation Coefficient (LCC) and the Normalised Scanpath Saliency (NSS), similar to prior work [88]:

L(Y, Q) = L_KL(Y, Q) + α₁ L_LCC(Y, Q) + α₂ L_NSS(Y, Q),

where α₁ and α₂ are weighting coefficients.

L_KL is widely used for training saliency models and is computed by

L_KL(Y, Q) = Σ_x Q(x) log( Q(x) / Y(x) ).

L_LCC measures the linear relationship between Y and Q:

L_LCC(Y, Q) = cov(Y, Q) / (σ_Y σ_Q),

where cov(Y, Q) is the covariance of Y and Q while σ is the standard deviation.

L_NSS is defined as

L_NSS(Y, Q) = (1/N) Σ_x Ȳ(x) Q(x),

where Ȳ = (Y − μ_Y) / σ_Y and N = Σ_x Q(x). It computes the mean score of the normalised predicted DoF map Ȳ at the fixated locations in Q.
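A minimal TensorFlow sketch of this combined loss is given below. The epsilon value and the weights alpha1 and alpha2, as well as the sign conventions used to turn the similarity metrics into losses, are our assumptions rather than the published settings.

```python
import tensorflow as tf

EPS = 1e-7

def kl_loss(y, q):
    """KL divergence between the ground-truth map q and the predicted map y."""
    y = y / (tf.reduce_sum(y) + EPS)
    q = q / (tf.reduce_sum(q) + EPS)
    return tf.reduce_sum(q * tf.math.log(EPS + q / (y + EPS)))

def cc_loss(y, q):
    """Negative linear correlation coefficient (maximise the correlation)."""
    y_c, q_c = y - tf.reduce_mean(y), q - tf.reduce_mean(q)
    cc = tf.reduce_sum(y_c * q_c) / (
        tf.sqrt(tf.reduce_sum(y_c ** 2) * tf.reduce_sum(q_c ** 2)) + EPS)
    return -cc

def nss_loss(y, q):
    """Negative NSS: mean normalised prediction at the fixated pixels of q."""
    y_norm = (y - tf.reduce_mean(y)) / (tf.math.reduce_std(y) + EPS)
    n = tf.reduce_sum(q) + EPS
    return -tf.reduce_sum(y_norm * q) / n

def combined_loss(y, q, alpha1=0.1, alpha2=0.1):   # weights are assumed values
    return kl_loss(y, q) + alpha1 * cc_loss(y, q) + alpha2 * nss_loss(y, q)
```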

Training protocol

Our classification and severity prediction models are iteratively trained with sequential DoF maps and image data. We train the model using a loss defined over the predicted dynamic saliency maps from the convLSTM. Let Y_t and Q_t denote the predicted dynamic saliency map and the continuous difference of fixation map for frame t, respectively. We minimise the following loss over the T frames of the stimulus:

L = Σ_{t=1..T} L(Y_t, Q_t).

The parameters of ACLNet are initialised to the pre-trained parameters [88]. The network is then fine-tuned on the current dataset.

ASD classification and symptom severity prediction

Once the model has been trained to predict DoF maps of ASD and TD individuals from a given dynamic stimulus, feature extraction and classification are performed, with Fig 4 illustrating the process [14]. Based on the eye-tracking data, we determined the fixation positions and the corresponding frames in which they were recorded. Each saccade-fixation pair was considered a fixation stage. For each fixation stage, features were extracted from the corresponding fixation position on the feature map obtained from the convLSTM output (note that the convLSTM output is upsampled 4 times before extracting the feature map). More specifically, given a frame in which a fixation has been identified, the feature map at the corresponding fixation position is extracted, resulting in a 256-dimensional feature vector for each fixation. For a given number of fixation stages, the feature vectors of all fixations are concatenated in temporal order, from the first to the last fixation stage. This serves as the feature space in which classification is performed. If fewer fixations were identified than the chosen number of fixation stages, zeros are appended at the end. We explored the number of fixation stages that provided the best performance.
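The sketch below illustrates this concatenation step under our own assumptions: `feature_maps[t]` is the upsampled convLSTM feature map of frame t with shape (H, W, 256), and `fixations` is a temporally ordered list of (frame_index, x, y) positions for one participant.

```python
import numpy as np

def participant_feature_vector(feature_maps, fixations, n_stages=64, dim=256):
    """Concatenate per-fixation features in temporal order, zero-padded."""
    vector = np.zeros(n_stages * dim, dtype=np.float32)
    for i, (t, x, y) in enumerate(fixations[:n_stages]):
        # 256-dimensional feature at the fixated pixel of the corresponding frame
        vector[i * dim:(i + 1) * dim] = feature_maps[t][int(y), int(x), :]
    return vector   # remains zero-padded if fewer than n_stages fixations exist
```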

Fig 4. Overview of the approach for feature extraction and classification.

https://doi.org/10.1371/journal.pone.0282818.g004

A linear decision boundary between ASD and TD individuals was determined by training an SVM on the extracted features. In addition, another SVM model was trained on the DoF maps of moderate and high ASD individuals to predict autism severity. We used the ADOS-2 calibrated severity scores (CSS) as ground truth to determine the ASD severity. Participants with ADOS CSS of 5–7 are considered to have moderate symptoms, while those with ADOS CSS of 8–10 are considered to have more severe (high) symptoms.
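As an illustration, the two linear SVMs and the CSS-based severity labelling could be set up as sketched below; the variable names (feature matrices, labels and CSS scores) are placeholders and not the original code.

```python
from sklearn.svm import SVC

# Diagnosis: linear SVM on the concatenated fixation features (X_diag) with ASD/TD labels (y_diag).
diagnosis_svm = SVC(kernel="linear", C=1.0, probability=True).fit(X_diag, y_diag)

# Severity: ADOS-2 CSS of 5-7 -> moderate, 8-10 -> severe (css holds the ASD children's scores).
severity_labels = ["severe" if s >= 8 else "moderate" for s in css]
severity_svm = SVC(kernel="linear", C=1.0, probability=True).fit(X_asd, severity_labels)
```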

Experimental setup

Training and testing protocols.

We report the model’s performance on ASD classification and symptom severity prediction using leave-one-out cross-validation (LOOCV). Given the unbalanced nature and the limited number of samples in the dataset, LOOCV provides an almost unbiased estimate of the probability of error [90]. In addition, it allows us to maximise the number of training samples per fold, unlike a k-fold validation approach. While a stratified k-fold cross-validation strategy may account for the group imbalance present in our dataset, it results in fewer training samples per fold. Moreover, removing a single sample from the training set, as done in LOOCV, does not drastically change the class distribution. The ability to use as much training data as possible while maintaining a similar class distribution is the reason we used LOOCV. The same evaluation approach has been employed in prior studies [14,33,34,43,68,69] in this application area.
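A minimal sketch of this protocol is shown below: one participant is held out per fold, and the held-out SVM probability is stored so that the ROC curve and confusion matrix can later be computed over all folds (see the evaluation sketch further down). The binary 0/1 labelling and function name are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def loocv_probabilities(X, y):
    """Fit one SVM per fold and collect the positive-class probability of the held-out sample."""
    probs, truths = [], []
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="linear", C=1.0, probability=True)
        clf.fit(X[train_idx], y[train_idx])
        probs.append(clf.predict_proba(X[test_idx])[0, 1])
        truths.append(y[test_idx][0])
    return np.array(probs), np.array(truths)
```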

Implementation details.

We implemented our model in TensorFlow with the Keras and Scikit-learn libraries. During the training phase, we fine-tuned the network with the Adam optimizer and a batch size of one image for a total of 20 epochs. The learning rate was set to 0.0001. We did not use dropout or data augmentation. L2 regularisation with the penalty parameter C = 1 was used for SVM classification.
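The configuration could be expressed roughly as in the sketch below, assuming `aclnet` is a Keras model initialised with the pre-trained ACLNet weights and `x_train`/`y_train` hold the frames and their DoF maps; these names are illustrative only.

```python
import tensorflow as tf

aclnet.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    # combined_loss (earlier sketch) takes (predicted, ground truth), so swap the Keras argument order
    loss=lambda y_true, y_pred: combined_loss(y_pred, y_true))
aclnet.fit(x_train, y_train, batch_size=1, epochs=20)   # no dropout, no data augmentation
```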

Evaluation metrics.

We report on the performance of our model in terms of accuracy, sensitivity (i.e., true positive rate) and specificity (i.e., true negative rate) recorded at different numbers of fixations. Once the best number of fixations to be included in the classification was identified, the area under the receiver operating characteristic (ROC) curve and the confusion matrix were also computed. To obtain a meaningful area under the ROC curve (AUC) in an LOOCV, the output probability of the SVM for each fold (each consisting of just one subject) was saved and the AUC was computed on the set of these probability estimates. The computation of the confusion matrix was performed similarly using the predicted class to compare with the ground truth label.
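Continuing the LOOCV sketch above, the pooled held-out probabilities can be scored as sketched below (a 0.5 decision threshold is our assumption for deriving hard labels).

```python
from sklearn.metrics import roc_auc_score, confusion_matrix

probs, truths = loocv_probabilities(X, y)          # one probability per held-out participant
auc = roc_auc_score(truths, probs)                  # AUC over the pooled LOOCV estimates
cm = confusion_matrix(truths, (probs >= 0.5).astype(int))
```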

Computational load.

The entire training procedure for each video stimulus takes about 1 hour with two NVIDIA RTX 2080 Super GPUs and a 3.5 GHz Intel processor (i7-7800X CPU). Once the model has been trained, feature extraction and SVM classification can be performed in less than 1 minute.

Results

Datasets

Children with ASD had a mean age of 4.63 (standard deviation (SD) = 0.80) years and TD participants had a mean age of 4.61 (SD = 0.47) years. There was no significant difference in age between the ASD and TD groups, t(72) = 0.009, p = 0.993.

Eye-tracking data analysis

ASD classification.

It was previously shown that ASD individuals with severe symptoms tend to fixate more on the geometric stimuli than the social stimuli [11,12]. Shown in Fig 5 are the %Geo values, the percentage of time spent looking at the dynamic geometric stimuli. %Geo values are computed by dividing the total fixation duration on the geometric stimuli by the total fixation duration on both geometric and social stimuli. An independent-samples t-test was used to compare %Geo for each diagnostic group. Similar to results published elsewhere [11–13], ASD participants in our study were significantly more attracted to the dynamic geometric images when compared to TD participants (t = 2.11, p = .0386). On average, the ASD group spent 49.37% (standard deviation (SD) = 24.14%) of their attention looking at the dynamic geometric images, while the TD group spent 35.97% (SD = 18.58%).
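A minimal sketch of the %Geo computation and group comparison is given below; the per-participant duration lists are placeholders, and we use SciPy here purely for illustration (the reported analysis was run in SPSS).

```python
import numpy as np
from scipy.stats import ttest_ind

def pct_geo(geo_s, social_s):
    """Percentage of fixation time on the geometric half of the stimulus."""
    return 100.0 * geo_s / (geo_s + social_s)

# asd_durations / td_durations: lists of (geometric_s, social_s) fixation totals per participant
pct_geo_asd = np.array([pct_geo(g, s) for g, s in asd_durations])
pct_geo_td = np.array([pct_geo(g, s) for g, s in td_durations])
t_stat, p_value = ttest_ind(pct_geo_asd, pct_geo_td)   # independent-samples t-test
```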

Fig 5. Comparison of the percentage of time spent looking at the dynamic geometric stimuli (%geo) between TD and ASD participants.

Each box plot contains the interquartile range, the x marker corresponds to the mean value and the horizontal line inside corresponds to the median. Each sample is also visualised using dot points.

https://doi.org/10.1371/journal.pone.0282818.g005

ASD symptom severity prediction.

Shown in Fig 6 are the %Geo values, the percentage of time spent looking at the dynamic geometric stimuli. There was no significant difference in the %Geo values between the moderate and severe ASD participants (t = 0.424, p = .6729). On average, ASD participants with moderate symptoms spent around 48.21% (SD = 23.82%) of their attention on the geometric stimuli, while ASD participants with severe symptoms spent 50.98% (SD = 25.00%). We also performed pair-wise comparisons between the TD participants and the two ASD participant groups (i.e., moderate and severe). There was a significant difference in the %Geo values between ASD participants with severe symptoms and TD participants (t = 2.096, p = .0426). On the other hand, there was only a trend toward a significant difference in the %Geo values between ASD participants with moderate symptoms and TD participants (t = 1.846, p = .0710).

Fig 6. Comparison of the percentage of time spent looking at the dynamic geometric stimuli (%geo) between ASD participants with moderate and severe symptoms.

Each box plot contains the interquartile range, the x marker corresponds to the mean value and the horizontal line inside corresponds to the median. Each sample is also visualised using dot points.

https://doi.org/10.1371/journal.pone.0282818.g006

In recent years, it has been shown that stimuli that have both dynamic geometric and social images can reliably separate the visual attention of ASD and TD individuals. We contribute to the literature by showing that a DNN-based approach using dynamic stimuli can result in highly accurate ASD classification and even predict the level of ASD-related symptoms with promising performance.

ASD classification performance.

In Fig 7, different performance metrics for ASD prediction on the GeoPref Test dynamic stimulus are shown. In Fig 7A, the accuracy, sensitivity and specificity of the model as the number of fixations (i.e., fixation length) increases are displayed. It can be observed that all measures generally increase as the number of fixations increases. In Fig 7B and 7C, the receiver operating characteristic (ROC) curve and the confusion matrix of the model that reported the highest accuracy (i.e., using the optimal fixation length) in Fig 7A are shown. The area under the ROC curve (AUC) of our model is 0.96, significantly higher than chance-level performance (AUC = 0.5). Our model achieved the highest accuracy of 94.59% when 64 fixations were included in the analysis. The high sensitivity of our model (highest value = 100%) suggests that it can reliably identify ASD children. On the other hand, the specificity of our model (highest value = 76.47%) suggests that it can reliably identify children without the disorder. Overall, four (4) children were mistakenly flagged as having the disorder despite not having it.

Fig 7. Different performance metrics for ASD prediction.

(a) the plot of the model’s sensitivity, specificity and accuracy as the number of fixations (i.e., fixation length) increases. (b) the receiver operating characteristic (ROC) curve of the best-performing model. (c) the confusion matrix of the best-performing model.

https://doi.org/10.1371/journal.pone.0282818.g007

ASD severity prediction performance.

Similar to the results of the diagnosis prediction, it can be observed in Fig 8A that all performance measures for ASD severity prediction generally increase as the number of fixations (i.e., fixation length) increases. In Fig 8B and 8C, the ROC curve and the confusion matrix of the model that reported the highest accuracy in Fig 8A are shown. Our model achieved the highest accuracy of 94.74% when 44 fixations were included in the analysis. The area under the ROC curve (AUC) of our model is 0.99, significantly higher than chance-level performance (AUC = 0.5). The high specificity of our model (highest value = 100%) suggests that it can reliably identify children with moderate symptoms. On the other hand, the high sensitivity of our model (highest value = 87.50%) suggests that it can reliably identify children with severe symptoms. Overall, three (3) children were mistakenly flagged as having moderate symptoms despite having more severe symptoms.

Fig 8. Different performance metrics for ASD symptom severity prediction.

(a) the plot of the model’s sensitivity, specificity and accuracy as the number of fixations (i.e., fixation length) increases. (b) the receiver operating characteristic (ROC) curve of the best-performing model. (c) the confusion matrix of the best-performing model.

https://doi.org/10.1371/journal.pone.0282818.g008

Comparison with other approaches

As outlined in the related work section, a straightforward comparison with previous approaches that utilise dynamic stimuli is not possible because the published dataset contains visualisations of the participants’ eye-tracking data (i.e., scanpath images) rather than the stimuli used and the associated eye-tracking data that our model requires. Nevertheless, we compared our proposed approach with a simple thresholding method [11–13] and ML algorithms using handcrafted features [23,24].

ASD classification.

Following the cut-off of %Geo > 69% used to determine ASD participants in similar studies [11–13], we obtained a sensitivity of 22.80%, specificity of 88.23% and accuracy of 37.84%. The AUC obtained was 0.67. In comparison, our proposed model resulted in 77.2% higher sensitivity, 11.76% lower specificity and 56.75% higher accuracy when compared to solely utilising the %Geo values. Handcrafted features that include raw eye gaze points (x and y locations), average fixation duration, age and gender were also used as input to a random forest regressor and a decision tree classifier for ASD classification, similar to previous studies [23,24]. The random forest regressor achieved an accuracy of 72.97%, a sensitivity of 91.22% and a specificity of 0%. On the other hand, the decision tree classifier achieved an accuracy of 58.11%, a sensitivity of 70.18% and a specificity of 17.65%.
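The thresholding baseline amounts to the small sketch below, where `pct_geo_all` and the 0/1 labels in `y_true` (1 = ASD, 0 = TD) are our illustrative names.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_pred = (pct_geo_all > 69.0).astype(int)          # flag ASD when %Geo exceeds the 69% cut-off [11-13]
accuracy = accuracy_score(y_true, y_pred)
sensitivity = recall_score(y_true, y_pred, pos_label=1)   # true positive rate
specificity = recall_score(y_true, y_pred, pos_label=0)   # true negative rate
```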

Overall, our proposed model achieved the highest accuracy of 94.59%, the highest sensitivity of 100% and the second-best specificity of 76.47%. The comparison results in ASD classification suggest that our model better identified participants with ASD than the previous approaches, as shown in Table 6.

Table 6. ASD classification results comparison with prior approaches.

https://doi.org/10.1371/journal.pone.0282818.t006

ASD symptom severity prediction.

We also used the same cut-off of %Geo > 69% [11–13] to identify ASD participants with severe symptoms and obtained a sensitivity of 25.00%, specificity of 78.79% and accuracy of 43.24%. The AUC obtained was 0.54. Again, our proposed method showed promising results for severity prediction, resulting in a 62.50% increase in sensitivity, a 21.21% increase in specificity and a 51.5% increase in accuracy when compared to solely utilising the %Geo values. In comparison to our model, using handcrafted features and ML classifiers resulted in the same accuracy of 94.74%, slightly higher sensitivity of 91.67% and slightly lower specificity of 96.97%. Overall, our proposed model achieved the highest accuracy of 94.74%, the second-best sensitivity of 87.50% and the highest specificity of 100%. The comparison results in ASD symptom severity prediction suggest that our model better identifies participants with moderate symptoms than the previous approaches, as shown in Table 7.

Table 7. ASD symptom severity prediction results comparison with prior approaches.

https://doi.org/10.1371/journal.pone.0282818.t007

Discussion

Over the past decade, eye-tracking studies have revealed significant differences in visual attention between ASD and TD individuals. This has motivated researchers to leverage recent advances in saliency prediction when designing more quantitative approaches to ASD diagnosis, as well as risk and symptom severity prediction. In this context, researchers have explored the use of static and dynamic stimuli during free-viewing tasks. The most common approach in the literature comprises a traditional two-stage method consisting of a feature extraction stage followed by a classification stage. Increasing evidence suggests that DL-based approaches produce more discriminative features than ML-based approaches, and classification methods that utilise DL also result in better performance than ML models. The rapid advances in DL approaches and the increasing number of publicly available datasets may help further advance the literature and improve classification performance. In this paper, we utilised a combination of DL and ML approaches for ASD diagnosis and symptom severity prediction.

Unlike prior research that utilised dynamic stimuli and converted the participant’s eye-tracking data into an image for classification, we propose a data-driven approach utilising a dynamic saliency model to extract discriminative features from the stimuli and an ML approach based on eye-tracking data to automatically identify individuals with ASD. In addition, we show that the same approach can predict the level of ASD-related symptoms in preschool children. Our approach to identifying children with ASD offers several advantages when compared to existing eye-tracking research. Most notably, our method only takes one minute of eye-tracking, a substantial decrease in recording time when compared to about 10 minutes required in previous studies [33,34]. While our method requires a substantially shorter amount of time, it is not a replacement for standard clinical assessments. Extensive experiments are necessary before the true clinical utility and usability of our proposed method can be realised.

Our results support other studies [11–13] that found a significant difference in the overall attention towards geometric stimuli between ASD and TD participants. This significant difference in visual attention was also found between ASD children with severe symptoms and TD children in our study. Despite these differences, using the ratio of visual attention towards the geometric stimuli to the total overall attention and applying a thresholding technique employed previously [11–13] resulted in lower classification performance than our proposed model. Using an ML-based approach on handcrafted features [23,24] also resulted in lower accuracy in ASD prediction and a similar accuracy in symptom severity prediction when compared to our proposed model. Overall, our results demonstrate the feasibility of using our approach to accurately identify ASD children and children with severe symptoms. Our model achieved promising performance with high accuracy, sensitivity and specificity.

Finally, most published research reviewed in this paper attempted to identify adults with ASD or older ASD children. In contrast, we investigated the possibility of diagnosing autism and predicting the level of ASD-related symptoms in preschool children (around 4 years old), an age range where diagnosis and assessment are typically performed. As a result, we provide an alternative to augment (and not replace) existing clinical observation tools with a more objective and efficient approach to ASD diagnosis. This takes us closer to an early ASD screening system and allows children to access intervention for better health outcomes. While our results are promising, our proposed approach needs to be trained and tested on a much larger dataset before it can be utilised in clinical settings.

From a clinical perspective, our findings suggest that eye-tracking technology could be used as a biomarker of the presence of ASD and symptom severity in preschool children. Initial findings already found significant correlations between changes in eye-tracking measures and changes in clinical measures captured before and after interventions, suggesting that eye-tracking can be utilised to quantify treatment response [91]. Given the rapid advances in technology supported by the promising performance of the classification models reviewed in this paper, it is not hard to imagine that future research would explore the use of a similar eye-tracking paradigm in predicting other clinical phenotypes and treatment response outcomes in preschool ASD children. This will have a tremendous impact on targeting interventions that maximise health outcomes in patients.

Limitations

Despite the utility of the current study, there are several limitations to keep in mind. First, there was a gender skew towards males in the ASD group, as would be clinically expected. Nevertheless, further studies with more female participants are required to clarify our results, as differences in autism presentation and diagnosis between males and females have been documented [92]. For example, studies have shown that girls on the spectrum behave similarly to neurotypical boys and girls on certain socially orientated tasks, such as enhanced attention to faces during scenes that do not have social interactions [93,94]. In addition, TD men with high ASD traits exhibit worse accuracy of gaze shifts, while TD women have similar gaze-following behaviour regardless of ASD traits [95].

Further, the participant groups also differed in sample size, with the ASD group being three times as large as the TD group. The ASD participants in this study were recruited from an ASD-specific centre and there was good uptake for the study. Despite significant efforts of the team to recruit control participants, there was less interest from the families of neurotypical children in participating in the study, which is perhaps not surprising given that the study is less meaningful for children without a developmental diagnosis. We also acknowledge that the dataset size is relatively small in comparison to the dataset sizes required to train modern DL models. To aid our model training and leverage transfer learning, we utilised one of the best dynamic saliency detection models [88] and fine-tuned its weights on our dataset. This allowed our model to learn better and extract more robust and semantically meaningful features when compared to a model trained from scratch on our dataset. We believe that using the leave-one-out cross-validation approach to train and test the model addressed the class imbalance and small sample size in our study. This validation approach has been used extensively in prior research [14,33,34,43,68,69].

It is also useful to note that the participant groups were matched on chronological age but not on developmental abilities. Further studies with larger sample sizes with a developmentally age-matched group are suggested to confirm our findings. As reported in the Materials and methods section, children with ASD were not excluded from the study if they had a comorbid diagnosis. Although this has implications for any strict interpretation of the findings reported here, the inclusion of comorbid conditions in ASD research is ecologically valid. Indeed, it is rare in clinical practice to encounter a young person who has a ‘pure’ autism spectrum diagnosis with no other psychiatric or developmental comorbidities.

Finally, we cannot report on the performance of static stimuli-based classification approaches or compare it with our dynamic stimuli-based classification approach, since this study is part of a larger study that aimed to find differences in eye-tracking data between ASD and TD participants while watching dynamic stimuli. As such, no eye-tracking data from the same participants were collected while viewing static stimuli.

Acknowledgments

We extend our gratitude to the children and their families who participated in this study and to the staff where this study was conducted.

References

  1. Huerta M., Bishop S. L., Duncan A., Hus V. & Lord C. Application of DSM-5 criteria for autism spectrum disorder to three samples of children with DSM-IV diagnoses of pervasive developmental disorders. American Journal of Psychiatry 169, 1056–1064 (2012). pmid:23032385
  2. Randall M. et al. Diagnostic tests for autism spectrum disorder (ASD) in preschool children. Cochrane Database of Systematic Reviews (2018). pmid:30075057
  3. Taylor L. J. et al. Brief Report: An Exploratory Study of the Diagnostic Reliability for Autism Spectrum Disorder. Journal of Autism and Developmental Disorders 47, 1551–1558 (2017). pmid:28233080
  4. Estes A. et al. Long-term outcomes of early intervention in 6-year-old children with autism spectrum disorder. Journal of the American Academy of Child & Adolescent Psychiatry 54, 580–587 (2015). pmid:26088663
  5. de Belen R. A. J., Bednarz T., Sowmya A. & Del Favero D. Computer vision in autism spectrum disorder research: a systematic review of published studies from 2009 to 2019. Translational Psychiatry 10, 333 (2020). pmid:32999273
  6. Sapiro G., Hashemi J. & Dawson G. Computer vision and behavioral phenotyping: an autism case study. Current Opinion in Biomedical Engineering 9, 14–20 (2019). pmid:37786644
  7. Ahmed Z. A. T. & Jadhav M. E. A Review of Early Detection of Autism Based on Eye-Tracking and Sensing Technology in 2020 International Conference on Inventive Computation Technologies (ICICT). 160–166 (IEEE).
  8. Kollias K.-F., Syriopoulou-Delli C. K., Sarigiannidis P. & Fragulis G. F. The Contribution of Machine Learning and Eye-Tracking Technology in Autism Spectrum Disorder Research: A Systematic Review. Electronics 10, 2982 (2021).
  9. de Belen R. A. et al. Eye-tracking correlates of response to joint attention in preschool children with autism spectrum disorder. BMC Psychiatry 23, 211 (2023). pmid:36991383
  10. Osterling J. & Dawson G. Early recognition of children with autism: a study of first birthday home videotapes. J Autism Dev Disord 24, 247–257 (1994). pmid:8050980
  11. Pierce K., Conant D., Hazin R., Stoner R. & Desmond J. Preference for Geometric Patterns Early in Life as a Risk Factor for Autism. Archives of General Psychiatry 68, 101–109 (2011). pmid:20819977
  12. Pierce K. et al. Eye Tracking Reveals Abnormal Visual Preference for Geometric Images as an Early Biomarker of an Autism Spectrum Disorder Subtype Associated With Increased Symptom Severity. Biological Psychiatry 79, 657–666 (2016). pmid:25981170
  13. Moore A. et al. The geometric preference subtype in ASD: identifying a consistent, early-emerging phenomenon through eye tracking. Molecular Autism 9, 1–13 (2018).
  14. de Belen R. A. J., Bednarz T. & Sowmya A. EyeXplain Autism: Interactive System for Eye Tracking Data Analysis and Deep Neural Network Interpretation for Autism Spectrum Disorder Diagnosis in Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. Article 364 (Association for Computing Machinery), https://doi.org/10.1145/3411763.3451784.
  15. Oliveira J. S. et al. Computer-aided autism diagnosis based on visual attention models using eye tracking. Scientific Reports 11, 1–11 (2021).
  16. Revers M. C. et al. Classification of Autism Spectrum Disorder Severity Using Eye Tracking Data Based on Visual Attention Model in 2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS). 142–147 (IEEE).
  17. Itti L., Koch C. & Niebur E. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 1254–1259 (1998).
  18. Borji A. Saliency prediction in the deep learning era: Successes and limitations. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 679–700 (2019).
  19. de Belen R. A. J., Bednarz T. & Sowmya A. ScanpathNet: A Recurrent Mixture Density Network for Scanpath Prediction in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5010–5020.
  20. Duan H. et al. A dataset of eye movements for the children with autism spectrum disorder in Proceedings of the 10th ACM Multimedia Systems Conference. 255–260 (Association for Computing Machinery), https://doi.org/10.1145/3304109.3325818.
  21. Gutiérrez J., Che Z., Zhai G. & Le Callet P. Saliency4ASD: Challenge, dataset and tools for visual attention modeling for autism spectrum disorder. Signal Processing: Image Communication 92, 116092 (2021).
  22. Le Meur O., Nebout A., Cherel M. & Etchamendy E. From Kanner Austim to Asperger Syndromes, the Difficult Task to Predict Where ASD People Look at. IEEE Access 8, 162132–162140 (2020).
  23. Canavan S. et al. Combining gaze and demographic feature descriptors for autism classification in 2017 IEEE International Conference on Image Processing (ICIP). 3750–3754 (IEEE).
  24. 24. Fabiano D., Canavan S., Agazzi H., Hinduja S. & Goldgof D. Gaze-based classification of autism spectrum disorder. Pattern Recognition Letters 135, 204–212 (2020).
  25. 25. Chaddad A., Desrosiers C. & Toews M. Multi-scale radiomic analysis of sub-cortical regions in MRI related to autism, gender and age. Scientific Reports 7, 45639, (2017). pmid:28361913
  26. 26. Chaddad A., Desrosiers C., Hassan L. & Tanougast C. Hippocampus and amygdala radiomic biomarkers for the study of autism spectrum disorder. BMC Neuroscience 18, 52, (2017). pmid:28821235
  27. 27. Chanel G. et al. Classification of autistic individuals and controls using cross-task characterization of fMRI activity. NeuroImage: Clinical 10, 78–88, (2016). pmid:26793434
  28. 28. Eslami T. & Saeed F. Auto-ASD-Network: A Technique Based on Deep Learning and Support Vector Machines for Diagnosing Autism Spectrum Disorder using fMRI Data in Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 646–651 (Association for Computing Machinery), https://doi.org/10.1145/3307339.3343482 (Year).
  29. 29. Zheng W. et al. Multi-feature based network revealing the structural abnormalities in autism spectrum disorder. IEEE Transactions on Affective Computing, 1–1, (2018).
  30. 30. Crimi A., Dodero L., Murino V. & Sona D. Case-control discrimination through effective brain connectivity in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). 970–973, https://doi.org/10.1109/ISBI.2017.7950677 (Year).
  31. 31. Shukla P., Gupta T., Saini A., Singh P. & Balasubramanian R. A Deep Learning Frame-Work for Recognizing Developmental Disorders in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). 705–714, https://doi.org/10.1109/WACV.2017.84 (Year).
  32. 32. Li B. et al. A Facial Affect Analysis System for Autism Spectrum Disorder in 2019 IEEE International Conference on Image Processing (ICIP). 4549–4553, https://doi.org/10.1109/ICIP.2019.8803604 (Year).
  33. 33. Jiang M. & Zhao Q. Learning Visual Attention to Identify People with Autism Spectrum Disorder in 2017 IEEE International Conference on Computer Vision (ICCV). 3287–3296, https://doi.org/10.1109/ICCV.2017.354 (Year).
  34. 34. Liu W., Li M. & Yi L. Identifying children with autism spectrum disorder based on their face processing abnormality: A machine learning framework. Autism Research 9, 888–898, (2016). pmid:27037971
  35. 35. Liu W. et al. Efficient autism spectrum disorder prediction with eye movement: A machine learning framework in 2015 International Conference on Affective Computing and Intelligent Interaction (ACII). 649–655, https://doi.org/10.1109/ACII.2015.7344638 (Year).
  36. 36. Vu T. et al. Effective and efficient visual stimuli design for quantitative autism screening: An exploratory study in 2017 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). 297–300, https://doi.org/10.1109/BHI.2017.7897264 (Year).
  37. 37. Vyas K. et al. Recognition Of Atypical Behavior In Autism Diagnosis From Video Using Pose Estimation Over Time in 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP). 1–6, https://doi.org/10.1109/MLSP.2019.8918863 (Year).
  38. 38. Zunino A. et al. Video Gesture Analysis for Autism Spectrum Disorder Detection in 2018 24th International Conference on Pattern Recognition (ICPR). 3421–3426, https://doi.org/10.1109/ICPR.2018.8545095 (Year).
  39. 39. Rajagopalan S. S., Dhall A. & Goecke R. Self-Stimulatory Behaviours in the Wild for Autism Diagnosis in 2013 IEEE International Conference on Computer Vision Workshops. 755–761, https://doi.org/10.1109/ICCVW.2013.103 (Year).
  40. 40. Rajagopalan S. S. & Goecke R. Detecting self-stimulatory behaviours for autism diagnosis in 2014 IEEE International Conference on Image Processing (ICIP). 1470–1474, https://doi.org/10.1109/ICIP.2014.7025294 (Year).
  41. 41. Wang Z. et al. Screening Early Children with Autism Spectrum Disorder via Response-to-Name Protocol. IEEE Transactions on Industrial Informatics, 1–1, (2019).
  42. 42. Wang Z., Xu K. & Liu H. Screening Early Children with Autism Spectrum Disorder via Expressing Needs with Index Finger Pointing in Proceedings of the 13th International Conference on Distributed Smart Cameras. Article 24 (Association for Computing Machinery), https://doi.org/10.1145/3349801.3349826 (Year).
  43. 43. Chen S. & Zhao Q. Attention-Based Autism Spectrum Disorder Screening With Privileged Modality in 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 1181–1190, https://doi.org/10.1109/ICCV.2019.00127 (Year).
  44. 44. Minissi M. E., Chicchi Giglioli I. A., Mantovani F. & Alcaniz Raya M. Assessment of the autism spectrum disorder based on machine learning and social visual attention: A systematic review. Journal of Autism and Developmental Disorders 52, 2187–2202 (2022). pmid:34101081
  45. 45. Carette R., Elbattah M., Dequen G., Guérin J.-L. & Cilia F. Visualization of eye-tracking patterns in autism spectrum disorder: method and dataset in 2018 Thirteenth International Conference on Digital Information Management (ICDIM). 248–253 (IEEE) (Year).
  46. 46. Duan H. et al. Learning to predict where the children with asd look in 2018 25th ieee international conference on image processing (icip). 704–708 (IEEE) (Year).
  47. 47. Duan H. et al. Visual attention analysis and prediction on human faces for children with autism spectrum disorder. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 15, 1–23 (2019).
  48. 48. Fang Y., Huang H., Wan B. & Zuo Y. Visual attention modeling for autism spectrum disorder by semantic features in 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). 625–628 (IEEE) (Year).
  49. 49. Wei W., Liu Z., Huang L., Nebout A. & Le Meur O. Saliency prediction via multi-level features and deep supervision for children with autism spectrum disorder in 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). 621–624 (IEEE) (Year).
  50. 50. Nebout A., Wei W., Liu Z., Huang L. & Le Meur O. Predicting saliency maps for asd people in 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). 629–632 (IEEE) (Year).
  51. 51. Fang Y. et al. Visual attention prediction for Autism Spectrum Disorder with hierarchical semantic fusion. Signal Processing: Image Communication 93, 116186 (2021).
  52. 52. Wei W. et al. Predicting atypical visual saliency for autism spectrum disorder via scale-adaptive inception module and discriminative region enhancement loss. Neurocomputing 453, 610–622 (2021).
  53. 53. Min X. et al. Visual attention analysis and prediction on human faces. Information Sciences 420, 417–430 (2017).
  54. 54. Wang S. et al. Atypical Visual Saliency in Autism Spectrum Disorder Quantified through Model-Based Eye Tracking. Neuron 88, 604–616, (2015). pmid:26593094
  55. 55. Yaneva V., Eraslan S., Yesilada Y. & Mitkov R. Detecting high-functioning autism in adults using eye tracking and machine learning. IEEE Transactions on Neural Systems and Rehabilitation Engineering 28, 1254–1261 (2020). pmid:32356755
  56. 56. Startsev M. & Dorr M. Classifying autism spectrum disorder based on scanpaths and saliency in 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). 633–636 (IEEE) (Year).
  57. 57. Arru G., Mazumdar P. & Battisti F. Exploiting visual behaviour for autism spectrum disorder identification in 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). 637–640 (IEEE) (Year).
  58. 58. Wu C., Liaqat S., Cheung S.-c., Chuah C.-N. & Ozonoff S. Predicting autism diagnosis using image with fixations and synthetic saccade patterns in 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). 647–650 (IEEE) (Year).
  59. 59. Tao Y. & Shyu M.-L. SP-ASDNet: CNN-LSTM based ASD classification model using observer scanpaths in 2019 IEEE International conference on multimedia & expo workshops (ICMEW). 641–646 (IEEE) (Year).
  60. 60. Fang Y., Duan H., Shi F., Min X. & Zhai G. Identifying children with autism spectrum disorder based on gaze-following in 2020 IEEE International Conference on Image Processing (ICIP). 423–427 (IEEE) (Year).
  61. 61. Rahman S., Rahman S., Shahid O., Abdullah M. T. & Sourov J. A. Classifying eye-tracking data using saliency maps in 2020 25th International Conference on Pattern Recognition (ICPR). 9288–9295 (IEEE) (Year).
  62. 62. Xu S., Yan J. & Hu M. A new bio-inspired metric based on eye movement data for classifying ASD and typically developing children. Signal Processing: Image Communication 94, 116171 (2021).
  63. 63. Wei W. et al. Identify autism spectrum disorder via dynamic filter and deep spatiotemporal feature extraction. Signal Processing: Image Communication 94, 116195 (2021).
  64. 64. Liaqat S. et al. Predicting ASD diagnosis in children with synthetic and image-based eye gaze data. Signal Processing: Image Communication 94, 116198 (2021). pmid:33859457
  65. 65. Mazumdar P., Arru G. & Battisti F. Early detection of children with autism spectrum disorder based on visual exploration of images. Signal Processing: Image Communication 94, 116184 (2021).
  66. 66. Tseng P.-H. et al. High-throughput classification of clinical populations from natural viewing eye movements. Journal of neurology 260, 275–284 (2013). pmid:22926163
  67. 67. Wan G. et al. Applying eye tracking to identify autism spectrum disorder in children. Journal of autism and developmental disorders 49, 209–215 (2019). pmid:30097760
  68. 68. Jiang M. et al. Classifying individuals with ASD through facial emotion recognition and eye-tracking in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 6063–6068 (IEEE) (Year).
  69. 69. Zhao Z. et al. Classification of Children With Autism and Typical Development Using Eye-Tracking Data From Face-to-Face Conversations: Machine Learning Model Development and Performance Evaluation. J Med Internet Res 23, e29328, (2021). pmid:34435957
  70. 70. Carette R. et al. Learning to Predict Autism Spectrum Disorder based on the Visual Patterns of Eye-tracking Scanpaths in HEALTHINF. 103–112 (Year).
  71. 71. Elbattah M., Carette R., Dequen G., Guérin J.-L. & Cilia F. Learning clusters in autism spectrum disorder: Image-based clustering of eye-tracking scanpaths with deep autoencoder in 2019 41st Annual international conference of the IEEE engineering in medicine and biology society (EMBC). 1417–1420 (IEEE) (Year).
  72. 72. Akter T., Ali M. H., Khan M. I., Satu M. S. & Moni M. A. Machine learning model to predict autism investigating eye-tracking dataset in 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST). 383–387 (IEEE) (Year).
  73. 73. Cilia F. et al. Computer-aided screening of autism spectrum disorder: eye-tracking study using data visualization and deep learning. JMIR Human Factors 8, e27706 (2021). pmid:34694238
  74. 74. Kanhirakadavath M. R. & Chandran M. S. M. Investigation of Eye-Tracking Scan Path as a Biomarker for Autism Screening Using Machine Learning Algorithms. Diagnostics 12, 518 (2022). pmid:35204608
  75. 75. Gaspar A., Oliva D., Hinojosa S., Aranguren I. & Zaldivar D. An optimized Kernel Extreme Learning Machine for the classification of the autism spectrum disorder by using gaze tracking images. Applied Soft Computing 120, 108654 (2022).
  76. 76. Ahmed I. A. et al. Eye Tracking-Based Diagnosis and Early Detection of Autism Spectrum Disorder Using Machine Learning and Deep Learning Techniques. Electronics 11, 530 (2022).
  77. 77. Fan L. et al. Screening of Autism Spectrum Disorder Using Novel Biological Motion Stimuli. 371–384 (Springer Singapore) (Year).
  78. 78. Fang H., Fan L. & Hwang J.-N. Auxiliary Diagnostic Method for Early Autism Spectrum Disorder Based on Eye Movement Data Analysis in 2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS). 72–77 (IEEE) (Year).
  79. 79. Carette R. et al. Automatic autism spectrum disorder detection thanks to eye-tracking and neural network-based approach in Internet of Things (IoT) Technologies for HealthCare: 4th International Conference, HealthyIoT 2017, Angers, France, October 24–25, 2017, Proceedings 4. 75–81 (Springer) (Year).
  80. 80. Putra P. U., Shima K., Alvarez S. A. & Shimatani K. Identifying autism spectrum disorder symptoms using response and gaze behavior during the Go/NoGo game CatChicken. Scientific reports 11, 1–12 (2021).
  81. 81. Kou J. et al. Comparison of three different eye-tracking tasks for distinguishing autistic from typically developing children and autistic symptom severity. Autism Research 12, 1529–1540, (2019). pmid:31369217
  82. 82. Bacon E. C. et al. Identifying prognostic markers in autism spectrum disorder using eye tracking. Autism 24, 658–669 (2020). pmid:31647314
  83. 83. Treisman A. M. & Gelade G. A feature-integration theory of attention. Cognitive psychology 12, 97–136 (1980). pmid:7351125
  84. 84. Kononenko I., Šimec E. & Robnik-Šikonja M. Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence 7, 39–55 (1997).
  85. 85. Association A. P. Diagnostic and statistical manual of mental disorders (DSM-5®). (American Psychiatric Pub, 2013).
  86. 86. Lord C. et al. Autism diagnostic observation schedule, (ADOS-2) modules 1–4. Los Angeles, California: Western Psychological Services (2012).
  87. 87. Olsen A. The Tobii I-VT fixation filter. Tobii Technology 21 (2012).
  88. 88. Wang W., Shen J., Guo F., Cheng M.-M. & Borji A. Revisiting video saliency: A large-scale benchmark and a new model in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4894–4903 (Year).
  89. 89. Huang X., Shen C., Boix X. & Zhao Q. Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks in Proceedings of the IEEE international conference on computer vision. 262–270 (Year).
  90. 90. Vapnik V. N. An overview of statistical learning theory. IEEE transactions on neural networks 10, 988–999 (1999). pmid:18252602
  91. 91. Bradshaw J. et al. The Use of Eye Tracking as a Biomarker of Treatment Outcome in a Pilot Randomized Clinical Trial for Young Children with Autism. Autism Research 12, 779–793, (2019). pmid:30891960
  92. 92. Lai M.-C. & Szatmari P. Sex and gender impacts on the behavioural presentation and recognition of autism. Current Opinion in Psychiatry 33, 117–123, (2020). pmid:31815760
  93. 93. Harrop C. et al. Visual attention to faces in children with autism spectrum disorder: are there sex differences? Molecular Autism 10, 28, (2019). pmid:31297179
  94. 94. Harrop C. et al. Social and Object Attention Is Influenced by Biological Sex and Toy Gender-Congruence in Children With and Without Autism. Autism Research 13, 763–776, (2020). pmid:31799774
  95. 95. Whyte E. M. & Scherf K. S. Gaze Following Is Related to the Broader Autism Phenotype in a Sex-Specific Way: Building the Case for Distinct Male and Female Autism Phenotypes. Clinical Psychological Science 6, 280–287, (2018). pmid:29576931