A Deep Learning Framework for Monitoring Audience Engagement in Online Video Events
Published: 2024-05-21
Volume: 17
Issue: 1
ISSN: 1875-6883
Container-title: International Journal of Computational Intelligence Systems
Short-container-title: Int J Comput Intell Syst
Language: en
Authors: Vrochidis, Alexandros; Dimitriou, Nikolaos; Krinidis, Stelios; Panagiotidis, Savvas; Parcharidis, Stathis; Tzovaras, Dimitrios
Abstract
This paper introduces a deep learning methodology for analyzing audience engagement in online video events. The proposed framework consists of six layers, starting with keyframe extraction from the video stream and detection of the participants' faces. Subsequently, the head pose and emotion of each participant are estimated using the HopeNet and JAA-Net deep architectures. Complementary to the video analysis, the audio signal is processed by a neural network following the DenseNet-121 architecture, which detects events related to audience engagement, including speech, pauses, and applause. By combining the analysis of the video and audio streams, the interest and attention of each participant are inferred more accurately. An experimental evaluation is performed on a newly generated dataset of recordings from online video events, where the proposed framework achieves promising results: F1 scores of 79.21% for pose-based interest estimation, 65.38% for emotion estimation, and 80% for sound event detection. The framework has applications in online educational events, where it can help tutors assess audience engagement and comprehension while highlighting points in their lectures that may require further clarification. It is also useful for video streaming platforms that want to provide video recommendations to online users according to audience engagement.
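The abstract names the building blocks of the six-layer pipeline but gives no implementation details, so the following is a minimal Python sketch of how such layers could be wired together. Every function and class name, the canned model outputs, and the fusion thresholds are illustrative assumptions, not the authors' API; in a real system the stub functions would wrap actual HopeNet, JAA-Net, and DenseNet-121 models.

# Minimal sketch of the six-layer engagement pipeline described in the
# abstract. All names below are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Participant:
    face_box: tuple            # (x, y, w, h) from the face-detection layer
    yaw: float = 0.0           # head-pose angles a HopeNet-style model outputs
    pitch: float = 0.0
    emotion: str = "neutral"   # label a JAA-Net-style model outputs

def extract_keyframes(stream, every_n=30):
    """Layer 1: keep every n-th frame of the video stream (stub)."""
    return stream[::every_n]

def detect_faces(frame):
    """Layer 2: stub face detector; returns one dummy box per frame."""
    return [Participant(face_box=(0, 0, 64, 64))]

def estimate_pose(p):
    """Layer 3: stand-in for HopeNet inference (canned angles)."""
    p.yaw, p.pitch = 5.0, -3.0
    return p

def estimate_emotion(p):
    """Layer 4: stand-in for JAA-Net inference (canned label)."""
    p.emotion = "interested"
    return p

def detect_audio_events(audio_chunk):
    """Layer 5: stand-in for the DenseNet-121 sound-event classifier."""
    return "speech"  # e.g. one of {"speech", "pause", "applause"}

def fuse(p, audio_event, yaw_limit=30.0, pitch_limit=20.0):
    """Layer 6: naive fusion rule (an assumption, not the paper's):
    a participant counts as engaged if the head roughly faces the
    camera and the audio context is not a pause."""
    facing = abs(p.yaw) < yaw_limit and abs(p.pitch) < pitch_limit
    return facing and audio_event != "pause"

if __name__ == "__main__":
    frames = extract_keyframes(list(range(300)))
    for frame in frames[:1]:
        for person in detect_faces(frame):
            person = estimate_emotion(estimate_pose(person))
            engaged = fuse(person, detect_audio_events(audio_chunk=None))
            print(person.emotion, engaged)

Running the sketch prints one fused engagement decision per detected face; the real framework would replace each stub with model inference and aggregate decisions per participant over time.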
Publisher
Springer Science and Business Media LLC