BSN-ESC: A Big–Small Network-Based Environmental Sound Classification Method for AIoT Applications

Published:2023-07-28 Issue:15 Volume:23 Page:6767
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Peng Lujie¹, Yang Junyu¹, Yan Longke¹, Chen Zhiyi¹, Xiao Jianbiao¹, Zhou Liang¹, Zhou Jun¹

Affiliation:

1. Department of Internet of Things Engineering, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Abstract

In recent years, environmental sound classification (ESC) has become common in artificial intelligence Internet of Things (AIoT) applications, because environmental sound carries a wealth of information that can be used to detect particular events. However, existing ESC methods have high computational complexity and are not suitable for deployment on AIoT devices with constrained computing resources, so a model with both high classification accuracy and low computational complexity is needed. This work proposes a new ESC method named BSN-ESC, which has two parts. The first is a big–small network-based ESC model that assesses the classification difficulty level and adaptively activates a big or a small network for classification. The second is a pre-classification processing technique with logmel spectrogram refining, which prevents distortion in the frequency-domain characteristics of the sound clip at the joint between two adjacent sound clips. Together, these significantly reduce computational complexity while keeping classification accuracy high. The BSN-ESC model is implemented on both a CPU and an FPGA to evaluate its performance on PC and embedded systems using ESC-50, the most commonly used ESC dataset. It achieves the lowest computational complexity, with only 0.123G floating-point operations (FLOPs), a reduction of up to 2309 times compared with state-of-the-art methods, while delivering a high classification accuracy of 89.25%. This work thus makes ESC practical on AIoT devices with constrained computational resources.
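The adaptive big/small activation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the difficulty level is assessed, so this sketch assumes one common choice: the small network's top softmax probability is used as a confidence score, and the big network runs only when that score falls below a threshold. The names `big_small_classify`, `toy_small`, `toy_big`, and the threshold value are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "model" here is any callable that maps a feature vector to class logits.
Model = Callable[[Sequence[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def big_small_classify(
    features: Sequence[float],
    small_net: Model,
    big_net: Model,
    threshold: float = 0.8,  # assumed confidence threshold, not from the paper
) -> Tuple[int, str]:
    """Run the small network first; fall back to the big one on hard inputs.

    Difficulty is approximated by the small network's top softmax
    probability (an assumption made for this sketch).
    """
    probs = softmax(small_net(features))
    confidence = max(probs)
    if confidence >= threshold:
        # Easy input: accept the cheap prediction.
        return probs.index(confidence), "small"
    # Hard input: spend the extra computation on the big network.
    probs = softmax(big_net(features))
    return probs.index(max(probs)), "big"


# Toy stand-ins for trained networks (hypothetical, for illustration only).
def toy_small(features: Sequence[float]) -> List[float]:
    return [2.0 * f for f in features]


def toy_big(features: Sequence[float]) -> List[float]:
    return [10.0 * f for f in features]


easy = [3.0, 0.0, 0.0]  # one class clearly dominates
hard = [0.5, 0.4, 0.0]  # two classes are close together

print(big_small_classify(easy, toy_small, toy_big))
print(big_small_classify(hard, toy_small, toy_big))
```

Under this assumption, the average cost per clip depends on how often the small network is confident enough to answer alone, which is the kind of trade-off the abstract attributes to the big–small design.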

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/15/6767/pdf

References (31 articles)

1. Han (2019). A remote human activity detection system based on partial-fiber LDV and PTZ camera. Opt. Laser Technol.

2. Lv (2018). Double mode surveillance system based on remote audio/video signals acquisition. Appl. Acoust.

3. Weninger, F., and Schuller, B. (2011, January 22–27). Audio recognition in the wild: Static and dynamic classification on a real-world database of animal vocalizations. Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic.

4. Vacher (2007). Sound classification in a smart room environment: An approach using GMM and HMM methods. Proc. Conf. Speech Technol. Hum.-Comput. Dialogue.

5. Anil (2016). Two-stage supervised learning-based method to detect screams and cries in urban environments. IEEE/ACM Trans. Audio Speech Lang. Process.