Improving the workflow to crack Small, Unbalanced, Noisy, but Genuine (SUNG) datasets in bioacoustics: The case of bonobo calls

Published: 2023-04-13
Journal: PLOS Computational Biology (PLoS Comput Biol)
Volume: 19, Issue: 4, Article: e1010325
ISSN: 1553-7358
Language: English

Authors:

Arnaud Vincent, Pellegrino François, Keenan Sumir, St-Gelais Xavier, Mathevon Nicolas, Levréro Florence, Coupé Christophe

Abstract

Despite the accumulation of data and studies, deciphering animal vocal communication remains challenging. In most cases, researchers must deal with the sparse recordings composing Small, Unbalanced, Noisy, but Genuine (SUNG) datasets. SUNG datasets are characterized by a limited number of recordings, most often noisy, and unbalanced in number between the individuals or categories of vocalizations. SUNG datasets therefore offer a valuable but inevitably distorted vision of communication systems. Adopting the best practices in their analysis is essential to effectively extract the available information and draw reliable conclusions. Here we show that the most recent advances in machine learning applied to a SUNG dataset succeed in unraveling the complex vocal repertoire of the bonobo, and we propose a workflow that can be effective with other animal species. We implement acoustic parameterization in three feature spaces and run a Supervised Uniform Manifold Approximation and Projection (S-UMAP) to evaluate how call types and individual signatures cluster in the bonobo acoustic space. We then implement three classification algorithms (Support Vector Machine, xgboost, neural networks) and their combination to explore the structure and variability of bonobo calls, as well as the robustness of the individual signature they encode. We underscore how classification performance is affected by the feature set and identify the most informative features. In addition, we highlight the need to address data leakage in the evaluation of classification performance to avoid misleading interpretations. Our results lead to identifying several practical approaches that are generalizable to any other animal communication system. 
To improve the reliability and replicability of vocal communication studies with SUNG datasets, we thus recommend: i) comparing several acoustic parameterizations; ii) visualizing the dataset with supervised UMAP to examine the species acoustic space; iii) adopting Support Vector Machines as the baseline classification approach; iv) explicitly evaluating data leakage and possibly implementing a mitigation strategy.
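Recommendation iv) can be illustrated with a minimal sketch. It assumes that leakage arises when calls from the same recording session, which share background noise, recording distance, and context, end up in both the training and the test set. A classifier can then score well by recognizing the session rather than the individual. The helper names `group_kfold` and `leaked_groups` and the session labels are hypothetical and do not come from the paper. In practice, scikit-learn's `GroupKFold` provides the same guarantee that no group is shared between training and test folds.

```python
import random
from collections import defaultdict


def group_kfold(groups, n_splits=3, seed=0):
    """Yield (train_idx, test_idx) pairs in which no group spans both sets.

    `groups` holds one label per call (here a hypothetical recording-session
    ID). Whole groups are dealt to folds, largest first, each into the
    currently smallest fold, so fold sizes stay roughly even.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    if len(members) < n_splits:
        raise ValueError("need at least as many groups as folds")
    keys = list(members)
    random.Random(seed).shuffle(keys)  # random order among equal-sized groups
    keys.sort(key=lambda g: len(members[g]), reverse=True)  # stable sort
    folds = [[] for _ in range(n_splits)]
    for g in keys:
        min(folds, key=len).extend(members[g])  # fill the smallest fold
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


def leaked_groups(groups, train_idx, test_idx):
    """Return the group labels present in both the training and test sets."""
    return {groups[i] for i in train_idx} & {groups[i] for i in test_idx}


# Toy example: 10 calls from 4 hypothetical recording sessions.
sessions = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s3", "s4"]

# A naive split that ignores sessions leaks three of them.
naive_test = [0, 3, 5]
naive_train = [i for i in range(len(sessions)) if i not in naive_test]
print(sorted(leaked_groups(sessions, naive_train, naive_test)))  # ['s1', 's2', 's3']

# The grouped split never shares a session between train and test.
for train, test in group_kfold(sessions, n_splits=3):
    print(test, sorted(leaked_groups(sessions, train, test)))
```

The sketch does not stratify by class. With only a few sessions per individual, some individuals may therefore be missing from a training fold. scikit-learn's `StratifiedGroupKFold` (version 1.0 and later) tries to balance classes while still respecting groups.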
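Because SUNG datasets are unbalanced, plain accuracy can look high even when rare call types or individuals are never recognized. The sketch below implements balanced accuracy, defined as the mean of per-class recalls, which is one common metric for this situation. The function names and toy labels are illustrative. scikit-learn computes the same quantity with `sklearn.metrics.balanced_accuracy_score`.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of calls whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: every class counts equally, whatever its size."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)  # number of calls per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / n for c, n in support.items()) / len(support)


# Toy example: 9 calls of a frequent type, 1 call of a rare type,
# and a classifier that always predicts the frequent type.
y_true = ["frequent"] * 9 + ["rare"]
y_pred = ["frequent"] * 10
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this example the classifier ignores the rare class completely. Plain accuracy hides that failure, but balanced accuracy drops to chance level for two classes.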
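The abstract does not specify how the outputs of the SVM, xgboost, and neural-network classifiers are combined. One common option is soft voting: average each classifier's class probabilities for a call and predict the most probable class. This sketch assumes soft voting, which is not necessarily the scheme used in the paper. The function `soft_vote`, the classifier outputs, and the call-type labels are hypothetical toy values, not results from the study.

```python
def soft_vote(prob_tables, weights=None):
    """Combine classifiers by (weighted) averaging of class probabilities.

    prob_tables: one dict per classifier, mapping class label -> probability
    for a single call. Missing labels count as probability 0.
    Returns (predicted_label, averaged_probabilities).
    """
    if not prob_tables:
        raise ValueError("need at least one classifier output")
    if weights is None:
        weights = [1.0] * len(prob_tables)
    if len(weights) != len(prob_tables):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    labels = sorted(set().union(*prob_tables))  # sorted for deterministic ties
    avg = {
        c: sum(w * p.get(c, 0.0) for w, p in zip(weights, prob_tables)) / total
        for c in labels
    }
    best = max(labels, key=avg.get)  # first label in sorted order wins exact ties
    return best, avg


# Toy outputs of three hypothetical classifiers for one call.
svm = {"call_A": 0.60, "call_B": 0.40}
xgb = {"call_A": 0.30, "call_B": 0.70}
mlp = {"call_A": 0.45, "call_B": 0.55}

label, probs = soft_vote([svm, xgb, mlp])
print(label)  # call_B
print({c: round(p, 2) for c, p in probs.items()})  # {'call_A': 0.45, 'call_B': 0.55}
```

Soft voting is meaningful only when the classifiers' probabilities are reasonably calibrated. Any weights should be tuned on validation data that is split in the same leakage-aware way as the test data.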

Funder

Ministère de l’Enseignement Supérieur et de la Recherche

Ecole Doctorale SIS of the University of Saint-Etienne

Université du Québec à Chicoutimi

Université Jean Monnet Saint-Etienne

LABEX ASLAN

Agence Nationale de la Recherche

Institut Universitaire de France

Social Sciences and Humanities Research Council of Canada

Publisher

Public Library of Science (PLoS)

Subject

Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modeling and Simulation; Ecology, Evolution, Behavior and Systematics

Cited by 7 articles.

1. Active few-shot learning for rare bioacoustic feature annotation; Ecological Informatics; 2024-09

2. Bioacoustic classification of a small dataset of mammalian vocalisations using deep learning; Bioacoustics; 2024-07-02

3. Vocal repertoire and individuality in the plains zebra (Equus quagga); Royal Society Open Science; 2024-07

4. Strong individual distinctiveness across the vocal repertoire of a colonial seabird, the little auk, Alle alle; Animal Behaviour; 2024-04

5. Knowing a fellow by their bellow: acoustic individuality in the bellows of the American alligator; Animal Behaviour; 2024-01