The Intelligibility Benefits of Modern Computer-Synthesized Speech for Normal-Hearing and Hearing-Impaired Listeners in Non-Ideal Listening Conditions

Published:2024-04-18 Issue:1 Volume:5 Page:5
ISSN:2504-463X
Container-title:Journal of Otorhinolaryngology, Hearing and Balance Medicine
language:en
Short-container-title:JOHBM

Author:

Ma Yizhen¹, Tang Yan²,³

Affiliation:

1. Department of Linguistics, University of Rochester, Rochester, NY 14627, USA

2. Department of Linguistics, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA

3. Beckman Institute for Advanced Science and Technology, Urbana, IL 61801, USA

Abstract

Speech intelligibility is a concern for public health, especially in non-ideal listening conditions where listeners often listen to the target speech in the presence of background noise. With advances in technology, synthetic speech has been increasingly used in lieu of actual human voices in human–machine interfaces, such as public announcement systems, answering machines, virtual personal assistants, and GPS, to interact with users. However, previous studies showed that speech generated by computer speech synthesizers was often intrinsically less natural and intelligible than natural speech produced by human speakers. In noise, listening to synthetic speech is challenging for listeners with normal hearing (NH), not to mention for hearing-impaired (HI) listeners. Recent developments in speech synthesis have significantly improved the naturalness of synthetic speech. In this study, the intelligibility of speech generated by commercial synthesizers from Google, Amazon, and Microsoft was evaluated by both NH and HI listeners in different noise conditions. Compared to a natural female voice as the baseline, listeners’ performance suggested that some of the synthetic speech was significantly more intelligible for the NH cohort, even in rather adverse listening conditions. Further acoustical analyses revealed that elongated vowel sounds and reduced spectral tilt were primarily responsible for improved intelligibility for NH listeners, but not for HI listeners, owing to their impairment at high frequencies and possible cognitive decline associated with aging.

Publisher

MDPI AG

Link

https://www.mdpi.com/2504-463X/5/1/5/pdf
