Syllable-rate-adjusted-modulation (SRAM) predicts clear and conversational speech intelligibility

Published: 2024-02-12 Volume: 18
ISSN: 1662-5161
Container-title: Frontiers in Human Neuroscience
Short-container-title: Front. Hum. Neurosci.

Author:

Yang Ye, Zeng Fan-Gang

Abstract

Introduction: Objectively predicting speech intelligibility is important in both telecommunication and human-machine interaction systems. The classic method relies on signal-to-noise ratios (SNR) to successfully predict speech intelligibility. One exception is clear speech, in which a talker intentionally articulates as if speaking to someone who has hearing loss or is from a different language background. As a result, at the same SNR, clear speech produces higher intelligibility than conversational speech. Despite numerous efforts, no objective metric can successfully predict the clear speech benefit at the sentence level.

Methods: We proposed a Syllable-Rate-Adjusted-Modulation (SRAM) index to predict the intelligibility of clear and conversational speech. SRAM used speech segments as short as 1 s and estimated their modulation power above the syllable rate. We compared SRAM with three reference metrics, the envelope-regression-based speech transmission index (ER-STI), the hearing-aid speech perception index version 2 (HASPI-v2), and short-time objective intelligibility (STOI), and with five automatic speech recognition systems: Amazon Transcribe, Microsoft Azure Speech-To-Text, Google Speech-To-Text, wav2vec2, and Whisper.

Results: SRAM outperformed the three reference metrics (ER-STI, HASPI-v2, and STOI) and the five automatic speech recognition systems. Additionally, we demonstrated the important role of syllable rate in predicting speech intelligibility by comparing SRAM with the total modulation power (TMP), which was not adjusted by the syllable rate.

Discussion: SRAM can potentially help understand the characteristics of clear speech, screen speech materials with high intelligibility, and convert conversational speech into clear speech.
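The abstract describes SRAM only as the modulation power of a short speech segment above the syllable rate; it does not give the envelope extraction, frequency limits, or normalization. The following is therefore a rough sketch of that general idea, not the authors' implementation. The function names, the frame-averaged envelope, the 100 Hz envelope rate, and the 32 Hz upper limit are all illustrative assumptions.

```python
import math


def envelope(signal, fs, env_fs=100):
    """Crude temporal envelope: mean absolute amplitude per frame.

    Assumption: the paper's envelope extraction is not given in the
    abstract, so a simple rectify-and-average scheme is used here.
    """
    hop = int(fs // env_fs)
    n_frames = len(signal) // hop
    return [
        sum(abs(x) for x in signal[i * hop:(i + 1) * hop]) / hop
        for i in range(n_frames)
    ]


def modulation_power(env, env_fs, f_lo, f_hi):
    """Sum DFT power of the mean-normalized envelope for f_lo < f <= f_hi."""
    n = len(env)
    mean = sum(env) / n if n else 0.0
    if mean == 0:
        return 0.0  # silent input carries no modulation
    norm = [(e - mean) / mean for e in env]
    total = 0.0
    for k in range(1, n // 2 + 1):
        f = k * env_fs / n  # modulation frequency of DFT bin k, in Hz
        if f_lo < f <= f_hi:
            re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(norm))
            im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(norm))
            total += (re * re + im * im) / (n * n)
    return total


def sram_like_index(signal, fs, syllable_rate_hz, f_max=32.0, env_fs=100):
    """Hypothetical SRAM-style index: modulation power above the syllable rate.

    Setting syllable_rate_hz to 0 gives an unadjusted total, loosely
    analogous to the total modulation power (TMP) mentioned in the abstract.
    """
    env = envelope(signal, fs, env_fs)
    return modulation_power(env, env_fs, syllable_rate_hz, f_max)


if __name__ == "__main__":
    fs = 8000
    # 1 s test signal: 1 kHz tone, amplitude-modulated at 8 Hz.
    sig = [
        (1 + 0.8 * math.cos(2 * math.pi * 8 * i / fs))
        * math.sin(2 * math.pi * 1000 * i / fs)
        for i in range(fs)
    ]
    print(sram_like_index(sig, fs, syllable_rate_hz=4.0))
    print(sram_like_index(sig, fs, syllable_rate_hz=10.0))
```

In this toy signal the 8 Hz modulation counts toward the index when the assumed syllable rate is 4 Hz and is excluded when it is 10 Hz, which illustrates why the same envelope can score differently once the syllable rate is taken into account.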

Publisher

Frontiers Media SA

Cited by 1 article.

1. Assessing Speech Intelligibility and Severity Level in Parkinson's Disease Using Wav2Vec 2.0;2024 47th International Conference on Telecommunications and Signal Processing (TSP);2024-07-10