Abstract
Automatic speech recognition (ASR) software has been suggested as a candidate model of the human auditory system thanks to recent dramatic improvements in performance. To test this hypothesis, we compared the results of several state-of-the-art ASR systems with those of humans on a battery of standard psychometric experiments. While some systems showed qualitative agreement with humans in certain tests, in others all tested systems diverged markedly from humans. In particular, all systems used spectral invariance, temporal fine structure and speech periodicity differently from humans. We conclude that none of the tested ASR systems can yet act as a strong proxy for human speech recognition. However, we note that the more recent systems with better performance also tend to match human results more closely, suggesting that continued cross-fertilisation of ideas between human and automatic speech recognition may be fruitful. Our open-source toolbox allows researchers to assess future ASR systems or add further psychoacoustic measures.
Publisher
Cold Spring Harbor Laboratory