Analyzing the Effects of Transcription Errors on Summary Generation of Bengali Spoken Documents

Published:2024-08-16 Issue:9 Volume:23 Page:1-28
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Chowdhury Priyanjana¹, Sarkar Nabanika¹, Nath Sanghamitra², Sharma Utpal³

Affiliation:

1. Computer Science and Engineering, Tezpur University School of Engineering, Napaam, India

2. Computer Science and Engineering, Tezpur University, Napaam, India

3. Computer Science & Engineering, Tezpur University, Tezpur, India

Abstract

Automatic speech recognition (ASR) has become an indispensable part of the AI domain, with various speech technologies reliant on it. The quality of speech recognition depends on the amount of annotated data used to train an ASR system, among other factors. For a low-resourced language, this is a severe constraint, and thus ASR quality is often poor. Humans can read through text containing ASR errors, provided the context of the sentence is preserved. Yet in transcripts generated by ASR systems for low-resource languages, multiple important words are misrecognized and the context is mostly lost; discerning such a text becomes nearly impossible. This article analyzes the types of transcription errors that occur while generating ASR transcripts of spoken documents in Bengali, an under-resourced language predominantly spoken in India and Bangladesh. The transcripts of the Bengali spoken documents are generated using the ASR of Google Cloud Speech. The article also explores whether such transcription errors affect the generation of speech summaries of these spoken documents. Summarization is carried out extractively: sentences are selected from the ASR-generated text of the spoken document. Speech summaries are created by aggregating, from the original speech, the speech segments of the selected sentences. Subjective evaluation shows that the “readability” of the spoken summaries is not degraded by ASR errors, but the quality is affected owing to the reliance on an intermediate text summary containing transcription errors.
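The extract-then-aggregate pipeline described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Segment` record, the average-word-frequency scorer, and the function names are hypothetical and are not taken from the article, which does not specify its sentence-scoring method in this abstract. The sketch only shows the general idea of selecting sentences from the (possibly erroneous) ASR text and then returning the time spans of the original audio that would be concatenated into a spoken summary.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    text: str     # ASR transcript of one sentence (may contain errors)
    start: float  # start time in the original audio, in seconds
    end: float    # end time in the original audio, in seconds


def select_segments(segments, k):
    """Pick the k highest-scoring sentences and return them in spoken order."""
    # Word frequencies over the whole transcript.
    # Hypothetical scorer: a stand-in, not the article's method.
    freq = Counter(w for s in segments for w in s.text.split())

    def score(seg):
        words = seg.text.split()
        # Average word frequency; empty sentences score zero.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(segments)),
                    key=lambda i: score(segments[i]),
                    reverse=True)
    chosen = sorted(ranked[:k])  # restore the original spoken order
    return [segments[i] for i in chosen]


def summary_spans(segments, k):
    """Time spans to cut from the original audio and concatenate."""
    return [(s.start, s.end) for s in select_segments(segments, k)]
```

Because the returned spans index the original recording, a listener would hear the original speech rather than the ASR text. This is in line with the abstract's observation that transcription errors influence which sentences are chosen, not how the resulting spoken summary sounds.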

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3678005
