Analyzing the Effects of Transcription Errors on Summary Generation of Bengali Spoken Documents

Author:

Chowdhury Priyanjana (1), Sarkar Nabanika (1), Nath Sanghamitra (2), Sharma Utpal (3)

Affiliation:

1. Computer Science and Engineering, Tezpur University School of Engineering, Napaam, India

2. Computer Science and Engineering, Tezpur University, Napaam, India

3. Computer Science & Engineering, Tezpur University, Tezpur, India

Abstract

Automatic speech recognition (ASR) has become an indispensable part of the AI domain, with many speech technologies relying on it. The quality of speech recognition depends, among other factors, on the amount of annotated data used to train the ASR system. For a low-resource language this is a severe constraint, and ASR quality is therefore often poor. Humans can read through text containing ASR errors, provided the context of the sentence is preserved. In transcripts generated by ASR systems for low-resource languages, however, multiple important words are often misrecognized and the context is mostly lost, so making sense of such text becomes nearly impossible. This article analyzes the types of transcription errors that occur when generating ASR transcripts of spoken documents in Bengali, an under-resourced language predominantly spoken in India and Bangladesh. The transcripts of the Bengali spoken documents are generated using the Google Cloud Speech ASR. The article also explores whether such transcription errors affect the generation of speech summaries of these spoken documents. Summarization is carried out extractively: sentences are selected from the ASR-generated text of the spoken document. Speech summaries are then created by concatenating, from the original speech, the speech segments corresponding to the selected sentences. Subjective evaluation shows that the “readability” of the spoken summaries is not degraded by ASR errors, but their quality is affected by the reliance on an intermediate text summary that contains transcription errors.
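
The extract-then-concatenate pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say how sentences are scored, so the average term-frequency score used here is an assumption chosen only for brevity, and the names `Segment`, `select_segments`, and `cut_list` are hypothetical. The sketch assumes each ASR sentence comes with start and end timestamps in the original audio.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    text: str     # ASR transcript of one sentence (may contain recognition errors)
    start: float  # start time in the original audio, in seconds
    end: float    # end time in the original audio, in seconds


def select_segments(segments, k):
    """Pick the k highest-scoring sentences, returned in original time order.

    Scoring is an ASSUMPTION (average corpus term frequency of the words in
    the sentence); the paper's actual scoring method is not given here.
    """
    freq = Counter(w for s in segments for w in s.text.split())

    def score(s):
        words = s.text.split()
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(segments)),
                    key=lambda i: score(segments[i]),
                    reverse=True)
    return [segments[i] for i in sorted(ranked[:k])]


def cut_list(selected):
    """Time spans to cut from the ORIGINAL speech and concatenate."""
    return [(s.start, s.end) for s in selected]


if __name__ == "__main__":
    # Toy transcript with placeholder tokens instead of real Bengali text.
    transcript = [
        Segment("a b c", 0.0, 2.0),
        Segment("a a b", 2.0, 4.5),
        Segment("x y z", 4.5, 6.0),
        Segment("a b b", 6.0, 8.0),
    ]
    print(cut_list(select_segments(transcript, 2)))
```

Because selection operates on the transcript while playback uses the original audio, a misrecognized word cannot lower the intelligibility of the spoken summary, but it can change which sentences are chosen. This is consistent with the abstract's finding that "readability" is preserved while summary quality is affected.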

Publisher

Association for Computing Machinery (ACM)

