Affiliation:
1. St. Francis College, India
Abstract
Speech recognition software gives people with hearing difficulties an alternative way to communicate. This work addresses audio-visual speech recognition to improve lip-reading comprehension. Audio speech recognition is the process of turning spoken words into text. The neural network model is trained on the LibriSpeech dataset. The input audio signal is split into frames with a stride of 10 milliseconds and a window size of 20-25 milliseconds. The model takes audio as input, and a feature-extraction stage derives informative features from these frames. A visual speech recognition system automatically recognizes spoken words by observing how the speaker's lips move. The proposed approach also considers body language to understand the speaker's words, increasing interpretation accuracy by 5.05%.
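The framing step described in the abstract can be illustrated with a minimal sketch. The paper's actual implementation is not given here, so the function name `frame_signal`, the 25 ms default window, and the choice to drop the incomplete trailing frame are assumptions made for illustration. The 16 kHz sample rate matches LibriSpeech audio.

```python
def frame_signal(samples, sample_rate, window_ms=25.0, stride_ms=10.0):
    """Split a 1-D sequence of audio samples into overlapping frames.

    Hypothetical helper: window and stride are given in milliseconds
    and converted to sample counts using the sample rate.
    """
    window = int(round(sample_rate * window_ms / 1000.0))
    stride = int(round(sample_rate * stride_ms / 1000.0))
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be at least one sample")

    frames = []
    start = 0
    # Assumption: a trailing partial frame is discarded rather than padded.
    while start + window <= len(samples):
        frames.append(samples[start:start + window])
        start += stride
    return frames


# One second of silent audio at 16 kHz (the LibriSpeech sample rate).
signal = [0.0] * 16000
frames = frame_signal(signal, 16000)
print(len(frames), len(frames[0]))  # 98 400
```

With a 25 ms window and a 10 ms stride, consecutive frames overlap by 15 ms, so each frame shares most of its samples with its neighbors. Features for the recognizer would then be computed per frame.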
Cited by
1 article.