A Review of Deep Learning Based Speech Synthesis

Published:2019-09-27 Issue:19 Volume:9 Page:4050
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Ning Yishuang, He Sheng, Wu Zhiyong, Xing Chunxiao, Zhang Liang-Jie

Abstract

Speech synthesis, also known as text-to-speech (TTS), has attracted increasing attention. Recent advances in speech synthesis are driven overwhelmingly by deep learning, including end-to-end techniques, which have been used to enhance a wide range of application scenarios such as intelligent speech interaction, chatbots, and conversational artificial intelligence (AI). For speech synthesis, deep learning based techniques can leverage large numbers of <text, speech> pairs to learn effective feature representations that bridge the gap between text and speech, thus better characterizing the properties of events. To better understand the research dynamics in the speech synthesis field, this paper first introduces the traditional speech synthesis methods and highlights the importance of acoustic modeling in light of the composition of the statistical parametric speech synthesis (SPSS) system. It then gives an overview of the advances in deep learning based speech synthesis, including the end-to-end approaches that have achieved state-of-the-art performance in recent years. Finally, it discusses the problems of deep learning methods for speech synthesis and points out some appealing research directions that could bring speech synthesis research to a new frontier.

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/9/19/4050/pdf

References: 79 articles.

1. Review of text‐to‐speech conversion for English

2. From Text to Speech: The MITalk System; Allen, 1987

3. Emotional stress in synthetic speech: Progress and future directions

4. Festival. http://www.cstr.ed.ac.uk/projects/festival/

Cited by 99 articles.

1. Bio‐Plausible Multimodal Learning with Emerging Neuromorphic Devices; Advanced Science; 2024-09-11

2. Speaker-Attributed Training for Multi-Speaker Speech Recognition Using Multi-Stage Encoders and Attention-Weighted Speaker Embedding; Applied Sciences; 2024-09-10

3. Planning the development of text-to-speech synthesis models and datasets with dynamic deep learning; Journal of King Saud University - Computer and Information Sciences; 2024-09

4. Raspberry-Pi Based Physical Media to Audio Conversion device for Visually Impaired Individuals; International Journal of Scientific Research in Science, Engineering and Technology; 2024-08-29

5. Harnessing AI and NLP Tools for Innovating Brand Name Generation and Evaluation: A Comprehensive Review; Multimodal Technologies and Interaction; 2024-07-01