Neural Text-to-Speech Synthesis -  Xu Tan

Neural Text-to-Speech Synthesis (eBook)

(Autor)

eBook Download: PDF
2023 | 1. Auflage
XXV, 201 Seiten
Springer Nature Singapore (Verlag)
978-981-99-0827-1 (ISBN)
Systemvoraussetzungen
149,79 inkl. MwSt
  • Download sofort lieferbar
  • Zahlungsarten anzeigen

Text-to-speech (TTS) aims to synthesize intelligible and natural speech based on the given text. It is a hot topic in language, speech, and machine learning research and has broad applications in industry. This book introduces neural network-based TTS in the era of deep learning, aiming to provide a good understanding of neural TTS, current research and applications, and the future research trend.

This book first introduces the history of TTS technologies and overviews neural TTS, and provides preliminary knowledge on language and speech processing, neural networks and deep learning, and deep generative models. It then introduces neural TTS from the perspective of key components (text analyses, acoustic models, vocoders, and end-to-end models) and advanced topics (expressive and controllable, robust, model-efficient, and data-efficient TTS). It also points some future research directions and collects some resources related to TTS.

This book is the first to introduce neural TTS in a comprehensive and easy-to-understand way and can serve both academic researchers and industry practitioners working on TTS.




Xu Tan is a Principal Researcher and Research Manager at Microsoft Research Asia. His research interests cover deep learning and its applications in language/speech/music processing and digital human creation. He has rich research experience in text-to-speech synthesis. He has developed high-quality TTS systems such as FastSpeech 1/2 (widely used in the TTS community), DelightfulTTS (winning the champion of the Blizzard TTS Challenge), and NaturalSpeech (achieving human-level quality on the TTS benchmark dataset), and transferred many research works to improve the experience of Microsoft Azure TTS services. He has given a series of tutorials on TTS at top conferences such as IJCAI, ICASSP, and INTERSPEECH, and written a comprehensive survey paper on TTS.

Besides speech synthesis, he has designed several popular language models (e.g., MASS) and AI music systems (e.g., Muzic), developed machine translation systems that achieved human parity in Chinese-English translation and won several champions in WMT machine translation competitions. He has published over 100 papers at prestigious conferences such as ICML, NeurIPS, ICLR, AAAI, IJCAI, ACL, EMNLP, NAACL, ICASSP, INTERSPEECH, KDD, and IEEE/ACM Transactions, and served as the area chair or action editor of some AI conferences and journals (e.g., NeurIPS, AAAI, ICASSP, TMLR).



Text-to-speech (TTS) aims to synthesize intelligible and natural speech based on the given text. It is a hot topic in language, speech, and machine learning research and has broad applications in industry. This book introduces neural network-based TTS in the era of deep learning, aiming to provide a good understanding of neural TTS, current research and applications, and the future research trend.This book first introduces the history of TTS technologies and overviews neural TTS, and provides preliminary knowledge on language and speech processing, neural networks and deep learning, and deep generative models. It then introduces neural TTS from the perspective of key components (text analyses, acoustic models, vocoders, and end-to-end models) and advanced topics (expressive and controllable, robust, model-efficient, and data-efficient TTS). It also points some future research directions and collects some resources related to TTS.This book is the first to introduceneural TTS in a comprehensive and easy-to-understand way and can serve both academic researchers and industry practitioners working on TTS.
Erscheint lt. Verlag 29.5.2023
Reihe/Serie Artificial Intelligence: Foundations, Theory, and Algorithms
Zusatzinfo XXV, 201 p. 24 illus. in color.
Sprache englisch
Themenwelt Informatik Theorie / Studium Künstliche Intelligenz / Robotik
Technik Elektrotechnik / Energietechnik
Schlagworte Acoustic Model • Artificial Intelligence • Deep learning • End-to-End TTS • language and speech processing • Neural Network based TTS • Speech Synthesis • text to speech • Vocoder
ISBN-10 981-99-0827-2 / 9819908272
ISBN-13 978-981-99-0827-1 / 9789819908271
Haben Sie eine Frage zum Produkt?
PDFPDF (Wasserzeichen)
Größe: 9,3 MB

DRM: Digitales Wasserzeichen
Dieses eBook enthält ein digitales Wasser­zeichen und ist damit für Sie persona­lisiert. Bei einer missbräuch­lichen Weiter­gabe des eBooks an Dritte ist eine Rück­ver­folgung an die Quelle möglich.

Dateiformat: PDF (Portable Document Format)
Mit einem festen Seiten­layout eignet sich die PDF besonders für Fach­bücher mit Spalten, Tabellen und Abbild­ungen. Eine PDF kann auf fast allen Geräten ange­zeigt werden, ist aber für kleine Displays (Smart­phone, eReader) nur einge­schränkt geeignet.

Systemvoraussetzungen:
PC/Mac: Mit einem PC oder Mac können Sie dieses eBook lesen. Sie benötigen dafür einen PDF-Viewer - z.B. den Adobe Reader oder Adobe Digital Editions.
eReader: Dieses eBook kann mit (fast) allen eBook-Readern gelesen werden. Mit dem amazon-Kindle ist es aber nicht kompatibel.
Smartphone/Tablet: Egal ob Apple oder Android, dieses eBook können Sie lesen. Sie benötigen dafür einen PDF-Viewer - z.B. die kostenlose Adobe Digital Editions-App.

Buying eBooks from abroad
For tax law reasons we can sell eBooks just within Germany and Switzerland. Regrettably we cannot fulfill eBook-orders from other countries.

Mehr entdecken
aus dem Bereich
der Praxis-Guide für Künstliche Intelligenz in Unternehmen - Chancen …

von Thomas R. Köhler; Julia Finkeissen

eBook Download (2024)
Campus Verlag
38,99
Wie du KI richtig nutzt - schreiben, recherchieren, Bilder erstellen, …

von Rainer Hattenhauer

eBook Download (2023)
Rheinwerk Computing (Verlag)
24,90