Audio Processing and Speech Recognition - Nilanjan Dey, Anjan Dutta, Soumya Sen

Audio Processing and Speech Recognition (eBook)

Concepts, Techniques and Research Overviews
eBook Download: PDF
2019 | 1st ed. 2019
XIV, 96 pages
Springer Singapore (publisher)
978-981-13-6098-5 (ISBN)
53,49 incl. VAT
  • Download available immediately
This book offers an overview of audio processing, including the latest advances in the methodologies used in audio processing and speech recognition. First, it discusses the importance of audio indexing and the classical information retrieval problem, and presents two major indexing techniques, namely Large Vocabulary Continuous Speech Recognition (LVCSR) and Phonetic Search. It then offers brief insights into the human speech production system and its modeling, which are required to produce artificial speech. It also discusses the various components of an automatic speech recognition (ASR) system.

Describing the chronological developments in ASR systems, and briefly examining the statistical models used in ASR as well as the related mathematical deductions, the book summarizes a number of state-of-the-art classification techniques and their application in audio/speech classification.

By providing insights into various aspects of audio/speech processing and speech recognition, this book appeals to a wide audience, from researchers and postgraduate students to those new to the field.





Soumya Sen is an Assistant Professor at A. K. Choudhury School of Information Technology, University of Calcutta. He received his Ph.D. (Tech) degree from the Department of Computer Science and Engineering at the same university in 2016. Before joining A. K. Choudhury School of Information Technology, he worked at IBM India Pvt. Ltd and RS Software. His industrial expertise includes ERP and data warehousing. His current research interests are data warehousing and OLAP tools, data mining, big data, service engineering, distributed databases, and machine learning. He has published 1 book and 70 research papers in peer-reviewed journals and international conferences, and has registered 3 patents in the USA, Japan and South Korea. Dr. Sen is a PC member and reviewer for numerous international conferences.

Anjan Dutta was born in Kolkata, India, in 1986. He received his B.Tech degree in Information Technology from West Bengal University of Technology in 2008 and his M.Tech in Information Technology from Calcutta University in 2011. He worked at IXIA Technologies LTD and TATA Consultancy Services Ltd. (TCSL) over a period of 6 years. Initially he worked as a protocol developer at IXIA Technologies LTD on 3GPP wireless protocols. Thereafter he worked as an IT Analyst at TATA Consultancy Services Ltd. (TCSL) from July 2011 to July 2017. He is now employed as an Assistant Professor in the Department of Information Technology, Techno India College of Technology, India. He is an active researcher in the fields of big data, data mining, audio processing, and audio classification.

Nilanjan Dey was born in Kolkata, India, in 1984. He received his B.Tech. degree in Information Technology from West Bengal University of Technology in 2005, his M.Tech. in Information Technology from the same university in 2011, and his Ph.D. in digital image processing from Jadavpur University, India, in 2015. In 2011, he was appointed as an Assistant Professor in the Department of Information Technology at JIS College of Engineering, Kalyani, India, followed by Bengal College of Engineering College, Durgapur, India, in 2014. He is now employed as an Assistant Professor in the Department of Information Technology, Techno India College of Technology, India. He is a visiting fellow of the University of Reading, UK. His research topics are signal processing, machine learning, and information security. Dr. Dey is an Associate Editor of IEEE ACCESS and is currently the Editor-in-Chief of the International Journal of Ambient Computing and Intelligence. He is also Series Co-editor of Advances in Ubiquitous Sensing Applications for Healthcare (AUSAH), Elsevier, and Springer Tracts in Nature-Inspired Computing (STNIC).


Preface
Objective of the Book
Organization of the Book
Chapter 1: Audio Indexing
Chapter 2: Speech Processing and Recognition System
Chapter 3: Feature Extraction
Chapter 4: Audio Classification
Contents
About the Authors
1 Audio Indexing
1.1 Introduction
1.2 Audio Indexing and Classic Information Retrieval Problem
1.3 Large Vocabulary Continuous Speech Recognition (LVCSR)
1.3.1 Recognition Errors and Vocabulary Limitations
1.3.2 The Out-of-Vocabulary Problem
1.3.3 Pros and Cons of LVCSR Speech Analytics
1.4 Phonetic Search
1.4.1 Phases of Phonetic Search
1.4.2 Pros and Cons of Phonetic Search
1.5 Comparison Between LVCSR and Phonetic Search
1.6 Summary
References
2 Speech Processing and Recognition System
2.1 Introduction
2.2 Human Speech Production System
2.2.1 Speech Generation
2.2.2 Speech Perception
2.2.3 Voiced and Unvoiced Speech
2.2.4 Model of Human Speech
2.3 Automatic Speech Recognition System
2.3.1 History of ASR
2.3.2 Structure of an ASR System
2.3.3 Neural Network and Speech Recognition System
2.3.4 Pronunciation Model
2.3.5 Language Model
2.3.6 Central Decoder
2.4 Summary
References
3 Feature Extraction
3.1 Introduction
3.2 Basic Audio Features
3.2.1 Pitch
3.2.2 Timbral Features
3.2.3 Rhythmic Features
3.2.4 Inharmonicity
3.2.5 Autocorrelation
3.2.6 Other Features
3.2.7 MPEG-7 Features
3.3 Feature Extraction Techniques
3.3.1 Linear Prediction Coding (LPC)
3.3.2 Mel-Frequency Cepstral Coefficient (MFCC)
3.3.3 Perceptual Linear Prediction (PLP)
3.3.4 Discrete Wavelet Transform (DWT)
3.4 Summary
References
4 Audio Classification
4.1 Introduction
4.2 Classification Strategies
4.2.1 k-Nearest Neighbors (k-NN)
4.2.2 Naïve Bayes (NB) Classifier
4.2.3 Decision Tree and Speech Classification
4.2.4 Support Vector Machine (SVM) and Speech Classification
4.3 Neural Network in Speech Classification
4.4 Deep Neural Network in Speech Recognition and Classification
4.5 Summary
References
5 Conclusion

Publication date (per publisher): 30 January 2019
Series: SpringerBriefs in Applied Sciences and Technology
SpringerBriefs in Computational Intelligence
Additional info: XIV, 96 p. 41 illus., 3 illus. in color.
Place of publication: Singapore
Language: English
Subject areas: Computer Science > Software Development > User Interfaces (HCI)
Computer Science > Theory / Studies > Artificial Intelligence / Robotics
Technology > Electrical Engineering / Energy Technology
Keywords: audio indexing • Automatic speech recognition • Bayesian classifier • Classic Information Retrieval problem • Decision Tree • Discrete Wavelet Transform (DWT) • feature extraction • Hidden Markov Model (HMM) • k-nearest neighbors • Large vocabulary continuous speech recognition • Linear prediction Cepstral Coefficient • Linear prediction coding • Mel frequency Cepstral Coefficient • Neural network and speech recognition • Perceptual Linear Prediction (PLP) • Phonetic Search • Support Vector Machines (SVM) • Wavelet Packet Decomposition (WPD)
ISBN-10: 981-13-6098-7 / 9811360987
ISBN-13: 978-981-13-6098-5 / 9789811360985
PDF (watermark)
Size: 2.3 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly suitable for technical books with columns, tables, and figures. A PDF can be displayed on almost any device, but is of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Additional feature: Read online
In addition to downloading it, you can also read this eBook online in your web browser.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
