Turkish Natural Language Processing (eBook)

eBook Download: PDF
2018 | 1st ed. 2018
XV, 355 pages
Springer International Publishing (publisher)
978-3-319-90165-7 (ISBN)

96,29 incl. VAT
  • Download available immediately
This book brings together work on Turkish natural language and speech processing over the last 25 years, covering numerous fundamental tasks ranging from morphological processing and language modeling, to full-fledged deep parsing and machine translation, as well as computational resources developed along the way to enable most of this work. Owing to its complex morphology and free constituent order, Turkish has proved to be a fascinating language for natural language and speech processing research and applications.

After an overview of the aspects of Turkish that make it challenging for natural language and speech processing tasks, this book discusses in detail the main tasks and applications of Turkish natural language and speech processing. A compendium of the work on Turkish natural language and speech processing, it is a valuable reference for new researchers considering computational work on Turkish, as well as a one-stop resource for commercial and research institutions planning to develop applications for Turkish. It also serves as a blueprint for similar work on other Turkic languages such as Azeri, Turkmen and Uzbek.



Kemal Oflazer received his Ph.D. in computer science from Carnegie Mellon University in Pittsburgh, USA, and his M.Sc. in computer science and B.Sc. in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey. He is currently a faculty member at Carnegie Mellon University in Doha, Qatar, where he is also the Associate Dean for Research. He has held visiting positions at the Computing Research Laboratory at New Mexico State University, Las Cruces, USA and at the Language Technologies Institute, Carnegie Mellon University. Prior to joining CMU-Qatar, he worked at Sabancı University in Istanbul, Turkey (2000-2008) and Bilkent University in Ankara, Turkey (1989-2000). He has worked extensively on developing natural language processing techniques and resources for Turkish. Oflazer's current research interests include statistical machine translation into morphologically complex languages, the use of NLP for language learning and machine learning for computational morphology. In addition, he was a member of the editorial boards of Computational Linguistics, the Journal of Artificial Intelligence Research, Machine Translation, and Research on Language and Computation and was a book review editor for Natural Language Engineering. He was a member of the nomination and advisory boards for EACL, and served as the program co-chair for ACL 2005, an area chair for COLING 2000, EACL 2003, ACL 2004, ACL 2012, and EMNLP 2013 and the organization committee co-chair for EMNLP 2014. Currently, he is an editorial board member of both Language Resources and Evaluation and Natural Language Engineering journals and is a member of the advisory board for 'SpringerBriefs in Natural Language Processing'.

Murat Saraçlar received his B.Sc. degree in 1994 from the Electrical and Electronics Engineering Department at Bilkent University, Ankara, Turkey, his M.S.E. degree in 1997 and Ph.D. degree in 2001 from the Electrical and Computer Engineering Department at the Johns Hopkins University, Baltimore, USA. From 2000 to 2005, he was with the multimedia services department at AT&T Labs Research, and in 2005 joined the Electrical and Electronic Engineering Department of Boğaziçi University, Istanbul, Turkey, where he is currently a full professor. He was a visiting research scientist at Google Inc., New York, USA (2011-2012) and an academic visitor at IBM T.J. Watson Research Center (2012-2013). Saraçlar was awarded the AT&T Labs Research Excellence Award in 2002, the Turkish Academy of Sciences Young Scientist (TUBA-GEBIP) Award in 2009, and the IBM Faculty Award in 2010. He has published more than 100 articles in journals and conference proceedings. Furthermore, he served as an associate editor for IEEE Signal Processing Letters (2009-2012) and IEEE Transactions on Audio, Speech, and Language Processing (2012-2016). He was an editorial board member of Language Resources and Evaluation from 2012 to 2016, and is currently an editorial board member of Computer Speech and Language as well as a member of the IEEE Signal Processing Society Speech and Language Technical Committee (2007-2009, 2015-2018).


1 Turkish and its Challenges for Language and Speech Processing
  1.1 Introduction
  1.2 Turkish Morphology
  1.3 Constituent Order and Morphology-Syntax Interface
  1.4 Applications
  1.5 State-of-the-art Tools and Resources for Turkish
  References

2 Morphological Processing for Turkish
  2.1 Introduction
  2.2 Overview of Turkish Morphology
  2.3 Morphophonology and Morphographemics
  2.4 Root Lexicons and Morphotactics
    2.4.1 Representational Convention
    2.4.2 Nominal Morphotactics
    2.4.3 Verbal Morphotactics
    2.4.4 Derivations
    2.4.5 Examples of Morphological Analyses
  2.5 The Architecture of the Turkish Morphological Processor
  2.6 Processing Real Texts
    2.6.1 Acronyms
    2.6.2 Numbers
    2.6.3 Foreign Words
    2.6.4 Unknown Words
  2.7 Multiword Processing
    2.7.1 Lexicalized Collocations
    2.7.2 Semi-lexicalized Collocations
    2.7.3 Non-lexicalized Collocations
  2.8 Conclusions
  References

3 Morphological Disambiguation for Turkish
  3.1 Introduction
  3.2 Challenges
  3.3 Previous Work
    3.3.1 Rule-based Methods
    3.3.2 Learning the Rules
    3.3.3 Models Based on Inflectional Group n-grams
    3.3.4 Discriminative Methods for Disambiguation
  3.4 Discussion
    3.4.1 Data Sets
    3.4.2 Experimental Results
  3.5 Conclusions
  References

4 Language Modeling for Turkish Text and Speech Processing
  Ebru Arısoy and Murat Saraçlar
  4.1 Introduction
  4.2 Language Modeling
  4.3 Challenges in Statistical Language Modeling for Turkish
  4.4 Sub-lexical Units for Statistical Language Modeling
    4.4.1 Linguistic Sub-lexical Units
    4.4.2 Statistical Sub-lexical Units
  4.5 Statistical Language Modeling for Turkish
    4.5.1 Language Modeling with Linguistic Sub-lexical Units
    4.5.2 Statistical Sub-lexical Units – Morphs
  4.6 Discriminative Language Modeling for Turkish
    4.6.1 Discriminative Language Model
    4.6.2 Feature Sets for Turkish DLM
  4.7 Conclusions
  References

5 Turkish Speech Recognition
  Ebru Arısoy and Murat Saraçlar
  5.1 Introduction
  5.2 Foundations of Automatic Speech Recognition
  5.3 Turkish Language Resources for ASR
    5.3.1 Turkish Acoustic and Text Data
    5.3.2 Linguistic Tools Used in Turkish ASR
  5.4 Turkish ASR Systems
    5.4.1 Newspaper Content Transcription System
    5.4.2 Turkish Broadcast News Transcription System
    5.4.3 LVCSR System for Call Center Conversations
  5.5 Conclusions
  References

6 Turkish Named Entity Recognition
  Reyyan Yeniterzi, Gökhan Tür and Kemal Oflazer
  6.1 Introduction
  6.2 NER on Turkish
  6.3 Task Description
    6.3.1 Representation
    6.3.2 Evaluating NER Performance
  6.4 Domain and Datasets
    6.4.1 Formal Texts
    6.4.2 Informal Texts
    6.4.3 Challenges of Informal Texts for NER
  6.5 Preprocessing for NER
    6.5.1 Tokenization
    6.5.2 Morphological Analysis
    6.5.3 Normalization
  6.6 Approaches used in Turkish NER
    6.6.1 Rule-based Approaches
    6.6.2 Hybrid Approaches
    6.6.3 Machine Learning Approaches
  6.7 Conclusions
  References

7 Dependency Parsing of Turkish
  7.1 Introduction
  7.2 Dependency Parsing
  7.3 Morphology and Dependency Relations in Turkish
    7.3.1 Dependency Relations in Turkish
  7.4 An Incremental Data-driven Statistical Dependency Parsing System
    7.4.1 Methodology
    7.4.2 Modeling Turkish
    7.4.3 Evaluation Metrics
  7.5 Related Work
  7.6 Conclusions
  References

8 Wide-coverage parsing, semantics and morphology
  8.1 Introduction
  8.2 Morphology and Semantics
  8.3 Radical Lexicalization and Predicate-Argument Structure of sub-lexical Elements
  8.4 Combinatory Categorial Grammar: CCG
  8.5 The Turkish Categorial Lexicon
    8.5.1 The Lexemic Model
    8.5.2 The Morphemic Model
  8.6 Parsing with Automatically Induced CCG Lexicons
  8.7 Conclusion
  References

9 Deep Parsing of Turkish with Lexical-Functional Grammar
  9.1 Introduction
  9.2 Lexical-Functional Grammar and Xerox Linguistic Environment
  9.3 Inflectional Groups as First-class Syntactic Citizens
  9.4 Previous Work
  9.5 LFG Analyses of Various Linguistic Phenomena
    9.5.1 Noun Phrases
    9.5.2 Adjective Phrases
    9.5.3 Adverbial Phrases
    9.5.4 Postpositional Phrases
    9.5.5 Temporal Phrases
  9.6 Sentential Derivations, Sentences and Free Constituent Order
    9.6.1 Sentential Derivations
    9.6.2 Sentences
    9.6.3 Handling Constituent Order Variations
  9.7 Coordination
  9.8 Valency Alternations
    9.8.1 Causatives
    9.8.2 Passives
  9.9 Non-canonical Objects
  9.10 Evaluation
    9.10.1 Manual Test Sets
    9.10.2 Sentence Test Suite
    9.10.3 Noun Phrase Test Suite
  9.11 Conclusions
  References

10 Statistical Machine Translation and Turkish
  10.1 Introduction
  10.2 Handling Morphology in Statistical Machine Translation
  10.3 The Morpheme Segmentation Approach
    10.3.1 Experiments and Results
    10.3.2 Word Repair
    10.3.3 Sample Translations
    10.3.4 Observations on the Morpheme Segmentation Approach
  10.4 The Syntax-to-Morphology Mapping Approach
    10.4.1 Mapping Source-side Syntax to Target-side Morphology
    10.4.2 Experimental Setup and Results
    10.4.3 Experiments with Constituent Reordering
  10.5 Conclusions
  References

11 Machine Translation Between Turkic Languages
  11.1 Introduction
  11.2 Turkic Languages
    11.2.1 Similarities and Differences of Turkic Languages
  11.3 Machine Translation between Turkic Languages
    11.3.1 Preprocessing
    11.3.2 Morphological Disambiguation
    11.3.3 Morphological Feature Transfer
    11.3.4 Lexical Transfer
    11.3.5 Statistical Disambiguation Module
    11.3.6 Sentence Level Rules
    11.3.7 Morphological Generation
  11.4 Machine Translation Evaluation on Turkic Languages
    11.4.1 Root Matching
    11.4.2 Feasible Suffix Pairs
  11.5 Conclusions
  References

12 Sentiment Analysis in Turkish
  12.1 Introduction
  12.2 Related Work
  12.3 Main Difficulties for Turkish Sentiment Analysis
  12.4 Practical Sentiment Analysis for Turkish
    12.4.1 Resources
    12.4.2 Methodology
  12.5 Experimental Evaluation
    12.5.1 Data
    12.5.2 Results
  12.6 Conclusions
  References

13 The Turkish Treebank
  13.1 Introduction
  13.2 What information needs to be represented?
    13.2.1 Representing Morphological Information
    13.2.2 Representing Syntactic Relations
    13.2.3 Example of a Treebank Sentence
  13.3 Evolution of the Turkish Treebank
    13.3.1 The CoNLL Format
    13.3.2 Branches of the Turkish Treebank
  13.4 The ITU Web Treebank
  13.5 The Annotation Tool
  13.6 The Turkish Universal Dependencies Treebank
  13.7 Conclusions
  References

14 Linguistic corpora: A view from Turkish
  14.1 Introduction
  14.2 Brief History of Corpus Linguistics
  14.3 Linguistic Corpora and Corpus Linguistics
  14.4 Use of Corpora in Linguistics
  14.5 Turkish Linguistic Corpora
    14.5.1 METU-Turkish Corpus
    14.5.2 Turkish National Corpus (TNC)
    14.5.3 Spoken Turkish Corpus (STC)
  14.6 Conclusions
  References

15 Turkish Wordnet
  15.1 Introduction
  15.2 Basic Structure of Turkish Wordnet
    15.2.1 Semantic Relations
    15.2.2 Linking Wordnets to Each Other
  15.3 Design Decisions
    15.3.1 Merge vs. Expand
    15.3.2 Parts-of-Speech, Definitions and Sense Numbers
    15.3.3 Lexical Gaps
    15.3.4 No Dangling Nodes or Relations
    15.3.5 Validating Semantic Relations
  15.4 The Development Process
    15.4.1 First Set of Concepts (Subset I)
    15.4.2 Extracting Semantic Relations from Monolingual Resources
    15.4.3 Second Set of Concepts (Subset II)
    15.4.4 Shifting to Princeton Wordnet 1.7.1
    15.4.5 Third Set of Concepts (Subset III)
    15.4.6 Shifting to Princeton Wordnet 2.0
    15.4.7 Adding Balkanet-specific Concepts
    15.4.8 Final Expansion
  15.5 Current Status of Turkish Wordnet
  15.6 Quality Validation and Coverage Tests
  15.7 Applications of Turkish Wordnet
    15.7.1 Capturing Semantic Relations through Morphology
    15.7.2 Turkish Wordnet in Use
  15.8 Conclusion and Directions for Future Work
  References

16 Turkish Discourse Bank: Connectives and Their Configurations
  16.1 Introduction
  16.2 The TDB Annotation Cycle
    16.2.1 Major Sources of Disagreements among Annotators
    16.2.2 The Discourse Annotation Tool for Turkish
  16.3 Connectives and Discourse Structure
  16.4 Discourse relation configurations in the TDB
    16.4.1 Independent Relations
    16.4.2 Full Embedding
    16.4.3 Nested Relations
    16.4.4 Shared Argument
    16.4.5 Properly Contained Argument
    16.4.6 Properly Contained Relation
    16.4.7 Partially Overlapping Arguments
    16.4.8 Pure Crossing
  16.5 Results and Conclusions

Publication date (per publisher): 20 July 2018
Series: Theory and Applications of Natural Language Processing
Additional info: XV, 355 p. 65 illus., 9 illus. in color.
Place of publication: Cham
Language: English
Subject areas: Schoolbook / Dictionary > Dictionary / Foreign Languages
Humanities > Language / Literary Studies > Linguistics
Computer Science > Theory / Studies > Artificial Intelligence / Robotics
Engineering > Electrical Engineering / Power Engineering
Keywords: Computational Linguistics • Language resources • Morphologically complex languages • Natural Language Processing • Speech Recognition • statistical language modeling • Turkish language
ISBN-10: 3-319-90165-6 / 3319901656
ISBN-13: 978-3-319-90165-7 / 9783319901657
PDF (watermark)
Size: 6.4 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables and figures. A PDF can be displayed on almost all devices, but is of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
