Handbook of Linguistic Annotation (eBook)

Nancy Ide, James Pustejovsky (Editors)

eBook Download: PDF

2017 | 1st ed. 2017
IX, 1459 pages
Springer Netherlands (Publisher)
978-94-024-0881-2 (ISBN)

Linguistic annotation is an increasingly important activity in the field of computational linguistics because of its critical role in the development of language models for natural language processing applications. Part one of this book covers all phases of the linguistic annotation process, from annotation scheme design and choice of representation format through both the manual and automatic annotation process, evaluation, and iterative improvement of annotation accuracy. The second part of the book includes case studies of annotation projects across the spectrum of linguistic annotation types, including morpho-syntactic tagging, syntactic analyses, a range of semantic analyses (semantic roles, named entities, sentiment and opinion), time and event and spatial analyses, and discourse level analyses including discourse structure, co-reference, etc. Each case study addresses the various phases and processes discussed in the chapters of part one.

James Pustejovsky is the TJX Feldberg Professor of Computer Science at Brandeis University in Waltham, Massachusetts, United States. His expertise includes theoretical and computational modeling of language, specifically computational linguistics, lexical semantics, knowledge representation, temporal and spatial reasoning, and extraction. His main research topics are natural language processing in general and, in particular, the computational analysis of linguistic meaning. He proposed Generative Lexicon theory in lexical semantics. His other interests include temporal reasoning, event semantics, spatial language, language annotation, computational linguistics, and machine learning.

This handbook offers a thorough treatment of the science of linguistic annotation. Leaders in the field guide the reader through the process of modeling, creating an annotation language, building a corpus and evaluating it for correctness. Essential reading for both computer scientists and linguistic researchers.

Nancy Ide is Professor of Computer Science at Vassar College in Poughkeepsie, New York, USA. She has been in the field of computational linguistics for over 30 years and made significant contributions to research in word sense disambiguation, computational lexicography, discourse analysis, and the use of semantic web technologies for language data. She is founder of the Text Encoding Initiative (TEI), the first major standard for representing electronic language data, and later developed the XML Corpus Encoding Standard (XCES). More recently, she co-developed the ISO LAF/GrAF representation format for linguistically annotated data. She has also developed major corpora for American English, including the Open American National Corpus (OANC) and the Manually Annotated Sub-Corpus (MASC), and has been a pioneer in efforts to foster open data and resources. Professor Ide is Co-Editor-in-Chief of the journal Language Resources and Evaluation and Editor of the Springer book series Text, Speech, and Language Technology.

Part One
Introduction, Nancy Ide and James Pustejovsky
Designing annotation schemes: From theory to model, James Pustejovsky et al.
Designing annotation schemes: From model to representation, Nancy Ide et al.
Community standards, Nancy Ide et al.
Creating annotations
Overview of Annotation Creation: Processes and Tools, Tomaz Erjavec and Mark Finlayson
The Evolution of Text Annotation Frameworks, Graham Wilcock
Tools for Multi-modal Annotation, Steve Cassidy and Thomas Schmidt
Collaborative Web-based Tools for Multi-layer Text Annotation, Chris Biemann et al.
Iterative enhancement, Markus Dickinson and Dan Tufis
Crowdsourcing, Massimo Poesio et al.
Inter-annotator Agreement, Ron Artstein
Towards Behavior-based Corpus Evaluation, Tokunaga Takenobu
Using annotations
Machine learning for Higher-level Linguistic Tasks, Anna Rumshisky and Amber Stubbs
Sustainable Development and Refinement of Complex Linguistic Annotations at Scale, Dan Flickinger et al.
Linguistic Annotation in/for Corpus Linguistics, Stefan Th. Gries and Andrea Berez
Developing Linguistic Theories Using Annotated Corpora, Marie-Catherine de Marneffe and Christopher Potts

Part Two: Case studies
General corpora
MULTEXT-East, Tomaz Erjavec
The Groningen Meaning Bank, Johan Bos et al.
The Manually Annotated Sub-Corpus (MASC), Nancy Ide
OntoNotes, Sameer Pradhan et al.
Treebanks
Prague Dependency Treebank, Jan Hajic et al.
German Treebanks: TIGER and TuBa-D/Z, Stefanie Dipper and Sandra Kübler
Sinica Treebank, Chu-Ren Huang and Keh-Jiann Chen
The Hindi/Urdu Treebank Project, Riyaz Ahmad Bhat et al.
Semantic annotation
Sense tagging
Semantic Annotation of MASC, Christiane Fellbaum et al.
VerbNet/PropBank-based Sense Annotation, Meredith Green et al.
Semantic roles
Current Directions in English and Arabic PropBank, Claire Bonial et al.
FrameNet, Collin Baker
Opinion, sentiment, subjectivity
MPQA Opinion Corpus, Theresa Wilson et al.
The JDPA Sentiment Corpus for the Automotive Domain, Jason S. Kessler and Nicolas Nicolov
Named entities
Czech Named Entity Corpus, Jana Strakova et al.
Crowdsourcing Named Entity Recognition and Entity Linking Corpora, Kalina Bontcheva et al.
Case Study: Chemistry, Colin Batchelor et al.
Building FactBank or How to Annotate Event Factuality One Step at a Time, Roser Sauri
Time and event annotation
TimeML/TimeBank, James Pustejovsky et al.
It-TimeML and the Ita-TimeBank: Language Specific Adaptations for Temporal Annotation, Tommaso Caselli and Rachele Sprugnoli
Space annotation
ISO-Space, James Pustejovsky
Spatial Role Labeling Annotation Scheme, Parisa Kordjamshidi et al.
Metaphor
VU Amsterdam Metaphor Corpus, Tina Krennmayr and Gerard Steen
Annotation of Linguistic and Conceptual Metaphor, Ekaterina Shutova
Textual entailment
FATE: Annotating a Textual Entailment Corpus with FrameNet, Aljoscha Burchardt and Marco Pennacchiotti
The Recognizing Textual Entailment Challenges Datasets, Luisa Bentivogli et al.
Discourse level annotation
Coreference
Phrase Detectives, Massimo Poesio et al.
NAIST Text Corpus: Annotating Predicate-Argument and Coreference Relations in Japanese, Ryu Iida et al.
Discourse structure
The Penn Discourse Treebank: An Annotated Corpus of Discourse Relations, Rashmi Prasad and Aravind Joshi
Pair Annotation as a Novel Annotation Procedure: The Case of Turkish Discourse Bank, Isin Demirsahin and Deniz Zeyrek
Annodis and Related Projects: Case Studies on the Annotation of Discourse Structure, Nicholas Asher et al.
Dialogue Acts
NICT Kyoto Dialogue corpus, Kiyonori Ohtake and Etsuo Mizukami
Speech (transcribed)
Case Study: The Austalk Corpus, Steve Cassidy et al.
Annotations in the Nordic Dialect Corpus, Janne Bondi Johannessen
The Corpus of Interactional Data: a Large Multimodal Annotated Resource, Philippe Blache et al.
Biomedical annotations
Annotating the Clinical Text - MiPACQ, ShARe, SHARPn and THYME Corpora, Guergana Savova et al.
The Colorado Richly Annotated Full Text (CRAFT) Corpus: Multi-Model Annotation In The Biomedical Domain, K. Bretonnel Cohen et al.
The GENIA Corpus: Annotation Levels and Applications, Paul Thompson et al.
De-identification of Medical Records Through Annotation, Amber Stubbs and Ozlem Uzuner

Publication date (per publisher)	June 16, 2017
Additional info	IX, 1459 p. 264 illus.
Place of publication	Dordrecht
Language	English
Subject areas	Humanities ► Linguistics / Literary Studies ► Linguistics
	Mathematics / Computer Science ► Computer Science ► Databases
	Computer Science ► Software Development ► User Interfaces (HCI)
Keywords	Corpus Linguistics • Evaluating annotations • Integration of annotations • Language models for natural language processing applications • Linguistic annotation • morphosyntactic tagging
ISBN-10	94-024-0881-9 / 9402408819
ISBN-13	978-94-024-0881-2 / 9789402408812

PDF (watermark)
Size: 39.4 MB

DRM: Digital watermark
This eBook contains a digital watermark and is thereby personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to the source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to technical books with columns, tables, and figures. A PDF can be displayed on almost all devices but is only of limited suitability for small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. However, it is not compatible with the Amazon Kindle.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.

Print edition

Book | Hardcover

427,99 €