Building and Using Comparable Corpora for Multilingual Natural Language Processing (eBook)

eBook Download: PDF
2023
VIII, 133 pages
Springer International Publishing (publisher)
978-3-031-31384-4 (ISBN)


Building and Using Comparable Corpora for Multilingual Natural Language Processing - Serge Sharoff, Reinhard Rapp, Pierre Zweigenbaum
42,79 incl. VAT
  • Download available immediately
This book provides a comprehensive overview of methods for building comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history of the topic, followed by a comparison with parallel resources and an explanation of why comparable corpora have become more widely used. In particular, comparable corpora provide the basis for the multilingual capabilities of pre-trained models such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Finally, the authors explain how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.



Serge Sharoff, Ph.D., is Professor of Language Technology and Digital Humanities at the Centre for Translation Studies, University of Leeds. His research focuses on Natural Language Processing, including automated methods for collecting very large corpora from the Web, their analysis in terms of domains, genres or text quality, as well as extraction of lexicons and terminology from corpora. The application domains for this kind of research in the Digital Humanities include text annotation, information retrieval, machine translation and computer-assisted language learning. His research stresses the inherent multilingualism of NLP, which implies that tools and resources can be ported across languages by paying attention to the respective linguistic properties.

Pierre Zweigenbaum, Ph.D., FACMI, FIAHSI, is a Senior Researcher at the Interdisciplinary Laboratory for Digital Sciences (LISN, Orsay, France), a laboratory of the French National Center for Scientific Research (CNRS) and Université Paris-Saclay, where he has led the ILES Natural Language Processing group. Before CNRS he was a researcher at Paris Public Hospitals in an Inserm team. He also was a part-time professor at the National Institute for Oriental Languages and Civilizations. His research focus is Natural Language Processing, with medicine as a main application domain. He has also designed methods to acquire linguistic knowledge automatically from corpora and thesauri, to help extend monolingual and bilingual lexicons and terminologies, using parallel and comparable corpora.

Reinhard Rapp, Ph.D., is Professor of Applied Translation Studies at Magdeburg-Stendal University of Applied Sciences and is also affiliated with the University of Mainz. He has conducted EU-funded research projects at the University of Geneva, the University of Tarragona, the University of Leeds, at Aix-Marseille University, at the University of Mainz and at the Athena Research Center in Athens. His main research interests are in computational linguistics, translation studies and cognitive science. His publications have dealt with unsupervised language learning from text corpora, word sense disambiguation, text mining, thesaurus construction, bilingual dictionary induction from parallel and comparable corpora, and with statistical and neural machine translation. 

Publication date (per publisher) 23 August 2023
Series Synthesis Lectures on Human Language Technologies
Additional info VIII, 133 p. 31 illus., 14 illus. in color.
Language English
Subject area Computer Science > Theory / Studies > Artificial Intelligence / Robotics
Keywords comparable corpora • Cross-lingual Models • Machine Translation • Multilingual Natural Language Processing • Natural Language Processing • Parallel Corpora • vector space model
ISBN-10 3-031-31384-4 / 3031313844
ISBN-13 978-3-031-31384-4 / 9783031313844
PDF (watermark)
Size: 5.3 MB

DRM: Digital watermark
This eBook contains a digital watermark and is therefore personalized for you. If the eBook is improperly passed on to third parties, it can be traced back to its source.

File format: PDF (Portable Document Format)
With its fixed page layout, PDF is particularly well suited to specialist books with columns, tables and figures. A PDF can be displayed on almost all devices, but is only of limited use on small displays (smartphone, eReader).

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need a PDF viewer, e.g. Adobe Reader or Adobe Digital Editions.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: Whether Apple or Android, you can read this eBook. You need a PDF viewer, e.g. the free Adobe Digital Editions app.

Buying eBooks from abroad
For tax law reasons, we can sell eBooks only within Germany and Switzerland. Regrettably, we cannot fulfill eBook orders from other countries.
