Text Analysis with R

For Students of Literature

Matthew L. Jockers, Rosamond Thalken (Authors)

Book | Hardcover

XXIII, 277 pages

2020 | 2nd ed. 2020
Springer International Publishing (Publisher)
978-3-030-39642-8 (ISBN)


This practical introduction explores core R procedures and processes and offers a thorough understanding of the possibilities of computational text analysis at both micro and macro scales. Each chapter concludes with a set of practice exercises.

Now in its second edition, Text Analysis with R provides a practical introduction to computational text analysis using the open source programming language R. R is an extremely popular programming language, used throughout the sciences; because of its accessibility, R is increasingly used in other research areas as well. In this volume, readers immediately begin working with text, and each chapter examines a new technique or process, allowing readers to obtain a broad exposure to core R procedures and a fundamental understanding of the possibilities of computational text analysis at both the micro and the macro scale. Each chapter builds on its predecessor as readers move from small-scale "microanalysis" of single texts to large-scale "macroanalysis" of text corpora, and each concludes with a set of practice exercises that reinforce and expand upon the chapter lessons. The book's focus is on making the technical palatable, useful, and immediately gratifying.

Text Analysis with R is written with students and scholars of literature in mind but will be applicable to other humanists and social scientists wishing to extend their methodological toolkit to include quantitative and computational approaches to the study of text. Computation provides access to information in text that readers simply cannot gather using traditional qualitative methods of close reading and human synthesis. This new edition features two new chapters: one that introduces dplyr and tidyr in the context of parsing and analyzing dramatic texts to extract speaker and receiver data, and one on sentiment analysis using the syuzhet package. It is also filled with updated material in every chapter to integrate new developments in the field, current practices in R style, and the use of more efficient algorithms.

Matthew L. Jockers is Professor of English and Data Analytics as well as Dean of the College of Arts and Sciences at Washington State University. He leverages computers and statistical learning methods to extract information from large collections of books. Using tools and techniques from linguistics, natural language processing, and machine learning, Jockers crunches the numbers (and the words) looking for patterns and connections. This computational approach to the study of literature facilitates a type of literary "macroanalysis" or "distant reading" that goes beyond what a traditional literary scholar could hope to study. Dr. Jockers's most recent book, The Bestseller Code (2016, with Jodie Archer), has earned critical praise, and the algorithms at the heart of its research won the University of Nebraska's Breakthrough Innovation of the Year in 2018. In addition to his academic research, Jockers has worked in industry, first as Director of Research at a data-driven book industry startup company and then as Principal Research Scientist and Software Development Engineer in iBooks at Apple, Inc. In 2017, he and Jodie Archer founded "Archer Jockers, LLC," a text mining and consulting company that helps authors develop more successful novels through data analytics. In late 2019, Jockers and others founded a new text mining startup focused on helping independent authors ("indies").

Part I Microanalysis
1 R Basics
2 First Foray into Text Analysis with R
3 Accessing and Comparing Word Frequency Data
4 Token Distribution and Regular Expressions
5 Token Distribution Analysis by Chapter
6 Correlation
7 Measures of Lexical Variety
8 Hapax Richness
9 Do it KWIC
10 Do it KWIC(er) (And Better)
Part II Metadata
11 Introduction to dplyr
12 Parsing TEI XML
13 Parsing and Analyzing Hamlet
14 Sentiment Analysis
Part III Macroanalysis
15 Clustering
16 Classification
17 Topic Modeling
18 Part of Speech Tagging and Named Entity Recognition
Appendices
Index
List of Tables
List of Figures

Publication date	3 April 2020
Series	Quantitative Methods in the Humanities and Social Sciences
Additional information	XXIII, 277 p. 33 illus., 12 illus. in color.
Place of publication	Cham
Language	English
Dimensions	155 x 235 mm
Weight	613 g
Subject	Mathematics / Computer Science ► Mathematics ► Computer Programs / Computer Algebra
Subject	Mathematics / Computer Science ► Mathematics ► Probability / Combinatorics
Keywords	Computational Literary Studies • Corpus Linguistics and R • digital humanities • Linguistic Computing • Literary Studies • Programming and Literature • R • text analysis • text classification • Text Clustering • Text Mining
ISBN-10	3-030-39642-8 / 3030396428
ISBN-13	978-3-030-39642-8 / 9783030396428
Condition	New