Data Mining Methods for the Content Analyst - Kalev Leetaru

An Introduction to the Computational Analysis of Content

Kalev Leetaru (Author)

Book | Hardcover
116 pages
2011
Routledge (publisher)
978-0-415-89513-2 (ISBN)
168.35 incl. VAT
With continuous advancements and an increase in user popularity, data mining technologies serve as an invaluable resource for researchers across a wide range of disciplines in the humanities and social sciences. In this comprehensive guide, author and research scientist Kalev Leetaru introduces the approaches, strategies, and methodologies of current data mining techniques, offering insights for new and experienced users alike.

Designed as an instructive reference to computer-based analysis approaches, this resource devotes each chapter to a set of core concepts and analytical data mining strategies, along with detailed examples and steps relating to current data mining practices. Every technique is considered with regard to context, theory of operation, and methodological concerns, with a focus on the capabilities and strengths of these technologies. In addressing critical methodologies and approaches to automated analytical techniques, this work provides an essential overview of a broad, innovative field.

Kalev Leetaru is Senior Research Scientist for Content Analysis at the University of Illinois Institute for Computing in Humanities, Arts, and Social Science and Center Affiliate of the National Center for Supercomputing Applications. He leads a number of large initiatives centering on the application of high performance computing to grand challenge problems using massive-scale document and data archives.

Chapter 1 - Introduction
    What Is Content Analysis?
    Why Use Computerized Analysis Techniques?
    Standalone Tools Or Integrated Suites
    Transitioning From Theory To Practice

Chapter 2 - Obtaining And Preparing Data
    Collecting Data From Digital Text Repositories
        Are The Data Meaningful?
        Using Data In Unintended Ways
        Analytical Resolution
        Types Of Data Sources
        Finding Sources
        Searching Text Collections
        Sources Of Incompleteness
        Licensing Restrictions And Content Blackouts
        Measuring Viewership
        Accuracy And Convenience Samples
        Random Samples
    Multimedia Content
        Converting To Textual Format
        Prosody
    Example Data Sources
        Patterns In Historical War Coverage
        Competitive Intelligence
        Global News Coverage
    Downloading Content
        Digital Content
        Print Content
    Preparing Content
        Document Extraction
        Cleaning
        Post Filtering
        Reforming/Reshaping
        Content Proxy Extraction

Chapter 3 - Vocabulary Analysis
    The Basics
        Word Histograms
        Readability Indexes
        Normative Comparison
        Non-Word Analysis
        Colloquialisms: Abbreviations And Slang
        Restricting The Analytical Window
    Vocabulary Comparison And Evolution / Chronemics
    Advanced Topics
        Syllables, Rhyming, And ‘Sounds Like’
        Gender And Language
        Authorship Attribution
        Word Morphology, Stemming, And Lemmatization

Chapter 4 - Correlation And Co-Occurrence
    Understanding Correlation
    Computing Word Correlations
    Directionality
    Concordance
    Co-Occurrence And Search
    Language Variation And Lexicons
    Non-Co-Occurrence
    Correlation With Metadata

Chapter 5 - Lexicons, Entity Extraction, And Geocoding
    Lexicons
        Lexicons And Categorization
        Lexical Correlation
        Lexicon Consistency Checks
        Thesauri And Vocabulary Expanders
    Named Entity Extraction
        Lexicons And Processing
        Applications
    Geocoding, Gazetteers, And Spatial Analysis
        Geocoding
        Gazetteers And The Geocoding Process
        Operating Under Uncertainty
        Spatial Analysis

Chapter 6 - Topic Extraction
    How Machines Process Text
        Unstructured Text
        Extracting Meaning From Text
    Applications Of Topic Extraction
        Comparing/Clustering Documents
        Automatic Summarization
        Automatic Keyword Generation
    Multilingual Analysis: Topic Extraction With Multiple Languages

Chapter 7 - Sentiment Analysis
    Examining Emotions
        Evolution
        Evaluation
        Analytical Resolution: Documents vs Objects
        Hand-Crafted vs Automatically-Generated Lexicons
        Other Sentiment Scales
        Limitations
        Measuring Language Rather Than Worldview

Chapter 8 - Similarity, Categorization and Clustering
    Categorization
        The Vector-Space Model
        Feature Selection
        Feature Reduction
        Learning Algorithm
        Evaluating ATC Results
        Benefits of ATC Over Human Categorization
        Limitations of ATC
        Applications of ATC
    Clustering
        Automated Clustering
        Hierarchical Clustering
        Partitional Clustering
    Document Similarity
        Vector Space Model
        Contingency Tables

Chapter 9 - Network Analysis
    Understanding Network Analysis
    Network Content Analysis
    Representing Network Data
    Constructing the Network
    Network Structure
    The Triad Census
    Network Evolution
    Visualization and Clustering

Publication date (per publisher): 2 February 2012
Series: Routledge Communication Series
Additional info: 6 tables, black and white
Place of publication: London
Language: English
Dimensions: 152 x 229 mm
Weight: 380 g
Subject areas: Computer Science > Databases > Data Warehouse / Data Mining
Computer Science > Theory / Studies > Artificial Intelligence / Robotics
Social Sciences > Communication / Media > Communication Studies
ISBN-10: 0-415-89513-8 / 0415895138
ISBN-13: 978-0-415-89513-2 / 9780415895132
Condition: New