Data Mining Methods for the Content Analyst
Routledge (Publisher)
978-0-415-89513-2 (ISBN)
Designed as an instructive reference to computer-based analysis approaches, this resource explains in each chapter a set of core concepts and analytical data mining strategies, along with detailed examples and steps relating to current data mining practices. Each technique is considered in terms of its context, theory of operation, and methodological concerns, with a focus on the capabilities and strengths of these technologies. In addressing critical methodologies and approaches to automated analytical techniques, this work provides an essential overview of a broad, innovative field.
Kalev Leetaru is Senior Research Scientist for Content Analysis at the University of Illinois Institute for Computing in Humanities, Arts, and Social Science and Center Affiliate of the National Center for Supercomputing Applications. He leads a number of large initiatives centering on the application of high performance computing to grand challenge problems using massive-scale document and data archives.
Chapter 1 – Introduction
What Is Content Analysis?
Why Use Computerized Analysis Techniques?
Standalone Tools Or Integrated Suites
Transitioning From Theory To Practice
Chapter 2 – Obtaining And Preparing Data
Collecting Data From Digital Text Repositories
Are The Data Meaningful?
Using Data In Unintended Ways
Analytical Resolution
Types Of Data Sources
Finding Sources
Searching Text Collections
Sources Of Incompleteness
Licensing Restrictions And Content Blackouts
Measuring Viewership
Accuracy And Convenience Samples
Random Samples
Multimedia Content
Converting To Textual Format
Prosody
Example Data Sources
Patterns In Historical War Coverage
Competitive Intelligence
Global News Coverage
Downloading Content
Digital Content
Print Content
Preparing Content
Document Extraction
Cleaning
Post Filtering
Reforming/Reshaping
Content Proxy Extraction
Chapter 3 – Vocabulary Analysis
The Basics
Word Histograms
Readability Indexes
Normative Comparison
Non-Word Analysis
Colloquialisms: Abbreviations And Slang
Restricting The Analytical Window
Vocabulary Comparison And Evolution / Chronemics
Advanced Topics
Syllables, Rhyming, And ‘Sounds Like’
Gender And Language
Authorship Attribution
Word Morphology, Stemming, And Lemmatization
Chapter 4 – Correlation And Co-Occurrence
Understanding Correlation
Computing Word Correlations
Directionality
Concordance
Co-Occurrence And Search
Language Variation And Lexicons
Non-Co-Occurrence
Correlation With Metadata
Chapter 5 – Lexicons, Entity Extraction, And Geocoding
Lexicons
Lexicons And Categorization
Lexical Correlation
Lexicon Consistency Checks
Thesauri And Vocabulary Expanders
Named Entity Extraction
Lexicons And Processing
Applications
Geocoding, Gazetteers, And Spatial Analysis
Geocoding
Gazetteers And The Geocoding Process
Operating Under Uncertainty
Spatial Analysis
Chapter 6 – Topic Extraction
How Machines Process Text
Unstructured Text
Extracting Meaning From Text
Applications Of Topic Extraction
Comparing/Clustering Documents
Automatic Summarization
Automatic Keyword Generation
Multilingual Analysis: Topic Extraction With Multiple Languages
Chapter 7 – Sentiment Analysis
Examining Emotions
Evolution
Evaluation
Analytical Resolution: Documents vs Objects
Hand-Crafted vs Automatically Generated Lexicons
Other Sentiment Scales
Limitations
Measuring Language Rather Than Worldview
Chapter 8 – Similarity, Categorization, And Clustering
Categorization
The Vector-Space Model
Feature Selection
Feature Reduction
Learning Algorithm
Evaluating ATC Results
Benefits of ATC Over Human Categorization
Limitations of ATC
Applications of ATC
Clustering
Automated Clustering
Hierarchical Clustering
Partitional Clustering
Document Similarity
Vector Space Model
Contingency Tables
Chapter 9 – Network Analysis
Understanding Network Analysis
Network Content Analysis
Representing Network Data
Constructing The Network
Network Structure
The Triad Census
Network Evolution
Visualization And Clustering
Detail | Value |
---|---|
Publication date (per publisher) | 2 February 2012 |
Series | Routledge Communication Series |
Additional information | 6 tables, black and white |
Place of publication | London |
Language | English |
Dimensions | 152 x 229 mm |
Weight | 380 g |
Subject areas | Computer Science ► Databases ► Data Warehouse / Data Mining; Computer Science ► Theory / Study ► Artificial Intelligence / Robotics; Social Sciences ► Communication / Media ► Communication Studies |
ISBN-10 | 0-415-89513-8 / 0415895138 |
ISBN-13 | 978-0-415-89513-2 / 9780415895132 |