Big Data and Social Science

A Practical Guide to Methods and Tools

Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane (editors)

Book | Hardcover

356 pages

2016
Productivity Press (publisher)
978-1-4987-5140-7 (ISBN)

This title is being published in a new edition

A later edition of this item exists

Big Data and Social Science

Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter, Julia Lane

2020

Book | Hardcover

€168.35

Both Traditional Students and Working Professionals Acquire the Skills to Analyze Social Problems.

Big Data and Social Science: A Practical Guide to Methods and Tools shows how to apply data science to real-world problems in both research and practice. The book provides practical guidance on combining methods and tools from computer science, statistics, and social science. This concrete approach is illustrated throughout using an important national problem, the quantitative study of innovation.

The text draws on the expertise of prominent leaders in statistics, the social sciences, data science, and computer science to teach students how to use modern social science research principles as well as the best analytical and computational tools. It uses a real-world challenge to introduce how these tools are used to identify and capture appropriate data, apply data science models and tools to that data, and recognize and respond to data errors and limitations.

For more information, including sample chapters and news, please visit the authors' website.

Ian Foster is a professor of computer science at the University of Chicago as well as a senior scientist and distinguished fellow at Argonne National Laboratory. His research addresses innovative applications of distributed, parallel, and data-intensive computing technologies to scientific problems in such domains as climate change and biomedicine. Methods and software developed under his leadership underpin many large national and international cyberinfrastructures. He is a fellow of the American Association for the Advancement of Science, the Association for Computing Machinery, and the British Computer Society. He received a PhD in computer science from Imperial College London.

Rayid Ghani is the director of the Center for Data Science and Public Policy, research director at the Computation Institute, and senior fellow at the Harris School of Public Policy at the University of Chicago. His research focuses on using machine learning and data science for high-impact social good and public policy problems in areas such as education, healthcare, energy, transportation, economic development, and public safety.

Ron S. Jarmin is the assistant director for research and methodology at the U.S. Census Bureau, where he oversees a broad research program in statistics, survey methodology, and economics to improve economic and social measurement within the U.S. federal statistical system. He is the author of many papers in the areas of industrial organization, business dynamics, entrepreneurship, technology and firm performance, urban economics, data access, and statistical disclosure avoidance. He earned a PhD in economics from the University of Oregon.

Frauke Kreuter is a professor at both the University of Maryland and the University of Mannheim. She is also head of the Statistical Methods Group at the Institute for Employment Research in Germany. Among her over 100 publications are several textbooks in survey statistics and data analysis. She established the International Program in Survey and Data Science and is a fellow of the American Statistical Association. She received a PhD from the University of Konstanz.

Julia Lane is a professor at the NYU Wagner Graduate School of Public Service and the NYU Center for Urban Science and Progress. She is also an NYU Provostial Fellow for Innovation Analytics. She co-founded the UMETRICS and STAR METRICS programs at the National Science Foundation, established a data enclave at NORC/University of Chicago, and co-founded the Longitudinal Employer-Household Dynamics Program at the U.S. Census Bureau and the Linked Employer Employee Database at Statistics New Zealand. She is the author/editor of 10 books and the author of over 70 articles in leading journals, including Nature and Science. She is an elected fellow of the American Association for the Advancement of Science and a fellow of the American Statistical Association. She received a PhD in economics from the University of Missouri.

Introduction
Why this book?
Defining big data and its value
Social science, inference, and big data
Social science, data quality, and big data
New tools for new data
The book’s "use case"
The structure of the book
Resources

Capture and Curation
Working with Web Data and APIs
Introduction
Scraping information from the web
New data in the research enterprise
A functional view
Programming against an API
Using the ORCID API via a wrapper
Quality, scope, and management
Integrating data from multiple sources
Working with the graph of relationships
Bringing it together: Tracking pathways to impact
Summary
Resources
Acknowledgements and copyright

Record Linkage
Motivation
Introduction to record linkage
Preprocessing data
Classification
Record linkage and data protection
Summary
Resources

Databases
Introduction
DBMS: When and why
Relational DBMSs
Linking DBMSs and other tools
NoSQL databases
Spatial databases
Which database to use?
Summary
Resources

Programming with Big Data
Introduction
The MapReduce programming model
Apache Hadoop MapReduce
Apache Spark
Summary
Resources

Modeling and Analysis
Machine Learning
Introduction
What is machine learning?
The machine learning process
Problem formulation: Mapping a problem to machine learning methods
Methods
Evaluation
Practical tips
How can social scientists benefit from machine learning?
Advanced topics
Summary
Resources

Text Analysis
Understanding what people write
How to analyze text
Approaches and applications
Evaluation
Text analysis tools
Summary
Resources

Networks: The Basics
Introduction
Network data
Network measures
Comparing collaboration networks
Summary
Resources

Inference and Ethics
Information Visualization
Introduction
Developing effective visualizations
A data-by-tasks taxonomy
Challenges
Summary
Resources

Errors and Inference
Introduction
The total error paradigm
Illustrations of errors in big data
Errors in big data analytics
Some methods for mitigating, detecting, and compensating for errors
Summary
Resources

Privacy and Confidentiality
Introduction
Why is access at all important?
Providing access
The new challenges
Legal and ethical framework
Summary
Resources

Workbooks
Introduction
Environment
Workbook details
Resources

Bibliography

Publication date	25 May 2016
Series	Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences
Additional info	22 Tables, color; 60 Illustrations, color
Place of publication	Portland
Language	English
Dimensions	191 x 235 mm
Weight	960 g
Subject areas	Humanities ► Psychology
	Mathematics / Computer Science ► Mathematics
	Social Sciences ► Sociology ► Empirical Social Research
	Technology ► Electrical Engineering / Energy Engineering
ISBN-10	1-4987-5140-7 / 1498751407
ISBN-13	978-1-4987-5140-7 / 9781498751407
Condition	New