Text Analytics with Python

A Practical Real-World Approach to Gaining Actionable Insights from your Data

Dipanjan Sarkar (Author)

Book | Softcover

385 pages

2016 | 1st ed.
Apress (Publisher)
978-1-4842-2387-1 (ISBN)

A newer edition of this title is available:

Text Analytics with Python

Dipanjan Sarkar

2019

Book | Softcover

48.14 €

Derive useful insights from your data using Python. You will learn both basic and advanced concepts, including text and language syntax, structure, and semantics. You will focus on algorithms and techniques, such as text classification, clustering, topic modeling, and text summarization.

Text Analytics with Python teaches you the techniques of natural language processing and text analytics, and gives you the skills to judge which technique is best suited to a particular problem. You will look at each technique and algorithm from a bird's-eye view, to understand how it can be used, and from a microscopic view, to understand the mathematical concepts behind it and implement them to solve your own problems.

What You Will Learn:

Understand the major concepts and techniques of natural language processing (NLP) and text analytics, including syntax and structure

Build a text classification system to categorize news articles, analyze app or game reviews using topic modeling and text summarization, cluster popular movie synopses, and analyze the sentiment of movie reviews

Use Python and popular open-source libraries for NLP and text analytics, such as the Natural Language Toolkit (NLTK), gensim, scikit-learn, spaCy, and Pattern

Who This Book Is For:
IT professionals, analysts, developers, linguistic experts, data scientists, and anyone with a keen interest in linguistics, analytics, and generating insights from textual data

Dipanjan Sarkar is a Data Scientist at Intel, the world's largest silicon company, which is on a mission to make the world more connected and productive. He primarily works on analytics, business intelligence, application development, and building large-scale intelligent systems. He received his master's degree in Information Technology from the International Institute of Information Technology, Bangalore, with a focus on Data Science and Software Engineering. He is also an avid supporter of self-learning, especially Massive Open Online Courses, and holds a Data Science Specialization from Johns Hopkins University on Coursera. He has been an analytics practitioner for over 4 years, specializing in statistical, predictive, and text analytics. He has also authored a couple of books on R and machine learning, occasionally reviews technical books, and acts as a course beta tester for Coursera. Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups, data science and, more recently, artificial intelligence and deep learning. In his spare time he loves reading, gaming, and watching popular sitcoms and football.

Chapter 1: Natural Language Basics
Chapter Goal: Introduces readers to the basics of NLP and text processing
No. of pages: 40-50
Sub-Topics:
1. Language Syntax and Structure
2. Text formats and grammars
3. Lexical and Text Corpora resources
4. Deep dive into the WordNet corpus
5. Parts of speech, Stemming and lemmatization

Chapter 2: Python Refresher for Text Analytics
Chapter Goal: A useful chapter for readers who do not know Python, and a quick reference for experienced readers on useful commands and techniques for text processing with Python
No. of pages: 30-35
Sub-Topics:
1. Python data structures and constructs
2. Functions, conditionals and code flow
3. Handling strings with Python
4. Regular Expressions with Python
5. Quick glance into nltk, gensim and pattern

Chapter 3: Text Processing
Chapter Goal: This chapter covers all the techniques and capabilities needed for processing and parsing text into easy-to-understand formats. We also look at how to segment and normalize text.
No. of pages: 35-40
Sub-Topics:
1. Sentence and word tokenization
2. Text tagging and chunking
3. Text Parse Trees
4. Text normalization
5. Text spell checks and removal of redundant characters
6. Synonyms and Synsets

Chapter 4: Text Classification
Chapter Goal: Introduces readers to the concept of classification as a supervised machine learning problem and looks at a real-world example of classifying text documents
No. of pages: 40-45
Sub-Topics:
1. Classification basics
2. Types of classifiers
3. Feature generation of text documents
4. Types of feature generators
5. Building a text classifier on real world data
6. Evaluating Classifiers
7. Binary and multi-class classification models

Chapter 5: Text summarization and topic modeling
Chapter Goal: Introduces the concepts of text summarization, n-gram tagging analysis and topic models to the readers and looks at some real world datasets and hands-on implementations on the same
No. of pages: 40-45
Sub-Topics:
1. Text summarization concepts
2. Dimensionality reduction
3. N-gram tagging models
4. Topic modeling using LDA and LSA
5. Generate topics from real world data
6. N-gram analysis to generate patterns from app reviews

Chapter 6: Text Clustering and Similarity analysis
Chapter Goal: We look at unsupervised machine learning concepts here like text clustering and similarity measures
No. of pages: 35-40
Sub-Topics:
1. Clustering concepts
2. Analyzing text similarity
3. Implementing text similarity with cosine, jaccard measures
4. Text clustering algorithms
5. Hands on text clustering on real world data

Chapter 7: Sentiment Analysis
Chapter Goal: We look at solving a popular problem of analyzing sentiment from text using a combination of methods learnt earlier including classification and also lexical analysis
No. of pages: 35-40
Sub-Topics:
1. What is sentiment analysis
2. Looking at lexical corpora for sentiment
3. Analyzing sentiment using lexical analysis (hands-on)
4. Building a sentiment analysis classifier (hands-on)

Publication date	16 December 2016
Additional info	33 Illustrations, color; 21 Illustrations, black and white; XXI, 385 p. 54 illus., 33 illus. in color.
Place of publication	Berkeley
Language	English
Dimensions	155 x 235 mm
Weight	6204 g
Subject area	Computer Science ► Databases ► Data Warehouse / Data Mining
	Mathematics / Computer Science ► Computer Science ► Networks
	Mathematics / Computer Science ► Computer Science ► Programming Languages / Tools
Keywords	Deep Learning in Text Analysis • Natural Language Basics • Python • sentiment analysis • text classification • Text Clustering • Text Mining
ISBN-10	1-4842-2387-X / 148422387X
ISBN-13	978-1-4842-2387-1 / 9781484223871
Condition	New