Text Mining in Practice with R.

By: Ted Kwartler
Language: spa
Publication details: India: Wiley, 2017
Edition: First edition
Description: 320 pages; graphs, tables; 15.7 x 2.3 x 23.4 cm
ISBN:
  • 9781119282013
DDC classification:
  • 004.3 K98

A reliable, cost-effective approach to extracting priceless business information from all sources of text

Excavating actionable business insights from data is a complex undertaking, and that complexity grows by an order of magnitude when the focus is on documents and other text. This book takes a practical, hands-on approach to teaching you how to mine the vast, untold riches buried in all forms of text using R, reliably and cost-effectively.

Author Ted Kwartler clearly describes the tools needed to perform text mining and shows you how to use them to identify practical business applications, so you can start your own text mining projects right away. Drawing on numerous real-world examples and case studies from industries ranging from healthcare to entertainment to telecommunications, he demonstrates how to carry out a range of text mining tasks, including sentiment scoring, topic modelling, predictive modelling, identifying clickbait in headlines, and more. You'll learn how to:

  • Identify actionable social media posts to improve customer service
  • Use text mining in HR to identify candidate perceptions of an organisation, match job descriptions with resumes, and more
  • Extract valuable information from virtually all digital and print sources, including the news media, social media sites, PDFs, and even JPEG and GIF image files
  • Make text mining an integral component of marketing in order to identify brand evangelists, improve customer propensity modelling, and much more

TABLE OF CONTENTS
Foreword

Chapter 1: What is Text Mining?
1.1 What is it?
1.1.1 What is text mining in practice?
1.1.2 Where does text mining fit?
1.2 Why we care about text mining
1.2.1 What are the consequences of ignoring text?
1.2.2 What are the benefits of text mining?
1.2.3 Setting Expectations: When text mining should (and should not) be used
1.3 A basic workflow: How the process works
1.4 What tools do I need to get started with this?
1.5 A Simple Example
1.6 A Real World Use Case
1.7 Summary

Chapter 2: Basics of text mining
2.1 What is Text Mining in a practical sense?
2.2 Types of Text Mining: Bag of Words
2.2.1 Types of Text Mining: Syntactic Parsing
2.3 The text mining process in context
2.4 String Manipulation: Number of Characters & Substitutions
2.4.1 String Manipulations: Paste, Character Splits & Extractions
2.5 Keyword Scanning
2.6 String Packages stringr & stringi
2.7 Preprocessing Steps for Bag of Words Text Mining
2.8 Spell Check
2.9 Frequent Terms & Associations
2.10 Delta Assist Wrap Up
2.11 Summary

Chapter 3: Common Text Mining Visualizations
3.1 A tale of two (or three) cultures
3.2 Simple Exploration: Term Frequency, Associations & Word Networks
3.2.1 Term Frequency
3.2.2 Word Associations
3.2.3 Word Networks
3.3 Simple Word Clusters: Hierarchical Dendrograms
3.4 Word Clouds: Overused but Effective
3.4.1 One Corpus Word Clouds
3.4.2 Comparing and Contrasting Corpora in Word Clouds
3.4.3 Polarized Tag Plot
3.5 Summary

Chapter 4: Sentiment Scoring
4.1 What is Sentiment Analysis?
4.2 Sentiment Scoring: Parlor Trick or Insightful?
4.3 Polarity: Simple Sentiment Scoring
4.3.1 Subjectivity Lexicons
4.3.2 qdap's Scoring for positive and negative word choice
4.3.3 Revisiting Word Clouds: Sentiment Word Clouds
4.4 Emoticons :) Dealing with these perplexing clues
4.4.1 Symbol-Based Emoticons Native to R
4.4.2 Punctuation-Based Emoticons
4.4.3 Emoji
4.5 R's Archived Sentiment Scoring Library
4.6 Sentiment the tidytext way
4.7 Airbnb.com Boston Wrap Up
4.8 Summary

Chapter 5: Hidden Structures: Clustering, String Distance, Text Vectors & Topic Modeling
5.1 What is clustering?
5.1.1 K-Means Clustering
5.1.2 Spherical K-Means Clustering
5.1.3 K-Medoid Clustering
5.1.4 Evaluating the cluster approaches
5.2 Calculating & Exploring String Distance
5.2.1 What is string distance?
5.2.2 Fuzzy Matching: amatch, ain
5.2.3 Similarity Distances: stringdist, stringdistmatrix
5.3 LDA Topic Modeling Explained
5.3.1 Topic Modeling Case Study
5.3.2 LDA & LDAvis
5.4 Text to Vectors using "text2vec"
5.4.1 text2vec
5.5 Summary

Chapter 6: Document Classification: Finding Clickbait from Headlines
6.1 What is document classification?
6.2 Clickbait Case Study
6.2.2 Session & Data Set Up
6.2.3 GLMNET Training
6.2.4 GLMNET Test Predictions
6.2.5 Test Set Evaluation
6.2.6 Finding the most impactful words
6.2.7 Case study Wrap Up: Model Accuracy & Improving Performance Recommendations
6.3 Summary

Chapter 7: Predictive Modeling: Using text for classifying & predicting outcomes
7.1 Classification vs. Prediction
7.2 Case Study I: Will this patient come back to the hospital?
7.2.2 Patient Readmission in the Text Mining Workflow
7.2.3 Session & Data Set Up
7.2.4 Patient Modeling
7.2.5 More Model KPIs: AUC, Recall, Precision & F1
7.2.5.1 Additional Evaluation Metrics
7.2.6 Apply the model to new patients
7.2.7 Patient Readmission Conclusion
7.3 Case Study II: Predicting Box Office Success
7.3.2 Opening Weekend Revenue in the Text Mining Workflow
7.3.3 Session & Data Set Up
7.3.4 Opening Weekend Modeling
7.3.5 Model Evaluation
7.3.6 Apply the Model to new Movie Reviews
7.3.7 Movie Revenue Conclusion
7.4 Summary

Chapter 8: The OpenNLP Project
8.1 What is the OpenNLP project?
8.2 R's OpenNLP Package
8.3 Named Entities in Hillary Clinton's Email
8.3.1 R Session Set-up
8.3.2 Minor Text Cleaning
8.3.3 Using OpenNLP on a single email
8.3.4 Using OpenNLP on multiple documents
8.3.5 Revisiting the Text Mining Workflow
8.4 Analyzing the Named Entities
8.4.1 Worldwide Map of Hillary Clinton's Location Mentions
8.4.2 Mapping Only European Locations
8.4.3 Entities & Polarity: How does Hillary Clinton feel about an entity?
8.4.4 Stock Charts for Entities
8.4.5 Reach an Insight or Conclusion about Hillary Clinton's Emails
8.5 Summary

Chapter 9: Text Sources
9.1 Sourcing Text
9.2 Web Sources
9.2.1 Web Scraping a Single Page with rvest
9.2.2 Web Scraping Multiple Pages with rvest
9.2.3 Application Programming Interfaces (APIs)
9.2.4 Newspaper Articles from The Guardian
9.2.5 Tweets using the "twitteR" Package
9.2.6 Calling an API without a dedicated R package
9.2.7 Using jsonlite to access the New York Times
9.2.8 Using RCurl & XML to Parse Google News Feeds
9.2.9 The tm library Web-Mining Plugin
9.3 Getting Text from File Sources
9.3.1 Individual CSV, TXT and Microsoft Office Files
9.3.2 Reading multiple files quickly
9.3.3 Extracting Text from PDFs
9.3.4 Optical Character Recognition: Extracting Text from Images
9.4 Summary
