Go Web Scraping Quick Start Guide - Vincent Smith

Go Web Scraping Quick Start Guide (eBook)

Implement the power of Go to scrape and crawl data from the web

Vincent Smith (Author)

eBook Download: EPUB

2019
132 pages
Packt Publishing (Publisher)
978-1-78961-294-3 (ISBN)

Learn how Go-specific language features simplify building web scrapers, along with common pitfalls and best practices for web scraping.

Key Features

Use Go libraries like Goquery and Colly to scrape the web

Common pitfalls and best practices to effectively scrape and crawl

Learn how to scrape using the Go concurrency model

Book Description

Web scraping is the process of extracting information from the web using tools that scrape and crawl. Go is emerging as the language of choice for scraping, with a variety of libraries available. This book quickly explains how to scrape data from various websites using Go libraries such as Colly and Goquery.

The book starts with an introduction to the use cases for building a web scraper and the main features of the Go programming language, along with setting up a Go environment. It then moves on to HTTP requests and responses and explains how Go handles them. You will also learn the basics of web scraping etiquette.

You will be taught how to navigate through a website using a breadth-first and then a depth-first search, as well as how to find and follow links. You will learn how to track history to avoid loops and how to protect your web scraper using proxies.

Finally, the book covers the Go concurrency model and how to run scrapers in parallel, along with large-scale distributed web scraping.

What you will learn

Implement Cache-Control to avoid unnecessary network calls

Coordinate concurrent scrapers

Design a custom, larger-scale scraping system

Scrape basic HTML pages with Colly and JavaScript pages with chromedp

Discover how to search using the 'strings' and 'regexp' packages

Set up a Go development environment

Retrieve information from an HTML document

Protect your web scraper from being blocked by using proxies

Control web browsers to scrape JavaScript sites

Who this book is for

Data scientists and web developers with a basic knowledge of Golang who want to collect web data and analyze it for effective reporting and visualization.

Publication date (per publisher)	30 January 2019
Language	English
Subject area	Nonfiction / Guides ► Leisure / Hobbies ► Collecting / Collectors' Catalogs
Subject area	Mathematics / Computer Science ► Computer Science ► Theory / Study
Keywords	Colly • Concurrency • Go • golang • Goquery • Web Scraping
ISBN-10	1-78961-294-2 / 1789612942
ISBN-13	978-1-78961-294-3 / 9781789612943

EPUB (Adobe DRM)
Size: 2.9 MB

Copy protection: Adobe DRM
Adobe DRM is a copy-protection scheme intended to protect the eBook from misuse. The eBook is authorized to your personal Adobe ID at download time. You can then read it only on devices that are also registered to your Adobe ID.

File format: EPUB (Electronic Publication)
EPUB is an open standard for eBooks and is particularly well suited to fiction and nonfiction. The body text adapts dynamically to the display and font size, which also makes EPUB a good fit for mobile reading devices.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need an Adobe ID and the free Adobe Digital Editions software. We advise against using the OverDrive Media Console, which in our experience frequently causes problems with Adobe DRM.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices. You need an Adobe ID and a free app.

Buying eBooks from abroad
For tax reasons, we can sell eBooks only within Germany and Switzerland. Unfortunately, we cannot fulfill eBook orders from other countries.

EPUB (DRM-free)

Digital Rights Management: no DRM
This eBook contains no DRM or copy protection. Passing it on to third parties is nevertheless not legally permitted, because your purchase grants rights for personal use only.

System requirements:
PC/Mac: You can read this eBook on a PC or Mac. You need the free Adobe Digital Editions software.
eReader: This eBook can be read on (almost) all eBook readers. It is not compatible with the Amazon Kindle, however.
Smartphone/Tablet: You can read this eBook on both Apple and Android devices. You need a free app.