Perl & LWP - Sean M. Burke

Perl & LWP

Book | Softcover
245 pages
2002
O'Reilly Media (publisher)
978-0-596-00178-0 (ISBN)
35.90 incl. VAT
This text covers topics including: understanding LWP and its design; fetching and analyzing URLs; extracting information from HTML using regular expressions and tokens; working with the structure of HTML documents using trees; and setting and inspecting HTTP headers and response codes.
Perl soared to popularity as a language for creating and managing web content, but with LWP (Library for WWW in Perl), Perl is equally adept at consuming information on the Web. LWP is a suite of modules for fetching and processing web pages. The Web is a vast data source that contains everything from stock prices to movie credits, and with LWP all that data is just a few lines of code away. Anything you do on the Web, whether it's buying or selling, reading or writing, uploading or downloading, from news to e-commerce, can be controlled with Perl and LWP. You can automate Web-based purchase orders as easily as you can set up a program to download MP3 files from a web site.
Perl & LWP covers:
* Understanding LWP and its design
* Fetching and analyzing URLs
* Extracting information from HTML using regular expressions and tokens
* Working with the structure of HTML documents using trees
* Setting and inspecting HTTP headers and response codes
* Managing cookies
* Accessing information that requires authentication
* Extracting links
* Cooperating with proxy caches
* Writing web spiders (also known as robots) in a safe fashion

Perl & LWP includes many step-by-step examples that show how to apply the various techniques. Programs to extract information from the web sites of BBC News, AltaVista, ABEBooks.com, and the Weather Underground, to name just a few, are explained in detail, so that you understand how and why they work. Perl programmers who want to automate and mine the web can pick up this book and be immediately productive. Written by a contributor to LWP, and with a foreword by one of LWP's creators, Perl & LWP is the authoritative guide to this powerful and popular toolkit.

Sean Burke is an active member of the Perl community and one of CPAN's most prolific module authors. He has been a columnist for The Perl Journal since 1998, and is an authority on markup languages. Trained as a linguist, he also develops tools for software internationalization and Native language preservation.

Foreword
Preface
1. Introduction to Web Automation
   The Web as Data Source
   History of LWP
   Installing LWP
   Words of Caution
   LWP in Action
2. Web Basics
   URLs
   An HTTP Transaction
   LWP::Simple
   Fetching Documents Without LWP::Simple
   Example: AltaVista
   HTTP POST
   Example: Babelfish
3. The LWP Class Model
   The Basic Classes
   Programming with LWP Classes
   Inside the do_GET and do_POST Functions
   User Agents
   HTTP::Response Objects
   LWP Classes: Behind the Scenes
4. URLs
   Parsing URLs
   Relative URLs
   Converting Absolute URLs to Relative
   Converting Relative URLs to Absolute
5. Forms
   Elements of an HTML Form
   LWP and GET Requests
   Automating Form Analysis
   Idiosyncrasies of HTML Forms
   POST Example: License Plates
   POST Example: ABEBooks.com
   File Uploads
   Limits on Forms
6. Simple HTML Processing with Regular Expressions
   Automating Data Extraction
   Regular Expression Techniques
   Troubleshooting
   When Regular Expressions Aren't Enough
   Example: Extracting Links from a Bookmark File
   Example: Extracting Links from Arbitrary HTML
   Example: Extracting Temperatures from Weather Underground
7. HTML Processing with Tokens
   HTML as Tokens
   Basic HTML::TokeParser Use
   Individual Tokens
   Token Sequences
   More HTML::TokeParser Methods
   Using Extracted Text
8. Tokenizing Walkthrough
   The Problem
   Getting the Data
   Inspecting the HTML
   First Code
   Narrowing In
   Rewrite for Features
   Alternatives
9. HTML Processing with Trees
   Introduction to Trees
   HTML::TreeBuilder
   Processing
   Example: BBC News
   Example: Fresh Air
10. Modifying HTML with Trees
   Changing Attributes
   Deleting Images
   Detaching and Reattaching
   Attaching in Another Tree
   Creating New Elements
11. Cookies, Authentication, and Advanced Requests
   Cookies
   Adding Extra Request Header Lines
   Authentication
   An HTTP Authentication Example: The Unicode Mailing Archive
12. Spiders
   Types of Web-Querying Programs
   A User Agent for Robots
   Example: A Link-Checking Spider
   Ideas for Further Expansion
A. LWP Modules
B. HTTP Status Codes
C. Common MIME Types
D. Language Tags
E. Common Content Encodings
F. ASCII Table
G. User's View of Object-Oriented Modules
Index

Publication date (per publisher) 30 July 2002
Place of publication Sebastopol
Language English
Binding paperback
Subject area Computer science; Programming languages / tools; Perl
ISBN-10 0-596-00178-9 / 0596001789
ISBN-13 978-0-596-00178-0 / 9780596001780
Condition New