Reguläre Ausdrücke

Jeffrey E. F. Friedl (Author)

Book | Softcover

500 pages

2003 | 2nd edition
O'Reilly (Publisher)
978-3-89721-349-4 (ISBN)

This title is out of print;
no new edition

Regular expressions are a powerful means of processing text and data. They are a treasure chest for creative programming and elegant solutions, and they are now built into many languages and tools as standard.

If you have not used them before, this book will open up a whole new world for you. Thanks to its exceptionally thorough and detailed treatment of the subject, the book is also a real find for experts. The new edition of this acknowledged standard work has been completely revised and now also covers the new regex features since Perl 5.6 and other languages such as Java, VB.NET, C#, Python, JavaScript, Ruby, and Tcl.

Regular expressions are readily available, flexible, and very powerful. Even so, they are often used well below their potential. With regular expressions you can solve complex and subtle text-processing problems that you might never have suspected could be automated. Regular expressions save you work and frustration, and many problems can be solved elegantly with them.

What is a very useful skill in the hands of experts can turn out to be a stumbling block for the inexperienced. This book shows a way through this unpredictable terrain and helps you become an expert yourself. Once you have mastered regular expressions, they will become an indispensable part of your toolkit. You will wonder how you ever managed to work without them.

This 2nd edition of Reguläre Ausdrücke has been thoroughly revised: it now covers all the new features of Perl version 5.8 and of various other programming languages, in particular Java and VB.NET, but also C#, Python, JavaScript, Tcl, and Ruby. The book's clear and entertaining style has already made this inherently dry subject accessible to thousands of programmers, and with its many examples of problems from everyday programming, Reguläre Ausdrücke is a practical aid in day-to-day work.

About the author:

Jeffrey E. F. Friedl grew up in the countryside, in Rootstown, Ohio, and originally wanted to become an astronomer, until he discovered an unused TRS-80 Model I in a corner of the chemistry lab (with a full 16 KB of RAM, no less). In 1980 he discovered Unix and, with it, regular expressions. After earning degrees in computer science at Kent State University (B.S.) and the University of New Hampshire (M.S.), he worked as an engineer at Omron Corporation in Kyoto, Japan. In 1997 he moved to Silicon Valley, where he applies his regex know-how to the financial news of a little-known company called Yahoo!.

When it comes to the difficult problem of what to do with all his free time, Jeffrey plays Frisbee or basketball with colleagues from Yahoo!, programs the gadgets in his house, and feeds the birds and squirrels in the garden. He spends a lot of time with his wife Fumie and their joint "software project", Anthony.

About the translator:

Andreas Karrer was born near Zurich in 1957 and lives in Zurich. After ten years in chemistry (ETH Zürich through to his doctorate, a postdoc in the USA, the Max Planck Institute of Biochemistry), he worked for more than ten years as a Unix and network system administrator at ETH. In 1997 O’Reilly recruited him to translate the 1st edition of Reguläre Ausdrücke, and since then he has translated a few more books on Perl and Unix.

Table of Contents (of the original English edition)

1: Introduction to Regular Expressions

Solving Real Problems

Regular Expressions as a Language

The Filename Analogy

The Language Analogy

The Regular-Expression Frame of Mind

If You Have Some Regular-Expression Experience

Searching Text Files: Egrep

Egrep Metacharacters

Start and End of the Line

Character Classes

Matching Any Character with Dot

Alternation

Ignoring Differences in Capitalization

Word Boundaries

In a Nutshell

Optional Items

Other Quantifiers: Repetition

Parentheses and Backreferences

The Great Escape

Expanding the Foundation

Linguistic Diversification

The Goal of a Regular Expression

A Few More Examples

Regular Expression Nomenclature

Improving on the Status Quo

Summary

Personal Glimpses

2: Extended Introductory Examples

About the Examples

A Short Introduction to Perl

Matching Text with Regular Expressions

Toward a More Real-World Example

Side Effects of a Successful Match

Intertwined Regular Expressions

Intermission

Modifying Text with Regular Expressions

Example: Form Letter

Example: Prettifying a Stock Price

Automated Editing

A Small Mail Utility

Adding Commas to a Number with Lookaround

Text-to-HTML Conversion

That Doubled-Word Thing

3: Overview of Regular Expression Features and Flavors

A Casual Stroll Across the Regex Landscape

The Origins of Regular Expressions

At a Glance

Care and Handling of Regular Expressions

Integrated Handling

Procedural and Object-Oriented Handling

A Search-and-Replace Example

Search and Replace in Other Languages

Care and Handling: Summary

Strings, Character Encodings, and Modes

Strings as Regular Expressions

Character-Encoding Issues

Regex Modes and Match Modes

Common Metacharacters and Features

Character Representations

Character Classes and Class-Like Constructs

Anchors and Other Zero-Width Assertions

Comments and Mode Modifiers

Grouping, Capturing, Conditionals, and Control

Guide to the Advanced Chapters

4: The Mechanics of Expression Processing

Start Your Engines!

Two Kinds of Engines

New Standards

Regex Engine Types

From the Department of Redundancy Department

Testing the Engine Type

Match Basics

About the Examples

Rule 1: The Match That Begins Earliest Wins

Engine Pieces and Parts

Rule 2: The Standard Quantifiers Are Greedy

Regex-Directed Versus Text-Directed

NFA Engine: Regex-Directed

DFA Engine: Text-Directed

First Thoughts: NFA and DFA in Comparison

Backtracking

A Really Crummy Analogy

Two Important Points on Backtracking

Saved States

Backtracking and Greediness

More About Greediness and Backtracking

Problems of Greediness

Multi-Character Quotes

Using Lazy Quantifiers

Greediness and Laziness Always Favor a Match

The Essence of Greediness, Laziness, and Backtracking

Possessive Quantifiers and Atomic Grouping

Possessive Quantifiers, ?+, *+, ++, and {m,n}+

The Backtracking of Lookaround

Is Alternation Greedy?

Taking Advantage of Ordered Alternation

NFA, DFA, and POSIX

"The Longest-Leftmost"

POSIX and the Longest-Leftmost Rule

Speed and Efficiency

Summary: NFA and DFA in Comparison

Summary

5: Practical Regex Techniques

Regex Balancing Act

A Few Short Examples

Continuing with Continuation Lines

Matching an IP Address

Working with Filenames

Matching Balanced Sets of Parentheses

Watching Out for Unwanted Matches

Matching Delimited Text

Knowing Your Data and Making Assumptions

Stripping Leading and Trailing Whitespace

HTML-Related Examples

Matching an HTML Tag

Matching an HTML Link

Examining an HTTP URL

Validating a Hostname

Plucking Out a URL in the Real World

Extended Examples

Keeping in Sync with Your Data

Parsing CSV Files

6: Crafting an Efficient Expression

A Sobering Example

A Simple Change: Placing Your Best Foot Forward

Efficiency Versus Correctness

Advancing Further: Localizing the Greediness

Reality Check

A Global View of Backtracking

More Work for a POSIX NFA

Work Required During a Non-Match

Being More Specific

Alternation Can Be Expensive

Benchmarking

Know What You're Measuring

Benchmarking with Java

Benchmarking with VB.NET

Benchmarking with Python

Benchmarking with Ruby

Benchmarking with Tcl

Common Optimizations

No Free Lunch

Everyone's Lunch is Different

The Mechanics of Regex Application

Pre-Application Optimizations

Optimizations with the Transmission

Optimizations of the Regex Itself

Techniques for Faster Expressions

Common Sense Techniques

Expose Literal Text

Expose Anchors

Lazy Versus Greedy: Be Specific

Split Into Multiple Regular Expressions

Mimic Initial-Character Discrimination

Use Atomic Grouping and Possessive Quantifiers

Lead the Engine to a Match

Unrolling the Loop

Method 1: Building a Regex From Past Experiences

The Real Unrolling-the-Loop Pattern

Method 2: A Top-Down View

Method 3: An Internet Hostname

Observations

Using Atomic Grouping and Possessive Quantifiers

Short Unrolling Examples

Unrolling C Comments

The Freeflowing Regex

A Helping Hand to Guide the Match

A Well-Guided Regex is a Fast Regex

Wrapup

In Summary: Think!

7: Perl

Regular Expressions as a Language Component

Perl's Greatest Strength

Perl's Greatest Weakness

Perl's Regex Flavor

Regex Operands and Regex Literals

How Regex Literals Are Parsed

Regex Modifiers

Regex-Related Perlisms

Expression Context

Dynamic Scope and Regex Match Effects

Special Variables Modified by a Match

The qr/.../ Operator and Regex Objects

Building and Using Regex Objects

Viewing Regex Objects

Using Regex Objects for Efficiency

The Match Operator

Match's Regex Operand

Specifying the Match Target Operand

Different Uses of the Match Operator

Iterative Matching: Scalar Context, with /g

The Match Operator's Environmental Relations

The Substitution Operator

The Replacement Operand

The /e Modifier

Context and Return Value

The Split Operator

Basic Split

Returning Empty Elements

Split's Special Regex Operands

Split's Match Operand with Capturing Parentheses

Fun with Perl Enhancements

Using a Dynamic Regex to Match Nested Pairs

Using the Embedded-Code Construct

Using local in an Embedded-Code Construct

A Warning About Embedded Code and my Variables

Matching Nested Constructs with Embedded Code

Overloading Regex Literals

Problems with Regex-Literal Overloading

Mimicking Named Capture

Perl Efficiency Issues

"There's More Than One Way to Do It"

Regex Compilation, the /o Modifier, qr/.../, and Efficiency

Understanding the Pre-Match Copy

The Study Function

Benchmarking

Regex Debugging Information

Final Comments

8: Java

Judging a Regex Package

Technical Issues

Social and Political Issues

Object Models

A Few Abstract Object Models

Growing Complexity

Packages, Packages, Packages

Why So Many Perl5 Flavors?

Lies, Damn Lies, and Benchmarks

Recommendations

Sun's Regex Package

Regex Flavor

Using java.util.regex

The Pattern.compile() Factory

The Matcher Object

Other Pattern Methods

A Quick Look at Jakarta-ORO

ORO's Perl5Util

A Mini Perl5Util Reference

Using ORO's Underlying Classes

9: .NET

.NET's Regex Flavor

Additional Comments on the Flavor

Using .NET Regular Expressions

Regex Quickstart

Package Overview

Core Object Overview

Core Object Details

Creating Regex Objects

Using Regex Objects

Using Match Objects

Using Group Objects

Static Convenience Functions

Regex Caching

Support Functions

Advanced .NET

Regex Assemblies

Matching Nested Constructs

Capture Objects

Index

Translator	Andreas Karrer
Language	German
Binding	Paperback
Subject area	Mathematics / Computer Science ► Computer Science
Keywords	HC/Computer Science, IT/Programming Languages • Java • JavaScript • Perl • Programming • TB/Computer Science, IT/Computer Science
ISBN-10	3-89721-349-4 / 3897213494
ISBN-13	978-3-89721-349-4 / 9783897213494
Condition	New