What makes Perl regex different from other languages?

Perl integrates regular expressions directly into the language syntax with binding operators (=~ and !~), making text processing feel natural. Perl's regex syntax inspired PCRE (Perl-Compatible Regular Expressions), a library used by PHP, R, and many other tools, and it shaped the regex flavors built into Python, JavaScript, and Java.

What is the difference between greedy and non-greedy quantifiers?

Greedy quantifiers (*, +, ?) match as much as possible, while non-greedy quantifiers (*?, +?, ??) match as little as possible. Add ? after a greedy quantifier to make it non-greedy. This is crucial for parsing structured text like HTML.

What are lookahead and lookbehind assertions?

Lookahead (?=...) and lookbehind (?<=...) are zero-width assertions: they match positions without consuming characters. The positive forms succeed when the pattern is present at that position. The negative forms, (?!...) and (?<!...), succeed when it is absent.

Perl Regular Expressions Guide

How do capture groups work in Perl regex?

Parentheses create capture groups that extract parts of a match. Captured content is stored in numbered variables ($1, $2, etc.). Perl 5.10+ also supports named capture groups with (?<name>...) syntax, accessed via $+{name}.

Perl's regex engine is legendary. For over 35 years, it has been the gold standard for pattern matching, inspiring regex implementations in Python, JavaScript, Java, and countless other languages. When developers talk about "Perl-compatible regular expressions" (PCRE), they're acknowledging Perl's foundational role in modern text processing.

This guide covers everything from basic pattern matching to advanced techniques like lookahead assertions and non-greedy quantifiers. Whether you're parsing log files, validating user input, or transforming data formats, Perl regex gives you the power to solve complex text problems with elegant, concise code.

Why Perl Regex is Different

Unlike many languages where regex lives in a separate library, Perl integrates regular expressions directly into the language syntax. The binding operators (=~ and !~) work seamlessly with Perl's control structures, making text processing feel natural rather than bolted on.

In 2025, Perl jumped from 27th to 10th place in the TIOBE index, with text processing cited as a key reason. As the i-Programmer blog noted: "Even in this era of AI, everything is still governed by text formats; text is still the King. XML, JSON calling APIs, YAML, Markdown, Log files... Perl with its first-class-citizen regular expressions, the wealth of text manipulation libraries up on CPAN and its full Unicode support of all the latest standards, was and is still the best."

Pattern Matching Basics

Use the match operator (m// or simply //) to search strings for patterns. The binding operator connects a variable to a pattern for matching.

# Basic pattern match
my $text = "Hello, World!";
if ($text =~ /World/) {
    print "Found it!\n";
}

# Case-insensitive match with /i modifier
if ($text =~ /world/i) {
    print "Found (case-insensitive)!\n";
}

# Negated match - true if pattern NOT found
if ($text !~ /Goodbye/) {
    print "Goodbye not found\n";
}

The match operator returns true if the pattern is found, making it perfect for conditionals and loops. Combined with Perl's default variable $_, you can write extremely concise code for processing text files line by line.
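As a minimal sketch of that idiom, the loop below filters lines through a bare pattern that is matched against $_. The sample lines and the @errors array are hypothetical stand-ins for real file input.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the contents of a log file
my @lines = (
    "INFO  service started",
    "ERROR disk full",
    "INFO  request served",
    "error timeout",
);

my @errors;
for (@lines) {
    # A bare /pattern/ with no =~ is matched against $_
    push @errors, $_ if /^error\b/i;
}

print "$_\n" for @errors;
```

The same loop body works unchanged inside while (<$fh>) { ... } when reading from a filehandle, because the readline in that position also assigns each line to $_.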

Character Classes

Character classes match sets of characters. Perl provides predefined classes for common patterns like digits, whitespace, and word characters. You can also create custom character classes using square brackets.

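# Predefined character classes
# (ASCII equivalents shown; without the /a modifier, \d, \w and \s
#  also match the corresponding Unicode characters)
\d  # Digit: [0-9]
\w  # Word character: [a-zA-Z0-9_]
\s  # Whitespace: [ \t\n\r\f] (plus vertical tab since Perl 5.18)
\D  # Non-digit: [^0-9]
\W  # Non-word: [^a-zA-Z0-9_]
\S  # Non-whitespace: the complement of \s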

# Custom character classes
[aeiou]     # Match any vowel
[^aeiou]    # Match any non-vowel (negated)
[a-z]       # Match lowercase letter
[A-Za-z]    # Match any letter
[0-9a-fA-F] # Match hexadecimal digit

The uppercase versions (\D, \W, \S) are negated versions of their lowercase counterparts. This symmetry makes Perl regex intuitive once you understand the basic pattern.
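A short sketch of these classes in use, assuming a made-up input string: in list context with /g, a match returns every occurrence, which makes it easy to see what each class picks up.

```perl
use strict;
use warnings;

# Hypothetical input, chosen to contain digits, underscores and punctuation
my $input = "Order 66 shipped to Bay_7 on 2025-01-06";

my @numbers = $input =~ /\d+/g;   # runs of digits
my @words   = $input =~ /\w+/g;   # runs of word characters (underscore included)

# Custom class: keep only strings made entirely of hexadecimal digits
my @hex_ok = grep { /^[0-9a-fA-F]+$/ } qw(ff 1A2b xyz 09);

print "numbers: @numbers\n";   # numbers: 66 7 2025 01 06
print "words:   @words\n";
print "hex:     @hex_ok\n";    # hex:     ff 1A2b 09
```

Note that Bay_7 counts as a single word because \w includes the underscore, while the hyphens in the date split it into three separate matches.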

Quantifiers

Quantifiers specify how many times a pattern element should match. Perl supports both greedy (match as much as possible) and non-greedy (match as little as possible) quantifiers.

# Greedy quantifiers (match as much as possible)
*      # Zero or more
+      # One or more
?      # Zero or one (optional)
{n}    # Exactly n times
{n,}   # At least n times
{n,m}  # Between n and m times

# Non-greedy quantifiers (add ? after)
*?     # Zero or more (minimal)
+?     # One or more (minimal)
??     # Zero or one (minimal)

# Example: Greedy vs non-greedy
my $html = "<b>bold</b> and <i>italic</i>";
$html =~ /<.+>/;   # Greedy: matches "<b>bold</b> and <i>italic</i>"
$html =~ /<.+?>/;  # Non-greedy: matches "<b>"

Understanding the difference between greedy and non-greedy matching is crucial for parsing structured text like HTML, XML, or log files. Greedy matching often captures more than intended, while non-greedy matching takes the shortest match available from the same starting position.
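The sketch below captures what each form actually matches, and adds a third option that is often used alongside non-greedy quantifiers: a negated character class such as [^>]+, which cannot run past the closing delimiter in the first place. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

my ($greedy) = $html =~ /(<.+>)/;    # runs to the last ">" in the string
my ($lazy)   = $html =~ /(<.+?>)/;   # stops at the first ">" it can
my @tags     = $html =~ /<[^>]+>/g;  # negated class: every tag, one by one

print "greedy: $greedy\n";   # greedy: <b>bold</b> and <i>italic</i>
print "lazy:   $lazy\n";     # lazy:   <b>
print "tags:   @tags\n";     # tags:   <b> </b> <i> </i>
```

For anything beyond simple extraction, a real HTML or XML parser is usually the safer tool; the patterns here only illustrate quantifier behavior.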

Capture Groups

Parentheses create capture groups that let you extract parts of a match. Captured content is stored in numbered variables ($1, $2, etc.) and can be referenced within the same regex using backreferences.

# Basic capture groups
my $date = "2025-01-06";
if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
    print "Year: $1, Month: $2, Day: $3\n";
}

# Named capture groups (Perl 5.10+)
if ($date =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
    print "Year: $+{year}, Month: $+{month}, Day: $+{day}\n";
}

# Non-capturing groups (for grouping without capturing)
my $text = "color or colour";
if ($text =~ /col(?:o|ou)r/) {
    print "Found color/colour\n";
}

Named capture groups ((?<name>...)) make your regex more readable and maintainable. Instead of remembering that $3 is the day, you can use $+{day} for self-documenting code.
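Backreferences, mentioned at the start of this section, refer to a captured group from inside the same pattern. A minimal sketch, assuming a made-up sentence with accidentally doubled words: \1 refers to the first numbered group, and \k<word> refers to a named one.

```perl
use strict;
use warnings;

# Hypothetical sentence containing accidentally repeated words
my $sentence = "This is is a test of the the parser";

# \1 must match exactly the text that group 1 captured
my @doubled = $sentence =~ /\b(\w+)\s+\1\b/g;

# Same idea with a named group and a named backreference
(my $fixed = $sentence) =~ s/\b(?<word>\w+)\s+\k<word>\b/$+{word}/g;

print "doubled: @doubled\n";   # doubled: is the
print "fixed:   $fixed\n";     # fixed:   This is a test of the parser
```

The \b anchors matter here: without them, the "is" inside "This" followed by " is" would count as a repeated word.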

Substitutions

The substitution operator (s///) finds and replaces text. It's one of Perl's most powerful features for text transformation. Use modifiers for global replacement, case-insensitive matching, and more.

# Basic substitution
my $text = "Hello, World!";
$text =~ s/World/Perl/;
print $text;  # "Hello, Perl!"

# Global substitution with /g modifier
my $csv = "a,b,c,d";
$csv =~ s/,/ | /g;
print $csv;  # "a | b | c | d"

# Case-insensitive with /i modifier
$text =~ s/perl/PERL/gi;

# Using capture groups in replacement
my $name = "Smith, John";
$name =~ s/(\w+), (\w+)/$2 $1/;
print $name;  # "John Smith"

# Evaluate replacement as Perl code with /e modifier
# (single quotes keep $10 and $25 from being interpolated as variables)
my $prices = 'Item costs $10 and $25';
$prices =~ s/\$(\d+)/"\$" . ($1 * 1.1)/ge;
print $prices;  # "Item costs $11 and $27.5"

The /e modifier is particularly powerful, allowing you to execute Perl code in the replacement string. This enables complex transformations that would require multiple steps in other languages.
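Two further sketches of /e, under the assumption that you want formatted output or a table-driven replacement. The %expand hash and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Format each computed price to two decimals using sprintf in the replacement
my $prices = 'Item costs $10 and $25';
(my $rounded = $prices) =~ s/\$(\d+)/sprintf('$%.2f', $1 * 1.1)/ge;
print "$rounded\n";   # Item costs $11.00 and $27.50

# Table-driven replacement: look up a multiplier for each unit
my %expand = (KB => 1024, MB => 1024 ** 2);
my $size   = "4KB and 2MB";
$size =~ s/(\d+)(KB|MB)/$1 * $expand{$2}/ge;
print "$size\n";      # 4096 and 2097152
```

Because the replacement is ordinary Perl code, any expression works there, including function calls. That also means untrusted input should never end up as code in the replacement side.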

Lookahead and Lookbehind Assertions

Lookahead and lookbehind assertions match positions in the string without consuming characters. They're essential for complex pattern matching where you need context without including it in the match.

# Positive lookahead: (?=...)
# Match "foo" only if followed by "bar"
$text =~ /foo(?=bar)/;

# Negative lookahead: (?!...)
# Match "foo" only if NOT followed by "bar"
$text =~ /foo(?!bar)/;

# Positive lookbehind: (?<=...)
# Match "bar" only if preceded by "foo"
$text =~ /(?<=foo)bar/;

# Negative lookbehind: (?<!...)
# Match "bar" only if NOT preceded by "foo"
$text =~ /(?<!foo)bar/;

# Practical example: Add commas to numbers
my $num = "1234567890";
$num =~ s/(?<=\d)(?=(\d{3})+$)/,/g;
print $num;  # "1,234,567,890"

Lookahead and lookbehind are zero-width assertions, meaning they don't consume characters from the input string. This makes them perfect for inserting text at specific positions or validating context without affecting the match result.
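To make the zero-width behavior concrete, the sketch below returns the matched text so you can see that the asserted context is not part of it. The sample strings and the looks_strong helper are hypothetical and only illustrate the technique.

```perl
use strict;
use warnings;

# Lookahead: digits followed by "USD", but "USD" is not part of the match
my $text = "price: 100USD, weight: 20kg, cost: 55USD";
my @usd  = $text =~ /\d+(?=USD)/g;

# Lookbehind: digits preceded by "$", but "$" is not part of the match
my $bill    = 'Total: $45, tax: $5, items: 3';
my @amounts = $bill =~ /(?<=\$)\d+/g;

# Several lookaheads at one position, each checking an independent condition
sub looks_strong {
    my ($password) = @_;
    return $password =~ /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/ ? 1 : 0;
}

print "usd: @usd\n";           # usd: 100 55
print "amounts: @amounts\n";   # amounts: 45 5
print looks_strong("Passw0rdX"), looks_strong("password"), "\n";   # 10
```

Stacking lookaheads at the start of a pattern works because each one is tested at the same position and none of them advances it.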

Common Modifiers

Perl regex modifiers control how patterns are interpreted and matched. Here are the most commonly used modifiers:

/i  # Case-insensitive matching
/g  # Global - match all occurrences
/m  # Multi-line mode - ^ and $ match line boundaries
/s  # Single-line mode - . matches newlines
/x  # Extended mode - allows whitespace and comments
/e  # Evaluate replacement as Perl code (s/// only)
/r  # Return modified string, don't modify original (Perl 5.14+)

# Extended mode example for readable regex
my $phone = "555-123-4567";
if ($phone =~ /
    ^           # Start of string
    (\d{3})     # Area code
    -           # Separator
    (\d{3})     # Exchange
    -           # Separator
    (\d{4})     # Subscriber number
    $           # End of string
/x) {
    print "Valid phone: $1-$2-$3\n";
}

The /x modifier is particularly valuable for complex patterns. It allows you to add whitespace and comments, making your regex self-documenting and much easier to maintain.
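The /r, /m, and /s modifiers from the list above can be sketched the same way. All strings here are made-up examples.

```perl
use strict;
use warnings;

# /r returns the modified copy and leaves the original untouched
my $original = "  padded  ";
my $trimmed  = $original =~ s/^\s+|\s+$//gr;

# /m lets ^ match at the start of every line, not just the start of the string
my $config = "name=app\nport=8080\nmode=dev";
my @keys   = $config =~ /^(\w+)=/mg;

# /s lets . match newlines, so (.*) can span several lines
my $block  = "BEGIN\nline one\nline two\nEND";
my ($body) = $block =~ /BEGIN\n(.*)\nEND/s;

print "[$trimmed] [$original]\n";   # [padded] [  padded  ]
print "keys: @keys\n";              # keys: name port mode
print "$body\n";                    # prints "line one" and "line two" on separate lines
```

Despite their names, /m and /s are independent and can be combined: /m changes only the anchors, and /s changes only the dot.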

Real-World Examples

Email Validation

my $email = 'user@example.com';
if ($email =~ /^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$/) {
    print "Valid email\n";
}

Log File Parsing

# Apache log format parsing
my $log = '192.168.1.1 - - [01/Jan/2025:12:00:00 +0000] "GET /page HTTP/1.1" 200 1234';
if ($log =~ /^([\d.]+) \S+ \S+ \[([^\]]+)\] "(\w+) (\S+) [^"]*" (\d+) (\d+)/) {
    # $4 is the request path only; the protocol ("HTTP/1.1") is matched but not captured
    my ($ip, $date, $method, $path, $status, $size) = ($1, $2, $3, $4, $5, $6);
    print "IP: $ip, Status: $status\n";
}

CSV Processing

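# Handle quoted fields with commas
# \G(?:^|,) ties each field to the start of the line or to a comma,
# which avoids a spurious empty field at the end of the line
my $line = 'John,"Doe, Jr.",30,"New York"';
my @fields = $line =~ /\G(?:^|,)("(?:[^"]|"")*"|[^,]*)/g;
print join(" | ", @fields), "\n";  # quoted fields keep their surrounding quotes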
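A regex split like this leaves the quoting in place, so a second pass is usually needed. The sketch below is one way to do it, assuming the common CSV convention that a doubled quote inside a quoted field stands for a literal quote. The sample line is made up, and for production data the Text::CSV module from CPAN is generally the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical line with an embedded comma and escaped ("" doubled) quotes
my $line = 'John,"Doe, Jr.",30,"say ""hi"""';

# Split into raw fields; quoted fields still carry their quotes here
my @raw = $line =~ /\G(?:^|,)("(?:[^"]|"")*"|[^,]*)/g;

my @clean = map {
    my $field = $_;
    # Strip one pair of surrounding quotes, then unescape doubled quotes
    if ($field =~ s/^"(.*)"$/$1/s) {
        $field =~ s/""/"/g;
    }
    $field;
} @raw;

print join(" | ", @clean), "\n";   # John | Doe, Jr. | 30 | say "hi"
```

This sketch does not handle fields that span multiple physical lines, which is one of the cases a dedicated CSV parser covers.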

Performance Tips

Regex performance matters when processing large files or running patterns millions of times. Here are key optimization strategies:

Anchor patterns - Use ^ and $ when possible to prevent unnecessary backtracking
Avoid catastrophic backtracking - Patterns like (a+)+ can cause exponential time complexity
Use atomic groups - (?>...) prevents backtracking into the group
Precompile patterns - Use qr// to compile regex once and reuse
Use \K - Reset match start position for efficient substitutions

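# Precompile for reuse
my $pattern = qr/\b\d{3}-\d{4}\b/;
open my $fh, '<', 'input.txt' or die "Cannot open input.txt: $!";  # hypothetical file name
while (<$fh>) {
    print if /$pattern/;
}
close $fh;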

# Use \K to simplify substitutions
$text =~ s/prefix\K-suffix/-replacement/;
# Equivalent to: s/(prefix)-suffix/$1-replacement/
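The atomic-group tip is easier to see with a pattern that depends on backtracking. In this sketch, the atomic group (?>a+) and the possessive quantifier a++ (available since Perl 5.10) both refuse to give back characters they have already matched, so the trailing "a" has nothing left to match. The strings and variable names are illustrative only.

```perl
use strict;
use warnings;

my $s = "aaa";

# Ordinary a+ backtracks: it gives up one "a" so the final "a" can match
my $normal     = $s =~ /^a+a$/     ? 1 : 0;
# Atomic group: once a+ has matched, it gives nothing back, so the match fails
my $atomic     = $s =~ /^(?>a+)a$/ ? 1 : 0;
# Possessive quantifier: shorthand for the same behavior
my $possessive = $s =~ /^a++a$/    ? 1 : 0;

print "$normal $atomic $possessive\n";   # 1 0 0

# A compiled qr// object can be reused and interpolated into other patterns
my $phone = qr/\b\d{3}-\d{4}\b/;
my @hits  = grep { /$phone/ } ("call 555-1234 now", "no number here", "fax: 867-5309");
print scalar(@hits), " lines with a phone number\n";   # 2 lines with a phone number
```

Cutting off backtracking changes what a pattern matches as well as how fast it fails, so atomic groups and possessive quantifiers belong where the discarded alternatives could never lead to a valid match.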

Learn More

Explore our question database for specific regex patterns and techniques. Free Perl Code has hundreds of verified examples covering everything from basic matching to advanced text transformations.
