Perl Regular Expressions Guide
Perl's regex engine is legendary. For over 35 years, it has been the gold standard for pattern matching, inspiring regex implementations in Python, JavaScript, Java, and countless other languages. When developers talk about "Perl-compatible regular expressions" (PCRE), they're acknowledging Perl's foundational role in modern text processing.
This guide covers everything from basic pattern matching to advanced techniques like lookahead assertions and non-greedy quantifiers. Whether you're parsing log files, validating user input, or transforming data formats, Perl regex gives you the power to solve complex text problems with elegant, concise code.
Why Perl Regex is Different
Unlike many languages, where regex lives in a separate library, Perl integrates regular expressions
directly into the language syntax. The binding operators (=~ and !~)
work seamlessly with Perl's control structures, making text processing feel natural
rather than bolted on.
In 2025, Perl jumped from 27th to 10th place in the TIOBE index, with text processing cited as a key reason. As the i-Programmer blog noted: "Even in this era of AI, everything is still governed by text formats; text is still the King. XML, JSON calling APIs, YAML, Markdown, Log files... Perl with its first-class-citizen regular expressions, the wealth of text manipulation libraries up on CPAN and its full Unicode support of all the latest standards, was and is still the best."
Pattern Matching Basics
Use the match operator (m// or simply //) to search strings
for patterns. The binding operator connects a variable to a pattern for matching.
# Basic pattern match
my $text = "Hello, World!";
if ($text =~ /World/) {
print "Found it!\n";
}
# Case-insensitive match with /i modifier
if ($text =~ /world/i) {
print "Found (case-insensitive)!\n";
}
# Negated match - true if pattern NOT found
if ($text !~ /Goodbye/) {
print "Goodbye not found\n";
}
The match operator returns true if the pattern is found, making it perfect for
conditionals and loops. Combined with Perl's default variable $_,
you can write extremely concise code for processing text files line by line.
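As a minimal sketch of that idiom, the loop below filters lines using the default variable. The sample lines and the `@errors` array are assumptions made for illustration; a real script would typically read from a file handle instead.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for lines read from a log file.
my @lines = (
    "INFO  service started",
    "ERROR disk full",
    "INFO  heartbeat ok",
    "ERROR request timeout",
);

my @errors;
for (@lines) {
    # No binding operator: the match is applied to $_ implicitly.
    next unless /^ERROR\s+(.*)/;
    push @errors, $1;
}

print "$_\n" for @errors;
```

The same body works unchanged inside a `while (<$fh>)` loop, because that construct also assigns each line to `$_`.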
Character Classes
Character classes match sets of characters. Perl provides predefined classes for common patterns like digits, whitespace, and word characters. You can also create custom character classes using square brackets.
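# Predefined character classes (ASCII equivalents shown; on Unicode text they match more unless /a is used)
\d # Digit: [0-9]
\w # Word character: [a-zA-Z0-9_]
\s # Whitespace: [ \t\n\r\f] (plus vertical tab since Perl 5.18)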
\D # Non-digit: [^0-9]
\W # Non-word: [^a-zA-Z0-9_]
\S # Non-whitespace: [^ \t\n\r\f]
# Custom character classes
[aeiou] # Match any vowel
[^aeiou] # Match any non-vowel (negated)
[a-z] # Match lowercase letter
[A-Za-z] # Match any letter
[0-9a-fA-F] # Match hexadecimal digit
The uppercase versions (\D, \W, \S) are negated
versions of their lowercase counterparts. This symmetry makes Perl regex intuitive
once you understand the basic pattern.
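The following sketch shows a custom class used for validation, and how a predefined class behaves on non-ASCII input. The helper name `is_hex_color` is hypothetical, and the `/a` modifier used in the second half requires Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 for a six-digit hex colour such as "#1a2B3c".
sub is_hex_color {
    my ($s) = @_;
    return $s =~ /\A#[0-9a-fA-F]{6}\z/ ? 1 : 0;
}

print is_hex_color("#1a2B3c"), "\n";   # 1
print is_hex_color("#12345G"), "\n";   # 0

# U+0665 is ARABIC-INDIC DIGIT FIVE, a Unicode decimal digit.
my $digit = "\x{0665}";
my $default_rules = $digit =~ /\A\d\z/  ? 1 : 0;   # \d follows Unicode rules here
my $ascii_rules   = $digit =~ /\A\d\z/a ? 1 : 0;   # /a limits \d to 0-9

print "default: $default_rules, with /a: $ascii_rules\n";
```

When input must be strictly ASCII digits, writing `[0-9]` explicitly or adding `/a` avoids surprises with Unicode data.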
Quantifiers
Quantifiers specify how many times a pattern element should match. Perl supports both greedy (match as much as possible) and non-greedy (match as little as possible) quantifiers.
# Greedy quantifiers (match as much as possible)
* # Zero or more
+ # One or more
? # Zero or one (optional)
{n} # Exactly n times
{n,} # At least n times
{n,m} # Between n and m times
# Non-greedy quantifiers (add ? after)
*? # Zero or more (minimal)
+? # One or more (minimal)
?? # Zero or one (minimal)
# Example: Greedy vs non-greedy
my $html = "<b>bold</b> and <i>italic</i>";
$html =~ /<.+>/; # Greedy: matches "<b>bold</b> and <i>italic</i>"
$html =~ /<.+?>/; # Non-greedy: matches "<b>"
Understanding the difference between greedy and non-greedy matching is crucial for parsing structured text like HTML, XML, or log files. Greedy matching often captures more than intended, while non-greedy matching stops as soon as the rest of the pattern can match.
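To make the difference concrete, this sketch captures what each form matches on the same string. It also shows a negated character class, which is a common alternative to a non-greedy quantifier; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

my ($greedy) = $html =~ /(<.+>)/;       # one match spanning the whole string
my @lazy     = $html =~ /(<.+?>)/g;     # each tag separately
my @negated  = $html =~ /(<[^>]+>)/g;   # same tags, with no backtracking needed

print "greedy:  $greedy\n";
print "lazy:    @lazy\n";
print "negated: @negated\n";
```

The negated class says exactly what may appear inside a tag, which often makes the intent clearer than relying on minimal matching.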
Capture Groups
Parentheses create capture groups that let you extract parts of a match. Captured
content is stored in numbered variables ($1, $2, etc.)
and can be referenced within the same regex using backreferences.
# Basic capture groups
my $date = "2025-01-06";
if ($date =~ /(\d{4})-(\d{2})-(\d{2})/) {
print "Year: $1, Month: $2, Day: $3\n";
}
# Named capture groups (Perl 5.10+)
if ($date =~ /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/) {
print "Year: $+{year}, Month: $+{month}, Day: $+{day}\n";
}
# Non-capturing groups (for grouping without capturing)
my $text = "color or colour";
if ($text =~ /col(?:o|ou)r/) {
print "Found color/colour\n";
}
Named capture groups ((?<name>...)) make your regex more readable
and maintainable. Instead of remembering that $3 is the day, you can
use $+{day} for self-documenting code.
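Backreferences are mentioned above without an example, so here is a small sketch of both the numbered form (`\1`) and the named form (`\k<name>`). The sample strings and the group names `quote` and `body` are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# \1 refers back to whatever the first group captured in this same match.
my $text = "This is is a test of of the parser";
my @dupes;
while ($text =~ /\b(\w+)\s+\1\b/g) {
    push @dupes, $1;
}
print "Repeated words: @dupes\n";   # is of

# \k<quote> is the named form: the closing quote must equal the opening one.
my $line = q{say "it's fine" now};
my $body;
if ($line =~ /(?<quote>["'])(?<body>.*?)\k<quote>/) {
    $body = $+{body};
    print "Quoted text: $body\n";
}
```

Because the closing delimiter must equal the captured opening quote, the apostrophe inside the double-quoted text does not end the match.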
Substitutions
The substitution operator (s///) finds and replaces text. It's one of
Perl's most powerful features for text transformation. Use modifiers for global
replacement, case-insensitive matching, and more.
# Basic substitution
my $text = "Hello, World!";
$text =~ s/World/Perl/;
print $text; # "Hello, Perl!"
# Global substitution with /g modifier
my $csv = "a,b,c,d";
$csv =~ s/,/ | /g;
print $csv; # "a | b | c | d"
# Case-insensitive with /i modifier
$text =~ s/perl/PERL/gi;
# Using capture groups in replacement
my $name = "Smith, John";
$name =~ s/(\w+), (\w+)/$2 $1/;
print $name; # "John Smith"
# Evaluate replacement as Perl code with /e modifier
my $prices = 'Item costs $10 and $25'; # Single quotes, so $10 and $25 are not interpolated
$prices =~ s/\$(\d+)/"\$" . ($1 * 1.1)/ge;
print $prices; # "Item costs $11 and $27.5"
The /e modifier is particularly powerful, allowing you to execute
Perl code in the replacement string. This enables complex transformations that
would require multiple steps in other languages.
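One possible use of `/e` is a tiny template expander, sketched below. The `{{name}}` placeholder syntax and the `%vars` hash are assumptions made for this example, not a standard Perl feature.

```perl
use strict;
use warnings;

# Hypothetical template variables; {{name}} placeholders are an assumed syntax.
my %vars     = (name => "Ada", lang => "Perl");
my $template = 'Hello, {{name}}! Welcome to {{lang}}. {{unknown}}';

# Copy the template, then replace each placeholder by running Perl code:
# known names are looked up, unknown ones are left as they were.
(my $out = $template) =~
    s/\{\{(\w+)\}\}/exists $vars{$1} ? $vars{$1} : "{{" . $1 . "}}"/ge;

print "$out\n";   # Hello, Ada! Welcome to Perl. {{unknown}}
```

Leaving unknown placeholders untouched is a design choice here; a stricter version could `die` inside the replacement code instead.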
Lookahead and Lookbehind Assertions
Lookahead and lookbehind assertions match positions in the string without consuming characters. They're essential for complex pattern matching where you need context without including it in the match.
# Positive lookahead: (?=...)
# Match "foo" only if followed by "bar"
$text =~ /foo(?=bar)/;
# Negative lookahead: (?!...)
# Match "foo" only if NOT followed by "bar"
$text =~ /foo(?!bar)/;
# Positive lookbehind: (?<=...)
# Match "bar" only if preceded by "foo"
$text =~ /(?<=foo)bar/;
# Negative lookbehind: (?<!...)
# Match "bar" only if NOT preceded by "foo"
$text =~ /(?<!foo)bar/;
# Practical example: Add commas to numbers
my $num = "1234567890";
$num =~ s/(?<=\d)(?=(\d{3})+$)/,/g;
print $num; # "1,234,567,890"
Lookahead and lookbehind are zero-width assertions, meaning they don't consume characters from the input string. This makes them perfect for inserting text at specific positions or validating context without affecting the match result.
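Because assertions consume nothing, several of them can be stacked at the same position. The sketch below uses that to check independent rules in one pattern. The password policy and the helper name `is_strong_password` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical policy: at least 8 characters with a digit, a lowercase
# letter, and an uppercase letter. Each lookahead checks one rule from the
# start of the string without consuming anything.
sub is_strong_password {
    my ($pw) = @_;
    return $pw =~ /\A(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{8,}\z/ ? 1 : 0;
}

print is_strong_password("Secret123"), "\n";   # 1
print is_strong_password("secret123"), "\n";   # 0 (no uppercase letter)
print is_strong_password("Sh0rt"), "\n";       # 0 (too short)

# Lookbehind: capture the digits after a dollar sign, leaving "$" out of the match.
my $receipt = 'Total: $42';
my ($amount) = $receipt =~ /(?<=\$)(\d+)/;
print "$amount\n";   # 42
```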
Common Modifiers
Perl regex modifiers control how patterns are interpreted and matched. Here are the most commonly used modifiers:
/i # Case-insensitive matching
/g # Global - match all occurrences
/m # Multi-line mode - ^ and $ match line boundaries
/s # Single-line mode - . matches newlines
/x # Extended mode - allows whitespace and comments
/e # Evaluate replacement as Perl code (s/// only)
/r # Return modified string, don't modify original (Perl 5.14+)
# Extended mode example for readable regex
my $phone = "555-123-4567";
if ($phone =~ /
^ # Start of string
(\d{3}) # Area code
- # Separator
(\d{3}) # Exchange
- # Separator
(\d{4}) # Subscriber number
$ # End of string
/x) {
print "Valid phone: $1-$2-$3\n";
}
The /x modifier is particularly valuable for complex patterns. It
allows you to add whitespace and comments, making your regex self-documenting
and much easier to maintain.
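The `/m`, `/s`, and `/r` modifiers are easy to confuse, so this sketch runs the same patterns with and without them. The sample strings are made up for illustration, and `/r` requires Perl 5.14 or later.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma";

# Without /m, ^ and $ anchor to the whole string, so nothing matches.
my @whole = $text =~ /^(\w+)$/g;
# With /m, ^ and $ also match at each line boundary.
my @lines = $text =~ /^(\w+)$/mg;
print scalar(@whole), " vs ", scalar(@lines), "\n";   # 0 vs 3

# Without /s, "." stops at a newline; with /s it crosses it.
my $html = "<p>one\ntwo</p>";
my ($without_s) = $html =~ /<p>(.*)<\/p>/;
my ($with_s)    = $html =~ /<p>(.*)<\/p>/s;
print defined $without_s ? "matched" : "no match", "\n";   # no match
print length($with_s), "\n";                               # 7

# /r returns the modified copy and leaves the original untouched.
my $shout = $text =~ s/alpha/ALPHA/r;
print $text eq "alpha\nbeta\ngamma" ? "original kept\n" : "original changed\n";
```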
Real-World Examples
Email Validation
my $email = 'user@example.com';
if ($email =~ /^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$/) {
print "Valid email\n";
}
Log File Parsing
# Apache log format parsing
my $log = '192.168.1.1 - - [01/Jan/2025:12:00:00 +0000] "GET /page HTTP/1.1" 200 1234';
if ($log =~ /^([\d.]+) .* \[([^\]]+)\] "(\w+) (\S+)[^"]*" (\d+) (\d+)/) { # (\S+) captures only the path, not the trailing "HTTP/1.1"
my ($ip, $date, $method, $path, $status, $size) = ($1, $2, $3, $4, $5, $6);
print "IP: $ip, Status: $status\n";
}
CSV Processing
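# Handle quoted fields with commas (quotes are kept; for real-world CSV, prefer the Text::CSV module from CPAN)
my $line = 'John,"Doe, Jr.",30,"New York"';
# Requiring a leading ^ or comma avoids an extra empty field at the end of the line
my @fields = $line =~ /(?:^|,)("(?:[^"]|"")*"|[^,]*)/g;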
print join(" | ", @fields), "\n";
Performance Tips
Regex performance matters when processing large files or running patterns millions of times. Here are key optimization strategies:
- Anchor patterns - Use ^ and $ when possible to prevent unnecessary backtracking
- Avoid catastrophic backtracking - Patterns like (a+)+ can cause exponential time complexity
- Use atomic groups - (?>...) prevents backtracking into the group
- Precompile patterns - Use qr// to compile regex once and reuse
- Use \K - Reset match start position for efficient substitutions
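Two of the tips above can be sketched briefly: building a larger pattern from precompiled `qr//` pieces, and seeing what "no backtracking into the group" means for an atomic group. The names `$octet` and `$ipv4` are illustrative, and the example demonstrates matching behavior only, not timing.

```perl
use strict;
use warnings;

# qr// objects interpolate into larger patterns, so a piece can be built once
# and reused. The names $octet and $ipv4 are illustrative.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])/;
my $ipv4  = qr/\A$octet(?:\.$octet){3}\z/;

for my $candidate ("192.168.1.1", "256.1.1.1") {
    printf "%-12s %s\n", $candidate, $candidate =~ $ipv4 ? "valid" : "invalid";
}

# An atomic group keeps what it matched: (?>a+) takes all three "a"s and
# will not give one back for the literal "a" that follows.
my $plain  = "aaab" =~ /a+ab/     ? 1 : 0;   # 1: a+ backtracks to "aa"
my $atomic = "aaab" =~ /(?>a+)ab/ ? 1 : 0;   # 0: no backtracking into the group
print "plain: $plain, atomic: $atomic\n";
```

The atomic version fails on this input by design. That same refusal to retry shorter matches is what stops runaway backtracking in patterns where retrying could never succeed.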
# Precompile for reuse
my $pattern = qr/\b\d{3}-\d{4}\b/;
while (<FILE>) {
print if /$pattern/;
}
# Use \K to simplify substitutions
$text =~ s/prefix\K-suffix/-replacement/;
# Equivalent to: s/(prefix)-suffix/$1-replacement/
Learn More
Explore our question database for specific regex patterns and techniques. Free Perl Code has hundreds of verified examples covering everything from basic matching to advanced text transformations.