-
Ben Nadel
A little bit about me...
-
Chief Software Engineer, Epicenter
-
Author of The Blog of Ben Nadel
— www.bennadel.com
-
Adobe Community Professional
-
Adobe Certified ColdFusion Developer
-
Co-Manager New York CFUG
-
ColdFusion, XHTML, CSS, jQuery
-
First Things First...
Regular Expressions Are
Awesome!
-
What Are Regular Expressions?
-
A way of describing patterns in text
-
What Can We Do With Regular Expressions?
-
Gather text
-
Replace / Transform text
-
Search / Validate text
-
Where Can We Use Regular Expressions?
Everywhere!
-
ColdFusion Gives Us Three Options
-
Native ColdFusion RegEx Engine
-
reFind(), reReplace(), reMatch(), CFParam, CFProperty
-
Java RegEx Engine
-
.NET RegEx Engine
-
Not All Engines Are Created Equal
-
Each engine is a "flavor" of Regular Expressions
-
Javascript « ColdFusion « Java
-
Before We Get Technical
Regular Expressions Are
NOT
Meant To Be Read
-
Basic Regular Expression Components
-
Character Literals
-
Special Characters / Metacharacters
-
Character Classes
-
Short-Hand Classes
-
Non-printable Characters and Anchors
-
Quantifiers
-
Alternation And Grouping
-
Character Literals
-
Most characters match themselves
-
A matches "A"
-
B matches "B"
-
ColdFusion matches "ColdFusion"
-
Ooops, I did it again! matches "Ooops, I did it again!"
-
Character Literal Examples
ben
matches
I like watching benevolent Ben benchpress.
-
Character Literal Examples
word.
matches
I like cool words. Zoftig is a cool word.
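To see the two literal examples above in action, here is a minimal sketch using the Java engine (java.util.regex). The class name LiteralDemo and the helper findAll are hypothetical, made up for this illustration. Note that literals are case-sensitive, and that the dot in "word." is not a literal at all: it matches any character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class LiteralDemo {

    // Collect every substring of the input that the pattern matches.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // "Ben" is skipped because literals are case-sensitive.
        System.out.println(findAll("ben", "I like watching benevolent Ben benchpress."));
        // The dot matches the "s" in "words" as well as the final period.
        System.out.println(findAll("word.", "I like cool words. Zoftig is a cool word."));
    }
}
```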
-
Special Characters / Metacharacters
-
Most characters are literals
-
About 13 are special
-
[
\
^
$
.
|
?
*
+
{
}
(
)
-
These can be escaped with \
-
Special Character Examples
\$9\.95
matches
That burrito costs $9.95. Delicious!
-
Special Character Examples
c:\\app\\log\.txt
matches
The log file is located at c:\app\log.txt.
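A rough sketch of the same escaping in the Java engine, assuming a hypothetical class EscapeDemo. Java string literals treat "\" as special, so every backslash the regex needs is typed twice in the source code. Pattern.quote() is a standard alternative when the whole pattern should be taken literally.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class EscapeDemo {

    // True when the pattern is found anywhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String burrito = "That burrito costs $9.95. Delicious!";
        String logNote = "The log file is located at c:\\app\\log.txt.";

        // Escaped: $ and . are treated as plain characters.
        System.out.println(contains("\\$9\\.95", burrito));
        // Unescaped: $ is an end-of-string anchor, so nothing can follow it.
        System.out.println(contains("$9.95", burrito));
        // Pattern.quote() escapes the whole literal for us.
        System.out.println(contains(Pattern.quote("$9.95"), burrito));
        // Each regex backslash becomes two in Java source.
        System.out.println(contains("c:\\\\app\\\\log\\.txt", logNote));
    }
}
```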
-
Important Note On Escaping "\"
-
Some languages see "\" as a special string character
-
ColdFusion does NOT
-
"\$12\.95"
-
"C:\\ColdFusion9\\"
-
JavaScript DOES
-
"\\$12\\.95"
-
"C:\\\\ColdFusion9\\\\"
-
Character Classes / Sets
-
[ ... ] defines a set of characters
-
[aeiou] - Matches any vowel
-
[^ ... ] defines a negated set of characters
-
[^aeiou] - Matches anything but a vowel
-
Can define character ranges using dash
-
[a-zA-Z] - Matches any letter
-
[^0-9] - Matches anything but a digit
-
Character Class Examples
[0-9\-]
matches
Give me a call - my number is 917-555-1234.
-
Character Class Examples
[^0-9.]
matches
$1,234,567.89
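One common use of a negated class is stripping out everything you don't want. The sketch below assumes the Java engine and a hypothetical class CharClassDemo; it deletes whatever the pattern matches and keeps the rest.

```java
// Hypothetical demo class; not part of the original slides.
final class CharClassDemo {

    // Remove every character that the pattern matches.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Drop everything except digits and the decimal point.
        System.out.println(strip("[^0-9.]", "$1,234,567.89"));
        // Drop everything except digits and dashes.
        System.out.println(strip("[^0-9\\-]", "Give me a call - my number is 917-555-1234."));
    }
}
```

The second call also keeps the stand-alone dash in the sentence, since a character class has no idea what a "phone number" is.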
-
Character Class Joins
-
Set Union
-
Set Intersection
-
[a-z&&[d-f]] - Matches d through f
-
Set Subtraction
-
[a-z&&[^d-f]] - Matches a through c, g through z
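The joins above use Java engine syntax, so here is a small Java sketch of all three. The class name ClassJoinDemo is hypothetical, and the union pattern [a-c[x-z]] is my own example of Java's nested-class union, not one from the slides.

```java
// Hypothetical demo class; not part of the original slides.
final class ClassJoinDemo {

    // Replace each matched character with an underscore so the hits are visible.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "_");
    }

    public static void main(String[] args) {
        // Union: a through c, or x through z.
        System.out.println(mask("[a-c[x-z]]", "abcxyzm"));
        // Intersection: only d through f.
        System.out.println(mask("[a-z&&[d-f]]", "abcdefgh"));
        // Subtraction: a through z, except d through f.
        System.out.println(mask("[a-z&&[^d-f]]", "abcdefgh"));
    }
}
```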
-
Short-Hand Classes
-
Match one of several characters
-
. - Any character except new-line*
-
\w - Word character, [A-Za-z0-9_]
-
\d - Digit character, [0-9]
-
\s - Space character, [ \t\r\n]
-
Do NOT match one of several characters
-
\W - Anything but a word character, [^A-Za-z0-9_]
-
\D - Anything but a digit character, [^0-9]
-
\S - Anything but a space character, [^ \t\r\n]
-
Short-Hand Class Examples
\d\d-\d\d-\d\d\d\d
matches
I was born on 09-21-1980 - go Virgos!
-
Short-Hand Class Examples
[^\d.]
matches
$1,234,567.89
-
Short-Hand Class Examples
[\w\W]
matches
Johnny 5 is Alive!
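A quick sketch of the three short-hand examples above, assuming the Java engine and a hypothetical class ShorthandDemo. Remember that each "\" is doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class ShorthandDemo {

    // Return the first match, or null when the pattern is not found.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \d stands in for [0-9].
        System.out.println(firstMatch("\\d\\d-\\d\\d-\\d\\d\\d\\d", "I was born on 09-21-1980 - go Virgos!"));
        // Short-hand classes can be used inside [ ... ] as well.
        System.out.println("$1,234,567.89".replaceAll("[^\\d.]", ""));
        // A class and its negation together match every character.
        System.out.println("Johnny 5 is Alive!".replaceAll("[\\w\\W]", "").isEmpty());
    }
}
```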
-
POSIX Character Classes
-
Native ColdFusion support
-
[:alpha:]
-
[:upper:]
-
[:lower:]
-
[:digit:]
-
[:alnum:]
-
[:xdigit:]
-
[:blank:]
-
[:space:]
-
[:print:]
-
[:punct:]
-
[:graph:]
-
[:cntrl:]
-
[:word:]
-
[:ascii:]
-
Java support
-
\p{Lower}
-
\p{Upper}
-
\p{ASCII}
-
\p{Alpha}
-
\p{Digit}
-
\p{Alnum}
-
\p{Punct} - !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
-
\p{Graph}
-
\p{Print}
-
\p{Blank}
-
\p{Cntrl}
-
\p{XDigit}
-
\p{Space}
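A minimal sketch of a few of the Java \p{...} classes listed above. The class name PosixDemo and the sample strings are assumptions made up for this illustration.

```java
// Hypothetical demo class; not part of the original slides.
final class PosixDemo {

    // Remove every character that belongs to the given class.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // \p{Punct} covers the ASCII punctuation characters.
        System.out.println(strip("\\p{Punct}", "Hey, you! #1?"));
        // \p{Lower} covers a through z.
        System.out.println(strip("\\p{Lower}", "ColdFusion 9"));
        // \p{XDigit} covers 0-9, a-f and A-F.
        System.out.println("CAFE".matches("\\p{XDigit}+"));
        System.out.println("CAGE".matches("\\p{XDigit}+"));
    }
}
```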
-
Non-printable Characters And Anchors
-
^ - Matches beginning of string (or line*)
-
\A - Always matches beginning of string.
-
$ - Matches end of string (or line*)
-
\Z - Always matches end of string.
-
\b - Matches a word-boundary
-
Anchor Examples
^[a-z]
matches
are you feeling alright, Lucy?
-
Anchor Examples
cfc$
matches
/demo/contacts/cfcs/api.cfc
-
Anchor Examples
\bCan
matches
Can you dance the CanCan?
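Anchors match positions, not characters, so it helps to look at where each match starts. This is a sketch in the Java engine with a hypothetical class AnchorDemo whose helper returns the start index of every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class AnchorDemo {

    // Collect the start index of every match in the input.
    static List<Integer> starts(String regex, String input) {
        List<Integer> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        // Only the very first letter qualifies.
        System.out.println(starts("^[a-z]", "are you feeling alright, Lucy?"));
        // The "cfc" inside "cfcs" is ignored; only the one at the end matches.
        System.out.println(starts("cfc$", "/demo/contacts/cfcs/api.cfc"));
        // The second "Can" in "CanCan" does not sit on a word boundary.
        System.out.println(starts("\\bCan", "Can you dance the CanCan?"));
    }
}
```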
-
Quantifiers - How Much To Match?
-
A* - Zero or more.
-
A+ - One or more.
-
A? - Zero or one (i.e. optional).
-
A{N} - N matches.
-
A{N,} - N or more matches.
-
A{N,M} - Between N and M matches.
-
Quantifier Examples
<\w+>
matches
Say it with <em>style</em>!
-
Quantifier Examples
[Dd]ogs?
matches
Is that a dog? Dogs are cool!
-
Quantifier Examples
\d{2,4}
matches
If I were born on 03/01/2009, I'd be 2.
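Here is a small sketch of the three quantifier examples above, run through the Java engine. QuantifierDemo and findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class QuantifierDemo {

    // Collect every substring of the input that the pattern matches.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // The closing tag is skipped because "/" is not a word character.
        System.out.println(findAll("<\\w+>", "Say it with <em>style</em>!"));
        // The trailing "s" is optional.
        System.out.println(findAll("[Dd]ogs?", "Is that a dog? Dogs are cool!"));
        // The lone "2" is too short to match.
        System.out.println(findAll("\\d{2,4}", "If I were born on 03/01/2009, I'd be 2."));
    }
}
```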
-
Quantifiers Are Greedy!
-
They try to match as much as possible.
-
Sometimes, being lazy (reluctant) is better. Add ? after the quantifier:
-
A*?
-
A+?
-
A??
-
A{N,M}?
-
... etc.
-
Lazy Quantifier Examples
<.+>
matches
Hey there <em>baby cakes</em>!
-
Lazy Quantifier Examples
<.+?>
matches
Hey there <em>baby cakes</em>!
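The two slides above use the same input, so the difference is easiest to see side by side. A minimal sketch in the Java engine, assuming a hypothetical class LazyDemo:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class LazyDemo {

    // Collect every substring of the input that the pattern matches.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "Hey there <em>baby cakes</em>!";
        // Greedy: runs all the way to the last ">".
        System.out.println(findAll("<.+>", input));
        // Lazy: stops at the first ">" it can.
        System.out.println(findAll("<.+?>", input));
    }
}
```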
-
Alternation And Grouping
-
| - Alternation, i.e. this "or" that
-
( ... ) - Grouping
-
Quantifiers can be applied to groups!!
-
Alternation And Grouping Examples
color|colour
matches
That color really brings out your eyes.
-
Alternation And Grouping Examples
(like|love) you
matches
Joanna, it hurts how much I love you.
-
Alternation And Grouping Examples
(na){2}
matches
Anna is bonkers for bananas!
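A sketch of the grouping examples above in the Java engine. The class GroupingDemo and its helpers are hypothetical. Matcher.group(1) returns whatever the first set of parentheses matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class GroupingDemo {

    // Collect every substring of the input that the pattern matches.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // Return which alternative of (like|love) was used, or null if neither.
    static String verb(String input) {
        Matcher m = Pattern.compile("(like|love) you").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findAll("color|colour", "That color really brings out your eyes."));
        System.out.println(verb("Joanna, it hurts how much I love you."));
        // {2} applies to the whole group, so this looks for "nana".
        System.out.println(findAll("(na){2}", "Anna is bonkers for bananas!"));
    }
}
```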
-
Grouping And Back References
-
Capturing groups create back references
-
Can be used in patterns - \N *
-
Can be used in replace - $N *
-
Back Reference Examples
b(an)\1as
matches
I like bananas!
-
Back Reference Examples
n(an)a|b(an)\2as
matches
My nana loves bananas!
-
Back Reference Examples
b(an)\1+as
matches
That's just banananananananananas!
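Back references work both inside the pattern and inside the replacement text. This sketch assumes the Java engine, where the pattern uses \N and the replacement uses $N. The class BackRefDemo and the method calm() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class BackRefDemo {

    // Collect every substring of the input that the pattern matches.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // Collapse any over-excited "banana...nas" back down to "bananas".
    static String calm(String input) {
        // \1 in the pattern repeats group 1; $1 in the replacement re-inserts it.
        return input.replaceAll("b(an)\\1+as", "b$1$1as");
    }

    public static void main(String[] args) {
        System.out.println(findAll("b(an)\\1as", "I like bananas!"));
        // \2 refers to the second set of parentheses, counted left to right.
        System.out.println(findAll("n(an)a|b(an)\\2as", "My nana loves bananas!"));
        System.out.println(calm("That's just banananananananananas!"));
    }
}
```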
-
Rock On With Your Bad Self!
You Just Learned
80%
Of RegEx Functionality
-
Time For A Little Practice
-
Imagine that we need to validate an employee ID
-
Ex: HR-20080118-M-1234
-
HR - Department
-
HR - Human Resources
-
SM - Sales & Marketing
-
D - Development
-
20080118 - Date employee joined company
-
M - Gender (M or F)
-
1234 - Auto-incrementing value
-
Validate: HR-20080118-M-1234
HR-20080118-M-1234
-
Validate: HR-20080118-M-1234
^HR-20080118-M-1234$
Make sure we're validating entire input string
-
Validate: HR-20080118-M-1234
^(HR|SM|D)-20080118-M-1234$
Allow for each known department abbreviation
-
Validate: HR-20080118-M-1234
^(HR|SM|D)-\d{8}-M-1234$
Allow for 8 digits for the date YYYYMMDD
-
Validate: HR-20080118-M-1234
^(HR|SM|D)-\d{8}-[MF]-1234$
Allow for M (Male) or F (Female)
-
Validate: HR-20080118-M-1234
^(HR|SM|D)-\d{8}-[MF]-\d+$
Allow for an auto-incremented value
-
Validate: HR-20080118-M-1234
^(HR|SM|D)-\d{8}-[MF]-\d+$
That wasn't so bad :)
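The finished pattern can be dropped into a small validator. This is only a sketch in the Java engine, with a hypothetical class EmployeeIdDemo. Keep in mind that \d{8} checks the shape of the date, not whether it is a real calendar date.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class EmployeeIdDemo {

    // Department, 8-digit date, gender, then one or more digits.
    private static final Pattern EMPLOYEE_ID =
        Pattern.compile("^(HR|SM|D)-\\d{8}-[MF]-\\d+$");

    static boolean isValid(String id) {
        return EMPLOYEE_ID.matcher(id).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("HR-20080118-M-1234"));  // known department
        System.out.println(isValid("D-20080118-F-7"));      // short counter is fine
        System.out.println(isValid("IT-20080118-M-1234"));  // unknown department
        System.out.println(isValid("HR-2008018-M-1234"));   // date has 7 digits
    }
}
```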
-
Time For A Little More Practice
-
Imagine that we need to validate a 10-digit phone number
-
Ex: (212) 555-1234
-
But, we want to be pretty flexible
-
(212) 555.1234
-
(212) 555 1234
-
212-555-1234
-
212.555.1234
-
212 555 1234
-
2125551234
-
Validate: (212) 555-1234
\(212\) 555-1234
-
Validate: (212) 555-1234
^\(212\) 555-1234$
Make sure we're validating entire input string
-
Validate: (212) 555-1234
^\([1-9]12\) 555-1234$
Make sure the number can't start with zero (operator)
-
Validate: (212) 555-1234
^\([1-9]\d{2}\) \d{3}-\d{4}$
Allow for the remaining 9 digits
-
Validate: (212) 555-1234
^\([1-9]\d{2}\)[ .\-]?\d{3}[ .\-]?\d{4}$
Allow for optional separators
-
Validate: (212) 555-1234
^\(?[1-9]\d{2}\)?[ .\-]?\d{3}[ .\-]?\d{4}$
Allow for optional parentheses
-
Validate: (212) 555-1234
^\(?[1-9]\d{2}\)?[ .\-]?\d{3}[ .\-]?\d{4}$
Like I said - these are NOT meant to be read!
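To check the final pattern against every format on the wish list, here is a minimal sketch in the Java engine. The class PhoneDemo is hypothetical. One thing to be aware of: because each parenthesis is optional on its own, the pattern also accepts an unbalanced "(212 555-1234".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class PhoneDemo {

    // Optional parentheses, optional separators, first digit 1 through 9.
    private static final Pattern PHONE =
        Pattern.compile("^\\(?[1-9]\\d{2}\\)?[ .\\-]?\\d{3}[ .\\-]?\\d{4}$");

    static boolean isValid(String phone) {
        return PHONE.matcher(phone).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "(212) 555-1234", "(212) 555.1234", "(212) 555 1234",
            "212-555-1234", "212.555.1234", "212 555 1234", "2125551234",
            "012-555-1234"  // starts with zero, so this one is rejected
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```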
-
Verbose Mode - Making Complex Patterns More Awesome
-
Patterns are not fun to read
-
Verbose mode allows white-space and documentation
-
(?x) - Verbose flag
-
Phone Number Validation In Verbose Mode
(?x) ## Start pattern with verbose flag.
## Match start of string.
^
## First set of digits.
\(?
[1-9]\d{2}
\)?
## Optional separator.
[ .\-]?
## Second set of digits.
\d{3}
## Optional separator.
[ .\-]?
## Third set of digits.
\d{4}
## Match end of string.
$
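The same verbose pattern can be sketched for the Java engine, with two differences that are worth calling out as assumptions of this example. First, Java comments start with a single #. Second, Java's verbose mode also ignores white-space inside [ ... ], so the space separator is written as \x20 here. The class VerbosePhoneDemo is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class VerbosePhoneDemo {

    private static final Pattern PHONE = Pattern.compile(
        "(?x)                  # Start pattern with verbose flag.\n" +
        "^                     # Match start of string.\n" +
        "\\(? [1-9]\\d{2} \\)?   # First set of digits.\n" +
        "[\\x20.\\-]?           # Optional separator (space written as hex 20).\n" +
        "\\d{3}                 # Second set of digits.\n" +
        "[\\x20.\\-]?           # Optional separator.\n" +
        "\\d{4}                 # Third set of digits.\n" +
        "$                     # Match end of string.\n"
    );

    static boolean isValid(String phone) {
        return PHONE.matcher(phone).find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("(212) 555-1234"));
        System.out.println(isValid("212 555 1234"));
        System.out.println(isValid("012-555-1234"));
    }
}
```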
-
Other Flags To Know About
-
(?xims)
-
(?i) - Ignore case
-
reFindNoCase( "abc" ) == reFind( "(?i)abc" )
-
(?m) - Multi-line (^ and $ also match at line breaks)
-
(?s) - Single-line (. also matches new-line)
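A short sketch of what each flag changes, using the Java engine. The class FlagDemo and the sample strings are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class FlagDemo {

    // Collect every substring of the input that the pattern matches.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String lines = "one\ntwo\nthree";
        // Without (?m), ^ only matches at the very start of the string.
        System.out.println(findAll("^\\w+", lines));
        // With (?m), ^ matches at the start of every line.
        System.out.println(findAll("(?m)^\\w+", lines));
        // Without (?s), the dot refuses to match the new-line.
        System.out.println("a\nb".matches("a.b"));
        System.out.println("a\nb".matches("(?s)a.b"));
        // (?i) ignores case.
        System.out.println("ABC".matches("(?i)abc"));
    }
}
```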
-
Matching vs. Capturing
Look, But Don't Touch
-
Look Ahead
-
(?= ... ) - Positive look ahead
-
(?! ... ) - Negative look ahead
-
Zero-length matches
-
Look Ahead Examples
Cold(?=Fusion)
matches
I love ColdFusion so much!
-
Look Ahead Examples
<a(?=[^>]+?href).+?>
matches
<a href="#">click here</a> now!
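Since a look ahead is zero-length, it checks what comes next without adding it to the match. Here is a sketch in the Java engine with a hypothetical class LookAheadDemo. The replacement "Hot" is just a made-up value to show that "Fusion" is left untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class LookAheadDemo {

    // Collect every substring of the input that the pattern matches.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "I love ColdFusion so much!";
        // Only "Cold" is part of the match; "Fusion" is merely checked.
        System.out.println(findAll("Cold(?=Fusion)", input));
        System.out.println(input.replaceAll("Cold(?=Fusion)", "Hot"));
        // Negative look ahead: "Cold" NOT followed by "Fusion".
        System.out.println(findAll("Cold(?!Fusion)", input));
        // Match an opening anchor tag only if it carries an href.
        System.out.println(findAll("<a(?=[^>]+?href).+?>", "<a href=\"#\">click here</a> now!"));
    }
}
```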
-
Look Behind
-
(?<= ... ) - Positive look behind
-
(?<! ... ) - Negative look behind
-
Zero-length matches
-
Look Behind Examples
(?<=Cold)Fusion
matches
I love ColdFusion so much!
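Look behind works the same way in the other direction. A minimal sketch in the Java engine, assuming a hypothetical class LookBehindDemo; the replacement "Play" and the second sample string are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class LookBehindDemo {

    // Collect the start index of every match in the input.
    static List<Integer> starts(String regex, String input) {
        List<Integer> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "I love ColdFusion so much!";
        // "Cold" is checked but not consumed, so only "Fusion" is replaced.
        System.out.println(input.replaceAll("(?<=Cold)Fusion", "Play"));
        // Negative look behind: "Fusion" NOT preceded by "Cold".
        System.out.println(starts("(?<!Cold)Fusion", "Fusion and ColdFusion"));
    }
}
```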
-
Misc. Cool Tips
-
\xNN, \uNNNN - Hexadecimal characters
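A tiny sketch of both hexadecimal forms in the Java engine. The class HexDemo is hypothetical; hex 41 is "A" and hex 00E9 is "é".

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
final class HexDemo {

    // True when the whole input matches the pattern.
    static boolean isMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Two-digit hex form: 41 is the letter A.
        System.out.println(isMatch("\\x41", "A"));
        // Four-digit hex form: 00E9 is the accented e.
        System.out.println(isMatch("caf\\u00E9", "café"));
        // Hex escapes work inside character classes too.
        System.out.println(isMatch("[\\x41-\\x43]+", "CAB"));
    }
}
```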
-
Thank You For Listening
-
Ben Nadel
-
Blog: http://www.bennadel.com
-
Twitter: @bennadel
-
Email: ben@bennadel.com
-
Ask Ben: http://www.bennadel.com/ask-ben
-
Consulting: http://www.epicenterconsulting.com