SEO » SES URL » RegEx Cheat Sheet

Recently I have been going through the fantastic book, Mastering Regular Expressions, by Jeffrey Friedl. Since I first learned the basics of creating a URL re-write in ColdFusion from a buddy that I used to work with, I developed a crazy fascination with Regular Expressions. For the CF developers that are reading this, it may seem like really old news to you, but many of the "SEO" folks will find this to be foreign, (a programmer is laughing right now ...)

So, for the non-code oriented readers, regular expressions are a very powerful and efficient set of tools, methods and commands to manipulate strings of text and data. You might be wondering how that can be of any benefit to the SEO community. I shall attempt to explain.

The first major benefit that regular expressions offer is through SES (Search Engine Safe) URL re-writing. For Instance, let's take a URL that would be considered as "Unsafe" for search engine optimization.

~ section508.gov/index.cfm?FuseAction=Content&ID=3

Now, I am probably going to catch hell for using this URL as an example, but you have got to love the fact that the web site for Section 508 which regards usability standards, does not use search engine safe URL's. ( Is there anyone screaming 'REMatch' out there? ). There are several characters in dynamic URL's that cause search engine spiders to stop crawling - Question marks, equal signs, ampersands, and colons, are but a few to mention. So, in the example above, we could simply run the URL through a regular expression which, replaces all of the unwanted characters with ones that are search engine safe.

Since this is not a tutorial on ColdFusion's Regular expression functions, I'm only going to show an example of how this URL could be manipulated with a regular expression.

<cfset dirtyURL = ('#CGI.PATH_INFO# #CGI.QUERY_STRING#')>
<cfset fixit = #ReplaceList(dirtyURL, "?,=,&","/,/,/")#>
<cfset cleanURL = #ReReplace(fixit,"([[:space:]])","/","ALL")#>
<cfoutput>#cleanURL#</cfoutput>

So, here we simply store the URL as a string variable, and replace the unsafe characters with desirable ones.

The end result above, would take an unsafe URL, like this ...

~ section508.gov/index.cfm?FuseAction=Content&ID=3

And return a SES URL like this ...

~ section508.gov/index.cfm/FuseAction/Content/ID/3

Which, is much better for the search engine spiders to index the content on your site ... and ... "it looks prettier".

So, that is just one of the many powerful things that can be done with regular expressions, and as I learn more about them, I'll be sure to post my discoveries, delights, and not-so-friendly encounters for all to see ... (Oh joy ... )

In the mean time, I have concocted a cool little cheat sheet, based on the one from Dave's IloveJackDaniel's site ...

I'm putting it here so that I can remember what the hell all the different Metacharacters in the various flavors of RegEx syntax do.
Feel free to do with it as you will.

» RegEx Cheat Sheet «

Anchors Quantifiers Groups and Ranges
^
\A
$
\Z
\b
\B
\<
\>
Start of string
Start of string
End of string
End of string
Word boundary
Not word boundary
Start of word
End of word
*
+
?
{3}
{3,}
{3,5}
0 or more
1 or more
0 or 1
Exactly 3
3 or more
3, 4 or 5
.
(a|b)
(...)
(?:...)
[abc]
[^abc]
[a-q]
[A-Q]
[0-7]
\n

Any char except new line (\n)
a or b
Group
Passive Group
Range (a or b or c)
Not a or b or c
Letter between a and q
Upper case letter
between A and Q
Digit between 0 and 7
nth group/subpattern

Note: Ranges are inclusive.
Quantifier Modifiers
"x" ~ below represents a quantifier
x? ~ Ungreedy version of "x"
Character Classes Escape Character Pattern Modifiers
\c
\s
\S
\d
\D
\w
\W
\x
\O
Control character
White space
Not white space
Digit
Not digit
Word
Not word
Hexadecimal digit
Octal digit
\ ~ Escape Character g
i
m
s
x
e
U
Global match
Case-insensitive
Multiple lines
Treat string as single line
Allow comments and
white space in pattern
Evaluate replacement
Ungreedy pattern
Metacharacters (must be escaped)
  • ^
    $
    (
    )
    <
  • [
    {
    \
    |
    >
  • .
    *
    +
    ?
POSIX Special Characters String Replacement (Backreferences)
[:upper:]
[:lower:]
[:alpha:]
[:alnum:]
[:digit:]
[:xdigit:]
[:punct:]
[:blank:]
[:space:]
[:cntrl:]
[:graph:]
[:print:]

[:word:]
Upper case letters
Lower case letters
All letters
Digits and letters
Digits
Hexadecimal digits
Punctuation
Space and tab
Blank characters
Control characters
Printed characters
Printed characters and
spaces
Digits, letters and
underscore
\n
\r
\t
\v
\f
\xxx
\xhh
New line
Carriage return
Tab
Vertical tab
Form feed
Octal character xxx
Hex character hh
$n
$2
$1
$`
$'
$+
$&
nth non-passive group
"xyz" in /^(abc(xyz))$/
"xyz" in /^(?:abc)(xyz)$/
Before matched string
After matched string
Last matched string
Entire matched string
Assertions Sample Patterns
?=
?!
?<=
?!= or ? ?>
?()
?()|
?#
Lookahead assertion
Negative lookahead
Lookbehind assertion
Negative lookbehind
Once-only Subexpression
Condition [if then]
Condition [if then else]
Comment
Pattern
([A-Za-z0-9-]+)
(\d{1,2}\/\d{1,2}\/\d{4})
([^\s]+(?=\.(jpg|gif|png))\.\2)
(^[1-9]{1}$|^[1-4]{1}[0-9]{1}$|^50$)
(#?([A-Fa-f0-9]){3}(([A-Fa-f0-9]){3})?)
((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15})

(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})
(\<(/?[^\>]+)\>)
Will Match
Letters, numbers and hyphens
Date (e.g. 5/3/2008)
jpg, gif or png image
Any number from 1 to 50 inclusive
Valid hexadecimal colour code
String with at least one upper case
letter, one lower case letter, and one
digit (useful for passwords).
Email addresses
HTML Tags
Note: These patterns are intended for reference purposes and have not been
extensively tested. Please use with caution and test thoroughly before use.


Comments (Comment Moderation is enabled. Your comment will not appear until approved.)
BlogCFC was created by Raymond Camden. This blog is running version 5.9. Contact Florida Search Engine Optimization L.L.C.
Search Engine Optimization Specialist || Web Designer || Web Developer || Edward J Beckett ||
Search Engine Optimization Company  || SEO Services || Internet Marketing Company || Search Engine Optimization Expert || Florida Search Engine Optimization LLC
Florida Search Engine Optimization || Search Engine Optimization || SEO Services || Florida SEO Blog
Florida Search Engine Optimization
Search Engine Optimization
SEO Services
Florida SEO Blog
July-04-2008
1:45 AM EST