Recently I have been working through Jeffrey Friedl’s fantastic book, Mastering Regular Expressions. Ever since a buddy I used to work with taught me the basics of creating a URL rewrite in ColdFusion, I have had a crazy fascination with regular expressions. For the CF developers reading this, it may seem like really old news, but many of the “SEO” folks will find it foreign (a programmer is laughing right now …).
So, for the non-code-oriented readers: regular expressions are a powerful, compact pattern language for searching and manipulating strings of text and data. You might be wondering how that can be of any benefit to the SEO community. I shall attempt to explain.
The first major benefit that regular expressions offer is SES (Search Engine Safe) URL rewriting. For instance, let’s take a URL that would be considered “unsafe” for search engine optimization.
Now, I am probably going to catch hell for using this URL as an example, but you have got to love the fact that the web site for Section 508, which covers accessibility standards, does not use search engine safe URLs. (Is there anyone screaming ‘REMatch’ out there?) Several characters in dynamic URLs cause search engine spiders to stop crawling: question marks, equal signs, ampersands, and colons, to mention but a few. So, in the example above, we could simply run the URL through a regular expression that replaces all of the unwanted characters with ones that are search engine safe.
Since this is not a tutorial on ColdFusion’s regular expression functions, I’m only going to show one example of how this URL could be manipulated with a regular expression.
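<!--- Join path and query string with a space --->
<cfset dirtyURL = "#CGI.PATH_INFO# #CGI.QUERY_STRING#">
<!--- Literal swap: ? = & become / --->
<cfset fixit = ReplaceList(dirtyURL, "?,=,&", "/,/,/")>
<!--- Regex swap: remaining whitespace becomes / --->
<cfset cleanURL = ReReplace(fixit, "([[:space:]])", "/", "ALL")>
<cfoutput>#cleanURL#</cfoutput>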
So, here we simply store the URL as a string variable, and replace the unsafe characters with desirable ones.
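For readers without ColdFusion to hand, the same idea can be sketched in Python. This is only a rough equivalent under a few assumptions: the helper name `to_ses_url` and the sample path and query string are made up for illustration, and the two replacement steps are folded into a single character class rather than kept separate.

```python
import re

# Characters treated as "unsafe" in the ColdFusion snippet:
# question mark, equals sign, ampersand, plus any whitespace.
UNSAFE_CHARS = re.compile(r"[?=&\s]")


def to_ses_url(path_info: str, query_string: str) -> str:
    """Join path and query string, then swap each unsafe character for '/'.

    Hypothetical helper for illustration only.
    """
    # Same space-joined string that the ColdFusion snippet builds.
    dirty_url = f"{path_info} {query_string}"
    return UNSAFE_CHARS.sub("/", dirty_url)


# Made-up example values, not taken from a real site.
print(to_ses_url("/index.cfm", "page=products&id=42"))
# prints /index.cfm/page/products/id/42
```

One character class does the work of both `ReplaceList` and `ReReplace` here, which is usually the more compact choice once every unsafe character maps to the same replacement.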
The end result would take an unsafe URL like this …
And return an SES URL like this …
Which is much better for the search engine spiders indexing the content on your site … and … “it looks prettier”.
So, that is just one of the many powerful things that can be done with regular expressions, and as I learn more about them, I’ll be sure to post my discoveries, delights, and not-so-friendly encounters for all to see … (Oh joy … )
In the meantime, I have concocted a cool little cheat sheet, based on the one from Dave’s old ILoveJackDaniels site …
I’m putting it here so that I can remember what the hell all the different metacharacters in the various flavors of RegEx syntax do.
Feel free to do with it as you will. 😉
» RegEx Cheat Sheet «
|Anchors|
^ ~ Start of string
\A ~ Start of string
$ ~ End of string
\Z ~ End of string
\B ~ Not word boundary
\< ~ Start of word
\> ~ End of word
|Quantifiers|
* ~ 0 or more
+ ~ 1 or more
? ~ 0 or 1
{3,} ~ 3 or more
{3,5} ~ 3, 4 or 5
|Groups and Ranges|
. ~ Any char except new line (\n)
(a|b) ~ a or b
[abc] ~ Range (a or b or c)
[^abc] ~ Not a or b or c
[a-q] ~ Letter between a and q
[A-Q] ~ Upper case letter between A and Q
[0-7] ~ Digit between 0 and 7
\n ~ nth group/subpattern
Note: Ranges are inclusive.
|Quantifier Modifiers|
“x” below represents a quantifier
x? ~ Ungreedy version of “x”
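The ungreedy modifier is the entry that trips most people up, so here is a small sketch of the difference. It uses Python’s `re` module rather than ColdFusion, and the sample text is made up for the example.

```python
import re

# Made-up sample text with two pairs of tags.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* grabs as much as it can, so the single match runs
# from the first "<" all the way to the last ">".
greedy = re.findall(r"<.*>", text)

# Ungreedy: .*? stops at the earliest point where the rest of
# the pattern can still match, so each tag is found on its own.
ungreedy = re.findall(r"<.*?>", text)

print(greedy)    # ['<b>bold</b> and <i>italic</i>']
print(ungreedy)  # ['<b>', '</b>', '<i>', '</i>']
```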
|Character Classes|
\S ~ Not white space
|Escape Character|
\ ~ Escape Character
|Pattern Modifiers|
g ~ Global match
s ~ Treat string as single line
x ~ Allow comments and white space in pattern
|Metacharacters (must be escaped)|
^ [ . $ { * ( \ + ) | ? < >
|POSIX|
[:upper:] ~ Upper case letters
[:lower:] ~ Lower case letters
[:alnum:] ~ Digits and letters
[:blank:] ~ Space and tab
[:print:] ~ Printed characters and spaces
[:word:] ~ Digits, letters and underscore
|Special Characters|
\xxx ~ Octal character xxx
\xhh ~ Hex character hh
|String Replacement (Backreferences)|
$n ~ nth non-passive group
$2 ~ “xyz” in /^(abc(xyz))$/
$1 ~ “xyz” in /^(?:abc)(xyz)$/
$` ~ Before matched string
$' ~ After matched string
$+ ~ Last matched string
$& ~ Entire matched string
|Assertions|
?!= or ?<! ~ Negative lookbehind
?() ~ Condition [if then]
?()| ~ Condition [if then else]
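The two “xyz” backreference entries are easier to believe once you see them run. The sketch below uses Python’s `re` module, which is my choice for the example and not part of the cheat sheet. Note that Python writes `\1` and `\2` in a replacement string where the cheat sheet writes `$1` and `$2`. The `user@host` and `foobar` strings are made up.

```python
import re

# Groups are numbered by their opening parenthesis,
# so the nested "xyz" group is group 2 here.
nested = re.match(r"^(abc(xyz))$", "abcxyz")
print(nested.group(2))  # xyz

# A passive (non-capturing) group (?:...) is skipped when
# numbering, so "xyz" becomes group 1.
passive = re.match(r"^(?:abc)(xyz)$", "abcxyz")
print(passive.group(1))  # xyz

# Backreferences in a replacement: swap the two captured words.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")
print(swapped)  # host@user

# Negative lookbehind (?<!...): match "bar" only when it is
# not immediately preceded by "foo".
not_after_foo = re.findall(r"(?<!foo)bar", "foobar bazbar")
print(not_after_foo)  # ['bar']
```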
|Sample Patterns|
Letters, numbers and hyphens
Date (e.g. 5/3/2008)
jpg, gif or png image
Any number from 1 to 50 inclusive
Valid hexadecimal colour code
String with at least one upper case letter, one lower case letter, and one digit (useful for passwords).
Note: These patterns are intended for reference purposes and have not been extensively tested. Please use with caution and test thoroughly before use.
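As a rough illustration of how those sample descriptions translate into working expressions, here is a Python sketch. The patterns are my own guesses written for this example, not the originals from the cheat sheet, and the `SAMPLES` dictionary and `matches` helper are hypothetical names. The same caution applies: test before you trust them.

```python
import re

# Hypothetical patterns written for this sketch; they are not
# the cheat sheet's originals and are only lightly tested.
SAMPLES = {
    "letters, numbers and hyphens": r"[A-Za-z0-9-]+",
    "date such as 5/3/2008": r"\d{1,2}/\d{1,2}/\d{4}",
    "jpg, gif or png image": r"\S+\.(?:jpg|gif|png)",
    "number from 1 to 50": r"[1-9]|[1-4][0-9]|50",
    "hexadecimal colour code": r"#?(?:[A-Fa-f0-9]{3}){1,2}",
    "upper, lower and digit": r"(?=.*[A-Z])(?=.*[a-z])(?=.*\d).+",
}


def matches(name: str, candidate: str) -> bool:
    """Return True when the whole candidate fits the named pattern."""
    return re.fullmatch(SAMPLES[name], candidate) is not None


print(matches("number from 1 to 50", "50"))        # True
print(matches("number from 1 to 50", "51"))        # False
print(matches("hexadecimal colour code", "#ffcc00"))  # True
print(matches("upper, lower and digit", "password"))  # False
```

Using `re.fullmatch` means each pattern has to account for the entire string, so there is no need to wrap every expression in `^` and `$` anchors.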