RegEx Cheat Sheet
Recently I have been working through Jeffrey Friedl's fantastic book, Mastering Regular Expressions. Ever since a buddy I used to work with taught me the basics of URL rewriting in ColdFusion, I have had a crazy fascination with regular expressions. For the CF developers reading this, it may seem like really old news, but many of the "SEO" folks will find it foreign (a programmer is laughing right now ...).
So, for the non-code-oriented readers, regular expressions are a powerful and efficient pattern language for searching and manipulating strings of text and data. You might be wondering how that can be of any benefit to the SEO community. I shall attempt to explain.
The first major benefit that regular expressions offer is SES (Search Engine Safe) URL rewriting. For instance, let's take a URL that would be considered "unsafe" for search engine optimization.
~ section508.gov/index.cfm?FuseAction=Content&ID=3
Now, I am probably going to catch hell for using this URL as an example, but you have got to love the fact that the web site for Section 508, which covers accessibility standards, does not use search engine safe URLs. (Is there anyone screaming 'REMatch' out there?) Several characters in dynamic URLs can cause search engine spiders to stop crawling: question marks, equal signs, ampersands, and colons, to mention but a few. So, in the example above, we could simply run the URL through a regular expression that replaces all of the unwanted characters with ones that are search engine safe.
Since this is not a tutorial on ColdFusion's regular expression functions, I'm only going to show one example of how this URL could be manipulated with a regular expression.
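<!--- Join the path and the query string; the space stands in for the "?" --->
<cfset dirtyURL = "#CGI.PATH_INFO# #CGI.QUERY_STRING#">
<!--- Swap the query-string delimiters for slashes --->
<cfset fixit = ReplaceList(dirtyURL, "?,=,&", "/,/,/")>
<!--- Turn the remaining white space (the stand-in for "?") into a slash --->
<cfset cleanURL = REReplace(fixit, "[[:space:]]", "/", "ALL")>
<cfoutput>#cleanURL#</cfoutput>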
So, here we simply store the URL as a string variable, and replace the unsafe characters with desirable ones.
The end result takes an unsafe URL like this ...
~ section508.gov/index.cfm?FuseAction=Content&ID=3
And return a SES URL like this ...
~ section508.gov/index.cfm/FuseAction/Content/ID/3
Which is much easier for the search engine spiders to index ... and ... "it looks prettier".
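For readers who don't have ColdFusion handy, the same idea can be sketched in Python. This is only an illustration under my own assumptions: the function name `to_ses_url` is made up, and it works on a plain URL string rather than on the CGI variables used above.

```python
import re


def to_ses_url(url: str) -> str:
    """Replace the query-string delimiters ?, = and & with slashes."""
    # A single character class covers all three unsafe characters.
    return re.sub(r"[?=&]", "/", url)


print(to_ses_url("section508.gov/index.cfm?FuseAction=Content&ID=3"))
# prints section508.gov/index.cfm/FuseAction/Content/ID/3
```

A real rewrite also needs the reverse mapping on the server, so that the slash-separated path is turned back into parameters; that part is not shown here.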
So, that is just one of the many powerful things that can be done with regular expressions. As I learn more about them, I'll be sure to post my discoveries, delights, and not-so-friendly encounters for all to see ... (Oh joy ...) In the meantime, I have concocted a cool little cheat sheet, based on the one from Dave's IloveJackDaniel's site ...
I'm putting it here so that I can remember what the hell all the different metacharacters in the various flavors of RegEx syntax do.
Feel free to do with it as you will.
» RegEx Cheat Sheet «
| Anchors | Meaning |
|---|---|
| `^` | Start of string |
| `\A` | Start of string |
| `$` | End of string |
| `\Z` | End of string |
| `\b` | Word boundary |
| `\B` | Not word boundary |
| `\<` | Start of word |
| `\>` | End of word |

| Quantifiers | Meaning |
|---|---|
| `*` | 0 or more |
| `+` | 1 or more |
| `?` | 0 or 1 |
| `{3}` | Exactly 3 |
| `{3,}` | 3 or more |
| `{3,5}` | 3, 4 or 5 |

| Groups and Ranges | Meaning |
|---|---|
| `.` | Any char except new line (`\n`) |
| `(a\|b)` | a or b |
| `(...)` | Group |
| `(?:...)` | Passive Group |
| `[abc]` | Range (a or b or c) |
| `[^abc]` | Not a or b or c |
| `[a-q]` | Letter between a and q |
| `[A-Q]` | Upper case letter between A and Q |
| `[0-7]` | Digit between 0 and 7 |
| `\n` | nth group/subpattern |

Note: Ranges are inclusive.
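A few of these anchors, quantifiers and groups are easier to remember once you see them run. Here is a small sketch using Python's `re` module, which is my own choice for illustration. As far as I know that module does not support `\<` and `\>`, so the sketch uses `\b` for word boundaries; the sample strings are made up.

```python
import re

# \b marks a word boundary, so the "cat" inside "concatenate" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concatenate cat")

# {3,5} means three to five repetitions; fullmatch() requires the whole
# string to match, much like wrapping the pattern in ^ and $.
is_short_code = re.fullmatch(r"\d{3,5}", "1234") is not None

# (...) captures, while (?:...) groups without capturing,
# so the surname ends up in group 1.
surname = re.match(r"(?:Mr|Ms)\. (\w+)", "Ms. Lovelace").group(1)

# \1 refers back to whatever group 1 matched: here, a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

print(whole_words, is_short_code, surname, doubled)
# prints ['cat', 'cat'] True Lovelace is
```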
| Quantifier Modifiers | Meaning |
|---|---|
| `x?` | Ungreedy version of `x`, where `x` represents a quantifier |
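The difference between a greedy and an ungreedy quantifier is easiest to see side by side. A minimal sketch in Python (again my choice of language, with a made-up HTML snippet):

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as it can, so one match spans the whole string.
greedy = re.findall(r"<.+>", html)

# Ungreedy: .+? stops at the first ">" it can, giving one match per tag.
ungreedy = re.findall(r"<.+?>", html)

print(greedy)    # prints ['<b>bold</b> and <i>italic</i>']
print(ungreedy)  # prints ['<b>', '</b>', '<i>', '</i>']
```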
| Character Classes | Meaning |
|---|---|
| `\c` | Control character |
| `\s` | White space |
| `\S` | Not white space |
| `\d` | Digit |
| `\D` | Not digit |
| `\w` | Word |
| `\W` | Not word |
| `\x` | Hexadecimal digit |
| `\O` | Octal digit |

| Escape Character | Meaning |
|---|---|
| `\` | Escape Character |

| Pattern Modifiers | Meaning |
|---|---|
| `g` | Global match |
| `i` | Case-insensitive |
| `m` | Multiple lines |
| `s` | Treat string as single line |
| `x` | Allow comments and white space in pattern |
| `e` | Evaluate replacement |
| `U` | Ungreedy pattern |
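Pattern modifiers are spelled differently from flavor to flavor. As a rough sketch, here is how the `i`, `m`, `s` and `x` modifiers map onto Python's `re` flags (Python is my assumption here, not the flavor the table was written for). There is no `g` flag in Python, since `findall` and `sub` are already global. The sample text is made up.

```python
import re

text = "Order 66\norder 7\nORDER 1138"

# re.I ("i") ignores case; re.M ("m") lets ^ and $ match at every line.
numbers = re.findall(r"^order\s+(\d+)$", text, flags=re.I | re.M)

# re.X ("x") allows white space and comments inside the pattern.
date = re.compile(r"""
    (\d{1,2}) /   # first number
    (\d{1,2}) /   # second number
    (\d{4})       # four-digit year
""", re.X)
parts = date.search("Due 5/3/2008").groups()

# re.S ("s") lets . match a new line as well.
spans_lines = re.search(r"66.order", text, flags=re.S) is not None

print(numbers, parts, spans_lines)
# prints ['66', '7', '1138'] ('5', '3', '2008') True
```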
| Metacharacters (must be escaped) |
| POSIX | Meaning |
|---|---|
| `[:upper:]` | Upper case letters |
| `[:lower:]` | Lower case letters |
| `[:alpha:]` | All letters |
| `[:alnum:]` | Digits and letters |
| `[:digit:]` | Digits |
| `[:xdigit:]` | Hexadecimal digits |
| `[:punct:]` | Punctuation |
| `[:blank:]` | Space and tab |
| `[:space:]` | Blank characters |
| `[:cntrl:]` | Control characters |
| `[:graph:]` | Printed characters |
| `[:print:]` | Printed characters and spaces |
| `[:word:]` | Digits, letters and underscore |

| Special Characters | Meaning |
|---|---|
| `\n` | New line |
| `\r` | Carriage return |
| `\t` | Tab |
| `\v` | Vertical tab |
| `\f` | Form feed |
| `\xxx` | Octal character xxx |
| `\xhh` | Hex character hh |

| String Replacement (Backreferences) | Meaning |
|---|---|
| `$n` | nth non-passive group |
| `$2` | "xyz" in `/^(abc(xyz))$/` |
| `$1` | "xyz" in `/^(?:abc)(xyz)$/` |
| `` $` `` | Before matched string |
| `$'` | After matched string |
| `$+` | Last matched string |
| `$&` | Entire matched string |
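Backreferences in the replacement string are another spot where the flavors disagree. The table uses the `$1` style; Python, which I am using for this sketch, writes `\1` (or `\g<1>`) instead, and `\g<0>` for the entire match. The sample strings are made up, apart from the two patterns taken from the table.

```python
import re

# Swap "Last, First" around; \1 and \2 play the role of $1 and $2.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Friedl, Jeffrey")

# \g<0> is the entire match, the counterpart of $& in the table.
bracketed = re.sub(r"\d+", r"[\g<0>]", "ID 3 of 508")

# Groups are numbered by their opening parenthesis, and passive
# groups are skipped, exactly as in the two "xyz" rows of the table.
nested = re.match(r"^(abc(xyz))$", "abcxyz").group(2)
passive = re.match(r"^(?:abc)(xyz)$", "abcxyz").group(1)

print(swapped, bracketed, nested, passive)
# prints Jeffrey Friedl ID [3] of [508] xyz xyz
```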
| Assertions | Meaning |
|---|---|
| `?=` | Lookahead assertion |
| `?!` | Negative lookahead |
| `?<=` | Lookbehind assertion |
| `?<!` | Negative lookbehind |
| `?>` | Once-only Subexpression |
| `?()` | Condition [if then] |
| `?()\|` | Condition [if then else] |
| `?#` | Comment |

| Sample Pattern | Will Match |
|---|---|
| `([A-Za-z0-9-]+)` | Letters, numbers and hyphens |
| `(\d{1,2}\/\d{1,2}\/\d{4})` | Date (e.g. 5/3/2008) |
| `([^\s]+(?=\.(jpg\|gif\|png))\.\2)` | jpg, gif or png image |
| `(^[1-9]{1}$\|^[1-4]{1}[0-9]{1}$\|^50$)` | Any number from 1 to 50 inclusive |
| `(#?([A-Fa-f0-9]){3}(([A-Fa-f0-9]){3})?)` | Valid hexadecimal colour code |
| `((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15})` | String with at least one upper case letter, one lower case letter, and one digit (useful for passwords) |
| `(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})` | Email addresses |
| `(\<(/?[^\>]+)\>)` | HTML Tags |
Note: These patterns are intended for reference purposes and have not been extensively tested. Please use with caution and test thoroughly before use.
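In the spirit of that note, here is a quick way to poke at a few of the sample patterns. This is a sketch in Python (my choice, not the flavor the sheet was written for). The dictionary keys, the `matches` helper and the test strings are all made up for the example, and `fullmatch` is used so that the whole string has to match.

```python
import re

# A few sample patterns, copied from the cheat sheet.
SAMPLES = {
    "date": r"(\d{1,2}\/\d{1,2}\/\d{4})",
    "image": r"([^\s]+(?=\.(jpg|gif|png))\.\2)",
    "one_to_fifty": r"(^[1-9]{1}$|^[1-4]{1}[0-9]{1}$|^50$)",
    "hex_colour": r"(#?([A-Fa-f0-9]){3}(([A-Fa-f0-9]){3})?)",
    "password": r"((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15})",
}


def matches(name: str, text: str) -> bool:
    """Return True if the whole of text matches the named sample pattern."""
    return re.fullmatch(SAMPLES[name], text) is not None


print(matches("date", "5/3/2008"))        # prints True
print(matches("image", "photo.png"))      # prints True
print(matches("image", "photo.bmp"))      # prints False
print(matches("one_to_fifty", "51"))      # prints False
print(matches("hex_colour", "#12ab9F"))   # prints True
print(matches("password", "password"))    # prints False
```

Note that the password pattern also limits the length to between 8 and 15 characters, so a short string fails even if it has all three kinds of character.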