Regular expressions (regex) are not unique to the Business Intelligence (BI) world, but I run into them there frequently and often find them puzzling. I have long wanted a handy cheat sheet for the times I need regex in my formulas, yet I have never found a concise, helpful one through Google searches. So I decided to create one for people like me, who don’t necessarily struggle with regex but are often annoyed by its complexity.
Characters
Character | Legend | Example | Sample Match |
---|---|---|---|
. | Any character except line break | a.c | abc |
\d | Matches any digit (Arabic numeral). Equivalent to [0-9]. | Order_\d | Order_1 |
\D | Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. | \D\D\D | ABC |
\w | Matches any alphanumeric character, including the underscore. Equivalent to [A-Za-z0-9_]. | \w-\w\w\w | A-b_1 |
\W | Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. | \W\W\W\W\W | *-+=) |
\s | Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. | \w-\sa\s\W | c- a * |
\S | Matches a single character other than white space. | \S\S\S | yes |
\t | Matches a horizontal tab. | T\t\w{2}\b | T	ab |
\r | Carriage return character | | |
\n | Line feed character | | |
\r\n | Line separator on Windows | AB\r\nCD | AB CD |
\ | Escapes a special character | \.\\\~ | .\~ |
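
To make the rows above easier to check, here is a minimal sketch that runs several of them through Python's built-in `re` module. Python is only one regex flavor, chosen here for illustration; the helper `full_match` and the `examples` list are hypothetical names, not part of any tool mentioned in this post. Note that in Python `\d`, `\w`, and `\s` also match non-ASCII Unicode characters unless the `re.ASCII` flag is passed.

```python
import re

# (pattern, sample text) pairs taken from the Characters table.
examples = [
    (r"a.c", "abc"),
    (r"Order_\d", "Order_1"),
    (r"\D\D\D", "ABC"),
    (r"\w-\w\w\w", "A-b_1"),
    (r"\w-\sa\s\W", "c- a *"),
    (r"\S\S\S", "yes"),
    (r"AB\r\nCD", "AB\r\nCD"),   # Windows line separator between AB and CD
    (r"\.\\\~", ".\\~"),         # escaped dot, backslash, and tilde
]


def full_match(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Map each pattern to whether its sample text matched in full.
results = {pattern: full_match(pattern, text) for pattern, text in examples}

# With re.ASCII, \d is limited to 0-9; without it, other Unicode digits match too.
ascii_only = re.fullmatch(r"\d", "٣", re.ASCII) is None      # Arabic-Indic digit three
unicode_digit = re.fullmatch(r"\d", "٣") is not None
```

Every entry in `results` should be `True`; if your tool uses a different flavor, the escapes may behave slightly differently.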
Anchors
Anchor | Legend | Example | Sample Match |
---|---|---|---|
^ | Start of string, or start of line in multi-line mode. (Inside brackets, as in [^…], it means “not”.) | ^abc .* | abc (line start) |
\A | Start of string | \Aabc | abc (string start) |
$ | End of string, or end of line in multi-line pattern | .*? the end$ | this is the end |
\z | The end of the input | the end\z | this is…\n…the end |
\Z | The end of the input but for the final terminator, if any | the end\Z | this is…\n…the end\n |
\b | Matches a word boundary. | Bob.*\bcat\b | Bob ate the cat |
\B | Not word boundary | c.*\Bcat\B.* | copycats |
\G | The end of the previous match | | |
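
The difference between `^` and `\A`, and between `\b` and `\B`, is easier to see when run. The following is a minimal sketch using Python's `re` module, with sample strings made up for illustration. Anchor support varies by flavor: Python's `re` has no `\G`, and its `\Z` means the absolute end of the string, so this sketch sticks to `^`, `\A`, `$`, `\b`, and `\B`.

```python
import re

text = "abc first\nabc second"

# By default ^ matches only at the start of the whole string.
starts_default = re.findall(r"^abc", text)                   # ['abc']

# With re.MULTILINE, ^ also matches right after each line break.
starts_multiline = re.findall(r"^abc", text, re.MULTILINE)   # ['abc', 'abc']

# \A is always the start of the string, even in multi-line mode.
starts_string = re.findall(r"\Aabc", text, re.MULTILINE)     # ['abc']

# $ anchors the match to the end of the string (or line in multi-line mode).
end_match = re.search(r".*? the end$", "this is the end")

# \b requires a word boundary on both sides of "cat".
boundary = re.search(r"Bob.*\bcat\b", "Bob ate the cat")

# \B requires that "cat" sits inside a longer word.
not_boundary = re.search(r"c.*\Bcat\B.*", "copycats")

# "cat" inside "copycats" has no word boundary around it, so this finds nothing.
no_match = re.search(r"\bcat\b", "copycats")
```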
Quantifiers
Quantifier | Legend | Example | Sample Match |
---|---|---|---|
+ | One or more | x+ | xxxxx |
* | Zero or more times | A*B*C* | AACCCC |
? | Once or none. (Placed after another quantifier, as in +? or *?, it makes that quantifier “lazy”.) | abc? | abc or ab |
{3} | Exactly three times | \D{3} | ABC |
{2,4} | Two to four times | \d{2,4} | 156 |
{3,} | Three or more times | \w{3,} | a_bdcs |
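
Quantifiers are greedy by default, and a trailing `?` switches them to lazy matching. Here is a minimal sketch in Python's `re` module that checks a few quantifiers and contrasts greedy with lazy; the HTML-like sample string is an assumption made for illustration only.

```python
import re

# Sample rows: each pattern should match its whole sample string.
one_or_more = re.fullmatch(r"x+", "xxxxx") is not None
zero_or_more = re.fullmatch(r"A*B*C*", "AACCCC") is not None   # B* matches zero times
exactly_three = re.fullmatch(r"\D{3}", "ABC") is not None
two_to_four = re.fullmatch(r"\d{2,4}", "156") is not None
three_or_more = re.fullmatch(r"\w{3,}", "a_bdcs") is not None

# ? makes the preceding "c" optional: "ab" and "abc" match, "abcc" does not.
optional = [re.fullmatch(r"abc?", s) is not None for s in ("ab", "abc", "abcc")]

# Greedy .+ grabs as much as possible, so it runs to the last ">".
greedy = re.search(r"<.+>", "<b>bold</b>").group()    # '<b>bold</b>'

# Lazy .+? stops at the first ">" that completes a match.
lazy = re.search(r"<.+?>", "<b>bold</b>").group()     # '<b>'
```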
Assertions
Assertion | Legend | Example | Sample Match |
---|---|---|---|
(?=…) | Positive lookahead. Matches “x” only if “x” is followed by “y”. | \d+(?= dollars) | 100 in 100 dollars |
(?<=…) | Positive lookbehind. Matches “x” only if “x” is preceded by “y”. | (?<=USD )\d+ | 100 in USD 100 |
(?!…) | Negative lookahead. Matches “x” only if “x” is not followed by “y”. | \d+(?!\d| dollars) | 100 in 100 Yen |
(?<!…) | Negative lookbehind. Matches “x” only if “x” is not preceded by “y”. | \b(?<!USD )\d+ | 100 in JPY 100 |
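
Lookarounds check the text next to a match without including it in the result. The sketch below shows all four kinds in Python's `re` module, using currency strings invented for illustration. Two details are worth noting: Python requires a lookbehind to have a fixed width, and the negative lookbehind is paired with `\b` so the match cannot simply start one digit later.

```python
import re

amounts_after = "100 dollars, 200 yen"
amounts_before = "USD 100, JPY 200"

# Positive lookahead: digits that are followed by " dollars".
lookahead = re.findall(r"\d+(?= dollars)", amounts_after)            # ['100']

# Negative lookahead: digits not followed by another digit or " dollars".
# The \d alternative stops the engine from settling for a shorter number.
neg_lookahead = re.findall(r"\d+(?!\d| dollars)", amounts_after)     # ['200']

# Positive lookbehind: digits that are preceded by "USD ".
lookbehind = re.findall(r"(?<=USD )\d+", amounts_before)             # ['100']

# Negative lookbehind: whole numbers that are not preceded by "USD ".
neg_lookbehind = re.findall(r"\b(?<!USD )\d+", amounts_before)       # ['200']
```

In every case the result contains only the digits; the currency text is checked but never consumed.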
Character Classes and Groups
Character | Legend | Example | Sample Match |
---|---|---|---|
[ … ] | One of the characters in the brackets | be[ea]r | beer or bear |
[x-y] | One of the characters in the range from x to y | [A-Z]+ | GREAT |
[^x] | One character that is not x | [^a-z]{3} | A1! |
[^x-y] | One of the characters not in the range from x to y | [^a-h]+ | zzz |
[\d\D] | One character that is a digit or a non-digit | [\d\D]+ | Any characters, including new lines |
x|y | Alternation / OR operator | M|L | matches M in [size M] |
( … ) | Capturing group | A(nt|pple) | Apple (captures “pple”) |
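
As a quick check of the rows above, here is a minimal sketch in Python's `re` module that exercises bracket classes, alternation, and a capturing group. The extra word "bier" and the two-line sample string are assumptions added for illustration.

```python
import re

# [ea] accepts exactly one of the listed characters, so "bier" is skipped.
bears = re.findall(r"be[ea]r", "beer bear bier")            # ['beer', 'bear']

# A range inside brackets, and two negated classes.
upper = re.fullmatch(r"[A-Z]+", "GREAT") is not None
negated = re.fullmatch(r"[^a-z]{3}", "A1!") is not None
negated_range = re.fullmatch(r"[^a-h]+", "zzz") is not None

# [\d\D] matches any character at all, including the line break that "." skips.
anything = re.fullmatch(r"[\d\D]+", "line one\nline two") is not None
dot_only = re.fullmatch(r".+", "line one\nline two") is not None   # False

# Alternation: the first alternative that matches at a position wins.
size = re.search(r"M|L", "[size M]").group()                # 'M'

# A capturing group keeps the part of the match inside the parentheses.
apple = re.fullmatch(r"A(nt|pple)", "Apple")
captured = apple.group(1)                                   # 'pple'
```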