Regular Expressions, or Regex, are not unique to the Business Intelligence (BI) world, but they are something I frequently encounter and often find puzzling. I wish I had a handy cheat sheet whenever I need to use Regex in my formulas, but I can never find a concise and helpful one through Google searches. So, I decided to create one for people like me who don’t necessarily struggle with Regex but are often annoyed by its complexity.
Characters
| Character | Legend | Example | Sample Match |
|---|---|---|---|
| . | Any character except line break | a.c | abc |
| \d | Matches any digit (Arabic numeral). Equivalent to [0-9]. | Order_\d | Order_1 |
| \D | Matches any character that is not a digit (Arabic numeral). Equivalent to [^0-9]. | \D\D\D | ABC |
| \w | Matches any alphanumeric character, including the underscore. Equivalent to [A-Za-z0-9_]. | \w-\w\w\w | A-b_1 |
| \W | Matches any character that is not a word character from the basic Latin alphabet. Equivalent to [^A-Za-z0-9_]. | \W\W\W\W\W | *-+=) |
| \s | Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces. | \w-\sa\s\W | c- a * |
| \S | Matches a single character other than white space. | \S\S\S | yes |
| \t | Matches a horizontal tab. | T\t\w{2} | T ab |
| \r | Carriage return character | | |
| \n | Line feed character | | |
| \r\n | Line separator on Windows | AB\r\nCD | AB CD |
| \ | Escapes a special character | \.\\\~ | .\~ |
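
To make a few of these rows concrete, here is a minimal sketch using Python's standard `re` module. The sample strings and variable names are made up for illustration. One caveat I'm assuming matters for your tool too: in Python 3, `\d`, `\w`, and `\s` also match non-ASCII Unicode characters unless you pass `re.ASCII`, so the "equivalent to [0-9]" shorthand is only exact in ASCII mode.

```python
import re

# \d matches one digit, so Order_\d finds "Order_1" inside a longer string.
order = re.search(r"Order_\d", "Ref: Order_1 shipped").group()

# \w covers letters, digits, and the underscore.
is_word_pattern = re.fullmatch(r"\w-\w\w\w", "A-b_1") is not None

# \S+ grabs runs of non-whitespace; the tab and the spaces act as separators.
tokens = re.findall(r"\S+", "yes\tno  maybe")

# A backslash makes the dot literal instead of "any character".
dots = re.findall(r"\.", "a.b.c")

print(order)            # Order_1
print(is_word_pattern)  # True
print(tokens)           # ['yes', 'no', 'maybe']
print(dots)             # ['.', '.']
```

Raw strings (`r"..."`) keep Python from interpreting the backslashes before the regex engine sees them.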
Anchors
| Anchor | Legend | Example | Sample Match |
|---|---|---|---|
| ^ | Matches the beginning of input. (But when [^inside brackets], it means “not”) | ^abc .* | abc (line start) |
| \A | Start of string | \Aabc | abc (string start) |
| $ | End of string, or end of line in multi-line pattern | .*? the end$ | this is the end |
| \z | The end of the input | the end\z | this is…\n…the end |
| \Z | The end of the input but for the final terminator, if any | the end\Z | this is…\n…the end\n |
| \b | Matches a word boundary. | Bob.*\bcat\b | Bob ate the cat |
| \B | Not word boundary | c.*\Bcat\B.* | copycats |
| \G | The end of the previous match | | |
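
Anchors are easier to see in action than to describe, so here is a small sketch, again assuming Python's `re` module and invented sample text. Anchor support differs between engines: Python's `re` uses `\Z` for the absolute end of the input and has no `\G`, so check your own tool's documentation before relying on those two.

```python
import re

text = "abc one\nabc two"

# Without flags, ^ matches only at the very start of the string.
starts_default = re.findall(r"^abc", text)
# With re.MULTILINE, ^ also matches right after each line break.
starts_multiline = re.findall(r"^abc", text, flags=re.MULTILINE)

sentence = "Bob ate the cat, not copycats"
# \b needs a word boundary on both sides: only the standalone "cat" qualifies.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", sentence)]
# \B needs the opposite: only the "cat" buried inside "copycats" qualifies.
inside_word = [m.start() for m in re.finditer(r"\Bcat\B", sentence)]

ends_with_newline = "this is the end\n"
# In Python, $ tolerates one trailing newline; \Z does not.
dollar_hit = re.search(r"the end$", ends_with_newline) is not None
strict_hit = re.search(r"the end\Z", ends_with_newline) is not None

print(starts_default, starts_multiline)  # ['abc'] ['abc', 'abc']
print(whole_word, inside_word)           # [12] [25]
print(dollar_hit, strict_hit)            # True False
```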
Quantifiers
| Quantifier | Legend | Example | Sample Match |
|---|---|---|---|
| + | One or more | x+ | xxxxx |
| * | Zero or more times | A*B*C* | AACCCC |
| ? | Once or none (placed after another quantifier, makes it “lazy”) | abc? | ab or abc |
| {3} | Exactly three times | \D{3} | ABC |
| {2,4} | Two to four times | \d{2,4} | 156 |
| {x,} | x or more times (x = 3 in the example) | \w{3,} | a_bdcs |
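
The greedy versus lazy distinction is the part of this table that trips me up most, so here is a minimal sketch in Python's `re` module. The HTML-like string and the other inputs are invented for illustration.

```python
import re

markup = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" it can find, swallowing everything between.
greedy = re.findall(r"<.+>", markup)
# Lazy: the extra ? makes .+ stop at the first ">" that completes a match.
lazy = re.findall(r"<.+?>", markup)

# {2,4} takes up to four digits at a time; the lone "7" is too short to match.
digit_runs = re.findall(r"\d{2,4}", "7 156 20240101")

# ? on its own makes the preceding "c" optional.
optional_c = re.findall(r"abc?", "ab abc")

# {3,} means three or more, so the two-letter words are skipped.
long_words = re.findall(r"\w{3,}", "a_bdcs is ok")

print(greedy)      # ['<b>bold</b> and <i>italic</i>']
print(lazy)        # ['<b>', '</b>', '<i>', '</i>']
print(digit_runs)  # ['156', '2024', '0101']
print(optional_c)  # ['ab', 'abc']
print(long_words)  # ['a_bdcs']
```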
Assertions
| Assertion | Legend | Example | Sample Match |
|---|---|---|---|
| (?=…) | Positive lookahead. Matches “x” only if “x” is followed by “y”. | \d+(?= dollars) | 100 in 100 dollars |
| (?<=…) | Positive lookbehind. Matches “x” only if “x” is preceded by “y”. | (?<=USD )\d+ | 100 in USD 100 |
| (?!…) | Negative lookahead. Matches “x” only if “x” is not followed by “y”. | \d+(?!\d\| dollars) | 100 in 100 Yen |
| (?<!…) | Negative lookbehind. Matches “x” only if “x” is not preceded by “y”. | (?<!USD )\b\d+ | 100 in JPY 100 |
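
All four lookarounds can be tried against one string. This is a sketch in Python's `re` module with a made-up `prices` string. Note that Python requires lookbehind patterns to have a fixed width, which is why the lookbehinds here use the literal `USD ` rather than something like `\w+ `. Other engines may be more or less strict.

```python
import re

prices = "100 dollars, 250 Yen, USD 75"

# Positive lookahead: digits that are followed by " dollars".
followed = re.findall(r"\d+(?= dollars)", prices)

# Positive lookbehind: digits that are preceded by "USD ".
preceded = re.findall(r"(?<=USD )\d+", prices)

# Negative lookahead: digits NOT followed by " dollars". The \d alternative
# stops the engine from backtracking to a partial number such as "10".
not_followed = re.findall(r"\d+(?!\d| dollars)", prices)

# Negative lookbehind: digits NOT preceded by "USD ". The \b stops the engine
# from starting in the middle of a number such as the "5" in "75".
not_preceded = re.findall(r"(?<!USD )\b\d+", prices)

print(followed)      # ['100']
print(preceded)      # ['75']
print(not_followed)  # ['250', '75']
print(not_preceded)  # ['100', '250']
```

Lookarounds check the surrounding text without consuming it, so the matches contain only the digits.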
Character Classes and Groups
| Character | Legend | Example | Sample Match |
|---|---|---|---|
| [ … ] | One of the characters in the brackets | be[ea]r | beer or bear |
| [x-y] | One of the characters in the range from x to y | [A-Z]+ | GREAT |
| [^x] | One character that is not x | [^a-z]{3} | A1! |
| [^x-y] | One of the characters not in the range from x to y | [^a-h]+ | zzz |
| [\d\D] | One character that is a digit or a non-digit | [\d\D]+ | Any characters, including new lines |
| x\|y | Alternation / OR operator | M\|L | matches M in [size M] |
| ( … ) | Capturing group | A(nt\|pple) | Apple (captures “pple”) |
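
Here is one last sketch, again assuming Python's `re` module and invented sample strings, showing bracket classes, a capturing group, and why `[\d\D]` is handy when `.` stops at a line break.

```python
import re

# [ea] allows exactly one of the listed characters, so "boar" is skipped.
drinks_and_animals = re.findall(r"be[ea]r", "a beer for the bear, not a boar")

# [A-Z]+ grabs runs of uppercase ASCII letters.
shouting = re.findall(r"[A-Z]+", "GREAT job, TEAM")

# The group captures whichever alternative matched, separately from the whole match.
match = re.search(r"A(nt|pple)", "Apple pie")
whole, captured = match.group(0), match.group(1)

two_lines = "line1\nline2"
# By default . does not cross a line break, but [\d\D] matches any character.
dot_only = re.search(r".+", two_lines).group()
everything = re.search(r"[\d\D]+", two_lines).group()

print(drinks_and_animals)  # ['beer', 'bear']
print(shouting)            # ['GREAT', 'TEAM']
print(whole, captured)     # Apple pple
print(repr(dot_only))      # 'line1'
print(repr(everything))    # 'line1\nline2'
```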
