Regular Expressions (Regex) Cheat Sheet for Data Analysts

Regular Expressions, or Regex, are not unique to the Business Intelligence (BI) world, but they are something I encounter frequently and often find puzzling. I have always wanted a handy cheat sheet for the moments when I need Regex in my formulas, but Google searches never turned up one that was both concise and helpful. So I decided to create one for people like me who don’t necessarily struggle with Regex but are often annoyed by its complexity.

Characters

Character	Legend	Example	Sample Match
.	Any character except line break	a.c	abc
\d	Matches any digit (Arabic numeral). Equivalent to `[0-9]`.	Order_\d	Order_1
\D	Matches any character that is not a digit (Arabic numeral). Equivalent to `[^0-9]`.	\D\D\D	ABC
\w	Matches any alphanumeric character, including the underscore. Equivalent to `[A-Za-z0-9_]`.	\w-\w\w\w	A-b_1
\W	Matches any character that is not a word character from the basic Latin alphabet. Equivalent to `[^A-Za-z0-9_]`.	\W\W\W\W	*-+=
\s	Matches a single white space character, including space, tab, form feed, line feed, and other Unicode spaces.	\w-\sa\s\W	c- a *
\S	Matches a single character other than white space.	\S\S\S	yes
\t	Matches a horizontal tab.	T\t\w+	T, a tab, then ab
\r	Matches a carriage return (CR).
\n	Matches a line feed (LF).
\r\n	Line break sequence on Windows (CR followed by LF)	AB\r\nCD	AB, a line break, then CD
\	Escapes a special character	\.\\\~	.\~
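If you want to try these character rows yourself, here is a minimal sketch that runs pattern/sample pairs based on the table through Python's built-in `re` module. It assumes Python 3; your BI tool's formula language may use a slightly different regex flavor. The patterns are raw strings (`r"..."`) so that Python does not consume the backslashes before the regex engine sees them, and the tab and Windows line-break samples are written with real `\t` and `\r\n` characters. `EXAMPLES` and `matched_text` are just illustrative names.

```python
import re
from typing import Optional

# (pattern, sample) pairs based on the Characters table. The samples use
# real tab and CR/LF characters where the table describes them in words.
EXAMPLES = [
    (r"a.c", "abc"),
    (r"Order_\d", "Order_1"),
    (r"\D\D\D", "ABC"),
    (r"\w-\w\w\w", "A-b_1"),
    (r"\W\W\W\W", "*-+="),
    (r"\w-\sa\s\W", "c- a *"),
    (r"\S\S\S", "yes"),
    (r"T\t\w+", "T\tab"),
    (r"AB\r\nCD", "AB\r\nCD"),
    (r"\.\\\~", ".\\~"),  # escaped dot, backslash, and tilde
]


def matched_text(pattern: str, text: str) -> Optional[str]:
    """Return the first substring of `text` matched by `pattern`, or None."""
    match = re.search(pattern, text)
    return match.group() if match else None


if __name__ == "__main__":
    for pattern, sample in EXAMPLES:
        print(f"{pattern:<12} matches {matched_text(pattern, sample)!r}")
    # "." does not cross a line break unless you pass re.DOTALL.
    print(matched_text(r"a.c", "a\nc"))                      # None
    print(re.search(r"a.c", "a\nc", re.DOTALL) is not None)  # True
```

Every sample in the list is matched in full, which makes the list a handy starting point for testing your own patterns before pasting them into a formula.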

Anchors

Anchor	Legend	Example	Sample Match
^	Matches the beginning of the string, or the beginning of a line in multi-line mode. (Inside brackets, as in [^abc], it means “not”.)	^abc.*	abc (line start)
\A	Start of the string only (unaffected by multi-line mode)	\Aabc	abc (string start)
$	End of the string, or end of a line in multi-line mode	.*? the end$	this is the end
\z	The end of the input	the end\z	this is…\n…the end
\Z	The end of the input, or just before a final line terminator, if any	the end\Z	this is…\n…the end\n
\b	Matches a word boundary.	Bob.*\bcat\b	Bob ate the cat
\B	Matches a position that is not a word boundary	c.*\Bcat\B.*	copycats
\G	The position where the previous match ended (for the first match, where the search starts)
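Anchors are where regex flavors differ the most, so it is worth checking them in whatever engine you actually use. The sketch below assumes Python's built-in `re` module. Two caveats for this flavor: in `re`, `\Z` means the absolute end of the input (what the table calls `\z`), and `re` has no `\G` at all. `TEXT` and `results` are invented names for illustration.

```python
import re

TEXT = "abc def\nthis is the end\n"

results = {
    # ^ anchors to the start of the string by default ...
    "caret": re.findall(r"^\w+", TEXT),
    # ... and to the start of every line with re.MULTILINE.
    "caret_multiline": re.findall(r"^\w+", TEXT, re.MULTILINE),
    # \A ignores re.MULTILINE and only matches at the very start.
    "A_multiline": re.findall(r"\A\w+", TEXT, re.MULTILINE),
    # $ also matches just before a single trailing newline.
    "dollar": re.search(r"the end$", TEXT) is not None,
    # Python's \Z is the absolute end, so the trailing "\n" must be matched.
    "Z_without_newline": re.search(r"the end\Z", TEXT) is not None,
    "Z_with_newline": re.search(r"the end\n\Z", TEXT) is not None,
    # \b requires a word boundary; \B requires the absence of one.
    "b_separate_word": re.search(r"Bob.*\bcat\b", "Bob ate the cat") is not None,
    "b_inside_word": re.search(r"\bcat\b", "copycats") is not None,
    "B_inside_word": re.search(r"c.*\Bcat\B.*", "copycats").group(),
}

for name, value in results.items():
    print(f"{name:<18} {value!r}")
```

The `caret` and `caret_multiline` entries (`['abc']` versus `['abc', 'this']`) show why a pattern that works on a single cell can behave differently on multi-line text.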

Quantifiers

Quantifier	Legend	Example	Sample Match
+	One or more	x+	xxxxx
*	Zero or more times	ABC*	ABCCCC
?	Once or none (placed after another quantifier, it makes that quantifier “lazy”)	abc?	ab or abc
{3}	Exactly three times	\D{3}	ABC
{2,4}	Two to four times	\d{2,4}	156
{3,}	Three or more times	\w{3,}	a_bdcs
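Here is a quick way to see the quantifiers in action, again assuming Python's `re` module. The last two entries show what “lazy” means in practice: adding `?` after `+` makes it stop at the first possible `>` instead of the last one. `HTML` and `results` are made-up names.

```python
import re

HTML = "<b>bold</b> and <i>italic</i>"

results = {
    "plus": re.findall(r"x+", "xxxxx yy x"),               # ['xxxxx', 'x']
    "star": [re.fullmatch(r"ABC*", s) is not None
             for s in ("AB", "ABCCCC", "AACCCC")],          # [True, True, False]
    "optional": [re.fullmatch(r"abc?", s) is not None
                 for s in ("ab", "abc", "abcc")],           # [True, True, False]
    "exactly_3": re.findall(r"\D{3}", "ABC123"),            # ['ABC']
    "two_to_four": re.findall(r"\d{2,4}", "1234567"),       # ['1234', '567']
    "three_or_more": re.findall(r"\w{3,}", "a_bdcs ab"),    # ['a_bdcs']
    # Greedy: .+ grabs as much as possible, then backtracks to the LAST ">".
    "greedy": re.findall(r"<.+>", HTML),
    # Lazy: .+? grabs as little as possible, stopping at the FIRST ">".
    "lazy": re.findall(r"<.+?>", HTML),                     # ['<b>', '</b>', '<i>', '</i>']
}

for name, value in results.items():
    print(f"{name:<14} {value}")
```

Note how `\d{2,4}` splits `1234567` into `1234` and `567`: quantifiers are greedy by default, so each match takes as many digits as it is allowed before the next match begins.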

Assertions

Assertion	Legend	Example	Sample Match
(?=…)	Positive lookahead. x(?=y) matches x only if x is followed by y.	\d+(?= dollars)	100 in 100 dollars
(?<=…)	Positive lookbehind. (?<=y)x matches x only if x is preceded by y.	(?<=\$)\d+	100 in $100
(?!…)	Negative lookahead. x(?!y) matches x only if x is not followed by y.	\d+(?!\d| dollars)	100 in 100 Yen
(?<!…)	Negative lookbehind. (?<!y)x matches x only if x is not preceded by y.	(?<![$\d])\d+	100 in 100 Yen
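Here is a small sketch of the four lookarounds applied to one invented string, assuming Python's `re` module. Keep in mind that `re` only accepts fixed-width lookbehinds: `(?<=\$)` works, but something like `(?<=\$\s*)` raises an error. `TEXT` and `results` are illustrative names.

```python
import re

TEXT = "Paid 100 dollars, then $250, then 300 Yen"

results = {
    # Digits followed by " dollars"; the lookahead is not part of the match.
    "positive_lookahead": re.findall(r"\d+(?= dollars)", TEXT),
    # Digits preceded by a dollar sign.
    "positive_lookbehind": re.findall(r"(?<=\$)\d+", TEXT),
    # Digits NOT followed by " dollars"; the \d alternative stops the engine
    # from backtracking to a shorter number such as "10".
    "negative_lookahead": re.findall(r"\d+(?!\d| dollars)", TEXT),
    # Digits NOT preceded by "$"; \d in the class stops "50" matching in "$250".
    "negative_lookbehind": re.findall(r"(?<![$\d])\d+", TEXT),
}

for name, value in results.items():
    print(f"{name:<20} {value}")
```

Because lookarounds only check the surrounding text without consuming it, each result contains just the digits, which is usually exactly what you want when pulling amounts out of a text field.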

Character Classes and Groups

Character	Legend	Example	Sample Match
[ … ]	One of the characters in the brackets	be[ea]r	beer or bear
[x-y]	One of the characters in the range from x to y	[A-Z]+	GREAT
[^x]	One character that is not x	[^a-z]{3}	A1!
[^x-y]	One of the characters not in the range from x to y	[^a-h]+	zzz
[\d\D]	One character that is a digit or a non-digit	[\d\D]+	Any characters, including new lines
x|y	Alternation (OR operator)	M|L	M in [size M]
( … )	Capturing group	A(nt|pple)	Apple (captures “pple”)
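The same kind of check works for character classes and groups, again assuming Python's `re` module. One thing worth knowing: when a pattern contains a capturing group, `re.findall` returns the captured text rather than the whole match, which is why the `findall_groups` entry returns `['nt', 'pple']`. `results` is just an illustrative name.

```python
import re

results = {
    "set": re.findall(r"be[ea]r", "bear beer boar"),       # ['bear', 'beer']
    "range": re.findall(r"[A-Z]+", "a GREAT day"),         # ['GREAT']
    "negated": re.findall(r"[^a-z]{3}", "abcA1!xyz"),      # ['A1!']
    # [\d\D] matches literally anything, including line breaks ...
    "any_char": re.search(r"a[\d\D]+z", "a\nb\nz").group(),
    # ... whereas "." stops at a line break (without re.DOTALL).
    "dot": re.search(r"a.+z", "a\nb\nz"),                  # None
    "alternation": re.findall(r"M|L", "size M or L"),      # ['M', 'L']
    # group(0) is the whole match, group(1) the first capturing group.
    "group": re.search(r"A(nt|pple)", "Apple pie").group(0, 1),
    # With a capturing group, findall returns the captured text only.
    "findall_groups": re.findall(r"A(nt|pple)", "Ant and Apple"),
}

for name, value in results.items():
    print(f"{name:<15} {value!r}")
```

If you need the whole match from `findall` while still grouping an alternation, a non-capturing group such as `A(?:nt|pple)` returns `['Ant', 'Apple']` instead.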

Daily BI Talks

Business Intelligence Chats and Tips for Data Professionals!