RegEx basics


? = zero or one

+ = one or more

* = zero or more

{3} = exactly 3 of the preceding token

{3,} = 3 or more

{1,3} = between 1 and 3 of the preceding token
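A minimal sketch of the quantifiers above, using Python's standard `re` module. The patterns and sample strings are illustrative choices. Each quantifier applies only to the token immediately before it.

```python
import re

# ? : the preceding token ("u") appears zero or one time
optional = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]
print(optional)       # ['color', 'colour']

# + : one or more digits;  * : "b" zero or more times
one_or_more = re.findall(r"\d+", "a1 b22 c333")
zero_or_more = re.findall(r"ab*", "a ab abbb")
print(one_or_more)    # ['1', '22', '333']
print(zero_or_more)   # ['a', 'ab', 'abbb']

# {3} : exactly three digits;  {3,} : three or more digits
exactly_three = re.findall(r"\d{3}", "12 123 12345")
three_or_more = re.findall(r"\d{3,}", "12 123 12345")
print(exactly_three)  # ['123', '123']  (from "12345" it takes "123"; the leftover "45" is too short)
print(three_or_more)  # ['123', '12345']
```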

An example using {1,3}:
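A sketch with Python's `re` module; the pattern `\d{1,3}` and the sample text are hypothetical. It shows that `{1,3}` is greedy (it takes three digits when it can) and that anchoring the whole string rejects longer runs.

```python
import re

# \d{1,3} : one to three digits, as many as possible each time
parts = re.findall(r"\d{1,3}", "7 42 256 1024")
print(parts)  # ['7', '42', '256', '102', '4']

# fullmatch requires the entire string to fit, so a 4-digit number is rejected
ok_256 = re.fullmatch(r"\d{1,3}", "256") is not None
ok_1024 = re.fullmatch(r"\d{1,3}", "1024") is not None
print(ok_256, ok_1024)  # True False
```

Note how "1024" is split into "102" and "4" by `findall`: without anchors, `{1,3}` happily matches pieces of a longer number.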

Collections and Negation

[ ] = exactly one character from the set listed inside the brackets

[^ ] = a caret (^) as the first character inside a collection negates it: it matches any one character that is NOT in the collection

So this regular expression matches any single character except a lowercase letter a-z or the digit 4:
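A minimal Python sketch of a plain collection and a negated one. The patterns `[aeiou]` and `[^a-z4]` and the sample strings are assumptions chosen to fit the descriptions above.

```python
import re

# [aeiou] : exactly one character, which must be a, e, i, o, or u
vowels = re.findall(r"[aeiou]", "regular expression")
print(vowels)  # ['e', 'u', 'a', 'e', 'e', 'i', 'o']

# [^a-z4] : exactly one character that is NOT a lowercase letter a-z and NOT the digit 4
others = re.findall(r"[^a-z4]", "abc4 X9z")
print(others)  # [' ', 'X', '9']
```

Either way a collection consumes exactly one character; add a quantifier such as `+` after the closing bracket to match a run of them.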


So let’s use this to match a simple, properly punctuated sentence (one that starts with a capital letter and ends with ., ?, or !) with the following:



  • [A-Z] = exactly one capital letter A-Z
  • [^.?!]+ = negation: one or more characters that are NOT ., ?, or !
  • [.?!] = exactly one of: the literal . character, the ? character, or the ! character
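Put together, the three pieces give `[A-Z][^.?!]+[.?!]`. Below is a sketch in Python's `re` module with a made-up sample text. Inside `[ ]` the characters `.`, `?` and `!` are literal, so no backslashes are needed there.

```python
import re

# [A-Z]   : one capital letter to start the sentence
# [^.?!]+ : one or more characters that are not sentence-ending punctuation
# [.?!]   : one terminating ., ? or !
SENTENCE = re.compile(r"[A-Z][^.?!]+[.?!]")

text = "Regex is handy. is this a sentence? Yes it is! no period here"
sentences = SENTENCE.findall(text)
print(sentences)  # ['Regex is handy.', 'Yes it is!']
```

This is a heuristic rather than a grammar check: "is this a sentence?" is skipped because it has no capital letter, and a capital in the middle of a sentence (for example "I") would start a match there.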


Whitespace Characters

\t = tab

\n = new line

\r = carriage return

\f = form feed

\v = vertical tab

Windows line endings are two characters: \r\n
macOS and Linux line endings are a single character: \n
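A small Python sketch (sample strings are made up) showing that `\r?\n`, an optional carriage return followed by a newline, splits text with either style of line ending, and that `\t` matches a tab.

```python
import re

windows_text = "first\r\nsecond\r\nthird"
unix_text = "first\nsecond\nthird"

# \r? makes the carriage return optional, so one pattern handles both line-ending styles
windows_lines = re.split(r"\r?\n", windows_text)
unix_lines = re.split(r"\r?\n", unix_text)
print(windows_lines)  # ['first', 'second', 'third']
print(unix_lines)     # ['first', 'second', 'third']

# \t matches a tab character, e.g. in tab-separated data
fields = re.split(r"\t", "name\tage\tcity")
print(fields)         # ['name', 'age', 'city']
```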


You can anchor expressions to the start or the end of a string:

^ = anchor at the start of the string

$ = anchor at the end of the string

Anchoring at the start:
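A sketch in Python with hypothetical log lines: `^ERROR` only matches when "ERROR" is the very first thing in the string.

```python
import re

lines = ["ERROR disk full", "INFO started", "retry after ERROR"]

# ^ERROR : "ERROR" must appear at position 0 of the string
starts_with_error = [s for s in lines if re.search(r"^ERROR", s)]
print(starts_with_error)  # ['ERROR disk full']
```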

Anchoring at the end:
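A matching sketch for the end anchor, again with hypothetical file names: `\.log$` requires a literal ".log" as the last thing in the string, so "app.log.bak" is rejected.

```python
import re

files = ["app.log", "app.log.bak", "error.log"]

# \.log$ : a literal ".log" that must sit at the end of the string
log_files = [f for f in files if re.search(r"\.log$", f)]
print(log_files)  # ['app.log', 'error.log']
```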

Character classes

Single tokens that each stand for a whole set of related characters (digits, whitespace, word characters, and so on)

. = any character except a newline (\n)

\s = any whitespace character (space, \t, \n, \r, \f, or \v)

\S = the inverse of \s, or any kind of character that is NOT whitespace

\d = any digit character [0-9]

\D = any non-digit character [^0-9]

\w = word character [0-9A-Za-z_]

\W = any non-word character [^0-9A-Za-z_]

\b = word boundary: a zero-width position with a \w character on one side and a \W character (or the start or end of the string) on the other

\B = not a word boundary
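A Python sketch of several of these classes applied to a hypothetical log line. One caveat: in Python 3, `\d`, `\w` and `\s` also match non-ASCII Unicode characters unless you pass `re.ASCII`; the samples here are plain ASCII, so the results agree with the definitions above.

```python
import re

log_line = "2024-05-01 12:30 user_42 logged in"

digits = re.findall(r"\d+", log_line)   # runs of digits
words = re.findall(r"\w+", log_line)    # runs of word characters
tokens = re.findall(r"\S+", log_line)   # runs of non-whitespace
print(digits)  # ['2024', '05', '01', '12', '30', '42']
print(words)   # ['2024', '05', '01', '12', '30', 'user_42', 'logged', 'in']
print(tokens)  # ['2024-05-01', '12:30', 'user_42', 'logged', 'in']

# . does not match a newline, so "a\nc" is not found
dots = re.findall(r"a.c", "abc a\nc")
print(dots)    # ['abc']

# \bin\b finds "in" only as a whole word, not inside "inside" or "login"
whole_in = re.findall(r"\bin\b", "in inside login in")
print(whole_in)  # ['in', 'in']
```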

So to combine anchors with character classes, let’s find a string that starts with a word character and ends with anything that ISN’T a digit:
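One way to write that is `^\w.*\D$`. Here is a sketch in Python; the pattern is our reading of the description above and the sample strings are made up.

```python
import re

# ^\w : the first character is a word character
# .*  : any characters in between (zero is fine)
# \D$ : the last character is not a digit
PATTERN = re.compile(r"^\w.*\D$")

samples = ["abc", "report_v2", "x-ray!", " leading space", "a"]
results = {s: bool(PATTERN.search(s)) for s in samples}
print(results)
# {'abc': True, 'report_v2': False, 'x-ray!': True, ' leading space': False, 'a': False}
```

The single-character string "a" fails because `\w` and `\D` each need a character of their own, so a match is always at least two characters long.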



RegEx Examples (useful for log file searching)

Search for everything up to, but not including “abc”
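A sketch in Python. It relies on two features not covered above, a lazy quantifier `.*?` and a lookahead `(?=abc)`; this is one common approach, not necessarily the pattern originally shown here. The helper name `text_before_abc` is hypothetical.

```python
import re
from typing import Optional

def text_before_abc(line: str) -> Optional[str]:
    # ^.*?    : from the start, take as few characters as possible...
    # (?=abc) : ...until "abc" comes next; the lookahead checks for "abc" without consuming it
    m = re.search(r"^.*?(?=abc)", line)
    return m.group(0) if m else None

print(repr(text_before_abc("user=bob action=login abc trailing abc")))
# 'user=bob action=login '
print(text_before_abc("no marker here"))  # None
```

The lazy `.*?` stops at the first "abc"; a greedy `.*` would instead run up to the last one on the line.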



Remove all whitespace at the beginning of lines

Find:

^\s*(\w.*)$

Replace with:

\1
  • ^ – Beginning of the line
  • \s – A whitespace character
  • * – 0 or more of them
  • ( – Begin a capture group
  • \w – A word character ([A-Za-z0-9_])
  • . – Any character except a newline
  • * – 0 or more of them
  • ) – End our capture group
  • $ – End of the line
  • \1 – The contents of the first capture group (Our word character and all characters after that)
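The same find-and-replace can be sketched with Python's `re.sub`, using the pattern `^\s*(\w.*)$` and the replacement `\1` from the breakdown above. The sample text is made up. `re.MULTILINE` is needed so that `^` and `$` match at every line rather than only at the start and end of the whole string.

```python
import re

text = "   first line\n\tsecond line\nthird line\n    - bullet"

# Find ^\s*(\w.*)$ on every line and replace it with capture group 1,
# i.e. the line without its leading whitespace
cleaned = re.sub(r"^\s*(\w.*)$", r"\1", text, flags=re.MULTILINE)
print(cleaned)
# first line
# second line
# third line
#     - bullet
```

Two behaviours to be aware of: a line whose first non-blank character is not a word character (like "    - bullet") does not match and keeps its indentation, and because `\s` also matches `\n`, blank lines directly above an indented line get swallowed into the match and removed. Using `[ \t]*` in place of `\s*` keeps the match on a single line.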