Computer programming #code/regexp


Special Characters

. [ { ( ) \ ^ $ | ? * +

symbol function example
\ Escape symbol - makes the next character literal. Examples:
\. dot
\* star
\\ backslash
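
As a minimal sketch of the escape rule above, the Python snippet below (Python's `re` module is an assumption here; the notes themselves are flavor-neutral) compares an unescaped dot with an escaped one and uses `re.escape` to escape a whole literal. The sample strings are made up for illustration.

```python
import re

# Hypothetical literal text that happens to contain regex metacharacters.
literal = "1.5*2 (approx)"

# An unescaped dot matches any character, so "1.5" also matches "1x5".
unescaped_hit = re.search(r"1.5", "1x5") is not None

# Escaping the dot makes it literal, so "1x5" no longer matches.
escaped_hit = re.search(r"1\.5", "1x5") is not None

# re.escape() adds the needed backslashes to an arbitrary literal.
pattern = re.escape(literal)
exact_hit = re.fullmatch(pattern, literal) is not None

print(unescaped_hit, escaped_hit, exact_hit)
```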

General tokens

symbol function example
\n Newline
\N Anything but a newline
\t Tab
\0 Null character

Meta sequences

symbol function example
. Any single character except newline (with the /s flag it also matches line terminators). Example: /.+/ matches a b c
a|b a OR b
\s Any whitespace
\S Any non-whitespace
\d Any digit
\D Any non-digit
\w Any word character
\W Any non-word character
\h Horizontal whitespace character
\r Carriage return
\R Unicode newlines
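
A short sketch of a few of these meta sequences, assuming Python's `re` module. Note that `\h`, `\R` and `\N` come from PCRE-style engines and are not available in Python's `re`, so they are left out. The sample text is hypothetical.

```python
import re

# Hypothetical sample text with a tab in the middle.
text = "Order 66 shipped\ton 2024-01-15"

digits = re.findall(r"\d+", text)        # runs of digits
words = re.findall(r"\w+", text)         # runs of word characters
fields = re.split(r"\s+", text)          # split on any whitespace
either = re.findall(r"a|b", "cabbage")   # a OR b

# "." stops at a newline unless the s flag (re.DOTALL) is set.
first_line = re.match(r".+", "a\nb").group()
both_lines = re.match(r".+", "a\nb", re.DOTALL).group()

print(digits, fields, either)
```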

Character classes

symbol function example
[abc] A single character of a b c
[^abc] A character except a b c
[a-zA-Z] A character in range a - z or A-Z
[a-z] A character in range a - z
[^a-z] A character not in range a - z
:<: Start of word.
POSIX-style equivalent of the \b word boundary; interpreted as \b(?=\w)
:>: End of word.
POSIX-style equivalent of the \b word boundary; interpreted as \b(?<=\w)
:alnum: Letters and digits. Equivalent to [A-Za-z0-9]
:alpha: Letters.
Equivalent to [A-Za-z].
:ascii: ASCII codes 0 - 127.
Equivalent to [\x00-\x7F]
:blank: Space or Tab only (not new lines).
Equivalent to [ \t]
:word: Word character, letters, numbers, underscores.
POSIX equivalent to \w or [a-zA-Z0-9_]
:punct: Matches characters that are not whitespace, letters or numbers.
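
Python's `re` module does not support the POSIX named classes, so a rough sketch is to spell out the equivalents listed above. The `POSIX_EQUIVALENTS` table and the `find_runs` helper are hypothetical names introduced only for this example.

```python
import re

# Hypothetical lookup table: POSIX class name -> explicit character class,
# using the equivalents given in the list above.
POSIX_EQUIVALENTS = {
    "alnum": r"[A-Za-z0-9]",
    "alpha": r"[A-Za-z]",
    "ascii": r"[\x00-\x7F]",
    "blank": r"[ \t]",
    "word": r"[A-Za-z0-9_]",
}

def find_runs(name: str, text: str) -> list[str]:
    """Return every run of characters belonging to the named class."""
    return re.findall(POSIX_EQUIVALENTS[name] + "+", text)

alpha_runs = find_runs("alpha", "abc123 def")
alnum_runs = find_runs("alnum", "abc123 def")
word_runs = find_runs("word", "snake_case and-dash")

print(alpha_runs, alnum_runs, word_runs)
```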


Quantifiers

symbol function example
a? Zero or one of a. .? = zero or one of any character
a* Zero or more of a.
Greedy quantifier - matches as many characters as possible
a+ One or more of a
a{3} Exactly 3 of a
a{3,} 3 or more of a
a{3,6} Between 3 and 6 of a
a*? Lazy quantifier - matches as few characters as possible
a*+ Possessive quantifier - matches as many characters as possible and never gives them back
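
The difference between greedy and lazy matching, and the counted forms written with braces, can be sketched as follows (Python's `re` is assumed; possessive quantifiers need Python 3.11 or later and are left out here). The HTML snippet is a made-up input.

```python
import re

# Hypothetical input with two adjacent tags.
html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the last possible </b>.
greedy = re.findall(r"<b>.*</b>", html)

# Lazy: .*? stops at the first possible </b>.
lazy = re.findall(r"<b>.*?</b>", html)

# Counted quantifiers use braces.
exactly_3 = re.fullmatch(r"a{3}", "aaa") is not None
at_least_3 = re.fullmatch(r"a{3,}", "aaaaa") is not None
three_to_six = re.fullmatch(r"a{3,6}", "aaaaaaa") is not None  # 7 a's: too many

print(greedy, lazy, exactly_3, at_least_3, three_to_six)
```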


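Anchors

symbol function example
\b A word boundary
\B Non-word boundary
^ or \A Start of string (^ also matches at the start of each line with the m flag)
$ or \Z End of string, or just before a final newline ($ also matches at the end of each line with the m flag)
\z Absolute end of string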
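
A small sketch of the anchors, assuming Python's `re`. One flavor difference to keep in mind: in Python, `\Z` already means the absolute end of the string (the role `\z` plays in the list above), while `$` still tolerates a trailing newline. The sample strings are hypothetical.

```python
import re

text = "cat one\ncat two"

# Without the m flag, ^ matches only at the start of the string.
start_only = re.findall(r"^cat", text)

# With the m flag (re.M), ^ matches at the start of every line.
each_line = re.findall(r"^cat", text, re.M)

# \b requires a word boundary, \B requires the absence of one.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
inside_word = re.findall(r"\Bcat", "cat concat")

# $ allows one trailing newline; Python's \Z does not.
dollar_hit = re.search(r"two$", "cat two\n") is not None
absolute_hit = re.search(r"two\Z", "cat two\n") is not None

print(start_only, each_line, whole_words, inside_word, dollar_hit, absolute_hit)
```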


Group constructs

symbol function example
(?:...) Match everything enclosed without capturing it
(...) Capture everything enclosed
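
To illustrate the two group types, here is a minimal Python sketch (the names and inputs are invented for the example): the non-capturing group only groups the alternatives, while the capturing group shows up in the results.

```python
import re

# (?:Mr|Ms) groups the alternatives but is not captured; (\w+) is captured.
m = re.match(r"(?:Mr|Ms)\. (\w+)", "Ms. Lovelace")
captured = m.groups()
whole = m.group(0)

# With several capturing groups, findall returns one tuple per match.
ranges = re.findall(r"(\d+)-(\d+)", "10-20 30-40")

print(captured, whole, ranges)
```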


Substitution

symbol function example
$1 Contents of capture group 1
$` Contents before match
$' Contents after match
$& Complete match content
\x20 Hexadecimal replacement value (two hex digits)
\x{NNNN} Hexadecimal replacement value (braced form for longer code points)
\t Insert Tab
\r Insert carriage return
\n Insert Newline
\f Insert form-feed
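
The replacement tokens above use the `$` notation found in PCRE-style tools and JavaScript. As a rough sketch in Python, where the syntax differs: `$1` becomes `\1` (or `\g<1>`), `$&` becomes `\g<0>`, and the text before or after the match has no token, so it is read from the match object instead. All inputs are hypothetical.

```python
import re

# $1 / $2  ->  \1 / \2 in Python replacement strings.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# $&  ->  \g<0> (the complete match).
wrapped = re.sub(r"\d+", r"[\g<0>]", "a1b22")

# $` and $'  ->  slice the original string around the match.
m = re.search(r"-", "left-right")
before = m.string[:m.start()]
after = m.string[m.end():]

# \t in the replacement inserts a real tab character.
tabbed = re.sub(r", ", r"\t", "a, b")

print(swapped, wrapped, before, after, repr(tabbed))
```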

Character Escapes

symbol code meaning
\a \u0007 Bell
\b \u0008 Backspace (only inside a character class; elsewhere \b is a word boundary)
\t \u0009 Tab
\r \u000D CR
\v \u000B Vertical tab
\f \u000C Form feed
\n \u000A NL
\e \u001B Escape
\NNN Octal character
\xNN Hex character
\uNNNN Unicode character


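Flags

symbol function example
g Global - find all matches instead of stopping at the first
m Multiline - ^ and $ match at line breaks
i Case insensitive
u Unicode
U Ungreedy - quantifiers are lazy by default
x Ignore whitespace/verbose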
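
A sketch of how some of these flags map onto Python's `re`, which is an assumption about the target language: `i` is `re.I`, `m` is `re.M`, `x` is `re.X`, and `g` corresponds to calling `findall`/`finditer` instead of `search`. Python has no counterpart to the `U` (ungreedy) flag, and Unicode matching is already the default for `str` patterns.

```python
import re

log = "Error one\nerror two"

# i + m: case-insensitive match at the start of every line.
hits = re.findall(r"^error", log, re.I | re.M)

# "g" is a choice of function: search() stops at the first match,
# findall() returns all of them.
first_only = re.search(r"\d", "a1b2").group()
all_hits = re.findall(r"\d", "a1b2")

# x: whitespace and comments inside the pattern are ignored.
DATE = re.compile(r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
""", re.X)
parts = DATE.match("2024-01").groups()

print(hits, first_only, all_hits, parts)
```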

Useful ones

Get the content of each bracketed array element, without the brackets

(?:\[([^\]]+)\]) /gm
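
A quick way to try this pattern, sketched in Python with a made-up input string. `findall` plays the role of the g flag and returns only the captured group, that is, the text between the brackets.

```python
import re

BRACKET_CONTENT = re.compile(r"(?:\[([^\]]+)\])", re.M)

# Hypothetical input.
text = "items[first] and items[second item]"

contents = BRACKET_CONTENT.findall(text)
print(contents)
```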

Extract json to csv

\s*"(\w+)": "?(\w*)"?,? /gm
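
A rough sketch of the JSON-to-CSV idea in Python. Two assumptions are made so the pattern compiles and also catches the last pair: it starts with `\s*` and the trailing comma is optional. It only handles flat objects whose keys and values are plain word characters; for anything more complex a real JSON parser is the safer choice. The sample data is invented.

```python
import re

# Assumed variant of the pattern: leading \s* and an optional trailing comma.
PAIR = re.compile(r'\s*"(\w+)": "?(\w*)"?,?')

# Hypothetical flat JSON object.
json_text = """{
  "name": "Ada",
  "age": 36,
  "city": "London"
}"""

pairs = PAIR.findall(json_text)

# One header line and one data line, comma-separated.
header = ",".join(key for key, _ in pairs)
row = ",".join(value for _, value in pairs)

print(header)
print(row)
```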

MySQL : table structure parser


Match an email address

^\w{4,}@\w{2,}(\.com|\.net|\.info|\.xyz)$ /gm

Email regex that also allows Gmail's +example addressing
root email = gmail+ =

^[a-zA-Z0-9_.]+[+]?[a-zA-Z0-9]+[@]{1}[a-z0-9]+[\.][a-z]+$ /gm
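
The pattern can be exercised like this, as a sketch in Python. The addresses are made-up test values and `is_valid` is a hypothetical helper name.

```python
import re

EMAIL_PLUS = re.compile(
    r"^[a-zA-Z0-9_.]+[+]?[a-zA-Z0-9]+[@]{1}[a-z0-9]+[\.][a-z]+$"
)

def is_valid(address: str) -> bool:
    """True when the whole address matches the pattern."""
    return EMAIL_PLUS.match(address) is not None

plain_ok = is_valid("user@gmail.com")
plus_ok = is_valid("user+tag@gmail.com")
double_at = is_valid("user@@gmail.com")

print(plain_ok, plus_ok, double_at)
```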

RFC 2822 Email validation

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])? /gi

IPv4 address

\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b /ig
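
A minimal sketch of using this pattern to pull addresses out of free text, assuming Python. Because the pattern contains a capturing group, `finditer` with `group(0)` is used so that the full address is returned rather than the group. The sample text is hypothetical.

```python
import re

IPV4 = re.compile(
    r"\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}"
    r"(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b"
)

# Hypothetical text; the last address has an out-of-range octet.
text = "ping 192.168.1.10 then 8.8.8.8 but not 10.0.0.256"

# group(0) is the whole match, regardless of inner capturing groups.
addresses = [m.group(0) for m in IPV4.finditer(text)]
print(addresses)
```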

IP proxy scrape (IPv4 address or :port)

\b(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9]))|\:(\d*)\b /g

Match an IPv6 address

Full (uncompressed) form only, eight groups of hex digits:

^(?:[[:xdigit:]]{1,4}:){7}[[:xdigit:]]{1,4}$ /gmi

Detect script tag

<.*?script.*\/?> /ig

Phone number

^\s*(?:\+?(\d{1,3}))?([-. (]*(\d{3})[-. )]*)?((\d{3})[-. ]*(\d{2,4})(?:[-.x ]*(\d+))?)\s*$ /gm
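
A sketch of how the capture groups of this pattern come out, assuming Python. The phone numbers are invented. Group 1 is the country code, group 3 the area code, groups 5 and 6 the two halves of the local number, and group 7 an optional extension.

```python
import re

PHONE = re.compile(
    r"^\s*(?:\+?(\d{1,3}))?([-. (]*(\d{3})[-. )]*)?"
    r"((\d{3})[-. ]*(\d{2,4})(?:[-.x ]*(\d+))?)\s*$"
)

# Hypothetical numbers.
intl = PHONE.match("+1 (555) 123-4567")
with_ext = PHONE.match("555-123-4567 x89")

country, area = intl.group(1), intl.group(3)
local = intl.group(5) + intl.group(6)
extension = with_ext.group(7)

print(country, area, local, extension)
```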

Youtube URL


Markdown link

(\[((?:\[[^\]]*\]|[^\[\]])*)\]\([ \t]*()<?((?:\([^)]*\)|[^()\s])*?)>?[ \t]*((['"])(.*?)\6[ \t]*)?\)) /g
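
A sketch of extracting the parts of an inline Markdown link in Python. The pattern in the code is an assumed well-formed version of the one above, with the link-text part written as `(?:\[[^\]]*\]|[^\[\]])*`. Group 2 is the link text, group 4 the URL and group 7 the optional title. The sample link is hypothetical.

```python
import re

# Assumed well-formed version of the inline-link pattern.
MD_LINK = re.compile(
    r"""(\[((?:\[[^\]]*\]|[^\[\]])*)\]\([ \t]*()<?((?:\([^)]*\)|[^()\s])*?)>?[ \t]*((['"])(.*?)\6[ \t]*)?\))"""
)

# Hypothetical Markdown input.
markdown = 'See [Example](http://example.com "Title") for details.'

m = MD_LINK.search(markdown)
link_text, url, title = m.group(2), m.group(4), m.group(7)

print(link_text, url, title)
```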

JS Comment

\/\/(?![\S]{2,}\.[\w]).*|\/\*(.|\n)+?\*\/ /g
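
A sketch of running this pattern over a small piece of JavaScript source, assuming Python. The negative lookahead is what keeps the `//` inside a URL from being treated as a comment. The JavaScript snippet is made up.

```python
import re

JS_COMMENT = re.compile(r"\/\/(?![\S]{2,}\.[\w]).*|\/\*(.|\n)+?\*\/")

# Hypothetical JavaScript source with a line comment, a block comment
# and a URL that must not be mistaken for a comment.
source = (
    "let a = 1; // set a\n"
    "/* block\ncomment */\n"
    'let url = "http://example.com";'
)

# group(0) is the whole comment; the inner group only exists for (.|\n).
comments = [m.group(0) for m in JS_COMMENT.finditer(source)]
print(comments)
```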

Trim whitespace

(?:\s)\s /g

Hex color