RegExp
Computer programming #code/regexp
Links
Special Characters
. [ { ( ) \ ^ $ | ? * +
symbol | function | example |
---|---|---|
\ | Escape symbol: makes the next special character literal | \. dot, \* star, \\ backslash |
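A minimal sketch using Python's built-in `re` module (Python is an assumption here, since the note does not name a language). It shows what goes wrong when the special characters are left unescaped, and how `re.escape()` adds the backslashes for you. The sample string is made up.

```python
import re

price = "Total: $4.99 (approx.)"

# Unescaped, "$" is an end-of-string anchor and "." matches any character,
# so "$4.99" can never match inside the text.
unescaped = re.search(r"$4.99", price)

# With "\" in front, "$" and "." are matched literally.
escaped = re.search(r"\$4\.99", price)

# re.escape() inserts the backslashes for text that should be taken literally.
pattern = re.escape("(approx.)")
literal = re.search(pattern, price)

print(unescaped)         # None
print(escaped.group())   # $4.99
print(pattern)           # \(approx\.\)
print(literal.group())   # (approx.)
```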
General tokens
symbol | function | example |
---|---|---|
\n | Newline | |
\N | Anything but a newline | |
\t | Tab | |
\0 | Null character |
Meta sequences
symbol | function | example |
---|---|---|
. | Any single character except a newline (with the s flag, newlines too) | /.+/ matches all of "a b c" |
a\|b | a OR b | |
\s | Any whitespace | |
\S | Any non-whitespace | |
\d | Any digit | |
\D | Any non-digit | |
\w | Any word character | |
\W | Any non-word character | |
\h | Horizontal whitespace character | |
\r | Carriage return | |
\R | Unicode newlines |
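A short sketch of a few of these meta sequences with Python's `re` on a made-up string. Python's `re` has no `\h` or `\R`, so those rows are not shown; `re.S` is Python's name for the s flag mentioned in the `.` row.

```python
import re

text = "id_7: cat or dog\nage 42"

digits = re.findall(r"\d", text)          # every digit
words = re.findall(r"\w+", text)          # runs of word characters
either = re.findall(r"cat|dog", text)     # alternation
gaps = re.findall(r"\s", text)            # spaces and the newline

# "." does not cross the newline unless the s flag (re.S / re.DOTALL) is set.
no_dotall = re.search(r"dog.age", text)
dotall = re.search(r"dog.age", text, re.S)

print(digits)   # ['7', '4', '2']
print(words)    # ['id_7', 'cat', 'or', 'dog', 'age', '42']
```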
Character classes
symbol | function | example |
---|---|---|
[abc] | A single character: a, b or c | |
[^abc] | Any character except a, b or c | |
[a-zA-Z] | A character in the range a-z or A-Z | |
[a-z] | A character in the range a-z | |
[^a-z] | A character not in the range a-z | |
`[[:<:]]` | Start of word. POSIX-style equivalent of the \b word boundary, interpreted as \b(?=\w) | |
`[[:>:]]` | End of word. POSIX-style equivalent of the \b word boundary, interpreted as \b(?<=\w) | |
`[[:alnum:]]` | Letters and digits. Equivalent to [A-Za-z0-9] | |
`[[:alpha:]]` | Letters. Equivalent to [A-Za-z] | |
`[[:ascii:]]` | ASCII codes 0-127. Equivalent to [\x00-\x7F] | |
`[[:blank:]]` | Space or tab only (not newlines). Equivalent to [ \t] | |
`[[:word:]]` | Word characters: letters, digits and underscores. Equivalent to \w or [a-zA-Z0-9_] | |
`[[:punct:]]` | Punctuation: visible characters that are not whitespace, letters or digits | |
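A sketch of the bracket classes with Python's `re`, on a made-up string. Python's `re` has no POSIX classes such as `[[:alnum:]]`, so the ASCII equivalents from the table are written out instead.

```python
import re

s = "Ab3_x-9!"

single_of = re.findall(r"[abc]", s)       # one of a, b, c
not_lower = re.findall(r"[^a-z]", s)      # anything except a-z
letters = re.findall(r"[a-zA-Z]", s)      # a-z or A-Z

# No POSIX classes in Python's re: spell out the ASCII equivalents.
alnum = re.findall(r"[A-Za-z0-9]", s)     # stands in for [[:alnum:]]
word = re.findall(r"[A-Za-z0-9_]", s)     # stands in for [[:word:]] / ASCII \w

print(not_lower)  # ['A', '3', '_', '-', '9', '!']
```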
Quantifiers
symbol | function | example |
---|---|---|
a? | Zero or one of a | .? = zero or one of any character |
a* | Zero or more of a. Greedy quantifier: matches as many characters as possible | |
a+ | One or more of a | |
a{3} | Exactly 3 of a | |
a{3,} | 3 or more of a | |
a{3,6} | Between 3 and 6 of a | |
a*? | Lazy quantifier: matches as few characters as possible | |
a*+ | Possessive quantifier: matches as many as possible and never gives characters back (no backtracking) | |
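Greedy vs lazy matching and the `{n,m}` forms, sketched with Python's `re` on made-up strings. Possessive quantifiers like `a*+` are only available from Python 3.11, so they are left out to keep the sketch runnable on older versions.

```python
import re

html = "<b>bold</b><i>italic</i>"

greedy = re.search(r"<.*>", html).group()   # runs on to the last ">"
lazy = re.search(r"<.*?>", html).group()    # stops at the first ">"

digits = "1234567890"
exactly_3 = re.findall(r"\d{3}", digits)     # fixed count
three_to_6 = re.findall(r"\d{3,6}", digits)  # greedy range: takes 6 when it can

optional_u = re.findall(r"colou?r", "color colour")  # ? = zero or one

print(lazy)        # <b>
print(three_to_6)  # ['123456', '7890']
```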
Anchors
symbol | function | example |
---|---|---|
\b | A word boundary | |
\B | Non-word boundary | |
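^ or \A | Start of string (with the m flag, ^ also matches at the start of each line) | |
$ or \Z | End of string, or just before a final newline (with the m flag, $ also matches at the end of each line) | |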
\z | Absolute end of string |
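Anchors in Python's `re`, as a sketch on a made-up string. One flavour difference to watch: Python's `\Z` is the absolute end of the string (what the table lists as `\z`), while `$` without `re.M` still matches just before a final newline.

```python
import re

text = "first line\nsecond line\n"

starts = re.findall(r"^\w+", text)              # only the very start
line_starts = re.findall(r"^\w+", text, re.M)   # m flag: every line start

# $ may match just before the trailing "\n", so the last "line" is found.
ends_dollar = re.search(r"line$", text)
# Python's \Z only matches at the absolute end, which is "\n" here.
ends_abs = re.search(r"line\Z", text)

# \b keeps "line" from matching inside "lines" or "inline".
whole_words = re.findall(r"\bline\b", "line lines inline")

print(line_starts)  # ['first', 'second']
print(ends_abs)     # None
```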
Groups
symbol | function | example |
---|---|---|
(?:...) | Group without capturing (non-capturing group) | |
(...) | Capture anything enclosed |
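A Python `re` sketch of capturing vs non-capturing groups (the date and the "abab" strings are made up). The difference is most visible with `re.findall()`, which returns group contents instead of whole matches as soon as the pattern has a capture group.

```python
import re

date = "2024-05-17"

# Every (...) becomes an entry in groups().
captured = re.match(r"(\d{4})-(\d{2})-(\d{2})", date).groups()

# (?:...) still groups the year, but it is not captured.
non_captured = re.match(r"(?:\d{4})-(\d{2})-(\d{2})", date).groups()

# With a capture group, findall() returns the group's last repetition;
# with a non-capturing group it returns the whole match.
with_capture = re.findall(r"(ab)+", "ababx abab")
without_capture = re.findall(r"(?:ab)+", "ababx abab")

print(captured)         # ('2024', '05', '17')
print(without_capture)  # ['abab', 'abab']
```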
Substitution
symbol | function | example |
---|---|---|
$1 | Contents of capture group 1 | |
$` | Contents before match | |
$' | Contents after match | |
$& | Complete match content | |
\x20 | Hexadecimal replacement values | |
\x{NNNN} | Hexadecimal replacement value, braced form (NNNN = hex digits) | |
\t | Insert Tab | |
\r | Insert carriage return | |
\n | Insert Newline | |
\f | Insert form-feed |
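Python's `re.sub()` uses its own replacement syntax rather than the `$` tokens above, so this sketch maps them: `\g<1>` for `$1`, `\g<0>` for `$&`. `` $` `` and `$'` have no template form in Python, so a replacement function (the hypothetical `before_match`) stands in for `` $` ``. Escapes such as `\t` are processed in the replacement string.

```python
import re

s = "John Smith"

# $1 / $2 -> \g<1> / \g<2>
swapped = re.sub(r"(\w+) (\w+)", r"\g<2>, \g<1>", s)

# $& (complete match) -> \g<0>
bracketed = re.sub(r"\w+", r"[\g<0>]", s)

# \t in the replacement inserts a tab.
tabbed = re.sub(r" ", r"\t", s)

# $` (contents before the match) via a function that slices the input.
def before_match(m):
    return m.string[:m.start()]

with_prefix = re.sub(r"Smith", before_match, s)

print(swapped)      # Smith, John
print(bracketed)    # [John] [Smith]
```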
Character Escapes
symbol | code | meaning |
---|---|---|
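\a | \u0007 | Bell |
\b | \u0008 | Backspace (only inside a character class, e.g. [\b]) |
\t | \u0009 | Tab |
\r | \u000D | Carriage return (CR) |
\v | \u000B | Vertical tab |
\f | \u000C | Form feed |
\n | \u000A | Newline (LF) |
\e | \u001B | Escape |
\NNN | | Octal character (NNN = 2 or 3 octal digits) |
\xNN | | Hex character (NN = 2 hex digits) |
\uNNNN | | Unicode character (NNNN = 4 hex digits) |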
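A sketch of these escapes inside Python `re` patterns, on a made-up string. Python accepts `\xNN`, `\uNNNN` and three-digit octal escapes in patterns (a leading digit other than 0 with fewer than three octal digits would be read as a backreference instead).

```python
import re

s = "Tab:\there, space: here, A, \u00e9"

tabs = re.findall(r"\x09", s)          # hex escape for the tab character
spaces = re.findall(r"\x20", s)        # hex escape for a space
a_unicode = re.findall(r"\u0041", s)   # "A" written as \uNNNN
e_acute = re.findall(r"\u00e9", s)     # U+00E9, e with acute accent
a_octal = re.findall(r"\101", s)       # octal 101 = decimal 65 = "A"

print(len(spaces))  # 4
```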
Modifiers
symbol | function | example |
---|---|---|
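g | Global: find all matches instead of stopping at the first | |
m | Multiline: ^ and $ match at the start and end of each line | |
i | Case insensitive | |
s | Single line (dotall): . also matches newlines | |
u | Unicode | |
U | Ungreedy: quantifiers are lazy unless followed by ? | |
x | Ignore whitespace/verbose |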
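How these modifiers look in Python's `re`, as a sketch: they are flags such as `re.I`, `re.M`, `re.S` and `re.X`. Python has no g flag (use `findall()`/`finditer()` for all matches) and no ungreedy U flag (its `re.U` means Unicode, which is already the default for str patterns).

```python
import re

text = "Cat\ncat\nCAT"

# No "g" flag: findall() returns every match, search() only the first.
plain = re.findall(r"cat", text)
ignore_case = re.findall(r"cat", text, re.I)
line_starts = re.findall(r"^c", text, re.M | re.I)

# re.X (verbose) ignores whitespace and allows # comments in the pattern.
verbose = re.compile(r"""
    c a t     # the word "cat", spaces are ignored
""", re.X | re.I)
verbose_hits = verbose.findall(text)

print(ignore_case)  # ['Cat', 'cat', 'CAT']
```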
Useful ones
Get the contents of each square-bracketed array, without the brackets
(?:\[([^\]]+)\]) /gm
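The array pattern tried with Python's `re` on a made-up line. `re.findall()` plays the role of `/g`, and because the pattern has one capture group it returns just the bracket contents. Splitting on commas afterwards is an extra step of this sketch; nested brackets are not handled.

```python
import re

src = "tags = [red, green] sizes = [S, M, L]"

# Group 1 captures everything between "[" and the next "]".
contents = re.findall(r"(?:\[([^\]]+)\])", src)

# Split each array's contents into individual items.
items = [item.strip() for c in contents for item in c.split(",")]

print(contents)  # ['red, green', 'S, M, L']
print(items)     # ['red', 'green', 'S', 'M', 'L']
```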
Key/value pairs of a flat JSON object, one pair per line
[ ]*"(\w+)": "?(\w*)"?, /gm
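A Python `re` sketch of the JSON key/value pattern, assuming it is meant to start with optional indentation spaces (written `[ ]*` here). The sample object is made up. Note that the pattern requires a trailing comma, so the last key of an object is skipped.

```python
import re

# Hypothetical sample: a flat JSON object, one key per line.
json_text = '''{
  "name": "alice",
  "age": 30,
  "admin": false
}'''

# [ ]* = optional indentation (assumed start of the pattern).
# Group 1 = key, group 2 = value without surrounding quotes.
pairs = re.findall(r'[ ]*"(\w+)": "?(\w*)"?,', json_text)

# "admin" has no trailing comma, so it is not matched.
print(pairs)  # [('name', 'alice'), ('age', '30')]
```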
MySQL: table structure parser
^(\w+)\s+(\w+)(\(\d*\))?.*$
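A Python `re` sketch of the table-structure parser, assuming input with one column definition per line and no leading indentation or backticks (the sample lines are hypothetical). `re.M` stands in for the m flag so `^` and `$` work per line; a column without a length gets an empty third group.

```python
import re

# Hypothetical column definitions, one per line.
columns_text = """id int(11) NOT NULL
email varchar(255) DEFAULT NULL
created_at datetime NOT NULL"""

# Group 1 = column name, group 2 = type, group 3 = optional "(length)".
columns = re.findall(r"^(\w+)\s+(\w+)(\(\d*\))?.*$", columns_text, re.M)

for name, col_type, length in columns:
    print(name, col_type, length)
```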
Simple email check limited to a few TLDs
^\w{4,}@\w{2,}(\.com|\.co\.uk|\.net|\.info|\.xyz)$ /gm
Email address that also accepts Gmail-style +tags
Plain address: user@gmail.com; with a tag: user+extra@gmail.com
^[a-zA-Z0-9_.]+[+]?[a-zA-Z0-9]+[@]{1}[a-z0-9]+[\.][a-z]+$ /gm
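The Gmail-style pattern checked with Python's `re` against a few made-up addresses. The `re.sub()` line is an extra, hypothetical step that strips the +tag to recover the root address.

```python
import re

gmail_plus = re.compile(r"^[a-zA-Z0-9_.]+[+]?[a-zA-Z0-9]+[@]{1}[a-z0-9]+[\.][a-z]+$")

results = {
    address: bool(gmail_plus.match(address))
    for address in ["user@gmail.com", "user+extra@gmail.com", "user@@gmail.com"]
}

# Hypothetical helper step: drop "+tag" right before the "@".
root = re.sub(r"\+[^@]*(?=@)", "", "user+extra@gmail.com")

print(results)
print(root)  # user@gmail.com
```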
General email address (lowercase only unless the i flag is added)
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])? /g
IPv4 address
\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b /ig
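The IPv4 pattern run with Python's `re` on a made-up line. The last octet contains a capturing group, so `re.findall()` would return that (empty) group instead of the address; the sketch uses `finditer()` and `group()` to get whole matches.

```python
import re

ipv4 = re.compile(
    r"\b(?:(?:2(?:[0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9])\.){3}"
    r"(?:(?:2([0-4][0-9]|5[0-5])|[0-1]?[0-9]?[0-9]))\b"
)

text = "gateway 192.168.0.1, bad 256.1.1.1, dns 8.8.8.8"

# finditer() + group() gives the full address; 256.1.1.1 is rejected.
found = [m.group() for m in ipv4.finditer(text)]

print(found)               # ['192.168.0.1', '8.8.8.8']
print(ipv4.findall(text))  # ['', ''] (the inner group, not the address)
```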
IPv4 address, or a :port number (the two alternatives are separate matches)
\b(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9]))|\:(\d*)\b /g
^(?1)){1,6}$ /gmi
^(?:[[:xdigit:{1,4}:){5}:(?:[[:xdigit:{1,4}:){1,6}:$ /gm
HTML script tags (opening or closing)
<.*?script.*\/?> /ig
Phone number (optional country code, area code and extension)
^\s*(?:\+?(\d{1,3}))?([-. (]*(\d{3})[-. )]*)?((\d{3})[-. ]*(\d{2,4})(?:[-.x ]*(\d+))?)\s*$ /gm
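The phone pattern applied to made-up numbers with Python's `re`. The helper `parts()` and the names for the groups (country, area, exchange, line) are hypothetical labels for groups 1, 3, 5 and 6; group 7 would hold an extension.

```python
import re

phone = re.compile(
    r"^\s*(?:\+?(\d{1,3}))?([-. (]*(\d{3})[-. )]*)?"
    r"((\d{3})[-. ]*(\d{2,4})(?:[-.x ]*(\d+))?)\s*$"
)

def parts(number):
    """Hypothetical helper: (country, area, exchange, line) or None."""
    m = phone.match(number)
    if m is None:
        return None
    return m.group(1), m.group(3), m.group(5), m.group(6)

print(parts("+1 (555) 123-4567"))  # ('1', '555', '123', '4567')
print(parts("555.123.4567"))       # (None, '555', '123', '4567')
print(parts("12-34"))              # None
```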
YouTube URLs (watch, v/, channel, user, search results and youtu.be links)
(?:https?:\/\/)?(?:(?:(?:www\.?)?youtube\.com(?:\/(?:(?:watch\?.*?(v=[^&\s]+).*)|(?:v(\/.*))|(channel\/.+)|(?:user\/(.+))|(?:results\?(search_query=.+))))?)|(?:youtu\.be(\/.*)?))
(\[((?:\[^\[\)*)\]\([ \t]*()<?((?:\([^)]*\)|[^()\s])*?)>?[ \t]*((['"])(.*?)\6[ \t]*)?\)) /g
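JavaScript-style comments: // line comments (skipping // followed by a domain, as in https://example.com) and /* block */ comments
\/\/(?![\S]{2,}\.[\w]).*|\/\*(.|\n)+?\*\/ /g
Two whitespace characters in a row
(?:\s)\s /g
Hex colour: 0x followed by 6 or 8 hex digits
\b0x(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})\b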
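The hex colour pattern in Python's `re`, assuming the second alternative is meant to be the bracket class `[0-9A-Fa-f]{8}`. The sample text is made up. With an 8-digit value the 6-digit branch fails at the trailing `\b`, so the engine backtracks into the 8-digit branch.

```python
import re

hex_colour = re.compile(r"\b0x(?:[0-9A-Fa-f]{6}|[0-9A-Fa-f]{8})\b")

text = "bg=0xFF8800 fg=0x80FFFFFF bad=0xFFF"

# No capture groups, so findall() returns whole matches.
found = hex_colour.findall(text)

print(found)  # ['0xFF8800', '0x80FFFFFF']
```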