Regex CheatSheet


Basics

. (dot) matches any single character except a newline
\d matches any digit [0-9]
\D matches any character that is not a digit.
\s matches any whitespace character [ \r\n\t\f ].
\S matches any non-whitespace character
\w matches any word character. Word characters are the alphanumeric characters (a-z, A-Z and 0-9) and the underscore (_).
\W matches any non-word character
^ matches the position at the start of a string
$ matches the position at the end of a string
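
The shorthand classes and anchors above can be tried out with a short sketch. It assumes Python's built-in re module; other regex flavors may differ in details (for example, Python 3 str patterns also treat non-ASCII digits and letters as \d and \w unless re.ASCII is passed). All sample strings are made up for illustration.

```python
import re

# Shorthand classes (Python's re module; str patterns are Unicode-aware by default).
digits = re.findall(r"\d", "a1b22")            # each digit is a separate match
non_digits = re.findall(r"\D", "a1b22")
pieces = re.split(r"\s+", "a \t b\nc")         # split on runs of whitespace
words = re.findall(r"\w+", "snake_case x9!")   # underscore counts as a word character

# The dot stops at a newline unless re.DOTALL is given.
first_line = re.match(r".*", "ab\ncd").group()
both_lines = re.match(r".*", "ab\ncd", re.DOTALL).group()

# ^ and $ anchor the pattern to the start and end of the string.
all_digits = re.search(r"^\d+$", "12345") is not None
has_letter = re.search(r"^\d+$", "123a5") is not None

print(digits, non_digits, pieces, words)
print(repr(first_line), repr(both_lines), all_digits, has_letter)
```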

Repetition

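* matches zero or more repetitions of the preceding character, character class, or group
+ matches one or more repetitions of the preceding character, character class, or group
? matches zero or one occurrence of the preceding character, character class, or group. For example, a? matches zero or one a.
{x} matches exactly x repetitions of the preceding character, character class, or group
{x,y} matches between x and y (both inclusive) repetitions of the preceding character, character class, or group. When y is omitted, as in {x,}, it matches x or more repetitions.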
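
As a rough illustration of the quantifiers above, the following sketch (assuming Python's re module) checks which of a few made-up sample strings fully match each pattern. The helper name matching is hypothetical.

```python
import re

samples = ["ct", "cat", "caat", "caaat", "caaaat"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

star = matching(r"ca*t")         # zero or more a
plus = matching(r"ca+t")         # one or more a
optional = matching(r"ca?t")     # zero or one a
exact = matching(r"ca{2}t")      # exactly two a
ranged = matching(r"ca{2,3}t")   # two or three a
open_ended = matching(r"ca{3,}t")  # three or more a

for name, result in [("*", star), ("+", plus), ("?", optional),
                     ("{2}", exact), ("{2,3}", ranged), ("{3,}", open_ended)]:
    print(name, result)
```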

Character Class

[] matches only one out of several characters placed inside the square brackets.
[^] matches any character that is not in the square brackets.

Character ranges can be used inside the brackets. Common ones are [a-z], [A-Z] and [0-9].
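
A minimal sketch of character classes, negated classes, and ranges, assuming Python's re module and made-up input strings:

```python
import re

# A class matches exactly one character out of the listed set.
vowels = re.findall(r"[aeiou]", "regex")

# A leading ^ inside the brackets negates the set.
non_vowels = re.findall(r"[^aeiou]", "regex")

# Ranges can be combined in one class; here: one or more hexadecimal digits.
is_hex = re.fullmatch(r"[0-9A-Fa-f]+", "1aF9") is not None
not_hex = re.fullmatch(r"[0-9A-Fa-f]+", "1aG9") is not None

print(vowels, non_vowels, is_hex, not_hex)
```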

Grouping and Capturing

\b asserts the position at a word boundary.

Three different positions qualify as word boundaries:
► Before the first character in the string, if the first character is a word character.
► Between two characters in the string, where one is a word character and the other is not a word character.
► After the last character in the string, if the last character is a word character.
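
To see the effect of \b, the sketch below (assuming Python's re module and a made-up sentence) compares a bare pattern with one wrapped in word boundaries.

```python
import re

text = "cat concat cat's category"

# Without boundaries, "cat" is found inside longer words too.
anywhere = [m.start() for m in re.finditer(r"cat", text)]

# With \b on both sides, only standalone occurrences match.
# The apostrophe in "cat's" is not a word character, so a boundary exists there.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]

print(anywhere)
print(whole_word)
```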

() around a regular expression can group that part of regex together.
(?: ) can be used to create a non-capturing group. It is useful if we do not need the group to capture its match.
| means "or": it matches a single item out of several possible items separated by the vertical bar. Inside a group, it separates entire expressions (i.e., everything to the left or everything to the right of the vertical bar within that group); outside any group, it splits the whole pattern. Inside a character class [], | has no special meaning and matches a literal vertical bar, because a character class already means "any one of these characters".
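
The difference between capturing groups, non-capturing groups, and alternation can be sketched as follows, assuming Python's re module and made-up inputs. Note that in Python's re a vertical bar placed inside square brackets is matched literally.

```python
import re

# Capturing groups: each pair of parentheses stores its own submatch.
captured = re.match(r"(\d{4})-(\d{2})", "2024-05").groups()

# Non-capturing group: groups the year but does not store it.
partly_captured = re.match(r"(?:\d{4})-(\d{2})", "2024-05").groups()

# Alternation inside a group chooses between whole sub-expressions.
spellings = re.findall(r"gr(?:a|e)y", "gray grey griy")

# At the top level, | splits the entire pattern.
pets = re.findall(r"cat|dog", "cat, dog, cow")

# Inside a character class, | is just a literal character.
literal_bar = re.findall(r"[a|b]", "a|b")

print(captured, partly_captured, spellings, pets, literal_bar)
```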

Backreferences

\group_number: a backreference matches the same text that was previously matched by the capturing group with that number (\1 refers to the first capturing group). For example, (\d)\1 can match 00, 11, 22, 33, 44, 55, 66, 77, 88 or 99.
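
A small sketch of backreferences, assuming Python's re module. The inputs are made up; the repeated-word pattern is one common way to use \1, not the only one.

```python
import re

# (\d)\1 matches a digit followed by the same digit again.
# group(0) is used because findall would return only the captured digit.
doubles = [m.group(0) for m in re.finditer(r"(\d)\1", "100 2233 4567")]

# A word followed by a space and the same word again.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")
repeated_text = repeated.group(0)
repeated_word = repeated.group(1)

print(doubles, repeated_text, repeated_word)
```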

Assertions

Positive lookahead: regex_1(?=regex_2) matches regex_1 only if it is immediately followed by regex_2. The lookahead only asserts whether a match is possible; the text matched by regex_2 is not consumed and is not part of the result.
Negative lookahead: regex_1(?!regex_2) matches regex_1 only if it is not immediately followed by regex_2. The lookahead only asserts whether a match is possible; it consumes no characters.
Positive lookbehind: (?<=regex_2)regex_1 matches regex_1 only if it is immediately preceded by regex_2. The lookbehind only asserts whether a match is possible; the text matched by regex_2 is not consumed and is not part of the result.
Negative lookbehind: (?<!regex_2)regex_1 matches regex_1 only if it is not immediately preceded by regex_2. The lookbehind only asserts whether a match is possible; it consumes no characters.
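
The four assertions can be compared side by side with a sketch like the one below. It assumes Python's re module, where lookbehind patterns must have a fixed width; the sample text is made up, and the helper name starts is hypothetical.

```python
import re

text = "foobar foobaz barfoo"   # "foo" starts at positions 0, 7 and 17

def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

followed_by_bar = starts(r"foo(?=bar)")       # positive lookahead
not_followed_by_bar = starts(r"foo(?!bar)")   # negative lookahead
preceded_by_bar = starts(r"(?<=bar)foo")      # positive lookbehind
not_preceded_by_bar = starts(r"(?<!bar)foo")  # negative lookbehind

# The asserted text is not part of the match itself.
matched_text = re.search(r"foo(?=bar)", text).group()

# A typical use: digits that directly follow a dollar sign.
amounts = re.findall(r"(?<=\$)\d+", "cost: $40, tax: $5")

print(followed_by_bar, not_followed_by_bar, preceded_by_bar, not_preceded_by_bar)
print(matched_text, amounts)
```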

Additional Notes

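*? is the non-greedy (lazy) form of *, meaning "match as few characters as possible". For example, r\w*? matches only r in each of r, re and regex, whereas r\w* matches the entire word.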
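
The greedy versus non-greedy behaviour described above can be checked with this sketch, assuming Python's re module. The HTML-like string is a made-up example of where the distinction usually matters.

```python
import re

words = ["r", "re", "regex"]

# Lazy: \w*? takes as few characters as possible, so only "r" is matched.
lazy = [re.match(r"r\w*?", w).group() for w in words]

# Greedy: \w* takes as many characters as possible, so the whole word is matched.
greedy = [re.match(r"r\w*", w).group() for w in words]

markup = "<b>bold</b> and <i>italic</i>"
greedy_tags = re.findall(r"<.*>", markup)   # one match spanning the first < to the last >
lazy_tags = re.findall(r"<.*?>", markup)    # each tag matched separately

print(lazy, greedy)
print(greedy_tags, lazy_tags)
```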
