Regex CheatSheet
基础
-
dot
.
matches anything except for a newline -
\d
matches any digit [0-9] -
\D
matches any character that is not a digit. -
\s
matches any whitespace character [ \r\n\t\f ]. -
\S
matches any non-whitespace character -
\w
matches any word character, Word characters include alphanumeric characters (a-z, A-Z and 0-9) and underscores (_). -
\W
matches any non-word character -
^
matches the position at the start of a string -
$
matches the position at the end of a string
重复
-
*
matches zero or more repetitions of character/character class/group -
+
matches one or more repetitions of character/character class/group -
?
匹配 0 或者 1 个某字符。例如a?
表示 zero or one of a -
{x}
will match exactly x repetitions of character/character class/groups -
{x,y}
will match between x and y (both inclusive) repetitions of character/character class/groups。当省略 y 时表示重复 >= x 次。
Character Class
-
[]
matches only one out of several characters placed inside the square brackets. -
[^]
matches any character that is not in the square brackets.
字符范围,常用的有 [a-z],[A-Z],[0-9]
Grouping and Capturing
\b
assert position at a word boundary.
Three different positions qualify for word boundaries : ► Before the first character in the string, if the first character is a word character. ► Between two characters in the string, where one is a word character and the other is not a word character. ► After the last character in the string, if the last character is a word character.
-
()
around a regular expression can group that part of regex together. -
(?: )
can be used to create a non-capturing group. It is useful if we do not need the group to capture its match. -
|
表示或者,match a single item out of several possible items separated by the vertical bar. When used inside a character class, it will match characters; when used inside a group, it will match entire expressions (i.e., everything to the left or everything to the right of the vertical bar). 也就是说|
可以用在[]
或者()
中。
Backreferences
\group_number
:This tool (\1 references the first capturing group) matches the same text as previously matched by the capturing group. 比如,(\d)\1: It can match00
,11
,22
,33
,44
,55
,66
,77
,88
or99
.
Assertions
- Positive lookahead:
regex_1(?=regex_2)
assertsregex_1
to be immediately followed byregex_2
. The lookahead is excluded from the match. It does not return matches ofregex_2
. The lookahead only asserts whether a match is possible or not. - Negative lookahead:
regex_1(?!regex_2)
assertsregex_1
not to be immediately followed byregex_2
. Lookahead is excluded from the match (do not consume matches ofregex_2
), but only assert whether a match is possible or not. - Positive lookbehind:
(?<=regex_2)regex_1
assertsregex_1
to be immediately preceded byregex_2
. Lookbehind is excluded from the match (do not consume matches ofregex_2
), but only assert whether a match is possible or not. - Negative lookbehind:
(?<!regex_2)regex_1
assertsregex_1
not to be immediately preceded byregex_2
. Lookbehind is excluded from the match (do not consume matches ofregex_2
), but only assert whether a match is possible or not.
补充
*?
表示非贪婪匹配,即“尽可能少的匹配”。如r\w*?
对于 r, re, regex 都只会匹配到 r,而r\w*
则会匹配到整个单词。
Comments