References

Syntax

Character Classes

Concept Description Syntax Example
Bracket expression matches a single collating element contained in the non-empty set of collating elements represented by the bracket expression [expression] [abc] [0-9a-zA-Z] [^0-9]
Character class expression represents the set of characters belonging to a character class [:name:] [:alpha:] [:digit:] [:xdigit:] [:alnum:] [:punct:] [:space:] [:blank:]
Shorthand character class \d \D \s \S \w \W
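The bracket expressions in the table can be tried out in Java. This is a minimal sketch; the class and method names are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

public class BracketExpressionDemo {
    // [abc] matches exactly one of the listed characters.
    static boolean isAbc(String s) {
        return Pattern.matches("[abc]", s);
    }

    // [0-9a-zA-Z] combines three ranges; + repeats the set one or more times.
    static boolean isAsciiAlnum(String s) {
        return Pattern.matches("[0-9a-zA-Z]+", s);
    }

    // [^0-9] is a negated set: any single character that is not a digit.
    static String stripNonDigits(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(isAbc("b"));                       // true
        System.out.println(isAbc("ab"));                      // false: two characters
        System.out.println(isAsciiAlnum("Regex101"));         // true
        System.out.println(stripNonDigits("+1 (555) 0100"));  // 15550100
    }
}
```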

Character class expressions

  • POSIX Bracket Expressions
    • POSIX bracket expressions are a special kind of character class.
    • [:alnum:] = [a-zA-Z0-9], [:alpha:] = [a-zA-Z], [:digit:] = [0-9], [:lower:] = [a-z]
POSIX Description ASCII Shorthand Java Remarks
[:alnum:] Alphanumeric characters [a-zA-Z0-9] \p{Alnum}
[:alpha:] Alphabetic characters [a-zA-Z] \p{Alpha}
[:digit:] Digits [0-9] \d \p{Digit}
[:punct:] Punctuation and symbols [!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~] \p{Punct}
[:space:] All whitespace characters [ \t\r\n\v\f] \s \p{Space}
[:word:] Word characters [A-Za-z0-9_] \w
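Java does not accept the [:name:] notation, so the Java column of the table is what you would write in practice. The sketch below uses a few of those \p{...} classes; the helper names are hypothetical. By default these classes cover US-ASCII only.

```java
import java.util.regex.Pattern;

public class PosixClassDemo {
    // \p{Alnum} is the Java spelling of [:alnum:] (ASCII letters and digits).
    static boolean isAlnum(String s) {
        return Pattern.matches("\\p{Alnum}+", s);
    }

    // \p{Space} is the Java spelling of [:space:]; collapse each run to one blank.
    static String collapseSpace(String s) {
        return s.replaceAll("\\p{Space}+", " ");
    }

    // \p{Punct} is the Java spelling of [:punct:]; remove every such character.
    static String dropPunct(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(isAlnum("abc123"));             // true
        System.out.println(isAlnum("abc_123"));            // false: _ is punctuation
        System.out.println(collapseSpace("a \t\n b"));     // a b
        System.out.println(dropPunct("Hello, world!"));    // Hello world
    }
}
```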

Shorthand Character Classes

Shorthand Description Bracket Expression Remarks
\d digit [0-9]
\w word character [A-Za-z0-9_]
\s whitespace character [ \t\r\n\v\f]
\D non-digit [^\d]
\W non-word character [^\w]
\S non-whitespace character [^\s]
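A short Java sketch of the shorthand classes follows. The method names are illustrative only. Note that each backslash has to be doubled inside a Java string literal.

```java
public class ShorthandDemo {
    // \s+ matches a run of whitespace; split the trimmed text on it.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // \W is the negation of \w: replace every non-word character.
    static String toIdentifier(String s) {
        return s.replaceAll("\\W", "_");
    }

    // \D and [^\d] describe the same set, so both calls give the same result.
    static boolean negationsAgree(String s) {
        return s.replaceAll("\\D", "").equals(s.replaceAll("[^\\d]", ""));
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", words("  one two\tthree ")));  // one|two|three
        System.out.println(toIdentifier("a-b c"));                          // a_b_c
        System.out.println(negationsAgree("a1b2"));                         // true
    }
}
```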

Quantifier

Quantity Greedy Lazy Possessive
zero or one occurrences ? ?? ?+
zero or more occurrences * *? *+
one or more occurrences + +? ++
exactly n times {n} {n}? {n}+
n or more times {n,} {n,}? {n,}+
at least n times, but not more than m times {n,m} {n,m}? {n,m}+
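The difference between the three columns is easiest to see by running the same input through each variant. The sketch below assumes Java's regex engine, which supports all three forms; firstMatch is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Returns the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backtracks to the last "foo".
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Lazy: .*? takes as little as possible, stopping at the first "foo".
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ takes everything and never gives it back, so "foo" cannot match.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```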

Anchor

Symbol Name BRE ERE Java Perl GNU sed
^ Start of Line O O O O O
$ End of Line O O O O O
\b Word Boundary O O O
\B Non Word Boundary O O O
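As a rough illustration of the anchors in Java (other dialects may differ in detail), the sketch below uses hypothetical helper methods. Anchors match positions, not characters, so they consume no input.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // ^ and $ pin the digits to the start and end of the input.
    static boolean isAllDigits(String s) {
        return Pattern.compile("^\\d+$").matcher(s).find();
    }

    // \b on both sides restricts the match to "cat" as a whole word.
    static String replaceWholeWordCat(String s) {
        return s.replaceAll("\\bcat\\b", "dog");
    }

    // \B requires that "cat" does NOT start at a word boundary.
    static String markInnerCat(String s) {
        return s.replaceAll("\\Bcat", "CAT");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("12345"));                   // true
        System.out.println(isAllDigits("12a45"));                   // false
        System.out.println(replaceWholeWordCat("cat concat cat"));  // dog concat dog
        System.out.println(markInnerCat("cat concat cat"));         // cat conCAT cat
    }
}
```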

Case Conversion

Symbol Description BRE ERE Java Perl GNU sed
\U All literal text and all text inserted by replacement text tokens after \U up to the next \E or \L is converted to uppercase. X O
\L All literal text and all text inserted by replacement text tokens after \L up to the next \E or \U is converted to lowercase. X O
\u The first character after \u that is inserted into the replacement text as a literal or by a token is converted to uppercase. X O
\l The first character after \l that is inserted into the replacement text as a literal or by a token is converted to lowercase. X O
\u\L The first character after \u\L that is inserted into the replacement text as a literal or by a token is converted to uppercase and the following characters up to the next \E or \U are converted to lowercase. X O
\l\U The first character after \l\U that is inserted into the replacement text as a literal or by a token is converted to lowercase and the following characters up to the next \E or \L are converted to uppercase. X O
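Java's replacement strings have no \U, \L, \u, or \l tokens. A similar effect can be sketched with Matcher.replaceAll and a replacer function, which requires Java 9 or later. This is one possible workaround under that assumption, not a built-in equivalent; the class and method names are illustrative.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseConversionDemo {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Rough counterpart of replacing (\w+) with \U$1\E: uppercase every word.
    static String upperWords(String s) {
        return WORD.matcher(s).replaceAll(
            m -> Matcher.quoteReplacement(m.group().toUpperCase(Locale.ROOT)));
    }

    // Rough counterpart of \u\L$1\E: first character uppercase, the rest lowercase.
    static String capitalizeWords(String s) {
        return WORD.matcher(s).replaceAll(m -> {
            String w = m.group();
            String cap = w.substring(0, 1).toUpperCase(Locale.ROOT)
                       + w.substring(1).toLowerCase(Locale.ROOT);
            // quoteReplacement keeps $ and \ in the result from being interpreted.
            return Matcher.quoteReplacement(cap);
        });
    }

    public static void main(String[] args) {
        System.out.println(upperWords("hello world"));       // HELLO WORLD
        System.out.println(capitalizeWords("hELLO wORLD"));  // Hello World
    }
}
```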

Readings

BRE vs ERE

Dialect Special characters
BRE . [ \ * ^ $
ERE . [ \ ( * + ? { | ^ $

Regex Dialects

Java

API

Class/Interface/Method Description Remarks
Class Pattern A compiled representation of a regular expression.
public static Pattern Pattern.compile(String regex) Compiles the given regular expression into a pattern.
public Matcher Pattern.matcher(CharSequence input) Creates a matcher that will match the given input against this pattern.
public static boolean Pattern.matches(String regex, CharSequence input) Compiles the given regular expression and attempts to match the given input against it.
Class Matcher An engine that performs match operations on a character sequence by interpreting a Pattern.
public boolean Matcher.matches() Attempts to match the entire region against the pattern.
public boolean Matcher.find() Attempts to find the next subsequence of the input sequence that matches the pattern.
public MatchResult Matcher.toMatchResult() Returns the match state of this matcher as a MatchResult.
Interface MatchResult The result of a match operation.
public boolean String.matches(String regex) Tells whether or not this string matches the given regular expression.
public String String.replaceAll(String regex, String replacement) Replaces each substring of this string that matches the given regular expression with the given replacement.
public String String.replaceFirst(String regex, String replacement) Replaces the first substring of this string that matches the given regular expression with the given replacement.
Class RegExUtils Helpers to process Strings using regular expressions. org.apache.commons.lang3
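The following sketch shows how the Pattern, Matcher, and MatchResult entries in the table fit together. The key=value pattern and the findAll helper are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherApiDemo {
    // Compile once; a Pattern can be reused for many inputs.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    // Collects a snapshot of every match; each snapshot is unaffected by later find() calls.
    static List<MatchResult> findAll(String input) {
        List<MatchResult> results = new ArrayList<>();
        Matcher m = KEY_VALUE.matcher(input);
        while (m.find()) {
            results.add(m.toMatchResult());
        }
        return results;
    }

    public static void main(String[] args) {
        for (MatchResult r : findAll("width=640, height=480")) {
            System.out.println(r.group(1) + " -> " + r.group(2) + " at index " + r.start());
        }
        // matches() needs the whole input to match; find() only needs a subsequence.
        System.out.println(KEY_VALUE.matcher("width=640, height=480").matches());  // false
        System.out.println(KEY_VALUE.matcher("width=640").matches());              // true
    }
}
```

Use find() in a loop when the input may contain several matches, and matches() (or String.matches) when validating a complete string.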

Embedded Flags

Flag Constants Description Remarks
(?s) Pattern.DOTALL The expression . matches any character, including a line terminator.
(?m) Pattern.MULTILINE The expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence.
(?i) Pattern.CASE_INSENSITIVE Enables case-insensitive matching.
(?U) Pattern.UNICODE_CHARACTER_CLASS Enables the Unicode version of Predefined character classes and POSIX character classes.
(?d) Pattern.UNIX_LINES Only the '\n' line terminator is recognized in the behavior of ., ^, and $.
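The effect of a few of these flags can be checked with a small Java sketch. The helper names are hypothetical; the flags themselves are the ones listed in the table.

```java
import java.util.regex.Pattern;

public class EmbeddedFlagDemo {
    // True when the regex matches the entire input.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when the regex matches somewhere inside the input.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("a.b", "a\nb"));         // false: . skips line terminators
        System.out.println(matchesAll("(?s)a.b", "a\nb"));     // true: DOTALL
        System.out.println(matchesAll("(?i)hello", "HeLLo"));  // true: CASE_INSENSITIVE
        System.out.println(foundIn("^b$", "a\nb\nc"));         // false: anchors see the whole input
        System.out.println(foundIn("(?m)^b$", "a\nb\nc"));     // true: MULTILINE anchors at each line
        // The flag constant is an alternative to the embedded form.
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL).matcher("a\nb").matches());  // true
    }
}
```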

Sample Code

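Check matching string
import java.util.regex.Pattern;

// One-shot check: compiles the regex and matches it against the entire input.
boolean b = Pattern.matches("a*b", "aaaaab");   // true

// Compile once and reuse the pattern for several inputs.
Pattern p = Pattern.compile("a*b");
boolean b1 = p.matcher("aaaaab").matches();     // true
boolean b2 = p.matcher("bbbbb").matches();      // false: the whole input must match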
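Find matching string
    // Requires Apache Commons Lang 3 (org.apache.commons.lang3.RegExUtils).
    // `output` is assumed to hold the HTML text to search. Each pattern covers the whole
    // text ((?s) lets . cross line breaks), so "$1" keeps only the captured span content.
    final String name = RegExUtils.replaceFirst(output, "(?s).*<span id='name'>((?:\\w|\\s)*)</span>.*", "$1");
    final String country = RegExUtils.replaceFirst(output, "(?s).*<span id='country'>((?:\\w|\\s)*)</span>.*", "$1");
    final String age = RegExUtils.replaceFirst(output, "(?s).*<span id='age'>((?:\\w|\\s)*)</span>.*", "$1");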

Readings

Perl

.NET

sed

Special Topics

Special Characters

ERE special characters

An ERE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash, such a character is an ERE that matches the special character itself. The extended regular expression special characters and the contexts in which they have their special meaning are:

. \ [ (
The period, left-bracket, backslash and left-parenthesis are special except when used in a bracket expression. Outside a bracket expression, a left-parenthesis immediately followed by a right-parenthesis produces undefined results.
)
The right-parenthesis is special when matched with a preceding left-parenthesis, both outside a bracket expression.
* + ? {
The asterisk, plus-sign, question-mark and left-brace are special except when used in a bracket expression (see RE Bracket Expression). Any of the following uses produce undefined results:
  • if these characters appear first in an ERE, or immediately following a vertical-line, circumflex or left-parenthesis.
  • if a left-brace is not part of a valid interval expression.
|
The vertical-line is special except when used in a bracket expression. A vertical-line appearing first or last in an ERE, or immediately following a vertical-line or a left-parenthesis, or immediately preceding a right-parenthesis, produces undefined results.
^
The circumflex is special when used:
  • as an anchor
  • as the first character of a bracket expression
$
The dollar sign is special when used as an anchor.
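The rule that a special character matches itself when preceded by a backslash, or when placed inside a bracket expression, can be illustrated in Java. Java's dialect is not POSIX ERE and differs in details (for example, the cases POSIX leaves undefined), so treat this as a sketch of the general idea only; the helper name is hypothetical.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // True when the regex matches the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An unescaped period matches any character.
        System.out.println(matches("3.14", "3x14"));      // true
        // Preceded by a backslash, the period matches only itself.
        System.out.println(matches("3\\.14", "3x14"));    // false
        System.out.println(matches("3\\.14", "3.14"));    // true
        // Inside a bracket expression the period is not special either.
        System.out.println(matches("3[.]14", "3x14"));    // false
        System.out.println(matches("3[.]14", "3.14"));    // true
        // Pattern.quote turns a whole string into a literal pattern.
        System.out.println(matches("a+b(c)", "a+b(c)"));                 // false
        System.out.println(matches(Pattern.quote("a+b(c)"), "a+b(c)"));  // true
    }
}
```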

BRE special characters

A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a backslash, such a character will be a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are:

. [ \
The period, left-bracket and backslash are special except when used in a bracket expression (see RE Bracket Expression). An expression containing a [ that is not preceded by a backslash and is not part of a bracket expression produces undefined results.
*
The asterisk is special except when used:
  • in a bracket expression
  • as the first character of an entire BRE (after an initial ^, if any)
  • as the first character of a subexpression (after an initial ^, if any); see BREs Matching Multiple Characters.
^
The circumflex is special when used:
  • as an anchor (see BRE Expression Anchoring)
  • as the first character of a bracket expression (see RE Bracket Expression).
$
The dollar sign is special when used as an anchor.

Formal rules for bracket expression

Bracket expressions such as [0-9a-zA-Z], [^0-9a-zA-Z], or [0-9a-zA-Z.?*+-] follow different rules from ordinary expressions. One of the most important differences is which characters act as metacharacters (special characters): inside a bracket expression, characters such as . ? * and + stand for themselves. A more formal and detailed description of bracket expressions can be found in the following

Capturing, Grouping and Backreferences

NOT operator in Regex

Nested pairs search

Lookaround : lookahead and lookbehind

Greedy, Reluctant, or Possessive Quantifiers
