What is RegEx?

Oğuz Ergül
Nov 11, 2020
Fully Documented Regex Characters

As one widely quoted definition puts it, “A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that define a search pattern. Usually, such patterns are used by string-searching algorithms for ‘find’ or ‘find and replace’ operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory.”

1. Character Classes

Character classes are one of the most commonly used features of regular expressions. With a “character class”, also called a “character set”, you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets. This lets you find a word even if it is misspelled; for example, gr[ae]y matches both gray and grey.

[ABC] : Matches any character in the set.
[^ABC] : Matches any character that is not in the set.
[A-Z] : Matches a character having a character code between the two specified characters, inclusive.
[\s\S] : A character set that can be used to match any character, including line breaks, without the dotAll flag (s).
\w : Matches any word character (alphanumeric characters and underscore).
\W : Matches any character that is not a word character.
\D : Matches any character that is not a digit character (0–9).
\d : Matches any digit character (0–9).
\s : Matches any whitespace character (spaces, tabs, line breaks).
\S : Matches any character that is not a whitespace character (spaces, tabs, line breaks).
\p{L} : Matches a character in the specified Unicode category. For example, \p{Ll} will match any lowercase letter.
\P{L} : Matches any character that is not in the specified Unicode category.
\p{Han} : Matches any character in the specified Unicode script. For example, \p{Arabic} will match characters in the Arabic script.
\P{Han} : Matches any character that is not in the specified Unicode script.
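
To make the list above concrete, here is a small JavaScript sketch. The sample strings and variable names are made up for illustration. One assumption worth flagging: in JavaScript, Unicode property escapes need the u flag, and scripts are spelled \p{Script=Han} rather than \p{Han}; other engines may differ.

```javascript
// Character set: "gr[ae]y" accepts either spelling.
const spellings = "gray grey groy".match(/gr[ae]y/g);

// Negated set: remove every character that is not a digit.
const digitsOnly = "(555) 010-9999".replace(/[^0-9]/g, "");

// Shorthand class: \w+ grabs runs of letters, digits, and underscores.
const wordRuns = "a_1 b-2".match(/\w+/g);

// [\s\S] matches any character, including a line break, without the s flag.
const spansLines = /a[\s\S]b/.test("a\nb");

// Unicode property escapes (JavaScript requires the u flag for these).
const lowercase = "aBλ".match(/\p{Ll}/gu);               // lowercase letters only
const hanChars = "漢字 kanji".match(/\p{Script=Han}/gu);  // Han script characters only

console.log(spellings, digitsOnly, wordRuns, spansLines, lowercase, hanChars);
```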

2. Anchors

Anchors are a different breed. They do not match any character at all. Instead, they match a position before, after, or between characters. They can be used to “anchor” the regex match at a certain position.

$ : Matches the end of the string, or the end of a line if the multiline flag (m) is enabled. This matches a position, not a character.
\b : Matches a word boundary position between a word character and a non-word character or position (start/end of the string). See the word character class (\w) for more info.
\B : Matches any position that is not a word boundary. This matches a position, not a character.
^ : Matches the beginning of the string, or the beginning of a line if the multiline flag (m) is enabled. This matches a position, not a character.
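
Here is a quick sketch of how anchors behave in JavaScript. The sample text and variable names are invented for the example; the point is only that anchors pin the match to a position rather than consuming a character.

```javascript
const text = "cat\nconcat cat";

// Without the m flag, ^ only matches at the very start of the string.
const firstOnly = text.match(/^c\w+/g);

// With the m flag, ^ also matches right after each line break.
const eachLine = text.match(/^c\w+/gm);

// $ anchors at the end of the string.
const endsWithCat = /cat$/.test(text);

// \b requires a word boundary on both sides, so the "cat" inside "concat" is skipped.
const wholeWords = text.match(/\bcat\b/g);

// \B requires a NON-boundary, so only the "cat" inside "concat" qualifies.
const insideWord = text.match(/\Bcat/g);

console.log(firstOnly, eachLine, endsWithCat, wholeWords.length, insideWord.length);
```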

3. Escaped Characters

A backslash in a regular expression marks the character that follows it as a literal. The backslash also introduces certain letters that represent common character classes, such as \w for a word character or \s for a whitespace character.

\+ : The following characters have special meaning and should be preceded by a \ (backslash) to represent a literal character: + * ? ^ $ \ . [ ] { } ( ) | /. Within a character set, only \, -, and ] need to be escaped.
\000 : Octal escaped character in the form \000. Value must not exceed 255 (\377).
\xFF : Hexadecimal escaped character in the form \xFF.
\uFFFF : Unicode escaped character in the form \uFFFF.
\u{FFFF} : Unicode escaped character in the form \u{FFFF}. Supports the full range of Unicode code point escapes with any number of hex digits.
\cI : Escaped control character in the form \cZ. This can range from \cA (SOH, char code 1) to \cZ (SUB, char code 26).
\t : Matches a TAB character (char code 9).
\n : Matches a LINE FEED character (char code 10).
\v : Matches a VERTICAL TAB character (char code 11).
\f : Matches a FORM FEED character (char code 12).
\r : Matches a CARRIAGE RETURN character (char code 13).
\0 : Matches a NULL character (char code 0).
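
The difference between an escaped and an unescaped special character is easy to see in a short JavaScript sketch. The helper name escapeRegExp is just an illustrative choice (it is not a built-in), and the sample strings are made up.

```javascript
// An unescaped dot matches any character; an escaped dot matches only ".".
const looseDot = /5.00/.test("5x00");    // dot acts as a wildcard
const literalDot = /5\.00/.test("5x00"); // dot must be a real "."

// Escaping $ and . to match a literal price.
const price = /\$\d+\.\d+/.exec("Total: $5.00 (approx.)")[0];

// Hex and Unicode escapes both describe the letter "A" (char code 65).
const hexA = /\x41/.test("A");
const unicodeA = /\u0041/.test("A");

// \t matches a TAB character.
const hasTab = /\t/.test("a\tb");

// Hypothetical helper: escape every special character in user input
// so it can be used as a literal pattern.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const unescapedSum = /1+1=2/.test("1+1=2"); // "+" is treated as a quantifier here
const escapedSum = new RegExp(escapeRegExp("1+1=2")).test("1+1=2");

console.log(looseDot, literalDot, price, hexA, unicodeA, hasTab, unescapedSum, escapedSum);
```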

4. Groups & References

Capture groups and backreferences are some of the more fun features of regular expressions. You place a sub-expression in parentheses and access the capture with \1 or $1. When a pattern wraps all or part of its content in a pair of parentheses, it captures that content and stores it temporarily in memory. You can reuse that content with a backreference such as \1.

(ABC) : Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference.
(?<name>ABC) : Creates a capturing group that can be referenced via the specified name.
\1 : Matches the results of a capture group. For example, \1 matches the results of the first capture group and \3 matches the third.
(?:ABC) : Groups multiple tokens together without creating a capture group.
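
A short JavaScript sketch of the four constructs above. The date string, the sentence, and the variable names are all invented for the example; named groups assume an ES2018-or-later engine.

```javascript
// Numbered capture groups: each pair of parentheses stores its own substring.
const dateMatch = /(\d{4})-(\d{2})-(\d{2})/.exec("Published 2020-11-11");
const year = dateMatch[1];
const month = dateMatch[2];

// Named capture groups are read from the .groups object.
const named = /(?<year>\d{4})-(?<month>\d{2})/.exec("2020-11-11");
const namedYear = named.groups.year;

// Backreference: \1 must repeat exactly what group 1 captured,
// which makes it handy for spotting doubled words.
const doubled = /\b(\w+) \1\b/.exec("this is is a test");

// Non-capturing group: (?:ab) groups tokens for the + quantifier
// but does not take up a capture slot, so "c" is group 1.
const nonCapturing = /(?:ab)+(c)/.exec("ababc");

console.log(year, month, namedYear, doubled[0], nonCapturing[1]);
```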

5. Lookaround

(?=ABC) : Matches a group after the main expression without including it in the result.
(?!ABC) : Specifies a group that cannot match after the main expression (if it matches, the result is discarded).
(?<=ABC) : Matches a group before the main expression without including it in the result.
(?<!ABC) : Specifies a group that cannot match before the main expression (if it matches, the result is discarded).
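
Lookarounds are easier to grasp with a few inputs. The sketch below uses made-up strings and variable names, and it assumes a JavaScript engine with lookbehind support (ES2018 or later, such as a current Node.js).

```javascript
// Positive lookahead: digits that are followed by "USD"; "USD" itself is not in the result.
const usdAmounts = "100USD 200EUR".match(/\d+(?=USD)/g);

// Negative lookahead: replace "foo" only when it is NOT followed by "bar".
const notBeforeBar = "foobar foobaz".replace(/foo(?!bar)/g, "X");

// Positive lookbehind: digits that are preceded by "$".
const afterDollar = "$42 and #17".match(/(?<=\$)\d+/g);

// Negative lookbehind: whole numbers that are NOT preceded by "$".
const notAfterDollar = "$42 and #17".match(/(?<!\$)\b\d+/g);

console.log(usdAmounts, notBeforeBar, afterDollar, notAfterDollar);
```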

6. Quantifiers & Alternation

Quantifiers specify how many times the preceding token must be matched, while alternation enables either/or matching.

+ : Matches 1 or more of the preceding token. By default, quantifiers are greedy and will match as many characters as possible.
* : Matches 0 or more of the preceding token.
{1,3} : Matches the specified quantity of the previous token. {1,3} will match 1 to 3. {3} will match exactly 3. {3,} will match 3 or more.
? : Matches 0 or 1 of the preceding token, effectively making it optional.
| : Acts like a boolean OR. Matches the expression before or after the |.
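
Each token in the list above can be tried out in a line of JavaScript. The sample strings and variable names below are made up for illustration.

```javascript
// + : one or more "a".
const onePlus = "a aa aaaa".match(/a+/g);

// * : zero or more "a" before "bc".
const zeroPlus = "bc abc aabc".match(/a*bc/g);

// {1,3} : between one and three digits, as a whole word.
const oneToThree = "1 22 333 4444".match(/\b\d{1,3}\b/g);

// ? : the "u" is optional.
const optionalU = "color colour".match(/colou?r/g);

// | : either alternative.
const eitherOr = "cat dog bird".match(/cat|bird/g);

// Greedy by default: .+ runs to the LAST ">" it can reach.
const greedy = "<b>bold</b>".match(/<.+>/)[0];

console.log(onePlus, zeroPlus, oneToThree, optionalU, eitherOr, greedy);
```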

7. Substitution

Substitutions are language elements that are recognized only within replacement patterns. They use a regular expression pattern to define all or part of the text that is to replace matched text in the input string. The replacement pattern can consist of one or more substitutions along with literal characters. In .NET, replacement patterns are provided to overloads of the Regex.Replace method.

$& : Inserts the matched text.
$1 : Inserts the results of the specified capture group. For example, $3 would insert the third capture group.
$` : Inserts the portion of the source string that precedes the match.
$' : Inserts the portion of the source string that follows the match.
$$ : Inserts a dollar sign character ($).
\n : For convenience, these escaped characters are supported in the Replace string in RegExr: \n, \r, \t, \\, and Unicode escapes in the form \uFFFF. This may vary in your deployment environment.
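
The substitution tokens can be sketched with JavaScript's String.prototype.replace, which understands $&, $1, $`, $', and $$ in its replacement string. The inputs and variable names are invented for the example.

```javascript
// $1 and $2 : reorder two captured words.
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2, $1");

// $& : the whole matched text.
const bracketed = "price: 42".replace(/\d+/, "[$&]");

// $` : the part of the source string before the match ("a").
const withPrefix = "abc".replace(/b/, "$`");

// $' : the part of the source string after the match ("c").
const withSuffix = "abc".replace(/b/, "$'");

// $$ : a literal dollar sign, followed here by capture group 1.
const withDollar = "cost 5".replace(/(\d+)/, "$$$1");

console.log(swapped, bracketed, withPrefix, withSuffix, withDollar);
```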

8. Flags

Regular expressions may have flags that affect the search. There are only 6 of them in JavaScript:

i : Makes the whole expression case-insensitive. For example, /aBc/i would match AbC.
g : Retain the index of the last match, allowing subsequent searches to start from the end of the previous match.
m : When the multiline flag is enabled, beginning and end anchors (^ and $) will match the start and end of a line, instead of the start and end of the whole string. Note that patterns such as /^[\s\S]+$/m may return matches that span multiple lines because the anchors will match the start/end of any line.
u : When the unicode flag is enabled, you can use extended Unicode escapes in the form \u{FFFFF}.
y : The expression will only match from its lastIndex position and ignores the global (g) flag if set. Because each search in RegExr is discrete, this flag has no further impact on the displayed results.
s : When the dotAll flag is enabled, the dot (.) matches any character, including line breaks.

Flags follow the closing forward slash of the expression (e.g. /.+/igm).
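
Here is a rough JavaScript sketch that exercises each of the six flags once. The strings and variable names are made up, and the emoji is written as the escape \u{1F600} so the example does not depend on file encoding.

```javascript
// i : case-insensitive matching.
const ignoreCase = /aBc/i.test("AbC");

// g : the regex remembers lastIndex, so the next exec() continues after the previous match.
const globalRe = /a/g;
const firstIndex = globalRe.exec("aba").index;
const secondIndex = globalRe.exec("aba").index;

// m : ^ and $ match at the start and end of every line.
const lines = "one\ntwo".match(/^\w+$/gm);

// s : the dot also matches line breaks.
const dotWithoutS = /a.b/.test("a\nb");
const dotWithS = /a.b/s.test("a\nb");

// u : the pattern works on full code points, so "." covers a whole emoji.
const emoji = "\u{1F600}";
const dotWithoutU = /^.$/.test(emoji);
const dotWithU = /^.$/u.test(emoji);

// y : sticky matching only succeeds exactly at lastIndex.
const sticky = /foo/y;
const stickyAtZero = sticky.test("barfoo"); // "foo" does not start at index 0
sticky.lastIndex = 3;
const stickyAtThree = sticky.test("barfoo"); // "foo" starts at index 3

console.log(ignoreCase, firstIndex, secondIndex, lines, dotWithoutS, dotWithS,
  dotWithoutU, dotWithU, stickyAtZero, stickyAtThree);
```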

I hope it helps you. Happy Hacking :)
