Regular Expressions

A regular expression, or "regex", is a pattern that describes a set of character sequences. If the text you are looking for follows a predictable format, a regex can be used to find it.

Regular Expression Syntax

This syntax is for JavaScript, but it is mostly the same in everything I have used so far.


Shorthand Character Classes

Lookahead and Lookbehind

Lookahead and lookbehind check that certain text comes after or before the current position without including that text in the final match. For example, x(?=y) will return x only if it is followed by y.
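A minimal sketch of both assertions in JavaScript. The sample strings and variable names are made up for illustration, and the lookbehind form (?<=x) assumes an engine that supports ES2018 lookbehind.

```javascript
// Lookahead: match "x" only when the next character is "y".
const lookahead = /x(?=y)/;
const lookaheadText = "xy".match(lookahead)[0]; // "x" (the "y" is checked, not consumed)
const lookaheadMiss = lookahead.test("xz");     // false

// Negative lookahead: match "x" only when the next character is NOT "y".
const negativeHit = /x(?!y)/.test("xz");        // true

// Lookbehind: match "y" only when the previous character is "x".
const lookbehind = /(?<=x)y/;
const lookbehindText = "xy".match(lookbehind)[0]; // "y"
const lookbehindMiss = lookbehind.test("zy");     // false

console.log(lookaheadText, lookaheadMiss, negativeHit, lookbehindText, lookbehindMiss);
```

In each case the asserted text only decides whether the match succeeds; it never appears in the matched string.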





\1 is called a backreference, where 1 can be the number of any capture group in the current regex. A backreference matches the exact text that the numbered capture group matched, not the group's pattern.

For example, /([A-Z])[0-9]\1/ will search for any char between A-Z (captured as group 1), then any char between 0-9, and then whatever character was found by the first capture group. Note that the parentheses are required: without a capture group, \1 has nothing to refer to. If you were searching through 'A9A' it would be a success, while 'A9B' would fail. This is because capture group 1 found an 'A', and therefore the backreference (\1) must also find an 'A'.
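The 'A9A' versus 'A9B' behaviour can be checked with a short sketch. It assumes the letter is wrapped in parentheses so that it forms capture group 1; the variable names are illustrative only.

```javascript
// Group 1 captures one uppercase letter; \1 must then repeat that same letter.
const backrefPattern = /([A-Z])[0-9]\1/;

const matchesA9A = backrefPattern.test("A9A"); // true: group 1 is "A" and \1 finds "A"
const matchesA9B = backrefPattern.test("A9B"); // false: \1 needs "A" but finds "B"

// The captured text is available at index 1 of the match result.
const capturedLetter = "A9A".match(backrefPattern)[1]; // "A"

console.log(matchesA9A, matchesA9B, capturedLetter);
```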

Common Regex Queries

Character Sets

Regex                                       Matches
[\s\S]                                      Anything
[^\n\r]                                     Anything except for a newline
[A-Za-z0-9]                                 Alphanumeric characters
[A-Za-z]                                    Alphabetic characters
[0-9] or \d                                 Numeric characters
[\.,-\/#!$%\^&\*;:{}=\-_`~()@\+\?><\[\]\+]  Most ASCII punctuation (omits ", ', | and \)
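As a rough illustration of a few of these sets in JavaScript (the sample text and variable names are assumptions made for this sketch):

```javascript
const sample = "line one\nline two";

// [\s\S] matches every character, including the newline.
const everything = sample.match(/[\s\S]+/)[0];

// [^\n\r] stops at the first newline.
const firstLine = sample.match(/[^\n\r]+/)[0]; // "line one"

// \d (or [0-9]) picks out the digits.
const digits = "abc123".match(/\d+/)[0]; // "123"

// Anchoring a set with ^ and $ tests that the whole string uses only that set.
const isAlphanumeric = /^[A-Za-z0-9]+$/.test("abc123");  // true
const hasOtherChars = /^[A-Za-z0-9]+$/.test("abc-123");  // false

console.log(everything === sample, firstLine, digits, isAlphanumeric, hasOtherChars);
```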

Does Not Contain


Note that the solution to does not *start with* is generally much more efficient than the solution to does not *contain*.[5]

If you are trying to test that a string doesn't contain a specific thing, it's easier to put a negative lookahead (?!) at the start of the pattern than to try to force a negation into the middle of it.

For instance, if you wanted to check that hede is not in a string, you can anchor a negative lookahead at the start of the string and have it scan ahead for hede anywhere in the string. If hede is not found, the regex continues with the rest of the pattern.

^       From the start of the string,
(?!     start a negative lookahead for
  .*    anything, followed by
  hede  'hede'
)       . If that pattern isn't found,
.*      match any characters
$       up to the end of the string.
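Assembled into a single pattern, the breakdown above can be tried out as follows. This is a small sketch; the test strings and variable names are made up for illustration.

```javascript
// Match a whole string only if "hede" appears nowhere in it.
const doesNotContainHede = /^(?!.*hede).*$/;

const plainString = doesNotContainHede.test("hello world"); // true: no "hede" anywhere
const withHede = doesNotContainHede.test("ahedeb");         // false: lookahead finds "hede"
const hedeAtStart = doesNotContainHede.test("hede first");  // false

console.log(plainString, withHede, hedeAtStart);
```

When nothing else needs to be matched, a plain substring check such as !str.includes("hede") gives the same answer without a regex; the lookahead form is mainly useful when the condition has to sit inside a larger pattern.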

Not Strict

Another way to do this[4] is by using match groups. This approach is a little loose, because the regex still returns a match even for strings we don't want. Instead of relying on match versus no match, we check whether a specific match group captured anything: if it did, the string is one we want.

This example will match any file named stromboli with a file extension. However, we only want a file named stromboli that doesn't end in .svg. So we will use match groups as a final filter.

stromboli\.  Check whether the pattern 'stromboli.' exists,
(            and if
  svg        'svg'
  |          or
  (.*)       anything else (capture group 2)
)
$            ends the string.

If this is called on this series of strings:


They will all match, but we only want the matches in which capture group 2 ($2) captured something. If $2 exists in the match groups, then we have what we are looking for; if $2 does not exist, then the string ended in .svg and we don't want it.
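A sketch of this filter in JavaScript, assuming the pattern is written with balanced parentheses as stromboli\.(svg|(.*))$ so that the inner (.*) is group 2. The file names below are hypothetical and are not the original list of strings.

```javascript
// Group 1 is the whole alternative; group 2 is only set when the "anything else" branch matched.
const stromboliPattern = /stromboli\.(svg|(.*))$/;

// Hypothetical file names, used only for illustration.
const fileNames = ["stromboli.svg", "stromboli.png", "stromboli.jpg"];

// Every name matches, so the real test is whether group 2 captured something.
const allMatch = fileNames.every((name) => stromboliPattern.test(name));
const wanted = fileNames.filter((name) => {
  const result = name.match(stromboliPattern);
  return result !== null && result[2] !== undefined;
});

console.log(allMatch, wanted); // true [ 'stromboli.png', 'stromboli.jpg' ]
```

For stromboli.svg the svg branch succeeds first, so group 2 is never filled in and the name is dropped by the filter.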



Last modified: 202205021209