Before digging into how to craft regex expressions, I want to mention that there are a lot of great reference sites for exploring and practicing how regex works. I use regex101 to test out expressions and for quick reference, but several similar sites, such as regexr, exist as well. And of course, there’s no substitute for writing code to practice, too.
Regex expressions are denoted by enclosing a pattern between forward slashes. Optionally, some languages let you add flags after the second forward slash that change the scope of pattern matching, but for now let’s focus on a basic expression.
In this example I use boring ol’ strings as our expression, and our search function returns the index of the first match, or -1 if there’s no match. I say ‘boring’ because simple strings like these don’t take advantage of any of the tools regex has at its disposal. Notice that the second example still returns 1 since that exact sequence is within the string, whereas the third example returns -1 since ‘hola’ is not contained.
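A minimal sketch of this kind of search, assuming JavaScript and its built-in `String.prototype.search`. The sample string and variable names are placeholders chosen for illustration, not the original examples.

```javascript
// Hypothetical sample string, chosen only for illustration.
const greeting = "hello world";

// search() returns the index where the first match starts, or -1.
const exactMatch = greeting.search(/hello/); // match starts at index 0
const innerMatch = greeting.search(/ello/);  // sequence begins at index 1
const noMatch = greeting.search(/hola/);     // "hola" never appears

console.log(exactMatch, innerMatch, noMatch); // 0 1 -1
```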
A period within a regex expression stands in for any single character (by default, any character except a newline). If you need to indicate a literal period within an expression, you can “escape” it by placing a backslash before it.
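To make the difference concrete, here is a small sketch, again assuming JavaScript. The patterns and test strings are made up for illustration.

```javascript
// An unescaped period matches any single character (except a newline).
const anyChar = /c.t/;
// An escaped period matches only a literal "." character.
const literalDot = /c\.t/;

const dotResults = {
  anyCharCat: anyChar.test("cat"),       // true: "a" fills the "." slot
  anyCharDot: anyChar.test("c.t"),       // true: "." is a character too
  anyCharShort: anyChar.test("ct"),      // false: nothing fills the slot
  literalCat: literalDot.test("cat"),    // false: "a" is not a period
  literalDotHit: literalDot.test("c.t"), // true
};

console.log(dotResults);
```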
This regex expression is using a set. A set matches a single position in the sequence against any one of the characters it contains. In this case, the expression is looking for a sequence whose first character is either ‘b’, ‘c’, or ‘r’, followed by the characters ‘a’ and ‘t’. Notice that only the fourth test fails to match. Sets combine with many of regex’s other tools, allowing for sets containing special characters and ranges as well as ways to search for more than one match of the same set.
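A set like the one described could be sketched as follows, assuming JavaScript. The four test words are my own stand-ins, picked so that only the fourth fails.

```javascript
// A set: one position that may hold "b", "c", or "r", then "at".
const setPattern = /[bcr]at/;

// Hypothetical test words for illustration.
const testWords = ["bat", "cat", "rat", "hat"];
const setResults = testWords.map((word) => setPattern.test(word));

console.log(setResults); // [ true, true, true, false ]
```

Because the pattern is not anchored, it also matches inside longer strings such as "acrobat".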
Sets such as /[a-z]/ include ranges. They’ll match any character between the hyphenated values, including the two endpoints. In this example, the regex represents a pattern that is 3 characters in length. The first one can be any capital letter, the second can be any lowercase letter, and the third can be any digit. These ranges can also be combined in a single set for more versatility. Now let’s look at how to indicate multiples.
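The three-character pattern described above might look like this in JavaScript. The exact expression and the sample inputs are my reconstruction from the description, so treat them as an assumption.

```javascript
// One capital letter, then one lowercase letter, then one digit.
const threeChars = /[A-Z][a-z][0-9]/;

const rangeResults = {
  valid: threeChars.test("Ab1"),        // true
  lowerFirst: threeChars.test("ab1"),   // false: first must be A-Z
  upperSecond: threeChars.test("AB1"),  // false: second must be a-z
  letterThird: threeChars.test("Abc"),  // false: third must be 0-9
};

// Several ranges combined in a single set: any one letter or digit.
const alphanumeric = /[A-Za-z0-9]/;
const combinedHit = alphanumeric.test("7");  // true
const combinedMiss = alphanumeric.test("-"); // false

console.log(rangeResults, combinedHit, combinedMiss);
```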
The ?, *, and + symbols denote that the preceding character can repeat a varying number of times. The ‘?’ matches the preceding character 0 or 1 times, the ‘*’ matches 0 or more times, and the ‘+’ matches 1 or more times. Notice that in the regex1 example the ‘?’ in the expression only matched the first ‘s’ in ‘catsss’, while the ‘+’ in the third expression did not match with ‘cat’.
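Here is a rough sketch of those three quantifiers in JavaScript. The names regex1 through regex3 echo the text, but the exact patterns are my guess at what the examples looked like.

```javascript
const regex1 = /cats?/; // "s" zero or one time
const regex2 = /cats*/; // "s" zero or more times
const regex3 = /cats+/; // "s" one or more times

// match() returns an array whose first element is the matched text,
// or null when nothing matches.
const optionalMatch = "catsss".match(regex1)[0];   // "cats": only the first "s"
const zeroOrMoreMatch = "catsss".match(regex2)[0]; // "catsss"
const oneOrMoreMatch = "catsss".match(regex3)[0];  // "catsss"
const oneOrMoreMiss = "cat".match(regex3);         // null: needs at least one "s"

console.log(optionalMatch, zeroOrMoreMatch, oneOrMoreMatch, oneOrMoreMiss);
```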
Lastly, wildcards are catch-all shorthands for whole categories of characters and function very similarly to sets. The most common ones are ‘\d’ for all digits, ‘\w’ for all word characters (letters, digits, and the underscore, but not white space or punctuation), and ‘\s’ for all white space characters such as a space, a tab, or a newline.
In the first example above, I use an expression that looks for one or more digits, followed by a ‘-’, followed by one or more digits again, another ‘-’, and then more digits. This kind of expression can be effective at finding phone numbers within strings.
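A sketch of the digit pattern just described, assuming JavaScript. The sentence and phone number are invented for illustration.

```javascript
// One or more digits, "-", digits, "-", digits.
const phonePattern = /\d+-\d+-\d+/;

// Hypothetical input containing a made-up phone number.
const message = "Call me at 555-123-4567 tomorrow";
const phoneMatch = message.match(phonePattern);
const phoneNumber = phoneMatch ? phoneMatch[0] : null;

console.log(phoneNumber); // "555-123-4567"
```

Note that this pattern is loose: it would accept any three hyphen-separated groups of digits, so real phone validation would need stricter counts.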
The second expression looks for one or more word characters, followed by a ‘.’, followed by one or more word characters again. In this case, it matches only the URL in the string. In the next example, the same string is then split into an array at each instance of white space by the ‘\s’ expression.
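Both ideas can be sketched together in JavaScript. The sample sentence is my own placeholder, written so that the URL is the only spot where a period sits between word characters.

```javascript
// Hypothetical sentence for illustration.
const sentence = "Visit example.com for more info";

// Word characters, a literal period, then more word characters.
const urlMatch = sentence.match(/\w+\.\w+/);
const foundUrl = urlMatch ? urlMatch[0] : null; // "example.com"

// Split the same string at every white space character.
const words = sentence.split(/\s/);

console.log(foundUrl);
console.log(words); // [ 'Visit', 'example.com', 'for', 'more', 'info' ]
```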
That wraps up my overview of regex basics. There’s a lot that I didn’t cover, but these tools cover a lot of functionality. I encourage you to look more into the subject to uncover more techniques. Happy pattern matching!