Regex Files ep 1

Roman Opalacz
Dec 8, 2020

Have you ever written a function involving strings that you felt would be so much easier to write if only you had a better way to match specific parts of the string? Or maybe you felt your function could be abstracted a little further if you could just normalize its inputs more. Have you ever wished you had more reasons to use forward slashes? These are all valid reasons to look into regex. I’m going to go over some regex basics using JavaScript.

Regex is short for regular expression, a sequence of characters that defines a search pattern. These search patterns can then be used with string methods like JavaScript’s match, replace, search, and split to enable very nuanced parsing of strings.
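To make those four methods concrete, here is a minimal sketch that runs one simple pattern through each of them. The sentence and the pattern are illustrative assumptions, not taken from the original examples.

```javascript
// Sample sentence and pattern are illustrative assumptions.
const sentence = "The cat sat on the mat";
const pattern = /at/;

// match: returns details about the first match (index 0 holds the matched text)
const firstMatch = sentence.match(pattern)[0];
// replace: swaps out the first match only, since there is no g flag
const replaced = sentence.replace(pattern, "ow");
// search: returns the index of the first match
const firstIndex = sentence.search(pattern);
// split: breaks the string apart at every match
const pieces = sentence.split(pattern);

console.log(firstMatch); // "at"
console.log(replaced);   // "The cow sat on the mat"
console.log(firstIndex); // 5
console.log(pieces);     // [ 'The c', ' s', ' on the m', '' ]
```

Note that split cuts at every occurrence of the pattern, which is why the last piece is an empty string: the sentence ends with a match.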

Sometimes it can be tricky untangling what you want out of strings…

Before digging into how to craft regular expressions, I want to mention that there are a lot of great reference sites for exploring and practicing how regex works. I use regex101 to test out expressions and for quick reference, but several similar sites, such as regexr, exist as well. And of course, there’s no substitute for practicing by writing code of your own.

Regular expressions are denoted by enclosing a pattern between forward slashes. Optionally, some languages allow flags after the second forward slash that change how the pattern is matched (for example, g for global or i for case-insensitive), but for now let’s focus on a basic expression.
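As a quick sketch of that syntax in JavaScript, the snippet below writes the same pattern with and without flags. The pattern and test strings are assumptions chosen only for illustration.

```javascript
// A pattern between two forward slashes, with no flags.
const basic = /cat/;
// The same pattern with the i flag (ignore case).
const ignoreCase = /cat/i;
// Flags can be combined: g (global) plus i (ignore case).
const globalIgnoreCase = /cat/gi;

const basicResult = basic.test("Cat");            // case matters here
const ignoreCaseResult = ignoreCase.test("Cat");  // case is ignored here
const allMatches = "Cat cat".match(globalIgnoreCase);

console.log(basicResult);      // false
console.log(ignoreCaseResult); // true
console.log(allMatches);       // [ 'Cat', 'cat' ]
```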

The search function will return the first index that the regular expression matches

In this example I use boring ol’ strings as our expression, and our search function returns the index of the first match, or -1 if there’s no match. I say ‘boring’ because simple strings like these don’t take advantage of any of the tools regex has at its disposal. Notice that the second example still returns 1 since that exact sequence is within the string, whereas the third example returns -1 since ‘hola’ is not contained.
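A minimal sketch of search calls that behave as described: the string "hello world" and the first two patterns are assumptions, and only ‘hola’ comes from the text.

```javascript
// The sample string is an assumption for illustration.
const greeting = "hello world";

const atStart = greeting.search(/hello/); // pattern begins at index 0
const inside = greeting.search(/ello/);   // pattern begins at index 1
const missing = greeting.search(/hola/);  // pattern never appears

console.log(atStart); // 0
console.log(inside);  // 1
console.log(missing); // -1
```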

A period within a regular expression matches any single character (other than a line break, by default). If you need to indicate a literal period within an expression, you can “escape” it by placing a backslash before it.
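The difference between a bare period and an escaped one can be sketched like this; the patterns and strings are illustrative assumptions.

```javascript
// An unescaped period stands in for any single character.
const anyChar = /c.t/;
// An escaped period matches only a literal "." character.
const literalDot = /c\.t/;

const anyCharResults = ["cat", "cut", "c.t", "ct"].map((s) => anyChar.test(s));
const literalDotResults = ["c.t", "cat"].map((s) => literalDot.test(s));

console.log(anyCharResults);    // [ true, true, true, false ]
console.log(literalDotResults); // [ true, false ]
```

The string "ct" fails the first pattern because the period still requires exactly one character to be present between the ‘c’ and the ‘t’.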

The test function will check a string for a regex expression and return true or false

An expression such as /[bcr]at/ uses a set, which is written inside square brackets. Sets search a single position in the sequence for any of the characters contained within the set. In this case, the expression is looking for a position holding either ‘b’, ‘c’, or ‘r’, followed by the characters ‘a’ and ‘t’. Notice that only the fourth test fails to match. Sets combine with many of regex’s other tools, allowing for sets containing special characters and ranges as well as ways to search for more than one match of the same set.
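Here is a sketch of four such tests. The pattern is reconstructed from the description above, and the four words are assumptions picked so that only the last one fails.

```javascript
// A set of three characters followed by the literal characters "at".
const setPattern = /[bcr]at/;

// Words chosen for illustration; only "hat" starts with a letter outside the set.
const words = ["bat", "cat", "rat", "hat"];
const setResults = words.map((word) => setPattern.test(word));

console.log(setResults); // [ true, true, true, false ]
```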

Sets can also include ranges, as in /[a-z]/. A range matches any character between the hyphenated values, including the two values themselves. An expression such as /[A-Z][a-z][0-9]/ represents a pattern that is 3 characters in length. The first one can be any capital letter, the second can be any lowercase letter, and the third can be any digit. These ranges can also be combined in a single set for more versatility. Now let’s look at how to indicate multiples.
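The three-position pattern and a combined set can be sketched as follows. Both patterns are reconstructed from the description rather than copied from the original example, and the test strings are assumptions.

```javascript
// One capital letter, then one lowercase letter, then one digit.
const threeRanges = /[A-Z][a-z][0-9]/;
// Several ranges combined into a single set: any one letter or digit.
const combined = /[A-Za-z0-9]/;

const threeRangeResults = ["Ab1", "ab1", "AB1"].map((s) => threeRanges.test(s));
const combinedResults = ["z", "7", "_"].map((s) => combined.test(s));

console.log(threeRangeResults); // [ true, false, false ]
console.log(combinedResults);   // [ true, true, false ]
```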

The string function .match can accept a regex expression and will return the first match or all matches if the expression includes the g (global) flag at the end

The ?, *, and + symbols denote that the preceding character can repeat a varying number of times. The ‘?’ matches the preceding character 0 or 1 times, the ‘*’ matches 0 or more times, and the ‘+’ matches 1 or more times. Notice that in the regex1 example the ‘?’ in the expression only matched the first ‘s’ in ‘catsss’, while the ‘+’ in the third expression did not match with ‘cat’.
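A sketch of three expressions that behave this way is below. The names regex2 and regex3 and the exact patterns are assumptions based on the description; only the name regex1 and the strings ‘catsss’ and ‘cat’ come from the text.

```javascript
const regex1 = /cats?/; // "cat" followed by zero or one "s"
const regex2 = /cats*/; // "cat" followed by zero or more "s"
const regex3 = /cats+/; // "cat" followed by one or more "s"

const optionalMatch = "catsss".match(regex1)[0]; // stops after the first "s"
const starMatch = "catsss".match(regex2)[0];     // takes every "s"
const plusMatch = "cat".match(regex3);           // needs at least one "s"

console.log(optionalMatch); // "cats"
console.log(starMatch);     // "catsss"
console.log(plusMatch);     // null
```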

Lastly, wildcards (more formally, shorthand character classes) are catch-all shorthands for other kinds of characters and function very similarly to sets. The most common ones are ‘\d’ for all digits, ‘\w’ for all word characters (which includes letters, digits, and the underscore but excludes white space and punctuation), and ‘\s’ for all white space characters such as a space, a tab, or a newline.
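The original examples are not reproduced here, so what follows is a small sketch of three such expressions in use. The sample strings, phone number, and domain are all assumptions.

```javascript
// Sample strings are assumptions for illustration.
const phoneText = "Call 555-123-4567 before noon";
const urlText = "Visit example.com for details";

// One or more digits, a hyphen, more digits, a hyphen, more digits.
const phone = phoneText.match(/\d+-\d+-\d+/)[0];
// One or more word characters, a literal period, more word characters.
const url = urlText.match(/\w+\.\w+/)[0];
// Split the same string at each white space character.
const words = urlText.split(/\s/);

console.log(phone); // "555-123-4567"
console.log(url);   // "example.com"
console.log(words); // [ 'Visit', 'example.com', 'for', 'details' ]
```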

In the first example above, I use an expression that looks for one or more digits, followed by a ‘-’, followed by one or more digits again, another ‘-’, and then more digits. This kind of expression can be effective at finding phone numbers within strings.

The second expression looks for one or more word characters, followed by a ‘.’, followed by one or more word characters again. In this case, it matches only the URL in the string. In the next example, the same string is then split into an array at each instance of white space by the ‘\s’ expression.

That wraps up my overview of regex basics. There’s a lot that I didn’t cover, but these tools cover a lot of functionality. I encourage you to look more into the subject to uncover more techniques. Happy pattern matching!

Roman Opalacz

Aspiring software engineer with almost no experience applying logic