Regular Expressions

In this guide we are going to learn how to use regular expressions (regex) to match patterns in text. Regex is widely used in text editors, CLI commands (like grep or awk), and programming languages.

What is regex?

A regex is a pattern used to match or find strings in a text. You might think that the simple built-in search of any text editor (like Ctrl+F in Notepad) is enough for this task, but a simple search only finds exact text, while regex can also find matches that are not exact.

An example makes this clear. Let’s say you have a phonebook in a text file with 1 million U.S. phone numbers in international format (like this: +12345678900).

You can’t remember your friend’s number. You only remember that he lives in Austin (so his number should begin with +1512) and that his number has two zeros somewhere in the middle.

How would you filter out all phone numbers matching your pattern? You can’t use Ctrl+F in Notepad, because searching for “00” or “+1512” separately will return far too many matches to be useful. You need to combine your conditions in one search and create a pattern to find matching numbers. This is exactly what regex was created for. A regex pattern meeting our requirements would be /\+1512.*00.*/. This pattern matches all numbers that start with +1512 and have 00 somewhere after that prefix.
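
As a rough sketch of how this pattern could be applied in Python with the standard re module: the sample numbers below are made up for illustration, and find_candidates is a hypothetical helper name, not something from a library.

```python
import re

# Pattern from the text: starts with +1512 and contains "00" somewhere after it.
PATTERN = re.compile(r"\+1512.*00.*")

def find_candidates(numbers):
    """Return the numbers that fit the pattern from their first character."""
    # re.match only succeeds if the pattern matches at the start of the string.
    return [n for n in numbers if PATTERN.match(n)]

# Made-up sample data, not real phone numbers.
phonebook = ["+15125003411", "+15127654321", "+12125550012", "+15129900123"]
print(find_candidates(phonebook))
```

Only the first and last sample numbers should be printed: the second has no 00, and the third does not begin with +1512.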

Now that you know why regex is so useful and popular, we will look at different usage examples.

 

Regex reference.

You can use the https://regexr.com/ website to try out the following patterns. This site allows you to easily test your regex patterns by automatically highlighting matches in a text area.

Regex patterns are sometimes written between slashes, as on this website, so I will also give the example patterns enclosed in slashes.

 

How to match a single character in Regex?

. (dot) is used to match any single character except a linebreak.

Pattern:/1.2/  matches: “112”, “152”, “1 2”, “1a2”, “1T2”
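
If you prefer to check this in code rather than on the website, here is a minimal Python sketch using the standard re module. The extra strings “12” and a “1”, line break, “2” sequence are added here only to show what the dot does not match.

```python
import re

pattern = re.compile(r"1.2")

for text in ["112", "152", "1 2", "1a2", "1T2", "12", "1\n2"]:
    # fullmatch succeeds only if the whole string fits the pattern.
    print(repr(text), bool(pattern.fullmatch(text)))
```

The last two strings should print False: “12” has no character between 1 and 2, and by default the dot does not match a line break.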

 

How to match a single character from a specific range in Regex?

[] (square brackets) are used when you need to match a character from a specific range.

Pattern:/1[3-7]2/  matches: “132”, “142”, “152”, “162”, “172”

Pattern:/1[a-c]2/  matches: “1a2”, “1b2”, “1c2” , does not match “1A2”

Pattern:/1[A-C]2/  matches: “1A2”, “1B2”, “1C2” , does not match “1a2”

Pattern:/1[a-zA-Z]2/  matches: “1a2”, “1B2” and so on (several ranges can be combined inside one pair of brackets)
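
The range examples above can be tried in Python as well. This is only a small sketch with the standard re module; the variable names are chosen here for illustration.

```python
import re

digits_3_to_7 = re.compile(r"1[3-7]2")
lowercase_a_to_c = re.compile(r"1[a-c]2")
either_case_a_to_c = re.compile(r"1[a-cA-C]2")  # two ranges in one pair of brackets

print(bool(digits_3_to_7.fullmatch("152")))       # 5 is inside 3-7
print(bool(digits_3_to_7.fullmatch("182")))       # 8 is outside 3-7
print(bool(lowercase_a_to_c.fullmatch("1b2")))
print(bool(lowercase_a_to_c.fullmatch("1B2")))    # ranges are case-sensitive
print(bool(either_case_a_to_c.fullmatch("1B2")))
```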

 

How to match a single character from a set of characters in Regex?

[] (square brackets) are used when you need to match any character from a set.

Pattern:/1[37]2/  matches: “132”, “172” , does not match “1372”, “152”

 

How to match a single character that is not in a set in Regex?

[^] (caret in square brackets) is used to negate a set, that is, to match a character that is not in the set.

Pattern:/1[^37]2/  matches: “102”, “142”, “182”, “1 2”, “1A2” , does not match “132”, “172”
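
To see a set and its negated form side by side, here is a short Python sketch (standard re module, illustrative variable names). Each line prints the string, whether /1[37]2/ matches it, and whether /1[^37]2/ matches it.

```python
import re

in_set = re.compile(r"1[37]2")       # middle character must be 3 or 7
not_in_set = re.compile(r"1[^37]2")  # middle character must be anything else

samples = ["132", "172", "152", "1372", "1 2"]
for text in samples:
    print(repr(text), bool(in_set.fullmatch(text)), bool(not_in_set.fullmatch(text)))
```

Note that “1372” fits neither pattern, because a pair of brackets always stands for exactly one character.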

 

How to match any digit character in Regex?

\d is used to match any single digit character. Equivalent pattern is [0-9]

Pattern:/A\dB/  matches: “A1B”, “A2B”, “A3B” and so on

 

How to match a single character that is not a digit?

\D is used to match any single character that is not a digit. Equivalent pattern is [^0-9]

Pattern:/A\DB/  matches: “AAB”, “AZB”, “A B”, “A)B”, “A+B”, “A-B” and so on. Does not match “A1B”, “A2B” and so on
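
Here is a small Python sketch that sorts a few sample strings with \d and \D. The sample list is made up for illustration.

```python
import re

samples = ["A1B", "A9B", "AZB", "A B", "A+B"]

# Strings whose middle character is a digit.
with_digit = [s for s in samples if re.fullmatch(r"A\dB", s)]
# Strings whose middle character is anything except a digit.
without_digit = [s for s in samples if re.fullmatch(r"A\DB", s)]

print(with_digit)
print(without_digit)
```

One Python-specific detail: on text strings, \d also accepts digit characters from other scripts unless the re.ASCII flag is passed, so it is slightly broader than [0-9] there.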

 

How to match a whitespace character?

\s is used to match a single whitespace character, such as a space, a tab, or a line break.

Pattern:/1\s2/  matches: “1 2”

 

How to match a character that is not a whitespace?

\S is used to match any character that is not a whitespace.

Pattern:/1\S2/  matches: “112”, “1A2”, “1(2”, “1+2”. Does not match “1 2”
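
A quick way to compare \s and \S is to run both against the same strings. The sketch below uses Python's standard re module; the tab and line-break samples are added here to show that whitespace is more than just the space character.

```python
import re

samples = ["1 2", "1\t2", "1\n2", "1A2", "1+2"]

for text in samples:
    is_whitespace = bool(re.fullmatch(r"1\s2", text))
    is_not_whitespace = bool(re.fullmatch(r"1\S2", text))
    print(repr(text), is_whitespace, is_not_whitespace)
```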

 

How to match one or more of the preceding character?

+ (plus) is used to match one or more of the preceding character.

Pattern:/1+2/  matches: “12”, “112”, “1112”, “11112” and so on

 

How to match zero or more of the preceding character?

* (star) is used to match zero or more of the preceding character.

Pattern:/1A*2/  matches: “12”, “1A2”, “1AA2”, “1AAA2” and so on

 

How to match zero or one of the preceding character?

? (question mark) is used to match zero or one of the preceding character.

Pattern:/1A?2/  matches: “12”, “1A2”. Does not match “1AA2”, “1AAA2”
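
The three quantifiers (+, * and ?) are easiest to tell apart when applied to the same input. In this Python sketch, matching is a hypothetical helper defined only for the comparison.

```python
import re

samples = ["12", "1A2", "1AA2", "1AAA2"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print("1A+2:", matching(r"1A+2"))  # one or more A
print("1A*2:", matching(r"1A*2"))  # zero or more A
print("1A?2:", matching(r"1A?2"))  # zero or one A
```

Only the star version accepts all four strings; the plus version rejects “12”, and the question mark version rejects anything with more than one A.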

 

How to match the beginning of the string?

^ (caret) is used to match the beginning of the string.

Pattern:/^12/  matches: “12” when it is at the beginning of the string, as in “123456”. Does not match “12” in these strings: “112”, “012”, “x12”

 

How to match the end of the string?

$ (dollar sign) is used to match the end of the string.

Pattern:/12$/  matches: “12” when it is at the end of the string, as in “0012”. Does not match “12” in these strings: “12A”, “126”
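
Both anchors can be checked with a few lines of Python. This sketch uses re.search, which looks for the pattern anywhere in the string, so any restriction on position comes from ^ and $ themselves.

```python
import re

starts_with_12 = re.compile(r"^12")
ends_with_12 = re.compile(r"12$")

# Each line prints: the string, "starts with 12?", "ends with 12?"
for text in ["123456", "0012", "112", "12A"]:
    print(text, bool(starts_with_12.search(text)), bool(ends_with_12.search(text)))
```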

 

How to match special characters?

If we need to match a special character like *, ?, +, ^, [, ] and so on, we need to escape it with a backslash \ so that it acts like an ordinary character.

Example:

/1A?2/ matches 1A2 and 12, but:

/1A\?2/ matches 1A?2
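
The difference between the two patterns above can be confirmed in Python. The second half of this sketch uses re.escape, a standard library function that adds the backslashes for you; the string “1+1=2?” is just an illustrative input.

```python
import re

print(bool(re.fullmatch(r"1A?2", "1A?2")))   # ? is a quantifier here, so no match
print(bool(re.fullmatch(r"1A\?2", "1A?2")))  # \? is a literal question mark
print(bool(re.fullmatch(r"1A\?2", "12")))    # the literal ? is now required

# re.escape is handy when the text to search for comes from user input.
literal = "1+1=2?"
escaped = re.escape(literal)
print(escaped)
print(bool(re.fullmatch(escaped, literal)))
```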

 

Conclusion.

I hope you found these examples useful. There is more to regex than we have covered in this guide, but you now know enough to construct regex patterns that get the job done in 99% of cases.

Now you can apply these patterns in a programming language like Python or JS, or, for example, you could use regex in grep commands. There’s a good guide on the grep command here.

Thank you for reading!

 

 

 
