Exploiting Regular Expressions

Somdev Sangwan
5 min readFeb 23, 2019

A regular expression (or regex) is basically a search pattern. For example, the expression [cb]atwill match both cat and bat. This isn’t a regex tutorial so if you don’t know much about regex, go through this amazing cheat sheet before reading any further.

Let’s get started with some basics anyway :)

Repetition Operators

+ is a repetition operator and matches repetition of characters, patterns or groups.

ca+t will match caaaat

There’s another repetition operator, *. The only difference between + and * is that + matches one or more while * matches zero or more. To be clear about this,

ca*t will match both caaaat and ct
ca+t will match caaaat but not ct

Greedy & Lazy Matching

If I want to match everything between x and y, I can simply do x.*y where . means anything. This expression will hence match x)dw2rfy without any problem.

Repetition operators are greedy by default. They will try to match as much as possible.

Let’s consider the above example again,

Using the expression x.*y against the string axaayaaya will return xaayaay. One may expect this search to return xaay or xaayaay because both of these fit our condition x<anything here>y but this is where the concept of greedy and lazy matching comes into play. By default, the expression will return the longest possible result i.e. xaayaay but we can make it lazy or return the shortest result by using the ? operator e.g. x.*?y.

Backtracking

Computers are so smart that they are dumb. When you tell it to run x*y against a string xxxxxxxxxxxxxx, we can tell that it’s not going to match because it doesn’t contain a y at the end. But your regex engine doesn’t know that! It will do the following

xxxxxxxxxxxxxx # Didn't match
xxxxxxxxxxxxxx # Let's backtrack and go back
xxxxxxxxxxxxx # Still didn't match
xxxxxxxxxxxxx # Let's backtrack one more time
xxxxxxxxxxxx # Fuck, no luck
xxxxxxxxxxxx # Let's backtrack in hopes of matching
xxxxxxxxxxx # Okay, this isn't funny anymore
xxxxxxxxxxx # Baccccccckttraaaack
>> Let's skip a few stepsxx # No match? WTF?
x # Let's backtrack
x # Nothing to match
# Nothing to backtrack to
Somdev Sangwan

I make stuff, I break stuff and I make stuff that breaks stuff.