Regular Expressions

When we want to search for or replace a particular word in some text, it is easy to do so using String functions. But when the search string is not exact, such as when there might be regional variations, or when we want to find any word matching a pattern, we may use Regular Expressions, or RegEx for short.

The very thought of this strikes fear into many people, since Regular Expressions are widely considered hard to figure out. So I will try to explain them in easy-to-understand terms.

A RegEx is a sequence of characters specifying a search pattern. The text is scanned from beginning to end and the pattern is tried against it at each position, much as if you were looking for a word on the page of a book. The pattern may be as simple as abc, which matches any text containing those three characters in a row. It can match anywhere: at the beginning, in the middle, or at the end of the text. It may also match several times.

In Godot we use the RegEx class: we create a new RegEx object, then compile our search pattern.

var regex = RegEx.new()
regex.compile("abc") # Compile our pattern

Now we may use the search method to get the first match or search_all to get all the matches.

var txt = "abc xyz abcdefg"
var regex = RegEx.new()
regex.compile("abc")

var result = regex.search(txt) # a RegExMatch, or null if nothing matched
if result:
    print(result.get_string()) # prints abc

var results = regex.search_all(txt) # an array of RegExMatch objects
for m in results:
    print(m.get_string()) # prints abc for each of the two matches

We usually want to specify ranges or sets of characters to match. For example, [a-z] matches a single lowercase letter. We may do the same for digits with [0-9] and for uppercase letters with [A-Z]. There are shortcut codes for some of these patterns (for digits the code is \d), but as a beginner it’s easier to remember the bracket forms.

  • [a-z] matches ‘a’ and ‘b’ in “aXFGTb1234”

We may also list individual characters in a set, such as [xyz0-9], which matches a single character that is x, y, z, or a digit.

Another useful pattern matches characters that are not in a set. For example, [^x2@] matches any single character that is not x, 2, or @.

  • [^dxg] matches ‘z’, ‘2’, and ‘X’ in “dzxg2Xdg”
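As a rough sketch, the character-set examples above could be checked in Godot like this. The helper name find_all is hypothetical (it is not part of the RegEx class); it only wraps compile and search_all and collects the text of each match.

```gdscript
extends Node

# Hypothetical helper: returns the text of every match of pattern in text.
func find_all(pattern, text):
    var regex = RegEx.new()
    regex.compile(pattern)
    var found = []
    for m in regex.search_all(text):
        found.append(m.get_string())
    return found

func _ready():
    print(find_all("[a-z]", "aXFGTb1234"))   # matches: a, b
    print(find_all("[^dxg]", "dzxg2Xdg"))    # matches: z, 2, X
    print(find_all("[xyz0-9]", "ax7"))       # matches: x, 7
```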

We also want to specify how many times in a row an element should be matched, such as exactly once, exactly twice, or zero or more times. We add special symbols, called quantifiers, after the element to specify this.

  • ? The question mark indicates zero or one of the preceding element.
  • * The asterisk indicates zero or more of the preceding element.
  • + The plus indicates one or more of the preceding element.
  • {n} Match the preceding element exactly n times.
  • {n,} Match the preceding element at least n times.
  • {n,m} Match the preceding element between n and m times.

The wildcard . matches any single character (by default, any character except a newline).

  • dogs? matches “dog” in “dog snacks” and matches “dogs” in “the dogs barked”
  • 10* matches “1” in “312” and matches “100” in “100,000”
  • x+ matches “x” in “dx10” and matches “xxxx” in “0xxxx + 3”
  • 6{2} matches “66” in “666”
  • 6{2,3} matches nothing in “6” but matches “666” in “66666”
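The quantifier examples above can be tried out with a small script along these lines. This is only a sketch: first_match is a hypothetical helper name, and it returns an empty string when nothing matches.

```gdscript
extends Node

# Hypothetical helper: text of the first match, or "" if there is none.
func first_match(pattern, text):
    var regex = RegEx.new()
    regex.compile(pattern)
    var result = regex.search(text)
    if result:
        return result.get_string()
    return ""

func _ready():
    print(first_match("dogs?", "dog snacks"))       # dog
    print(first_match("dogs?", "the dogs barked"))  # dogs
    print(first_match("10*", "100,000"))            # 100
    print(first_match("x+", "0xxxx + 3"))           # xxxx
    print(first_match("6{2}", "666"))               # 66
    print(first_match("6{2,3}", "66666"))           # 666
    print(first_match("6{2,3}", "6") == "")         # true, nothing matched
```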

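To match tabs and newlines we use escape sequences such as \t and \n. The backslash is also used to escape any of the special characters, such as the dot and the asterisk, when we actually want to match these characters literally, e.g. \. Keep in mind that in a GDScript string literal the backslash itself must be escaped, so the pattern \. is written as "\\." in code.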

[a-z]+\.com matches “website.com” in “http://website.com”
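Here is a minimal sketch of the difference an escaped dot makes. The helper name first_match is hypothetical; note that each backslash is doubled because the pattern sits inside a GDScript string literal.

```gdscript
extends Node

# Hypothetical helper: text of the first match, or "" if there is none.
func first_match(pattern, text):
    var regex = RegEx.new()
    regex.compile(pattern)
    var result = regex.search(text)
    if result:
        return result.get_string()
    return ""

func _ready():
    # "\\." in GDScript source reaches the RegEx class as \. (a literal dot).
    print(first_match("[a-z]+\\.com", "http://website.com"))  # website.com
    print(first_match("[a-z]+\\.com", "telecom") == "")       # true, no literal dot
    # Unescaped, the dot is a wildcard and accepts the letter e here.
    print(first_match("[a-z]+.com", "telecom"))               # telecom
```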

By default, quantifiers are greedy: they capture the longest character sequence that matches. To capture the shortest sequence instead, add a question mark after the quantifier to make it non-greedy.

a+? matches “a” in “aaaaa”
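To make the difference concrete, the sketch below runs a greedy and a non-greedy version of the same pattern. The helper first_match and the sample text with angle-bracket tags are illustrative assumptions, not something from the Godot docs.

```gdscript
extends Node

# Hypothetical helper: text of the first match, or "" if there is none.
func first_match(pattern, text):
    var regex = RegEx.new()
    regex.compile(pattern)
    var result = regex.search(text)
    if result:
        return result.get_string()
    return ""

func _ready():
    print(first_match("a+", "aaaaa"))            # aaaaa (greedy)
    print(first_match("a+?", "aaaaa"))           # a (non-greedy)
    print(first_match("<.+>", "<b>bold</b>"))    # <b>bold</b> (greedy)
    print(first_match("<.+?>", "<b>bold</b>"))   # <b> (non-greedy)
```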

Two more useful symbols are the anchors: ^ at the start of the pattern means the match must begin at the start of the string, and $ at the end of the pattern means the match must finish at the end of the string.

  • ^start matches “start” in “start string” but does not match anything in “Go to start”
  • end$ matches “end” in “to the end” but does not match anything in “the end is nigh.”
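The anchor examples above could be verified with something like the following sketch. Since search returns null when there is no match, the hypothetical helper is_match simply reports whether a match was found.

```gdscript
extends Node

# Hypothetical helper: true if pattern matches somewhere in text.
func is_match(pattern, text):
    var regex = RegEx.new()
    regex.compile(pattern)
    return regex.search(text) != null

func _ready():
    print(is_match("^start", "start string"))   # true
    print(is_match("^start", "Go to start"))    # false
    print(is_match("end$", "to the end"))       # true
    print(is_match("end$", "the end is nigh.")) # false
```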

To extract sub-pattern matches, we use parentheses to group parts of the pattern. The match result will then contain these sub-pattern matches alongside the full match. Groups are also useful for specifying a list of alternatives separated by the or operator, which is a vertical bar |.

  • [a-z]+\.(.+) matches “wiki.org” in “wiki.org” and captures “org” as group 1 (in “www.wiki.org” it matches the whole text and captures “wiki.org”)
  • (dog|cat) matches “cat” in “the cat sat on the mat”
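In Godot, captured groups are read from the RegExMatch with get_string, passing the group number (0, the default, is the whole match). The sketch below assumes a hypothetical helper named first_group that returns the text of group 1.

```gdscript
extends Node

# Hypothetical helper: text captured by group 1 of the first match,
# or "" if the pattern does not match at all.
func first_group(pattern, text):
    var regex = RegEx.new()
    regex.compile(pattern)
    var result = regex.search(text)
    if result:
        return result.get_string(1)
    return ""

func _ready():
    # The backslash is doubled because this is a GDScript string literal.
    print(first_group("[a-z]+\\.(.+)", "wiki.org"))        # org
    print(first_group("[a-z]+\\.(.+)", "www.wiki.org"))    # wiki.org
    print(first_group("(dog|cat)", "the cat sat on the mat"))  # cat
```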

Note that dealing with line breaks can be problematic, so it’s a good idea to remove them before applying a RegEx to your text.

Spaces may be entered as actual spaces or by using \s, which matches any whitespace character (including tabs and newlines).

You can find various RegEx testers online to try out your RegEx patterns before you commit them to code.

One tip is to download and print out a RegEx Cheat Sheet since there are a lot of directives that I didn’t mention that may be hard to remember if you are not writing them often.

In summary: you may use RegEx to detect if patterns exist in text such as website URLs and telephone numbers, or use it for searching and replacing text patterns.
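For the replacing side, the RegEx class has a sub method that takes the text, the replacement, and a flag for whether to replace every match. The following is only a sketch: the helper name replace_all and the sample sentence are made up for illustration, and the pattern colou?r covers a regional spelling variation.

```gdscript
extends Node

# Hypothetical helper: replaces every match of pattern in text.
func replace_all(pattern, text, replacement):
    var regex = RegEx.new()
    regex.compile(pattern)
    # The third argument (all) is true so every match is replaced,
    # not just the first one.
    return regex.sub(text, replacement, true)

func _ready():
    var txt = "The colour red and the color blue"
    print(replace_all("colou?r", txt, "shade"))
    # The shade red and the shade blue
```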

So this is a quick overview of what I think are the most useful things to know about RegEx and I hope that it helps.

You can read the official Godot RegEx docs here
