Patterns

From Garry's Mod

What's this article for?

This article is for teaching you how to use Lua's pattern matching language. The pattern matching language (or patterns for short) provides advanced tools for searching and replacing recurring patterns in strings. These tools can be used for writing text data parsers, custom formatters and many other things that would take hundreds of lines of code.

A lot of the theory in this article is either copied or rewritten from the Lua reference manual. See the manual's section on patterns for the original text.

Getting started

An average pattern looks like this:

[%w_]+

That specific pattern could be used for finding variable names (such as "hi_there", "h0w_are_you" etc.). What each character in the pattern does will be explained later in this article.
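As a quick taste, here is a minimal sketch of that pattern in action. It uses string.match, which is covered in detail later, and the sample string is made up for illustration.

```lua
-- Sample input invented for this sketch
local line = "  hi_there = 5"

-- [%w_]+ grabs the first run of letters, digits and underscores
local name = string.match( line, "[%w_]+" )

print( name ) -- prints hi_there
```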

These functions can be used together with patterns:

  • string.find
  • string.match
  • string.gmatch
  • string.gsub

I will try to use all of these functions and explain how each of them works in detail.

Special characters

There are a bunch of special characters that either escape other characters, or modify the pattern in some way.

These characters are:

^ $ ( ) % . [ ] * + - ?

They can also be used in the pattern as normal characters by prefixing them with a "%" character, so "%%" becomes "%", "%[" becomes "[", etc.
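The difference between a special character and its escaped form can be sketched with string.find (explained later in this article). The sample string is an assumption made up for this example.

```lua
-- Sample string invented for this sketch
local price = "Total: $5.00"

-- An unescaped "." means "any character", so it matches the very first character
local anyStart = string.find( price, "." )

-- "%." and "%$" are the literal dot and dollar sign
local dotStart = string.find( price, "%." )
local dollarStart = string.find( price, "%$" )

print( anyStart, dotStart, dollarStart ) -- positions 1, 10 and 8
```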

Character classes

Character classes represent a set of characters. They can be either predefined sets or custom sets that can consist of the same predefined sets, ranges or any single characters.

Available character classes (custom and predefined):

Class Description
. (a dot) represents all characters (will match any character)
%a represents all letters (from a to z upper and lower case)
%c represents all control characters (special characters "\t", "\n", etc.)
%d represents all digits (from 0 to 9)
%l represents all lowercase letters (any letter that is lower case)
%p represents all punctuation characters (".", ",", etc.)
%s represents all space characters (a normal space, tab, etc.)
%u represents all uppercase letters (any letter that is upper case)
%w represents all alphanumeric characters (all letters and numbers)
%x represents all hexadecimal digits (digits 0-9, letters a-f, and letters A-F)
%z represents the character with representation 0 (the null character "\0")
%x (where x is any non-alphanumeric character) represents itself
[s] represents the union of all characters in s (a set). You can see this used in the "Getting started" section: [%w_] will match any letter, any digit or an underscore
[^s] represents the complement of the set s, so [^%w_] matches everything that is not a letter, digit or underscore
  • An upper case version of a predefined character class represents the complement of that class, so %A will match anything that is not a letter.
  • Inside a set, the starting and ending points of a range are separated with a hyphen "-", so [0-5] will match a digit from zero to five and [a-c] will match a, b or c.
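A few of these classes can be tried out with string.match, which returns the first substring that matches. This is only an illustrative sketch; the sample string is an assumption.

```lua
-- Sample string invented for this sketch
local str = "Room 42, block B"

local digits     = string.match( str, "%d+" )     -- "42": one or more digits
local nonLetters = string.match( str, "%A+" )     -- " 42, ": first run of non-letters
local firstGap   = string.match( str, "[^%w_]+" ) -- " ": first run outside the set
local lowRange   = string.match( str, "[a-c]" )   -- "b": first character in the range a-c
local word       = string.match( str, "%u%l+" )   -- "Room": an uppercase letter followed by lowercase letters

print( digits, nonLetters, firstGap, lowRange, word )
```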

Repetition and anchoring

Characters in a string match a pattern in the following ways:

  • a single class will match a single character,
  • a single class followed by "+" will match one or more repetitions of characters and will match the longest sequence,
  • a single class followed by "-" will match zero or more repetitions of characters and will match the shortest sequence,
  • a single class followed by "*" will match zero or more repetitions of characters and will match the longest sequence,
  • a single class followed by "?" will match one or zero characters,
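  •  %n (where n is a digit between 1 and 9) will match a substring equal to the nth capture (see next section),
  •  %bxy (where x and y are two distinct characters) will match a balanced string that starts with x and ends with y, so "%b()" will match a string that starts with "(" and ends with the matching ")", including any nested pairs.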
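The repetition rules above can be compared side by side in a short sketch. The sample strings are assumptions chosen to make the longest/shortest difference visible.

```lua
-- Sample strings invented for this sketch
local html = "<b>bold</b> and <i>italic</i>"

local greedy = string.match( html, "<.+>" ) -- "+" takes the longest sequence: the whole string
local lazy   = string.match( html, "<.->" ) -- "-" takes the shortest sequence: "<b>"

-- "?" makes the preceding class optional
local british  = string.match( "colour", "colou?r" ) -- "colour"
local american = string.match( "color", "colou?r" )  -- "color"

-- %b() matches from "(" to the matching ")", nested pairs included
local balanced = string.match( "print( f( x ) ) -- done", "%b()" ) -- "( f( x ) )"

-- %1 must equal whatever the first capture matched (here: the opening quote)
local quote, text = string.match( [[say "hi" or 'bye']], "([\"'])(.-)%1" )

print( greedy, lazy, british, american, balanced, quote, text )
```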

Patterns can be anchored like so:

  • starting the pattern with "^" will match a string at the beginning,
  • ending the pattern with "$" will match a string at the end,
  • not anchoring the pattern will match a string at any position.

These two characters only have a meaning if positioned as stated above. At any other position, these characters have no meaning and represent themselves.
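Here is a minimal sketch of anchoring, using string.find (described below) on a made-up string that contains the same word at both ends.

```lua
-- Sample string invented for this sketch
local str = "lua is fun, use lua"

local s1, e1 = string.find( str, "lua" )  -- unanchored: first occurrence, 1 to 3
local s2, e2 = string.find( str, "lua$" ) -- anchored at the end: 17 to 19
local s3     = string.find( str, "^use" ) -- "use" is not at the beginning: nil

-- In the middle of a pattern "^" is just a normal character
local s4, e4 = string.find( "a^b", "a^b" ) -- 1 to 3

print( s1, e1, s2, e2, s3, s4, e4 )
```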

Captures

Patterns can also contain sub-patterns enclosed in "()", called captures. Captures are used in functions like string.match and string.gsub to return or substitute a specific part of the match. Examples of how to use these can be found below.
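As a first small sketch of captures, the code below splits a date into its parts. The date string is an assumption made up for this example; note that captures are always returned as strings.

```lua
-- Sample date invented for this sketch
local date = "2013-07-21"

-- Three captures, separated by literal hyphens (escaped as "%-")
local year, month, day = string.match( date, "(%d+)%-(%d+)%-(%d+)" )

-- Without captures the whole match is returned instead
local whole = string.match( date, "%d+%-%d+" )

print( year, month, day, whole )
```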

Usage

Now I'm going to show you how to actually use all that stuff above. The examples below explain how to use the four functions listed above.

string.find

string.find( string str, string pattern [, number start [, boolean plain ]] )

str is the string to search, pattern is the pattern string to find, start is the start index and plain is a boolean indicating whether to use a pattern search or just plain text search. The function returns the start and end indices (not start index and length) of the matching substring. If the pattern has captures, they will be returned after the indices. If a match couldn't be found, the function returns nil.

The following code will find the first word in the string.

local str = "1. Don't spam!"
local pattern = "([%a']+)" -- will match a substring that has one or more letters or apostrophes (')
local start, endpos, word = string.find( str, pattern )

print( start, endpos, word )

Output:

4 8 Don't

You probably think that this could be done with string.Explode and a few loops, but look, we did it in three lines.

The following code will check if a string is safe to be used as a file name, by comparing it with a set of restricted characters.

local str = "cry|*to"
local pattern = '[\\/:%*%?"<>|]' -- a set of all restricted characters
local start = string.find( str, pattern )

print( "String is "..( ( start ~= nil ) and "unsafe" or "safe" ) )

Output:

String is unsafe

string.find returns nil if no match is found. This means we can use boolean logic to print "unsafe" if a match is made, and "safe" otherwise.
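The optional start and plain arguments are not used in the examples above, so here is a minimal sketch of both. With plain set to true the pattern is treated as ordinary text and no character is special. The sample string is an assumption.

```lua
-- Sample string invented for this sketch
local str = "a.b.c"

-- As a pattern, "." matches any character, so the match is at position 1
local asPattern = string.find( str, "." )

-- As plain text, "." only matches an actual dot
local first = string.find( str, ".", 1, true )

-- Continue searching after the first dot by passing a start index
local second = string.find( str, ".", first + 1, true )

-- "?" would be a special character in a pattern, but is fine in a plain search
local missing = string.find( str, "?", 1, true )

print( asPattern, first, second, missing )
```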

string.match

string.match( string str, string pattern [, number start] )

str is the string to search, pattern is the pattern to find and start is the start position. If there is a match, the function returns the captures from the pattern; if there are no captures, it returns the whole match. If a match couldn't be found, the function returns nil.

The following code will parse a simple keyvalue line.

local str = "key=  value"
--The following will match "variable name, 0 or more spaces, equals sign, 0 or more spaces, variable name":
local pattern = "([%w_]+)%s*=%s*([%w_]+)"
local k, v = string.match( str, pattern )

print( k, v )

Output:

key value

The following code will check if the string ends with a .lua extension.

local str = "teel.lua"
local pattern = ".+%.lua$" -- anything until a dot and "lua" at the end of the string
local match = string.match( str, pattern )

print( "String ends with "..( ( match ) and ".lua" or "something else" ) )

Output:

String ends with .lua
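A small variation on the example above is to capture the extension instead of only checking for it. The helper name getExtension is hypothetical and only exists for this sketch.

```lua
-- Hypothetical helper: returns the text after the last dot, or nil if there is none
local function getExtension( name )

	-- a literal dot, then one or more alphanumeric characters, anchored at the end
	return string.match( name, "%.(%w+)$" )

end

local ext1 = getExtension( "teel.lua" )       -- "lua"
local ext2 = getExtension( "archive.tar.gz" ) -- "gz"
local ext3 = getExtension( "README" )         -- nil, there is no extension

print( ext1, ext2, ext3 )
```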

string.gmatch

string.gmatch( string str, string pattern )

str is the string to search and pattern is the string to search for. The function returns an iterator function (special functions used by loops) that goes through every match in the string and returns the pattern's captures, if there are any, or the whole match if there are no captures. The function will not return nil in the case where a match couldn't be found, but an 'empty' iterator function that will not start a loop.

The following code goes through every word in the string.

local str = "This is PATTERNS"
local pattern = "%w+" -- will match any word

for word in string.gmatch( str, pattern ) do
	
	print( word )
	
end

Output:

This
is
PATTERNS

Any pattern that you use in string.match can also be used in gmatch, but instead of finding only the first match, it will find every match in the string.

The following code uses the keyvalue parsing pattern but can now read a list of keyvalues.

local str = "key = value key2 =  value2"
local pattern = "([%w_]+)%s*=%s*([%w_]+)" -- same pattern as above

local tbl = { }
for k, v in string.gmatch( str, pattern ) do
	
	tbl[ k ] = v
	
end

PrintTable( tbl )

Output:

key = value
key2 = value2

The interesting thing is that almost any characters can separate the keyvalue pairs, as long as they are not letters, digits or underscores (the pattern would treat those as part of a key or value).
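To illustrate, the sketch below runs the same pattern over a string with mixed separators, and then over a string with no matches at all to show that the loop body simply never runs. Both input strings are assumptions made up for this example.

```lua
local pattern = "([%w_]+)%s*=%s*([%w_]+)" -- same pattern as above

-- Sample input with a semicolon, a comma and a tab between the pairs
local str = "name=Bob; age = 25,\tteam=red"

local tbl = { }
for k, v in string.gmatch( str, pattern ) do

	tbl[ k ] = v

end

print( tbl.name, tbl.age, tbl.team )

-- No "=" in this string, so the iterator finishes immediately
local count = 0
for k, v in string.gmatch( "no pairs here", pattern ) do

	count = count + 1

end

print( count )
```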

string.gsub

string.gsub( string str, string pattern, string/table/function repl )

str is the string to search in, pattern is the pattern to search for and repl is the value to replace with. The function returns a copy of str where all occurrences of pattern have been replaced with the value given by repl and, as a second return value, the total number of matches.

repl can be the following things:

  • a string - in which case all occurrences of pattern are replaced with this string; the "%n" item is also supported, with the special case "%0" representing the whole match,
  • a function - in which case the passed function gets called with the match/captures as its argument(s) each time a match occurs, and the match is replaced with the value returned by the function,
  • a table - in which case the match is replaced with the value indexed with the first capture (or with the whole match if there are no captures).

If the function or table returns nil or false, the match gets ignored and nothing gets replaced.
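The three kinds of repl can be sketched next to each other. All strings and table contents below are assumptions made up for this example.

```lua
-- String repl: "%0" stands for the whole match
local wrapped = string.gsub( "hello world", "%w+", "[%0]" )

-- Function repl: called with each match, its return value is the replacement
local doubled, numMatches = string.gsub( "3 apples and 10 pears", "%d+", function( num )

	return tostring( tonumber( num ) * 2 )

end )

-- Table repl: the first capture is used as the key
local vars = { name = "Garry", game = "GMod" }
local greeting, numVars = string.gsub( "Hi $name, this is $game by $author", "%$(%w+)", vars )

print( wrapped )              -- [hello] [world]
print( doubled, numMatches )  -- 6 apples and 20 pears, with 2 matches
print( greeting, numVars )    -- $author has no entry in vars, so it is left as it is
```

Note that the count still includes the "$author" match even though nothing was substituted for it.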

The following code formats a keyvalue pair as an xml node.

local str = "key = value"
local pattern = "([%w_]+)%s*=%s*([%w_]+)"
local replacement = "<%1>%2</%1>"

local output = string.gsub( str, pattern, replacement )

print( output )

Output:

<key>value</key>

The following example creates a function that works like the .NET formatting feature.

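function string.format2( fmt, ... )
	
	-- collect the extra arguments in a table so the inner function can reach them
	local args = { ... }
	
	-- "{0}" refers to the first extra argument, so add 1 for Lua's 1-based indexing;
	-- the outer parentheses drop gsub's second return value (the match count)
	return ( fmt:gsub( "{(%d+)}", function( i ) return args[ tonumber( i ) + 1 ] end ) )
	
end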

local str = "This is {0}, oh {1}.."
local repl1 = "PATTERNS"
local repl2 = "YEAH"

local output = string.format2( str, repl1, repl2 )

print( output )

Output:

This is PATTERNS, oh YEAH..

Conclusion

The article is finally over! I hope you learned something new from all of this. Lua's patterns are very powerful when used right. When making an addon that heavily relies on strings, patterns will most likely come in handy. You can find more examples in either the Lua manual or Programming in Lua (PIL).

Good day!
