Notice: Stat is currently in private beta. This documentation is incomplete and subject to change.
Stat Docs
Regular Expressions
A regular expression is a pattern used to match parts of a string or binary value.
Regular expressions can also be used to extract parts of a string or binary value, or to test whether it matches
a pattern. If you've ever programmed in JavaScript, PHP, Python, or any other popular programming language,
then you may be familiar with regular expressions. In Stat, the syntax for a regular expression
is much like that in JavaScript, which itself was inspired by the regular expression syntax in Perl.
Stat implements most of the regular expression functionality available in JavaScript, so in most cases
you can copy a JavaScript regular expression, paste it into Stat, and it'll work... probably.
There are a few differences between Stat and JavaScript regular expression syntax,
but for the most part, the syntax is the same.
Regular Expression Literals
To create a regular expression literal, enclose a pattern in forward slashes: /pattern/. Here is a simple example:
MAIN
    // This is how you create a regular expression literal
    // This statement doesn't do anything
    // other than create the regular expression literal, but this
    // is a valid statement, albeit a bit useless
    /john/
The code above doesn't do anything other than create a regular expression literal, then forgets about it.
You'd be better served by storing that regular expression literal into a variable so that it can be referred to later
or so that you can pass it to a function or a stream. Here's a more useful example:
MAIN
    let pattern = /john/
Writing Regular Expressions
A regular expression is made up of basic characters and special character sequences that
tell the regular expression engine what to match and how to traverse the string or binary value
as it searches for matches.
Matching simple characters
Matching simple characters is simple. Just specify the character directly in the regular expression
to match that character. To match a sequence of simple characters, just specify that sequence.
MAIN
    // Matches the word "hello" anywhere in a string or binary value
    let pattern = /hello/
Regular expression escape characters
Some characters won't work as simple characters in a regular expression because they
have special meaning. The regular expression syntax makes heavy use of these characters to
perform advanced pattern matches. To match these characters literally, escape them with a backslash (\).
Here is a quick list of the escape characters available:
\0 - The null character (not to be confused with the empty value... that's different)
\\ - A literal backslash. To match a single backslash, prefix it with another backslash.
\t - A tab character
\f - A form feed character, otherwise known as the new page character
\v - A vertical tab character
\n - A newline character
\r - A carriage return character
\e - An escape character. Same as \x1b in binary
\/ - A forward slash character
\( - A left parenthesis character
\) - A right parenthesis character
\{ - A left curly brace character
\} - A right curly brace character
\[ - A left bracket character
\] - A right bracket character
\. - A dot (period) character
\| - A vertical pipe character
\* - An asterisk (star) character
\+ - A plus character
\? - A question mark character
\^ - A caret character
\$ - A dollar sign character
MAIN
    // Matches a literal dot followed by a literal question mark
    let pattern = /\.\?/
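For comparison, here is the same escape in TypeScript, which uses JavaScript's regular expression engine. This is a sketch that assumes Stat treats these escapes the way JavaScript does. It is not Stat code. It contrasts the escaped pattern with an unescaped one:

```typescript
// JavaScript analogue of the Stat example above (not Stat code).
// Escaped: a literal dot followed by a literal question mark.
const escapedPattern = /\.\?/;
// Unescaped: "." means any character and "?" makes it optional.
const unescapedPattern = /.?/;

const escapedHit = escapedPattern.test("Why?.?"); // true: contains ".?"
const escapedMiss = escapedPattern.test("x?"); // false: no literal dot
const unescapedHit = unescapedPattern.test("x?"); // true: ".?" matches almost anything

console.log(escapedHit, escapedMiss, unescapedHit); // true false true
```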
Matching unicode characters
You can match a Unicode character by simply adding the character to the regular expression. Alternatively,
you can use a Unicode character escape, just like in a string. Here is an example:
MAIN
    // These both match a watermelon emoji character
    let pattern = /🍉/
    pattern = /\u{1f349}/
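Here is a TypeScript comparison, assuming Stat's Unicode matching behaves like JavaScript's. This is not Stat code. It shows one real difference: JavaScript only accepts the \u{...} escape in a regular expression when the u flag is set.

```typescript
// JavaScript analogue (not Stat code). JavaScript only understands the
// \u{...} code point escape when the "u" flag is set.
const emojiLiteral = /🍉/;
const emojiEscape = new RegExp("\\u{1f349}", "u");

const sample = "I like 🍉 in summer";
const literalFound = emojiLiteral.test(sample);
const escapeFound = emojiEscape.test(sample);

console.log(literalFound, escapeFound); // true true
```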
Matching arbitrary binary bytes
You can match a binary byte by using a hexadecimal byte escape, just like in a binary value. The syntax is simple...
just use \x followed by the 2-digit hexadecimal value of the byte you want to match, which ranges
from \x00 to \xff.
MAIN
    // This matches abcn because "n" in hex is 6e
    let pattern = /abc\x6e/
    // Matches abc followed by a byte with the hex value 98 (152 in decimal)
    pattern = /abc\x98/
Matching common character classes
The regular expression syntax includes a way to match common character classes like numbers, spaces, and more.
Here is a quick list of all character classes available:
\d - Matches any digit character 0-9
\D - Matches any non-digit character. That is, any character that is not 0-9
\s - Matches any whitespace character, including space, tab, vertical tab, return, and new line characters
\S - Matches any non-whitespace character, which is everything except space, tab, vertical tab, return, and new line
\w - Matches any word character, which includes A-Z, a-z, 0-9, and the underscore character _
\W - Matches any non-word character, which is everything except A-Z, a-z, 0-9, and _
MAIN
    // Matches a digit character followed by any white space character
    let pattern = /\d\s/
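The same classes exist in JavaScript, so their behavior can be sketched in TypeScript for comparison. This assumes Stat's classes match JavaScript's for these inputs. It is not Stat code.

```typescript
// JavaScript analogue (not Stat code).
const digitThenSpace = /\d\s/;
const hasDigitSpace = digitThenSpace.test("room 4 left"); // true: "4 "
const noDigitSpace = digitThenSpace.test("room4"); // false: nothing follows the digit

// With the "g" flag, String.prototype.match returns every run of word characters.
const wordRuns = "user_id=42!".match(/\w+/g) ?? []; // ["user_id", "42"]

console.log(hasDigitSpace, noDigitSpace, wordRuns);
```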
Matching word boundaries
You can match a word boundary with the \b escape. This escape checks that
the current position in the string or binary value is a word boundary. A word boundary is where a
word character (a-zA-Z0-9_) is followed by a non-word character, or vice versa. It also matches
at the start of the string if the first character is a word character, and at the end of the string if
the last character is a word character. You can negate the match by using \B, which matches
where there isn't a word boundary at the current position. It's important to note that matching a word boundary with
\b or a non-word boundary with \B doesn't move the match position. It doesn't consume any characters in the value you are searching; rather, it
matches the boundary itself. Here are some examples:
MAIN
    // Matches abc but only if the next character is
    // not a word character, like abc@. Also matches
    // abc at the end of the string
    let pattern = /abc\b/
    // Matches abc but only if the next character is
    // also a word character, like abcd
    pattern = /abc\B/
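Here is a quick TypeScript check of the boundary rules described above. It runs on JavaScript's engine and assumes Stat agrees with it. The last value shows that \b adds nothing to the matched text.

```typescript
// JavaScript analogue (not Stat code).
const atBoundary = /abc\b/;
const notAtBoundary = /abc\B/;

const results = {
  boundaryBeforeSymbol: atBoundary.test("abc@"), // true
  boundaryAtEnd: atBoundary.test("abc"), // true: end of string
  boundaryBeforeLetter: atBoundary.test("abcd"), // false
  nonBoundaryBeforeLetter: notAtBoundary.test("abcd"), // true
  nonBoundaryBeforeSymbol: notAtBoundary.test("abc@"), // false
};

// \b consumes nothing: the matched text is just "abc", without the "@".
const matchedText = "abc@".match(atBoundary)?.[0];
console.log(results, matchedText);
```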
Matching any character
The dot character (.) in a regular expression matches any character except a new line character by default.
You can make the dot character also match new line characters by providing the "dot all" flag (s) after
the regular expression literal (more on flags below).
MAIN
    // Matches abc followed by any character except a new line character
    let pattern = /abc./
    // Matches abc followed by any character, even a new line character
    pattern = /abc./s
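In TypeScript, the dot and the s flag behave as described above. This sketch assumes Stat matches JavaScript here. The flag is passed to the RegExp constructor so that the snippet compiles regardless of the configured target.

```typescript
// JavaScript analogue (not Stat code).
const withoutDotAll = /abc./;
const withDotAll = new RegExp("abc.", "s"); // "s" = dot all

const input = "abc\ndef";
const plainDot = withoutDotAll.test(input); // false: "." skips "\n"
const dotAllDot = withDotAll.test(input); // true: "." also matches "\n"

console.log(plainDot, dotAllDot); // false true
```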
Matching a specified set of characters
You can specify a character set, which matches any single character in that set. The syntax looks like this:
[abc]. It's a set of characters to match, enclosed in square brackets. You can also add character ranges within the set by separating two characters with a dash, like so:
a-z.
This means any lower case character, and 0-9 means any digit character. Note that you don't need to specify
just letters or numbers. You can specify any range in the ASCII character set. For example, !-~
matches any printable ASCII character except the space. Or 4-d matches 4-9, plus any upper case character, plus lower case
a-d, plus the characters whose ASCII codes fall in between: :;<=>?@[\]^_`
You can also negate a character set by making a caret character (^) the first character in the set.
For example, [^abc] matches any character except a, b, or c. Check out these examples:
MAIN
    // Matches abc followed by any digit
    let pattern = /abc[0-9]/
    // Matches abc followed by one of x, y, or z
    pattern = /abc[xyz]/
    // Matches abc followed by any character
    // except a lower case letter
    pattern = /abc[^a-z]/
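Here is a TypeScript sketch of the examples above, plus a check of the 4-d range. It assumes Stat and JavaScript agree on character sets. This runs on JavaScript's engine, not Stat.

```typescript
// JavaScript analogue (not Stat code).
const checks = [
  /abc[0-9]/.test("abc7"), // true
  /abc[xyz]/.test("abcy"), // true
  /abc[xyz]/.test("abcw"), // false
  /abc[^a-z]/.test("abcA"), // true: "A" is not a lower case letter
  /abc[^a-z]/.test("abcq"), // false
];

// The range 4-d covers ASCII 0x34 through 0x64.
const fourToD = /^[4-d]+$/;
const inRange = fourToD.test("4:A^_`d"); // true: every character is in range
const tildeOutside = fourToD.test("~"); // false: "~" is 0x7E

console.log(checks, inRange, tildeOutside);
```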
Alternate matches
You can specify alternate matches by separating them with the vertical pipe character (|).
The pattern on the left side of the pipe is tried first, and if it fails, the pattern on the right side is tried.
You can add as many alternatives as necessary. Here are some examples:
MAIN
    // Matches abc or def
    let pattern = /abc|def/
    // Matches abc, def, or xyz
    pattern = /abc|def|xyz/
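Here is a TypeScript sketch showing that alternatives are tried from left to right. It runs on JavaScript's engine, not Stat, and assumes Stat follows the same order:

```typescript
// JavaScript analogue (not Stat code).
const threeWay = /abc|def|xyz/;
const found = "123def".match(threeWay)?.[0]; // "def"

// Alternatives are tried left to right, so at the same starting
// position the earlier alternative wins even if a later one is longer.
const leftFirst = "abcdef".match(/abc|abcdef/)?.[0]; // "abc"

console.log(found, leftFirst);
```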
Regular expression groups
You can enclose parts of a regular expression in parentheses to specify a capture group. A capture group
can then be referenced later on in the regular expression to match the same sequence of characters that
the capture group matched previously. Capture groups are also returned as part of a match when using the
"matchOnce" operator (~> or <~) or the "matchAll" operator (~>> or <<~), which are covered below. Groups can also be used with alternative matches to separate 2 or more alternatives from
the rest of the match. For example:
MAIN
    // Matches abc followed by either def or ghi
    let pattern = /abc(def|ghi)/
In addition to capture groups, you can specify other group types that behave differently. Here is a list of the possible group types:
(pattern) - Capture group. Can be referenced later and is returned along with the full match when using the "matchOnce" or "matchAll" operators
(?:pattern) - Non-capture group. Cannot be referenced later and is not returned
(?=pattern) - Lookahead group. Looks ahead and attempts to match the pattern, but does not consume any of the value being searched, nor is it included in the match
(?!pattern) - Negative lookahead group. Looks ahead and matches if the pattern doesn't match. It also doesn't consume any of the value being searched, nor is it included in the match
Here are some examples:
MAIN
    // Captures the sequence abc
    let pattern = /(abc)/
    // Matches abc in a group, followed by def,
    // followed by a 2nd group of ghi
    pattern = /(abc)def(ghi)/
    // Groups can be nested
    pattern = /(a group that (contains another group) suffix)/
    // Matches abc but doesn't capture it
    pattern = /(?:abc)/
    // Matches abc only if it is followed by def
    pattern = /abc(?=def)/
    // Matches abc only if it is not followed by def
    pattern = /abc(?!def)/
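The same groups can be exercised in TypeScript, whose exec method returns the full match at index 0 followed by the captured groups. This runs on JavaScript's engine and assumes Stat's group semantics match it:

```typescript
// JavaScript analogue (not Stat code).
const either = /abc(def|ghi)/.exec("xxabcghi");
const twoGroups = /(abc)def(ghi)/.exec("abcdefghi");
const nonCapturing = /(?:abc)/.exec("abc");
const lookahead = /abc(?=def)/.exec("abcdef");

const summary = {
  eitherGroup: either?.[1], // "ghi"
  groups: twoGroups ? [twoGroups[1], twoGroups[2]] : [], // ["abc", "ghi"]
  nonCapturingLength: nonCapturing?.length, // 1: only the full match
  lookaheadMatch: lookahead?.[0], // "abc": "def" is checked but not consumed
  negativeLookahead: /abc(?!def)/.test("abcxyz"), // true
};
console.log(summary);
```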
Referencing capture groups using back references
You can reference a previous capture group match by using a back reference, which is a backslash followed by a single digit
that is the 1-based index of the capture group. To reference the first capture group, use
\1, to reference
the 2nd capture group, use \2, and so on. Here is a quick example:
MAIN
    // Matches "hello there hello" or "bye there bye"
    let pattern = /(hello|bye) there \1/
    // Matches abc yy or abc zz
    pattern = /(abc) ([yz])\2/
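Here is a TypeScript comparison of the back reference examples. It runs on JavaScript's engine, which is assumed to behave like Stat's here:

```typescript
// JavaScript analogue (not Stat code).
const echo = /(hello|bye) there \1/;
const repeatLetter = /(abc) ([yz])\2/;

const backrefResults = [
  echo.test("hello there hello"), // true
  echo.test("bye there bye"), // true
  echo.test("hello there bye"), // false: \1 must repeat "hello"
  repeatLetter.test("abc yy"), // true
  repeatLetter.test("abc yz"), // false: \2 must repeat "y"
];
console.log(backrefResults);
```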
Regular Expression Quantifiers
Regular expression quantifiers allow you to match more or fewer than 1 of something. Up until now, we've
only shown examples where you match a single character or sequence of characters. However, if you need to
match, say, 3 or more of something, that's where quantifiers come in. To match more or fewer than
one of something, simply follow any match with the appropriate quantifier. Here is a list of available quantifiers
and how they work:
? - Match 0 or 1 of the previous match
+ - Match 1 or more of the previous match
* - Match 0 or more of the previous match
{,n} - Where n is a number, match between 0 and n of the previous match
{n,} - Where n is a number, match n or more of the previous match
{n,m} - Where n and m are numbers, match at least n but no more than m of the previous match
Here are some examples of using quantifiers
MAIN
    // Matches ab followed by an optional c
    let pattern = /abc?/
    // Matches ab followed by 1 or more c characters
    pattern = /abc+/
    // Matches ab followed by 0 or more c characters
    pattern = /abc*/
    // Matches abc followed by 1 or more
    // white-space characters, followed by def
    pattern = /abc\s+def/
    // Matches abc followed by 2 or more groups
    // of def. So abcdefdef or abcdefdefdef, etc...
    pattern = /abc(def){2,}/
    // Matches abcdddef or abcddddef or abcdddddef
    pattern = /abcd{3,5}ef/
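Here is a TypeScript sketch of a few of the quantifiers above, running on JavaScript's engine. One difference to keep in mind: JavaScript has no {,n} quantifier, so there you would write {0,n} instead.

```typescript
// JavaScript analogue (not Stat code).
const threeToFive = /abcd{3,5}ef/;
const quantifierResults = [
  threeToFive.test("abcdddef"), // true: 3 d's
  threeToFive.test("abcddef"), // false: only 2 d's
  threeToFive.test("abcddddddef"), // false: 6 d's
  /abc(def){2,}/.test("abcdefdef"), // true
  /abc(def){2,}/.test("abcdef"), // false: only one "def"
  /abc\s+def/.test("abc \t def"), // true: space, tab, space
  /^a{0,2}$/.test("aaa"), // false: more than 2
];
console.log(quantifierResults);
```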
Regular expression quantifier greediness
By default, all regular expression quantifiers are greedy. This means that they try to match as much
of the string as possible while still satisfying the match. For example, if you match the
string "abcdddddef" with the regular expression /abcd{2,}/, it'll match abcddddd. Notice that it matched as many d characters as it could. However,
there is another way... we could make our quantifier non-greedy so that it matches as little of the
previous match as possible while still satisfying the match. To make a quantifier non-greedy, append a question mark
character to the quantifier. Here are some examples that outline this behavior:
MAIN
    // Matches abc followed by 2 or more d characters
    // Since it is greedy, it'll match as many as possible.
    let pattern = /abcd{2,}/
    // This makes the quantifier non-greedy, which means
    // it'll match as few as 2 d characters
    pattern = /abcd{2,}?/
    // Matches abc followed by 0 or 1 d characters,
    // preferring 0, followed by e
    pattern = /abcd??e/
    // Matches a dash followed by as many characters
    // as possible, followed by another dash.
    // Given this string: -abcd-efgh-
    // It'll match the entire string
    pattern = /-.*-/
    // Matches a dash followed by as few characters
    // as possible, followed by another dash.
    // Given this string: -abcd-efgh-
    // It'll match -abcd-
    pattern = /-.*?-/
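Here is a TypeScript sketch of the greedy and non-greedy examples above, assuming Stat backtracks the way JavaScript does:

```typescript
// JavaScript analogue (not Stat code).
const dashed = "-abcd-efgh-";
const greedy = dashed.match(/-.*-/)?.[0]; // "-abcd-efgh-"
const lazy = dashed.match(/-.*?-/)?.[0]; // "-abcd-"

const ds = "abcdddddef";
const greedyD = ds.match(/abcd{2,}/)?.[0]; // "abcddddd"
const lazyD = ds.match(/abcd{2,}?/)?.[0]; // "abcdd"

// d?? prefers zero d's, but backtracks to take one when "e" doesn't follow.
const lazyOptional = "abcde".match(/abcd??e/)?.[0]; // "abcde"

console.log(greedy, lazy, greedyD, lazyD, lazyOptional);
```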
Anchoring regular expression matches
Anchoring a regular expression means specifying the position in a string or binary value where a match
should begin or end. To anchor a regular expression to the beginning of the target, put a caret character
(^) at the beginning of the regular expression. To anchor a regular expression to the end of the target,
put a dollar sign character ($) at the end of the regular expression. These anchors don't match any characters;
rather, they assert that the match starts at the beginning or ends at the end of the target. Here are some examples:
MAIN
    // Matches abc, but only at the start
    let pattern = /^abc/
    // Matches abc, but only at the end
    pattern = /abc$/
    // Matches only abc, not abcd or 123abc
    pattern = /^abc$/
You can anchor not only at the start or end of the target, but also at the start or end of lines.
To make an anchor match at the start or end of a line, use the m "multi-line" flag
after the regular expression, like so:
MAIN
    // Matches abc, but only at the start
    // of the string or start of a line
    let pattern = /^abc/m
    // Matches abc, but only at the end
    // of a line or end of the string
    pattern = /abc$/m
    // Matches any line in the string
    // that is exactly abc
    pattern = /^abc$/m
To match the start of a line but also match a pattern on the previous line, place your anchor at the beginning
of a group. Likewise, to match the end of a line, but also match a pattern on the next line, place your anchor
at the end of a group. Here are some examples:
MAIN
    // Matches abc followed by 1 or more new
    // line characters, followed by def at the
    // beginning of a line
    let pattern = /abc\n+(^def)/m
    // Matches abc at the end of a line
    // followed by 1 or more new line characters,
    // followed by def
    pattern = /(abc$)\n+def/m
    // Matches 123 followed by 1 or more
    // new line characters, followed by abc on
    // a single line, followed by 1 or more newline
    // characters, followed by def
    pattern = /123\n+(^abc$)\n+def/m
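The anchor examples from this section can also be sketched in TypeScript. This runs on JavaScript's engine and assumes Stat's ^, $, and m flag behave the same way:

```typescript
// JavaScript analogue (not Stat code).
const text = "first\nabc\nlast";
const anchorResults = {
  startOnly: /^abc/.test("abcdef"), // true
  notAtStart: /^abc/.test("123abc"), // false
  wholeString: /^abc$/.test("abc"), // true
  wholeWithoutM: /^abc$/.test(text), // false: abc is not the whole string
  lineWithM: /^abc$/m.test(text), // true: abc is a whole line
  anchoredGroup: /abc\n+(^def)/m.test("abc\n\ndef"), // true
  notAtLineStart: /abc\n+(^def)/m.test("abc\n def"), // false: a space precedes def
};
console.log(anchorResults);
```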
Regular expression flags
There are a few flags that you can add after a regular expression to change its behavior when performing matches.
We've briefly mentioned a few of them above. Here is a list of all the available regular expression flags:
i - Case-insensitive flag. This flag makes any letter match case-insensitive, meaning that any letter matches both its lower case and upper case versions.
s - Dot all flag. This flag makes the dot character match all characters, including new line characters.
m - Multi-line flag. This flag makes the start anchor ^ match at the beginning of a line in addition to the beginning of the string, and makes the end anchor $ match at the end of a line in addition to the end of the string.
MAIN
    // Matches any letter, either upper or lower case
    let pattern = /[a-z]/i
    // Matches a line that is ab, aB, Ab, or AB
    pattern = /^ab$/mi
    // Matches literally everything
    pattern = /^.*$/s
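Here is a TypeScript sketch of the three flags. It runs on JavaScript's engine, which is assumed to match Stat here. The s flag goes through the RegExp constructor so that the snippet compiles regardless of the configured target.

```typescript
// JavaScript analogue (not Stat code).
const anyLetter = /[a-z]/i;
const abLine = /^ab$/mi;
const everything = new RegExp("^.*$", "s");

const flagResults = {
  upperCaseLetter: anyLetter.test("Q"), // true
  mixedCaseLine: abLine.test("xx\naB\nyy"), // true
  wholeInput: "a\nb".match(everything)?.[0], // "a\nb"
};
console.log(flagResults);
```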
Something to note
In Stat, there is no such thing as the g global match flag like there is in JavaScript and other languages.
When performing matches, the operator, not a flag in the regular expression, determines how many matches are made.
Multi-line Regular Expressions
All regular expression literals can be multi-line. There is no special way to code a multi-line regular expression, no "here doc" or triple quotes...
nope, just code it like you'd expect. The only thing to be aware of with multi-line regular expressions is that even they must
follow the correct indentation rules. The indentation rule for multi-line regular expressions is that each subsequent line must
be indented 1 more time from the first line of the regular expression. It's better to just show you some examples:
MAIN
    // Matches abc followed by a new line character
    // followed by def followed by another new
    // line character followed by ghi
    // abc\ndef\nghi
    let pattern = /abc
        def
        ghi/
    // The same thing but with escapes
    pattern = /abc\ndef\nghi/
    // This is here to show how indentation works
    // with multi-line regular expressions
    if true
        // Another regular expression but with
        // a trailing new line character
        pattern = /abc
            def
            ghi
            /
        // The same thing but with escapes
        pattern = /abc\ndef\nghi\n/
        // One more example with indentation
        pattern = /
            This is actually the 2nd line
                This is the 3rd line and it begins with 4 spaces
            This is the 4th line and does not start with any white-space/
[Illustration: how indentation works in multi-line regular expressions]
Regular Expression Interpolation
Regular expression interpolation is a technique for embedding string or binary values into regular expression literals. This makes it easier to assemble complex regular expressions
without having to use concatenation or convert a string or binary value to a regular expression. Stat has a powerful interpolation engine that lets you embed not only
variables, but also complex expressions, including function calls and even nested values with their own interpolations.
To embed a value inside a regular expression, use the following syntax:
\{value}. Here are some examples:
IMPORTS
    myFunction
MAIN
    let name = "John"
    let pattern = /abc \{name} def/
    // You can use any value that evaluates to a string or
    // binary, not just variables. This uses a function call
    pattern = /abc \{myFunction()} def/
    // You can even use other string or binary literals
    // Obviously, you could just write John in the regular expression itself,
    // but the purpose of this example is to show you what's possible
    pattern = /Hello \{"John"}/
    // You can even nest interpolations like this
    pattern = /Hello \{name + ". sub \{myFunction()}"}/
Something to note
Regular expression interpolation matches only literal characters. You cannot use regular expression interpolation to build a dynamic regular expression. To build a dynamic regular expression, use a value conversion from string or binary to regExp?.
MAIN
    let value = ".*"
    // Matches abc followed by a literal dot
    // and a literal asterisk character.
    let pattern = /abc\{value}/
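JavaScript has no literal-only interpolation like Stat's, so when you paste text into a pattern there, you have to escape the special characters yourself. This TypeScript sketch uses a hypothetical escapeRegExp helper to reproduce the literal matching described above. It illustrates the behavior and is not how Stat implements it.

```typescript
// Hypothetical helper: backslash-escapes every regex metacharacter so the
// text is matched literally, like the \{value} behavior described above.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const value = ".*";
const literalPattern = new RegExp("abc" + escapeRegExp(value)); // source: abc\.\*

const matchesLiteral = literalPattern.test("abc.*"); // true
const matchesAnything = literalPattern.test("abcxyz"); // false

console.log(literalPattern.source, matchesLiteral, matchesAnything);
```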
Creating Dynamic Regular Expressions
You can't use a regular expression literal to create a dynamic regular expression. While you can interpolate
string values into your regular expressions, those interpolations are matched literally, and any special
characters within them lose their special meaning and match only themselves.
To create a dynamic regular expression, that is, one that is pieced together from other values, you have to
create the pattern as a string, then convert that string to an optional regular expression
(regExp?) using the to operator. Notice that we aren't converting the string to just a regExp.
The reason is that the pattern could be invalid. For example, converting the string "abc[def" to a regular expression would not work because it's missing a closing bracket. If the
conversion fails, empty is returned.
MAIN
    let all = ".*"
    let stringPattern = "abc\{all}"
    // Converts the string to an optional regular expression
    // This will end up being /abc.*/
    let maybeRegExp = stringPattern to: regExp?
    // You can force the conversion if you're sure that it won't fail
    // If it does fail, then it'll evaluate to an empty regular expression that matches nothing
    let forcedRegExp = (stringPattern to: regExp?)!
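JavaScript's RegExp constructor throws a SyntaxError on an invalid pattern instead of returning an optional value. This TypeScript sketch wraps it in a hypothetical toRegExp helper that returns null where Stat would return empty:

```typescript
// Hypothetical helper mirroring `to: regExp?`: returns null (Stat returns
// empty) instead of letting the SyntaxError escape.
function toRegExp(source: string, flags: string = ""): RegExp | null {
  try {
    return new RegExp(source, flags);
  } catch (error) {
    if (error instanceof SyntaxError) {
      return null;
    }
    throw error;
  }
}

const all = ".*";
const maybeRegExp = toRegExp(`abc${all}`); // /abc.*/
const invalid = toRegExp("abc[def"); // null: missing closing bracket

console.log(maybeRegExp?.source, invalid);
```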
Testing For Matches
To see if a string or binary value matches a regular expression, you can use the "matches" operator,
which looks like this: ~=. The return value is a bool, which will be
either true or false. Here's what the syntax looks like:
MAIN
    let stringVal = "Hello World"
    let pattern = /l+o\b/
    // This will be true
    let doesMatch = stringVal ~= pattern
    // You can match against binary values too
    let binVal = `\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64`
    // This will also be true
    doesMatch = binVal ~= pattern
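In TypeScript, RegExp.prototype.test plays the role of ~=. JavaScript regular expressions only work on strings, so this sketch converts the bytes to a string first. That is fine for ASCII data like this, but it is an assumption of the sketch and not how Stat handles binary values.

```typescript
// JavaScript analogue of ~= (not Stat code): test returns a boolean.
const helloPattern = /l+o\b/;
const stringMatches = helloPattern.test("Hello World"); // true: "llo" before a space

// One character per byte; fine for ASCII data like this.
const bytes = [0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64];
const fromBytes = String.fromCharCode(...bytes); // "Hello World"
const bytesMatch = helloPattern.test(fromBytes); // true

console.log(stringMatches, fromBytes, bytesMatch);
```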
Getting a Match
You can get the first match in a string or binary value by using the "matchOnce" operator, which looks like
this: ~>. The return value will be of this type if you match against a string:
META
    type typeDef
typeDef: {
    match: string,
    range: range,
    captures: [
        { match: string, range: range }
    ]
}?
And it'll be this type if you match against a binary value:
META
    type typeDef
typeDef: {
    match: binary,
    range: range,
    captures: [
        { match: binary, range: range }
    ]
}?
Here are some examples of how to use the "matchOnce" ~> operator:
MAIN
    // This will be {
    //     match: "Hello",
    //     range: 1..6,
    //     captures: []
    // }
    let match = "Hello World" ~> /\w+/
    // This will be {
    //     match: "1.254.987",
    //     range: 1..10,
    //     captures: [{match: "987", range: 7..10}]
    // }
    match = "1.254.987 864.0.11210" ~> /(?:(\d+)\.?)+/
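Here is a TypeScript comparison using exec, which returns the first match much like ~>. Note that JavaScript positions start at 0, whereas the Stat ranges above start at 1. It also shows that a repeated group keeps only its last value, as in the 987 capture above. This runs on JavaScript's engine, not Stat.

```typescript
// JavaScript analogue of ~> (not Stat code).
const firstWord = /\w+/.exec("Hello World");
const firstNumber = /(?:(\d+)\.?)+/.exec("1.254.987 864.0.11210");

const onceResults = {
  word: firstWord?.[0], // "Hello"
  wordIndex: firstWord?.index, // 0 (JavaScript positions start at 0)
  number: firstNumber?.[0], // "1.254.987"
  lastCapture: firstNumber?.[1], // "987": a repeated group keeps its last value
};
console.log(onceResults);
```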
You can swap the order of the value and regular expression as well as the direction of the operator. No matter
how you write it, Stat understands what you mean. Here are some examples:
MAIN
    // All these statements are the same
    "Hello World" ~> /\w+/
    "Hello World" <~ /\w+/
    /\w+/ ~> "Hello World"
    /\w+/ <~ "Hello World"
Getting a List of All Matches
You can get a list of all matches in a string or binary value by using the "matchAll" operator, which looks like
this: ~>>. The return value will be of this type if you match against a string:
META
    type typeDef
typeDef: [
    {
        match: string,
        range: range,
        captures: [
            { match: string, range: range }
        ]
    }
]
And it'll be this type if you match against a binary value:
META
    type typeDef
typeDef: [
    {
        match: binary,
        range: range,
        captures: [
            { match: binary, range: range }
        ]
    }
]
Here are some examples of how to use the "matchAll" ~>> operator:
MAIN
    // This will be [
    //     {match: "Hello", range: 1..6, captures: []},
    //     {match: "World", range: 7..12, captures: []}
    // ]
    let matches = "Hello World" ~>> /\w+/
    // This will be [
    //     {
    //         match: "1.254.987",
    //         range: 1..10,
    //         captures: [{match: "987", range: 7..10}]
    //     },
    //     {
    //         match: "864.0.11210",
    //         range: 11..22,
    //         captures: [{match: "11210", range: 17..22}]
    //     }
    // ]
    matches = "1.254.987 864.0.11210" ~>> /(?:(\d+)\.?)+/
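JavaScript needs the g flag to find every match, which is exactly the job Stat moves into the operator. This TypeScript sketch uses a hypothetical allMatches helper to approximate ~>>. It returns each match followed by its captures, without ranges, and is not Stat code.

```typescript
// Hypothetical helper approximating ~>> in JavaScript (not Stat code).
// Each inner array holds the full match followed by its capture groups.
function allMatches(text: string, source: string): string[][] {
  const re = new RegExp(source, "g"); // JavaScript needs "g" to keep scanning
  const found: string[][] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.push(m.slice());
    if (m[0] === "") {
      re.lastIndex++; // avoid looping forever on empty matches
    }
  }
  return found;
}

const words = allMatches("Hello World", "\\w+");
const numbers = allMatches("1.254.987 864.0.11210", "(?:(\\d+)\\.?)+");
console.log(JSON.stringify(words)); // [["Hello"],["World"]]
console.log(JSON.stringify(numbers)); // [["1.254.987","987"],["864.0.11210","11210"]]
```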
You can swap the order of the value and regular expression as well as the direction of the operator. No matter
how you write it, Stat understands what you mean. Here are some examples:
MAIN
    // All these statements are the same
    "Hello World" ~>> /\w+/
    "Hello World" <<~ /\w+/
    /\w+/ ~>> "Hello World"
    /\w+/ <<~ "Hello World"
Something to note
In Stat, there is no such thing as the g global match flag like there is in JavaScript and other languages.
To get all matches, use the "matchAll" ~>> operator instead.