System Functions

MSK( )

Scan String for Mask

Formats

1.	Search Subject String for Pattern:	MSK(string$,mask$[,ERR=stmtref])
2.	Return Captured Sub-Pattern ('OM'=0 Only):	MSK(index[,ERR=stmtref]) (added in PxPlus 2014)

Where:
mask$	String containing the pattern/mask definition. If this value is null, then the previously used pattern is reused. String expression.
stmtref	Program line number or statement label to which to transfer control.
string$	String to search. Maximum string size 8KB.
index	Index of the captured sub-pattern to return, with 1-n being the captured sub-patterns, and 0 being the match for the whole pattern.

Returns

Format 1: Integer reporting the starting offset of the matched pattern mask$ in the subject string string$.

Format 2: Integer reporting the starting offset of the captured sub-pattern from the previous MSK( ) Format 1 call.

The MSL system variable and TCB(16) return the length of the string found for both formats.

Format 1

Use the MSK( ) function to scan a string for a specific pattern of characters. It reports the starting offset and length of the substring that matches the given mask (pattern): the offset is the function's return value, and the length is returned via the MSL system variable and TCB(16). The mask is a regular expression, and the regular expression syntax that is supported depends on the 'OM' parameter:

	'OM'=0	(Default) Perl compatible regular expressions (PCRE) are supported. This mode supports all of the syntax described below, returns the first (leftmost) match of the pattern unless the pattern specifies otherwise, and supports UTF-8.
	'OM'>0	Mostly POSIX compatible regular expressions are supported. This mode supports only some of the syntax described below, returns the longest match of the pattern, and does not support UTF-8.
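To make the difference between "first match" and "longest match" concrete, here is a Python sketch. It is not PxPlus code: Python's re module stands in for PCRE, and leftmost_longest() is a hypothetical brute-force emulation of POSIX semantics, not how any regex engine is implemented. For the mask cat|category, leftmost-first stops at the first alternative that succeeds, while leftmost-longest keeps the longer one.

```python
import re
from typing import Optional, Tuple


def leftmost_first(pattern: str, s: str) -> Optional[Tuple[int, int]]:
    # Perl/PCRE semantics: alternatives are tried in order; the first that succeeds wins.
    m = re.search(pattern, s)
    return None if m is None else (m.start() + 1, m.end() - m.start())


def leftmost_longest(pattern: str, s: str) -> Optional[Tuple[int, int]]:
    # POSIX semantics (brute force): earliest starting position, then the longest
    # match that starts there. Returns a 1-based offset and a length.
    p = re.compile(pattern)
    for start in range(len(s) + 1):
        for end in range(len(s), start - 1, -1):
            if p.fullmatch(s, start, end):
                return start + 1, end - start
    return None
```

leftmost_first("cat|category", "category") gives (1, 3), while leftmost_longest gives (1, 8). The brute force is quadratic, and because Python treats endpos as the end of the string, masks that use $ are not emulated correctly. Treat it as an illustration only.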

The following table displays a summary of the supported regular expression syntax:

Note:
This table does not list all allowed syntax for 'OM'=0 regular expressions. Only the ones most often used are listed. See http://manual.pvxplus.com/pcresyntax.html.

Mask Character	Format in mask$	Search
^ (Caret)	At the start of regular expression	To find a match with the start of the string being searched
$ (Dollar Sign)	At the end of regular expression	To find a match with the end of the string being searched
. (Period)	Anywhere in the pattern except within square brackets	To find a match with any character
(string)	String of characters (or other codes) enclosed in parentheses	To define a sub-pattern to match. If 'OM'=0 and PxPlus 2014 is used, the sub-pattern is also captured.
[string]	String enclosed in square brackets	To find a match with any character in that string
[^string]	Square brackets with a caret ^ as the first character of the string	To find a match with any character except the characters in the string. Example: [xyz] matches x, y, or z, while [^xyz] matches a, b, c, but not x, y, or z.
[str-ing]	Dash within a string in square brackets	To specify a range of characters. Example: [a-bd-z] matches any lowercase letter except c.
* (Asterisk)	At the end of a character (or sub-pattern)	To find a match with zero or more occurrences of the character (or sub-pattern). Example: In fo*, the * operates on the o; it matches f, fo, foo, etc. but does not match fa as a whole. The expression f(at)* matches f, fat, fatat, fatatat, etc.
+ (Plus Sign)	At the end of a character (or sub-pattern)	To find a match with one or more occurrences of the character (or sub-pattern). Example: fa+ matches fa, faa, faaa, etc. but not f.
{min,max}	At the end of a character (or sub-pattern)	To find a match with at least min and at most max occurrences of the character (or sub-pattern). Example: go{2,4}d matches good, goood, or gooood. Note: 'OM'=0 on PxPlus 2014 Only
? (Question Mark)	At the end of a character (or sub-pattern), or following a *, +, or {min,max} metacharacter	At the end of a character (or sub-pattern), indicates that it is optional. Example: colou?r matches color or colour, and sea(horse)? matches sea or seahorse. After a *, +, or {min,max} metacharacter, indicates that the preceding metacharacter should return the shortest possible match. Example: l+? matches l but not ll in the string hello world. Note: 'OM'=0 on PxPlus 2014 Only
| (Vertical Bar)	Separating two expressions	To find a match for either of the two expressions. Example: cat|dog matches cat or dog, c(at)|(dog) matches cat or dog, and my ((cat)|(dog)) matches my cat or my dog.
\ (Backslash)	Preceding a mask character	To indicate that the character that follows is to be taken literally. Example: \* matches a literal asterisk, so \** matches a run of zero or more asterisks and \*+ a run of one or more.
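The table entries can be checked against a PCRE-style engine. The sketch below is Python, not PxPlus. It assumes Python's re module agrees with 'OM'=0 mode for these simple examples, and found() is a hypothetical helper that returns the text of the first match.

```python
import re


def found(mask: str, s: str) -> str:
    """Text of the first (leftmost) match of mask in s, or "" when there is none."""
    m = re.search(mask, s)
    return m.group(0) if m else ""


examples = [
    (r"fo*", "foo"),                  # * : zero or more o's -> "foo"
    (r"fa+", "faaa"),                 # + : one or more a's -> "faaa"
    (r"go{2,4}d", "goood"),           # {min,max} : 2 to 4 o's -> "goood"
    (r"colou?r", "color"),            # ? : optional u -> "color"
    (r"sea(horse)?", "seahorse"),     # ? on a sub-pattern -> "seahorse"
    (r"l+?", "hello world"),          # +? : shortest run -> "l"
    (r"my ((cat)|(dog))", "my dog"),  # | : alternation -> "my dog"
    (r"\**", "**x"),                  # \* is a literal asterisk; \** takes a run -> "**"
    (r"[a-bd-z]", "cat"),             # range that skips c -> "a"
    (r"[^xyz]", "xya"),               # negated set -> "a"
]

for mask, text in examples:
    print(f"{mask!r:22} in {text!r:15} -> {found(mask, text)!r}")
```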

Escape Sequences and Character Types
The following can be used anywhere in the pattern to match special characters, common types of characters, and positions in the subject. Note: This only works for 'OM'=0 on PxPlus 2014.
\a	Alarm; that is, the BEL character (Hex 07)
\cx	"control-x", where x is any ASCII character
\e	Escape (Hex 1B)
\f	Form feed (Hex 0C)
\n	Linefeed (Hex 0A)
\r	Carriage return (Hex 0D)
\t	Tab (Hex 09)
\0dd	Character with octal code 0dd
\ddd	Character with octal code ddd or back reference
\o{ddd..}	Character with octal code ddd
\xhh	Character with Hex code hh
\x{hhh..}	Character with Hex code hhh (Non-JavaScript Mode)
\uhhhh	Character with Hex code hhhh (JavaScript Mode Only)
\d	Any decimal digit
\D	Any character that is not a decimal digit
\h	Any horizontal white space character
\H	Any character that is not a horizontal white space character
\s	Any white space character
\S	Any character that is not a white space character
\v	Any vertical white space character
\V	Any character that is not a vertical white space character
\w	Any "word" character
\W	Any "non-word" character
\b	Matches at a word boundary
\B	Matches when not at a word boundary
\A	Matches at the start of the subject
\Z	Matches at the end of the subject; also matches before a new line at the end of the subject
\z	Matches only at the end of the subject
\G	Matches at the first matching position in the subject
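Several of these escapes are shared by Python's re module, so they can be tried there. This is a Python sketch, not PxPlus code. Python's re does not support \h, \H, \V, \G, \cx, \e, or \o{...}, reads \v as the vertical-tab character rather than a class, and spells PCRE's \z as \Z, so only escapes common to both are used.

```python
import re

text = "Total:\t42 items"

digits = re.search(r"\d+", text).group(0)            # \d : run of decimal digits
tab_offset = re.search(r"\t", text).start() + 1      # \t : 1-based offset of the tab
hex_offset = re.search(r"\x41", "xyzA").start() + 1  # \xhh : 'A' is Hex 41
words = re.findall(r"\bit\w*", "it hit items")       # \b : "it" only at a word start
at_start = re.search(r"\ATotal", text) is not None   # \A : anchored at the subject start

print(digits, tab_offset, hex_offset, words, at_start)
```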

ASCII Character Classes
The following can be used inside any set enclosed in square brackets to match common types of characters (e.g. [[:digit:]%] matches any digit or a percent sign). Note: This only works for 'OM'=0 on PxPlus 2014.
[:alnum:]	Alphanumeric characters
[:alpha:]	Alphabetic characters
[:ascii:]	ASCII characters
[:blank:]	Space and tab
[:cntrl:]	Control characters
[:digit:]	Digits
[:graph:]	Visible characters (i.e. anything except spaces, control characters, etc.)
[:lower:]	Lowercase letters
[:print:]	Visible characters and spaces (i.e. anything except control characters, etc.)
[:punct:]	Punctuation and symbols
[:space:]	All white space characters, including line breaks
[:upper:]	Uppercase letters
[:word:]	Word characters (Letters, Numbers and Underscores)
[:xdigit:]	Hexadecimal digits
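Python's re module has no [:name:] syntax. One way to experiment with these classes outside PxPlus is therefore to expand them into explicit ASCII sets before compiling. The sketch below is Python, not PxPlus code. POSIX_ASCII and expand_posix() are hypothetical names, and only a subset of the classes is mapped, for ASCII text only.

```python
import re

# Hypothetical mapping of a few POSIX bracket classes to explicit ASCII members.
POSIX_ASCII = {
    "[:alnum:]": "A-Za-z0-9",
    "[:alpha:]": "A-Za-z",
    "[:blank:]": r" \t",
    "[:digit:]": "0-9",
    "[:lower:]": "a-z",
    "[:space:]": r" \t\n\r\f\v",
    "[:upper:]": "A-Z",
    "[:word:]": "A-Za-z0-9_",
    "[:xdigit:]": "0-9A-Fa-f",
}


def expand_posix(mask: str) -> str:
    # Replace each [:name:] token (valid only inside [...]) with its members.
    for name, members in POSIX_ASCII.items():
        mask = mask.replace(name, members)
    return mask


mask = expand_posix("[[:digit:]%]+")  # becomes "[0-9%]+"
m = re.search(mask, "rate: 15%")
print(mask, m.start() + 1, len(m.group(0)))
```

Here the digits and the percent sign are matched together, starting at offset 7 with a length of 3.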

UTF-8 Character Classes
The following can be used anywhere in the pattern to match common types of characters. Note: This only works for 'OM'=0 on PxPlus 2014.
\p{xx}	A character with the xx property
\P{xx}	A character without the xx property
\X	A Unicode extended grapheme cluster

Where xx can be:

C	Other	No	Other number
Cc	Control	P	Punctuation
Cf	Format	Pc	Connector punctuation
Cn	Unassigned	Pd	Dash punctuation
Co	Private use	Pe	Close punctuation
Cs	Surrogate	Pf	Final punctuation
L	Letter	Pi	Initial punctuation
Ll	Lowercase letter	Po	Other punctuation
Lm	Modifier letter	Ps	Open punctuation
Lo	Other letter	S	Symbol
Lt	Title case letter	Sc	Currency symbol
Lu	Uppercase letter	Sk	Modifier symbol
M	Mark	Sm	Mathematical symbol
Mc	Spacing mark	So	Other symbol
Me	Enclosing mark	Z	Separator
Mn	Non-spacing mark	Zl	Line separator
N	Number	Zp	Paragraph separator
Nd	Decimal number	Zs	Space separator
Nl	Letter number
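Each \p{xx} name is a Unicode general category. Python's re module does not support \p{...}, but its unicodedata module reports the same categories. The sketch below (Python, hypothetical helper names) therefore shows what \p{xx} tests for a single character: a one-letter name such as L covers every category starting with that letter, while a two-letter name such as Sc must match exactly. Offsets here are character positions; this page does not say whether MSK( ) reports byte or character offsets for UTF-8 subjects.

```python
import unicodedata
from typing import Optional


def has_property(ch: str, prop: str) -> bool:
    # One-letter names (L, N, P, ...) match any category in that group;
    # two-letter names (Lu, Nd, Sc, ...) must match the category exactly.
    category = unicodedata.category(ch)
    return category.startswith(prop) if len(prop) == 1 else category == prop


def first_with_property(s: str, prop: str) -> Optional[int]:
    # 1-based character position of the first character with the property.
    for i, ch in enumerate(s):
        if has_property(ch, prop):
            return i + 1
    return None


price = "Price: \u20ac5"  # U+20AC is the euro sign
print(first_with_property(price, "Sc"), first_with_property(price, "Nd"))
```

In the sample string, the euro sign (category Sc) is character 8 and the digit 5 (category Nd) is character 9.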

Format 2

Return the starting offset and length of the captured sub-pattern selected by index from the most recent Format 1 MSK( ) call. If index is 0, the starting offset and length of the whole match from that call are returned instead. If there was no previous Format 1 MSK( ) call, or the pattern did not include the specified sub-pattern, this call results in Error #42: Subscript out of range/Invalid subscript.

If the 'OM' parameter is not equal to 0, this call always results in Error #42: Subscript out of range/Invalid subscript when index > 0.
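The Format 1/Format 2 relationship (remember the last match, then query it by index) can be sketched in Python. CaptureScanner is a hypothetical class, not PxPlus code, and IndexError stands in for Error #42. The sketch also assumes that a Format 1 call that finds no match leaves nothing to query, which this page does not state.

```python
import re
from typing import Optional, Tuple


class CaptureScanner:
    """Hypothetical Python analogue of MSK( ) Formats 1 and 2 (not PxPlus code)."""

    def __init__(self) -> None:
        self._match: Optional[re.Match] = None

    def search(self, string: str, mask: str) -> Optional[Tuple[int, int]]:
        # Format 1: remember the match so that Format 2 can query it later.
        self._match = re.search(mask, string)
        return self.group(0) if self._match else None

    def group(self, index: int) -> Tuple[int, int]:
        # Format 2: index 0 is the whole match, 1..n are the captured sub-patterns.
        m = self._match
        if m is None or not 0 <= index <= m.re.groups:
            # Stands in for Error #42: Subscript out of range/Invalid subscript.
            raise IndexError("subscript out of range")
        # A group that took no part in the match has start() == -1 and yields (0, 0);
        # PxPlus behaviour for that case is not described on this page.
        return m.start(index) + 1, m.end(index) - m.start(index)
```

After search("the small fox", "the ((small|large) (raccoon|fox))") returns (1, 13), group(3) returns (11, 3). A call to group(4) raises IndexError because the mask has only three sub-patterns.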

Sub-patterns are parts of a mask/pattern string that are enclosed by parentheses (round brackets), which can be nested. Including a sub-pattern in a mask/pattern string does two things:

It defines the sub-pattern as a group where operators, such as +, will apply to the whole group instead of just the character that preceded it.

It allows Format 2 of the MSK( ) function to return the portion of the matched mask/pattern string that matched the sub-pattern. Opening parentheses are counted from left to right (starting from 1) to obtain indexes for the captured sub-patterns.

Example:

For the string "the small fox" and the pattern "the ((small|large) (raccoon|fox))", the captured sub-patterns are 1: "small fox", 2: "small", 3: "fox".

See http://manual.pvxplus.com/pcrepattern.html.

(Format 2 was added in PxPlus 2014.)

Example

Below is an example of using the MSK( ) function and the MSL variable to do pattern and sub-pattern matching:

?prm('OM')
0
string$="the small fox"
mask$="the ((small|large) (raccoon|fox))"
?msk(string$,mask$),msl
1 13
?msk(0),msl
1 13
?msk(1),msl
5 9
?msk(2),msl
5 5
?msk(3),msl
11 3

Below is another example of using the MSK( ) function for a more complex pattern search. In this example, the pattern matches a whole number in some text, and sub-pattern 2 returns just the number, without the surrounding white space:

?prm('OM')
0
string$="99 bottles of beer on the wall."
mask$="(\A|\s)(\d+)(\s|\.|\z)"
?msk(string$,mask$),msl
1 3
?msk(2),msl
1 2
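The same mask can be tried in Python with one change: Python's re spells PCRE's end-of-subject anchor \z as \Z. (Unlike PCRE's \Z, Python's \Z does not also match before a final newline.) The sketch is Python, not PxPlus, and find_number() is a hypothetical helper. It returns the 1-based offset and length of the whole match and of sub-pattern 2.

```python
import re
from typing import Optional, Tuple

# Same mask as above, with PCRE's \z written as Python's \Z.
WHOLE_NUMBER = re.compile(r"(\A|\s)(\d+)(\s|\.|\Z)")


def find_number(text: str) -> Optional[Tuple[Tuple[int, int], Tuple[int, int]]]:
    m = WHOLE_NUMBER.search(text)
    if m is None:
        return None
    whole = (m.start() + 1, m.end() - m.start())         # like MSK(string$,mask$),MSL
    digits = (m.start(2) + 1, m.end(2) - m.start(2))    # like MSK(2),MSL
    return whole, digits
```

find_number("The wall has 99 bottles.") returns ((13, 4), (14, 2)). The whole match includes the spaces on both sides, while sub-pattern 2 is just 99. For "abc123 " it returns None, because the digits are not preceded by white space or the start of the string.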

Note:
When the 'TL' system parameter (Thoroughbred® LIKE) is Off, the LIKE operator uses the same pattern matching as specified by the 'OM' parameter.

Thoroughbred® is a registered trademark of Thoroughbred Software International Inc.
