Match by RegEx
Description
Finds matches between the STRING columns of the two inputs, using regular expression matching.
The result contains the matched pairs as well as the items from each input that did not produce any match.
Input
A [OBJ,STRING]
: a list of candidates, in which the STRING column is used for comparison and the OBJ column is returned in the result
B [OBJ,STRING]
: a list of candidates, in which the STRING column is used as the RegEx pattern for comparison and the OBJ column is returned in the result
Output
RESULT [OBJ,OBJ]
: the matched objects from A and B
NOTA [OBJ]
: the objects from A that did not match any item from B
NOTB [OBJ]
: the objects from B that did not match any item from A
Parameters
Case-sensitive
: if set to false, upper/lower case is ignored
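The behavior described under Input, Output and Parameters can be sketched in Python. This is a minimal sketch under stated assumptions, not the node's actual implementation: the function name match_by_regex is hypothetical, a pair is assumed to match when B's pattern is found anywhere in A's string (search rather than full match), every matching pair is assumed to appear in RESULT, and Python's re module stands in for the PCRE engine (the two agree on common syntax but are not identical).

```python
import re
from typing import Any, List, Tuple

Row = Tuple[Any, str]  # (OBJ, STRING)


def match_by_regex(a: List[Row], b: List[Row], case_sensitive: bool = True):
    """Hypothetical sketch returning (RESULT, NOTA, NOTB)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # B's STRING column is the pattern; compile each pattern once.
    patterns = [(obj, re.compile(pat, flags)) for obj, pat in b]

    result: List[Tuple[Any, Any]] = []
    matched_a, matched_b = set(), set()  # indices that matched at least once
    for i, (a_obj, text) in enumerate(a):
        for j, (b_obj, rx) in enumerate(patterns):
            # Assumption: the pattern may match anywhere in A's STRING.
            if rx.search(text):
                result.append((a_obj, b_obj))
                matched_a.add(i)
                matched_b.add(j)

    not_a = [obj for i, (obj, _) in enumerate(a) if i not in matched_a]
    not_b = [obj for j, (obj, _) in enumerate(b) if j not in matched_b]
    return result, not_a, not_b


a = [("doc1", "Meeting on Monday"), ("doc2", "No date here")]
b = [("weekday", r"\b(mon|tues|wednes|thurs|fri|satur|sun)day\b"), ("year", r"\b\d{4}\b")]
result, not_a, not_b = match_by_regex(a, b, case_sensitive=False)
print(result)  # [('doc1', 'weekday')]
print(not_a)   # ['doc2']
print(not_b)   # ['year']
```

Every (A, B) pair is tested, so the cost grows with the product of the two input sizes; compiling each pattern once avoids recompiling it for every row of A. With case_sensitive=True, the same call returns no matches, because "mon" does not match "Monday".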
Output scores can be aggregated and/or normalized.
Regular Expressions
Regular expressions are internally evaluated by a PCRE engine. For a syntax reference, see this page. For a 1-page syntax reference, see this cheat-sheet.
Some of the Most Common Questions and Mistakes
- Regular expressions are different from glob patterns that use wildcards.
In particular, * does NOT mean "anything"; .* does.
- All special characters (. * + ? | \ ( ) [ ] { } ^ $) must be escaped (prefixed with \) when they are meant literally in the Pattern RegEx.
- ^ indicates the beginning of the input text, or negation when it is the first character inside a character class (e.g., [^\d_-] matches any character except a digit, underscore, or hyphen).
- $ indicates the end of the input text.
- \b indicates a word boundary: the zero-width position between a word character and a non-word character (space, punctuation, etc.) or the start/end of the text.
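Each of the points above can be checked with a few assertions. This sketch uses Python's re module as a stand-in for PCRE (an assumption: the two agree on these constructs but differ in some details), and fnmatch.translate to show how a glob pattern maps onto a regular expression.

```python
import fnmatch
import re

# Glob vs. RegEx: "*" repeats the preceding item, ".*" matches anything.
assert re.fullmatch(r"report.*", "report_2024.pdf")
assert not re.fullmatch(r"report*", "report_2024.pdf")  # "t*" = zero or more "t"
# fnmatch.translate converts a glob pattern into an equivalent regex.
assert re.match(fnmatch.translate("report*"), "report_2024.pdf")

# Escaping: an unescaped "." matches any character, so "v1.5" also hits "v105".
assert re.search(r"v1.5", "v105")
literal = re.escape("v1.5 (beta)")  # escapes ".", "(" and ")"
assert re.search(literal, "Release v1.5 (beta) notes")
assert not re.search(literal, "Release v105 (beta) notes")

# Anchors: ^ = start of text, $ = end of text.
assert re.search(r"^abc", "abcdef") and not re.search(r"^abc", "xabc")
assert re.search(r"def$", "abcdef") and not re.search(r"def$", "defx")

# Negated character class: anything except a digit, "_" or "-".
assert re.fullmatch(r"[^\d_-]+", "abc")
assert not re.search(r"[^\d_-]", "12_-3")

# \b is a zero-width word boundary: it matches a position, not a character.
assert re.search(r"\bcat\b", "the cat sat").group() == "cat"
assert not re.search(r"\bcat\b", "concatenate")
print("all checks passed")
```

Putting the hyphen last in a character class keeps it literal; a hyphen placed after a class escape such as \d is rejected as an invalid range by some engines.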
Examples
- Find names in the form of "Smith, John":
Pattern RegEx: \b[^,]+\s*,\s*\b\w+\b
- Find any day of the week (with Case-sensitive = false):
Pattern RegEx: \b(mon|tues|wednes|thurs|fri|satur|sun)day\b
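The two example patterns can be tried on a few sample strings. This is a sketch using Python's re module in place of PCRE (an assumption), with Case-sensitive = false modeled as re.IGNORECASE; first_match is a hypothetical helper. Note that the alternatives must spell each day's prefix in full (tues, satur), since "tue" followed by "day" would only match "tueday".

```python
import re

# The two example patterns; Case-sensitive = false is modeled with re.IGNORECASE.
NAME = re.compile(r"\b[^,]+\s*,\s*\b\w+\b")
WEEKDAY = re.compile(r"\b(mon|tues|wednes|thurs|fri|satur|sun)day\b", re.IGNORECASE)


def first_match(rx, text):
    """Return the first substring matched by rx, or None if there is none."""
    m = rx.search(text)
    return m.group() if m else None


names = {t: first_match(NAME, t) for t in ["Smith, John", "Doe , Jane", "John Smith"]}
days = {t: first_match(WEEKDAY, t)
        for t in ["See you on Saturday.", "TUESDAY", "Sundays", "someday"]}
print(names)
print(days)
```

"Sundays" is not matched because the trailing \b requires a non-word character (or the end of the text) right after "day". Because [^,]+ also accepts spaces, any words before the surname (for example "Dear Smith, John") become part of the name match.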