Match by RegEx

Description

Matches the STRING column of input A against the regular expressions in the STRING column of input B. The output contains the matched pairs, as well as the items from each input that did not produce a match.

Input

  • A [OBJ,STRING]: a list of candidates; the STRING column holds the text to be tested, and the OBJ column holds the object returned in the result
  • B [OBJ,STRING]: a list of candidates; the STRING column holds the RegEx pattern to test against, and the OBJ column holds the object returned in the result

Output

  • RESULT [OBJ,OBJ]: the matched objects from A and B
  • NOTA [OBJ]: the objects from A that did not match with an item from B
  • NOTB [OBJ]: the objects from B that did not match with an item from A

Parameters

  • Case-sensitive: if set to false, upper/lower case is ignored
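
The inputs, outputs, and the Case-sensitive parameter can be pictured with a minimal sketch. It uses Python's re module as a stand-in for the PCRE engine, and the function name match_by_regex is hypothetical. It also assumes that a pattern only needs to match part of the string (re.search); this page does not state whether the real function requires a full match.

```python
import re

def match_by_regex(a, b, case_sensitive=True):
    """a: list of (obj, text) pairs; b: list of (obj, pattern) pairs.

    Returns (RESULT, NOTA, NOTB) as plain Python lists.
    Assumption: a partial match anywhere in the text counts as a match.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = [(obj_b, re.compile(pattern, flags)) for obj_b, pattern in b]

    result = []
    matched_a, matched_b = set(), set()
    for i, (obj_a, text) in enumerate(a):
        for j, (obj_b, rx) in enumerate(compiled):
            if rx.search(text):
                result.append((obj_a, obj_b))
                matched_a.add(i)
                matched_b.add(j)

    # Items that never took part in a match, in input order.
    not_a = [obj for i, (obj, _) in enumerate(a) if i not in matched_a]
    not_b = [obj for j, (obj, _) in enumerate(b) if j not in matched_b]
    return result, not_a, not_b

# Illustrative data only.
a = [(1, "invoice_2021.pdf"), (2, "README.txt"), (3, "Invoice_2022.PDF")]
b = [("pdf", r"\.pdf$"), ("img", r"\.(png|jpe?g)$")]

result, not_a, not_b = match_by_regex(a, b, case_sensitive=False)
print(result)  # [(1, 'pdf'), (3, 'pdf')]
print(not_a)   # [2]
print(not_b)   # ['img']
```

With case_sensitive=True, item 3 no longer matches, because .PDF differs from .pdf in case, and it moves to NOTA.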

Output scores can be aggregated and/or normalized.

Regular Expressions

Regular expressions are internally evaluated by a PCRE engine. For a syntax reference, see this page. For a 1-page syntax reference, see this cheat-sheet.

Some of the Most Common Questions and Mistakes

  • Regular expressions are different from glob patterns using wildcards. In particular, * does NOT mean "anything"; it means "zero or more of the preceding element". To match anything, use .* instead.
  • All special characters (. * + ? | \ ( ) [ ] { } ^ $) must be escaped (prefixed with \) in the Pattern RegEx when they are meant literally.
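  • ^ indicates the beginning of the input text, or negation when it is the first character inside a character class (e.g., [^\d_-], where the literal hyphen is placed last so that it cannot be read as a range). $ indicates the end of the input text.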
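  • \b indicates a word boundary: the zero-width position between a word character (letter, digit, or underscore) and a non-word character (space, punctuation, etc.) or the start or end of the text.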
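
The points above can be checked with a few short calls. This is a sketch in Python's re module, which is assumed here to agree with PCRE on these basic constructs; the sample strings are made up for illustration.

```python
import re

# "*" is a quantifier, not a wildcard: "ab*" is "a" followed by zero or more "b".
glob_like = re.search(r"ab*", "a-long-name").group(0)   # "a"
wildcard = re.search(r"a.*", "a-long-name").group(0)    # "a-long-name"

# An unescaped "." matches any character; "\." matches only a literal dot.
unescaped_dot = bool(re.search(r"v1.0", "v1x0"))   # True
escaped_dot = bool(re.search(r"v1\.0", "v1x0"))    # False

# re.escape builds a pattern that matches a string literally.
literal = re.escape("a+b (c)")
literal_ok = bool(re.fullmatch(literal, "a+b (c)"))  # True

# "^" and "$" anchor the pattern to the start and end of the text.
whole = bool(re.search(r"^\d+$", "12345"))      # True
partial = bool(re.search(r"^\d+$", "id 12345")) # False

# "^" as the first character of a class negates it.
non_digit_runs = re.findall(r"[^\d_-]+", "ab-12_cd")  # ['ab', 'cd']

# "\b" matches between a word character and a non-word character.
words = re.findall(r"\bcat\b", "cat concat cat,")  # ['cat', 'cat']
```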

Examples

  • Find names in the form of Smith, John:
    • Pattern RegEx: \b[^,]+\s*,\s*\b\w+\b
  • Find any day of the week (with Case-sensitive = false):
    • Pattern RegEx: \b(mon|tues|wednes|thurs|fri|satur|sun)day\b
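
A short sketch of how these two kinds of pattern behave, again using Python's re module as a stand-in for PCRE and made-up sample text. The weekday pattern here spells out the stems tues and satur in full so that Tuesday and Saturday match; re.IGNORECASE plays the role of Case-sensitive = false.

```python
import re

# Names in the form "Smith, John".
name_rx = re.compile(r"\b[^,]+\s*,\s*\b\w+\b")
exact = name_rx.search("Smith, John").group(0)
in_context = name_rx.search("Contact: Smith, John (sales)").group(0)
print(exact)       # Smith, John
print(in_context)  # Contact: Smith, John

# Any day of the week, ignoring case.
day_rx = re.compile(r"\b(mon|tues|wednes|thurs|fri|satur|sun)day\b", re.IGNORECASE)
text = "Meet on TUESDAY or saturday, not someday."
days = [m.group(0) for m in day_rx.finditer(text)]
print(days)  # ['TUESDAY', 'saturday']

# The same pattern without the flag only finds the lower-case day.
day_rx_cs = re.compile(r"\b(mon|tues|wednes|thurs|fri|satur|sun)day\b")
days_cs = [m.group(0) for m in day_rx_cs.finditer(text)]
print(days_cs)  # ['saturday']
```

Note that [^,]+ accepts any character other than a comma, so text preceding the surname is included in the match. If only a single-word surname is wanted, a tighter first part such as \w+ may be a better fit.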