Filter by RegEx
Description
Selects objects by the value of a string property, using regular expression matching.
Input
SOURCE [OBJ]: the list of objects to filter
Output
TRUE [OBJ]: the objects for which the selection applies
FALSE [OBJ]: the objects for which the selection does not apply
Parameters
Property: the string property to check. Use * to consider all properties.
Use sub-properties: when set to true, the values of all sub-properties are also included. Sub-properties can be defined in the data with the rdfs:subPropertyOf relation.
Pattern RegEx: the regular expression to use for the match.
Language: when a language is selected, only the strings in this language are extracted. This uses the language tags that are defined in the data.
Case-sensitive: if set to false, upper/lower case is ignored.
Output scores can be aggregated and/or normalized.
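The way Property, Pattern RegEx and Case-sensitive interact can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the component's implementation: the function name filter_by_regex and the object model (a dict mapping property names to lists of strings) are hypothetical, the pattern is assumed to match anywhere in a value (partial match), and the Use sub-properties and Language parameters are not modeled.

```python
import re

def filter_by_regex(objects, prop, pattern, case_sensitive=True):
    """Hypothetical sketch: split objects into (TRUE, FALSE) lists."""
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    true_out, false_out = [], []
    for obj in objects:
        if prop == "*":
            # "*" means: check the values of all properties
            values = [v for vals in obj.values() for v in vals]
        else:
            values = obj.get(prop, [])
        # Assumption: an object is selected if any value contains a match
        if any(regex.search(v) for v in values):
            true_out.append(obj)
        else:
            false_out.append(obj)
    return true_out, false_out

# Made-up sample data for illustration only
source = [
    {"name": ["Aspirin"], "label": ["pain relief"]},
    {"name": ["Ibuprofen"]},
    {"label": ["ASPIRIN 500mg"]},
]
true_out, false_out = filter_by_regex(source, "*", r"aspirin", case_sensitive=False)
strict_true, strict_false = filter_by_regex(source, "name", r"aspirin")
```

In this sketch, the first call selects the first and third objects; the second call selects nothing, because only the name property is checked and the match is case-sensitive.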
Regular Expressions
Regular expressions are internally evaluated by a PCRE engine. For a syntax reference, see this page. For a 1-page syntax reference, see this cheat-sheet.
Some of the Most Common Questions and Mistakes
- Regular expressions are different from glob patterns using wildcards. In particular, * does NOT mean "anything", .* does.
- All special characters (. * + ? | \ ( ) [ ] ^ $) must be escaped (prefixed with \) when they are meant literally in the Pattern RegEx.
- ^ indicates the beginning of an input text, or negation when used inside a multiple choice (e.g., [^\d-_]).
- $ indicates the end of an input text.
- \b indicates a word-boundary (spaces, punctuation, etc.).
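The points above can be checked quickly outside the tool. The following is a minimal sketch in Python, assuming that its re module behaves like the PCRE engine for these basic constructs (it is a different engine, so more advanced syntax may differ); the sample strings are made up.

```python
import re

text = "price: 3.50 (approx.)"

# ".*" matches any run of characters; "*" alone is only a quantifier
glob_like = re.search(r"price.*approx", text) is not None

# An unescaped "." matches any character, so "3.50" also matches "3x50"
unescaped_dot = re.search(r"3.50", "3x50") is not None
escaped_dot = re.search(r"3\.50", "3x50") is not None

# re.escape adds the backslashes needed to treat a string literally
literal = re.escape("(approx.)")
literal_hit = re.search(literal, text) is not None

# ^ anchors at the beginning of the text, $ at the end
starts = re.search(r"^price", text) is not None
ends = re.search(r"approx\.\)$", text) is not None

# Inside [...], a leading ^ negates the choice: runs of non-digits
non_digits = re.findall(r"[^\d]+", "a1b22c")

# \b only matches at a word boundary
whole_word = re.search(r"\bcat\b", "a cat sat") is not None
inside_word = re.search(r"\bcat\b", "concatenate") is not None
```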
Examples
- Find names in the form of Smith, John:
  Pattern RegEx: \b[^,]+\s*,\s*\b\w+\b
- Find any day of the week (with Case-sensitive = false):
  Pattern RegEx: \b(mon|tues|wednes|thurs|fri|satur|sun)day\b
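Patterns like these can be tried out before they are entered as Pattern RegEx. A minimal sketch in Python, assuming its re module is close enough to the PCRE engine for these two patterns (it is not the engine the component uses). The day pattern is written with tues and satur so that the full day names match, and re.IGNORECASE stands in for Case-sensitive = false.

```python
import re

# Names in the form "Smith, John"
NAME = re.compile(r"\b[^,]+\s*,\s*\b\w+\b")

# Any day of the week, ignoring upper/lower case
DAY = re.compile(r"\b(mon|tues|wednes|thurs|fri|satur|sun)day\b", re.IGNORECASE)

name_hit = NAME.search("Smith, John")
name_miss = NAME.search("John Smith")      # no comma, so no match

day_hit = DAY.search("Meeting on Tuesday")
day_plural = DAY.search("Mondays")          # trailing "s" breaks the final \b
day_other = DAY.search("someday")           # not a day name
```

Note that the trailing \b makes the day pattern reject plurals such as Mondays; drop it if those should be selected as well.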