Filter/Match by String
Description
Finds matches between the STRING-columns in the inputs.
Various comparison options can be chosen: equals, contains, startsWith, endsWith or edit-distance.
Input
A [OBJ,STRING]: a list of candidates, in which the STRING-column will be used for comparison and the OBJ-column will be the result
B [STRING]: a list of candidate strings, to be used for comparison
Output
FILTER [OBJ,STRING]: the filtered [OBJ,STRING] from A
MATCH [OBJ,STRING,OBJ]: the matched [OBJ,STRING] from A and [STRING] from B
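The relationship between the two outputs can be sketched as follows. This is only an illustration, not the module's implementation: the function name `filter_and_match` and the `matches` predicate are hypothetical, and the sketch assumes that FILTER keeps each row of A once as soon as it matches any string in B, while MATCH emits one row per matching pair.

```python
from typing import Callable, Hashable, List, Tuple

Row = Tuple[Hashable, str]             # one [OBJ,STRING] row of input A
MatchRow = Tuple[Hashable, str, str]   # [OBJ,STRING] from A plus the string from B


def filter_and_match(
    a_rows: List[Row],
    b_strings: List[str],
    matches: Callable[[str, str], bool],
) -> Tuple[List[Row], List[MatchRow]]:
    """Hypothetical helper: build FILTER and MATCH from inputs A and B."""
    filtered: List[Row] = []
    matched: List[MatchRow] = []
    for obj, a_str in a_rows:
        # Every string of B that the predicate accepts for this row of A.
        hits = [b for b in b_strings if matches(a_str, b)]
        if hits:
            # Assumption: a row of A appears once in FILTER, however many hits it has.
            filtered.append((obj, a_str))
        matched.extend((obj, a_str, b) for b in hits)
    return filtered, matched


a_rows = [(1, "apple pie"), (2, "banana"), (3, "cherry tart")]
b_strings = ["apple", "tart", "pie"]

# "contains" comparison: the string in B must be contained in A.
filtered, matched = filter_and_match(a_rows, b_strings, lambda a, b: b in a)
print(filtered)
print(matched)
```

In this example the row `(1, "apple pie")` appears once in `filtered` but twice in `matched`, because it contains both "apple" and "pie".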
Parameters
Comparison: Comparison function to use
  equal: the strings must be equal
  contains: the string in B must be contained in A
  containsWholeWord: the string in B must be contained in A, as a whole word (only punctuation/spaces around)
  startsWith: the string in A must start with B
  endsWith: the string in A must end with B
  levenshtein: the string in A may not have more than Max edit-distance differences (character insertions or deletions) with B. The distance does not affect the score of the match.
  jaro-winkler: the strings in A and B must have a Jaro-Winkler similarity score not smaller than Min similarity. The distance does not affect the score of the match.
Case-sensitive: if set to false, upper/lower case is ignored
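As a rough illustration of how these parameters interact, the sketch below implements most of the comparison modes as a single predicate. It is not the module's code: the names `compare`, `levenshtein`, `case_sensitive` and `max_edit_distance` are hypothetical, the whole-word test approximates "only punctuation/spaces around" with a regular-expression word-boundary check, and the edit distance uses the standard Levenshtein definition (which also counts substitutions), so the module's own counting may differ. The jaro-winkler mode is left out.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def compare(a: str, b: str, comparison: str = "equal",
            case_sensitive: bool = True, max_edit_distance: int = 1) -> bool:
    """Hypothetical predicate: does string a (from A) match string b (from B)?"""
    if not case_sensitive:
        # Ignore upper/lower case by folding both sides first.
        a, b = a.casefold(), b.casefold()
    if comparison == "equal":
        return a == b
    if comparison == "contains":
        return b in a
    if comparison == "containsWholeWord":
        # Assumption: "whole word" means no word character directly before or after b.
        pattern = r"(?<!\w)" + re.escape(b) + r"(?!\w)"
        return re.search(pattern, a) is not None
    if comparison == "startsWith":
        return a.startswith(b)
    if comparison == "endsWith":
        return a.endswith(b)
    if comparison == "levenshtein":
        return levenshtein(a, b) <= max_edit_distance
    raise ValueError(f"unsupported comparison: {comparison}")


print(compare("Hello World", "hello", "startsWith"))                        # case-sensitive
print(compare("Hello World", "hello", "startsWith", case_sensitive=False))  # case ignored
print(compare("catalog", "cat", "contains"))
print(compare("catalog", "cat", "containsWholeWord"))
```

Note that the predicate only returns whether the pair matches; consistent with the description above, the distance itself is not turned into a score.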
Output scores can be aggregated and/or normalized.