Match Strings
Description
Finds matches between the STRING columns in the inputs.
Several comparison functions can be chosen, such as equal, contains, startsWith, endsWith, or edit distance (levenshtein).
The result provides both the matching strings and the strings from either input that did not generate a match.
It is strongly recommended that the inputs are already deduplicated. This block does not do that, and duplicates can increase computation time.
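As a minimal sketch of the recommended preparation step, the snippet below removes duplicates from a list of candidates before it is passed to the block. The helper name `deduplicate` and the sample data are hypothetical and only serve as illustration; in practice the deduplication would be done with whatever upstream tooling produces the inputs.

```python
def deduplicate(strings):
    """Drop repeated strings while keeping the first-seen order."""
    # dict preserves insertion order, so fromkeys keeps the first occurrence.
    return list(dict.fromkeys(strings))


# Hypothetical input with repeated candidates.
candidates = ["apple", "pear", "apple", "plum", "pear"]
unique_candidates = deduplicate(candidates)
```

Note that this treats strings as duplicates only when they are exactly equal, so "Apple" and "apple" would both be kept.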
Input
A [STRING]: a list of candidates
B [STRING]: a list of candidates
Output
RESULT [STRING,STRING]: the matched strings from A and B
NOTA [STRING]: the strings from A that did not match any string from B
NOTB [STRING]: the strings from B that did not match any string from A
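The relationship between the three outputs can be sketched as follows. This is an illustrative model only, not the block's actual implementation: the function `match_strings` and its `matches` predicate are hypothetical names, and the sketch assumes that every pair of strings from A and B is compared, which may not reflect how the block works internally.

```python
def match_strings(a, b, matches):
    """Return (result, not_a, not_b) for two lists of strings.

    `matches(x, y)` is a hypothetical predicate standing in for the
    block's Comparison parameter.
    """
    # RESULT: every (A, B) pair for which the comparison holds.
    result = [(x, y) for x in a for y in b if matches(x, y)]
    matched_a = {x for x, _ in result}
    matched_b = {y for _, y in result}
    # NOTA / NOTB: strings that took part in no match at all.
    not_a = [x for x in a if x not in matched_a]
    not_b = [y for y in b if y not in matched_b]
    return result, not_a, not_b


a = ["New York City", "Paris", "Berlin"]
b = ["York", "Rome"]
# A "contains"-style comparison: the string from B must occur in A.
result, not_a, not_b = match_strings(a, b, lambda x, y: y in x)
```

In this sketch, a string that matches several strings from the other input appears in several RESULT pairs but never in NOTA or NOTB.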
Parameters
Comparison: comparison function to use
  equal: the strings must be equal
  contains: the string in B must be contained in A
  containsWholeWord: the string in B must be contained in A as a whole word (only punctuation/spaces around)
  startsWith: the string in A must start with B
  endsWith: the string in A must end with B
  prefix: strings in A and B share a prefix of a given length
  levenshtein: the string in A may not have more than Max edit-distance differences (character insertions or deletions) with B
  jaro-winkler: the strings in A and B must have a Jaro-Winkler similarity score not smaller than Min similarity
Case-sensitive: if set to false, upper/lower case is ignored
Exclude self-matches: if set, a match is not emitted when the objects in A and B are the same. Mostly useful when A and B come from the same source
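The parameters above can be approximated with the following sketch. All names here (`make_comparison`, `find_matches`, `prefix_length`, `max_edit_distance`) are hypothetical and chosen for illustration. Several assumptions apply: the levenshtein option uses the standard edit distance, which also counts a substitution as one edit, so the block's own counting may differ; containsWholeWord treats any non-word character or the string boundary as a word separator; self-matches are detected by string equality rather than object identity; and jaro-winkler is omitted for brevity.

```python
import re


def levenshtein(s, t):
    """Standard edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,               # delete cs
                current[j - 1] + 1,            # insert ct
                previous[j - 1] + (cs != ct),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def make_comparison(name, case_sensitive=True, prefix_length=3,
                    max_edit_distance=1):
    """Build a predicate matches(a, b) for one comparison name."""

    def whole_word(a, b):
        # b must not be directly preceded or followed by a word character.
        pattern = r"(?<!\w)" + re.escape(b) + r"(?!\w)"
        return re.search(pattern, a) is not None

    def shared_prefix(a, b):
        n = prefix_length
        return len(a) >= n and len(b) >= n and a[:n] == b[:n]

    checks = {
        "equal": lambda a, b: a == b,
        "contains": lambda a, b: b in a,
        "containsWholeWord": whole_word,
        "startsWith": lambda a, b: a.startswith(b),
        "endsWith": lambda a, b: a.endswith(b),
        "prefix": shared_prefix,
        "levenshtein": lambda a, b: levenshtein(a, b) <= max_edit_distance,
    }
    check = checks[name]
    if case_sensitive:
        return check
    # Case-insensitive: normalise both strings before comparing.
    return lambda a, b: check(a.casefold(), b.casefold())


def find_matches(a_list, b_list, matches, exclude_self_matches=False):
    """Return all (a, b) pairs that satisfy the predicate."""
    return [
        (a, b)
        for a in a_list
        for b in b_list
        if not (exclude_self_matches and a == b) and matches(a, b)
    ]


# Both inputs come from the same source, so self-matches are excluded.
names = ["ann", "anna"]
pairs = find_matches(names, names, make_comparison("startsWith"),
                     exclude_self_matches=True)
```

With both inputs drawn from the same list, excluding self-matches keeps only the pairs that relate two different entries, which is usually what is wanted when searching one source for near-duplicates.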