Match Strings
Description
Finds matches between the STRING-columns in the inputs.
Several comparison functions can be chosen: equal, contains, containsWholeWord, startsWith, endsWith, prefix, levenshtein (edit distance) or jaro-winkler.
The result provides both the matching strings, and the strings from both inputs that didn't generate a match.
It is strongly recommended that the inputs are already deduplicated. This block does not do that, and duplicates can increase computation time.
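As an illustration of the deduplication recommended above, here is a minimal Python sketch that removes duplicates from a list of candidates while keeping their original order. The helper name `deduplicate` and the sample data are hypothetical and not part of the block itself.

```python
from typing import Iterable, List


def deduplicate(strings: Iterable[str]) -> List[str]:
    """Return the strings without duplicates, keeping first-seen order."""
    # dict.fromkeys keeps one key per distinct string and preserves
    # insertion order, so the first occurrence of each string wins.
    return list(dict.fromkeys(strings))


candidates = ["apple", "banana", "apple", "cherry", "banana"]
unique_candidates = deduplicate(candidates)
print(unique_candidates)
```

If the block compares every string in A with every string in B (an assumption about its internals), removing duplicates from both inputs reduces the number of comparisons accordingly.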
Input
A [STRING]: a list of candidates
B [STRING]: a list of candidates
Output
RESULT [STRING,STRING]: the matched strings from A and B
NOTA [STRING]: the strings from A that did not match any string from B
NOTB [STRING]: the strings from B that did not match any string from A
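The relationship between the three outputs can be sketched in Python as follows. This is only an illustration under stated assumptions, not the block's actual implementation: the function name `match_strings` is hypothetical, the comparison is passed in as a plain function, and every string in A is assumed to be compared with every string in B.

```python
from typing import Callable, List, Tuple


def match_strings(
    a: List[str],
    b: List[str],
    compare: Callable[[str, str], bool],
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Return (RESULT, NOTA, NOTB) for two lists of strings."""
    result = []          # RESULT: matched (A, B) pairs
    matched_a = set()    # strings from A that matched at least once
    matched_b = set()    # strings from B that matched at least once
    for x in a:
        for y in b:
            if compare(x, y):
                result.append((x, y))
                matched_a.add(x)
                matched_b.add(y)
    not_a = [x for x in a if x not in matched_a]  # NOTA
    not_b = [y for y in b if y not in matched_b]  # NOTB
    return result, not_a, not_b


# Example with a "contains"-style comparison: the string from B
# must be contained in the string from A.
result, not_a, not_b = match_strings(
    ["apple pie", "banana"],
    ["apple", "cherry"],
    lambda x, y: y in x,
)
print(result, not_a, not_b)
```

In this sketch a string that occurs twice in an input produces the same pair twice in RESULT, which is one reason to deduplicate the inputs first.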
Parameters
Comparison: the comparison function to use
  equal: the strings must be equal
  contains: the string in B must be contained in A
  containsWholeWord: the string in B must be contained in A as a whole word (only punctuation/spaces around it)
  startsWith: the string in A must start with B
  endsWith: the string in A must end with B
  prefix: the strings in A and B share a prefix of a given length
  levenshtein: the string in A may differ from B by at most Max edit-distance edits (character insertions, deletions or substitutions)
  jaro-winkler: the strings in A and B must have a Jaro-Winkler similarity score of at least Min similarity
Case-sensitive: if set to false, upper/lower case is ignored
Exclude self-matches: if set to true, a match is not emitted when the objects in A and B are the same. Mostly useful when A and B come from the same source
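The parameters above can be made concrete with a small Python sketch. It is a hedged illustration, not the block's implementation. The names `make_comparison`, `find_matches`, `max_edit_distance` and `prefix_length` are hypothetical. The whole-word check treats any non-word character or the string boundary as a word border, strings shorter than the prefix length are assumed not to match, and self-matches are assumed to mean identical strings. The edit distance is the standard Levenshtein distance, and whether the block counts edits in exactly the same way is an assumption. Jaro-Winkler is left out for brevity.

```python
import re
from typing import Callable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def make_comparison(name: str, *, case_sensitive: bool = True,
                    max_edit_distance: int = 1,
                    prefix_length: int = 3) -> Callable[[str, str], bool]:
    """Build a predicate compare(a, b) for one comparison option."""
    n = prefix_length
    checks = {
        "equal": lambda a, b: a == b,
        "contains": lambda a, b: b in a,
        "containsWholeWord": lambda a, b: re.search(
            r"(?<!\w)" + re.escape(b) + r"(?!\w)", a) is not None,
        "startsWith": lambda a, b: a.startswith(b),
        "endsWith": lambda a, b: a.endswith(b),
        "prefix": lambda a, b: len(a) >= n and len(b) >= n and a[:n] == b[:n],
        "levenshtein": lambda a, b: levenshtein(a, b) <= max_edit_distance,
    }
    check = checks[name]
    if case_sensitive:
        return check
    # Case-insensitive: fold both strings before comparing.
    return lambda a, b: check(a.casefold(), b.casefold())


def find_matches(a_items: List[str], b_items: List[str],
                 compare: Callable[[str, str], bool],
                 exclude_self_matches: bool = False) -> List[Tuple[str, str]]:
    """Return matching (a, b) pairs, optionally skipping identical strings."""
    return [(a, b) for a in a_items for b in b_items
            if not (exclude_self_matches and a == b) and compare(a, b)]


# Matching a list against itself, ignoring case and skipping self-matches.
names = ["Anna", "anna", "Bob"]
compare = make_comparison("equal", case_sensitive=False)
pairs = find_matches(names, names, compare, exclude_self_matches=True)
print(pairs)
```

Without `exclude_self_matches`, every string in `names` would also be paired with itself, which is rarely useful when A and B come from the same source.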