Match by String (blocking)
Description
Finds matches between the STRING columns of inputs A and B, comparing only the pairs listed in the Candidates input (blocking).
Several comparison options are available: equal, contains, containsWholeWord, startsWith, endsWith, prefix, levenshtein (edit distance) and jaro-winkler (similarity score).
The result contains the matching pairs as well as the items from each input that did not produce a match.
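The simpler comparison options behave like plain string predicates. The Python sketch below illustrates them; the function names are hypothetical, the direction of each test (A contains B, A starts with B, and so on) follows the Parameters section, and two details are assumptions: how "whole word" boundaries are detected, and that strings shorter than the prefix length never match under `prefix`.

```python
import re

# Hypothetical predicates for the simple comparison options.
# `a` is the string from input A, `b` the string from input B.

def equal(a: str, b: str) -> bool:
    return a == b

def contains(a: str, b: str) -> bool:
    # The string from B must occur somewhere in the string from A.
    return b in a

def contains_whole_word(a: str, b: str) -> bool:
    # B must occur in A with no word character (letter, digit, underscore)
    # directly before or after it. re.escape keeps B's characters literal.
    pattern = r"(?<!\w)" + re.escape(b) + r"(?!\w)"
    return re.search(pattern, a) is not None

def starts_with(a: str, b: str) -> bool:
    return a.startswith(b)

def ends_with(a: str, b: str) -> bool:
    return a.endswith(b)

def shares_prefix(a: str, b: str, length: int) -> bool:
    # Assumption: both strings must be at least `length` characters long
    # and agree on their first `length` characters.
    return len(a) >= length and len(b) >= length and a[:length] == b[:length]
```

For example, `contains_whole_word("Amsterdam Centraal", "Centra")` is false because "Centra" is followed by another letter, while `contains("Amsterdam Centraal", "Centra")` is true.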
Input
A [OBJ,STRING]: a list of candidates; the STRING column is used for comparison and the OBJ column is returned in the result
Candidates [OBJ,OBJ]: candidate pairs; only objects from A and B that appear in Candidates will be matched
B [OBJ,STRING]: a list of candidates; the STRING column is used for comparison and the OBJ column is returned in the result
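The blocking step can be pictured as a lookup: instead of comparing every row of A with every row of B, only the pairs named in Candidates are looked up and compared. The sketch below assumes that each Candidates row pairs an A object with a B object, that objects are hashable, and that an object may occur in several rows of A or B (each of its strings is then tried). The function name `candidate_comparisons` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

def candidate_comparisons(
    a_rows: Iterable[Tuple[Hashable, str]],
    b_rows: Iterable[Tuple[Hashable, str]],
    candidates: Iterable[Tuple[Hashable, Hashable]],
) -> Iterator[Tuple[Hashable, str, Hashable, str]]:
    """Yield (a_obj, a_str, b_obj, b_str) only for pairs listed in Candidates."""
    # Index the STRING column of each input by its OBJ column.
    a_strings: Dict[Hashable, List[str]] = defaultdict(list)
    for obj, text in a_rows:
        a_strings[obj].append(text)
    b_strings: Dict[Hashable, List[str]] = defaultdict(list)
    for obj, text in b_rows:
        b_strings[obj].append(text)

    for a_obj, b_obj in candidates:
        # Objects missing from A or B have no strings, so nothing is yielded.
        for a_text in a_strings.get(a_obj, []):
            for b_text in b_strings.get(b_obj, []):
                yield a_obj, a_text, b_obj, b_text
```

With A = `[("a1", "Amsterdam"), ("a2", "Rotterdam")]`, B = `[("b1", "Amsterdam"), ("b2", "Utrecht")]` and Candidates = `[("a1", "b1"), ("a2", "b2")]`, only two comparisons are produced; `a1` and `b2` are never compared even though both exist. Skipping such pairs is what makes blocking cheaper than comparing all A × B combinations.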
Output
RESULT [OBJ,OBJ]: the matched objects from A and B
NOTA [OBJ]: the objects from A that did not match any item from B
NOTB [OBJ]: the objects from B that did not match any item from A
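The NOTA and NOTB outputs follow directly from RESULT: an object is unmatched if it appears in no RESULT pair. A minimal sketch, assuming objects are hashable and that input order is preserved (the tool's actual ordering is not specified); the function name `unmatched` is hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

def unmatched(
    a_objects: Sequence[Hashable],
    b_objects: Sequence[Hashable],
    result: Sequence[Tuple[Hashable, Hashable]],
) -> Tuple[List[Hashable], List[Hashable]]:
    """Return (NOTA, NOTB): objects from A and B that occur in no RESULT pair."""
    matched_a = {a for a, _ in result}
    matched_b = {b for _, b in result}
    # Keep the original input order of the unmatched objects.
    not_a = [a for a in a_objects if a not in matched_a]
    not_b = [b for b in b_objects if b not in matched_b]
    return not_a, not_b
```

For example, `unmatched(["a1", "a2", "a3"], ["b1", "b2"], [("a1", "b1")])` returns `(["a2", "a3"], ["b2"])`.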
Parameters
Comparison: the comparison function to use:
  equal: the strings must be equal
  contains: the string in B must be contained in A
  containsWholeWord: the string in B must be contained in A as a whole word (surrounded only by punctuation or spaces)
  startsWith: the string in A must start with B
  endsWith: the string in A must end with B
  prefix: the strings in A and B must share a prefix of a given length
  levenshtein: the string in A may differ from B by at most Max edit-distance edits (character insertions, deletions or substitutions)
  jaro-winkler: the strings in A and B must have a Jaro-Winkler similarity score of at least Min similarity
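The two fuzzy options can be sketched with the textbook definitions of Levenshtein distance and Jaro-Winkler similarity. This is an illustration, not the tool's implementation: the Winkler scaling factor 0.1 and the 4-character prefix cap are the commonly used defaults and are assumptions here, and the helper names `levenshtein_match` and `jaro_winkler_match` are hypothetical stand-ins for the Max edit-distance and Min similarity checks.

```python
def levenshtein(s: str, t: str) -> int:
    # Dynamic programme over prefixes: prev[j] is the distance between
    # the prefix of s processed so far and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def jaro(s: str, t: str) -> float:
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_flags = [False] * len(s)
    t_flags = [False] * len(t)
    matches = 0
    for i, cs in enumerate(s):
        for j in range(max(0, i - window), min(i + window + 1, len(t))):
            if not t_flags[j] and t[j] == cs:
                s_flags[i] = t_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters that are out of order.
    s_matched = [c for c, f in zip(s, s_flags) if f]
    t_matched = [c for c, f in zip(t, t_flags) if f]
    transpositions = sum(x != y for x, y in zip(s_matched, t_matched)) / 2
    m = matches
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3

def jaro_winkler(s: str, t: str, p: float = 0.1, max_prefix: int = 4) -> float:
    sim = jaro(s, t)
    # Boost the score for a common prefix of up to `max_prefix` characters.
    prefix = 0
    for x, y in zip(s[:max_prefix], t[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)

def levenshtein_match(a: str, b: str, max_edit_distance: int) -> bool:
    return levenshtein(a, b) <= max_edit_distance

def jaro_winkler_match(a: str, b: str, min_similarity: float) -> bool:
    return jaro_winkler(a, b) >= min_similarity
```

As a check against well-known values, `levenshtein("kitten", "sitting")` is 3, and `jaro_winkler("MARTHA", "MARHTA")` is about 0.961 (Jaro score about 0.944, boosted by the shared prefix "MAR").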
Case-sensitive: if set to false, upper/lower case is ignored.
Exclude self-matches: if enabled, a pair is not emitted when the objects from A and B are the same. Mostly useful when A and B come from the same source.
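The two options can be applied as a filter around any comparison predicate. In this sketch, "the same object" is taken to mean equal under `==`, and case is ignored via `str.casefold`; both are assumptions, since the tool may use identity or a simpler lower-casing. The function name `apply_options` and the tuple layout `(a_obj, a_str, b_obj, b_str)` are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Iterator, Tuple

def apply_options(
    comparisons: Iterable[Tuple[Hashable, str, Hashable, str]],
    compare: Callable[[str, str], bool],
    case_sensitive: bool = True,
    exclude_self_matches: bool = False,
) -> Iterator[Tuple[Hashable, Hashable]]:
    """Yield (a_obj, b_obj) for each comparison that passes `compare`."""
    for a_obj, a_str, b_obj, b_str in comparisons:
        # Skip pairs where A and B refer to the same object.
        if exclude_self_matches and a_obj == b_obj:
            continue
        # Ignore upper/lower case by folding both strings first.
        if not case_sensitive:
            a_str, b_str = a_str.casefold(), b_str.casefold()
        if compare(a_str, b_str):
            yield a_obj, b_obj
```

When A and B are the same list, every object trivially matches itself; enabling `exclude_self_matches` drops those pairs so only matches between distinct objects remain.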