Match by String (blocking)
Description
Finds matches between the STRING-columns in the inputs.
Various comparison options can be chosen, for example equal, contains, startsWith, endsWith or edit distance (levenshtein).
The result provides both the matching items, and the items from both inputs that didn't generate a match.
Input
A [OBJ,STRING]: a list of candidates, in which the STRING-column will be used for comparison and the OBJ-column will be the result
Candidates [OBJ,OBJ]: candidate pairs, only As and Bs that are in Candidates will be matched
B [OBJ,STRING]: a list of candidates, in which the STRING-column will be used for comparison and the OBJ-column will be the result
Output
RESULT [OBJ,OBJ]: the matched objects from A and B
NOTA [OBJ]: the objects from A that did not match with an item from B
NOTB [OBJ]: the objects from B that did not match with an item from A
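The relationship between the three inputs and the three outputs can be illustrated with a minimal Python sketch. This is not the block's actual implementation: the function name `match_by_string`, its argument names and the use of plain tuples are assumptions made for illustration only. It shows how Candidates restricts which pairs are compared, and how NOTA and NOTB collect the objects that never matched.

```python
from typing import Callable, Hashable, Iterable


def match_by_string(
    a: Iterable[tuple[Hashable, str]],
    b: Iterable[tuple[Hashable, str]],
    candidates: Iterable[tuple[Hashable, Hashable]],
    compare: Callable[[str, str], bool],
):
    """Hypothetical sketch of the block's contract (names are assumptions)."""
    a_rows = list(a)
    b_rows = list(b)
    allowed = set(candidates)  # only these (A object, B object) pairs are compared

    result = []
    matched_a, matched_b = set(), set()
    for obj_a, str_a in a_rows:
        for obj_b, str_b in b_rows:
            if (obj_a, obj_b) in allowed and compare(str_a, str_b):
                result.append((obj_a, obj_b))
                matched_a.add(obj_a)
                matched_b.add(obj_b)

    # Objects that never took part in a match, de-duplicated, in input order.
    not_a = list(dict.fromkeys(o for o, _ in a_rows if o not in matched_a))
    not_b = list(dict.fromkeys(o for o, _ in b_rows if o not in matched_b))
    return result, not_a, not_b


# Example data (made up): "contains" is modelled as "B's string occurs in A's string".
a = [("a1", "New York City"), ("a2", "Boston")]
b = [("b1", "York"), ("b2", "Chicago")]
candidates = [("a1", "b1"), ("a2", "b2")]
result, not_a, not_b = match_by_string(a, b, candidates, lambda x, y: y in x)
```

In this example only the pair (a1, b1) ends up in RESULT, while a2 and b2 are reported in NOTA and NOTB respectively.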
Parameters
Comparison: Comparison function to use
  equal: the strings must be equal
  contains: the string in B must be contained in A
  containsWholeWord: the string in B must be contained in A, as a whole word (only punctuation/spaces around)
  startsWith: the string in A must start with B
  endsWith: the string in A must end with B
  prefix: strings in A and B share a prefix of a given length
  levenshtein: the string in A may not have more than Max edit-distance differences (character insertions, deletions or substitutions) with B
  jaro-winkler: the strings in A and B must have a Jaro-Winkler similarity score not smaller than Min similarity
Case-sensitive: if set to false, upper/lower case is ignored
Exclude self-matches: if set to true, a match is not emitted when the objects in A and B are the same. Mostly useful when A and B come from the same source