String Fingerprint
Description
Produces a fingerprint of every string in an [OBJ,STRING] input.
The fingerprint transformation applies the following steps:
- lowercase
- asciify (remove accents from letters)
- tokenize
- sort tokens
- join the sorted tokens back into a single string
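The steps above can be sketched in Python as follows. This is an illustrative sketch, not the block's actual implementation. It assumes that "tokenize" means splitting on whitespace, that tokens are rejoined with a single space, and that "asciify" means decomposing accented letters (Unicode NFKD) and dropping the combining accent marks. The function name `fingerprint` is hypothetical.

```python
import unicodedata


def fingerprint(text: str) -> str:
    """Hypothetical sketch of the fingerprint transformation."""
    # 1. Lowercase the whole string.
    lowered = text.lower()
    # 2. Asciify: split accented letters into base letter + combining mark
    #    (NFKD), then drop the combining marks (assumed interpretation).
    decomposed = unicodedata.normalize("NFKD", lowered)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # 3. Tokenize: assumed to be a split on runs of whitespace.
    tokens = unaccented.split()
    # 4. Sort the tokens so word order no longer matters.
    tokens.sort()
    # 5. Join the tokens back together (single space assumed as separator).
    return " ".join(tokens)


print(fingerprint("Café  de Paris"))  # cafe de paris
print(fingerprint("Paris Café de"))   # cafe de paris
```

Because the tokens are sorted, strings that differ only in case, accents, word order or spacing end up with the same fingerprint.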
Input
SOURCE [OBJ,STRING]: a two-column input of object-string pairs, typically obtained with the Extract string block.
Output
RESULT [OBJ,STRING]: the pairs from SOURCE, with each string replaced by its fingerprint.
STRINGS [STRING]: the fingerprinted strings alone, without the objects they were paired with.
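The relationship between SOURCE and the two outputs can be sketched as below. This is a hypothetical illustration: pairs are modelled as Python tuples, the helper names `fingerprint` and `string_fingerprint` are invented for the example, the fingerprint step uses the same assumptions as before (whitespace tokenization, accents removed through NFKD decomposition), and STRINGS is assumed to keep one entry per pair, with no deduplication.

```python
import unicodedata
from typing import Hashable, List, Tuple


def fingerprint(text: str) -> str:
    # Lowercase, strip accents (NFKD + drop combining marks),
    # split on whitespace, sort tokens, join with a single space.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(sorted(unaccented.split()))


def string_fingerprint(
    source: List[Tuple[Hashable, str]],
) -> Tuple[List[Tuple[Hashable, str]], List[str]]:
    # RESULT: same objects, each string replaced by its fingerprint.
    result = [(obj, fingerprint(s)) for obj, s in source]
    # STRINGS: the fingerprinted strings only, one per pair (assumption).
    strings = [s for _, s in result]
    return result, strings


source = [(1, "New York"), (2, "york new"), (3, "Zürich")]
result, strings = string_fingerprint(source)
print(result)   # [(1, 'new york'), (2, 'new york'), (3, 'zurich')]
print(strings)  # ['new york', 'new york', 'zurich']
```

Here objects 1 and 2 receive the same fingerprint, which is what makes the output useful for grouping near-duplicate strings.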
Parameters
Output scores can be aggregated and/or normalized.