String Fingerprint

Description

Produces a fingerprint of every string in an [OBJ,STRING] input. The fingerprint is computed by applying the following steps in order:

  • lowercase
  • asciify (remove accents from letters)
  • tokenize
  • sort tokens
  • join the sorted tokens back into a single string
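
The steps above can be sketched in Python as follows. This is a minimal illustration, not the block's actual implementation: the function name fingerprint is hypothetical, and the sketch assumes that tokens are runs of letters and digits, that duplicate tokens are kept, and that tokens are rejoined with a single space. None of these details is specified here.

```python
import re
import unicodedata


def fingerprint(text: str) -> str:
    """Hypothetical sketch of the fingerprint transformation."""
    # 1. lowercase
    lowered = text.lower()
    # 2. asciify: decompose accented letters, then drop the non-ASCII marks
    decomposed = unicodedata.normalize("NFKD", lowered)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # 3. tokenize (assumption: a token is a run of letters or digits)
    tokens = re.findall(r"[a-z0-9]+", ascii_only)
    # 4. sort tokens, 5. join them back together (assumption: single space)
    return " ".join(sorted(tokens))


print(fingerprint("Zürich, Café"))
print(fingerprint("café  ZURICH"))
```

Under these assumptions both calls print the same fingerprint, which is what makes the transformation useful for grouping strings that differ only in case, accents, punctuation or word order.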

Input

  • SOURCE [OBJ,STRING]: a 2-column input of object-string pairs, typically obtained with the Extract string block.

Output

  • RESULT [OBJ,STRING]: the pairs from SOURCE, with each string replaced by its fingerprint
  • STRINGS [STRING]: the fingerprinted strings alone, without the objects they were paired with
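
The relationship between the two outputs can be illustrated with a short Python sketch. The name split_outputs and the transform argument are hypothetical, and the sketch assumes that STRINGS holds one entry per SOURCE pair, in the same order and without deduplication, which is not stated here. The built-in str.lower stands in for the full fingerprint transformation.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, str]


def split_outputs(
    source: Iterable[Pair], transform: Callable[[str], str]
) -> Tuple[List[Pair], List[str]]:
    """Hypothetical sketch: build RESULT and STRINGS from SOURCE."""
    # RESULT keeps each object and pairs it with the transformed string
    result = [(obj, transform(s)) for obj, s in source]
    # STRINGS drops the objects (assumption: same order, duplicates kept)
    strings = [s for _, s in result]
    return result, strings


source = [(1, "New York"), (2, "PARIS")]
result, strings = split_outputs(source, str.lower)  # str.lower is a stand-in
print(result)
print(strings)
```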

Parameters

Output scores can be aggregated and/or normalized.