Stem
Description
Extracts the stem (available for various languages) from all strings in a [OBJ,STRING] input.
Strings are expected to be single words (see the Tokenize block).
Input
SOURCE [OBJ,STRING]
: a 2-column input of object-string pairs. Typically obtained with the Extract string and Tokenize blocks.
Output
RESULT [OBJ,STRING]
: the pairs from SOURCE, where the string has been modified
STRINGS [STRING]
: the modified strings, without the object they were paired to
Parameters
Stemming
: strings (single words) can be stemmed for a specific language or left as they are
Output scores can be aggregated and/or normalized.
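Example
The relationship between SOURCE, RESULT and STRINGS can be illustrated with a minimal Python sketch. It is not the block's actual implementation: the suffix rules, the function names (toy_stem, stem_block) and the "english" language key are all hypothetical, chosen only to show the shape of the data. A real stemmer applies far more elaborate, language-specific rules.

```python
from typing import List, Optional, Tuple

# Hypothetical toy rules, NOT the block's real algorithm:
# (suffix, replacement) pairs tried in order, first match wins.
TOY_SUFFIX_RULES = {
    "english": [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")],
}


def toy_stem(word: str, language: Optional[str]) -> str:
    """Strip one known suffix; with language=None the word is left as it is."""
    if language is None:
        return word
    for suffix, replacement in TOY_SUFFIX_RULES[language]:
        # Keep at least 3 characters so very short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def stem_block(
    source: List[Tuple[str, str]], language: Optional[str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Mimic the two outputs: RESULT keeps the objects, STRINGS drops them."""
    result = [(obj, toy_stem(word, language)) for obj, word in source]
    strings = [word for _, word in result]
    return result, strings


# SOURCE: one (object, single word) pair per row, e.g. after tokenizing.
source = [("doc1", "walking"), ("doc1", "parties"), ("doc2", "cats")]
result, strings = stem_block(source, "english")
print(result)
print(strings)
```

Passing None as the language in this sketch corresponds to leaving the strings as they are: RESULT then equals SOURCE, and STRINGS is simply its second column.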