Stem

Description

Reduces every string in an [OBJ,STRING] input to its stem. Stemmers are available for several languages. Each string is expected to be a single word (see the Tokenize block).

Input

  • SOURCE [OBJ,STRING]: a two-column input of object-string pairs, typically produced by the Extract string and Tokenize blocks.

Output

  • RESULT [OBJ,STRING]: the pairs from SOURCE, with each string replaced by its stem
  • STRINGS [STRING]: the stemmed strings alone, without the objects they were paired with
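The relationship between SOURCE, RESULT and STRINGS can be sketched in Python. This is a minimal illustration, not the block's actual implementation: the function names `stem_pairs` and `toy_stem` are hypothetical, `toy_stem` only strips a trailing plural "s" (it is not a real stemming algorithm), and keeping duplicates and input order in STRINGS is an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: an object id paired with a single-word string.
Pair = Tuple[str, str]

def stem_pairs(source: List[Pair], stem: Callable[[str], str]) -> Tuple[List[Pair], List[str]]:
    """Apply `stem` to the string of every (object, string) pair.

    Returns RESULT (pairs with stemmed strings, in input order) and
    STRINGS (the stemmed strings alone; keeping duplicates here is an
    assumption about the block's behaviour).
    """
    result = [(obj, stem(word)) for obj, word in source]
    strings = [word for _, word in result]
    return result, strings

def toy_stem(word: str) -> str:
    # Toy stand-in for a real stemmer: strips a trailing plural "s" only.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

source = [("doc1", "cats"), ("doc1", "sat"), ("doc2", "dogs")]
result, strings = stem_pairs(source, toy_stem)
print(result)   # [('doc1', 'cat'), ('doc1', 'sat'), ('doc2', 'dog')]
print(strings)  # ['cat', 'sat', 'dog']
```

The object column passes through untouched, so RESULT can be joined back to other [OBJ,...] data by object.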

Parameters

  • Stemming: the language whose stemmer is applied to each string (single word); alternatively, strings can be left as they are
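One way to model the Stemming parameter is a lookup from language name to stemming function, with "no language" mapping to the identity so strings are left as they are. The sketch below assumes this design; the names `STEMMERS`, `choose_stemmer` and `strip_english_suffix` are hypothetical, and the English rules are a toy suffix stripper, not the Porter or Snowball algorithm. A real version would plug in a proper stemmer per language.

```python
from typing import Callable, Dict, Optional

def strip_english_suffix(word: str) -> str:
    # Toy English rule set (NOT Porter/Snowball): remove the first
    # matching suffix, as long as at least 3 characters remain.
    for suffix in ("ingly", "edly", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical registry: language name -> stemming function.
STEMMERS: Dict[str, Callable[[str], str]] = {
    "english": strip_english_suffix,
}

def choose_stemmer(language: Optional[str]) -> Callable[[str], str]:
    """Return the stemmer for `language`, or the identity if None (no stemming)."""
    if language is None:
        return lambda word: word  # leave strings as they are
    try:
        return STEMMERS[language.lower()]
    except KeyError:
        raise ValueError(f"no stemmer for language: {language!r}") from None

words = ["jumping", "jumped", "boxes", "cat"]
print([choose_stemmer("english")(w) for w in words])  # ['jump', 'jump', 'box', 'cat']
print([choose_stemmer(None)(w) for w in words])       # ['jumping', 'jumped', 'boxes', 'cat']
```

Mapping different surface forms such as "jumping" and "jumped" to one stem is what lets later counting or scoring steps treat them as the same word.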

Output scores can be aggregated and/or normalized.