Language Transliteration

Description

This block transliterates language-specific sets of Unicode characters to their ASCII representation. For German, for example, Busch-Jäger becomes Busch-Jaeger. A language-unaware transliteration would instead produce Busch-Jager, because ä is in general closest to a.

Note: This block covers only the small sets of transliterations that are specific to a language. Language-unaware transliterations, such as ß to ss or æ to ae, are handled by the Normalize Diacritics block.
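
To make the contrast concrete, a language-unaware fallback can be approximated in Python by decomposing each character and dropping its combining marks. This is only a sketch of the general idea, not the actual implementation of the Normalize Diacritics block; the function name strip_diacritics is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Language-unaware fallback (hypothetical helper).

    Decomposes each character into base letter + combining marks (NFD),
    then drops the combining marks, so "ä" becomes plain "a".
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Busch-Jäger"))  # Busch-Jager
```

With German selected, this block would instead yield Busch-Jaeger for the same input.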

Input

  • SOURCE [OBJ,STRING]: A 2-column input of object-string pairs. Typically obtained with the Extract Strings block.

Output

  • RESULT [OBJ,STRING]: The pairs from SOURCE, where the string has been transliterated.
  • STRINGS [STRING]: The transliterated strings, without the objects they were paired with.

Parameters

  • Language: The language used for transliteration. Currently, this block supports the following languages:
    • German (both lower- and upper-case):
      • ä to ae
      • ö to oe
      • ü to ue
    • Swedish (both lower- and upper-case):
      • å to aa
      • ä to ae
      • ö to oe
    • Norwegian/Danish (both lower- and upper-case):
      • å to aa
      • ø to oe
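
The mappings listed above can be sketched as per-language lookup tables in Python. This is an illustrative sketch, not the block's actual implementation: the names TABLES and transliterate, the language keys, and the way upper-case letters are rendered (Ä to Ae rather than AE) are all assumptions, since the source only states that both cases are supported.

```python
import unicodedata

# Hypothetical lookup tables built from the mappings listed above.
TABLES = {
    "german": {"ä": "ae", "ö": "oe", "ü": "ue"},
    "swedish": {"å": "aa", "ä": "ae", "ö": "oe"},
    "norwegian_danish": {"å": "aa", "ø": "oe"},
}


def transliterate(text: str, language: str) -> str:
    """Apply the language-specific table; leave all other characters as-is."""
    table = TABLES[language]
    # Compose first so "a" + combining diaeresis is seen as the single "ä".
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        repl = table.get(ch.lower())
        if repl is None:
            out.append(ch)                 # not language-specific: untouched
        elif ch.isupper():
            out.append(repl.capitalize())  # assumption: "Ä" becomes "Ae"
        else:
            out.append(repl)
    return "".join(out)


# SOURCE-like pairs (object, string); the object ids are made up.
source = [(1, "Busch-Jäger"), (2, "Grün")]
result = [(obj, transliterate(s, "german")) for obj, s in source]  # RESULT
strings = [s for _, s in result]                                   # STRINGS
print(strings)  # ['Busch-Jaeger', 'Gruen']
```

Characters outside the selected table pass through unchanged, so ø is left alone under German and ü is left alone under Swedish.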

Output scores can be aggregated and/or normalised.