Language Transliteration
Description
This block transliterates language-specific sets of Unicode characters to their ASCII representation (e.g., for German, Busch-Jäger becomes Busch-Jaeger).
By contrast, a language-unaware transliteration would turn that into Busch-Jager, because ä is in general closest to a.
Note: This applies only to small sets of transliterations that are specific to a language. All language-unaware transliterations, such as ß to ss or æ to ae, are captured by the Normalize Diacritics block.
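The difference between the two kinds of transliteration can be illustrated with a minimal Python sketch. The function names and the German table below are hypothetical and only mirror the mappings documented for this block; `strip_diacritics` is a generic stand-in for a language-unaware approach, not the implementation of the Normalize Diacritics block. The upper-case mappings (e.g., Ä to Ae) are an assumption, since the exact casing of the output is not specified here.

```python
import unicodedata

# Hypothetical German table based on the mappings listed for this block.
# Assumption: upper-case letters map to a capitalized digraph (Ä -> Ae).
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}


def transliterate_german(text: str) -> str:
    """Language-aware: replace German umlauts with their two-letter forms."""
    return text.translate(str.maketrans(GERMAN))


def strip_diacritics(text: str) -> str:
    """Language-unaware stand-in: decompose and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(transliterate_german("Busch-Jäger"))  # Busch-Jaeger
print(strip_diacritics("Busch-Jäger"))      # Busch-Jager
```

Running the language-aware step first matters in a sketch like this: once the diacritic has been stripped, the information needed to produce "ae" is gone.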
Input
SOURCE [OBJ,STRING]: A 2-column input of object-string pairs. Typically obtained with the Extract Strings block.
Output
RESULT [OBJ,STRING]: The pairs from SOURCE, where the string has been modified.
STRINGS [STRING]: The modified strings, without the object they were paired to.
Parameters
Language: The language used for transliteration. Currently, this block supports the following languages:
- German (both lower and upper-case): ä to ae, ö to oe, ü to ue
- Swedish (both lower and upper-case): å to aa, ä to ae, ö to oe
- Norwegian/Danish (both lower and upper-case): å to aa, ø to oe
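As a rough illustration of how the Language parameter selects a mapping and how the two outputs relate to the input, here is a minimal Python sketch. It is not the block's actual implementation: `BASE_TABLES`, `build_table`, `run_block`, and the language keys are hypothetical names, and the upper-case handling (e.g., Å to Aa) is an assumption, since only the lower-case mappings are spelled out here.

```python
from typing import Any, Dict, List, Tuple

# Lower-case mappings as documented for each supported language.
BASE_TABLES: Dict[str, Dict[str, str]] = {
    "german": {"ä": "ae", "ö": "oe", "ü": "ue"},
    "swedish": {"å": "aa", "ä": "ae", "ö": "oe"},
    "norwegian_danish": {"å": "aa", "ø": "oe"},
}


def build_table(language: str) -> Dict[int, str]:
    """Build a translation table covering lower- and upper-case letters."""
    base = BASE_TABLES[language]
    table = dict(base)
    for src, dst in base.items():
        # Assumption: an upper-case letter maps to a capitalized digraph.
        table[src.upper()] = dst.capitalize()
    return str.maketrans(table)


def run_block(
    source: List[Tuple[Any, str]], language: str
) -> Tuple[List[Tuple[Any, str]], List[str]]:
    """Return (RESULT, STRINGS) for a list of (object, string) pairs."""
    table = build_table(language)
    result = [(obj, text.translate(table)) for obj, text in source]
    strings = [text for _, text in result]
    return result, strings


result, strings = run_block([(1, "Malmö"), (2, "Åre")], "swedish")
print(result)   # [(1, 'Malmoe'), (2, 'Aare')]
print(strings)  # ['Malmoe', 'Aare']
```

Characters outside the selected language's table are left unchanged in this sketch, so a Swedish run does not touch ü and a Norwegian/Danish run does not touch ä.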
Output scores can be aggregated and/or normalised.