Rank by Boolean Expr BM25
Description
Ranks the objects in SOURCE [OBJ,STRING] by the relevance score of each STRING against the boolean expression in QUERY [STRING].
Relevance is computed with the Okapi BM25 ranking function.
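As a rough illustration of how Okapi BM25 turns term matches into a relevance score, here is a minimal Python sketch. It is not this block's implementation: the function name `bm25_scores`, the default values `k1=1.2` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for the example, and the query is treated as a plain bag of terms, so boolean operators are not evaluated.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document (illustrative sketch only)."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs if n_docs else 0.0
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Assumed IDF variant: smoothed so that it never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: b=0 disables it, b=1 applies it fully.
            norm = 1 - b + b * len(doc) / avg_len
            # k1 controls how quickly repeated occurrences stop adding score.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores

# Hypothetical object ids paired with already-tokenized strings.
docs = {"a": ["red", "apple", "pie"], "b": ["green", "apple"], "c": ["blue", "sky"]}
scores = bm25_scores(["apple"], list(docs.values()))
ranked = [obj for obj, _ in sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)]
```

In this toy data, "b" ranks above "a" because both contain the query term once but "b" is the shorter string, and "c" scores zero because it contains no query term.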
Inputs
SOURCE [OBJ,STRING]: a 2-column input of object-string pairs. Typically obtained with the Extract string block.
Outputs
RESULT [OBJ]: a list of ranked objects
Parameters
Query: a boolean query
Stemming: tokens can be stemmed for a specific language or left as they are
Case-sensitive: if set to false, upper/lower case is ignored
Normalize diacritics: transliterates non-ASCII characters into their closest ASCII form
Tokenization: the method used to tokenize the input strings
    None: perform no tokenization
    Spaces: all valid Unicode space characters
    Spaces/Punctuation: Spaces + all valid Unicode punctuation characters
    Spaces/Punctuation/Digits: Spaces/Punctuation + all valid Unicode digit characters
    Spaces/Punctuation/Digits/Symbols: Spaces/Punctuation/Digits + all valid Unicode symbol characters
    Custom Regular Expression: any regular expression
Min token length: tokens whose character length is shorter than this value are discarded
All query terms must match: if set to true, only candidates where all tokens in QTERMS match a string in SOURCE are considered a match
k1: controls non-linear term frequency normalisation (saturation). A lower value means quicker saturation, so additional occurrences of a term stop adding to the score sooner
b: degree of document-length normalisation applied. 0 = no normalisation, 1 = full normalisation
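To make the text-processing parameters more concrete, the following Python sketch mimics Case-sensitive, Normalize diacritics, the four built-in Tokenization modes and Min token length. It is an approximation under stated assumptions, not the block's actual code: the names `tokenize`, `strip_diacritics` and `DELIMITER_CLASSES` are hypothetical, the delimiter sets are mapped onto Unicode general categories (P, N, S), and diacritics are handled by dropping combining marks, which only approximates a full ASCII transliteration. Stemming, the None mode and custom regular expressions are left out.

```python
import unicodedata

# Assumed mapping from Tokenization mode to extra Unicode general categories
# treated as delimiters (P = punctuation, N = numbers, S = symbols).
DELIMITER_CLASSES = {
    "spaces": set(),
    "spaces/punctuation": {"P"},
    "spaces/punctuation/digits": {"P", "N"},
    "spaces/punctuation/digits/symbols": {"P", "N", "S"},
}

def strip_diacritics(text):
    # Decompose characters, then drop combining marks (e.g. "é" becomes "e").
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def tokenize(text, mode="spaces/punctuation", case_sensitive=False,
             normalize_diacritics=True, min_token_length=1):
    if not case_sensitive:
        text = text.casefold()
    if normalize_diacritics:
        text = strip_diacritics(text)
    classes = DELIMITER_CLASSES[mode]
    tokens, current = [], []
    for ch in text:
        if ch.isspace() or unicodedata.category(ch)[0] in classes:
            # Delimiter reached: close the token collected so far.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    # Discard tokens shorter than the configured minimum length.
    return [t for t in tokens if len(t) >= min_token_length]

tokens = tokenize("Café-Bar 24/7!", mode="spaces/punctuation", min_token_length=2)
```

With these settings the example string yields the tokens cafe, bar and 24; the single-character token 7 is dropped by the minimum length filter.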