Rank by Boolean Expr BM25
Description
Ranks objects in SOURCE [OBJ,STRING] according to the relevance score of each STRING with respect to the expression in QUERY [STRING].
The relevance is computed using the Okapi BM25 ranking method.
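For orientation, the commonly cited form of the Okapi BM25 score is given below. This is the textbook formulation, not necessarily the exact variant this block implements; in particular, the IDF smoothing shown here is an assumption. Here f(q, D) is the number of times query term q occurs in string D, |D| is the length of D in tokens, avgdl is the average length over all strings in SOURCE, N is the number of strings, and n(q) is the number of strings containing q. The constants k_1 and b correspond to the k1 and b parameters.

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot
  \frac{f(q, D)\,(k_1 + 1)}
       {f(q, D) + k_1 \left(1 - b + b \, \frac{|D|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(q) = \ln\!\left(\frac{N - n(q) + 0.5}{n(q) + 0.5} + 1\right)
```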
Inputs
SOURCE [OBJ,STRING]
: a 2-column input of object-string pairs. Typically obtained with the Extract string block
Outputs
RESULT [OBJ]
: a list of ranked objects
Parameters
Query
: a boolean query
Stemming
: tokens can be stemmed for a specific language or left as they are
Case-sensitive
: if set to false, upper/lower case is ignored
Normalize diacritics
: transliterates non-ASCII characters into their closest ASCII form
Tokenization
: the method used to tokenize the input strings
    None
    : perform no tokenization
    Spaces
    : all valid Unicode space characters
    Spaces/Punctuation
    : Spaces + all valid Unicode punctuation characters
    Spaces/Punctuation/Digits
    : Spaces/Punctuation + all valid Unicode digit characters
    Spaces/Punctuation/Digits/Symbols
    : Spaces/Punctuation/Digits + all valid Unicode symbol characters
    Custom Regular Expression
    : any regular expression
Min token length
: tokens whose character length is shorter than this value are discarded
All query terms must match
: if set to true, only candidates where all tokens in QTERMS match a string in SOURCE are considered a match
k1
: controls non-linear term-frequency normalisation (saturation). A lower value means quicker saturation: additional occurrences of a term stop increasing the score sooner
b
: degree of document-length normalisation applied. 0 = no normalisation, 1 = full normalisation
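To make the interaction of these parameters concrete, here is a minimal Python sketch of a BM25 ranker over object-string pairs. It is an illustration under stated assumptions, not the block's actual implementation. The function names (preprocess, bm25_rank) and their keyword arguments are hypothetical. The query is treated as a flat list of terms rather than a parsed boolean expression, stemming is omitted, diacritics are removed by Unicode decomposition (an approximation of transliteration), tokenization is hard-wired to something close to Spaces/Punctuation, and the IDF smoothing is the commonly used "+1" variant.

```python
import math
import re
import unicodedata
from collections import Counter


def preprocess(text, case_sensitive=False, normalize_diacritics=True,
               min_token_length=1):
    """Hypothetical stand-in for the block's string preprocessing options."""
    if normalize_diacritics:
        # Decompose accented characters and drop the combining marks.
        text = "".join(
            ch for ch in unicodedata.normalize("NFKD", text)
            if not unicodedata.combining(ch)
        )
    if not case_sensitive:
        text = text.lower()
    # Split on runs of non-word characters (spaces, punctuation, symbols).
    tokens = re.split(r"[\W_]+", text)
    return [t for t in tokens if t and len(t) >= min_token_length]


def bm25_rank(source, query_terms, k1=1.2, b=0.75,
              all_terms_must_match=False):
    """Rank (obj, string) pairs; returns objects by descending BM25 score."""
    docs = [(obj, preprocess(s)) for obj, s in source]
    terms = [t for q in query_terms for t in preprocess(q)]
    n_docs = len(docs)
    avg_len = sum(len(toks) for _, toks in docs) / n_docs if n_docs else 0.0

    # Document frequency: in how many strings does each token occur?
    df = Counter()
    for _, toks in docs:
        df.update(set(toks))

    scored = []
    for obj, toks in docs:
        tf = Counter(toks)
        matched = [t for t in terms if tf[t] > 0]
        # Skip non-matching strings; optionally require every term to match.
        if not matched or (all_terms_must_match and len(matched) < len(terms)):
            continue
        # b = 0 ignores string length; b = 1 applies full length normalisation.
        norm = 1.0 - b + b * len(toks) / avg_len
        score = 0.0
        for t in matched:
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            # Lower k1 makes repeated occurrences saturate sooner.
            score += idf * tf[t] * (k1 + 1.0) / (tf[t] + k1 * norm)
        scored.append((score, obj))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [obj for _, obj in scored]


source = [
    ("a", "The quick brown fox"),
    ("b", "Café culture in Paris, a long guide about coffee and cafés everywhere"),
    ("c", "fox fox fox jumps"),
]
print(bm25_rank(source, ["fox"]))
print(bm25_rank(source, ["quick", "fox"], all_terms_must_match=True))
```

In this sketch, strings with no matching term are dropped from the result entirely; whether the block does the same when All query terms must match is false is not stated here.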