Tokenize [Obj,String]

Description

Separates a string into tokens. The tokenization method is defined by the Tokenization parameter (e.g., tokenize only at spaces, or also at all punctuation characters).
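
For illustration, the following sketch approximates spaces-only and spaces-plus-punctuation tokenization in plain Python; the function name and the ASCII-only character classes are simplifying assumptions, as the operator itself works on full Unicode categories:

  import re

  def tokenize(text, mode="spaces"):
      # Illustrative delimiter patterns: ASCII approximations of the
      # Unicode space/punctuation classes the operator actually uses.
      patterns = {
          "none": None,                              # whole string stays one token
          "spaces": r"\s+",                          # split at whitespace only
          "spaces_punct": r"[\s!-/:-@\[-`{-~]+",     # whitespace + ASCII punctuation/symbols
      }
      pattern = patterns[mode]
      if pattern is None:
          return [text]
      return [tok for tok in re.split(pattern, text) if tok]

  print(tokenize("Hello, world!"))                         # ['Hello,', 'world!']
  print(tokenize("Hello, world!", mode="spaces_punct"))    # ['Hello', 'world']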

Input

  • SOURCE [OBJ,STRING]: a list of object-string pairs. Each string is tokenized.

Output

  • PAIR [OBJ,STRING]: each result pair contains an object from the input source and one token from its tokenized string; each token is therefore returned as a separate result pair.
  • RESULT [STRING]: the extracted tokens. Use the score aggregation parameter to define how multiple occurrences of the same token are handled. Note that the reference to the object each token came from is lost; see the sketch after this list.
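
The sketch below shows how the two outputs differ for a small source list; the pair/result structures and the counting used for aggregation are simplified assumptions, not the operator's internal representation:

  from collections import Counter

  source = [("obj1", "red apple"), ("obj2", "green apple")]

  def tokens_of(text):
      return text.split()    # spaces-only tokenization for brevity

  # PAIR output: every token keeps the reference to the object it came from.
  pair = [(obj, tok) for obj, text in source for tok in tokens_of(text)]
  # -> [('obj1', 'red'), ('obj1', 'apple'), ('obj2', 'green'), ('obj2', 'apple')]

  # RESULT output: only the tokens; identical tokens are merged (plain counting
  # stands in for the score aggregation parameter) and the object reference is lost.
  result = Counter(tok for _, tok in pair)
  # -> Counter({'apple': 2, 'red': 1, 'green': 1})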

Parameters

  • Tokenization: the method to tokenize the input strings.
    • None: perform no tokenization
    • Spaces: all valid Unicode space characters
    • Spaces/Punctuation: Spaces + all valid Unicode punctuation characters
    • Spaces/Punctuation/Digits: Spaces/Punctuation + all valid Unicode digit characters
    • Spaces/Punctuation/Digits/Symbols: Spaces/Punctuation/Digits + all valid Unicode symbol characters
    • Custom Regular Expression: any regular expression
  • Min token length: tokens whose character length is shorter than this value are discarded
  • Gram type:
    • Word (default): each token is composed of UTF-8 word n-grams
    • Character: each token is composed of UTF-8 character n-grams
  • Grams: the size n of the n-gram tokens to extract (default is 1); see the sketch after this list
  • Stemming: tokens can be stemmed for a specific language or left as they are
  • Case-sensitive: if set to false, upper/lower case is ignored
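
The following sketch shows how min token length, gram type, gram size, and case sensitivity interact; stemming and the regular-expression modes are omitted, and all names are illustrative assumptions rather than the operator's actual code:

  def tokenize(text, min_len=1, gram_type="word", grams=1, case_sensitive=True):
      if not case_sensitive:
          text = text.lower()
      # Spaces-only tokenization, then drop tokens shorter than the minimum length.
      words = [w for w in text.split() if len(w) >= min_len]
      if gram_type == "word":
          # Word n-grams: sliding window of `grams` consecutive words.
          return [" ".join(words[i:i + grams]) for i in range(len(words) - grams + 1)]
      # Character n-grams: sliding window of `grams` characters inside each word.
      return [w[i:i + grams] for w in words for i in range(len(w) - grams + 1)]

  print(tokenize("New York City"))                 # ['New', 'York', 'City']
  print(tokenize("New York City", grams=2))        # ['New York', 'York City']
  print(tokenize("New York City", gram_type="character", grams=3, case_sensitive=False))
  # -> ['new', 'yor', 'ork', 'cit', 'ity']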

Output scores can be aggregated and/or normalized.
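
As an illustration, the sketch below aggregates duplicate tokens by counting them and then normalizes the scores against the maximum; both choices are assumptions standing in for whichever aggregation and normalization options are configured:

  from collections import Counter

  tokens = ["apple", "red", "apple", "green"]

  # Aggregate: count the occurrences of each distinct token as its score.
  scores = Counter(tokens)            # Counter({'apple': 2, 'red': 1, 'green': 1})

  # Normalize: divide by the largest score so all values fall into (0, 1].
  top = max(scores.values())
  normalized = {tok: s / top for tok, s in scores.items()}
  # -> {'apple': 1.0, 'red': 0.5, 'green': 0.5}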