Glossary

API

Acronym for Application Programming Interface: a software intermediary that allows two applications to talk to each other. In the context of Sparque Desk, it means the set of API endpoints that make up a service. For a typical search application there may be the following endpoints: an autocomplete, a search, a type-facet, an author-facet and a recommender endpoint. This set works as a whole; the service isn't complete if one of the endpoints in the set is not available. An active API is tied to a specific version of the data: each endpoint will yield information from the same version of the data (all endpoints will upgrade simultaneously).

See: Docs > APIs

API Endpoint

A component of an API that forms a touchpoint in the communication with other systems. In Sparque Desk, API endpoints are tied to search strategies to expose their functionality to the outside world.

An API endpoint is part of a Sparque request URL:

https://rest.sparque.ai/3.0/{WORKSPACE}/api/{API}/e/{ENDPOINT}/p/{PARAMETER}/{VALUE}/results

In the example below, the 'search' endpoint of the 'movies' API is requested to return results for the query 'pulp fiction':

https://rest.sparque.ai/3.0/fliques/api/movies/e/search/p/query/pulp%20fiction/results
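
As a minimal sketch, the URL pattern above can be assembled in Python with the standard library. The helper name `endpoint_url` is hypothetical. Repeating the `/p/{PARAMETER}/{VALUE}` pair for endpoints that take more than one parameter is an assumption, because only the single-parameter form is shown above.

```python
from urllib.parse import quote

BASE_URL = "https://rest.sparque.ai/3.0"

def endpoint_url(workspace: str, api: str, endpoint: str, params: dict) -> str:
    """Build a request URL following the pattern
    {BASE}/{WORKSPACE}/api/{API}/e/{ENDPOINT}/p/{PARAMETER}/{VALUE}/results."""
    parts = [BASE_URL, quote(workspace, safe=""), "api", quote(api, safe=""),
             "e", quote(endpoint, safe="")]
    for name, value in params.items():
        # Assumption: each parameter is appended as its own /p/{PARAMETER}/{VALUE} pair.
        # Values are percent-encoded, so a space becomes %20 and a slash becomes %2F.
        parts += ["p", quote(name, safe=""), quote(value, safe="")]
    parts.append("results")
    return "/".join(parts)
```

Calling `endpoint_url("fliques", "movies", "search", {"query": "pulp fiction"})` reproduces the example URL above.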

See: Docs > APIs > Endpoints

API Template

A pattern an API Endpoint or a search strategy conforms to. It denotes the parameter(s) it expects and the type of output it returns.

It can be found in the overview of endpoints within an API:

[Image: API template in the overview of endpoints]

and in the toolbar of the strategy editor:

[Image: API template in the toolbar of the strategy editor]

In this case, the endpoint/strategy expects the parameter 'query' of type STRING and returns results of type OBJ, an object from the database.

See: Docs > APIs, Docs > Strategies > Defining the API template

Attribute

In the context of a graph: a triple that specifies a literal value for a property (as opposed to a relation between two nodes).
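
A minimal sketch in Python of this distinction, assuming the object of a triple is either a resource (a node identified by a URI) or a literal value. The class names and the example.org URIs are hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Resource:
    """A node in the graph, identified by a URI (hypothetical model)."""
    uri: str

@dataclass(frozen=True)
class Literal:
    """A literal value such as a string or a number (hypothetical model)."""
    value: Union[str, int, float]

@dataclass(frozen=True)
class Triple:
    subject: Resource
    predicate: str
    obj: Union[Resource, Literal]

def is_attribute(triple: Triple) -> bool:
    # An attribute gives a property of the subject a literal value;
    # a triple whose object is another resource is a relation between two nodes.
    return isinstance(triple.obj, Literal)

movie = Resource("http://example.org/movie/pulp-fiction")
director = Resource("http://example.org/person/quentin-tarantino")

title = Triple(movie, "title", Literal("Pulp Fiction"))   # attribute
directed_by = Triple(movie, "director", director)         # relation
```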

Block

A constituent part of a search strategy. Each block performs an operation on the knowledge graph: filtering, ranking, transforming, combining, matching, etc. By combining blocks, operations of a higher complexity can be defined. Blocks are combined by connecting the output connector of one block to the input connector of another block.

Under the hood, a block contains a SpinQL snippet that defines how the probabilistic graph database should be queried in order to achieve the desired result.

Block Group

A set of blocks dedicated to a specific subtask within a search strategy.

Class

The specification of a concept.

Column Store (Database)

A type of database that stores data column by column rather than row by row. Because a query only has to read the columns it actually uses, this model is highly scalable and fast to load and query, which makes it well suited to powering search engines.
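
The difference between the two layouts can be illustrated with a toy example in Python. This is only a conceptual sketch of row-oriented versus column-oriented access; it says nothing about how Sparque's database physically stores data, and the function names are hypothetical.

```python
# Row-oriented layout: all fields of a record are kept together.
rows = [
    {"title": "Pulp Fiction", "year": 1994, "director": "Quentin Tarantino"},
    {"title": "Jackie Brown", "year": 1997, "director": "Quentin Tarantino"},
    {"title": "Kill Bill: Vol. 1", "year": 2003, "director": "Quentin Tarantino"},
]

# Column-oriented layout: one list per field, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def titles_after_rows(records, year):
    # Row layout: every whole record is visited, including fields the query ignores.
    return [r["title"] for r in records if r["year"] > year]

def titles_after_columns(cols, year):
    # Column layout: only the 'year' and 'title' columns are read;
    # the 'director' column is never touched.
    return [t for t, y in zip(cols["title"], cols["year"]) if y > year]
```

Both functions return the same answer; the column version simply never reads data it does not need, which is where the speed-up comes from on wide tables.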

Compiler

Depending on the context, it can mean a block-to-SpinQL compiler, a SpinQL-to-SQL compiler or an SQL-to-MAL compiler. Usually the SpinQL-to-SQL compiler is meant: it determines how to execute a search strategy and what to add to the cache to make execution as efficient as possible.

Concept

A mental representation of a category of objects in a certain domain.

Connector

A small circle at the top of a block (input connector) or at the bottom of a block (output connector) that allows blocks of matching data types to be connected to each other.

[Image: The output connector of a Datasets block being connected to the input connector of a Filter by Class block]

See: Connecting Building Blocks
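
The matching rule for connectors can be sketched as a type check performed when two blocks are connected. This is a minimal sketch, assuming each block has at most one input and one output connector. The `Block` class, the `connect` function, the 'Match Text' block and the data type assigned to each block are hypothetical; the real strategy editor may enforce further rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Block:
    """A hypothetical block with one typed input and one typed output connector."""
    name: str
    input_type: Optional[str]  # data type accepted at the input connector (None = no input)
    output_type: str           # data type produced at the output connector
    sources: List["Block"] = field(default_factory=list)  # blocks feeding this one

def connect(source: Block, target: Block) -> None:
    """Connect the output connector of `source` to the input connector of `target`."""
    if target.input_type is None:
        raise ValueError(f"{target.name} has no input connector")
    if source.output_type != target.input_type:
        # Connectors can only be linked when their data types match.
        raise TypeError(
            f"cannot connect {source.name} ({source.output_type}) "
            f"to {target.name} ({target.input_type})"
        )
    target.sources.append(source)

# Hypothetical blocks; OBJ stands for objects from the database.
datasets = Block("Datasets", input_type=None, output_type="OBJ")
filter_by_class = Block("Filter by Class", input_type="OBJ", output_type="OBJ")
match_text = Block("Match Text", input_type="STRING", output_type="OBJ")

connect(datasets, filter_by_class)  # types match: OBJ -> OBJ
```

Because `connect` only appends to the target's list of sources, the same output connector can feed several input connectors, as described under Output Connector.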

Container

Describes how to access external data that needs to be loaded. This can be flat files (such as CSV, XML, JSON or XLSX), but also remote data (for example via JDBC) or compressed contents.

Data Schema

The description of how data is organized in a database. In the case of a relational database, it defines, for example, the tables, their fields and the relations between them. See: Knowledge Graph

Data Silo

A data system that is incompatible or not integrated with other data systems. Data silos are created within organizations when different applications are used to conduct day-to-day business, each of which stores data in application-specific, proprietary formats. As a result, data is not adequately shared and remains trapped in a container like grain in a silo.

Data Source

Data in its raw form, obtained either from files (such as CSV, XML, JSON or XLSX) or through protocols (such as HTTP or JDBC).

Domain

A specific subject area for which an application is intended.

Entity

The instantiation of a class. In other words, an object within the domain, such as a document, a fact, or a product.

Graph

A (data) structure consisting of nodes (or entities / items / objects / points / resources / vertices) and edges (or attributes / links / lines / relations) between them.

Graph Database

A database that uses graph structures like nodes, edges, and properties to represent and store data. See: Triplestore

Information Need

The type of concept a user needs to complete a (work) task or to satisfy their curiosity, independent of the method used to address the need and regardless of whether the need is satisfied.

Information Retrieval

The computer science field that studies how information objects are obtained from a collection in response to an information need expressed by a user. Results are typically ranked by the relevance of each information object to the expressed information need.
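
Ranking by relevance can be illustrated with TF-IDF, a well-known textbook scoring function: a document scores higher when it contains the query terms often, and rare terms count more than common ones. This is a minimal sketch for illustration only, not a description of how Sparque ranks results; the function name and sample documents are hypothetical.

```python
import math
from collections import Counter

def rank(query, documents):
    """Rank documents by a simple TF-IDF score for the query terms."""
    tokens = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for words in tokens.values() for term in set(words))
    scores = {}
    for doc_id, words in tokens.items():
        tf = Counter(words)
        # Sum of term frequency * inverse document frequency over the query terms.
        score = sum(tf[t] * math.log(n_docs / df[t])
                    for t in query.lower().split() if tf[t])
        if score > 0:
            scores[doc_id] = score
    # Most relevant first; documents without any query term are left out.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = {
    "d1": "pulp fiction is a crime film by quentin tarantino",
    "d2": "jackie brown is a crime film",
    "d3": "a documentary about fiction writing",
}
```

For the query 'pulp fiction', d1 ranks above d3 because it also contains the rarer term 'pulp', and d2 is not returned at all.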

Information Specialist

A person who is responsible for gathering, managing and disseminating data within an organization. Information specialists have domain knowledge and are well aware of the information that is available within the organization and of its quality. In addition, they understand the tasks that users perform with data, the challenges they encounter in their work and how search might help. Together with users, information specialists are best placed to determine which items a search application should return and in what order.

Input Connector

A small circle at the top of a block that indicates what type of data the block accepts as input. An input connector can be connected to an output connector of another block if their data types match. Depending on the block, an input connector can accept one or more output connectors. When an input connector is connected to an output connector, data enters the block and is transformed by the operation defined for the block.

[Image: The input connector of a Filter by Class block]

See: Connector, Connecting Building Blocks

Output Connector

A small circle at the bottom of a block that indicates what type of data a block can have as output. An output connector can be connected to an input connector of another block if their data types match. An output connector can be connected to multiple input connectors. If an output connector is connected to an input connector, this results in data flowing to the connected block.

[Image: The output connector of a Datasets block]

See: Connector, Connecting Building Blocks

Probabilistic Graph Database (PGD)

A database that stores graph data and a probability estimate for every node and every edge. Essentially a triplestore is extended with a probability for every triple. This extension allows for the fusion of selecting and ranking: when selecting structured data from a database, one is presented with certain answers from facts; the probability estimates will be equal to one or zero. When ranking unstructured data, computers will never be certain about the degree of relevance of the result - here, the answer to a query is really a series of probable answers, produced by one ranking algorithm or another. The PGD stores facts with a probability of 1 and ranked results with a probability estimate that expresses their relevance. See: SpinQL
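
The fusion of selecting and ranking can be illustrated with a small worked example. The entity names and probabilities are invented, and combining a fact with a relevance estimate by multiplication assumes the two are independent. That is a common convention in probabilistic databases, not necessarily the exact rule SpinQL applies.

```python
# Facts from structured data are certain: probability 1.0 (true) or 0.0 (false).
is_movie = {"pulp-fiction": 1.0, "jackie-brown": 1.0, "tarantino": 0.0}

# Relevance estimates from ranking unstructured text are uncertain (invented values).
text_match = {"pulp-fiction": 0.8, "jackie-brown": 0.3, "tarantino": 0.9}

def select_and_rank(facts, relevance):
    # Assumption: independence, so P(movie AND relevant) = P(movie) * P(relevant).
    combined = {e: facts.get(e, 0.0) * relevance.get(e, 0.0) for e in facts}
    # Entities with probability 0 are filtered out; the rest are ranked.
    return sorted(((e, p) for e, p in combined.items() if p > 0),
                  key=lambda item: item[1], reverse=True)
```

The person 'tarantino' matches the text best, but the certain fact that it is not a movie (probability 0) removes it, while the two movies keep their relevance estimates as a ranking.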

Query Language

A computer language that is used to query a database.

RDF

The Resource Description Framework: a W3C standard data model that describes resources as triples (subject, predicate, object).

Search Strategy

A combination of blocks in Sparque Desk. A search strategy results in a desired ranking of the entities in a knowledge graph. Under the hood, the constituent SpinQL snippets that make up the blocks define one complex SpinQL query that selects and ranks the desired entities in the probabilistic graph database.

Semi-Structured Data

Data that is described by markup. It doesn't conform to the strict structure of a relational database, but enforces hierarchies of records and fields within the data. Examples are XML and JSON. See: Structured Data, Unstructured Data
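
The contrast with structured data can be shown by parsing the same kind of records from CSV and from JSON with Python's standard library. The sample data is illustrative.

```python
import csv
import io
import json

# Structured: every row has the same, fixed set of columns.
csv_text = "title,year\nPulp Fiction,1994\nJackie Brown,1997\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: the markup describes a hierarchy of records and fields,
# and records do not have to share the same fields.
json_text = """
[
  {"title": "Pulp Fiction", "year": 1994,
   "cast": [{"name": "John Travolta"}, {"name": "Uma Thurman"}]},
  {"title": "Jackie Brown", "year": 1997}
]
"""
records = json.loads(json_text)
```

The first JSON record nests a list of cast records, and the second has no `cast` field at all; a CSV table has no natural place for either. Note also that the CSV reader returns every value as a string, whereas JSON carries the number type in its markup.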

SpinQL

The query language Sparque developed to query the probabilistic graph database (PGD). For every operation on the PGD, SpinQL defines how the resulting probabilities are computed. SpinQL thus combines and propagates probabilities, allowing the fusion of selecting and ranking.

SQL

Structured Query Language: the computer language that is used for managing data in a relational database.

Structured Data

Data that is stored in tables (typically in relational databases, spreadsheets, CSV files, etc.). It is often quantitative data, such as numbers and dates. Other examples are phone numbers, zip codes and customer names. See: Semi-Structured Data, Unstructured Data

Test

An instantiation of one or more strategy parameters. If, for example, a strategy parameter 'query' is defined, a test could be 'tarantino'. This simulates a user of a movie site searching for 'tarantino'. For a test to be used with a strategy, it must match the strategy's API template.
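
Checking a test against an API template can be sketched as follows. Representing the template as parameter names mapped to type names, requiring exactly the template's parameters, and the mapping from STRING to a Python type are all assumptions for illustration; `ApiTemplate` and `matches` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical mapping from template type names to Python types.
PYTHON_TYPES = {"STRING": str}

@dataclass
class ApiTemplate:
    parameters: Dict[str, str]  # parameter name -> type name, e.g. {"query": "STRING"}
    output_type: str            # type of the returned results, e.g. "OBJ"

def matches(template: ApiTemplate, test: Dict[str, object]) -> bool:
    """Check that a test supplies exactly the template's parameters, with the right types."""
    if set(test) != set(template.parameters):
        return False
    return all(
        isinstance(test[name], PYTHON_TYPES[type_name])
        for name, type_name in template.parameters.items()
    )

movie_search = ApiTemplate(parameters={"query": "STRING"}, output_type="OBJ")
```

With this template, the test `{"query": "tarantino"}` matches, while a test that uses a different parameter name or passes a number instead of a string does not.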

Triple

A constituent part of a knowledge graph of the form subject, predicate, object. One resource (the subject) is connected to another resource (the object) via a relation (the predicate).

Triplestore

Also called an RDF store: a purpose-built database for the storage and retrieval of triples.
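
The core operation of a triplestore, retrieving triples that match a pattern, can be sketched in a few lines of Python. This toy `TripleStore` class is hypothetical; it scans every triple on each query, whereas a real triplestore uses indexes to answer patterns efficiently. The `ex:` identifiers are illustrative.

```python
from typing import Iterator, Optional, Set, Tuple

TripleTuple = Tuple[str, str, str]  # (subject, predicate, object)

class TripleStore:
    """A toy in-memory triplestore; None in a pattern acts as a wildcard."""

    def __init__(self) -> None:
        self._triples: Set[TripleTuple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[TripleTuple]:
        # Linear scan: yield every triple that agrees with the non-None positions.
        for s, p, o in self._triples:
            if ((subject is None or s == subject)
                    and (predicate is None or p == predicate)
                    and (obj is None or o == obj)):
                yield (s, p, o)

store = TripleStore()
store.add("ex:pulp-fiction", "rdf:type", "ex:Movie")
store.add("ex:pulp-fiction", "ex:director", "ex:tarantino")
store.add("ex:jackie-brown", "ex:director", "ex:tarantino")
```

The pattern `store.match(predicate="ex:director", obj="ex:tarantino")` returns the triples for both movies directed by ex:tarantino.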

Unstructured Data

Data that is stored in its native format. Even though it may have an internal structure, it's not structured in a predefined way. Unstructured data is often qualitative data, such as customer surveys and interviews. Other examples are text files, multimedia content and e-mail. See: Semi-Structured Data, Structured Data

URI

Acronym for Uniform Resource Identifier: a unique sequence of characters that identifies a resource used by web technologies.

XML Fragment

A (relatively small) piece of data (ideally no more than a few megabytes) coming from a container, to be processed by the load process. The container determines the size of the XML fragments it produces, and containers often provide mechanisms to influence that size.
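
How a container actually splits its input is not described here. As an assumption, the sketch below uses Python's `xml.etree.ElementTree.iterparse` to cut a large XML document into one small fragment per record element without building the whole tree at once. The `iter_fragments` helper, the `movie` tag and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def iter_fragments(source, record_tag):
    """Yield one XML fragment (as a string) per <record_tag> element."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # free the element's contents once its fragment has been emitted

xml_text = """<movies>
  <movie><title>Pulp Fiction</title><year>1994</year></movie>
  <movie><title>Jackie Brown</title><year>1997</year></movie>
</movies>"""

fragments = list(iter_fragments(io.BytesIO(xml_text.encode("utf-8")), "movie"))
```

Choosing a smaller or larger record element is one way such a mechanism could influence the size of each fragment.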