Extract Strings

Description

Extracts the string values of a specific property from each input object. For example, it can be used to extract the title of a document or the name of a person. The property is provided as a parameter. With additional blocks, the extracted values can be transformed or used in a retrieval model.

Input

  • SOURCE [OBJ]: a list of objects

Output

  • PAIR [OBJ,STRING]: for each object in the input source, the extracted string value is provided as the second column in [OBJ,STRING]. When multiple values can be extracted for an object, each object-value pair is returned as a separate result.
  • RESULT [STRING]: the extracted values, detached from their parent objects. Use the score aggregation parameter to define how multiple occurrences of the same value are handled.
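
The difference between the two outputs can be illustrated with a minimal Python sketch. This is not the block's implementation: the dictionary layout of the objects, the function names, and the choice to sum scores for equal values are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input: each object has an id, a score, and lists of property values.
SOURCE = [
    {"id": "doc1", "score": 1.0, "title": ["Dune"]},
    {"id": "doc2", "score": 0.5, "title": ["Dune", "Dune Messiah"]},
]


def extract_pairs(source, prop):
    """PAIR: one (object id, string, score) row per object-value combination."""
    return [
        (obj["id"], value, obj["score"])
        for obj in source
        for value in obj.get(prop, [])
    ]


def extract_result(source, prop):
    """RESULT: values without their parent object; equal values are merged.

    Summing the scores of equal values is an assumption of this sketch.
    """
    totals = defaultdict(float)
    for _obj_id, value, score in extract_pairs(source, prop):
        totals[value] += score
    return dict(totals)


pairs = extract_pairs(SOURCE, "title")
result = extract_result(SOURCE, "title")
```

Here doc2 contributes two rows to PAIR because it has two titles, while RESULT holds each distinct title once.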

Parameters

  • Property: the property to extract the values from. Use * to extract values from all properties.
  • Use sub-properties: when set to true, the values of all sub-properties are also included. Sub-properties can be defined in the data with the rdfs:subPropertyOf relation.
  • Language: when a language is selected, only the strings in this language are extracted. This uses the language tags that are defined in the data.
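
How the three parameters interact can be sketched in Python as follows. The row layout, the sample data, and the function names are hypothetical. The sketch also assumes that sub-property chains are followed transitively, as RDFS defines for rdfs:subPropertyOf; the block itself may behave differently.

```python
# Hypothetical data: (subject, property, value, language tag) rows.
TRIPLES = [
    ("person1", "name", "Jan Jansen", "nl"),
    ("person1", "nickname", "JJ", "en"),
    ("person1", "bio", "Writer from Utrecht", "en"),
    ("person1", "bio", "Schrijver uit Utrecht", "nl"),
]

# Hypothetical rdfs:subPropertyOf declarations, stored as child -> parent.
SUB_PROPERTY_OF = {"nickname": "name"}


def matching_properties(prop, use_sub_properties):
    """Return the properties to read, optionally including all sub-properties."""
    wanted = {prop}
    if use_sub_properties:
        changed = True
        while changed:  # repeat until no new sub-property is found
            changed = False
            for child, parent in SUB_PROPERTY_OF.items():
                if parent in wanted and child not in wanted:
                    wanted.add(child)
                    changed = True
    return wanted


def extract_strings(triples, prop, use_sub_properties=False, language=None):
    """Return (subject, value) pairs that match the three parameters."""
    wanted = None if prop == "*" else matching_properties(prop, use_sub_properties)
    return [
        (subject, value)
        for subject, p, value, lang in triples
        if (wanted is None or p in wanted)
        and (language is None or lang == language)
    ]


names = extract_strings(TRIPLES, "name", use_sub_properties=True)
dutch = extract_strings(TRIPLES, "*", language="nl")
```

With sub-properties enabled, the "nickname" value is returned alongside the "name" value; with the property set to * and the language set to "nl", only the Dutch strings of any property remain.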

Output scores can be aggregated and/or normalized.
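
As a rough illustration of what aggregation and normalization could mean for the RESULT output, consider the Python sketch below. The mode names "sum" and "max" and the divide-by-maximum normalization are assumptions chosen for the example, not a list of the options the block actually offers.

```python
def aggregate(rows, how="sum"):
    """Merge (value, score) rows that share the same value.

    The modes "sum" and "max" are hypothetical names used in this sketch.
    """
    combine = {"sum": lambda a, b: a + b, "max": max}[how]
    scores = {}
    for value, score in rows:
        scores[value] = combine(scores[value], score) if value in scores else score
    return scores


def normalize(scores):
    """Scale scores so that the highest one becomes 1.0 (one possible scheme)."""
    top = max(scores.values(), default=0.0)
    if top == 0:
        return dict(scores)
    return {value: score / top for value, score in scores.items()}


rows = [("Dune", 1.0), ("Dune", 0.5), ("Dune Messiah", 0.5)]
summed = aggregate(rows, how="sum")
highest = aggregate(rows, how="max")
scaled = normalize(summed)
```

Summing rewards values that occur in many objects, whereas taking the maximum keeps only the best-scoring occurrence; normalizing afterwards makes the scores comparable across different inputs.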