Mapping Language

Template

Like any XSLT stylesheet, a mapping starts with the XML declaration and the xsl:stylesheet element.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:spinque="com.spinque.tools.importStream.EmitterWrapper"
    xmlns:su="com.spinque.tools.importStream.Utils"
    extension-element-prefixes="spinque">

  <xsl:output method="text" encoding="UTF-8"/>

...

</xsl:stylesheet>

We include two Spinque namespaces. Declaring the spinque namespace makes the elements that generate triples available in our stylesheet.

The su namespace makes additional utility methods available. Examples are methods to transform strings, such as changing the case, splitting, and normalizing. There are also methods for processing numbers, dates, and person names.

Additional XML Namespaces

In the xsl:stylesheet element, you can also declare any additional XML namespaces you need later on. These can be namespaces that occur in the data you are mapping, or namespaces that you want to use in the generated triples.

<xsl:stylesheet 
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:spinque="com.spinque.tools.importStream.EmitterWrapper"
    xmlns:su="com.spinque.tools.importStream.Utils"

    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    xmlns:schema="https://schema.org/"

    extension-element-prefixes="spinque">
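As a minimal sketch of how such prefixes are used, the stylesheet below matches namespaced elements in the data and emits a full Dublin Core URI as the attribute name. It assumes, purely for illustration, that the virtual XML fragments are OAI-PMH records with Dublin Core metadata; the actual shape of your fragments depends on your data source.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:spinque="com.spinque.tools.importStream.EmitterWrapper"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    extension-element-prefixes="spinque">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Assumption: the input consists of OAI-PMH records (oai:record). -->
  <xsl:template match="oai:record">
    <!-- The oai: prefix selects elements in the OAI-PMH namespace. -->
    <xsl:variable name="recordID" select="oai:header/oai:identifier"/>
    <!-- The dc: prefix selects Dublin Core elements anywhere below the record. -->
    <xsl:for-each select=".//dc:title">
      <spinque:attribute subject="{$recordID}"
                         attribute="http://purl.org/dc/elements/1.1/title"
                         value="{.}"
                         type="string"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Without the xmlns:dc and xmlns:oai declarations, the XSLT processor would reject the dc: and oai: prefixes in the match and select expressions.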

Templates

In XSLT, you define templates that match elements in your XML data. In this case, you define templates that match the elements in your virtual XML fragments. For example, to match a row element coming from a CSV container, you define a template as follows:

<xsl:template match="row">
  ...
</xsl:template>
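Match patterns can also carry XPath predicates, which is standard XSLT 1.0. The sketch below assumes rows with a field named productID (as in the CSV example in this document) and only processes rows where that field is non-empty; the second, empty template is our own addition to silently skip all other rows.

```xml
<!-- Matches only rows that have a non-empty productID field.
     A pattern with a predicate has a higher default priority than plain "row". -->
<xsl:template match="row[normalize-space(field[@name='productID']) != '']">
  ...
</xsl:template>

<!-- Fallback: all other rows match here and produce no output. -->
<xsl:template match="row"/>
```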

Generating Triples

The key difference between a normal XSLT file and a Sparque data mapping is that we do not write the normal textual output of the XSLT, but instead generate triples. There are two elements for generating a triple. With spinque:attribute you generate a triple with a literal value. In other words, you define an attribute of an object, such as a string value, a date, or geographic coordinates.

<spinque:attribute subject="..." attribute="..." value="..." type="..."/>

With spinque:relation you generate a relation between two objects.

<spinque:relation subject="..." predicate="..." object="..."/>

Sparque automatically creates the required resources when you output an attribute or a relation. When you generate an attribute, resources are created for the subject and the attribute. When you generate a relation, resources are created for the subject, the predicate, and the object.

Given the virtual XML fragment for a row from a CSV file:

<row>
  <field column="0" name="productID">product12</field>
  <field column="1" name="productName">Bicycle</field>
  <field column="2" name="supplier">supplier45</field>
</row>

We would generate an attribute for the name and a relation to the supplier:

<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <xsl:variable name="supplierID" select="field[@name='supplier']"/>
  <spinque:attribute subject="{$productID}" attribute="name" value="{field[@name='productName']}" type="string"/>
  <spinque:relation subject="{$productID}" predicate="supplier" object="{$supplierID}"/>
</xsl:template>

Note that we put variables and other XPath expressions in spinque:attribute and spinque:relation between curly braces {}, e.g., subject="{$productID}". For constant values, do not put braces around them, e.g., predicate="supplier".

In the example, we used XSL variables to store the identifiers of the resources. This is not required, but does make your mapping file more readable.
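For comparison, here is a sketch of the same mapping without variables, with the XPath expressions written directly between the braces. It should generate the same triples, but repeats the productID lookup in every element:

```xml
<xsl:template match="row">
  <!-- Same triples as the example with variables; each expression is inlined. -->
  <spinque:attribute subject="{field[@name='productID']}" attribute="name" value="{field[@name='productName']}" type="string"/>
  <spinque:relation subject="{field[@name='productID']}" predicate="supplier" object="{field[@name='supplier']}"/>
</xsl:template>
```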

Creating a Graph

The relation in the example above creates a resource for the supplier. At this point, we know nothing about the supplier other than its identifier. We can now add a data source with information about suppliers:

<row>
  <field column="0" name="supplierID">supplier45</field>
  <field column="1" name="name">bicycles.com</field>
</row>

In the data mapping for this data source, we would generate an attribute with the name for the supplier.

<xsl:template match="row">
  <xsl:variable name="supplierID" select="field[@name='supplierID']"/>
  <spinque:attribute subject="{$supplierID}" attribute="name" value="{field[@name='name']}" type="string"/>
</xsl:template>

Because we use the same identifier, the two data sources are automatically integrated in the resulting data graph.


Figure: Generated data graph

A special relation is the type of a resource. We use http://www.w3.org/1999/02/22-rdf-syntax-ns#type from the RDF vocabulary for this purpose. Although it is not required to specify a type relation, we strongly advise you to do so. Types are very useful when you model search strategies over your data graph.

<spinque:relation subject="{$subjectID}" predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" object="product"/>
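Put together with the product example from earlier, a template that types each row as a product could look like the sketch below. The type name product is a constant chosen for this example, so it is written without braces.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <!-- Declare the type of the resource using rdf:type. -->
  <spinque:relation subject="{$productID}" predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" object="product"/>
  <!-- Add the name as a literal attribute of the same resource. -->
  <spinque:attribute subject="{$productID}" attribute="name" value="{field[@name='productName']}" type="string"/>
</xsl:template>
```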

spinque:attribute

The spinque:attribute element expects at least three XML attributes:

  • subject : the ID of the object to which the attribute should be added
  • attribute : the name of the attribute
  • value : the actual value

Besides these XML attributes, you can also specify a few optional ones:

  • type : the type of the value. By default a value is a string, but it can also be a date, integer, double, or point (geometric object).
  • p : a probability expressing how certain it is that the value exists
  • flags : one or more space- or comma-separated values. Currently supported are:
      - allowEmpty : do not throw away empty strings
      - noNormalizeWhitespace : preserve whitespace as it was
    Example: flags="allowEmpty, noNormalizeWhitespace"
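The optional XML attributes can be combined as in the sketch below. The stock and description fields are hypothetical columns that are not part of the earlier CSV example, and the value p="0.8" assumes that probabilities are written as numbers between 0 and 1.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <!-- Hypothetical "stock" column, stored as an integer instead of the default string. -->
  <spinque:attribute subject="{$productID}" attribute="stock" value="{field[@name='stock']}" type="integer"/>
  <!-- Hypothetical "description" column: keep empty values and the original whitespace. -->
  <spinque:attribute subject="{$productID}" attribute="description" value="{field[@name='description']}" type="string" flags="allowEmpty, noNormalizeWhitespace"/>
  <!-- A name we are only fairly sure about (assumed probability scale: 0 to 1). -->
  <spinque:attribute subject="{$productID}" attribute="name" value="{field[@name='productName']}" type="string" p="0.8"/>
</xsl:template>
```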

spinque:relation

The spinque:relation element expects at least three XML attributes:

  • subject : the ID of the object to which the relation should be added
  • predicate : the name of the relation
  • object : the ID of the target

Besides these XML attributes, you can also specify an optional one:

  • p : a probability expressing how certain it is that the relation exists
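As a sketch, an uncertain link between a product and its supplier could be expressed as follows. The value 0.5 is only an illustration and assumes that probabilities are written as numbers between 0 and 1.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <xsl:variable name="supplierID" select="field[@name='supplier']"/>
  <!-- The relation is generated with a probability instead of as a certain fact. -->
  <spinque:relation subject="{$productID}" predicate="supplier" object="{$supplierID}" p="0.5"/>
</xsl:template>
```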

Debug Statements

Sometimes it helps to see intermediate output. The spinque:debug element allows you to generate output anywhere in your stylesheet. When you run the debugger or the indexer from the Sparque CLI, the debug messages are shown.

<spinque:debug message="this is debug output: {$subjectID}"/>
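A typical use is reporting rows with missing data instead of silently generating incomplete triples. The sketch below reuses the product example and assumes that an absent supplier shows up as an empty or missing supplier field:

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <xsl:variable name="supplierID" select="field[@name='supplier']"/>
  <xsl:choose>
    <!-- No usable supplier identifier: report it and skip the relation. -->
    <xsl:when test="normalize-space($supplierID) = ''">
      <spinque:debug message="no supplier for product: {$productID}"/>
    </xsl:when>
    <xsl:otherwise>
      <spinque:relation subject="{$productID}" predicate="supplier" object="{$supplierID}"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```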