Mapping Language
Template
As with any XSLT stylesheet, we start by declaring the XML header and the xsl:stylesheet element.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:spinque="com.spinque.tools.importStream.EmitterWrapper"
    xmlns:su="com.spinque.tools.importStream.Utils"
    extension-element-prefixes="spinque">
  <xsl:output method="text" encoding="UTF-8"/>
  ...
</xsl:stylesheet>
We include two Spinque namespaces. By declaring the spinque namespace, we make the methods that generate triples available to our stylesheet.
With the su namespace we make additional utility functions available. Examples are methods to transform strings, such as changing the case, splitting, and normalizing. There are also methods for processing numbers, dates, and person names.
Additional XML Namespaces
In the xsl:stylesheet element, you can also declare any additional XML namespaces you need later on. These could be namespaces that occur in the data you want to map, or namespaces that you want to use in the generated triples.
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:spinque="com.spinque.tools.importStream.EmitterWrapper"
    xmlns:su="com.spinque.tools.importStream.Utils"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:oai="http://www.openarchives.org/OAI/2.0/"
    xmlns:schema="https://schema.org/"
    extension-element-prefixes="spinque">
Templates
In XSLT, you define templates that match elements in your XML data. In this case, you define templates that match the elements in your virtual XML fragments. For example, to match a row element coming from a CSV container, you define a template as follows:
<xsl:template match="row"> ... </xsl:template>
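Templates can also match namespaced elements, provided the prefix is declared on the xsl:stylesheet element. The sketch below assumes a hypothetical source whose virtual XML fragments are OAI-PMH records carrying Dublin Core metadata; the element structure is an assumption about that data, not something prescribed by Sparque.

```xml
<!-- Assumes xmlns:oai and xmlns:dc are declared on xsl:stylesheet. -->
<!-- Hypothetical input: one oai:record element per virtual XML fragment. -->
<xsl:template match="oai:record">
  <!-- Read the record identifier from the OAI-PMH header. -->
  <xsl:variable name="recordID" select="oai:header/oai:identifier"/>
  <!-- Take the first Dublin Core title found anywhere below the record. -->
  <xsl:variable name="title" select="(.//dc:title)[1]"/>
  ...
</xsl:template>
```

The parentheses in (.//dc:title)[1] matter: without them, the predicate would select every dc:title that is the first such child of its own parent, rather than the first title in the record.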
Generating Triples
The key difference between a normal XSLT file and a Sparque data mapping is that we do not write the normal textual output of the XSLT, but instead generate triples. There are two elements to generate a triple. With spinque:attribute you generate a triple with a literal value. In other words, you define an attribute for an object, such as a string value, a date, or geographic coordinates.
<spinque:attribute subject="..." attribute="..." value="..." type="..."/>
With spinque:relation you generate a relation between two objects.
<spinque:relation subject="..." predicate="..." object="..."/>
Sparque automatically creates the required resources when you output an attribute or a relation. When you generate an attribute, resources are created for the subject and the attribute. When you generate a relation, resources are created for the subject, the predicate, and the object.
Given the virtual XML fragment for a row from a CSV file:
<row>
  <field column="0" name="productID">product12</field>
  <field column="1" name="productName">Bicycle</field>
  <field column="2" name="supplier">supplier45</field>
</row>
We would generate an attribute for the name and a relation to the supplier:
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <xsl:variable name="supplierID" select="field[@name='supplier']"/>
  <spinque:attribute subject="{$productID}" attribute="name" value="{field[@name='productName']}" type="string"/>
  <spinque:relation subject="{$productID}" predicate="supplier" object="{$supplierID}"/>
</xsl:template>
Note that in spinque:attribute and spinque:relation we put variables between curly brackets, e.g., subject="{$productID}". For constant values, do not put brackets around them, e.g., predicate="supplier".
In the example, we used XSL variables to store the identifiers of the resources. This is not required, but does make your mapping file more readable.
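Variables are also a convenient place to build identifiers before they are used. A minimal sketch, assuming you want to prefix identifiers so that product and supplier IDs from different sources cannot collide; the product/ and supplier/ prefixes are hypothetical naming choices, not a Sparque convention, and concat is the standard XPath 1.0 function:

```xml
<xsl:template match="row">
  <!-- Build each prefixed identifier once, so every triple uses the same form. -->
  <xsl:variable name="productID" select="concat('product/', field[@name='productID'])"/>
  <xsl:variable name="supplierID" select="concat('supplier/', field[@name='supplier'])"/>

  <spinque:attribute subject="{$productID}" attribute="name"
                     value="{field[@name='productName']}" type="string"/>
  <spinque:relation subject="{$productID}" predicate="supplier" object="{$supplierID}"/>
</xsl:template>
```

If you adopt a scheme like this, every mapping that refers to the same resource has to build its identifier in exactly the same way.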
Creating a Graph
The relation in the example above creates a resource for the supplier. At this time, we do not have any information about the supplier yet, other than the identifier. Now we could add a data source with information about suppliers.
<row>
  <field column="0" name="supplierID">supplier45</field>
  <field column="1" name="name">bicycles.com</field>
</row>
In the data mapping for this data source, we would generate an attribute with the name for the supplier.
<xsl:template match="row">
  <xsl:variable name="supplierID" select="field[@name='supplierID']"/>
  <spinque:attribute subject="{$supplierID}" attribute="name" value="{field[@name='name']}" type="string"/>
</xsl:template>
Because we use the same identifier, the two data sources are automatically integrated in the resulting data graph.
A special relation is the type of a resource. We use http://www.w3.org/1999/02/22-rdf-syntax-ns#type from the RDF vocabulary for this purpose. Although it is not required to specify a type relation, we strongly advise you to do so. Types are very useful when you model search strategies over your data graph.
<spinque:relation subject="{$subjectID}" predicate="http://www.w3.org/1999/02/22-rdf-syntax-ns#type" object="product"/>
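In the context of the product mapping from earlier, this could look as follows. This is a sketch only: storing the type URI in a variable named rdfType is a readability choice made here, not a requirement.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <!-- Keep the long RDF type URI in one place (note the inner quotes: a string, not a path). -->
  <xsl:variable name="rdfType" select="'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'"/>

  <!-- Declare the resource to be a product. -->
  <spinque:relation subject="{$productID}" predicate="{$rdfType}" object="product"/>
  <!-- Then add its attributes as before. -->
  <spinque:attribute subject="{$productID}" attribute="name"
                     value="{field[@name='productName']}" type="string"/>
</xsl:template>
```

The mapping for the supplier data source could add a type relation for each supplier in the same way.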
spinque:attribute
The <spinque:attribute .../> element expects at least 3 XML attributes:
- subject: the ID of the object to which the attribute should be added
- attribute: the name of the attribute
- value: the actual value
Besides these XML attributes, it is also possible to specify a few others:
- type: by default a value is a string, but it can also be a date, integer, double, or point (geometric object).
- p: a probability of how sure it is that the value exists
- flags: these can be multiple space- or comma-separated values. Currently supported are:
  - allowEmpty: do not throw away empty strings
  - noNormalizeWhitespace: preserve whitespace as it was
  Example: flags="allowEmpty, noNormalizeWhitespace"
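A sketch of the optional XML attributes in use. The stock and description columns are hypothetical additions to the product CSV, and the value 0.8 assumes that p is expressed on a scale from 0 to 1, which the description above does not state explicitly.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>

  <!-- Typed literal: store the (hypothetical) stock column as an integer. -->
  <spinque:attribute subject="{$productID}" attribute="stock"
                     value="{field[@name='stock']}" type="integer"/>

  <!-- Uncertain value: p="0.8" assumes a 0 to 1 probability scale. -->
  <spinque:attribute subject="{$productID}" attribute="name"
                     value="{field[@name='productName']}" type="string" p="0.8"/>

  <!-- Keep empty strings and the original whitespace of the (hypothetical) description column. -->
  <spinque:attribute subject="{$productID}" attribute="description"
                     value="{field[@name='description']}" type="string"
                     flags="allowEmpty, noNormalizeWhitespace"/>
</xsl:template>
```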
spinque:relation
The <spinque:relation .../> element expects at least 3 XML attributes:
- subject: the ID of the object to which the relation should be added
- predicate: the name of the relation
- object: the ID of the target
Besides these XML attributes, it is also possible to specify a few others:
- p: a probability of how sure it is that the relation exists
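A sketch of a relation with a probability, again assuming that p takes a value between 0 and 1. The matchedSupplier column is hypothetical; it stands for a supplier link that came out of an uncertain matching step.

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <xsl:variable name="candidateID" select="field[@name='matchedSupplier']"/>

  <!-- Only emit the relation when the candidate identifier is non-empty. -->
  <xsl:if test="normalize-space($candidateID) != ''">
    <spinque:relation subject="{$productID}" predicate="supplier"
                      object="{$candidateID}" p="0.6"/>
  </xsl:if>
</xsl:template>
```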
Debug Statements
Sometimes it is handy to get some intermediate output. The spinque:debug element allows you to generate output anywhere in your stylesheet. When you run the debugger or the indexer from the Sparque CLI, the debug messages are shown.
<spinque:debug message="this is debug output: {$subjectID}"/>
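For instance, a debug message can flag rows with a missing value while the rest of the mapping proceeds. A minimal sketch based on the product CSV from earlier; apart from spinque:debug and spinque:relation, it uses only standard XSLT 1.0 constructs (xsl:choose and normalize-space):

```xml
<xsl:template match="row">
  <xsl:variable name="productID" select="field[@name='productID']"/>
  <!-- Trim the supplier field so that whitespace-only values count as empty. -->
  <xsl:variable name="supplierID" select="normalize-space(field[@name='supplier'])"/>

  <xsl:choose>
    <xsl:when test="$supplierID != ''">
      <spinque:relation subject="{$productID}" predicate="supplier" object="{$supplierID}"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- Report the row instead of emitting a relation to an empty identifier. -->
      <spinque:debug message="no supplier for product: {$productID}"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```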