Schematron Query Language Binding and XSLT
October 17, 2022
Table of Contents
- 1 Introduction
- 2 Query language binding in general
- 3 Using XSLT keys in Schematron
- 4 Using XSLT functions in Schematron
- 5 Wrap up
1 Introduction
Schematron is one of the XML validation languages. It’s a flexible and elegant language that allows you to specify rules for XML documents and the messages to emit when these rules are broken. An introduction to Schematron can for instance be found in the xml.com article Validating XML with Schematron.
Any Schematron schema contains expressions: for the nodes to match, for the conditions to check, and for inserting values from the document (in messages and several other places). It should come as no surprise that in the vast majority of cases XPath is used as the language for these expressions. But what most people are probably not aware of is that Schematron is actually a container language around the language used for expressions, and that, theoretically, you could use other expression languages inside Schematron.
This concept is called Query Language Binding or QLB. Query Language Binding allows you to specify the embedded programming language used for all expressions. And some bindings, most notably the XSLT ones, also allow you to add specific code constructs, greatly expanding the scope of what you can do.
This article discusses Query Language Binding in general and then elaborates on the capabilities the XSLT type bindings provide you with.
Query Language Binding is just one of many Schematron features. All of them are described in my book about Schematron: Schematron - A language for Validating XML, XML Press, 2022. This article is an excerpt from the chapter on Query Language Binding.
2 Query language binding in general
The Query Language Binding for a Schematron schema is set using a queryBinding attribute, containing the name of the binding, on the root element. For instance, to set the Query Language Binding to xslt3:
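A minimal sketch of such a root element, assuming the Schematron namespace used by the other schemas in this article (the comment stands in for the actual patterns):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
  <!-- Patterns, rules and assertions go here -->
</schema>
```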
The Schematron standard reserves a number of Query Language Binding names: exslt, stx, xslt, xslt2, xslt3, xpath, xpath2, xpath3, xpath31, xquery, xquery3 and xquery31. The ones set in bold are defined. The other ones are reserved only, but by their name we can surmise what was meant.
Despite this seemingly abundant number of bindings, the most widely used Schematron processors support only the xslt, xslt2 and xslt3 bindings. So let’s focus on those:
- The xslt Query Language Binding (which is the default if you don’t specify a queryBinding attribute) allows you to use XPath 1.0 expressions. Additionally it allows indexes using the xsl:key element. Applied properly this can make lookups of, for instance, identifiers significantly faster.
- The xslt2 Query Language Binding is an extension of the xslt one. It allows XPath 2.0 expressions. This gives you a lot more options for your expressions and also more standard functions. Results of expressions are no longer limited to strings but can be any data type. An important additional feature is that it allows you to define your own functions in your schema, using xsl:function and embedded XSLT 2.0. These functions can then be used in expressions in your schema.
- The xslt3 Query Language Binding is an extension of the xslt2 binding. It allows the use of XPath 3.1 expressions and functions expressed in XSLT 3.0.
My advice would be, if your processor supports this, to always set the Query Language Binding to either xslt2 or, preferably, xslt3. Not specifying a binding (by not using a queryBinding attribute on the root element) means that you’re limited to XPath 1.0 for your expressions. Given the current state of technology that’s severely limiting.
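To illustrate the difference, here is a small sketch of a check that relies on an XPath 2.0 function. The product element and its code attribute are hypothetical names invented for this illustration. The matches() function does not exist in XPath 1.0, so a schema like this needs the xslt2 or xslt3 binding:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <!-- Hypothetical element and attribute names, for illustration only -->
    <rule context="product">
      <!-- matches() is an XPath 2.0 function and is not available in XPath 1.0 -->
      <assert test="matches(@code, '^[A-Z][0-9]+$')">
        The code <value-of select="@code"/> must be an uppercase letter followed by digits
      </assert>
    </rule>
  </pattern>
</schema>
```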
3 Using XSLT keys in Schematron
One of the things you can do with an xslt type Query Language Binding is use XSLT keys. Let’s explore this.
Referencing in XML documents is often done using identifiers. For instance the following example contains orders that reference items, by identifier:
<?xml version="1.0" encoding="UTF-8"?>
<orders>
  <item id="bolts" price="5.49">A box with 20 bolts</item>
  <item id="nuts" price="3.78">A box with 20 nuts</item>
  <!-- … many, many more items… -->
  <order>
    <ordered-item id-ref="bolts" quantity="5"/>
    <ordered-item id-ref="nuts" quantity="10"/>
  </order>
  <!-- … many, many more orders… -->
</orders>
The value of each id-ref attribute on an ordered-item element must contain the identifier of an item element, in the same document. A basic version of a Schematron schema that checks this is:
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3">
  <pattern>
    <rule context="ordered-item">
      <let name="item-id" value="@id-ref"/>
      <assert test="exists(/*/item[@id eq $item-id])">
        The referenced item <value-of select="$item-id"/> does not exist
      </assert>
    </rule>
  </pattern>
</schema>
The let element stores the identifier to check in the variable $item-id. This is used in the assert to check whether an item element with the same identifier exists. Very straightforward and perfectly all right.
But what if the document is very large and contains thousands and thousands of item elements? Every ordered-item element causes the schema processor to search all the item elements, from top to bottom, again and again. That’s not very efficient and can take a long time.
A solution to this is creating a key. This is an in-memory data structure that allows fast lookup of elements by some key index value. XSLT has an instruction for this, xsl:key. Using either an xslt2 or xslt3 Query Language Binding we can use this in Schematron also. The following Schematron schema does the same as the basic schema in Figure 3, but much more efficiently:
<?xml version="1.0" encoding="UTF-8"?>
<!-- 1 - Define the XSLT namespace: -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">
  <!-- 2 - Define a key using the XSLT key instruction: -->
  <xsl:key name="item-ids" match="/*/item" use="@id"/>
  <pattern>
    <rule context="ordered-item">
      <!-- 3 - Reference the key using the key() function: -->
      <assert test="exists(key('item-ids', @id-ref))">
        The referenced item <value-of select="@id-ref"/> does not exist
      </assert>
    </rule>
  </pattern>
</schema>
1. To be able to use instructions from XSLT, we need to declare the XSLT namespace. Hence the xmlns:xsl="http://www.w3.org/1999/XSL/Transform" namespace declaration on the root element. Every element that starts with xsl: is now considered an XSLT instruction.
2. The XSLT xsl:key instruction defines a key. It has three components:
   - The name of the key, in this case item-ids.
   - The nodes the key is about, in this case the /*/item elements.
   - The value of the key, in this case the identifier of the item, contained in its id attribute.
   What happens under the hood is that the Schematron processor creates some appropriate data structure that allows fast lookup of item elements using the value of their id attribute.
3. The test attribute of the assert element uses the (XSLT) key() function to look up values in the key. This function takes two or three parameters:
   - The name of the key, in this case item-ids (as a string, therefore written using quotes, as 'item-ids').
   - The value to look up, in this case the id-ref attribute of the ordered-item element.
   - The third parameter, optional and unused here, allows you to limit the returned nodes to a specific part of the document (a “subtree”). You do this by specifying the root node of the part you’re interested in. The default value is the document node /.
   The key() function will perform a fast and efficient lookup and return the item element(s) associated with the given identifier. If the identifier is unknown it will return an empty sequence.
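The optional third parameter of key() can be sketched as follows. This assumes a hypothetical variation of the orders document in which every shop element (a name invented for this illustration) contains its own item and order elements, so that a reference must resolve within the same shop:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">
  <!-- Assumption: items live inside hypothetical shop elements, so match any item -->
  <xsl:key name="item-ids" match="item" use="@id"/>
  <pattern>
    <rule context="shop//ordered-item">
      <!-- The third argument limits the lookup to the subtree of the enclosing shop -->
      <assert test="exists(key('item-ids', @id-ref, ancestor::shop[1]))">
        The referenced item <value-of select="@id-ref"/> does not exist in this shop
      </assert>
    </rule>
  </pattern>
</schema>
```

The three-parameter form of key() was introduced in XSLT 2.0, so a schema like this needs the xslt2 or xslt3 binding.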
A warning before we end this topic: keys don’t come for free. Building a key takes time and you have to weigh this against the time raw lookups take (as done in Figure 3). In general, don’t use keys on small documents. The tipping point is fuzzy. If this is important to you: experiment and measure!
4 Using XSLT functions in Schematron
Separating code using functions is a very normal thing to do when programming. Schematron itself however lacks the ability to define functions. For this it relies on its Query Language Binding feature.
As an example, assume we have some separate reference document that tells us the expected price for something with a certain type. It also contains a default price, as an attribute on the root element, for everything with a type not mentioned otherwise:
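A minimal sketch of such a reference document, assuming the file name (type-codes-and-prices.xml), element names and attribute names that the schema later in this section refers to. The type codes and prices are made-up illustrative values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical reference document: all type codes and prices are illustrative -->
<type-codes-and-prices default-price="10.00">
  <data type="A125" price="17.25"/>
  <data type="X96" price="89.34"/>
</type-codes-and-prices>
```

With these assumed values, something of type A125 is expected to cost 17.25, and any type that is not listed falls back to the default price of 10.00.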
We would like to use this reference document in checking documents like the following:
<things>
  <thing name="thing 1" type="A125" price="17.25"/>
  <thing name="thing 2" type="A125" price="17.26"/>
  <thing name="thing 3" type="X96" price="89.34"/>
  <thing name="thing 4" type="Y78" price="10.01"/>
</things>
To check the price of a thing in Figure 6, we need to look up its expected price in Figure 5, based on its type. If it’s not mentioned we should use the default price. We could express this as a complicated and rather long XPath expression directly in Schematron, but it’s much nicer and more maintainable to define a function for this using XSLT. Using a Query Language Binding of xslt2 or xslt3 we can do this:
<?xml version="1.0" encoding="UTF-8"?>
<!-- 1 - Define the XSLT namespace on the root element: -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" queryBinding="xslt3">
  <!-- 2 - Define a namespace for the functions as an <ns> element: -->
  <ns uri="#functions" prefix="f"/>
  <!-- 3 - Define your function using XSLT: -->
  <xsl:function name="f:get-price" as="xs:double">
    <xsl:param name="type" as="xs:string"/>
    <xsl:variable name="prices-document" as="document-node()"
      select="doc('type-codes-and-prices.xml')"/>
    <xsl:variable name="data-element-for-type" as="element(data)?"
      select="$prices-document//data[@type eq $type]"/>
    <xsl:choose>
      <xsl:when test="exists($data-element-for-type)">
        <xsl:sequence select="xs:double($data-element-for-type/@price)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence
          select="xs:double($prices-document/type-codes-and-prices/@default-price)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>
  <pattern>
    <rule context="thing">
      <!-- 4 - Use the defined function to get the price: -->
      <let name="expected-price" value="f:get-price(@type)"/>
      <assert test="$expected-price eq xs:double(@price)">
        The price for <value-of select="@name"/> should be
        <value-of select="$expected-price"/>
      </assert>
    </rule>
  </pattern>
</schema>
1. We’re going to use XSLT code embedded in Schematron. Therefore you have to define the XSLT namespace on the root element (xmlns:xsl="http://www.w3.org/1999/XSL/Transform").
2. XPath function names must be in some namespace. In Schematron you have to define such a namespace using an ns element. This allows you to use this namespace in the XPath expressions in the schema. The example namespace (#functions) and prefix (f) used here are arbitrary. You can use anything you like.
3. Define your function(s) using the XSLT programming language. In this example the function is called f:get-price.
4. We use the defined f:get-price() function to get the expected price from Figure 5 and use this in the assert’s test expression.
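Keys and functions can also be combined. The following is a sketch, not the schema from the book: a variation of f:get-price() that finds the data element through a key instead of a path expression. The key name prices-by-type is invented for this illustration, and the explicit xmlns:xs declaration is an added precaution for the xs: prefixed type names. Inside xsl:function there is no context node, so the three-parameter form of key() is used to point the lookup at the reference document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" queryBinding="xslt3">
  <ns uri="#functions" prefix="f"/>
  <!-- Hypothetical key: index the data elements by their type attribute -->
  <xsl:key name="prices-by-type" match="data" use="@type"/>
  <xsl:function name="f:get-price" as="xs:double">
    <xsl:param name="type" as="xs:string"/>
    <xsl:variable name="prices-document" as="document-node()"
      select="doc('type-codes-and-prices.xml')"/>
    <!-- Third argument: search the reference document, not the validated one -->
    <xsl:variable name="data-element-for-type" as="element(data)?"
      select="key('prices-by-type', $type, $prices-document)"/>
    <xsl:choose>
      <xsl:when test="exists($data-element-for-type)">
        <xsl:sequence select="xs:double($data-element-for-type/@price)"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- No entry for this type: fall back to the default price -->
        <xsl:sequence
          select="xs:double($prices-document/type-codes-and-prices/@default-price)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>
  <pattern>
    <rule context="thing">
      <let name="expected-price" value="f:get-price(@type)"/>
      <assert test="$expected-price eq xs:double(@price)">
        The price for <value-of select="@name"/> should be
        <value-of select="$expected-price"/>
      </assert>
    </rule>
  </pattern>
</schema>
```

Whether this is actually faster than the path expression depends on the size of the reference document, so the earlier advice applies here too: measure before committing to a key.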
5 Wrap up
Query Language Binding allows you, theoretically, to change the language used for expressions in a Schematron schema.
- In most cases only the xslt, xslt2 and xslt3 bindings are supported.
- You specify the Query Language Binding of a Schematron schema using the queryBinding attribute on the root element of the schema.
- If you don’t specify this attribute the default value is xslt.
- The xslt binding limits you to XPath 1.0 expressions only, which, given the current state of technology, is rather limiting.
- The xslt2 and xslt3 bindings allow you to use XSLT keys and functions. These are very useful constructs in more complex schemas.