Why You Should Be Using XSLT 3.0
February 14, 2017
Eighteen years ago, the originators of the XML specification faced a problem: how to use the new language to generate a book-publishing format. What emerged were two new languages: the Extensible Stylesheet Language Formatting Objects (ultimately XSL-FO), for describing the various functional parts of a publication in XML, and XSL Transformations (XSLT), for transforming XML-formatted content into XSL-FO.
XSL-FO is still in use today, though the number of formatting languages in XML has grown beyond the initial scope of FO. Additionally, CSS has been quietly overtaking FO for many simpler document transformations, to the extent that many eBooks (specifically those based upon the ePub standard) are essentially HTML + CSS. However, XSLT has taken its own remarkable trajectory, as people began to realize that the problem of transforming XML transcended just publishing books and covered transforms from any format to any other.
XSLT adoption has been held back by the difficulty of getting older implementations upgraded. Java ships with Xalan, which has not been improved since it was first incorporated into Java back in 2000 and still implements the very first version of XSLT, standardized in 1999. The Linux-based libxslt processor is similar; while it is a good implementation for the Linux platform, it has not been upgraded since it was written in the early 2000s. Since then there have been two more major versions of the XSLT standard: XSLT 2.0, which became a W3C Recommendation in January 2007, and XSLT 3.0, scheduled to be released this year. These versions are backwards compatible, which means that XSLT 1.0 stylesheets written fifteen years ago should still work today in contemporary XSLT engines with little to no modification.
Moreover, swapping out XSLT versions is typically as simple as dropping a more contemporary engine, such as the Saxon processor, into a folder in your Java project and changing a line in a configuration file. Most Java developers could do it in under ten minutes, and there are both open source and commercial versions, with prices ranging from free to a fairly modest licensing fee. Upgrading similar systems on Windows (such as Altova's XSLT server or the Quixslt XSLT processor) is usually nearly as easy. There really are very few reasons why you should not upgrade.
The question, of course, is what benefits you get from that upgrade. There are a number of them, and it's worth going through the key ones to understand why upgrading (preferably to XSLT 3.0) is so worthwhile.
JSON Transformations
In XSLT 3.0, an inbound document can be in JSON, rather than XML. The processor can take that document, use the json-to-xml() function to convert it into a specific known XML format, process that through the templates, then convert the resulting output back into JSON (or can convert it into HTML 5 among other formats).
For instance, the following inbound JSON content
{"employees":{
"jd101":{
"firstname":"Jane",
"surname":"Doe",
"department":"IT",
"manager":"kp102"
},
"kp102":{
"firstname":"Kitty",
"surname":"Pride",
"department":"IT",
"manager":"jh104"
},
"cx103":{
"firstname":"Charles",
"surname":"Xavier",
"department":"Management"
},
"jh104":{
"firstname":"James",
"surname":"Howlett",
"department":"Security",
"manager":"cx103"
}
}}
will get transformed to an internal XML representation through the json-to-xml() function:
<j:map xmlns:j="http://www.w3.org/2005/xpath-functions">
<j:map key="employees">
<j:map key="jd101">
<j:string key="firstname">Jane</j:string>
<j:string key="surname">Doe</j:string>
<j:string key="department">IT</j:string>
<j:string key="manager">kp102</j:string>
</j:map>
<j:map key="kp102">
<j:string key="firstname">Kitty</j:string>
<j:string key="surname">Pride</j:string>
<j:string key="department">IT</j:string>
<j:string key="manager">jh104</j:string>
</j:map>
<j:map key="cx103">
<j:string key="firstname">Charles</j:string>
<j:string key="surname">Xavier</j:string>
<j:string key="department">Management</j:string>
</j:map>
<j:map key="jh104">
<j:string key="firstname">James</j:string>
<j:string key="surname">Howlett</j:string>
<j:string key="department">Security</j:string>
<j:string key="manager">cx103</j:string>
</j:map>
</j:map>
</j:map>
Now, suppose that you wanted to map this to a different data structure, such as an array of objects. The templates to do so would look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
xmlns:emp="http://www.semanticalllc.com/ns/employees#"
xmlns:h="http://www.w3.org/1999/xhtml"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:j="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="xs math xd h emp"
version="3.0"
expand-text="yes"
>
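<!-- Text output: the JSON string itself is produced by xml-to-json() below -->
<xsl:output method="text" media-type="application/json"/>
<!-- json-to-xml() expects JSON text; the string value of the source document is assumed to be that text -->
<xsl:variable name="employees-a" select="json-to-xml(/)"/>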
<xsl:template match="/">
<xsl:variable name="persons-b">
<xsl:apply-templates select="$employees-a/*"/>
</xsl:variable>
{xml-to-json($persons-b,map{'indent':true()})}
</xsl:template>
<xsl:template match="/j:map">
<j:map>
<j:array key="persons">
<xsl:apply-templates select="j:map[@key='employees']/j:map" mode="employee"/>
</j:array>
</j:map>
</xsl:template>
<xsl:template match="j:map" mode="employee">
<j:map>
<j:string key="id">{@key}</j:string>
<j:string key="fullName">{j:string[@key='firstname'] || ' ' || j:string[@key='surname']}</j:string>
<j:string key="reverseName">{j:string[@key='surname'] || ', ' || j:string[@key='firstname']}</j:string>
<xsl:copy-of select="*[@key=('firstname','surname','department')]"/>
<xsl:if test="j:string[@key='manager']">
<j:string key="reportsTo">{j:string[@key='manager']/text()}</j:string>
</xsl:if>
</j:map>
</xsl:template>
</xsl:stylesheet>
As a side note, the namespace for the XML-ified version of the JSON (the namespace referred to by the j: prefix) has changed several times over the course of the XSLT 3.0 recommendation, so it's probably worth experimenting with the json-to-xml() function to see what namespace your processor currently uses. The stylesheet above binds j: to http://www.w3.org/2005/xpath-functions.
This can then be converted back to JSON with the xml-to-json() function, resulting in the following output:
{ "persons" :
[
{ "id" : "jd101",
"fullName" : "Jane Doe",
"reverseName" : "Doe, Jane",
"firstname" : "Jane",
"surname" : "Doe",
"department" : "IT",
"reportsTo" : "kp102" },
{ "id" : "kp102",
"fullName" : "Kitty Pride",
"reverseName" : "Pride, Kitty",
"firstname" : "Kitty",
"surname" : "Pride",
"department" : "IT",
"reportsTo" : "jh104" },
{ "id" : "cx103",
"fullName" : "Charles Xavier",
"reverseName" : "Xavier, Charles",
"firstname" : "Charles",
"surname" : "Xavier",
"department" : "Management" },
{ "id" : "jh104",
"fullName" : "James Howlett",
"reverseName" : "Howlett, James",
"firstname" : "James",
"surname" : "Howlett",
"department" : "IT",
"reportsTo" : "cx103" }
]
}
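The stylesheet above assumes the JSON text is already available to the processor. As a minimal sketch of one way to load it from a file, the following assumes a hypothetical employees.json sitting next to the stylesheet and a processor started at the initial named template rather than with a source document. The json-uri parameter name is illustrative and not part of the original example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:j="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="xs j">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: location of the JSON file, relative to this stylesheet -->
  <xsl:param name="json-uri" as="xs:string" select="'employees.json'"/>

  <xsl:template name="xsl:initial-template">
    <!-- unparsed-text() reads the file as a string; json-to-xml() parses that string -->
    <xsl:variable name="doc" select="json-to-xml(unparsed-text($json-uri))"/>
    <!-- One line per employee: the map key, then the surname -->
    <xsl:for-each select="$doc/j:map/j:map[@key='employees']/j:map">
      <xsl:value-of select="@key, j:string[@key='surname']" separator=": "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Reading the file with unparsed-text() keeps the JSON out of the XML source entirely, which is usually what you want when the inbound document really is JSON.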
Attribute and Text Value Templates
One criticism that has dogged XSLT from the beginning is its verbosity, along with the related complaint that certain expressions are difficult to manage when the @select attribute is used for evaluating text expressions (specifically with the <xsl:value-of> element). With XSLT 3, an innovation that first appeared in XQuery has made its way into the XSLT language: text value templates (attribute value templates have been available since XSLT 1.0).
The idea here is simple: when expand-text="yes" is in effect (as in the stylesheet above), any expression that evaluates to a string or similar atomic value can be written between braces "{}" in place of an <xsl:value-of> statement. Such expressions are called text value templates, or TVTs. For instance, the following generates the full name of an employee from the first and last name, using XSLT 1.0.
<j:string key="fullName"><xsl:value-of
select="concat(j:string[@key='firstname'], ' ', j:string[@key='surname'])"/></j:string>
In XSLT 3.0, this can be rewritten as
<j:string key="fullName">{
j:string[@key='firstname'] || ' ' || j:string[@key='surname']
}</j:string>
Not only does the latter require fewer keystrokes (reducing the verbosity of the language) but it is also easier to follow, especially with the concatenation operator ("||") replacing the concat() function. This also solves a big problem with XSLT 1 when you had a select expression which needed both single and double quotes within attributes.
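Two practical details are worth a quick sketch: how to emit a literal brace, and how to switch text value templates off for part of a stylesheet. The input element names (employee, firstname, surname) are assumptions made for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    expand-text="yes">

  <!-- Hypothetical input:
       <employee id="jd101"><firstname>Jane</firstname><surname>Doe</surname></employee> -->
  <xsl:template match="employee">
    <!-- An attribute value template and two text value templates side by side -->
    <p title="{@id}">{surname}, {firstname}</p>
    <!-- Doubled braces are written to the output as single literal braces -->
    <p>{{not evaluated}}</p>
    <!-- On a literal result element, xsl:expand-text="no" turns TVTs off for that subtree -->
    <p xsl:expand-text="no">{left as typed}</p>
  </xsl:template>

</xsl:stylesheet>
```

Setting expand-text on the xsl:stylesheet element applies it everywhere, which is convenient for new stylesheets; for older ones it may be safer to enable it template by template so that existing braces in text are not reinterpreted.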
Evaluating XPaths Dynamically
A related capability within XSLT 3.0 is the <xsl:evaluate> instruction, which evaluates the expression in its @xpath attribute to obtain a string, then treats that string as an XPath expression and evaluates it against the given context item to select the appropriate nodes.
<xsl:template match="j:map" mode="employee">
<xsl:variable name="prop" select="'j:string'"/>
<xsl:variable name="key" select="'firstname'"/>
<output><xsl:evaluate xpath="$prop||&quot;[@key='&quot;||$key||&quot;']&quot;" context-item="."/></output>
</xsl:template>
In this particular case, the expression
$prop||"[@key='"||$key||"']"
gets converted into the XPath string:
j:string[@key='firstname']
This is then evaluated to retrieve (for the first entry) the value
"Jane"
This can evaluate both individual string or similar atomic values and sequences of nodes. Note that this is less efficient than using static content because it's harder to optimize, but invaluable in those cases where you are building stylesheets that build stylesheets (a surprisingly common design pattern, by the way).
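As a sketch of how this might look when the expression arrives from outside the stylesheet, the following passes a value into the dynamic expression with <xsl:with-param>. The filter parameter and the $wanted variable are hypothetical names, the input is assumed to be the json-to-xml() representation shown earlier, and dynamic evaluation is an optional feature that some processors or editions may not enable.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:j="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="xs j">

  <!-- Hypothetical parameter: an XPath expression supplied at run time.
       $wanted inside the string is bound by xsl:with-param below. -->
  <xsl:param name="filter" as="xs:string" select="'j:string[@key = $wanted]'"/>

  <!-- Matches each employee map (any map that has string properties) -->
  <xsl:template match="j:map[j:string]">
    <output>
      <!-- The j: prefix in the dynamic expression is resolved from the
           namespaces in scope on this instruction -->
      <xsl:evaluate xpath="$filter" context-item=".">
        <xsl:with-param name="wanted" select="'surname'"/>
      </xsl:evaluate>
    </output>
  </xsl:template>

</xsl:stylesheet>
```

Passing values through <xsl:with-param> rather than concatenating them into the expression string avoids quoting problems and keeps untrusted input out of the XPath text itself.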
Functions and Types
An XSLT 2.0 addition carried over into XSLT 3.0 is the introduction of functions and typed variables. These provide a huge boost over named templates.
In XSLT 1.0, if you wanted to evaluate a "function" you needed to use a named template:
<xsl:template name="multiply-template">
<xsl:param name="a"/>
<xsl:param name="b"/>
<xsl:value-of select="$a * $b"/>
</xsl:template>
This could only be invoked outside of attributes, so going from:
<multiply a="10" b="20"/>
to
<multiply-result value="200"/>
would look something like:
<xsl:template match="multiply">
<xsl:variable name="result">
<xsl:call-template name="multiply-template">
<xsl:with-param name="a" select="number(@a)"/>
<xsl:with-param name="b" select="number(@b)"/>
</xsl:call-template>
</xsl:variable>
<multiply-result>
<xsl:attribute name="value">
<xsl:value-of select="$result"/>
</xsl:attribute>
</multiply-result>
</xsl:template>
With XSLT 3 (really XSLT 2), you can simplify this considerably by creating functions and using datatypes:
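<xsl:function name="myNS:multiply" as="xs:double">
<xsl:param name="a" as="xs:double"/>
<xsl:param name="b" as="xs:double"/>
<!-- xsl:sequence returns the typed value directly and also works in XSLT 2.0 -->
<xsl:sequence select="$a * $b"/>
</xsl:function>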
<xsl:template match="multiply">
<multiply-result value="{myNS:multiply(@a,@b)}"/>
</xsl:template>
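There are several key insights here. First, functions can be defined in their own namespaces and can be imported as libraries. The namespace is declared on the xsl:stylesheet element, and the library is pulled in with <xsl:import> or <xsl:include>. A local function overrides an imported function with the same name and arity, because the importing stylesheet has higher import precedence; with <xsl:include> both definitions have the same precedence, so a duplicate is reported as an error.
<xsl:stylesheet version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:myNS="http://www.example.com/ns/myNS#"
exclude-result-prefixes="myNS"
>
<xsl:import href="myFunctions.xsl"/>
<!-- templates that call the myNS functions go here -->
</xsl:stylesheet>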
This can provide portability of functions across platforms in a consistent manner, as you can move from Java to C++ to eventually JavaScript without needing to change function code. Extending functions with native code is similarly supported. Functions can be invoked anywhere that an XPath expression can be evaluated, and can both accept and return nodes and sequences of nodes. Sequences of nodes (and mixed types) also replace node-sets in XSLT 1.0, giving much more flexibility without the requirement of the node-set() function (which is usually retained for backward compatibility but is now essentially just a pass-through function).
Additionally, with XSLT 2.0/3.0, you can now declare datatypes, giving much more control over both input and output. These are optional rather than required, but are useful in providing functional validation. Note that functions are distinguished by name and number of arguments, not by argument types, so a stylesheet cannot hold both an xs:double and an xs:integer version of a two-argument myNS:multiply at the same import precedence; you choose the signature that validates the types you want to accept:
<xsl:function name="myNS:multiply" as="xs:integer">
<xsl:param name="a" as="xs:integer"/>
<xsl:param name="b" as="xs:integer"/>
<xsl:sequence select="$a * $b"/>
</xsl:function>
XSLT 3.0 also includes the concept of function packages. For instance, a complex number package may look something like the following:
<xsl:package
name="http://example.org/complex-arithmetic.xsl"
package-version="1.0"
version="3.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:f="http://example.org/complex-arithmetic.xsl">
<xsl:function name="f:complex-number"
as="map(xs:integer, xs:double)" visibility="public">
<xsl:param name="real" as="xs:double"/>
<xsl:param name="imaginary" as="xs:double"/>
<xsl:sequence select="map{ 0:$real, 1:$imaginary }"/>
</xsl:function>
<xsl:function name="f:real"
as="xs:double" visibility="public">
<xsl:param name="complex" as="map(xs:integer, xs:double)"/>
<xsl:sequence select="$complex(0)"/>
</xsl:function>
<xsl:function name="f:imag"
as="xs:double" visibility="public">
<xsl:param name="complex" as="map(xs:integer, xs:double)"/>
<xsl:sequence select="$complex(1)"/>
</xsl:function>
<xsl:function name="f:add"
as="map(xs:integer, xs:double)" visibility="public">
<xsl:param name="x" as="map(xs:integer, xs:double)"/>
<xsl:param name="y" as="map(xs:integer, xs:double)"/>
<xsl:sequence select=" f:complex-number( f:real($x) + f:real($y), f:imag($x) + f:imag($y))"/>
</xsl:function>
<xsl:function name="f:multiply"
as="map(xs:integer, xs:double)" visibility="public">
<xsl:param name="x" as="map(xs:integer, xs:double)"/>
<xsl:param name="y" as="map(xs:integer, xs:double)"/>
<xsl:sequence select=" f:complex-number( f:real($x)*f:real($y) - f:imag($x)*f:imag($y), f:real($x)*f:imag($y) + f:imag($x)*f:real($y))"/>
</xsl:function>
<!-- etc. -->
</xsl:package>
This is pulled in with the <xsl:use-package> element. The advantage that packages have over normal imports is that they provide the ability to maintain different versions and consequently establish version dependency.
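A minimal sketch of a stylesheet that uses the package above might look like the following. How the processor locates a package from its name is implementation-defined, so the assumption here is that the complex-arithmetic package has already been registered with the processor through its own configuration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="http://example.org/complex-arithmetic.xsl"
    exclude-result-prefixes="f">

  <!-- Request version 1.0 of the package by name -->
  <xsl:use-package name="http://example.org/complex-arithmetic.xsl" package-version="1.0"/>

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- (1 + 2i) + (3 + 4i), built only from the package's public functions -->
    <xsl:variable name="sum"
        select="f:add(f:complex-number(1, 2), f:complex-number(3, 4))"/>
    <!-- Write the real part and the imaginary part, separated by a comma -->
    <xsl:value-of select="f:real($sum), f:imag($sum)" separator=", "/>
  </xsl:template>

</xsl:stylesheet>
```

Only components declared with visibility="public" (or "final") are callable from the using stylesheet, which is what lets a package change its private helpers between versions without breaking its users.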
One final addition in the world of functions is the incorporation of a <try><catch> block. For instance, a function can make use of such a block to catch divide by zero errors (Note: this is a deliberately simple example).
<xsl:function name="f:divide" as="xs:decimal?" visibility="public"
xmlns:err="http://www.w3.org/2005/xqt-errors">
<!-- Integer division raises err:FOAR0001 on zero; xs:double division would return INF instead -->
<xsl:param name="x" as="xs:integer"/>
<xsl:param name="y" as="xs:integer"/>
<xsl:try>
<xsl:sequence select="$x div $y"/>
<xsl:catch errors="err:FOAR0001">
<xsl:message>
<div data-code="{$err:code}">Divide by Zero Error</div>
</xsl:message>
</xsl:catch>
</xsl:try>
</xsl:function>
The try construction contains a sequence constructor to be evaluated. If evaluating it raises a dynamic error, control passes to a matching <xsl:catch>, and details of the error are available there through variables in the err namespace, such as $err:code and $err:description. This extends to functions the kind of exception handling that had largely been the province of templates in 1.0 (with more capabilities).
Extended Function Set, Sequences, Arrays and Maps
The initial function set for XSLT 1.0 was essentially the XPath 1.0 function set, and was very limited: minimal math support, no regular expression support, minimal string manipulation capabilities, no support for set (sequence) operations, no support for dates. It's a very bare-bones function set and one reason why many people have the impression that XSLT is underpowered: XSLT 1.0 is underpowered. XSLT 3.0 is not.
The following is a breakdown of all of the functions and operators defined in the XPath Functions and Operators 3.0 specification (a recommendation as of April 2014).
- abs acos add-dayTimeDurations add-dayTimeDuration-to-date add-dayTimeDuration-to-dateTime add-dayTimeDuration-to-time add-yearMonthDurations add-yearMonthDuration-to-date add-yearMonthDuration-to-dateTime adjust-dateTime-to-timezone adjust-date-to-timezone adjust-time-to-timezone analyze-string asin atan atan2 available-environment-variables avg
- base64Binary-equal base-uri boolean boolean-equal boolean-greater-than boolean-less-than
- ceiling codepoint-equal codepoints-to-string collection compare concat concatenate contains cos count current-date current-dateTime current-time
- data date-equal date-greater-than date-less-than dateTime dateTime-equal dateTime-greater-than dateTime-less-than day-from-date day-from-dateTime days-from-duration dayTimeDuration-greater-than dayTimeDuration-less-than deep-equal default-collation distinct-values divide-dayTimeDuration divide-dayTimeDuration-by-dayTimeDuration divide-yearMonthDuration divide-yearMonthDuration-by-yearMonthDuration doc doc-available document-uri duration-equal
- element-with-id empty encode-for-uri ends-with environment-variable error escape-html-uri exactly-one except exists exp exp10
- false filter floor fold-left fold-right for-each for-each-pair format-date format-dateTime format-integer format-number format-time function-arity function-lookup function-name
- gDay-equal generate-id gMonthDay-equal gMonth-equal gYear-equal gYearMonth-equal
- has-children head hexBinary-equal hours-from-dateTime hours-from-duration hours-from-time
- id idref implicit-timezone index-of innermost in-scope-prefixes insert-before intersect iri-to-uri is-same-node
- lang last local-name local-name-from-QName log log10 lower-case
- matches max min minutes-from-dateTime minutes-from-duration minutes-from-time month-from-date month-from-dateTime months-from-duration multiply-dayTimeDuration multiply-yearMonthDuration
- name namespace-uri namespace-uri-for-prefix namespace-uri-from-QName nilled node-after node-before node-name normalize-space normalize-unicode not NOTATION-equal number numeric-add numeric-divide numeric-equal numeric-greater-than numeric-integer-divide numeric-less-than numeric-mod numeric-multiply numeric-subtract numeric-unary-minus numeric-unary-plus
- one-or-more outermost
- parse-xml parse-xml-fragment path pi position pow prefix-from-QName
- QName QName-equal
- remove replace resolve-QName resolve-uri reverse root round round-half-to-even
- seconds-from-dateTime seconds-from-duration seconds-from-time serialize sin sqrt starts-with static-base-uri string string-join string-length string-to-codepoints subsequence substring substring-after substring-before subtract-dates subtract-dateTimes subtract-dayTimeDuration-from-date subtract-dayTimeDuration-from-dateTime subtract-dayTimeDuration-from-time subtract-dayTimeDurations subtract-times subtract-yearMonthDuration-from-date subtract-yearMonthDuration-from-dateTime subtract-yearMonthDurations sum
- tail tan time-equal time-greater-than time-less-than timezone-from-date timezone-from-dateTime timezone-from-time to tokenize trace translate true
- union unordered unparsed-text unparsed-text-available unparsed-text-lines upper-case uri-collection
- year-from-date year-from-dateTime yearMonthDuration-greater-than yearMonthDuration-less-than years-from-duration
- zero-or-one
XSLT 3.0 adds a few functions to this list that are specific to the XSLT language, some as traditional functions, some as elements. These include additional support for sorting, grouping, numbering, higher order functions (functions as arguments to other functions), map/reduce capabilities, regular expression analysis and so forth. It also includes support for reading (and writing) XML, text and binary resources under separate threads, giving it much more control for orchestrating processes (this combined with XProc makes XSLT3 a major player in any orchestration system).
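To make the higher order function and map/reduce point concrete, here is a small sketch using fold-left() and for-each() from the list above. The values are made up for illustration, and higher order functions are an optional feature, so this assumes a processor that supports them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- Reduce: fold-left threads an accumulator through the sequence,
         here ((((0+1)+2)+3)+4)+5, using an inline function -->
    <xsl:variable name="total" as="xs:integer"
        select="fold-left(1 to 5, 0, function($acc, $n) { $acc + $n })"/>
    <!-- Map: for-each applies a function item (a reference to upper-case#1) to every item -->
    <xsl:variable name="shouted" as="xs:string*"
        select="for-each(('it', 'security'), upper-case#1)"/>
    <xsl:value-of select="$total, $shouted" separator=" "/>
  </xsl:template>

</xsl:stylesheet>
```

Run from the initial template, this should write 15 IT SECURITY. The braces in the inline function are safe here because @select is not an attribute value template.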
XSLT 3.0 also includes support for maps. Maps are analogous to objects in JavaScript, making it possible to create entities which contain name-value pairs that can be built up dynamically. Such maps are (like all XSLT structures) immutable: a map:put() operation on a map returns a new map rather than changing the original. This is actually in accordance with a growing sentiment in the programming community that mutable programming introduces too many potential side-effects that lead to hard-to-maintain code.
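The immutability of maps can be seen in a short sketch; the employee values are borrowed from the earlier JSON example and the variable names are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="map">

  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <xsl:variable name="emp" select="map{'id': 'jd101', 'surname': 'Doe'}"/>
    <!-- map:put returns a new map with the extra entry; $emp itself is unchanged -->
    <xsl:variable name="emp2" select="map:put($emp, 'department', 'IT')"/>
    <!-- A map can be called like a function to look up a key -->
    <xsl:value-of select="map:contains($emp, 'department'), $emp2('department')"
        separator=" "/>
  </xsl:template>

</xsl:stylesheet>
```

This should write false IT: the original map still has no department entry, while the map returned by map:put() does.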
Similarly, the language also provides support for both sequences (from XSLT 2.0 onward) and arrays (XSLT 3.0). The distinction between the two is subtle: in a sequence, if you add a new sequence to an existing sequence, the result is just another sequence - there are no boundaries of containment. An array is similarly a list of items, but you can have an array of arrays.
<xsl:variable name="sequence" select="('a','b','c',('d','e',('f')))"/>
{$sequence}
=> ('a','b','c','d','e','f')
{$sequence[2]}
=> 'b' // 1-based
<xsl:variable name="array" select="[[1,2],[3,4],[5,6]]"/>
{$array}
=> [[1,2], [3,4], [5,6]]
{$array(2)} // also 1-based; members are selected with (), not []
=> [3,4]
{$array(2)(1)}
=> 3
This completes the equivalency between JSON and XML within XSLT - with XSLT you can work with all of the structures that either has. It's also worth noting that this makes it possible to use XSLT for certain RDF operations, because RDF can also be represented as either JSON or XML.
Streaming and Performance
One final benefit of the XSLT 3.0 standard - it supports streaming. The real world has moved beyond files - data comes in streams, from activity streams generated by Twitter or Facebook to location streams coming from cell phones to gigabyte sized files that can only be consumed as chunked streams. XSLT 3.0 can be configured to handle streamed content, with some limitations that come from not necessarily knowing completion points until they arrive.
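As a rough sketch of what a streamed transformation looks like, the following counts employee elements in a large file without building the whole document in memory. The file name, the source-uri parameter and the element names are assumptions for illustration. The instruction is called xsl:source-document in the final specification (earlier drafts called it xsl:stream), and a processor without streaming support is expected to fall back to ordinary in-memory processing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: a large XML file of <employees><employee/>...</employees> -->
  <xsl:param name="source-uri" select="'employees.xml'"/>

  <xsl:template name="xsl:initial-template">
    <!-- streamable="yes" asks the processor to read the document as a stream -->
    <xsl:source-document href="{$source-uri}" streamable="yes">
      <!-- A single downward pass over the input, which is what streaming requires -->
      <xsl:value-of select="count(employees/employee)"/>
    </xsl:source-document>
  </xsl:template>

</xsl:stylesheet>
```

The main limitation is that streamed code can only visit each part of the input once, so expressions that look backwards or revisit a subtree have to be restructured.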
Performance is a little harder to measure. Both Xalan and libxslt are relatively basic and consequently haven't been optimized much over the years. This means that in simple transformations these XSLT 1.0 processors may have a slight edge over XSLT 3.0 processors like Saxon, but for even moderate weight transformations, any real speed benefits disappear because so much post-processing needs to be done. Running XSLT 3.0 in a streaming mode avoids building the entire source document in memory, and some processors (notably Saxon) also support full or partial compilation.
Wrap Up
XSLT 3.0 represents a major upgrade of the XSLT 1.0 (and even XSLT 2.0) standards to become a general purpose transformation language for the most common data storage and messaging formats. The language has become integral in publishing pipelines, is increasingly responsible for managing transformations between complex data structures and data mappings, and is accessible from a number of computer languages, including PHP and Ruby (Saxon-C, an XSLT 3.0 compliant version of Saxon, is now available for C/C++ on Linux and shortly for Windows, and will be available via bindings for languages such as PHP).
Thus, even if XML is not part of your normal processing pipeline, XSLT 3.0 is still very much a worthwhile investment to learn and integrate into your own systems.
Author Kurt Cagle has been writing about XSLT from its early days in 1999, and is tickled that 3.0 is on its way.