XProc 3.0 - Connecting steps using ports
January 23, 2020
Introduction
XProc is an XML-based programming language for processing documents in pipelines: chaining conversions and other steps together to achieve the desired results. The introductory article on XProc 3.0 can be found here.
This article dives into an important concept in XProc: ports. Ports are the points where documents flow in and out of steps. To use ports effectively you need to know how to declare them and how to connect them to each other. This article explains how to do this for the most common use cases. If you need more detail, please refer to the specification itself: http://spec.xproc.org/.
What are ports?
As explained in the introductory article about XProc, an XProc program consists of steps. Steps are the building blocks of XProc. A step takes documents as input(s), does something with the data flowing through and produces output(s). Programming XProc is writing steps by chaining other steps.
Ports are the connectors of steps. Documents flow in or out of a step through ports. Ports are the equivalent of the USB connectors on your computer or the network connectors on your router.
For instance, the <p:xslt> step has two input ports, one for the document (source) and one for the stylesheet (stylesheet). It also has two output ports: one for the result of the XSLT transformation (result) and one for the optional additional result documents created with <xsl:result-document> (secondary).
Ports are defined with <p:input> and <p:output> elements. Here are the (simplified) port declarations of the <p:xslt> step:
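As a rough sketch, assuming the signature published in the XProc 3.0 standard step library, the port part of the <p:xslt> declaration looks roughly like this. The options of the step are left out, and the exact attribute values are worth checking against the specification. This is a signature for reading, not something you would write yourself.

```xml
<!-- Simplified sketch of the p:xslt signature: ports only, options omitted -->
<p:declare-step type="p:xslt" xmlns:p="http://www.w3.org/ns/xproc">

  <!-- The document(s) to transform -->
  <p:input port="source" primary="true" sequence="true" content-types="any"/>

  <!-- The XSLT stylesheet: exactly one XML document -->
  <p:input port="stylesheet" content-types="xml"/>

  <!-- The main result of the transformation -->
  <p:output port="result" primary="true" sequence="true" content-types="any"/>

  <!-- Additional documents created with xsl:result-document -->
  <p:output port="secondary" sequence="true" content-types="any"/>

</p:declare-step>
```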
Declaring ports
Now <p:xslt> is a predefined step and part of XProc's standard step library. Its ports are declared for you. But programming XProc is writing your own steps, which means you have to be able to define/declare input and output ports.
Declarations of input and output ports must be stated in the prolog of your pipeline, as direct children of <p:declare-step>, before its body (the steps constituting the pipeline) begins. Input ports are declared with <p:input>, output ports with (no surprise) <p:output>. Here is an example:
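A minimal sketch of a step divided into a prolog and a body could look like the following. The stylesheet name convert.xsl is a made-up placeholder, not something from the original example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <!-- Prolog: the port declarations come first -->
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Body: the steps that make up the pipeline -->
  <p:xslt>
    <!-- convert.xsl is a hypothetical stylesheet name -->
    <p:with-input port="stylesheet" href="convert.xsl"/>
  </p:xslt>

</p:declare-step>
```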
The following attributes define properties of ports when declaring them:
port
The mandatory port attribute provides a name for the port. By convention the primary input and output ports (see below) are named source and result.
primary
Setting this boolean attribute to true makes this port primary. A primary port plays an important role in connecting steps. See the section called “Implicit connections”.
There can only be one primary input and one primary output port. If a step declares a single input or single output port, it automatically becomes primary. So in Example 2, “An XProc step divided into a prolog and a body”, both ports are primary.
sequence
This boolean attribute tells the XProc processor how many documents can appear on the port. A value of false means exactly one document, a value of true means zero or more. An error will be raised if this restriction is violated.
content-types
This attribute restricts the content types (MIME types) of the documents on the port. Its value is a whitespace separated list of strings like text/plain or application/*+xml (where the * of course acts like a wildcard). Adding a minus (-) in front negates the meaning: not allowed. An error is raised if a document appears on the port that violates the restriction(s).
The exact rules for specifying "any XML document" or "any HTML document" are somewhat complicated. To make life easier there are five shortcut values you can use if you need this: xml, html, text, json and any. So, for example, telling a port to accept only XML and HTML documents: content-types="xml html".
serialization
This attribute is for output ports only. It tells the XProc processor what to do when the document(s) appearing on the port need to be serialized (usually to disk). When no serialization takes place (all port connections inside the pipeline) the attribute is ignored.
The value of the attribute is an XPath map. For example, to serialize as HTML with indentation on: serialization="map{'method': 'html', 'indent': true()}"
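To see the attributes described above working together, here is a small sketch of a pipeline that uses all of them. The choice of values is only an illustration, and the body is a bare <p:identity> so the example stays focused on the declarations.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <!-- Primary input: exactly one document, which must be XML or HTML -->
  <p:input port="source" primary="true" sequence="false"
           content-types="xml html"/>

  <!-- Primary output: zero or more documents, written as indented HTML
       whenever the processor has to serialize them -->
  <p:output port="result" primary="true" sequence="true"
            serialization="map{'method': 'html', 'indent': true()}"/>

  <!-- Body: p:identity passes its input through unchanged -->
  <p:identity/>

</p:declare-step>
```

Since each of the two ports is the only one of its kind, the primary="true" attributes are redundant here. They are spelled out only to show where the attribute goes.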
Input port defaults
It's possible to define a default connection for an input port. A typical use case is a conversion that in most cases uses a standard XSLT stylesheet, but that sometimes needs to be overridden. Here's how you could define such an input port:
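One way to write such a declaration is sketched below, using a <p:document> child for the default. The step name main and the surrounding pipeline are assumptions added to make the fragment complete.

```xml
<!-- The name "main" is a hypothetical step name, needed to refer to the
     pipeline's own input port from inside the body -->
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0" name="main">

  <p:input port="source" primary="true"/>

  <!-- Default connection: only used when nothing is bound to this port -->
  <p:input port="special-stylesheet">
    <p:document href="default-stylesheet.xsl"/>
  </p:input>

  <p:output port="result"/>

  <!-- Use whatever arrives on special-stylesheet as the stylesheet -->
  <p:xslt>
    <p:with-input port="stylesheet">
      <p:pipe step="main" port="special-stylesheet"/>
    </p:with-input>
  </p:xslt>

</p:declare-step>
```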
If you don't connect anything to the special-stylesheet port, it will connect to default-stylesheet.xsl. But if you do, the default will be ignored.
The rules and syntax for defining a default connection are the same as for defining an explicit port connection. There's one exception: a default port connection cannot be dynamic (cannot depend on things happening at runtime). Therefore you cannot use <p:pipe> or the pipe attribute.
Output port connections
An output port of a pipeline needs to know what to deliver to the outside world. You specify this by connecting it to some output port of one of the steps that make up the pipeline.
For the primary output port this will usually be the last step of your pipeline. In that case you don't have to do anything: the implicit connection mechanism will take care of everything. But if you want to output something else, you need to specify this. For example:
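A sketch of such a pipeline, assuming two transformations with made-up stylesheet names (first.xsl and second.xsl), might look like this. The primary result port relies on the implicit connection, while the extra port is connected explicitly to the first transformation.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>

  <!-- Primary output: implicitly connected to the last step of the body -->
  <p:output port="result" primary="true"/>

  <!-- Extra output: explicitly connected to the step named transform-step -->
  <p:output port="extra">
    <p:pipe step="transform-step" port="result"/>
  </p:output>

  <!-- first.xsl and second.xsl are hypothetical stylesheet names -->
  <p:xslt name="transform-step">
    <p:with-input port="stylesheet" href="first.xsl"/>
  </p:xslt>

  <p:xslt>
    <p:with-input port="stylesheet" href="second.xsl"/>
  </p:xslt>

</p:declare-step>
```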
The extra output port will always emit the result of the transformation named transform-step, even if the pipeline goes on after this.
Connecting or binding ports
To make pipelines out of steps, you have to be able to connect or bind the ports of the individual steps. Documents must flow into input ports, whether they come from output ports of other steps, from disk, the web or whatever.
Connecting ports for the outermost step
When you invoke the outermost XProc pipeline using an XProc processor, you need to bind its input and output ports to something (usually files on disk). This binding is done through the command line. The same is true for setting the pipeline's options.
Unfortunately I cannot give you much guidance on this: how an XProc processor binds the pipeline's ports and sets options is implementation-defined. So: read the processor-specific documentation.
Implicit connections
The introduction to XProc article already explained implicit binding. Here's a recap:
One input and one output port of a step can be designated as the primary port (by convention called source and result). Primary ports of steps "auto-connect" when you place steps next to each other in XProc code. This is called implicit binding.
Not only does implicit binding connect steps, it also connects the primary input and output port of a pipeline to its constituent steps. Here's an example pipeline that chains two XSLT transformations:
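A minimal sketch of such a chain follows. The stylesheet names step-1.xsl and step-2.xsl are placeholders. Note that none of the primary ports is connected explicitly.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- The pipeline's source port flows implicitly into this step -->
  <p:xslt>
    <p:with-input port="stylesheet" href="step-1.xsl"/>
  </p:xslt>

  <!-- The result of the first p:xslt flows implicitly into this step,
       and its result flows implicitly to the pipeline's result port -->
  <p:xslt>
    <p:with-input port="stylesheet" href="step-2.xsl"/>
  </p:xslt>

</p:declare-step>
```

Only the stylesheet ports need an explicit connection, because they are not primary.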
Explicit connections
Not all ports are primary ports, and even for primary ports the previous step is not always the right one. So XProc provides the means to bind ports explicitly. Explicit port binding is always done on input ports. You specify what an input port is connected to, not where an output port should deliver its results. Pull, not push.
Input ports must be connected (or have a default connection, see the section called “Input port defaults”). Any unconnected output ports (also the primary ones) silently discard their results.
You can connect an input port in several ways. Remember, the examples in this article only show the most common use; lots of details and settings have been left out. Please refer to the specification if you need more information.
- An output port of another step
To connect to an output port of some other step in your pipeline you need to do two things:
Give the step you want to connect to a name using the name attribute.
Connect the input port you want to bind using a <p:pipe> child element or the pipe attribute. For example, assume you want to connect the source input port of some step to the secondary output port of the <p:xslt> step of Example 5, “Providing a name for a step”. A <p:pipe> child with a step and a port attribute does this, and exactly the same can be achieved with the pipe attribute, whose value takes the form port@step.
- Something reachable by URI
For this, use a <p:document> child element or an href attribute. Both achieve exactly the same.
A relative reference such as some-file.xml will usually be resolved against the pipeline's location. An absolute filename must be prefixed with file:/.
- Something stated inline
If the document you want to connect to a port is fixed, you can state it in the XProc code itself using <p:inline>. Attribute and Text Value Templates (XPath expressions between curly braces {…}) are expanded. If the document is an XML document you can leave out the surrounding <p:inline> element.
- Nothing
Sometimes an input port doesn't need anything. In that case use <p:empty>.
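The four kinds of explicit connection listed above can be sketched in a single pipeline. This is an illustration only: the step name transform-step, the file name some-stylesheet.xsl, the <report> element and the choice of <p:count> and <p:identity> as receiving steps are all assumptions. The "nothing" case is shown before the inline one so that the last step still produces a document for the result port.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <!-- Something reachable by URI: child element form and attribute form -->
  <p:xslt name="transform-step">
    <p:with-input port="source">
      <p:document href="some-file.xml"/>
    </p:with-input>
    <p:with-input port="stylesheet" href="some-stylesheet.xsl"/>
  </p:xslt>

  <!-- An output port of another step: p:pipe child element -->
  <p:count>
    <p:with-input port="source">
      <p:pipe step="transform-step" port="secondary"/>
    </p:with-input>
  </p:count>

  <!-- The same connection written with the pipe attribute (port@step) -->
  <p:count>
    <p:with-input port="source" pipe="secondary@transform-step"/>
  </p:count>

  <!-- Nothing: the port receives an empty sequence -->
  <p:count>
    <p:with-input port="source">
      <p:empty/>
    </p:with-input>
  </p:count>

  <!-- Something stated inline: the {…} template is expanded at runtime -->
  <p:identity>
    <p:with-input port="source">
      <p:inline><report created="{current-date()}">Fixed content</report></p:inline>
    </p:with-input>
  </p:identity>

</p:declare-step>
```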
Wrap up
Understanding ports and how to connect them is essential to understanding XProc:
- You have to declare the ports your pipeline uses to receive and deliver documents.
- You have to bind/connect the ports of the steps inside your pipeline, so the document(s) can flow through. Additional external documents (like XSLT stylesheets) must also be connected through ports.
In other words: if you get the hang of ports and their connections, you're well under way to mastering XProc!