XML.com

XProc 3.0 - Connecting steps using ports

January 23, 2020

Erik Siegel

Erik Siegel continues his series on using XProc 3.0 with a description of how to use ports to connect steps.

Introduction

XProc is an XML-based programming language for processing documents in pipelines: chaining conversions and other steps together to achieve the desired results. The introductory article on XProc 3.0 can be found here.

This article dives into an important concept in XProc: ports. Ports are the points where documents flow into and out of a step. To use ports effectively you need to know how to declare them and how to connect them to each other. This article explains how to do this for the most common uses. If you need more detail, please refer to the specification itself: http://spec.xproc.org/.

What are ports?

As explained in the introductory article about XProc, an XProc program consists of steps. Steps are the building blocks of XProc. A step takes documents as input(s), does something with the data flowing through and produces output(s). Programming XProc is writing steps by chaining other steps.

Ports are the connectors of steps. Documents flow in or out of a step through ports. Ports are the equivalent of the USB connectors on your computer or the network connectors on your router.

For instance, the <p:xslt> step has two input ports, one for the document and one for the stylesheet. It also has two output ports: one for the result of the XSLT transformation and one for the optional additional result documents (created with <xsl:result-document>):

Figure: The p:xslt step with its input and output ports

Ports are defined with <p:input> and <p:output> elements. Here are the (simplified) port declarations of the <p:xslt> step:

Example 1. Simplified port declarations for the <p:xslt> step
<p:input port="source" primary="true" sequence="true"/>
<p:input port="stylesheet"/>
<p:output port="result" primary="true"/>
<p:output port="secondary" sequence="true"/>

Declaring ports

Now <p:xslt> is a predefined step and part of XProc's standard step library. Its ports are declared for you. But programming XProc is writing your own steps, which means you have to be able to define/declare input and output ports.

Input and output ports must be declared in the prolog of your pipeline: as direct children of <p:declare-step>, before its body (the steps that constitute the pipeline) begins. Input ports are declared with <p:input>, output ports with (no surprise) <p:output>. Here is an example:

Example 2. An XProc step divided into a prolog and a body
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" …>

  <!-- Step's prolog: -->
  <p:input port="source"/>
  <p:output port="result"/>
  …

  <!-- Step's body: -->
  <p:xslt>
    <p:with-input port="stylesheet" href="do-something.xsl"/>
  </p:xslt>
  …

</p:declare-step>

The following attributes define properties of ports when declaring them:

port

The mandatory port attribute provides a name for the port.

By convention the primary input and output ports (see below) are named source and result.

primary

Setting this boolean attribute to true makes this port primary. A primary port plays an important role in connecting steps. See the section called “Implicit connections”.

There can only be one primary input and one primary output port. If a step declares a single input or single output port, it automatically becomes primary. So in Example 2, “An XProc step divided into a prolog and a body”, both ports are primary.

sequence

This boolean attribute tells the XProc processor how many documents can appear on the port. A value of false means exactly one document, a value of true means zero or more. An error will be raised if this restriction is violated.

content-types

This attribute restricts the content types (MIME types) of the documents on the port. Its value is a whitespace-separated list of strings like text/plain or application/*+xml (where the * of course acts as a wildcard). Putting a minus sign (-) in front of a value negates it: that content type is not allowed. An error is raised if a document appears on the port that violates the restriction(s).

The exact rules for specifying "any XML document" or "any HTML document" are somewhat complicated. To make life easier there are five shortcut values you can use if you need this: xml, html, text, json and any. So, for example, telling a port to accept only XML and HTML documents: content-types="xml html".

serialization

This attribute is for output ports only. It tells the XProc processor what to do when the document(s) appearing on the port need to be serialized (usually to disk). When no serialization takes place (all port connections inside the pipeline) the attribute is ignored.

The value of the attribute is an XPath map. For example, to serialize as HTML with indentation on: serialization="map{'method': 'html', 'indent': true()}"
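
As an illustration, the attributes described above can be combined in one set of port declarations. The following is a minimal sketch, not taken from any real pipeline: the port name attachments and the chosen content types are assumptions made up for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <!-- Primary input: exactly one document, which must be XML or HTML -->
  <p:input port="source" primary="true" sequence="false"
           content-types="xml html"/>

  <!-- Hypothetical extra input: zero or more text documents,
       but not plain text (note the minus sign) -->
  <p:input port="attachments" sequence="true"
           content-types="text/* -text/plain"/>

  <!-- Primary output: if it gets serialized, write indented HTML -->
  <p:output port="result" primary="true"
            serialization="map{'method': 'html', 'indent': true()}"/>

  <!-- Body: simply pass the source document through -->
  <p:identity/>

</p:declare-step>
```

Because there are two input ports here, the primary attribute on source is needed to tell the processor which one takes part in implicit connections.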

Input port defaults

It's possible to define a default connection for an input port. A typical use case would be a conversion that in most cases uses a standard XSLT stylesheet, but sometimes needs to override it. Here's how you could define such an input port:

Example 3. Defining an input port with a default connection
<p:input port="special-stylesheet">
  <p:document href="default-stylesheet.xsl"/>
</p:input>

If you don't connect anything to the special-stylesheet port, it will connect to default-stylesheet.xsl. But if you do, the default will be ignored.

The rules and syntax for defining a default connection are the same as for defining an explicit port connection. There's one exception: a default port connection cannot be dynamic (it cannot depend on things happening at runtime). Therefore you cannot use <p:pipe> or the pipe attribute.
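
To make the effect of a default connection more concrete, here is a sketch of a step that is called twice: once relying on the default and once overriding it. The step type my:convert, its namespace URI, the step name convert and the file other-stylesheet.xsl are all hypothetical names invented for this illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:my="http://example.org/ns/my-steps"
                version="3.0">

  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Hypothetical step with a default on its special-stylesheet port -->
  <p:declare-step type="my:convert" name="convert">
    <p:input port="source" primary="true"/>
    <p:input port="special-stylesheet">
      <p:document href="default-stylesheet.xsl"/>
    </p:input>
    <p:output port="result"/>

    <!-- Feed whatever arrives on special-stylesheet to the XSLT step -->
    <p:xslt>
      <p:with-input port="stylesheet" pipe="special-stylesheet@convert"/>
    </p:xslt>
  </p:declare-step>

  <!-- Nothing connected: default-stylesheet.xsl is used -->
  <my:convert/>

  <!-- Explicit connection: the default is ignored -->
  <my:convert>
    <p:with-input port="special-stylesheet" href="other-stylesheet.xsl"/>
  </my:convert>

</p:declare-step>
```

Note that the pipe attribute appears only inside the body of my:convert, where dynamic connections are allowed. The default itself stays a static <p:document>.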

Output port connections

An output port of a pipeline needs to know what to deliver to the outside world. You specify this by connecting it to some output port of one of the steps that make up the pipeline.

For the primary output port this will usually be the last step of your pipeline. In that case you don't have to do anything; the implicit connection mechanism will take care of everything. But if you want to output something else you need to specify this. For example:

Example 4. Connecting an output port to a step inside the pipeline
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" …>

  <p:input port="source"/>
  <p:output port="result" primary="true"/>
  <p:output port="extra" pipe="result@transform-step"/>
  …

  <p:xslt name="transform-step">
    <p:with-input port="stylesheet" href="do-something.xsl"/>
  </p:xslt>
  …

</p:declare-step>

The extra output port will always emit the result of the transformation named transform-step, even if the pipeline goes on after this.

Connecting or binding ports

To make pipelines out of steps, you have to be able to connect or bind the ports of the individual steps. Documents must flow into input ports, whether they come from output ports of other steps, from disk, the web or whatever.

Connecting ports for the outermost step

When you invoke the outermost XProc pipeline using an XProc processor, you need to bind its input and output ports to something (usually files on disk). This binding is done through the command line. The same is true for setting the pipeline's options.

Unfortunately I cannot give you much guidance on this: how an XProc processor binds the pipeline's ports and sets options is implementation-defined. So read the processor-specific documentation.

Implicit connections

The introduction to XProc article already explained implicit binding. Here's a recap:

One input and one output port of a step can be designated as the primary port (by convention called source and result). Primary ports of steps "auto-connect" when you place steps next to each other in XProc code. This is called implicit binding.

Not only does implicit binding connect steps, it also connects the primary input and output port of a pipeline to its constituent steps. Here's an example pipeline that chains two XSLT transformations:

Figure: Implicit connections in a pipeline
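
In XProc code, a pipeline like this could look like the following sketch. The stylesheet names first.xsl and second.xsl are placeholders invented for this illustration. Only the stylesheet ports are connected explicitly; every source and result port relies on implicit binding.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <!-- Single input and single output port: both are primary -->
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- source is implicitly connected to the pipeline's source port -->
  <p:xslt>
    <p:with-input port="stylesheet" href="first.xsl"/>
  </p:xslt>

  <!-- source is implicitly connected to the result of the previous step;
       this step's result implicitly feeds the pipeline's result port -->
  <p:xslt>
    <p:with-input port="stylesheet" href="second.xsl"/>
  </p:xslt>

</p:declare-step>
```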

Explicit connections

Not all ports are primary ports, and even for primary ports the previous step is not always the right one. So XProc provides you with the means to bind ports explicitly. Explicit port binding is always done on input ports. You specify what an input port is connected to, not where an output port should deliver its results. Pull, not push.

Input ports must be connected (or have a default connection, see the section called “Input port defaults”). Any unconnected output ports (also the primary ones) silently discard their results.

You can connect an input port in several ways. Remember, the examples in this article only show the most common uses; lots of details and settings have been left out. Please refer to the specification if you need more information.

An output port of another step

To connect to an output port of some other step in your pipeline you need to do two things:

  • Give the step you want to connect to a name using the name attribute:

    Example 5. Providing a name for a step
    <p:xslt name="step-to-connect-to" …>
      …
    </p:xslt>  
  • Connect the input port you want to bind using <p:pipe> or the pipe attribute. For example, assume you want to connect the source input port of some step to the secondary output port of the <p:xslt> step of Example 5, “Providing a name for a step”:

    Example 6. Connect to an output port using <p:pipe>
    <some-step …>
      <p:with-input port="source">
        <p:pipe step="step-to-connect-to" port="secondary"/>
      </p:with-input>
    </some-step>  

    Exactly the same can be achieved with:

    Example 7. Connect to an output port using the pipe attribute
    <some-step …>
      <p:with-input port="source" pipe="secondary@step-to-connect-to"/>
    </some-step>

Something reachable by URI

For this, use a <p:document> child element or an href attribute:

Example 8. Connect to a URI using <p:document>
<some-step …>
  <p:with-input port="source">
    <p:document href="some-file.xml"/>
  </p:with-input>
</some-step>  

Exactly the same can be achieved with:

Example 9. Connect to a URI using the href attribute
<some-step …>
  <p:with-input port="source" href="some-file.xml"/>
</some-step>  

In the examples above, the relative URI some-file.xml will usually be resolved against the pipeline's location. An absolute filename must be prefixed with file:/.

Something stated inline

If the document you want to connect to a port is fixed, you can state it in the XProc code itself using <p:inline>. Attribute and Text Value Templates (XPath expressions between curly braces {…}) are expanded:

Example 10. State an input document in the XProc code itself
<p:variable name="debug" select="true()"/>
<some-step …>
  <p:with-input port="source">
    <p:inline>
      <input-document timestamp="{current-dateTime()}">
        <p>Debug setting: {$debug}</p>
      </input-document>
    </p:inline>
  </p:with-input>
</some-step> 

If the document is an XML document (as in the example above), you can leave out the surrounding <p:inline> element.
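
As a sketch of this shortcut, here is the same inline document without the <p:inline> wrapper, placed in a complete pipeline. The use of <p:identity> as the receiving step is an assumption made only to keep the example self-contained; any step with a source port would do.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">

  <p:output port="result"/>

  <p:variable name="debug" select="true()"/>

  <!-- p:identity just passes its input on, so the pipeline
       outputs the inline document with its templates expanded -->
  <p:identity>
    <p:with-input port="source">
      <!-- No p:inline wrapper: the element itself is the document -->
      <input-document timestamp="{current-dateTime()}">
        <p>Debug setting: {$debug}</p>
      </input-document>
    </p:with-input>
  </p:identity>

</p:declare-step>
```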

Nothing

Sometimes an input port doesn't need anything. In that case use <p:empty>:

Example 11. Connect an input port to nothing using <p:empty>
<some-step …>
  <p:with-input port="source">
    <p:empty/>
  </p:with-input>
</some-step>  

Wrap up

Understanding ports and how to connect them is essential to understanding XProc:

  • You have to declare the ports your pipeline uses to receive and deliver documents.

  • You have to bind/connect the ports of the steps inside your pipeline, so the document(s) can flow through. Additional external documents (like XSLT stylesheets) must also be connected through ports.

In other words: if you get the hang of ports and their connections, you're well under way to mastering XProc!