1. Introduction
The bane of standards and specifications writers is the distraction from valuable
technical work triggered by the responsibility to present that work in a completed
document according to an imposed style or layout guide. This responsibility is multiplied
when a single document text is targeted for multiple standards development organizations
that each impose their own historically based layouts. Maintaining multiple document formats
simultaneously is rife with problems of consistency and extra effort. Using XML as a single
source for multiple differing layouts addresses this, and is critical to keeping the
single source text consistent across the publications.
Moreover, committees have many experts with input to a given document. Being able to
incorporate input from different contributors is streamlined when using XML because it is
text based. Equipping committee members to publish their intermediate work allows them to
review their modifications before submitting their suggested changes to the editors.
Finally, assembling complex work products can be a finicky task. Results are produced
more quickly and are built more reliably where it is possible to automate the production
and assembly task. The GitHub hosting service for git repositories offers
“GitHub Actions”, with which scripted behaviours can be executed.
This case study follows two OASIS technical committees, each responsible for preparing
revisions of its OASIS specification suitable for both OASIS and ISO submission, and
illuminates the committees’ use of DocBook XML for single-source authoring. Using XML
also opens options for generated content that are not readily available in other
authoring environments.
Also illustrated is how the editing and publishing process is supported by a git
repository hosted on GitHub, through which collaborators propose their modifications to
the editors. This frees contributors from the burden of supporting, in their local
computing environments, specialized publishing tools they may not have and processes they
may not be familiar with.
The environment now available for each committee implements the hands-off production of a
pair of ZIP files for uploading to the OASIS Kavi server: one for distribution to users by
TC Administration and one for the archiving of source and intermediate files that are used
in production but do not reside on any OASIS server. This archive fulfills a
committee’s obligation to make publicly available for posterity all of the inputs
involved in the production of work products.
This essay presents this model so that other OASIS Technical Committees can leverage the
collaboration opportunities provided by the technologies while still meeting their
committee archiving obligations. This implementation of the model incorporates OASIS’s
commercial license for the http://RealtaOnline.com online-publishing service, a high-performance and
purpose-built standards publishing platform, now available to support all OASIS editors of
committee work products.
This model can be useful in other collaborative writing and publishing environments,
illustrating the use of tools whose genesis is in software development, not in the
authoring of documents. And not just for OASIS and not just for standards.
2. A requirement to support multiple SDO layouts
Standards Development Organizations (SDOs) mandate particular page/screen layout
requirements for their own specifications. They don’t always look the same as work
products from other SDOs (imagine!). For the authors/editors of standards for a single
SDO, this isn’t usually a problem, other than the burden itself of worrying about
the layout when wanting otherwise to focus on the technical work itself.
Lately, however, there have been opportunities for some SDOs to adopt the specifications
of other SDOs. The International Organization for Standardization (ISO) is a frontrunner
in this with their formal Publicly Available Specification (PAS) submission process. This
process permits an accredited organization to submit the organization’s own
internally-developed specifications as candidate ISO specifications to be affirmed as
International Standards with the agreement of ISO’s national bodies. While the first
PAS submission is allowed to be published using the submitter’s organizational
layout, subsequent submissions are required to follow ISO’s layout as dictated by an
important document titled “Directives Part 2”.
The Organization for the Advancement of Structured Information Standards (OASIS) is an
accredited PAS submitter, and a number of committees have submitted, or plan to submit,
OASIS Standards to become ISO standards. The OASIS Universal Business Language (UBL)
technical committee submitted their UBL 2.1 OASIS Standard through the PAS process to
become ISO/IEC 19845:2015. Subsequent revisions must be submitted to ISO using the
Directives Part 2 layout while at the same time satisfying OASIS’s technical
publication process obligations for layout. These are two quite different layouts for one
set of technical information.
The OASIS Code List Representation technical committee intends to submit their OASIS
genericode specification through the PAS process. As the team progresses, it faces the
same multiple-layout challenges as the UBL TC.
In both cases the committee is leveraging a long-promoted benefit of using XML syntax for
source documents: multiple target publishing. For documents written following the OASIS
DocBook standard, OASIS has a set of specification-layout stylesheets conforming to the
layout requirements dictated by committee process. The source XML these stylesheets work
from is a simple text-based file format that is readily handled by many tools.
In support of their work, OASIS committee editors have access to the REST-based online
publishing service offered by http://RealtaOnline.com. One of the features of this service is the
transformation of DocBook XML conforming to the OASIS specification conventions into
NISO-STS, the JATS-based publishing vocabulary commonly adopted for international
standards.
The Réalta service pairs the OASIS specification stylesheet library for DocBook
with a Directives Part 2 stylesheet library for NISO-STS. Committee editors can choose to
invoke the service by supplying a single DocBook representation of the content and getting
back the content in OASIS layout in both PDF and HTML and, if needed, in Directives
Part 2 layout in PDF as well. The former two outputs satisfy the OASIS technical
committee process and the latter output satisfies the PAS submission process.
3. GitHub: a committee-wide collaboration opportunity
GitHub describes itself as the largest and most advanced development platform in the
world. At the time of writing, it supports over 200 million repositories, providing
cloud-based storage for project data maintained using the free git source code control and
change tracking software. Also provided is a computing service called GitHub Actions, with
which processes can run in the cloud on the data found in the repository, triggered by
various interactions collaborators have with the git software.
For committees such as the UBL TC, the publishing burden is multiplied by the need for
separate subcommittees to make contributions towards a single specification. Different
clauses of the specification are the responsibility of different subcommittees. Moreover,
some of the technical artefacts are governed by separate subcommittees.
Using git on GitHub supports the collaboration and input from multiple committee members
towards a single specification. With the automation, each contributor is empowered to
create the set of deliverables reflecting their input to the project, as a preview of what
the committee editors would see with that input incorporated. Editors, in turn, can see both
the inputs and the outputs of a collaborator’s submission to assess how best to
respond to the contribution.
The two GitHub-hosted git repositories for this case study are:
https://github.com/oasis-tcs/ubl
https://github.com/oasis-tcs/codelist-genericode
Separately tailored GitHub Actions are leveraged by each committee in the generation of
published content and artefacts, and their assembly into both archival and distributable
packages conforming to OASIS committee process.
All of the committee scripts and stylesheets are found in the repository in clear text
for future teams to maintain and modify as required. The scripting is written using Apache
Ant, a choice by the committees and not in any way an obligation on the part of GitHub. A
script that runs locally in your environment can also run in GitHub, provided all of the
tools it needs are available there. The UBL environment has some tool dependencies, which
GitHub is able to satisfy.
4. The committee protocol using git on GitHub
Two committee project roles are identified, each defined as a GitHub team: editors and
maintainers.
Editors are responsible for incorporating the suggestions made by the maintainers into
review copies (for committee consideration) and main copies (already accepted by the
committee). Two git branches are reserved and commits to these branches are restricted to
editors:
main - this is content that has been reviewed by committee members and considered
acceptable to be distributed for its intended purpose (which may be for testing or for
production use, not necessarily for final use); the public is expected to look to this
branch for self-consistent content reviewed and accepted by the committee.
review - this is content from the editors that has not been reviewed by committee members
yet, and so is not considered agreed-upon for its intended purpose, but the editors have
incorporated input from other sources into a package for review; when there is consensus
about the content of the review branch, it is snapshot in the main branch.
A main branch package is not necessarily a final package, but simply a package merged from
the review branch whose review has been completed. Editor’s note: the jury remains out
whether the main branch is useful to the committee’s public audience, as the function may
be satisfied by judicious use of tags and releases.
Maintainers create and maintain their suggestions in their own git branches (note that
editors making their own suggestions also work in their own git branches as if they were a
maintainer). Maintainers can use any XML editing tool to make their changes to the
specification document. Other files and directories can be changed however the maintainer
needs.
This diagram overviews the maintenance and publishing protocol maintainers and editors
are expected to follow, remembering that an editor also performs maintainer tasks until
they submit their own pull requests from their personal branches and then perform the role
as an editor. The numbered step details are found at https://github.com/oasis-tcs/ubl/tree/review#detailed-steps for readers interested
in the interactions between roles.
Open-source tools can be configured as part of the GitHub environment (e.g. OpenOffice is
used in the UBL environment) or uploaded with the repository to be used for publishing. Of
note, the OASIS committees’ use of the REST-based interface to the commercial
http://www.RealtaOnline.com service satisfies the publishing task without having to upload
the publishing tools as part of the repository contents.
GitHub Action results are restricted to GitHub members and do not persist more than 90
days after production. OASIS editors and maintainers needing to distribute the published
results manually copy the assembled ZIP files to the Kavi server.
5. Generated content used in the deliverables
5.1. Specification text
In both the UBL and genericode specifications, some of the content of the specification
document is synthesized as part of the build and publication processes. This is
accomplished by declaring in the specification XML the external general entities that
point to entity files containing the generated content.
XSLT is a versatile transformation language that reads in XML and can output either
standalone XML or XML general entities.
In the UBL specification the generation process is quite complex, producing a handful
of included entities incorporating information from multiple sources including the
previous version of the document XML, the previous and current versions of the semantic
library from Google spreadsheets, and some colloquial XML documents used to specify
summary information.
In the genericode specification the generation is quite straightforward and is
illustrated in the diagram below. The repository entities always are empty placebos, as
they are replaced in the production process prior to publishing. See https://github.com/oasis-tcs/codelist-genericode/tree/review#authoring-and-generated-content
for details on the numbered steps.
This content harvesting, manipulation, and re-insertion can be very powerful in
creating useful summaries or other reference materials inside of the specification. In
this example, the harvested table is massaged for use in a conformance clause,
rearranging the text to suit how it is read in a different context.
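The harvest-and-reinsert idea can be sketched roughly as follows. This is only an illustration in Python standing in for the committees' XSLT; the element names (table, row, entry, itemizedlist, listitem, para), the rule identifiers, and the sample text are all assumptions made for the sketch and are not taken from either specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification fragment: a two-column table of rules.
SPEC = """<section>
  <table id="rules">
    <row><entry>R1</entry><entry>First example rule.</entry></row>
    <row><entry>R2</entry><entry>Second example rule.</entry></row>
  </table>
</section>"""


def harvest_rules(spec_xml):
    """Return (rule id, rule text) pairs harvested from the table rows."""
    root = ET.fromstring(spec_xml)
    rules = []
    for row in root.iter("row"):
        entries = [entry.text.strip() for entry in row.findall("entry")]
        rules.append((entries[0], entries[1]))
    return rules


def conformance_entity(rules):
    """Reword the harvested rules as a list fragment suitable for an entity file."""
    items = ET.Element("itemizedlist")
    for rule_id, text in rules:
        para = ET.SubElement(ET.SubElement(items, "listitem"), "para")
        para.text = f"A conforming document satisfies rule {rule_id}: {text}"
    # No XML declaration is emitted, so the result can serve as entity content.
    return ET.tostring(items, encoding="unicode")


fragment = conformance_entity(harvest_rules(SPEC))
print(fragment)
```

In a build along these lines, the printed fragment would be written over the empty entity file before the publishing step reads the specification XML.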
5.2. Accompanying artefacts
In the case of the genericode project, all the accompanying artefacts are prepared by
hand and contributors simply update the repository directories as required. The build
process takes less than 90 seconds.
In the case of the UBL project, most of the numerous accompanying artefacts are
synthesized from three Google spreadsheets made available to the committee to
collaborate on the UBL semantic model. From these spreadsheets the GitHub automation
runs a number of stylesheets and other applications to check the veracity of the inputs
while producing the outputs. When problems are detected, the output work products
include files not intended to be distributed to users. This brings the problems to the
attention of the author. A successful build process takes more than 20 minutes. GitHub
conveniently sends an email notification at the completion of the build process,
successful or not.
6. Previewing XML content
If all a committee member is doing is modifying the text of the specification document, a
local preview environment enables the writer to see the impact of their changes to the
documentation XML before checking in their branch. This gives instant feedback without
needing to trigger a GitHub Action. Only a single layout is supported, that being the
OASIS layout based on DocBook, and so there may be some limited content reserved for the
ISO publication that cannot be seen.
On the Windows platform the Internet Explorer browser renders the specification XML. On
the Mac platform the Safari browser does the same. It appears other browsers cannot handle
the DocBook stylesheet library and cannot be used.
The author simply drags and drops the XML file onto the browser, or opens it from the
browser, to see the HTML rendering of the content. After editing the content in their XML
editor and saving their work, a simple refresh in the browser renders their latest
changes. This is instant and does not rely on GitHub Actions to perform a formal
publishing process.
This functionality is unavailable to maintainers performing their tasks from the online
GitHub web interface. It is available only to those users who have cloned the repository
to their local environment. Online users must commit and push their content in order to
obtain a rendering. For UBL contributors the online run takes over 20 minutes, whereas a
local refresh is instantaneous.
7. GitHub automation and housekeeping
Key to the success of the hands-off committee work is the automation provided by GitHub
that is triggered by a push. A GitHub Action is performed on the server after
the server makes its own copy of the git repository content. Thus, any changes to the
repository made on the server during the action do not affect the repository data in
git. Also, the scripting for the server is maintained as repository files just as all the
other files.
As described in sections above, the artefact synthesis, the content generation, and the
publishing all are executed on the GitHub server and not the committee member’s
computer. Not only does this free up the computer from the grinding of producing the
outputs, it precludes the need to install the transformation and publishing software on
the member’s machine.
Moreover, not all committee members work on the same operating system, and so some
members would not be able to run the build script (in this case using bash
for invocation) on their computer. This is mitigated somewhat for these two repositories
in that the scripting is done using Ant, which is cross-platform. A batch file invocation
of the Ant script would work just as well.
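As a hedged illustration of why the choice of invocation wrapper matters little, the following Python sketch builds the same Ant command line on any platform. The build file name, target name, and property shown in the usage note are hypothetical; only the Ant options -buildfile and -D are relied upon.

```python
import shutil


def ant_command(buildfile, target, properties=None, launcher=None):
    """Build the argument list for running one Ant target."""
    # shutil.which honours PATHEXT on Windows, so it can locate ant.bat there;
    # fall back to the bare name if Ant is not found on the PATH.
    launcher = launcher or shutil.which("ant") or "ant"
    command = [launcher, "-buildfile", buildfile]
    for name, value in sorted((properties or {}).items()):
        command.append(f"-D{name}={value}")
    command.append(target)
    return command
```

A wrapper could then call, for example, subprocess.run(ant_command("build.xml", "make", {"stage": "draft"}), check=True) from bash, a batch file, or Python itself, with the Ant script doing the real work in each case.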
Finally, there is no need to propagate to committee members information that may be of a
sensitive nature. In the example of these two repositories, the user name and password
REST access credentials for the commercial publishing service are hidden in GitHub secret
values managed by OASIS TC Administration. These values are not exposed in the console
logs or error reports of the executing process and so there is no security breach that
might permit unauthorized access to the publishing service. OASIS committees not wanting
to use GitHub but who need to use the REST service are trusted to protect their use of
their committee’s private values.
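A minimal sketch of keeping such credentials out of scripts and logs might look like the following. It assumes the secret values reach the process as environment variables and that the service accepts HTTP Basic authentication; the variable names PUBLISH_USER and PUBLISH_PASSWORD and the endpoint URL are placeholders invented for this sketch, not the actual interface of the publishing service.

```python
import base64
import os
import urllib.request

# Placeholder endpoint: an assumption for illustration only.
SERVICE_URL = "https://publishing.example.invalid/api/jobs"


def build_publish_request(docbook_bytes, env=os.environ):
    """Build (but do not send) a POST request carrying the credentials."""
    user = env["PUBLISH_USER"]          # hypothetical variable name
    password = env["PUBLISH_PASSWORD"]  # hypothetical variable name
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(SERVICE_URL, data=docbook_bytes, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/xml")
    return request


def describe(request):
    """Return a loggable summary that deliberately omits the credentials."""
    return f"{request.get_method()} {request.full_url} ({len(request.data)} bytes)"
```

The design point is that the credentials are read only at the moment of use and that anything written to the console goes through a summary such as describe, so the values never appear in a log even without any masking by the hosting platform.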
Before GitHub Actions can be performed in the repository, they must be enabled by
going to the Actions tab and engaging the facility. The hidden .github/workflows directory
in the repository then holds a YAML script that can be tailored for custom invocations
based on git actions (see https://github.com/oasis-tcs/codelist-genericode/tree/review/.github/workflows for
the genericode example). In the case of these two repositories, the only action triggering
an invocation is “push”. After establishing the computing environment and
tools needed as environment dependencies, the invocation performed is the bash script that
runs the Apache Ant script and zips up the results.
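The final zipping step, producing one package for distribution and one for archiving, could be sketched as below. This is not the committees' bash script; the directory names distribution and archive-only and the package naming are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def zip_tree(source_dir, zip_path):
    """Zip every file under source_dir, storing paths relative to it."""
    source_dir = Path(source_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                name = path.relative_to(source_dir).as_posix()
                archive.write(path, name)
                names.append(name)
    return names


def package_results(build_dir, out_dir, stem):
    """Create the distribution ZIP and the archive-only ZIP for one build."""
    build_dir, out_dir = Path(build_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return {
        # Hypothetical layout: files for users versus source and intermediate files.
        "distribution": zip_tree(build_dir / "distribution", out_dir / f"{stem}.zip"),
        "archive": zip_tree(build_dir / "archive-only", out_dir / f"{stem}-archive-only.zip"),
    }
```

Keeping the two trees separate during the build means the packaging step needs no knowledge of which individual files are meant for users and which exist only for the archive.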
Importantly, how a collaborator uses git affects how often the automation gets run.
When a collaborator is working from the command line they are able to commit multiple
changes to their git repository before a single push is used to trigger the
automation. When a collaborator is working from the GitHub web interface, each and every
individual commit includes an implicit push that triggers the automation. A web user
modifying 10 files will trigger 10 automation builds. This may introduce delays for the
collaborator wanting to see their final result, or for other users of the repository, whose
runs end up queued behind all of the automation triggers.
Accordingly, a collaborator will see their final result sooner if they take the time to
go to the Actions tab and cancel the nine workflow runs triggered before the final trigger
that produces the desired results. This also prevents execution minutes being deducted
from the GitHub monthly limits.
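The same selection can be scripted rather than done by hand in the Actions tab. The sketch below assumes a list of workflow-run records shaped like those returned by the GitHub REST API (the fields id, run_number, head_branch, and status), and assumes a single workflow so that run_number increases with each trigger. It only decides which runs to cancel and shows the cancel endpoint; sending the authenticated POST is left out.

```python
def superseded_runs(runs, branch):
    """Return ids of unfinished runs on `branch` other than the newest one."""
    on_branch = [run for run in runs if run["head_branch"] == branch]
    if not on_branch:
        return []
    newest = max(on_branch, key=lambda run: run["run_number"])
    return sorted(
        run["id"]
        for run in on_branch
        if run["id"] != newest["id"] and run["status"] != "completed"
    )


def cancel_url(owner, repo, run_id):
    """REST endpoint that cancels one workflow run when sent an authenticated POST."""
    return f"https://api.github.com/repos/{owner}/{repo}/actions/runs/{run_id}/cancel"
```

For the web user who modified 10 files, and assuming none of the earlier runs has finished yet, this would pick out the nine earlier runs and leave only the run for the final commit to complete.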
Moreover, some housekeeping is critical to reduce the GitHub burden of supporting the
triggered actions. Action results are automatically deleted by GitHub 90 days after having
been created, but until then they are kept around occupying storage. Deleting unneeded
results promptly reduces the impact on the GitHub storage limit for repositories.
In accordance with OASIS committee process requirements, committee work products need to
persist indefinitely on OASIS platforms. Collaborators are obliged to view GitHub action
results as the transitory constructs they are, manually preserving in Kavi the results
that need to be preserved, and respectfully cancelling and deleting runs and results that
no longer are needed. In the case of the UBL project, a single run of the automation takes
over 20 minutes to produce 550 MB of data compressed into 160 MB total in two ZIP files.
Deleting the undesired action runs and results will save time and space. Deleting
intermediate results will help, as will deleting the final results that are posted to Kavi
for distribution to the committee.
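A housekeeping pass of this kind can be sketched as a selection over the repository's artifact listing. The record fields used here (id, created_at, expired, size_in_bytes) follow the GitHub REST API artifact listing; the retention period and the sample data are made up for illustration, and the sketch assumes anything worth keeping has already been copied to Kavi.

```python
from datetime import datetime, timedelta, timezone


def stale_artifacts(artifacts, now, keep_days):
    """Select unexpired artifacts older than keep_days; return (ids, bytes freed)."""
    cutoff = now - timedelta(days=keep_days)
    ids, freed = [], 0
    for artifact in artifacts:
        # Timestamps arrive as ISO 8601 with a trailing Z for UTC.
        created = datetime.fromisoformat(artifact["created_at"].replace("Z", "+00:00"))
        if not artifact["expired"] and created < cutoff:
            ids.append(artifact["id"])
            freed += artifact["size_in_bytes"]
    return ids, freed


# Example with made-up listing data.
sample = [
    {"id": 1, "created_at": "2024-03-01T00:00:00Z", "expired": False, "size_in_bytes": 160_000_000},
    {"id": 2, "created_at": "2024-03-30T00:00:00Z", "expired": False, "size_in_bytes": 160_000_000},
]
ids, freed = stale_artifacts(sample, datetime(2024, 4, 1, tzinfo=timezone.utc), keep_days=7)
print(ids, freed)
```

Each selected id would then be removed with an authenticated DELETE request to the repository's actions/artifacts endpoint, or by hand from the run's page.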
8. Summary
Using git and GitHub for standards development provides a collaborative environment
supporting committee members in the mundane and arcane tasks of publishing specification
documents and assembling distribution deliverables. This frees up their time to focus on
the heart of the standards process: the divining and development of the specification
content.
Not that git and GitHub themselves aren’t a bit arcane, but an investment in
learning how to use these tools will stand one in good stead in the future of
collaboration and software development. Helpfully, the openly-available repositories
supporting these two cases can be inspected for the procedures performed and copied when
starting new projects.
This case study illustrates how these tools can be successfully deployed for
collaborative documentation and deliverable development. Contributors can trigger the
remote process readily to preview the impact of their changes before suggesting them to
the editors in charge, without the burden of supporting specialized publishing tools they
may not have in their personal computing environment. Members see their contributions in
final-form PDF and HTML when the GitHub commit process invokes the online XML publishing
service from Réalta.
These collaboration benefits are not restricted to this OASIS environment, as this git
and GitHub combination can be realized by any writing project using open publishing
processes. OASIS project editors in particular have access to a commercial publishing
process that simultaneously satisfies the OASIS committee process requirements for both
the OASIS page layout and the ISO Directives Part 2 page layout, but not having this
doesn’t take away from the model itself.