OASIS is a good place to create open business information exchange standards. This article gives you some pointers on how to go about doing that.
Foreword
Open information interchange is growing in importance in many
economic sectors and industries. The OASIS technical committee process was created to
nurture the development of open standardized work products using accredited rules and
strictures. Communities of users in any economic sector can mimic and engage the
techniques and processes published by OASIS to produce useful exchange document
specifications.
As an example, the OASIS Universal Business Language (UBL) is a business-oriented
interchange vocabulary that was developed independently of any vendor’s software or
user platform’s constraints. UBL is used successfully to bridge supply chain and
transportation applications, with chosen information items conveying commonly-accepted
business concepts in these economic sectors. Reaching beyond any one expression syntax,
the UBL information model, developed by participants with domain expertise but not
syntax expertise, can be realized in multiple syntaxes, such as XML and JSON.
This essay is written for development teams embarking on creating practical interchange
vocabularies for user communities in any industry or economic sector. The reader is
invited to project their own requirements and objectives onto the OASIS tools,
deliverables and processes, and onto all of the facets of development that successfully
resulted in the UBL OASIS Standard, ratified globally as ISO/IEC 19845.
Executive summary
In April 2016 an organization asked Crane Softwrights Ltd. to respond to a request for
proposals (RFP) to produce an open XML document interchange standard for interested
parties in its economic sector to see and use. The sector includes governmental and
non-governmental organizations. The client’s pre-conceived constraints did not have room
for the approach I proposed, so I submitted a “no bid” and I didn’t get the gig. But I
put effort into my RFP response and I’m sharing it here, anonymizing the sector details,
for the possible benefit of others who might be interested in running the same kind of
project.
I don’t think producing an open industry-wide business vocabulary is a task that one
organization (even the dominant one in an economic sector) can hand to a consultant
working in isolation, with only a collection of legacy paper documents, and expect a
successful result. But I do think that what I proposed for open development is not only
viable but also compares well against other approaches to producing an open interchange
vocabulary standard.
The ISO/IEC Open-edi Reference Model treats interchange vocabularies as a part
of doing business in any industry or economic sector. Framed by the Open-edi
perspective, OASIS (the Organization for the Advancement of Structured Information
Standards) has published a vocabulary standard of Naming and Design Rules for
realizing such a work product using non-technical collaborative tools in addition to
some technical development tools.
Moreover, OASIS has a defined committee
process prescribing governance, transparency and availability within which the
development of such an open and standardized work product can be successfully
accomplished. As an illustrative example of leveraging the OASIS committee process,
the OASIS Universal Business Language (UBL) interchange vocabulary was created
primarily by domain experts using that vocabulary standard and the available free
tools. The OASIS approach used by the technical committee can be examined as an
example by any industry or economic sector to produce their own open standards for
interested parties to see and use.
By the way, the client never did loosen their pre-conceived notions of how to run their
project. I never heard whether they found a consultant to give them what they asked for
the way they asked for it, nor, if so, whether they were successful with it in their
economic sector.
In any economic sector or industry, a business environment connects business entities within a
common community of mutual interest. That community exists to benefit all of the
participants, otherwise they probably wouldn’t be participants of the environment in
the first place. Each business partner fulfills one or more roles in the community to
satisfy the needs of other business partners. Examples of different business environments
are manufacturing, financial services, real estate/property management, insurance and
transportation logistics.
Environments overlap and participants develop business
practices over the years to shape the scenarios involving entities and the interactions
between the entities playing roles to fulfill those scenarios. These business practices
incorporate both the external choreography of interactions between entities and the internal
(often private and/or proprietary) processes an entity performs to satisfy their
obligations. The external practices form both that choreography of transactions between
parties and the business information exchanged in these transactions that bridge the
information systems of each party such that each entity can perform their role in the dance.
Examples of transactions are manufacturing work orders, property booking reservations and
transportation status checks.
Consider, as just one example, the commonly-understood basic supply chain Buy-Ship-Pay
environment connecting commercial, logistical and regulatory processes for trading
partners. Such processes are in place when buying goods from a local company for pickup,
or when selling goods to someone on the other side of the world involving international
intermodal transportation.
An example choreography of basic Buy-Ship-Pay processes
between a buyer party and a seller party is illustrated with a number of possible steps in
the swim-lane diagram in Figure 1,
“Buy-Ship-Pay detailed choreography”. The business information bundles
supporting the transactions are labeled in the boxes, conveying a message from one party to
the other.
Figure 1. Buy-Ship-Pay detailed choreography
But not every relationship need involve all of the possible
transactions available in an environment. Different scenarios may require different subsets
of the transactions, also called profiles of the choreography. For example, the demand for
payment with an invoice might be the only transaction needed between two parties that have
already agreed on everything else in the process, and so would be a simple profile of
procurement. Such an invoice-only profile would need to contain less information in the
invoice than would a comprehensive profile of procurement that might demand all of the
illustrated steps be formally instantiated.
Figure 2. Buy-Ship-Pay simple choreography
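The idea of a profile as a subset of an environment's available transactions can be sketched in a few lines of Python. This is only an illustration: the transaction names, the profile names and the helper functions are all assumptions made for this sketch and are not taken from the figures.

```python
# Illustrative only: an ordered list of transactions assumed to be available
# in a Buy-Ship-Pay choreography (names chosen for this sketch).
FULL_CHOREOGRAPHY = (
    "Order",
    "OrderResponse",
    "DespatchAdvice",
    "ReceiptAdvice",
    "Invoice",
    "RemittanceAdvice",
)

# A profile is just a subset of the available transactions.
PROFILES = {
    "comprehensive": set(FULL_CHOREOGRAPHY),
    "invoice-only": {"Invoice"},
}


def is_valid_profile(profile):
    """A profile may only use transactions the environment defines."""
    return set(profile) <= set(FULL_CHOREOGRAPHY)


def transactions_in_order(profile_name):
    """List a profile's transactions in choreography order."""
    chosen = PROFILES[profile_name]
    return [name for name in FULL_CHOREOGRAPHY if name in chosen]


if __name__ == "__main__":
    for name in PROFILES:
        print(name, transactions_in_order(name))
```

Two parties that agree on the "invoice-only" profile need to support only one document type, while the comprehensive profile obliges both to support every step.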
Not shown in the detailed diagram is the possible choreography of
transactions in intermodal transportation that might be behind the “ship the
goods” action, involving parties such as freight forwarders or customs authorities.
UN/CEFACT, the United Nations Centre for Trade Facilitation and E-Business, outlines a
number of possible roles engaging in the Buy-Ship-Pay process in addition to the swim-lane
use of just “Buyer” and “Seller” in Figure 3, “Buy-Ship-Pay
roles”.
Figure 3. Buy-Ship-Pay roles
And so other scenarios might demand not only the complete
choreography of Buy-Ship-Pay but an additional choreography to engage and track the movement
of goods. And behind the scenes within that, the participants in the transportation process
will have their own choreographies of which the buyer and the seller are oblivious.
Perhaps the
seller cannot be established until a profile of choreography of tendering and award has
successfully completed. This dance happens before the Buy-Ship-Pay choreography begins
between the buyer and the successful seller.
Within any economic sector environment,
profiles of choreography will be well-recognized by participants, and their existing
information systems would support their participation in such profiles. It is the
responsibility of the community of users to establish the commonly-accepted profiles of
choreography and the profiles of the documents supporting those interactions within the
sector.
2. Conveying business information
Regardless of the
sector environment, business information is conveyed from a sending role to a receiving role
as a transaction within a profile of choreography. The sender has its own business practices
developed over time to meet its obligations. The receiver could have very different business
practices because its obligations and its history differ from those of the sender. All that needs to
be in common is the understanding of the transactions in a given profile and the different
subsets of information used for each transaction.
The business practices shape the
data model that maintains the information that supports those practices with applications
run by the parties. Accordingly, there is no guarantee (or even likelihood) that the data
models of the two parties match, though probably the components of each that represent
common concepts would be the same. In paper-based transactions, the sender presents these
information components in a page layout, likely with labels or geometries that relate the
components to each other. Figure 4,
“Business information paper exchange” depicts the recipient application
ingesting the business document through scanning or manual entry, relying on the labels and
geometries to be properly interpreted for the information to be placed correctly in the
receiver’s data model to support the receiver’s business practices.
Figure 4. Business information paper exchange
The integrity of such a paper-based transaction relies on the
integrity of the printing and the interpretation of the identity of the printed content.
Faults in either will impact on the success of the business practices and, in turn, the
entire choreography. Fixing the faults, given they can be identified, can be costly to all
parties.
A digital exchange removes the challenge of printing and interpreting the
printed content. It does not remove the challenge of starting off with correct information.
But if the information is correct, then using digital technologies can drastically reduce
the opportunities for identification errors. The sender and receiver need to agree on a
syntax representing the information. Figure 5, “Business information digital exchange components” depicts
how the sender marshals their information out of their application into the syntax that is
transported to the receiver who unmarshals the information from the syntax into their
different application.
Figure 5. Business information digital exchange components
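A minimal sketch of this marshalling and unmarshalling follows, assuming JSON as the agreed syntax. The record classes, the field names and the shape of the exchanged document are all hypothetical, chosen only to show two different data models meeting at a shared syntax.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class SenderInvoice:
    """Hypothetical sender data model: amounts kept as integer cents."""
    number: str
    total_cents: int
    currency: str


@dataclass
class ReceiverPayable:
    """Hypothetical receiver data model: amounts kept as decimals."""
    reference: str
    amount: Decimal
    currency_code: str


def marshal(invoice: SenderInvoice) -> str:
    """Sender side: application record -> agreed syntax (assumed shape)."""
    amount = Decimal(invoice.total_cents) / 100
    document = {
        "ID": invoice.number,
        "PayableAmount": {
            "value": f"{amount:.2f}",
            "currencyID": invoice.currency,
        },
    }
    return json.dumps(document)


def unmarshal(text: str) -> ReceiverPayable:
    """Receiver side: agreed syntax -> the receiver's own record."""
    document = json.loads(text)
    payable = document["PayableAmount"]
    return ReceiverPayable(
        reference=document["ID"],
        amount=Decimal(payable["value"]),
        currency_code=payable["currencyID"],
    )


if __name__ == "__main__":
    wire = marshal(SenderInvoice("INV-1", 1250, "EUR"))
    print(wire)
    print(unmarshal(wire))
```

Neither party needs to know the other's internal model. Each only needs to map its own model to and from the shared document.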
All of the aspects described so far slot into the ISO/IEC Open-edi
Reference Model [Open-edi] outlined in the left two columns of Figure 6, “Open-edi Reference Model perspective of interchange”.
While the abbreviation for “electronic data interchange” historically is often
associated with financial information, it has always been agnostic of the nature of the
information being exchanged. From the introduction of ISO/IEC 14662 one reads:
The field of application of Open-edi is
the electronic processing of business transactions among autonomous multiple
organizations, authorities or individuals within and across sectors (e.g.
public/private, industrial, geographic). It includes business transactions which involve
multiple data types such as numbers, characters, images and sound.
The Open-edi
Reference model is independent of specific:
information technology implementations;
business content or conventions;
business activities;
parties participating in business activities.
Open-edi describes two “views” of
electronic business (the rows in the diagram): the business operational view and the
functional services view. The business operational view (BOV) describes the abstract
properties of the environment, the scenarios, the roles in the scenarios and the bundles of
information conveyed between roles. The functional services view (FSV) describes the
concrete machine-processable properties of user data representation of information bundles,
the choreographies engaged by the roles in the scenarios of the environment and the
transport of the content between the parties.
Figure 6. Open-edi Reference Model perspective of interchange
Also shown in the diagram, in the rightmost column, is the bridging
of the business specification of the information objects and definitions to the
machine-processable specification of the binding of the information objects to actual syntax
representations suitable for applications to produce and ingest. The two examples of
syntax-independent information bundle description technologies cited are the UN/CEFACT Core
Component Technical Specification (CCTS) and the Unified Modeling Language (UML). The three
examples of syntax technologies cited are the text-oriented XML and JSON, and the
binary-oriented ASN.1. The technology that bridges the two is the set of naming and design
rules that govern creating the functional view of user data (the syntax) from the business
view of information bundles (the models).
3. Models and syntax
It is
typical for developers to tailor schemas of syntax to their application’s use of the
concepts. But as described earlier, the applications deployed by a sender are likely
different than the applications deployed by the receiver. An independent representation of
the information in a syntax, not favouring any one application over the other, is needed for
all applications to support.
Vendor-, platform- and product-independent purpose-built
vocabularies exist for prose and poetry (TEI [TEI]), office documents (OASIS ODF [ODFTC]), and technical documentation
(OASIS DITA [DITATC] and OASIS DocBook [DocBookTC]). These particular
vocabularies would not be appropriate for structured business information in sectors such as
manufacturing, financial services, real estate/property management, insurance and
transportation logistics.
Producing an open business vocabulary for an economic sector, to be published, made
available, and maintained for interested parties to see and use, cannot be left to a
single developer. Creating syntax specifications is a specialized technical task, and
technical experts or teams are often tasked with doing just that for an application’s
data. What is needed for an open vocabulary, rather, is the development of the
information bundles of the semantic components from which the syntax specifications can
be derived.
But it is untenable to expect a technically-oriented third party, presumably
without a formal background in the economic sector milieu specifically, to fully comprehend
the nuanced semantics behind the information bundles required for that
sector’s applications. This information might have historically been presented in
multiple physical documents of forms and transmissions, thus requiring recognizing
duplicates, determining structure, establishing granularity, specifying cardinality and
appropriately labeling all of the perhaps thousands of semantic information components. And
because the work involves looking for semantic similarities across different documents,
analyzing multiple documents would be very difficult to do entirely in parallel.
Stakeholders in the economic sector are the ones who should be identifying and specifying
the information bundles required by the applications deployed by the roles in the
choreographies.
UN/CEFACT created the syntax-independent Core Component Technical
Specification [CCTS] Version 2.01 as a
modeling technique for business documents. It comes complete with a specification of a base
library of core component data types typical in business documents, such as currency
amounts, quantities with units of measure, date and time values and others, including
metadata facets for most of the types. It describes a rigorous structuring of containment
resulting in tree-like branches and leaves of content, each leaf being structured itself at
the lexical and metadata levels by its core component data type. While UML can describe any
structure, it does not include a library of agreed-upon content data types nor does it
constrain the containment with CCTS-like rules.
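To make the branches-and-leaves idea concrete, here is a small sketch in Python. It is not CCTS itself: the class names `Leaf` and `Branch`, the cardinality strings and the example items are assumptions for illustration, and the two data types shown only mimic the notion of a value carrying a metadata facet such as a currency or a unit of measure.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Union


@dataclass(frozen=True)
class Amount:
    """A value plus a currency metadata facet (illustrative)."""
    value: Decimal
    currency_id: str


@dataclass(frozen=True)
class Quantity:
    """A value plus a unit-of-measure metadata facet (illustrative)."""
    value: Decimal
    unit_code: str


@dataclass
class Leaf:
    """A leaf of the tree: content structured by its data type."""
    name: str
    data_type: type
    cardinality: str  # assumed notation: "1", "0..1", "0..n" or "1..n"


@dataclass
class Branch:
    """A branch of the tree: a container of leaves and other branches."""
    name: str
    cardinality: str
    children: List[Union["Branch", Leaf]] = field(default_factory=list)


def leaf_paths(node, prefix=""):
    """Walk the containment tree and list the path to every leaf."""
    path = f"{prefix}/{node.name}"
    if isinstance(node, Leaf):
        return [path]
    paths = []
    for child in node.children:
        paths.extend(leaf_paths(child, path))
    return paths


# A tiny hypothetical document model.
invoice = Branch("Invoice", "1", [
    Leaf("ID", str, "1"),
    Branch("InvoiceLine", "1..n", [
        Leaf("InvoicedQuantity", Quantity, "0..1"),
        Leaf("LineExtensionAmount", Amount, "1"),
    ]),
])

if __name__ == "__main__":
    for path in leaf_paths(invoice):
        print(path)
```

The point of the sketch is that domain experts can describe such a tree without touching any syntax: the model holds names, containment, cardinality and data types, and nothing about XML or JSON.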
Producing the actual tree-like syntax
specification is the smallest of the tasks in the project and is rote in nature. For
large-sized vocabularies this shouldn’t be a manual task, but should be automated from
the semantic descriptions. Thus, the semantic descriptions in CCTS should be in a
machine-processable format on which to base the automation.
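As a rough illustration of why this step is rote, the following Python sketch derives a JSON Schema fragment from a few machine-processable model rows. It is not the Crane tooling and does not follow the OASIS naming and design rules: the row layout, the cardinality notation and the `generate_schema` function are assumptions made for this example.

```python
import json

# Hypothetical model rows, such as might be exported from a spreadsheet.
MODEL_ROWS = [
    {"aggregate": "Invoice", "name": "ID", "type": "string", "cardinality": "1"},
    {"aggregate": "Invoice", "name": "IssueDate", "type": "string", "cardinality": "0..1"},
    {"aggregate": "Invoice", "name": "Note", "type": "string", "cardinality": "0..n"},
]


def generate_schema(rows, aggregate):
    """Mechanically derive a JSON Schema object for one aggregate."""
    properties = {}
    required = []
    for row in rows:
        if row["aggregate"] != aggregate:
            continue
        cardinality = row["cardinality"]
        item = {"type": row["type"]}
        if cardinality.endswith("n"):
            # Repeatable items become arrays.
            item = {"type": "array", "items": item}
            if cardinality.startswith("1"):
                item["minItems"] = 1
        properties[row["name"]] = item
        if cardinality.startswith("1"):
            # A minimum of one makes the item mandatory.
            required.append(row["name"])
    return {
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,
    }


if __name__ == "__main__":
    print(json.dumps(generate_schema(MODEL_ROWS, "Invoice"), indent=2))
```

Because the output is a pure function of the rows, regenerating the validation artefacts after every model change costs nothing, which is what makes daily snapshots practical.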
The OASIS Business
Document Naming and Design Rules [BDNDR-v1.1] formally describes the application of CCTS 2.01 in a constrained
fashion to ensure the automated generation of validation artefacts in either or both XML [XML 1.0] with XSD [W3C Schema] and
JSON [ISO 21778 -
ECMA JSON] with JSON Schema [JSON Schema]. Crane Softwrights Ltd. has made the
validation artefact generation tools freely available for anyone to use [Crane
Resources]. These are the tools used by the committee to create the
distribution artefacts, and by the community to create subset specifications.
For a short period of time a CCTS Version 3.0 was in development, but it never
gained traction and continued development of it has been abandoned. The modeling and
core types found in CCTS Version 2.01 are proven to be sufficient to the
task.
There is an ISO/IEC 15000-5:2014 IS specification of CCTS derived from CCTS Version
2.01, but it was released after the release of the first NDR from OASIS and so is not
considered relevant to the OASIS work already established using CCTS Version 2.01 as
its base.
UN/CEFACT have, themselves, created a suite of naming and design rules for creating
XML validation artefacts from CCTS Version 2.01 models, but the resulting schema work
products cannot be deployed with the same flexibility for real-world business
requirements as those created with the OASIS BDNDR rules.
Illustrated in this essay, the OASIS Universal Business Language (UBL) [UBL-2.1], internationally standardized as
ISO/IEC 19845:2015, implements the Buy-Ship-Pay example of a business interchange vocabulary
developed using the principles described. How the UBL committee employs the BDNDR is
described in the UBL Naming and Design Rules Version 3.0 [UBL-NDR]. The UBL distribution includes both the CCTS models
and the XML/XSD syntax validation artefacts as the two normative components for 65 document
types in UBL 2.1. There are separate non-normative publications of the model in UML (derived
from the XSD with CCTS data types) [UBL-2.1-UML] and syntaxes in ASN.1 [UBL-2.1-ASN.1] and JSON [UBL-2.1-JSON].
And so, in essence, when hiring
assistance for a project to develop an exchange vocabulary, one shouldn’t be looking
to engage, for example, primarily XSD or JSON expertise. What is needed is modeling
facilitation assistance to empower and mentor a team of stakeholders to develop the
information bundles of the document models themselves in CCTS. The need for XSD or JSON
expertise is secondary, rote and preferably supported by the generation tools. The need is
not absent, it is just not primary. And the primary task can be supported with a
well-defined committee process, such as is published by OASIS.
4. Leveraging the OASIS TC process
Three aspects are critical in any development of such an open specification in order to
gain the trust of potential users: governance, transparency and availability. The rules
of engagement and the obligations of contributors are formalized by the governance of
the project. The openness of the
development process to public scrutiny is needed for transparency. The openness of the work
product is characterized by its unfettered availability (recognizing that even
“mandatory registering for a free copy” is a barrier to availability).
If you asked me (as my client did), I would propose that to accomplish these project goals you
become an organizational member of OASIS [OASIS], the Organization for the Advancement of Structured
Information Standards. Following the recognized TC process at http://www.oasis-open.org/policies-guidelines/tc-process (accredited by ANSI in the
US and ISO internationally as suitable for creating national and international standards),
you would create an OASIS Sector Information Exchange Technical Committee for the economic
sector with a charter for the creation of transfer schemas for the sector community. I would
also propose that you engage an OASIS member resource familiar with both CCTS technical
issues and OASIS committee work, such as I offered myself to my client, as a founding
technical co-chairman (being responsible for the syntax, which is the minor role, the CCTS
principles and committee functioning). You, as an organizational member, would provide a
domain co-chairman (being responsible for the model and the membership). These would be two
of the five member companies required to form a committee. You would then engage the
interests of stakeholders to find at least another three charter members, hopefully more. If
you had a particular geographic focus for the economic sector, you might still get some
international interest in participating if other geographical areas had similar interests,
thus making it a globally-interesting specification.
Many years ago I participated in the
creation of the OASIS TC process and governance. The objective was general enough that
“if Japanese subway operators wanted to get together to create an XML vocabulary for
interchanging scheduling information, the process should be straightforward and flexible
enough that they would find a home at OASIS to do so”. (I don’t think any
Japanese subway operators actually did so, but it exemplified the kind of framework OASIS
was striving for.) The process has matured and become very successful, and OASIS offers
assistance to technical committees to help promote membership in the TCs. And the legal
folks at OASIS have covered off in the detailed process all of the very important issues of
copyright and intellectual property rights involved in group developments of open-use
standards.
Having worked out such IPR issues, the TC process and procedures protect
the work product from being blind-sided by IPR claims (provided that the TC members respect
their membership obligations and members of the public submit comments only through the
Public Comment list, subscription to which has obligations built in). The OASIS process
dictates that all meeting agendas, minutes, TC mail list and documents be transparently open
to the public at all times. OASIS puts no encumbrances on the work products, not even
“register to use”, and puts all work products in the publicly-accessible file
repository. Ownership of the resulting specification rests with OASIS, but the specification
is fully open.
The OASIS TC process for public review and creating a committee
specification is extensive and rigorous. TC administration support in the form of wikis,
JIRA ticket management (very important for building and maintaining the specification), a
document repository and a file repository is available for use by a TC at no charge. There
is no software to install or maintain, and there is mandated public visibility for all of
the project’s actions: meeting agendas and minutes, discussions, document drafts and final
versions, committee specifications and distribution artefacts.
The new technical committee can
be arranged with subcommittees responsible for certain domains, with the subcommittees making
recommendations to the technical committee on what to include in deliverables.
Given that
OASIS is an accredited ISO/IEC JTC 1 Publicly-Available Specification (PAS) submitter, the
option is there to make a work product an ISO standard. For example, UBL 2.1 is now ISO/IEC
19845:2015, a recognized ISO Standard. ODF is another example of an OASIS Standard that has
become an ISO standard, initially ISO/IEC 26300:2006 and now split into many
parts.
The work of the UBL committee has progressed for a long time to become an
effective and comprehensive document exchange specification for the Buy-Ship-Pay
environment. UBL 2.0 had 32 document types and 1,972 information elements, and it took a
group of volunteers not working full time many months (a few years, in fact) to develop it.
UBL 2.1 went much more smoothly, with a honing of the tools employed, to produce the results
of 65 document types and 4,112 information items. UBL 2.2 is anticipated to be developed even more
quickly because the tools are now mature and turnkey, even though we are adding another 16
document types and 500 information items. And with the recent interest from the user
community for UBL subsets, the generation tools can now create and document tailored UBL
schemas created for specific deployments of the vocabulary. The tools can also be used to
measure backwards compatibility between versions.
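What "measuring backwards compatibility" can mean at the model level is sketched below in Python. This is not how the generation tools work internally. It is a simplified check, under the assumption that a newer version is backward compatible when every instance valid against the older model remains valid against the newer one. The `Aggregate.Item` keys and the cardinality notation are hypothetical.

```python
def parse_cardinality(text):
    """Turn "1", "0..1", "0..n" or "1..n" into (minimum, maximum or None)."""
    low, _, high = text.partition("..")
    if not high:
        high = low
    return int(low), (None if high == "n" else int(high))


def compatibility_issues(old, new):
    """Report changes that could invalidate instances of the old model.

    Both models map "Aggregate.Item" to a cardinality string.
    """
    issues = []
    old_aggregates = {key.split(".")[0] for key in old}
    for key, cardinality in old.items():
        if key not in new:
            issues.append(f"removed: {key}")
            continue
        old_min, old_max = parse_cardinality(cardinality)
        new_min, new_max = parse_cardinality(new[key])
        if new_min > old_min:
            issues.append(f"minimum raised: {key}")
        if new_max is not None and (old_max is None or new_max < old_max):
            issues.append(f"maximum reduced: {key}")
    for key, cardinality in new.items():
        aggregate = key.split(".")[0]
        is_mandatory = parse_cardinality(cardinality)[0] > 0
        # A new mandatory item only breaks aggregates that already existed.
        if key not in old and is_mandatory and aggregate in old_aggregates:
            issues.append(f"new mandatory item: {key}")
    return issues


if __name__ == "__main__":
    version_a = {"Invoice.ID": "1", "Invoice.Note": "0..n"}
    version_b = {"Invoice.ID": "1", "Invoice.Note": "0..n", "Invoice.DueDate": "0..1"}
    print(compatibility_issues(version_a, version_b))
```

Adding optional items, as in the usage example, reports no issues. Removing an item, tightening a cardinality or adding a mandatory item to an existing aggregate each produce a report line.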
Within the proposed new OASIS Sector
Information Exchange Technical Committee, after an initial setup configuration, schema
generation would only take minutes, not months as in the early days of UBL. During
vocabulary development one can produce daily suites of validation artefacts when taking
daily snapshots of the information models. The tools for creating artefacts are free to
download and use from http://www.CraneSoftwrights.com/resources/ubl/#obdndr.
One can
find the UBL 2.1 information model, expressed using CCTS, at https://docs.google.com/spreadsheets/d/1amzk8jn1boD2q3ze9rR14PVB6OGDyHTc2pQl92JutvE/view.
The use of Google Drive allows international members of the committee to collaboratively
edit the content simultaneously. It also allows the interim work to be publicly transparent.
For archive purposes, periodic snapshots of the ever-changing live document are made and
stored in the OASIS repository.
The
technical co-chair I proposed above would participate in the newly-created OASIS Sector
Information Exchange Technical Committee on an hourly services contract as the
technology-related co-chair responsible for establishing the committee, running the
committee and meetings within the OASIS process, initiating the master spreadsheet of
CCTS-expressed information items, facilitating those responsible for populating the
spreadsheet with direction regarding using CCTS, producing the syntax validation artefacts
and producing the Committee Specification documents per the TC process. Your designated
semantics-related co-chair would be responsible for the quality and integrity of the
concepts expressed using CCTS.
The following are examples of the many facets of the
work being done within the UBL TC that would have parallels in a newly-formed OASIS Sector
Information Exchange Technical Committee:
All of that (except the additional governance document which was a TC effort)
comes just with being a TC, being supported by OASIS TC Administration, and so justifies the
cost of membership (see https://www.oasis-open.org/join/categories-dues for details).
You,
as the domain co-chair, and the sector stakeholders would focus on the semantics and
building the exchange document types and common library in CCTS, documenting guidelines
where seen fit, handing it all to the technical co-chair to create committee deliverables
for voting and public review and, ultimately, acceptance as an OASIS Standard. The heavy
lifting of information design is best done by the stakeholders because the stakeholders
would know the semantics better than any third-party consultant not familiar with the
domain. The technical co-chair would also facilitate the CCTS model creation by the domain
experts.
The work can start small. Within a couple of weeks the entire committee and
development framework can be in place, creating stub artefacts and skeleton documentation.
The process can be debugged and technical staff brought up to speed in its use. The model
just builds up from scratch and at any time one can snapshot the modeling task. Anyone can
produce a set of schemas with which committee members can test syntax documents. The lead
technical co-chair is available to help with problems with CCTS and with the
syntaxes.
Regarding the publishing of OASIS specification documents, I am also the
creator of the OASIS DocBook XML specialization for OASIS standards, now used by a number of
committees. It is used for writing up the UBL, OASIS Business Document Envelope (BDE) and
OASIS XLIFF TC specification documents (standards and committee notes). The DocBook
environment produces output in both HTML and PDF formats for publishing, and has the added
bonus of being machine-processable if a user in the community would find that helpful. Other
committees use OpenOffice or Word to create the specification documents without such
features. Based on the success of UBL I would commend the use of the off-the-shelf DocBook
environment for your technical co-chair to create the specification documents.
5. Value validation
One comment to make concerns the OASIS BDNDR perspective
on the scope of XSD and the use of other technologies. It is the UBL committee’s
firmly-held position that XSD is suitable only for structural validation, that being the
arrangement and cardinality of elements and attributes, and the lexical expression of the
text strings found in leaf elements and their attributes. Value validation concerns the
constraints on the values of the text strings found in leaf elements and their attributes.
Value validation can be subject to external pressures and scenarios, such that different
value constraints are needed when dealing with different exchange partners. In UBL the
validation process separates structural validation in XSD from value validation in OASIS
Context/value Association [CVA] files that reference OASIS genericode [genericode] files (both developed
within the OASIS Code List Representation Technical Committee [CLRTC]). The CVA files are translated through Schematron [Schematron] into
XSLT [XSLT 2.0].
Instances are then run through a two-phase validation process: the first phase confirms the
information is in the correct place, and the second confirms the information has correct
values. This is depicted at http://docs.oasis-open.org/ubl/os-UBL-2.1/UBL-2.1.html#A-UBL-2.1-CODE-LISTS-AND-TWO-PHASE-VALIDATION.
All of the tools used to create the value validation artefacts are
freely available.
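The separation of the two phases can be illustrated with a deliberately simplified Python sketch. It does not use XSD, CVA, genericode, Schematron or XSLT: a small dictionary stands in for the structural schema and another for the code lists, and the element names and allowed values are assumptions for this example only.

```python
import xml.etree.ElementTree as ET

# Stand-in for structural constraints: child name -> (minimum, maximum or None).
STRUCTURE = {
    "ID": (1, 1),
    "DocumentCurrencyCode": (0, 1),
    "Note": (0, None),
}

# Stand-in for value constraints; trading partners may agree on different lists.
CODE_LISTS = {"DocumentCurrencyCode": {"CAD", "EUR", "USD"}}


def validate_structure(root):
    """Phase one: is the information in the correct place?"""
    errors = []
    counts = {}
    for child in root:
        if child.tag not in STRUCTURE:
            errors.append(f"unexpected element: {child.tag}")
        counts[child.tag] = counts.get(child.tag, 0) + 1
    for name, (low, high) in STRUCTURE.items():
        found = counts.get(name, 0)
        if found < low or (high is not None and found > high):
            errors.append(f"bad cardinality for {name}: {found}")
    return errors


def validate_values(root, code_lists):
    """Phase two: does the information have acceptable values?"""
    errors = []
    for child in root:
        allowed = code_lists.get(child.tag)
        value = (child.text or "").strip()
        if allowed is not None and value not in allowed:
            errors.append(f"value not allowed for {child.tag}: {value}")
    return errors


def two_phase_validate(xml_text, code_lists=CODE_LISTS):
    """Run phase two only when phase one has passed."""
    root = ET.fromstring(xml_text)
    structural_errors = validate_structure(root)
    if structural_errors:
        return "structure", structural_errors
    value_errors = validate_values(root, code_lists)
    if value_errors:
        return "values", value_errors
    return "ok", []


if __name__ == "__main__":
    document = (
        "<Invoice><ID>1</ID>"
        "<DocumentCurrencyCode>EUR</DocumentCurrencyCode></Invoice>"
    )
    print(two_phase_validate(document))
    # The same instance checked against a stricter, partner-specific list.
    print(two_phase_validate(document, {"DocumentCurrencyCode": {"CAD"}}))
```

The usage example shows the design choice at work: the structural rules stay fixed for everyone, while the value rules can be swapped per exchange partner without touching the schema.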
6. Digital signatures
One final technical note concerns digital signatures, support for which should probably
be included in all information exchange document types. Digital signatures provide
non-repudiation as well as authentication. Such support might seem overkill for small
innocuous documents, but it has no overhead when it isn’t being used and it is there for
anyone in the user community who may have a business need to use it. For backward
compatibility, the UBL extension point, which is the first child of the document element,
has scaffolding within which digital signatures are embedded in UBL documents. The BDE schemas
developed from scratch provide both for an extension point as the first child of the
document element and separately for a set of digital signatures as the last children of the
document element. In new schemas this latter approach would be used for document models
large and small.