This White Paper considers the subject areas of three of the Consortia and Forums of The Open Group as data integration use cases, and describes how TOGAF® can be used to develop data integration solution architectures for them. They are:
Military Software Design and Procurement - The FACE™ Consortium
Subsurface Data Processing - The OSDU® Forum
Greenhouse Gas Emissions - The Open Footprint™ Forum
The three subject areas illustrate different aspects of data integration. This White Paper shows how the TOGAF® standard can be applied in each of them, and presents conclusions for data integration projects in general.
Integration of data within and between enterprises is essential to The Open Group vision: Boundaryless Information Flow.
The Open Group is a global consortium whose vision is Boundaryless Information Flow™ achieved through global interoperability in a secure, reliable and timely manner. This means that people and software programs in an enterprise should have ready access to the information that they need and are entitled to, in order to support the enterprise’s business processes. The enterprise will then gain operational efficiencies through integrated information, and integrated access to that information, internally and spanning the key interactions with suppliers, customers, and partners.
The Open Group’s Architecture Forum supports this vision by its work on Enterprise Architecture, which includes development and publication of standards and best practice guides. These most notably include the TOGAF® standard [C220], which is the de facto global standard for Enterprise Architecture.
The role of the Architecture Forum’s Data Integration Work Group is to create a body of architecture artefacts for data integration, and an overall framework to stitch them together. It produced a White Paper on Technical Standards for Data Integration [W211] and conducted a survey of Enterprise Architects to understand their needs in data integration projects. It is now developing a Guide to Data Integration using The Open Group Standards, based on research into current data integration trends, collection of data integration use cases, and review of the relevant Open Group standards. These include not only the TOGAF® standard, but also other standards of the Architecture Forum, such as the Open Agile Architecture™ standard [C208], and standards produced by other Forums, such as the ArchiMate® modeling language standard [C226], the Digital Practitioner Body of Knowledge™ standard [C196], and the Open Data Element Framework [C223].
The FACE Consortium was formed to define an open software development environment for military avionics, to address issues of cost and inability to re-use existing components in defense procurements. It created a reference architecture with a strong emphasis on data modeling, which enables suppliers to design products that interwork.
The OSDU Forum enables the energy industry to collaborate on the development of technology to support the world’s changing and evolving energy needs. Energy companies currently face two major challenges: low prices and the move to a lower carbon environment. The Forum is developing a common data platform as a first step to help them take on the challenges and meet the changing needs.
There is a growing legal requirement for companies to report their carbon footprints. Producing this report often requires integration of data from various parts of the company itself, and also from suppliers and business partners. Standards and best practices for this are only just beginning to emerge. The Open Group’s Open Footprint (OFP) Forum [Q210], [W234], with other standards bodies and industry consortia, is working to develop them.
This White Paper is a starting point for investigating how The Open Group standards can be used for data integration. It considers how the TOGAF® standard can be used to architect solutions in three areas where data integration is an important part of solution development. Later studies will build on it.
Military procurements typically call for competitive tenders against functional requirements. In the past, this often led to purchase of products from different suppliers that met the requirements but did not re-use common components and were hard to integrate with each other. A new component might be designed from scratch even when it had 70% commonality with an existing component. The customer could be “locked in” to a particular supplier when purchasing equipment that must integrate with that supplier’s equipment. Lead times were long, costs were excessive, and equipment interoperability was limited.
Originally driven by the US Air Force, FACE has broadened its scope to include other branches of the US military and the armed forces of US allies. The overall scope of the problem is the procurement and design of items of equipment that have common functionality and must interwork.
To address this problem, the FACE Consortium has developed a reference architecture that includes a data architecture and modeling process, with an ability to verify architecture instantiation through a conformance program. Product suppliers use it so that their products will interwork. Even where there is no immediate need for interoperation, they still use applicable architecture models for future-proofing.
Reference Architecture
The reference architecture is illustrated in Figure 1.
Figure 1. FACE Reference Architecture
In this architecture, the Portable Components Segment (PCS) is where software providing mission-level capabilities or business logic resides. Transfer of data between components in this segment (and some other segments) is accomplished by the Transport Services Segment (TSS). The TSS has a typed interface to provide for the transmission of data defined by data models.
As well as transporting data, the TSS can transform it, for security and other purposes.
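The TSS interfaces are defined by the FACE Technical Standard [FACE] and are not reproduced here. The following is only a minimal Python sketch of the two ideas in the text, assuming a hypothetical `TypedTransport` class and a hypothetical `Altitude` message type: the transport accepts only message types that have been declared (standing in for types derived from data models), and it can apply a transformation, here a redaction, while the data is in transit.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Altitude:
    """Hypothetical message type; in FACE, types are derived from data models."""
    source_id: str
    feet: float


class TypedTransport:
    """Toy stand-in for a transport service (not the FACE TSS API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[type, List[Callable[[object], None]]] = {}
        self._transforms: Dict[type, Callable[[object], object]] = {}

    def register(self, msg_type: type,
                 transform: Optional[Callable[[object], object]] = None) -> None:
        """Declare a message type, optionally with a transform applied in transit."""
        self._subscribers.setdefault(msg_type, [])
        if transform is not None:
            self._transforms[msg_type] = transform

    def subscribe(self, msg_type: type, handler: Callable[[object], None]) -> None:
        self._require_registered(msg_type)
        self._subscribers[msg_type].append(handler)

    def send(self, msg: object) -> None:
        msg_type = type(msg)
        self._require_registered(msg_type)
        transform = self._transforms.get(msg_type)
        delivered = transform(msg) if transform else msg
        for handler in self._subscribers[msg_type]:
            handler(delivered)

    def _require_registered(self, msg_type: type) -> None:
        # Reject anything whose type has not been declared to the transport.
        if msg_type not in self._subscribers:
            raise TypeError(f"message type {msg_type.__name__} is not registered")


# Usage: redact the originating sensor identifier while the message is in transit.
transport = TypedTransport()
transport.register(Altitude, transform=lambda m: replace(m, source_id="REDACTED"))
received: List[Altitude] = []
transport.subscribe(Altitude, received.append)
transport.send(Altitude(source_id="radar-7", feet=12000.0))
```

Rejecting undeclared types at the transport is a design choice of the sketch: it lets each receiver assume that whatever it is handed conforms to a declared model.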
Data Architecture
The FACE Data Architecture is a set of related models, specifications, and governance policies with the primary purpose of supporting interoperable data exchange through the TSS. It includes a Shared Data Model (SDM), a Shared Data Model Governance Plan that defines policies for development and management of the SDM, and an Open Universal Domain Description Language (Open UDDL) for formally describing, querying, and communicating information.
The UDDL leverages entity and relationship modeling with a novel approach allowing for refinement and appropriate extension of semantic elements. Additionally, it provides for common data models that may be reused and shared to enhance interoperability between software development efforts.
Figure 2 shows the various constituent models that make up the FACE Data Architecture and the relationships between the models.
Figure 2. Models of the FACE Data Architecture
Note: UoP stands for Unit of Portability
The FACE architecture solves the problem of integrating data from pieces of equipment developed by different organizations by defining a modeling framework in which common data models can be developed, and by providing a repository for the common models. Its conformance program reinforces this with practical support for testing and acceptance.
A systems integrator developing software that integrates data can review the applicable data models, define additional models as needed, and verify that they are in accordance with the architecture. To aid model development, the FACE Consortium has a location on its website with links to third-party data modeling tools.
While the primary purpose of the FACE data architecture is to support interoperable data exchange, it has a data integration element. A portable component may integrate data that it obtains from other components. It can do this because the code and configuration are aware of the data models, and can interpret the typed data delivered by the TSS.
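As a minimal sketch of this kind of integration, the hypothetical component below combines ground speed and heading, assumed to arrive as separate typed messages from two other components, into a north/east velocity. The message types and their fields are illustrative assumptions, not definitions from the FACE Shared Data Model.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GroundSpeed:            # hypothetical type from a shared data model
    knots: float


@dataclass(frozen=True)
class TrueHeading:            # hypothetical; degrees clockwise from true north
    degrees: float


@dataclass(frozen=True)
class VelocityNE:             # integrated output produced by this component
    north_knots: float
    east_knots: float


class VelocityIntegrator:
    """Hypothetical portable component that fuses two typed inputs."""

    def __init__(self) -> None:
        self._speed: Optional[GroundSpeed] = None
        self._heading: Optional[TrueHeading] = None

    def on_message(self, msg: object) -> None:
        # The component knows the data model, so it can dispatch on message type.
        if isinstance(msg, GroundSpeed):
            self._speed = msg
        elif isinstance(msg, TrueHeading):
            self._heading = msg
        else:
            raise TypeError(f"unexpected message type {type(msg).__name__}")

    def velocity(self) -> Optional[VelocityNE]:
        """Return the integrated value once both inputs have been received."""
        if self._speed is None or self._heading is None:
            return None
        rad = math.radians(self._heading.degrees)
        return VelocityNE(north_knots=self._speed.knots * math.cos(rad),
                          east_knots=self._speed.knots * math.sin(rad))


component = VelocityIntegrator()
component.on_message(GroundSpeed(knots=100.0))
component.on_message(TrueHeading(degrees=90.0))
v = component.velocity()
```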
Oil and gas exploration and extraction require complex data processing operations, often based on sophisticated geological models. Developing the software to support these operations is difficult and expensive. Multiple scientific and technical disciplines are involved, each with its own models and vocabulary. Often the models are inconsistent and the vocabularies use terms in conflicting ways. This leads to logical data silos that hamper cross-disciplinary workflows.
Much of the expensive software and many of the constricting data silos are in need of revision or replacement, because of changes to the business and the technology. On the business side, companies’ operating models are changing to reflect the transition in the energy sector towards renewable resources, driven by public policy and consumer demand. On the technical side, digitalization is raising new requirements, and cloud computing is delivering better processing options.
The Open Group formed the Open Subsurface Data Universe (OSDU) Forum to deliver an open source, standards-based, technology-agnostic data platform for the energy industry. It seeks to reduce data silos to enable transformational workflows, accelerate the deployment of emerging digital solutions for better decision making, and create an open, standards-based ecosystem that drives innovation for the energy industry.
Data Platform
The Forum has developed an open source data platform [OSDU] that consists of a set of data management services with APIs for use by applications. The services enable user applications to:
Other than the last of these, the platform capabilities are generic and can support applications outside the OSDU domain. They can be provided in different cloud environments or on premises. This is illustrated in Figure 3, which shows a data storage layer supporting a generic data platform. The software that processes the kinds of data relevant to OSDU applications forms a domain data platform, partly layered above the generic one, and applications access the domain and generic data platforms through their respective APIs.
Figure 3. OSDU Data Platform
Schemas
The OSDU does not define an overall data model. It provides for the registration and management of schemas in a community repository, through the generic schema service. This allows for the maintenance of common schemas and the incremental development of application-specific schemas.
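The OSDU schema service has its own API, which is not reproduced here. As a minimal sketch, assuming a hypothetical `SchemaRegistry` class and made-up schema kinds such as `common:Wellbore`, the code below shows the idea: common and application-specific schemas are registered side by side under versioned names, and an application can check a record against a registered schema before relying on it.

```python
from typing import Dict, List, Optional, Tuple


class SchemaRegistry:
    """Hypothetical in-memory stand-in for a community schema repository."""

    def __init__(self) -> None:
        self._schemas: Dict[Tuple[str, int], dict] = {}

    def latest_version(self, kind: str) -> Optional[int]:
        versions = [v for (k, v) in self._schemas if k == kind]
        return max(versions) if versions else None

    def register(self, kind: str, version: int, schema: dict) -> None:
        # Published versions are immutable; a change needs a higher version number.
        latest = self.latest_version(kind)
        if latest is not None and version <= latest:
            raise ValueError(f"{kind}: version {version} must exceed {latest}")
        self._schemas[(kind, version)] = schema

    def get(self, kind: str, version: Optional[int] = None) -> dict:
        if version is None:
            version = self.latest_version(kind)
        return self._schemas[(kind, version)]

    def missing_fields(self, kind: str, record: dict,
                       version: Optional[int] = None) -> List[str]:
        """Fields required by the schema that the record lacks."""
        required = self.get(kind, version).get("required", [])
        return [f for f in required if f not in record]


registry = SchemaRegistry()
# A common schema, maintained by the community, evolving through versions.
registry.register("common:Wellbore", 1, {"required": ["id", "name"]})
registry.register("common:Wellbore", 2, {"required": ["id", "name", "spud_date"]})
# An application-specific schema added incrementally alongside the common one.
registry.register("acme:PorosityLog", 1,
                  {"required": ["id", "wellbore_id", "depth_m", "porosity"]})
gaps = registry.missing_fields("common:Wellbore", {"id": "W-1", "name": "North 1"})
```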
The data platform addresses the need to respond to business and technical change, and its data management capabilities and schema repository prevent the formation of workflow-constricting data silos. The open source approach, with cross-company collaboration of experts, delivers a cost-effective and well-architected solution.
Again, data integration is not the primary purpose, but there is a strong data integration element. Within a workflow, an application can retrieve from the platform data that has been stored by other applications, and combine it to produce integrated data, which it outputs or stores for later use. It can do this because it can use the stored schemas. The support by the platform for entitlements, policies, and legal tags is an important aspect of this, helping to ensure that integrating the data does not breach corporate governance, intellectual property, or legal restrictions.
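The platform enforces entitlements and legal tags itself; the sketch below only illustrates the principle from an application's point of view. It assumes a hypothetical record shape carrying access-control groups and legal tags, integrates only the records that the requesting user is entitled to read and cleared to use, and reports the excluded records rather than silently dropping them.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class StoredRecord:
    record_id: str
    data: Dict[str, float]
    acl_groups: FrozenSet[str]   # hypothetical: groups entitled to read the record
    legal_tags: FrozenSet[str]   # hypothetical: restrictions attached to the record


def readable(record: StoredRecord, user_groups: FrozenSet[str],
             cleared_tags: FrozenSet[str]) -> bool:
    """Readable if the user is in an entitled group and is cleared for every tag."""
    return bool(record.acl_groups & user_groups) and record.legal_tags <= cleared_tags


def integrate(records: List[StoredRecord], user_groups: FrozenSet[str],
              cleared_tags: FrozenSet[str]) -> Tuple[Dict[str, float], List[str]]:
    """Merge attributes from readable records; report the ids that were excluded."""
    merged: Dict[str, float] = {}
    excluded: List[str] = []
    for record in records:
        if readable(record, user_groups, cleared_tags):
            merged.update(record.data)
        else:
            excluded.append(record.record_id)
    return merged, excluded


records = [
    StoredRecord("r1", {"porosity": 0.21}, frozenset({"geoscience"}), frozenset({"public"})),
    StoredRecord("r2", {"permeability_md": 150.0}, frozenset({"geoscience"}), frozenset({"public"})),
    StoredRecord("r3", {"pressure_kpa": 31000.0}, frozenset({"geoscience"}),
                 frozenset({"export-restricted"})),
]
merged, excluded = integrate(records, frozenset({"geoscience"}), frozenset({"public"}))
```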
The danger of climate change is now accepted. The transition to a lower-carbon economy is a global imperative and business reality. Companies need reliable data for decision support, to understand how they can reduce their carbon footprints, to be able to explain how they are reducing it to investors and society as a whole, and to comply with the increasing body of reporting legislation.
The scope of the problem is the collection, integration, and reporting of carbon footprint data. It applies to companies and governments globally, and to collaboration between companies that use each other’s energy or other products.
Accounting for greenhouse gas emissions [GHG] has some similarity to financial accounting, which companies do partly because it is required by legislation, partly to provide information about themselves, and partly for internal planning. Financial accounting has been practiced for centuries, there are established standards and good practices, legislation is mature, and companies have dedicated accounts departments. By contrast, carbon footprint reporting is in its infancy, standards and practices are still emerging, legislation is only just beginning to appear, and few companies know how to collect their footprint data.
Footprint data is more complex than financial data, with many different quantities to be measured, and with a need to consider not only direct emissions (GHG Scope 1) but also indirect emissions from energy (GHG Scope 2) and indirect emissions from suppliers and customers (GHG Scope 3).
The data can be collected in various ways. These include direct measurement, which can be more or less accurate depending on the measurement tools and methods, and taking nominal values from tables for different kinds of emission. The latter approach is probably the least accurate, but at present the most widely used. As well as collecting its own data for Scope 1, an enterprise must use data from energy suppliers, other suppliers, and customers for Scopes 2 and 3.
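The table-based approach amounts to multiplying an activity quantity (litres of fuel, kWh of electricity) by a tabulated emission factor. The sketch below is a minimal illustration, not a GHG Protocol implementation: it totals emissions by scope, using a measured or supplier-reported value where one exists and a table lookup otherwise. The factor values are placeholders chosen for illustration, not figures from any published table, and the `Activity` fields are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional

# Placeholder factors in kg CO2e per unit of activity; NOT from a published table.
EMISSION_FACTORS: Dict[tuple, float] = {
    ("diesel", "litre"): 2.7,
    ("grid electricity", "kWh"): 0.4,
}


@dataclass(frozen=True)
class Activity:
    scope: int                                  # GHG Protocol scope: 1, 2 or 3
    description: str
    quantity: float
    unit: str
    measured_kg_co2e: Optional[float] = None    # set when measured or reported


def emissions_kg(activity: Activity) -> float:
    """Prefer a measured or reported value; fall back to the factor table."""
    if activity.measured_kg_co2e is not None:
        return activity.measured_kg_co2e
    factor = EMISSION_FACTORS[(activity.description, activity.unit)]
    return activity.quantity * factor


def totals_by_scope(activities) -> Dict[int, float]:
    totals: Dict[int, float] = defaultdict(float)
    for activity in activities:
        totals[activity.scope] += emissions_kg(activity)
    return dict(totals)


inventory = [
    Activity(1, "diesel", 1000.0, "litre"),                # own fleet: table value
    Activity(2, "grid electricity", 5000.0, "kWh"),        # energy supplier: table value
    Activity(3, "purchased steel", 10.0, "tonne",
             measured_kg_co2e=18500.0),                    # supplier-reported value
]
totals = totals_by_scope(inventory)
```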
The problem is for companies to produce reliable emissions reports from disparate sources that use different standards (or no standards). An enterprise must integrate this data, obtained by a variety of methods and represented in a variety of formats, in accordance with complex rules, to obtain and report its carbon footprint. It is important to know the basis on which the data has been computed, and that it has been computed reliably by a reliable source.
While there are many specialized commercial off-the-shelf products to assist with financial accounting, companies typically use spreadsheets for footprint data, if they process it at all.
There are some reporting standards (for example the GHG Protocol Corporate Standard [GHG] and the IFRS standards [IFRS]), but each company applies them in its own way. They define what needs to be reported, but do not address the technical details at a level sufficient to ensure that reports are fully comparable. This limits companies’ ability to control their emissions, the reliability of their footprint information, and the effectiveness of legislation.
The Open Footprint Forum was created by The Open Group to establish the predominant global data standard for verifiable emissions. It is developing a detailed data model and defining a platform for collecting, integrating, and reporting footprint data.
At the time of writing, the model and platform are under development, and anything said here is subject to change when the definitions are published. What is said here is based on publicly presented descriptions of the Forum’s intentions. As anyone who has been involved in standards development knows, drafts can change radically as a standard evolves.
The platform will support a workflow including the activities of data generation, data extraction, data storage, analytics, reporting, and data use. The stored data will be structured in accordance with the data model.
The data model includes the entities involved in GHG emission reporting and the relationships between them. It captures provenance and calculation basis as well as data values, and can be used by collaborating parties to enable the production of reliable reports. By providing detailed technical definitions, the data model enables companies to combine their data with data from other companies, merging like with like, without having to research and transform other companies’ data formats.
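The OFP data model is still being defined, so the record below is a hypothetical stand-in, not the Forum’s model. It illustrates two points from the text: each value carries its provenance and calculation basis, and values are combined only when they describe the same thing (assumed here to mean the same gas, scope, and reporting period), with the provenance of every contribution retained.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EmissionRecord:
    gas: str             # e.g. "CO2e"
    scope: int           # GHG Protocol scope
    period: str          # reporting period, e.g. "2023"
    quantity_kg: float
    basis: str           # calculation basis, e.g. "measured" or "emission factor"
    source: str          # provenance: the party that supplied the value


Key = Tuple[str, int, str]


def combine(records: List[EmissionRecord]) -> Dict[Key, Tuple[float, List[Tuple[str, str]]]]:
    """Sum like with like and keep (source, basis) pairs as an audit trail."""
    groups: Dict[Key, List[EmissionRecord]] = defaultdict(list)
    for r in records:
        groups[(r.gas, r.scope, r.period)].append(r)
    return {
        key: (sum(r.quantity_kg for r in rs), sorted({(r.source, r.basis) for r in rs}))
        for key, rs in groups.items()
    }


combined = combine([
    EmissionRecord("CO2e", 3, "2023", 100.0, "measured", "Supplier A"),
    EmissionRecord("CO2e", 3, "2023", 50.5, "emission factor", "Supplier B"),
    EmissionRecord("CO2e", 3, "2022", 90.0, "measured", "Supplier A"),
])
```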
The OFP Forum is considering using the OSDU data platform, but with the components that support OSDU applications replaced by components that support OFP applications. This will enable the development of open source and commercial off-the-shelf products to process Footprint data.
Use of this platform with the OFP Data Model would follow the pattern illustrated in Figure 3. Open source and commercial off-the-shelf products to process footprint data would comprise the domain data platform. They would be used by enterprise GHG reporting applications. The applications and the domain data platform would use the domain data definitions of the OFP data model.
The data integration element of the OFP use case is more explicit than for FACE or the OSDU. Derivation of a company’s emissions data requires the integration of data from both inside and outside the company.
The intention is to translate the generated and extracted data into the common format of the OFP data model, so that it can be combined by applications for analytics, reporting and use. The details of how to achieve this are still being worked out.
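Because the translation details are still being worked out, the following is only a sketch of the general idea, assuming two hypothetical source formats: a site spreadsheet row reporting tonnes of CO2e, and a supplier’s JSON document reporting kilograms. Each adapter maps its source into one common record shape, after which the records can be combined without further knowledge of where they came from.

```python
import json
from typing import Dict, Union

CommonRecord = Dict[str, Union[str, int, float]]


def from_spreadsheet_row(row: Dict[str, str], scope: int) -> CommonRecord:
    """Hypothetical site spreadsheet: text cells, quantities in tonnes."""
    return {
        "gas": "CO2e",
        "scope": scope,
        "period": row["Year"],
        "quantity_kg": float(row["CO2e (t)"]) * 1000.0,
        "source": row["Site"],
    }


def from_supplier_json(text: str) -> CommonRecord:
    """Hypothetical supplier feed: JSON, quantities already in kilograms."""
    doc = json.loads(text)
    return {
        "gas": "CO2e",
        "scope": int(doc["emissions"]["scope"]),
        "period": str(doc["reporting_year"]),
        "quantity_kg": float(doc["emissions"]["co2e_kg"]),
        "source": doc["supplier"],
    }


records = [
    from_spreadsheet_row({"Site": "Plant 1", "CO2e (t)": "12.5", "Year": "2023"}, scope=1),
    from_supplier_json('{"supplier": "Acme", "reporting_year": 2023,'
                       ' "emissions": {"scope": 3, "co2e_kg": 800}}'),
]
```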
Use of TOGAF® for Solution Architectures
The TOGAF Standard is a framework for Enterprise Architecture. Within the context of Enterprise Architecture, a solution architecture is a description of a discrete and focused business operation or activity and how IS/IT supports that operation.
For FACE, a portable component performing a specific avionics function, such as a tactical air navigation system, is a solution and has a solution architecture. A geological analysis toolkit is an example for OSDU. For OFP, every enterprise will need a GHG reporting solution, and should therefore have a GHG reporting solution architecture.
The TOGAF® architecture development method (ADM), illustrated in Figure 4, can be used to develop solution architectures. It has nine phases and a central requirements process. Three of the phases (B, C and D) are primarily concerned with architecture definition. They are preceded by the definition of the overall architecture vision (Phase A), and followed by phases in which the architecture is realized, and lessons are learned.
Figure 4: The TOGAF® ADM
An architecture vision is a succinct description of the target architecture that describes its business value and the changes that will result from its successful deployment. It serves as an aspirational vision and a boundary for detailed architecture development.
The Architecture Vision phase includes the creation of a solution overview. This is a “pencil sketch” of the expected solution at the outset of the engagement, often called a Solution Concept Diagram, Conceptual Solution Architecture, or something similar. It identifies, at a very high level, the main data processing operations and the data sets that they consume and produce.
A business architecture relates business elements to business goals and to elements of other domains.
For FACE and OSDU solutions, the business architecture is specific to the avionics component or subsurface application in question. A FACE solution business architecture relates a portable component to the purpose of the aircraft in which it is installed and the role of the Air Force. An OSDU solution business architecture relates a subsurface application to the business of the enterprise using it. Each portable component, and each application, performs a different function in its business context.
For OFP solutions, the business architecture is more generic, as GHG reporting is, or will be, a business requirement of all enterprises. Each GHG reporting system performs a similar function in the business context of the reporting enterprise.
Information Systems Architectures
The Information Systems Architectures consist of a data architecture and an application architecture, which may be developed in either order.
In a data integration project, the Architecture Vision solution overview forms a starting point, and information about data processing operations is developed at a more detailed level, with the identification of applications and major application components.
The data architecture is a description of the structure of the major types and sources of data, logical data assets, physical data assets, and data management resources. It includes the definition of detailed data models, and the creation of the data flow diagrams (or similar) that identify the data processing operations and their data sets.
The application architecture is a description of the structure and interaction of the applications that provide key business capabilities and manage the data assets. It identifies and describes the data integration application components.
The structure of the major types and sources of data is described by data models, which form a crucial aspect of all three use cases.
A semantic model shows classes of object and the relations between them. The creation of such a model, whether it is called a data model overview, a conceptual data diagram, an entity-relationship diagram, a taxonomy, an ontology, or a metamodel, is essential for data integration. It enables communication between the human architects and designers, so that they have a common understanding of the data to be integrated.
This is particularly important when the data to be integrated, and the people defining its integration, come from different organizations, which will typically use different terminologies.
In relational database design, data models are classified into three types: conceptual, logical, and physical. A similar classification can be applied to other data storage paradigms, including graph databases and NoSQL key-value stores, and to data transmission formats such as XML and JSON.
A data integration component works with schemas at the physical data model level. These schemas define the structure of the input data, the output data, and any data that the component stores for its own purposes. The component must translate between them and the software data structures on which its program code operates.
The conceptual and logical data models provide human-understandable explanations of the physical data models. The data integration can be described in terms of logical operations on these models.
Because there are different modeling techniques and storage paradigms, and to cater for efficient storage use and run-time performance, the mapping between a logical data model and a physical data model that realizes it may not be straightforward. A translation between a physical data model and software data structures may not be straightforward either.
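To make this concrete, here is a minimal sketch with an assumed logical entity, a temperature measurement, and an assumed physical JSON representation that uses short field names, a Unix timestamp, and integer tenths of a kelvin to save space. None of these names come from the use cases; they only show why the translation code in a data integration component is more than a field-by-field copy.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Measurement:
    """Software data structure matching the (assumed) logical model."""
    sensor_id: str
    taken_at: datetime
    celsius: float


def from_physical(text: str) -> Measurement:
    """Physical form (assumed): {"sid": str, "ts": Unix seconds, "dK": tenths of a kelvin}."""
    doc = json.loads(text)
    return Measurement(
        sensor_id=doc["sid"],
        taken_at=datetime.fromtimestamp(doc["ts"], tz=timezone.utc),
        celsius=doc["dK"] / 10.0 - 273.15,
    )


def to_physical(m: Measurement) -> str:
    """Inverse mapping; rounding to 0.1 K makes the round trip lossy below that."""
    return json.dumps({
        "sid": m.sensor_id,
        "ts": int(m.taken_at.timestamp()),
        "dK": round((m.celsius + 273.15) * 10),
    })


m = from_physical('{"sid": "T-1", "ts": 1700000000, "dK": 2981}')
```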
There is no clear boundary between architecture and design when it comes to data models. The definition of conceptual data models is generally part of the architecture, carried out in the data architecture part of the information systems architecture phase. The definition of physical data models and their mapping to software data structures are often carried out by data specialists and programmers, and form part of the design and implementation rather than the architecture definition. Logical data models may be specified as part of the architecture, or left to the design.
FACE
Each FACE component has an individual business architecture, and has individual data and application architectures to support it. But while different portable component data architectures may have different data structures, the structures are defined in the same way, to facilitate interworking between components.
The FACE data architecture, illustrated in Figure 2, provides for the creation of conceptual, logical and platform data models, which are increasingly concrete realizations of the business domain concepts. This leads to the definition of UoP models for the portable components, which specify the messages that the components receive and send, at abstract and concrete levels.
Integration models, while not required, can be added to provide details of integration between components.
OSDU
As for FACE, each OSDU application has an individual business architecture, and has individual data and application architectures to support it.
OSDU does not have a rigorous data definition methodology like that of FACE. Nevertheless, an OSDU solution will involve the definition of conceptual, logical, and physical data models, and the mapping of physical data models to software data structures. The schemas are registered in the community repository through the OSDU schema service.
The OSDU domain data platform contains open source products that could be specified as part of the application architecture, or selected to provide application functionality in the Opportunities and Solutions phase (Phase E), which follows the Technology Architecture phase (Phase D) of the TOGAF ADM.
OFP
Every enterprise’s GHG reporting system performs broadly the same business function.
The aim is that their data architectures should all specify the common standard data model that is being defined by the Open Footprint Forum.
It is likely that their application architectures will have areas of similarity. The Open Footprint Forum wants to encourage the development of open source or commercial off-the-shelf software products based on its model that can form solution building blocks to implement common functionality.
A technology architecture is a description of the structure and interaction of the technology services and technology components. The FACE and OSDU standards define significant portions of FACE and OSDU solution technology architectures. Like the OSDU Forum, the Open Footprint Forum is looking to have a standard data platform.
The FACE standard defines a messaging technology service, in which the TSS passes messages between portable components. A solution architecture definition adds definitions of other technology services and components, such as special sensors or other hardware that the portable component needs.
The OSDU standard defines a data access technology service. Again, a solution architecture definition adds definitions of other technology services and components that the application needs, and also defines the cloud or on-premise data storage that underlies the OSDU data platform.
The OSDU architecture is cloud native. This is highly appropriate as a basis for the open source components, and will help encourage their development.
The following conclusions are drawn from consideration of the three use cases, and of the use of TOGAF to develop their solution architectures. They apply to data integration projects in general.
Classic Data Integration Pattern
There is a classic data integration architecture pattern, which is implicit in all three use cases. Starting from a solution overview, it defines and describes data semantics and structure, data processing operations, and a software platform to support them. This is the information needed by development and operations staff to create and run software programs that integrate data.
(The term architecture pattern is used here in the general sense of the TOGAF® Architecture Development Method as “a way of putting building blocks into context; for example, to describe a re-usable solution to a problem.” We do not give a formal pattern definition.)
Use of the TOGAF® ADM
This pattern aligns well with the TOGAF architecture development method, with the creation of a solution overview in the Architecture Vision phase, the definition of data semantics and structure and data processing operations in the Information Systems Architectures phase, and the definition of the platform in the Technology Architecture phase.
Standard Data Models
The development of standard data models, such as that being defined by the Open Footprint Forum, can be of great value for data integration.
Application Components
For subsurface applications and GHG accounting, many of the applications and components will be common to many of the companies involved, creating a market for “off-the-shelf” application components. For most data integration projects, this will not be the case, creating a need for model-driven application components and components that can be configured “on the fly” to operate on different data sets.
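A model-driven component of this kind can keep its behaviour in data rather than code. The sketch below assumes a hypothetical mapping specification, held as JSON, that names, for each output field, the input field to read and a converter to apply; pointing the same component at a new data set then means supplying a new specification rather than new code. The converter names and field names are illustrative.

```python
import json
from typing import Any, Callable, Dict, List

# Converters the component supports; specifications refer to them by name.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "text": str,
    "number": float,
    "tonnes_to_kg": lambda v: float(v) * 1000.0,
}


def load_spec(text: str) -> Dict[str, List[str]]:
    """Parse and check a spec of the form {output_field: [input_field, converter]}."""
    spec = json.loads(text)
    for target, (_source, converter) in spec.items():
        if converter not in CONVERTERS:
            raise ValueError(f"{target}: unknown converter {converter!r}")
    return spec


def apply_spec(spec: Dict[str, List[str]], rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Produce one output record per input row, as directed by the spec."""
    return [
        {target: CONVERTERS[converter](row[source])
         for target, (source, converter) in spec.items()}
        for row in rows
    ]


site_spec = load_spec('{"site": ["Site", "text"], "quantity_kg": ["CO2e (t)", "tonnes_to_kg"]}')
output = apply_spec(site_spec, [{"Site": "Plant 1", "CO2e (t)": "2.5"}])
```

The cost of this flexibility is that errors in a specification surface at load or run time rather than during development, which is why the sketch validates converter names as soon as a specification is loaded.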
Data Integration Platforms
The three use cases specify particular platform components, but most data integration projects will use whatever platforms have been adopted for other applications by the enterprises concerned. These will not necessarily be cloud native. Cloud native architectures are however increasing in popularity, and cloud-based microservices are an excellent way of implementing data integration solutions.
(Please note that the links below are good at the time of writing but cannot be guaranteed for the future.)
C196 The Digital Practitioner Body of Knowledge™ Standard, a standard of The Open Group (C196), published by The Open Group, January 2020; refer to: https://publications.opengroup.org/c196
C208 The Open Agile Architecture™ Standard, a standard of The Open Group (C208), published by The Open Group, reissued October 2022; refer to: https://publications.opengroup.org/c208
C220 The TOGAF® Standard, 10th Edition, a standard of The Open Group (C220), published by The Open Group, April 2022; refer to: www.opengroup.org/library/c220
C223 O-DEF™, the Open Data Element Framework, Version 3.0, published by The Open Group, May 2022; refer to: https://publications.opengroup.org/c223
C226 The ArchiMate® Standard, Version 3.2, a standard of The Open Group (C226), published by The Open Group, October 2022; refer to: https://publications.opengroup.org/c226
FACE FACE™ Technical Standard, Edition 3.2, published by The Open Group, August 2023; refer to: https://publications.opengroup.org/c232
GHG GHG Protocol Corporate Accounting and Reporting Standard, Revised Edition, World Business Council for Sustainable Development (WBCSD) and World Resources Institute (WRI); refer to: https://ghgprotocol.org/sites/default/files/standards/ghg-protocol-revised.pdf
IFRS IFRS Accounting, Sustainability Disclosure, and other standards, The IFRS Foundation; refer to: https://www.ifrs.org/
OSDU Open source software for the OSDU Data Platform, The OSDU Forum; refer to: https://community.opengroup.org/osdu/
Q210 The Open Footprint Forum, a Data Sheet published by The Open Group, February 2021; refer to: https://publications.opengroup.org/catalog/product/view/id/1296/s/q210/category/25/
W211 Technical Standards for Data Integration, a White Paper published by The Open Group (W211), June 2021; refer to: https://publications.opengroup.org/w211
W234 Open Footprint™ Forum Overview: Enabling Effective Emissions Data Management and Sharing, published by The Open Group (W234); refer to: https://publications.opengroup.org/w234