The Product

Soda Data Content Management System

Content Creation 

Build full-featured text and graphics websites

Display, edit and import data

Make content public or restrict access

Website Admin

Set up workspaces, pages and data views

Define users and access levels

Configure authorization providers

Data Import

Analyse JSON APIs

Import all data or only data that has changed

Integrate imported data using views


Content Creation

Web pages are the best way of presenting textual information - not just on the World Wide Web, but in enterprise systems too. With attractive layouts and quick, intuitive navigation, they are the standard for displaying text, graphics, and video.

They are also the best way to present data - but extracting it from data stores and formatting it for display is often not straightforward. Content management systems make it easy to lay out web pages and add many kinds of content, but do not provide a dynamic display of changing data. Embedding spreadsheet or other application files in pages often requires special-purpose code. 

There are now many sophisticated applications that process data using advanced techniques such as knowledge graphs and deep learning. Users want to display the results of these applications in ways that the developers cannot anticipate. They may want to display combined results from different applications. This does not need to be difficult.

Soda is the perfect platform for applications to provide data that users can display with other content, and present just as they wish.

Access Control

Access control is essential for protecting consumer personal data and enterprise intellectual property. Open data is important in a democratic society, but we also need personal privacy and a commercial structure that allows wealth generation. Companies will create useful data, and develop sophisticated data-transformation applications, if they can make money. To do this, they must be able to control who can use their data.

The old concept of keeping all corporate assets inside a secure perimeter has proved unworkable, given the universal connectivity of the Web, and the fact that many "corporate" assets are now "in the cloud". The new concept is zero-trust architecture, with access to assets controlled regardless of whether it is from inside or outside the corporation.

With increased collaboration and cloud-based working, where the access comes from is not the only consideration. A data owner must be able to control access to the data wherever it is located: in the owner's systems, in the cloud, and even in the user's systems.

For data access, the granularity of control is important. Control at the data-set level, or even at the record level, is too inflexible. It must be possible to allow or deny access to individual fields.

Soda uses virtual data lakes to access data stores. Virtual data lakes are triple stores with access control built in to the core. Access to a triple requires the user to have access to its subject, its verb, and its object. It is only by going to this level of detail that a system can provide the granularity of control needed for collaborative distributed processing. 
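
The rule above can be illustrated with a minimal sketch. It is not Soda's or the virtual data lake's actual API: the Item and Triple classes, the read_level labels, and the function names are all hypothetical, chosen only to show how checking the subject, verb, and object separately gives field-level control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    value: str
    read_level: str  # hypothetical label naming who may read this item


@dataclass(frozen=True)
class Triple:
    subject: Item
    verb: Item
    obj: Item


def can_read(triple, user_levels):
    """Grant access only if the user may read the subject, the verb, and the object."""
    parts = (triple.subject, triple.verb, triple.obj)
    return all(part.read_level in user_levels for part in parts)


def visible_triples(triples, user_levels):
    """Return the triples that the user is allowed to see."""
    return [t for t in triples if can_read(t, user_levels)]


customer = Item("customer:42", "staff")
triples = [
    Triple(customer, Item("hasName", "staff"), Item("Alice", "staff")),
    # The verb is restricted, so this single field is hidden from ordinary staff.
    Triple(customer, Item("hasCreditLimit", "finance"), Item("5000", "staff")),
]

staff_view = visible_triples(triples, {"staff"})
finance_view = visible_triples(triples, {"staff", "finance"})
```

Here a user holding only the "staff" level sees the customer's name but not the credit limit, while a user who also holds "finance" sees both. Restricting a verb hides one field across all records without hiding the records themselves.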

Before deciding whether to grant or deny access to a requesting party, the system must establish what rights that party has. OAuth has become the de facto standard for establishing the rights of requesting parties to cloud-based assets. Soda uses OAuth to establish the level of access to grant to people or systems requesting access to data.

Import Data

We talk about the API economy, recognizing the way APIs underpin interactions between web services and applications in a commercial environment. Use of APIs has been expanding steadily for some time - the ProgrammableWeb register has been growing at 2,000 APIs per year since 2015. Web service APIs are the best way to exchange data, enabling providers to define how they expose it, and giving users stable interfaces. 

JSON is now the favoured delivery format. Elegant, concise, self-describing, and integrated with the Web programming language JavaScript, it is easy to use and a natural choice. Even so, obtaining data from APIs typically requires some bespoke programming, to transform data that follows the provider's model to data that follows the user's model.

Soda supports import of data from JSON APIs, in a two-stage process.

The first stage is to analyse the API and determine its data model. Most APIs have consistent data models, but are documented by examples, rather than by published schemas. By analysing provided examples and actual API data, Soda determines the object classes and properties of the underlying data model, and creates views to display the data. Finalising the model requires some user input, but this is much simpler and quicker than writing a parser program.
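
As a rough illustration of this kind of analysis, the sketch below walks sample JSON and records, for each object class, the properties seen and the types of their values. It is not Soda's algorithm: the infer_model function, the choice to name a nested class after the property that holds it, and the sample data are all assumptions made for the example.

```python
import json


def infer_model(value, class_name, model=None):
    """Walk decoded JSON, recording property names and value types per class."""
    if model is None:
        model = {}
    if isinstance(value, dict):
        props = model.setdefault(class_name, {})
        for key, child in value.items():
            if isinstance(child, dict):
                # Assumption: a nested object belongs to a class named after its key.
                props.setdefault(key, set()).add("object:" + key)
                infer_model(child, key, model)
            elif isinstance(child, list):
                props.setdefault(key, set()).add("list")
                for element in child:
                    infer_model(element, key, model)
            else:
                props.setdefault(key, set()).add(type(child).__name__)
    elif isinstance(value, list):
        for element in value:
            infer_model(element, class_name, model)
    return model


sample = json.loads(
    '[{"id": 1, "name": "Ann", "address": {"city": "Leeds"}},'
    ' {"id": 2, "name": "Bob", "email": null}]'
)
model = infer_model(sample, "person")
# model["person"] has the properties id, name, address and email;
# model["address"] has the single property city.
```

A property that appears in only some examples, such as email here, or one that shows more than one value type, is the kind of case where user input would be needed to finalise the model.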

The second stage is to import the data from the API. This can be a one-off operation or a repeated one. In a repeated import, changes are noted and applied to keep the imported data in sync with the source, but unchanged data is not re-written, allowing persistent use by applications.
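
A repeated import of this kind might be sketched as follows. This is a simplified illustration and not Soda's implementation: the sync function, the use of dictionaries keyed by record identifier, and the decision to delete records that have disappeared from the source are all assumptions.

```python
def sync(store, incoming):
    """Bring store in line with incoming records, touching only what changed."""
    changes = {"added": [], "updated": [], "removed": []}
    for key, record in incoming.items():
        if key not in store:
            store[key] = record
            changes["added"].append(key)
        elif store[key] != record:
            store[key] = record
            changes["updated"].append(key)
        # Records that are equal to the stored copy are deliberately left alone.
    for key in list(store):
        if key not in incoming:
            # Assumption: a record missing from the source is removed locally.
            del store[key]
            changes["removed"].append(key)
    return changes


store = {"a": {"price": 1}, "b": {"price": 2}, "c": {"price": 3}}
kept = store["a"]  # an application holding a reference to unchanged data
incoming = {"a": {"price": 1}, "b": {"price": 5}, "d": {"price": 4}}

changes = sync(store, incoming)
# changes == {"added": ["d"], "updated": ["b"], "removed": ["c"]}
# store["a"] is still the same object that the application was holding.
```

Because the unchanged record is never replaced, anything that refers to it stays valid across imports, which is the property that allows persistent use by applications.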

Combine Data

Applications often use data from different sources. These sources generally represent information in different ways, and follow different data models. Combining them often takes substantial effort, not related to the information itself, but to the way it has been stored and structured.

The virtual data lakes used by Soda to access data represent information as triples. This is the simplest and most flexible way to represent information. Many intelligent data application programs use triples as their underlying storage mechanism. Data from relational databases, graph databases, hierarchical databases, and other kinds of store can all be converted to triples in a straightforward way.
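
For example, a row in a relational table can be turned into triples by treating the row as the subject, each column name as a verb, and each cell value as an object. The sketch below shows one such mapping; the rows_to_triples function, the "table/key" naming of subjects, and the choice to skip null cells are assumptions for illustration, not the conversion used by the virtual data lake.

```python
def rows_to_triples(table, key_column, rows):
    """Convert relational rows to (subject, verb, object) triples."""
    triples = []
    for row in rows:
        # Assumption: a subject is identified by its table name and primary key.
        subject = f"{table}/{row[key_column]}"
        for column, value in row.items():
            if column == key_column or value is None:
                continue  # the key is already in the subject; nulls carry no fact
            triples.append((subject, column, value))
    return triples


rows = [
    {"id": 1, "name": "Ann", "city": "Leeds"},
    {"id": 2, "name": "Bob", "city": None},
]
triples = rows_to_triples("person", "id", rows)
# [("person/1", "name", "Ann"), ("person/1", "city", "Leeds"), ("person/2", "name", "Bob")]
```

Data from two tables, or from two different stores, ends up in the same flat form, so combining it becomes a matter of agreeing on subjects and verbs rather than reconciling storage structures.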

The virtual data lake implementation is open source and has a documented Java API. Applications can use this to perform complex analysis and transformation of data, which can be displayed in attractive web pages using Soda.

The Implementation

Soda is a full-featured content management system that enables users to work with data and the providers of the data to control access to it.

All data is held as subject-verb-object triples. It can be displayed and edited on web pages using views. 

Access control is built in at the core of the system. Access to a triple requires the user to have access to its subject, its verb, and its object. It is only by going to this level of detail that a system can provide the granularity of control needed for collaborative distributed processing. 

Before deciding whether to grant or deny access to a requesting party, the system must establish what rights that party has. OAuth has become the de facto standard for establishing the rights of requesting parties to cloud-based assets. Soda uses OAuth to establish the level of access to grant to people or systems requesting access to data.

Soda is implemented on AWS, using S3 for reliable data storage, but it could easily be ported to other cloud services.

Development Strategy

We are adding forms-based data entry and display to complement the current views mechanism.

We will add support for data import using other formats in addition to JSON. We have successfully imported spreadsheet data and Internet of Things data (using the O-DF data format standard) in experimental systems.

We do not plan to implement sophisticated applications using AI. Our focus is on providing the best platform for them.