Soda Services

Overview

Soda supports defining and using services written in Python, as well as workflows built from those services.

A service can perform actions. A workflow consists of a list of actions to be performed. Each action has a set of inputs and a set of outputs. Each workflow also has a set of inputs and a set of outputs. For each action in a workflow, each input is an input to the workflow or an output of a preceding action. Each output of a workflow is an output of one of its actions.
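The wiring rules above can be checked mechanically. The sketch below is a hypothetical, simplified model: it is not the sodapy API, and it identifies values by name only. The `Action`, `Workflow`, and `validation_errors` names are illustrative. It walks the actions in order, tracking which values are available to each one.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model: values are identified by name only.
@dataclass
class Action:
    name: str
    inputs: list[str]
    outputs: list[str]

@dataclass
class Workflow:
    inputs: list[str]
    outputs: list[str]
    actions: list[Action] = field(default_factory=list)

def validation_errors(workflow: Workflow) -> list[str]:
    """Return the rule violations found; an empty list means the wiring is valid."""
    errors = []
    available = set(workflow.inputs)  # values usable by the next action
    produced = set()                  # values output by any action
    for action in workflow.actions:
        for name in action.inputs:
            if name not in available:
                errors.append(
                    f"{action.name}: input '{name}' is neither a workflow "
                    "input nor an output of a preceding action")
        # This action's outputs become usable only by later actions.
        available.update(action.outputs)
        produced.update(action.outputs)
    for name in workflow.outputs:
        if name not in produced:
            errors.append(f"workflow output '{name}' is not output by any action")
    return errors

good = Workflow(["html"], ["tree"], [Action("parse file", ["html"], ["tree"])])
bad = Workflow(["html"], ["summary"], [Action("summarise", ["tree"], ["summary"])])
```

In this example, `validation_errors(good)` returns an empty list. `validation_errors(bad)` reports one problem, because "summarise" needs "tree", which is neither a workflow input nor produced earlier.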

An input or an output can be a Boolean, an integer, a real number, a text string, a binary quantity, or a virtual data lake item.
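As an illustration only, the six value kinds could be mapped onto Python types as follows. `ValueKind`, `is_valid`, and the `VdlItem` placeholder are hypothetical names, not part of sodapy. This sketch also assumes that a real-number value may be given as an integer.

```python
from enum import Enum

class VdlItem:
    """Hypothetical stand-in for a reference to a virtual data lake item."""
    def __init__(self, item_id: str):
        self.item_id = item_id

class ValueKind(Enum):
    BOOLEAN = "boolean"
    INTEGER = "integer"
    REAL = "real"
    TEXT = "text"
    BINARY = "binary"
    VDL_ITEM = "vdl item"

def is_valid(kind: ValueKind, value: object) -> bool:
    """Check that a Python value can stand for an input or output of this kind."""
    if kind is ValueKind.BOOLEAN:
        return isinstance(value, bool)
    if kind is ValueKind.INTEGER:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if kind is ValueKind.REAL:
        # Assumption: integers are acceptable where a real number is expected.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind is ValueKind.TEXT:
        return isinstance(value, str)
    if kind is ValueKind.BINARY:
        return isinstance(value, (bytes, bytearray))
    return isinstance(value, VdlItem)
```

The explicit `bool` exclusion matters because `isinstance(True, int)` is true in Python.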

Service Definitions

Each service, action, input, and output has a name and a description. A service can also have a local run sequence: a command that can be executed to run the service on a local system. Service names must be unique within the system, and action names must be unique within a service. The names of inputs and outputs must be unique within a workflow.

A service can be defined using the services view in the views workspace of the services source (source nr -1013). Its actions can then be defined using the service actions view, its inputs using the service inputs view, and its outputs using the service outputs view. (These views are in the same workspace.)
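The service and action naming rules can be expressed as a simple check. The sketch below is hypothetical: the `ServiceDef` and `ActionDef` classes and the `check_names` function only model the fields described here, not the actual views or their storage. Action names may repeat across different services, but not within one.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActionDef:
    name: str
    description: str

@dataclass
class ServiceDef:
    name: str
    description: str
    local_run_sequence: Optional[str] = None  # command that runs the service locally
    actions: list[ActionDef] = field(default_factory=list)

def duplicates(names):
    """Return the names that occur more than once, in sorted order."""
    return sorted(name for name, count in Counter(names).items() if count > 1)

def check_names(services):
    """Report service names repeated in the system and action names repeated in a service."""
    problems = []
    for name in duplicates(s.name for s in services):
        problems.append(f"service name '{name}' is not unique in the system")
    for service in services:
        for name in duplicates(a.name for a in service.actions):
            problems.append(f"action name '{name}' is repeated in service '{service.name}'")
    return problems
```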

Workflow Definitions

Each workflow has a name, a list of work items, a list of inputs, and a list of outputs. Each work item is an action of a service. A work item has an estimated duration in seconds, which is used for progress reporting, and can be set to run locally. Workflows, with their names, inputs, and outputs, can be defined using the workflows view, and a workflow's work items can be defined using the workflow work items view. These views are in the views workspace of the services source.
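The text does not say how the estimated durations feed into progress reporting. One plausible scheme weights each work item by its estimate, as in this sketch. `WorkItem` and `progress_fraction` are hypothetical names, and the proportional weighting is an assumption.

```python
from dataclasses import dataclass

@dataclass
class WorkItem:
    service: str
    action: str
    estimated_seconds: float
    run_locally: bool = False

def progress_fraction(items, completed_count, elapsed_in_current=0.0):
    """Estimate overall progress (0.0 to 1.0) after `completed_count` items have finished.

    Assumption: progress is weighted by each work item's estimated duration.
    """
    total = sum(item.estimated_seconds for item in items)
    if total == 0:
        return 1.0 if completed_count >= len(items) else 0.0
    done = sum(item.estimated_seconds for item in items[:completed_count])
    if completed_count < len(items):
        # Credit time spent in the running item, capped at its estimate.
        done += min(elapsed_in_current, items[completed_count].estimated_seconds)
    return done / total

items = [WorkItem("Parse HTML", "parse file", 10),
         WorkItem("Parse HTML", "remove parse", 30)]
```

With these two items, finishing the first reports 25% progress. Another 15 seconds into the second raises that to 62.5%.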

Service Implementation

A service should be implemented as a Python program that contains, for each action of the service, a class that is a subclass of the ActionProcessor class of the sodapy services module. The program's main section should create the Soda context from the run arguments, define the service as an instance of the Service class of the sodapy services module, and invoke its perform_actions method. The Service instance is given a dictionary that maps each action name to an instance of the corresponding action class. For example:

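import sys

# Assumed import locations: logs, contexts, and services are taken to be
# sodapy modules; adjust to match the installed package.
from sodapy import contexts, logs, services

if __name__ == '__main__':
    # FileParser and ParseRemover are the ActionProcessor subclasses
    # defined earlier in the same program.
    logger = logs.PrintLogger()

    # Run arguments: virtual data lake URL, credential, and key.
    vdl_url = sys.argv[1]
    credential = sys.argv[2]
    key = sys.argv[3]

    # Create the Soda context from the run arguments. The variable is named
    # soda_contexts so that it does not shadow the contexts module.
    soda_contexts = contexts.SodaContexts(
            False, vdl_url, credential, key)
    context = soda_contexts.get_default_context()

    # Map each action name to an instance of its ActionProcessor subclass.
    parse_service = services.Service(
            context,
            logger,
            'Parse HTML',
            {
                'parse file': FileParser(),
                'remove parse': ParseRemover()
            })
    parse_service.perform_actions()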

Workflow Processing

A processing is one performance of a workflow's actions for a given set of workflow inputs. It produces a set of workflow outputs.
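The data flow of a processing can be pictured as a growing table of named values. Workflow inputs seed the table, each action reads its inputs from it and adds its outputs, and the workflow outputs are read from it at the end. The sketch below is an illustration under that assumption, not the sodapy workflows module. `run_processing` and the lambda-based actions are hypothetical.

```python
def run_processing(steps, workflow_inputs, output_names):
    """Perform each step in order and return the requested workflow outputs.

    steps: list of (action, input_names, result_names) triples, where action
    is a callable that takes the named inputs positionally and returns a tuple
    with one value per name in result_names.
    """
    values = dict(workflow_inputs)  # name -> value; grows as actions run
    for action, input_names, result_names in steps:
        results = action(*(values[name] for name in input_names))
        values.update(zip(result_names, results))
    return {name: values[name] for name in output_names}

steps = [
    (lambda text: (text.split(),), ["text"], ["words"]),
    (lambda words: (len(words),), ["words"], ["count"]),
]
result = run_processing(steps, {"text": "a b c"}, ["count"])
```

Here `result` is `{'count': 3}`. The intermediate value "words" is produced by the first action and consumed by the second, but it is not returned because it is not a workflow output.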

The sodapy workflows module has classes and methods that support executing workflow processings. The sodapy cp_processors module has classes and methods that support using workflows with the CherryPy web framework.