User Guide

Getting Started

The scalems package requires a supported Python 3 installation and assumes a Linux environment. For remote execution, SCALE-MS uses RADICAL Pilot, which has additional requirements.

See Installation and configuration.

For the basics of running SCALE-MS scripts, see SCALE-MS invocation.

Idioms

Deferred execution

SCALE-MS allows the specific calculations in a workflow to be expressed independently of their execution. Commands return handles to future results, so chains of commands and the data flow between them can be described before anything is dispatched for execution.

This programming model is consistent with modern concurrency idioms, with an additional proxy layer that allows multiple tasks to be configured before any of them are launched. Compared with the standard Python concurrency modules, functionality that asyncio offers only inside an async def function is available directly in the scripting interface: ad hoc coroutine definitions are replaced with objects (operation instances).
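The deferral idea can be sketched in plain Python. This is a minimal, hypothetical illustration, not the scalems API: the Operation class, its result() method, and the example functions are invented here. Each Operation acts as a handle to a future result; building a chain only records inputs (which may be other handles), and nothing runs until result() is requested, at which point upstream handles are resolved first. The sketch runs synchronously in one process, whereas SCALE-MS dispatches work through its own execution layer.

```python
from typing import Any, Callable, List


class Operation:
    """Hypothetical stand-in for an operation instance: a handle to a future result."""

    def __init__(self, func: Callable[..., Any], *inputs: Any) -> None:
        self.func = func
        self.inputs = inputs  # May include other Operation handles.
        self._done = False
        self._value: Any = None

    def result(self) -> Any:
        # Resolve upstream handles first, then run this operation exactly once.
        if not self._done:
            args = [i.result() if isinstance(i, Operation) else i for i in self.inputs]
            self._value = self.func(*args)
            self._done = True
        return self._value


log: List[str] = []


def increment(x: int) -> int:
    log.append("increment")
    return x + 1


def scale(x: int) -> int:
    log.append("scale")
    return x * 10


def add(x: int, y: int) -> int:
    log.append("add")
    return x + y


# Describe the whole chain; no function has been called yet.
a = Operation(increment, 1)
b = Operation(scale, a)
c = Operation(add, a, b)
pending = len(log)   # 0: the graph is only described at this point.

value = c.result()   # Dispatch: runs increment, scale, add (a is computed once).
```

Because a is cached after its first resolution, both b and c reuse its result instead of recomputing it.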

Parallel data flow

Generally, single instructions can be applied to multiple data without special syntax. An array of input streams implies an array of output streams. All SCALE-MS objects have “shape” as part of their typing information, and parallel streams of data may be represented by a single reference of higher dimensionality. Function inputs have specified typing, which allows the multiplicity of a command to be inferred from its input.
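A rough sketch of how multiplicity can be inferred from input shape, using nested Python lists as a stand-in for SCALE-MS typed data. The apply function and its input_ndim parameter are hypothetical: the function declares how many dimensions it expects, and any extra outer dimensions are treated as parallel streams, with order preserved so that outputs stay consistently indexed against inputs.

```python
from typing import Any, Callable


def ndim(value: Any) -> int:
    """Count list nesting depth, standing in for the 'shape' of SCALE-MS data."""
    depth = 0
    while isinstance(value, list):
        if not value:
            return depth + 1
        value = value[0]
        depth += 1
    return depth


def apply(func: Callable[[Any], Any], data: Any, input_ndim: int = 0) -> Any:
    """Hypothetical helper: infer how many times to run func from the input's shape."""
    extra = ndim(data) - input_ndim
    if extra < 0:
        raise TypeError("input has fewer dimensions than the function expects")
    if extra == 0:
        return func(data)
    # One output per element of the outer dimension, in the same order.
    return [apply(func, element, input_ndim) for element in data]


single = apply(lambda x: x * 2, 3)                    # 6
batch = apply(lambda x: x * 2, [1, 2, 3])             # [2, 4, 6]
rows = apply(sum, [[1, 2], [3, 4]], input_ndim=1)     # [3, 7]
```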

By default, sequencing is preserved in outer dimensions. In other words, replicated pipelines can be indexed consistently: element i of an output corresponds to element i of the input that produced it.

Sometimes, bundles of data should be processed asynchronously and the unique identity of the data source is less important. In such use cases, the sequenced outer dimension can be explicitly converted to an asynchronous iterable.

Generally, commands that consume sequenced input produce sequenced output, while commands provided with unsequenced / unordered / asynchronous input produce unordered output.
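The difference between sequenced and unordered processing can be illustrated with the standard asyncio module. This is an analogy, not SCALE-MS code; simulate, sequenced, as_arrived, and unordered are hypothetical names. asyncio.gather returns results in the order of its arguments, like a sequenced outer dimension, while the async generator as_arrived converts the same batch into an asynchronous iterable that yields each result as soon as it completes.

```python
import asyncio
import random
from typing import AsyncIterator, Awaitable, Iterable, List


async def simulate(index: int) -> int:
    # Stand-in for a task whose run time varies from member to member.
    await asyncio.sleep(random.random() / 100)
    return index * index


async def sequenced(n: int) -> List[int]:
    # gather preserves argument order regardless of completion order.
    return list(await asyncio.gather(*(simulate(i) for i in range(n))))


async def as_arrived(awaitables: Iterable[Awaitable[int]]) -> AsyncIterator[int]:
    # Convert a sequenced batch into an asynchronous iterable in completion order.
    for next_done in asyncio.as_completed(awaitables):
        yield await next_done


async def unordered(n: int) -> List[int]:
    return [value async for value in as_arrived([simulate(i) for i in range(n)])]


ordered = asyncio.run(sequenced(5))   # Always [0, 1, 4, 9, 16].
arrival = asyncio.run(unordered(5))   # Same values, order depends on timing.
```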

Iteration

Iteration in SCALE-MS takes a few different forms, so it helps first to distinguish between iterable objects and iterable coroutines.

As noted above, SCALE-MS data has shape. As with numpy, it is helpful to think in terms of "vectorized" operations instead of explicitly looping over elements. Most for-loop or foreach use cases are handled implicitly by applying a function to iterable inputs. The functional-style scalems.map can be used to apply a function to the elements of an iterable. This can be necessary when the operation instance must be generated dynamically, such as when the shape of the data is not known until run time. It can also be useful for converting non-SCALE-MS functions or data into workflow objects (that is, to explicitly defer execution of functions implemented outside of the data flow API).
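The dynamic case can be sketched with plain callables. The sketch below does not use scalems.map; deferred_map and discover_inputs are hypothetical names chosen for illustration. deferred_map returns a thunk, so building the plan runs nothing, and the number of calls is decided only when the thunk runs and the inputs are discovered, much as an operation instance may only be generated once the shape of the data is known. Wrapping an ordinary Python function this way also defers its execution.

```python
from functools import partial
from typing import Any, Callable, Iterable, List


def deferred_map(func: Callable[[Any], Any],
                 make_inputs: Callable[[], Iterable[Any]]) -> Callable[[], List[Any]]:
    """Hypothetical helper: plan a map whose length is only known when it runs."""
    def run() -> List[Any]:
        # The inputs, and therefore the number of tasks, are discovered here.
        tasks = [partial(func, item) for item in make_inputs()]
        return [task() for task in tasks]
    return run


calls: List[int] = []


def square(n: int) -> int:
    calls.append(n)  # Record each real call to show when work happens.
    return n * n


def discover_inputs() -> List[int]:
    # Stand-in for data whose length is not known until run time.
    return [n for n in range(10) if n % 3 == 0]


plan = deferred_map(square, discover_inputs)
calls_before_run = list(calls)  # []: building the plan ran nothing.
result = plan()                 # [0, 9, 36, 81]
```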

Of course, some iteration is not vectorizable: logic may be explicitly stateful, or commands may hide internal data graph management. The main looping construct in SCALE-MS is therefore scalems.while_loop. The condition of the while loop is evaluated before each application of the function.
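A minimal pure-Python sketch of the loop semantics described above. The name while_loop and its (condition, function, initial) signature are assumptions for illustration, not the documented signature of scalems.while_loop. Because the condition is checked before each application, the function may run zero times.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def while_loop(condition: Callable[[T], bool],
               function: Callable[[T], T],
               initial: T) -> T:
    """Hypothetical sketch: apply function while condition holds for the current state."""
    state = initial
    # The condition is evaluated before every application, including the first.
    while condition(state):
        state = function(state)
    return state


doubled = while_loop(lambda x: x < 100, lambda x: x * 2, 3)   # 192
untouched = while_loop(lambda x: x < 0, lambda x: x * 2, 5)   # 5: never applied
```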

Dynamic functions

Simple SCALE-MS commands add operation instances to the work graph.

scalems.map()

while_loop

conditional

Python interface

The data flow scripting interface is provided by the scalems Python package.