User Guide
Invocation
Users define and execute SCALE-MS workflows by using Python to define work and
submit it for execution through a SCALE-MS workflow manager.
The SCALE-MS machinery is accessible through the scalems Python module.
For the greatest flexibility in execution, scripts should be written without explicit reference to the execution environment. Instead, a SCALE-MS workflow manager module can be specified on the command line to bootstrap an entry point.
Example:
python3 -m scalems.local myscript.py
The above example uses the workflow manager provided by the scalems.local module to process myscript.py. After the module performs some initialization, the script is essentially just imported. Then, specifically annotated callables (functions or function objects) are identified and submitted for execution. See scalems.app().
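The bootstrapping pattern described above can be sketched with the standard library alone. This is a toy illustration, not the scalems implementation: the app decorator, the _is_entry_point attribute, and run_script are hypothetical names invented for this sketch, and the real scalems.app() may behave differently.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path


def app(func):
    """Hypothetical marker decorator: flag a callable as a workflow entry point."""
    func._is_entry_point = True
    return func


def run_script(path):
    """Execute a script, then call every callable that was marked with `app`."""
    # The script runs much like an import; `app` is injected into its namespace.
    namespace = runpy.run_path(str(path), init_globals={"app": app})
    entry_points = [
        obj
        for obj in namespace.values()
        if callable(obj) and getattr(obj, "_is_entry_point", False)
    ]
    # A real workflow manager would submit these for execution; here we just call them.
    return [entry() for entry in entry_points]


# A stand-in for myscript.py: it makes no reference to the execution environment.
script = textwrap.dedent(
    '''
    @app
    def main():
        return "hello from main"

    def helper():
        return "not annotated, so never called by the bootstrapper"
    '''
)

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "myscript.py"
    script_path.write_text(script)
    results = run_script(script_path)
```

Only the annotated callable is picked up, which is why the script itself can stay independent of whichever workflow manager module is named on the command line.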
For examples of more direct access to the SCALE-MS workflow management machinery, the pytest scripts in tests/ will be instructive.
Idioms
Deferred execution
SCALE-MS allows the specific calculations in a workflow to be expressed independently of its execution. Commands return handles to future results, allowing chains of commands and data flow to be described before dispatching for execution.
This programming model is consistent with modern concurrency idioms, with an additional proxy layer that allows multiple tasks to be configured before any are launched. In contrast to the standard Python concurrency modules, where some asyncio functionality is only available within an async def function, SCALE-MS makes that functionality available directly to the scripting interface, replacing ad hoc coroutine definitions with objects (operation instances).
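The idea of describing a chain of commands before anything runs can be illustrated with a small stand-alone sketch. The Deferred class below is a hypothetical stand-in written for this example; it is not a SCALE-MS type and says nothing about how scalems implements its handles.

```python
class Deferred:
    """Toy handle to a result that is not computed until `result()` is called."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self._done = False
        self._value = None

    def result(self):
        if not self._done:
            # Resolve upstream handles first, then run this step exactly once.
            resolved = [a.result() if isinstance(a, Deferred) else a for a in self.args]
            self._value = self.func(*resolved)
            self._done = True
        return self._value


calls = []  # records which steps have actually executed


def add(a, b):
    calls.append("add")
    return a + b


def double(x):
    calls.append("double")
    return 2 * x


# Describe the data flow: `doubled` depends on the future result of `total`.
total = Deferred(add, 1, 2)
doubled = Deferred(double, total)
calls_before_dispatch = list(calls)  # nothing has executed yet

# Dispatch: requesting the final result triggers the whole chain.
value = doubled.result()
```

Passing one handle as the input of another is what lets the whole graph be configured before any task is launched.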
Parallel data flow
Generally, single instructions can be applied to multiple data without special syntax. An array of input streams implies an array of output streams. All SCALE-MS objects have “shape” as part of their typing information, and parallel streams of data may be represented by a single reference of higher dimensionality. Function inputs have specified typing, which allows the multiplicity of a command to be inferred from its input.
By default, sequencing is preserved in outer dimensions. In other words, replicated pipelines can be consistently indexed.
Sometimes, bundles of data should be processed asynchronously and the unique identity of the data source is less important. In such use cases, the sequenced outer dimension can be explicitly converted to an asynchronous iterable.
Generally, commands that consume sequenced input produce sequenced output, while commands provided with unsequenced / unordered / asynchronous input produce unordered output.
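The difference between sequenced and unordered results has a close analogue in the standard library's concurrent.futures module, shown here as a rough comparison. This uses no SCALE-MS calls; the square function and the input list are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    return x * x


inputs = [3, 1, 2]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Sequenced: map() yields results in input order, so index i of the
    # output always corresponds to index i of the input.
    ordered = list(pool.map(square, inputs))

    # Unordered: as_completed() yields futures in completion order, so the
    # identity of the data source is not preserved by position.
    futures = [pool.submit(square, x) for x in inputs]
    unordered = [future.result() for future in as_completed(futures)]
```

The ordered list can be indexed consistently against the inputs, while the unordered list only guarantees that every result eventually appears.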
Iteration
Iteration in SCALE-MS takes a few different forms, and we should first clarify a distinction between iterable objects and iterable coroutines.
As noted above, SCALE-MS data has shape. As with numpy, it is helpful to think
in terms of “vectorized” operations instead of explicitly looping over elements.
Most for or foreach use cases are handled implicitly by applying a function to iterable inputs.
The functional-style scalems.map can be used to apply a function to the elements of an iterable.
This can be necessary when the operation instance needs to be generated
dynamically, such as when the shape of data is not known until run time.
It can also be useful to convert non-SCALE-MS functions or data into workflow
objects (to explicitly defer execution of functions implemented outside of the
data flow API).
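As a rough picture of a deferred, functional-style map, the sketch below wraps an ordinary function so that one pending call is created per element and nothing runs until requested. The name toy_map is hypothetical and its signature is an assumption made for this example; it is not the signature of scalems.map.

```python
from functools import partial


def toy_map(function, iterable):
    """Return one zero-argument pending call per element; nothing runs yet."""
    return [partial(function, item) for item in iterable]


def word_length(word):
    # An ordinary function that knows nothing about workflows.
    return len(word)


# The number of pending calls follows from the input, which may only be
# known at run time.
pending = toy_map(word_length, ["alpha", "be", "gamma"])

# Execution is deferred until each pending call is invoked.
lengths = [call() for call in pending]
```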
Of course, some iteration is not vectorizable. Logic may be explicitly stateful, or commands may hide internal data graph management. The main looping construct in SCALE-MS, then, is scalems.while_loop. The condition of the while loop is evaluated before each application of the function.
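The evaluation order described above, with the condition checked before each application of the function, can be sketched in plain Python. The toy_while_loop helper and its argument names are hypothetical and chosen for this example; the actual scalems.while_loop signature may differ.

```python
def toy_while_loop(function, condition, state):
    """Apply `function` repeatedly while `condition(state)` holds."""
    # The condition is evaluated before every application, including the first.
    while condition(state):
        state = function(state)
    return state


def halve(x):
    return x // 2


def above_ten(x):
    return x > 10


# 100 -> 50 -> 25 -> 12 -> 6, then the condition fails and the loop stops.
final = toy_while_loop(halve, above_ten, 100)

# The condition is already false, so the function is never applied.
untouched = toy_while_loop(halve, above_ten, 4)
```

Because the check comes first, a loop whose condition is initially false leaves its input unchanged.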
Dynamic functions
Simple SCALE-MS commands add operation instances to the work graph
scalems.map()
while_loop
conditional