Python package reference

SCALE-MS - Scalable Adaptive Large Ensembles of Molecular Simulations.

This package provides Python driven data flow scripting and graph execution for molecular science computational research protocols.

Refer to https://scale-ms.readthedocs.io/ for package documentation.

Refer to https://github.com/SCALE-MS/scale-ms/wiki for development documentation.

Invocation:

ScaleMS scripts describe a workflow. To run the workflow, you must tell ScaleMS how to dispatch the work for execution.

For most use cases, you can run the script in the context of a particular execution scheme by using the -m Python command line flag to specify a ScaleMS execution module:

# Execute with the default local execution manager.
python -m scalems.local myworkflow.py
# Execute with the RADICAL Pilot based execution manager.
python -m scalems.radical myworkflow.py

Execution managers can be configured and used from within the workflow script, but the user has extra responsibility to properly shut down the execution manager, and the resulting workflow may be less portable. For details, refer to the documentation for particular WorkflowContexts.

Object model

When the scalems package is imported, a default Context is instantiated to manage the API session. The Python scripting interface allows a directed acyclic graph of Resources and resource dependencies to be declared for management by the context. Resources may be static or stateful. Resources have type and shape. Resource type may be a fundamental data type or a nested and structured type. An Operation is a stateful Resource whose type maps to a scalems compatible Function.
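The concepts above (typed, shaped Resources linked by dependency edges into a directed acyclic graph) can be sketched in plain Python. The class names below are illustrative stand-ins, not the scalems API:

```python
# Illustrative sketch only: these classes model the concepts described
# above (typed, shaped Resources forming a DAG); they are NOT scalems.
from dataclasses import dataclass, field
from typing import Any, List, Tuple

@dataclass
class Resource:
    dtype: type                 # fundamental or structured type
    shape: Tuple[int, ...]      # every Resource has a shape
    data: Any = None            # concrete data, if the Resource is static
    inputs: List["Resource"] = field(default_factory=list)  # DAG edges

# A static resource, and a stateful resource that depends on it:
x = Resource(dtype=float, shape=(3,), data=[1.0, 2.0, 3.0])
y = Resource(dtype=float, shape=(3,), inputs=[x])

assert y.inputs[0] is x         # dependency edge in the graph
```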

Interfaces

scalems Resource references are proxies to resources managed by the framework.

A Resource reference may be used as input to a scalems compatible function.

A Resource provides a Future interface if the Resource represents an immutable data event that can be converted to concrete data in the client context. Future.result() forces the framework to resolve any pending dependencies and blocks until a local object can be provided to the caller.
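The blocking semantics of Future.result() mirror the standard concurrent.futures interface. As an analogy (not the scalems API):

```python
# Analogy only: scalems Futures are managed by the framework, but the
# blocking result() semantics described above mirror concurrent.futures.
from concurrent.futures import ThreadPoolExecutor

def produce() -> int:
    return 42

with ThreadPoolExecutor() as pool:
    future = pool.submit(produce)   # a reference to a pending result
    value = future.result()         # blocks until concrete data is available

assert value == 42
```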

When a client uses a function to add work to a workflow, the function returns a reference to an Operation.

An Operation reference has (read-only) attributes for the named resources it provides. These resources may be nested.

Operations provide a run() method to force execution at the point of call. run() is an alias for Resource.result().

Generated Resources

Operations depend on input Resources provided to the Function. If native Python data is provided to the Function as input, the Function also generates static Resources to be placed on the graph. If the Operation is being created in a different context than that of a resource dependency, the contexts are responsible for fulfilling the dependency. The mechanism (subscription, transfer of ownership, etc.) is a detail of the Context collaboration.

Execution Modules

Every SCALE-MS object reference belongs to a workflow managed by a WorkflowManager. Workflows may be executed through different means and with different resources through distinct modules. Different middleware implementations may be accessed directly, but we recommend selecting a management module when invoking Python from the command line with the -m option:

python3 -m scalems.local myscript.py

scalems.local

Workflow subpackage for local ScaleMS execution.

Execute subprocesses and functions in terms of the built-in asyncio framework. Deferred execution is supported by default, and close integration with asyncio allows concurrency constructs to behave sensibly.
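As a sketch of the underlying pattern (not the scalems API), running a subprocess on the built-in asyncio event loop looks like:

```python
# Sketch of the mechanism scalems.local builds on: subprocesses driven by
# the built-in asyncio event loop. This is plain asyncio, not scalems.
import asyncio
import sys

async def run_echo():
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('hello')",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()   # await completion, gather output
    return proc.returncode, out.strip()

returncode, out = asyncio.run(run_echo())
assert returncode == 0 and out == b"hello"
```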

Example

python3 -m scalems.local my_workflow.py

scalems.radical

Workflow subpackage for ScaleMS execution dispatching to RADICAL Pilot.

Example

python -m scalems.radical myworkflow.py

Manage workflow context for RADICAL Pilot.

Dispatching through RADICAL Pilot is still evolving, and this module may provide multiple disparate concepts.

Workflow Manager:
RPWorkflowContext provides a SCALE-MS workflow context and coordinates resources for a RADICAL Pilot Session.
Executor:

The RP dispatcher and executor are currently combined, and provided only as the implementation of the context.dispatch member function.

When “entered” (i.e., used in a with statement), the Python Context Manager protocol manages the lifetime of a radical.pilot.Session. Two significant areas of future development include Context chaining and improved support for multiple rp.Sessions through multiple RPContextManager instances.
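The lifetime management described above follows the standard Python Context Manager protocol. A minimal sketch with a stand-in session class (DummySession and SessionManager are hypothetical, not the rp API):

```python
# Sketch only: the Context Manager protocol guarantees that a session-like
# object opened on entry is closed on exit, even if an error occurs.
class DummySession:
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class SessionManager:
    def __enter__(self):
        self.session = DummySession()   # stand-in for radical.pilot.Session()
        return self.session
    def __exit__(self, exc_type, exc, tb):
        self.session.close()            # lifetime ends with the block
        return False

with SessionManager() as session:
    assert not session.closed           # alive inside the block
assert session.closed                   # closed on exit
```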

Basic functions

Core Function implementations provided with the SCALE-MS package.

scalems.executable(*args, context=None, **kwargs)

Execute a command line program.

Configure an executable to run in one (or more) subprocess(es). Executes when run in an execution Context, as part of a work graph. The process environment and execution mechanism depend on the execution environment, but are likely similar to (or implemented in terms of) the POSIX execvp() function.

Shell processing of argv is disabled to improve determinism. This means that shell expansions such as environment variables, globbing (*), and other special symbols (like ~ for home directory) are not available. This allows a simpler and more robust implementation, as well as a better ability to uniquely identify the effects of a command line operation. If you think this disallows important use cases, please let us know.
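The effect of disabling shell processing can be demonstrated with the standard library: argv elements reach the program verbatim, so globs and tildes are not expanded.

```python
# Demonstration with the standard library: without a shell, argv elements
# are passed to the program verbatim, so '*' and '~' arrive as literals.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "~/*.txt"],
    capture_output=True, text=True, check=True,
)
# No tilde expansion, no globbing: the argument is received unchanged.
assert result.stdout.strip() == "~/*.txt"
```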

Parameters: argv – a tuple (or list) to be the subprocess arguments, including the executable.

argv is required. Additional keyword arguments are optional.

Other Parameters:
 
  • outputs (Mapping) – labeled output files, mapping command line flag to one (or more) filenames.
  • inputs (Mapping) – labeled input files, mapping command line flag to one (or more) filenames.
  • environment (Mapping) – environment variables to be set in the process environment.
  • stdin (str) – source for posix style standard input file handle (default None).
  • stdout (str) – Capture standard out to a filesystem artifact, even if it is not consumed in the workflow.
  • stderr (str) – Capture standard error to a filesystem artifact, even if it is not consumed in the workflow.
  • resources (Mapping) – Name additional required resources, such as an MPI environment.

Program arguments are added to the command line with standard Python iteration, so you should use a tuple or list even if you have only one parameter. If you provide a bare string such as arguments="asdf", it will be iterated character by character and passed as ... "a" "s" "d" "f". To pass a single string argument, use arguments=("asdf",) or arguments=["asdf"] (note that ("asdf") without a trailing comma is just a string, not a tuple).
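The iteration behavior is easy to verify in plain Python:

```python
# The pitfall above, in plain Python: iterating a string yields characters,
# while ("asdf") is just a parenthesized string, not a tuple.
chars = list("asdf")        # what iteration over a bare string produces
single = list(("asdf",))    # a trailing comma makes a one-element tuple
assert chars == ["a", "s", "d", "f"]
assert single == ["asdf"]   # a single argument, as intended
```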

inputs and outputs should be a dictionary with string keys, where the keys name command line “flags” or options.

Note that the Execution Context (e.g. RPContext, LocalContext, DockerContext) determines the handling of resources. Typical values in resources may include

  • procs_per_task (int): Number of processes to spawn for an instance of the exec.
  • threads_per_proc (int): Number of threads to allocate for each process.
  • gpus_per_task (int): Number of GPU devices to allocate for an instance of the exec.
  • launcher (str): Task launch mechanism, such as mpiexec.
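For illustration, the keys above might be combined into a single mapping; the values here are hypothetical, and the execution Context decides how each key is honored:

```python
# Hypothetical resources mapping combining the keys listed above.
resources = {
    "procs_per_task": 4,     # e.g. MPI ranks per command instance
    "threads_per_proc": 2,   # e.g. OpenMP threads per process
    "gpus_per_task": 1,
    "launcher": "mpiexec",   # task launch mechanism
}
assert resources["procs_per_task"] * resources["threads_per_proc"] == 8
```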
Returns: Output collection containing exitcode, stdout, stderr, and file.

The file output has the same keys as the outputs key word argument.

Example

Execute a command named exe that takes a flagged option for input and output file names (stored in a local Python variable my_filename and as the string literal 'exe.out') and an origin flag that uses the next three arguments to define a vector.

>>> my_filename = "somefilename"
>>> command = scalems.executable(('exe', '--origin', 1.0, 2.0, 3.0),
...                              inputs={'--infile': scalems.file(my_filename)},
...                              outputs={'--outfile': scalems.file('exe.out')})
>>> assert hasattr(command, 'file')
>>> import os
>>> assert os.path.exists(command.file['--outfile'].result())
>>> assert hasattr(command, 'exitcode')

Dynamic functions

Dynamic functions generate operations during graph execution.

Data shaping functions

Establish and manipulate data flow topology.

Helpers

Tools for dynamically generating Functions.

Speculative functions

These functions are probably not explicitly necessary, or at least not appropriate for the high level interface.

scalems.broadcast()
scalems.concatenate(iterable: Iterable[T]) → T

Equivalent to reduce(extend_sequence, iterable)
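Since extend_sequence is not defined in this document, the reduction can be sketched with an assumed list-append implementation:

```python
# Sketch of the reduction described above. `extend_sequence` is assumed
# to append all elements of the second sequence to the first.
from functools import reduce

def extend_sequence(seq, more):
    return list(seq) + list(more)

trajectories = [[1, 2], [3], [4, 5]]
combined = reduce(extend_sequence, trajectories)
assert combined == [1, 2, 3, 4, 5]
```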

scalems.partial()

Provide an alternative to functools.partial() that plays well with SCALE-MS checkpointing and dispatched execution.
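For reference, the functools.partial() behavior being replaced binds some arguments now and accepts the rest at call time:

```python
# The standard-library baseline that scalems.partial() is described as
# replacing: bind some arguments now, supply the rest later.
from functools import partial

def scale(factor, value):
    return factor * value

double = partial(scale, 2)   # bind factor=2
assert double(21) == 42
```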

Base classes

class scalems.Subgraph

Base class with which to define Functions in terms of sub-graphs.

Proposed alternative to the subgraph-builder context manager provided by subgraph().

Example:

# Create a subgraph Function with several Variables.
#
# * *simulation* names an input/output Variable.
# * *conformation* names an output Variable.
# * *P* names an internal state and output Variable.
# * *is_converged* names an output Variable.
#
class MyFusedOperation(Subgraph):
    # The Subgraph metaclass applies special handling to these class variables
    # because of their type.
    simulation = Subgraph.InputOutputVariable(simulate)
    conformation = Subgraph.OutputVariable(default=simulation.conformation)
    P = Subgraph.OutputVariable(default=scalems.float(0., shape=(N, N)))
    is_converged = Subgraph.OutputVariable(default=False)

    # Update the simulation input at the beginning of an iteration.
    simulation.update(modify_input(input=simulation, conformation=conformation))

    # The Subgraph metaclass will hide these variables from clients.
    md = simulate(input=simulation)
    allframes = scalems.concatenate(md.trajectory)
    adaptive_msm = analysis.msm_analyzer(allframes, P)

    # Update Variables at the end of an iteration.
    simulation.update(md)
    P.update(adaptive_msm.transition_matrix)
    conformation.update(adaptive_msm.conformation)
    is_converged.update(adaptive_msm.is_converged)

    # That's all. The class body defined here is passed to the Subgraph
    # metaclass to generate the actual class definition, which will be
    # a SCALE-MS compatible Function that supports a (hidden) iteration
    # protocol, accessible with the `while_loop` dynamic Function.

loop = scalems.while_loop(function=MyFusedOperation,
                          condition=scalems.logical_not(MyFusedOperation.is_converged),
                          simulation=initial_input)
loop.run()

Logging

Python logging facilities use the built-in logging module.

Upon import, the scalems package sets a placeholder “NullHandler” to block propagation of log messages to the root logger (and sys.stderr, if not handled).

If you want to see logging output on sys.stderr, attach a logging.StreamHandler to the ‘scalems’ logger.

Example:

import logging

character_stream = logging.StreamHandler()
# Optional: Set log level.
logging.getLogger('scalems').setLevel(logging.DEBUG)
character_stream.setLevel(logging.DEBUG)
# Optional: create formatter and add to character stream handler
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
character_stream.setFormatter(formatter)
# add handler to logger
logging.getLogger('scalems').addHandler(character_stream)

To handle log messages that are issued while importing scalems and its submodules, attach the handler before importing scalems. Note that if scalems.radical will be used, you should import radical.pilot before importing logging to avoid spurious warnings.

Refer to submodule documentation for hierarchical loggers to allow granular control of log handling (e.g. logging.getLogger('scalems.radical')). Refer to the Python logging module for information on connecting to and handling logger output.

Exceptions

Exceptions thrown by SCALE-MS are catchable as scalems.exceptions.ScaleMSError.

Additional common exceptions are defined in this module. scalems submodules may define additional exceptions, but all will be derived from exceptions specified in scalems.exceptions.
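The catch-all pattern can be sketched with a stand-in hierarchy; the real classes live in scalems.exceptions, and these definitions are illustrative only:

```python
# Stand-in hierarchy illustrating the catching pattern described above.
class ScaleMSError(Exception):
    """Base exception for scalems package errors."""

class DispatchError(ScaleMSError):
    """Unable to execute work in the requested environment."""

try:
    raise DispatchError("no execution backend available")
except ScaleMSError as e:       # the base class catches all package errors
    caught = str(e)

assert caught == "no execution backend available"
```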

exception scalems.exceptions.APIError

Specified interfaces are being violated.

exception scalems.exceptions.DispatchError

SCALE-MS is unable to execute work or manage data in the requested environment.

exception scalems.exceptions.DuplicateKeyError

An identifier is being reused in a situation where this is not supported.

exception scalems.exceptions.InternalError

An otherwise unclassifiable error has occurred (a bug).

Please report the bug at https://github.com/SCALE-MS/scale-ms/issues

exception scalems.exceptions.MissingImplementationError

The expected feature is not available.

This indicates a bug or incomplete implementation. If the error message does not cite an existing tracked issue, please file a bug report at https://github.com/SCALE-MS/scale-ms/issues.

exception scalems.exceptions.ProtocolError

A behavioral protocol has not been followed correctly.

exception scalems.exceptions.ScaleMSError

Base exception for scalems package errors.

Users should be able to use this base class to catch errors emitted by SCALE-MS.

exception scalems.exceptions.ScopeError

A command or reference is not valid in the current scope or Context.