Python package reference

SCALE-MS - Scalable Adaptive Large Ensembles of Molecular Simulations.

This package provides Python-driven data flow scripting and graph execution for computational molecular science research protocols.

Documentation is published online at https://scale-ms.readthedocs.io/.

Refer to https://github.com/SCALE-MS/scale-ms/wiki for development documentation.

Invocation:

ScaleMS scripts describe a workflow. To run the workflow, you must tell ScaleMS how to dispatch the work for execution.

For most use cases, you can run the script in the context of a particular execution scheme by using Python's -m command line flag to specify a ScaleMS execution module:

# Execute with the default local execution manager.
python -m scalems.local myworkflow.py
# Execute with the RADICAL Pilot based execution manager.
python -m scalems.radical myworkflow.py

Execution managers can be configured and used from within the workflow script, but the user has extra responsibility to properly shut down the execution manager, and the resulting workflow may be less portable. For details, refer to the documentation for particular WorkflowContexts.

Object model

When the scalems package is imported, a default Context is instantiated to manage the API session. The Python scripting interface allows a directed acyclic graph of Resources and resource dependencies to be declared for management by the context. Resources may be static or stateful. Resources have type and shape. Resource type may be a fundamental data type or a nested and structured type. An Operation is a stateful Resource whose type maps to a scalems compatible Function.
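The directed-acyclic-graph structure can be illustrated with plain Python. This is a conceptual sketch only, not the scalems API; the resource names are hypothetical:

```python
# Conceptual illustration (NOT the scalems API): a workflow is a directed
# acyclic graph in which edges are data dependencies between resources.
from graphlib import TopologicalSorter

# Hypothetical resource names; each maps to the set of resources it
# depends on.
dependencies = {
    "configure": set(),
    "simulate": {"configure"},  # 'simulate' consumes output of 'configure'
    "analyze": {"simulate"},    # 'analyze' consumes output of 'simulate'
}

# A valid execution order resolves every dependency before its dependents.
order = list(TopologicalSorter(dependencies).static_order())
# order == ['configure', 'simulate', 'analyze']
```

The framework's job is analogous: resolve dependencies among declared resources and schedule the corresponding operations for execution.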

Interfaces

scalems Resource references are proxies to resources managed by the framework.

A Resource reference may be used as input to a scalems compatible function.

A Resource provides a Future interface if the Resource represents an immutable data event that can be converted to concrete data in the client context. Future.result() forces the framework to resolve any pending dependencies and blocks until a local object can be provided to the caller.
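The blocking semantics of Future.result() mirror Python's standard concurrent.futures interface, which may help build intuition. This analogy is illustrative, not part of the scalems API:

```python
# scalems Futures behave analogously to concurrent.futures.Future:
# result() blocks until a concrete local value is available.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(lambda: 6 * 7)  # dependency not yet resolved
    value = future.result()              # blocks until the value is local

assert value == 42
```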

When a client uses a function to add work to a workflow, the function returns a reference to an Operation.

An Operation reference has (read-only) attributes for the named resources it provides. These resources may be nested.

Operations provide a run() method to force execution at the point of call. run() is an alias for Resource.result().

Execution Module

Every SCALE-MS object reference belongs to a workflow managed by a WorkflowManager. Workflows may be executed by different means and with different resources via distinct execution modules. Different middleware implementations may be accessed directly, but we recommend selecting a management module when invoking Python from the command line with the -m option.

See SCALE-MS invocation for usage information.

See scalems.invocation for more about Execution Modules.

Entry point

The entry point for a scalems workflow script is the function decorated with scalems.app

@scalems.app[source]

Annotate a callable for execution by SCALE-MS.

Parameters:
  • func (Callable) – the callable serving as the workflow entry point.

Return type:
  Callable
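Putting the entry point together with an execution module, a minimal workflow script might look like the following sketch. It assumes the scalems.app and scalems.executable interfaces documented on this page, and guards the import so the snippet is harmless where scalems is not installed:

```python
# Sketch of a workflow script. Save as myworkflow.py and launch with an
# execution module, e.g.:
#     python -m scalems.local myworkflow.py
import importlib.util

HAVE_SCALEMS = importlib.util.find_spec("scalems") is not None

if HAVE_SCALEMS:
    import scalems

    @scalems.app
    def main():
        # Declare work here; the chosen execution module dispatches it.
        cmd = scalems.executable(("/bin/echo", "hello"))
```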

Basic functions

Core Function implementations provided with the SCALE-MS package.

scalems.executable(*args, manager=None, **kwargs)[source]

Execute a command line program.

Configure an executable to run in one (or more) subprocess(es). Executes when run in an execution Context, as part of a work graph. The process environment and execution mechanism depend on the execution environment, but are likely similar to (or implemented in terms of) the POSIX execvp system call.

Shell processing of argv is disabled to improve determinism. This means that shell expansions such as environment variables, globbing (*), and other special symbols (like ~ for home directory) are not available. This allows a simpler and more robust implementation, as well as a better ability to uniquely identify the effects of a command line operation. If you think this disallows important use cases, please let us know.

Parameters:
  • manager (WorkflowManager) – Workflow manager to which the work should be submitted.

  • args – a tuple (or list) of subprocess arguments, including the executable.

args is required. Additional keyword arguments are optional.

Parameters:
  • outputs (Mapping) – labeled output files, mapping command line flag to one (or more) filenames.

  • inputs (Mapping) – labeled input files, mapping command line flag to one (or more) filenames.

  • environment (Mapping) – environment variables to be set in the process environment.

  • stdin (str) – source for the POSIX-style standard input file handle (default None).

  • stdout (str) – Capture standard out to a filesystem artifact, even if it is not consumed in the workflow.

  • stderr (str) – Capture standard error to a filesystem artifact, even if it is not consumed in the workflow.

  • resources (Mapping) – Name additional required resources, such as an MPI environment.

Program arguments are added to the command line by standard Python iteration, so you should use a tuple or list even if you have only one argument. If you provide a bare string such as "asdf", it will be iterated character by character and passed as ... "a" "s" "d" "f". Note that ("asdf") is just a parenthesized string, not a tuple. To pass a single string argument, use ("asdf",) or ["asdf"].
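The iteration pitfall is easy to verify in plain Python:

```python
# A bare string iterates character by character:
assert list("asdf") == ["a", "s", "d", "f"]

# Parentheses alone do not make a tuple; a trailing comma does:
assert ("asdf") == "asdf"
assert ("asdf",) == tuple(["asdf"])

# A one-element tuple or list iterates as a single argument:
assert list(("asdf",)) == ["asdf"]
```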

inputs and outputs should be a dictionary with string keys, where the keys name command line “flags” or options.

Note that the Execution Context (e.g. RPContext, LocalContext, DockerContext) determines the handling of resources. Typical values in resources may include

  • procs_per_task (int): Number of processes to spawn for an instance of the executable.

  • threads_per_proc (int): Number of threads to allocate for each process.

  • gpus_per_task (int): Number of GPU devices to allocate for an instance of the executable.

  • launcher (str): Task launch mechanism, such as mpiexec.

Returns:

Output collection contains exitcode, stdout, stderr, and file.

The file output has the same keys as the outputs keyword argument.

Example

Execute a command named exe that takes a flagged option for input and output file names (stored in a local Python variable my_filename and as the string literal 'exe.out') and an origin flag that uses the next three arguments to define a vector.

>>> my_filename = "somefilename"
>>> command = scalems.executable(
...    ('exe', '--origin', 1.0, 2.0, 3.0),
...    inputs={'--infile': scalems.file(my_filename)},
...    outputs={'--outfile': scalems.file('exe.out')})
>>> assert hasattr(command, 'file')
>>> import os
>>> assert os.path.exists(command.file['--outfile'].result())
>>> assert hasattr(command, 'exitcode')

TBD

Dynamic functions

TBD: Dynamic functions generate operations during graph execution.

Data shaping functions

TBD: Establish and manipulate data flow topology.

Helpers

TBD Tools for dynamically generating Functions.

Base classes

Logging

Python logging facilities use the built-in logging module.

Upon import, the scalems package sets a placeholder “NullHandler” to block propagation of log messages to the handler of last resort (and to sys.stderr).

If you want to see logging output on sys.stderr, attach a logging.StreamHandler to the ‘scalems’ logger.

Example:

import logging

character_stream = logging.StreamHandler()
# Optional: Set log level.
logging.getLogger('scalems').setLevel(logging.DEBUG)
character_stream.setLevel(logging.DEBUG)
# Optional: create formatter and add to character stream handler
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
character_stream.setFormatter(formatter)
# add handler to logger
logging.getLogger('scalems').addHandler(character_stream)

To handle log messages that are issued while importing scalems and its submodules, attach the handler before importing scalems. Note that if scalems.radical will be used, you should import radical.pilot before importing logging to avoid spurious warnings.

Refer to submodule documentation for hierarchical loggers to allow granular control of log handling (e.g. logging.getLogger('scalems.radical')). Refer to the Python logging module for information on connecting to and handling logger output.
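For example, the logger hierarchy allows a submodule to be made more (or less) verbose than the package as a whole. The level choices below are only illustrative:

```python
import logging

# Quiet the package overall...
logging.getLogger("scalems").setLevel(logging.WARNING)
# ...but keep detailed messages from one submodule.
logging.getLogger("scalems.radical").setLevel(logging.DEBUG)

# Loggers without an explicit level inherit from their parent.
assert logging.getLogger("scalems.local").getEffectiveLevel() == logging.WARNING
assert logging.getLogger("scalems.radical").getEffectiveLevel() == logging.DEBUG
```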

Exceptions

Exceptions thrown by SCALE-MS are catchable as scalems.exceptions.ScaleMSError.

Additional common exceptions are defined in this module. scalems submodules may define additional exceptions, but all will be derived from exceptions specified in scalems.exceptions.
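The practical consequence of a common base class is that a single except clause covers any scalems error. The sketch below uses stand-in classes to show the pattern; the real classes live in scalems.exceptions:

```python
# Stand-ins for the documented hierarchy; in real code, catch
# scalems.exceptions.ScaleMSError and its subclasses.
class ScaleMSError(Exception):
    """Stand-in base exception."""

class DispatchError(ScaleMSError):
    """Stand-in subclass."""

def submit_work():
    raise DispatchError("no execution backend available")

try:
    submit_work()
except ScaleMSError as e:
    caught = type(e).__name__

assert caught == "DispatchError"
```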

exception scalems.exceptions.APIError[source]

Specified interfaces are being violated.

exception scalems.exceptions.ContextError[source]

A Context operation could not be performed.

exception scalems.exceptions.DispatchError[source]

SCALE-MS is unable to execute work or manage data in the requested environment.

exception scalems.exceptions.DuplicateKeyError[source]

An identifier is being reused in a situation where this is not supported.

exception scalems.exceptions.InternalError[source]

An otherwise unclassifiable error has occurred (a bug).

Please report the bug at https://github.com/SCALE-MS/scale-ms/issues

exception scalems.exceptions.MissingImplementationError[source]

The expected feature is not available.

This indicates a bug or incomplete implementation. If the error message does not cite an existing tracked issue, please file a bug report at https://github.com/SCALE-MS/scale-ms/issues.

exception scalems.exceptions.ProtocolError[source]

A behavioral protocol has not been followed correctly.

exception scalems.exceptions.ProtocolWarning[source]

Unexpected behavior is detected that is not fatal, but which may indicate a bug.

exception scalems.exceptions.ScaleMSError[source]

Base exception for scalems package errors.

Users should be able to use this base class to catch errors emitted by SCALE-MS.

exception scalems.exceptions.ScaleMSWarning[source]

Base Warning for scalems package warnings.

Users and testers should be able to use this base class to filter warnings emitted by SCALE-MS.

exception scalems.exceptions.ScopeError[source]

A command or reference is not valid in the current scope or Context.

scalems.exceptions.deprecated(explanation)[source]

Mark a deprecated definition.

Wraps a callable to issue a DeprecationWarning when called.

Use as a parameterized decorator:

@deprecated("func is deprecated because...")
def func():
    ...

Parameters:

explanation (str) – message to include in the DeprecationWarning.
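Such a parameterized decorator can be implemented in a few lines with functools and warnings. The following is a sketch of the pattern, not the scalems source:

```python
import functools
import warnings

def deprecated(explanation: str):
    """Parameterized decorator: warn with `explanation` on each call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(explanation, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated("old_func is deprecated; use new_func instead.")
def old_func():
    return 1

# Demonstrate that the wrapped callable still works and warns when called.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_func()

assert result == 1
assert caught[0].category is DeprecationWarning
```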

class scalems.file.DataLocalizationError[source]

The requested file operation is not possible in the given context.

This may be because a referenced file exists in a different FileStore than the current default. Note that data localization is a potentially expensive operation and so must be explicitly requested by the user.
