SCALE-MS invocation

Users define and execute SCALE-MS workflows by using Python to define work and submit it for execution through a SCALE-MS workflow manager. The SCALE-MS machinery is accessible through the scalems Python module.

For the greatest flexibility in execution, scripts should be written without explicit reference to the execution environment. Instead, a SCALE-MS workflow manager Execution Modules can be specified on the command line to bootstrap an entry point.

python3 -m scalems.local myscript.py would use the workflow manager provided by the scalems.local module to process myscript.py. After the module performs some initialization, the script is essentially just imported. After that, though, specifically annotated callables (functions or function objects) are identified and submitted for execution. See scalems.app().

Command line execution

Use the --help command line option for an execution module for details about available and required command line arguments:

$ python -m scalems.radical --help
usage: python -m scalems.radical <scalems.radical args> script-to-run.py.py <script args>
...

The base command line parser is provided by scalems.invocation.base_parser(), extended (optionally) by the Execution Modules, and further extended by scalems.invocation.run(). Get usage for a particular backend with reference to the particular module.

python3 -m scalems.local –help

Unrecognized command line arguments will be passed along to the called script.

Documentation for built-in execution modules is shown below. Documentation also may be accessed from the command line with pydoc or from within the interpreter with help(). (E.g. pydoc scalems.radical)

Execution Modules

scalems provides the following built-in execution modules.

scalems.radical Python module

Workflow subpackage for ScaleMS execution dispatching to RADICAL Pilot.

Manage workflow context for RADICAL Pilot.

Command Line Invocation:

python -m scalems.radical \
    --resource local.localhost --venv $HOME/myvenv --access local myworkflow.py

For required and optional command line arguments:

python -m scalems.radical --help

or refer to the web-based documentation for scalems.radical.

The user is largely responsible for establishing appropriate RADICAL Cybertools (RCT) software environment at both the client side and the execution side. See RADICAL Pilot (RP) in the Installation and configuration for complete documentation.

See also

Refer to https://github.com/SCALE-MS/scale-ms/issues/141 for the status of scalems support for automatic execution environment bootstrapping.

scalems.radical

Command line interface for scalems.radical workflow execution module.

usage: python -m scalems.radical <scalems.radical args> script-to-run.py.py <script args>
script-to-run.py

The workflow script. Must contain a function decorated with scalems.app

-h, --help

show this help message and exit

--version

show program’s version number and exit

--log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG}

Optionally configure console logging to the indicated level.

--pycharm

Attempt to connect to PyCharm remote debugging system, where appropriate.

--access <access>

Explicitly specify the access_schema to use from the RADICAL resource.

--enable-raptor

Enable RP Raptor, and manage an execution side dispatching task.

--pilot-option <<key>=<value>>

Add a key value pair to the radical.pilot.PilotDescription.

--resource <resource>

Specify a Resource for the radical.pilot.PilotDescription. (Required)

--venv <path>

Path to the (pre-configured) Python virtual environment with which RP tasks should be executed. (Required. See also https://github.com/SCALE-MS/scale-ms/issues/90)

More notes on Python virtual environments

Pilot environment

The RADICAL Pilot remote software components launch from a Python virtual environment determined by parameters in the Resource definition.

By default, RADICAL Pilot resources are configured to bootstrap the target environment by creating a fresh virtual environment. (virtenv_mode=create and rp_version=local in most resource definitions.) virtenv_mode=update is a better choice than create, so that later sessions can re-use a previously bootstrapped pilot venv.

Static Pilot venv

To minimize the amount of bootstrapping RP performs for each Session, you can set up a completely static set of virtual environments with customized resource definitions in $HOME/.radical/pilot/configs/. Configure the Resource to use an existing virtenv and the RP installation it contains. Set virtenv_mode=use, virtenv=/path/to/venv, rp_version=installed in the RP resource definition.

Note

This optimization is relevant even for the local.localhost resource and local access scheme!

The user (or client) is then responsible for maintaining venv(s) with the correct RCT stack (matching the API used by the client-side RCT stack). Optionally, the same static venv may be used for task execution (see below), in which case the user must also maintain a compatible scalems installation, along with any other software dependencies of the workflow.

Task environment

shell command injection

RP TaskDescriptions allow environment preparation with lines of shell commands using pre_exec. (Note that, in addition to the attribute descriptions, RP docs include further discussion at the bottom of the TaskDescription class documentation section.)

See also

https://github.com/SCALE-MS/scale-ms/issues/203 for discussion on whether/how to expose this through scalems.radical.

static Task venv

Use --venv to specify the virtual environment in which tasks should execute at the target resource. The user is responsible for ensuring a compatible scalems installation in the target venv, as well as for satisfying any other workflow software dependencies.

Warning

When maintaining a venv for task execution, keep the RCT stack synchronized.

The scalems package depends on the RADICAL packages, but it is important that the RCT stack in the Worker environment is compatible with that in the Pilot agent environment and the client environment. If the Pilot resource is set to update (see above), the agent environment will be updated to the client-side versions automatically. When the task uses a separate environment, the user must separately update the environment named by --venv.

Ultimately, scalems.radical will provide more automatic assistance for this. (See https://github.com/SCALE-MS/scale-ms/issues/141). In the mean time, users should be aware that they need to update remote RADICAL installations whenever they update their client-side installation.

To reproduce the environment seen by your Tasks when interactively using the static venv, be sure to activate the venv.

If you are using a static venv for the Pilot resource, you may specify the Pilot venv path to --venv. You still must make sure that the venv provides scalems and the other workflow software dependencies. If you are using a dynamically maintained Pilot venv (create or update), then you should use a separate venv for your tasks.

named_env

Note

scalems does not currently use prepare_env() or named_env. See https://github.com/SCALE-MS/scale-ms/issues/90

scalems.radical is migrating towards more dynamic and automated Python environment preparation for workflow tasks.

RADICAL Pilot now allows a Task some explicitly Python-aware environment preparation, (though users are still free to activate Task venvs using pre_exec).

TaskDescription may use named_env to identify a virtual environment to be activated for the Task. The virtual environment may be an existing virtual environment, or a new environment.

In either case, to use named_env, prepare_env() must be called to register the named environment.

Warning

prepare_env() may enter hard-to-diagnose states with invalid virtual environments. https://github.com/radical-cybertools/radical.pilot/issues/2589 describes incompletely provisioned new virtual environments. But similar symptoms can occur when trying to reference existing virtual environments.

See https://github.com/SCALE-MS/scale-ms/issues/90 for discussion on how environments are named and provisioned, and how they are made available to tasks.

In addition to supporting named_env and the other task environment hooks, Master and Worker tasks have some of the RP stack injected into their environment. Raptor tasks are executed in new processes that are launched by the Worker interpreter process through various mechanisms, depending on task requirements. Various possible launch methods include forking from the Worker interpreter process. In other words, assumptions about the task Python environment are complicated, and it is best if we try to base the task environment on the Worker environment.

See also

Provisioning Workers for (groups of) Tasks. Issue #93.

scalems.radical dispatches (most) tasks through raptor “call” mode, so it constructs and uses a venv for the Worker (work in progress: Issue #90), and must not specify named_env for work load tasks.

scalems will be unable to infer all software dependencies, such as special package builds, or software managed outside of a supported Python package management system (e.g. CMake-driven LAMMPS installation, Plumed-enabled GROMACS). It is not yet clear in what way and to what extent scalems, radical.pilot, and users will interact to prepare, verify, and specify such software environments before or during run time.

See also

Provisioning the SCALE-MS task environment. Issue #141.

Pure Python execution

For some use cases (such as Jupyter notebooks), it may be preferable to configure the execution target and launch a workflow entirely from within Python.

Such use cases are not yet well-supported in scalems.

Refer to the test suite for examples, or follow https://github.com/SCALE-MS/scale-ms/issues/82.