SCALE-MS invocation
Users define and execute SCALE-MS workflows by using Python to define work and
submit it for execution through a SCALE-MS workflow manager.
The SCALE-MS machinery is accessible through the scalems Python module.
For the greatest flexibility in execution, scripts should be written without explicit reference to the execution environment. Instead, a SCALE-MS workflow manager (one of the Execution Modules) can be specified on the command line to bootstrap an entry point. For example,
python3 -m scalems.local myscript.py
would use the workflow manager provided by the scalems.local
module to process myscript.py. After the module performs some initialization,
the script is essentially just imported. After that, though, specifically annotated
callables (functions or function objects) are identified and submitted for execution.
See scalems.app().
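The discovery step described above can be pictured with a small stand-in. This is a minimal sketch of the general pattern, not the scalems implementation: the `app` marker decorator, the `_is_app` attribute, and `find_apps` are hypothetical names used only to show how an entry point can load a script and then pick out the annotated callables.

```python
import types


def app(func):
    """Hypothetical marker decorator standing in for scalems.app."""
    func._is_app = True  # tag the callable so a launcher can find it later
    return func


def find_apps(module):
    """Return the callables in *module* that were tagged by ``app``."""
    return [
        obj for obj in vars(module).values()
        if callable(obj) and getattr(obj, "_is_app", False)
    ]


# Simulate "importing" a user script by executing its source in a new module.
SCRIPT = '''
@app
def main():
    return "workflow submitted"

def helper():
    return "not an entry point"
'''

script_module = types.ModuleType("myscript")
script_module.app = app  # a real script would import its decorator instead
exec(SCRIPT, script_module.__dict__)

# Only the tagged function is selected and called; helper() is ignored.
entry_points = find_apps(script_module)
results = [entry() for entry in entry_points]
print([entry.__name__ for entry in entry_points], results)
```

The point of the pattern is that the script itself never names a backend: the launcher decides how the tagged callables are executed.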
Command line execution
Use the --help command line option for an execution module for details about
available and required command line arguments:
$ python -m scalems.radical --help
usage: python -m scalems.radical <scalems.radical args> script-to-run.py.py <script args>
...
The base command line parser is provided by scalems.invocation.base_parser(),
extended (optionally) by the Execution Modules, and further extended by
scalems.invocation.run(). To get usage for a particular backend, pass --help to
that module:
python3 -m scalems.local --help
Unrecognized command line arguments will be passed along to the called script.
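The layering of parsers and the pass-through of unrecognized arguments can be sketched with the standard library alone. This illustrates the general argparse pattern and is not the actual code of scalems.invocation: the parser objects and the `--steps` script option are made up for the example, while `--log-level` and `--venv` are borrowed from the option list in this document.

```python
import argparse

# Stand-in for a shared base parser (add_help=False so a child can add -h).
base = argparse.ArgumentParser(add_help=False)
base.add_argument(
    "--log-level",
    choices=["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"],
)

# Stand-in for an execution module that extends the base parser.
backend = argparse.ArgumentParser(prog="backend", parents=[base])
backend.add_argument("--venv", metavar="<path>")
backend.add_argument("script", metavar="script-to-run.py")

argv = ["--log-level", "DEBUG", "--venv", "/path/to/venv",
        "myworkflow.py", "--steps", "10"]

# parse_known_args() returns the recognized options plus the leftovers,
# which a launcher could then hand to the script.
known, script_args = backend.parse_known_args(argv)
print(known.script, known.log_level, known.venv, script_args)
```

Here `script_args` holds the `--steps 10` pair that the backend parser did not recognize.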
Documentation for built-in execution modules is shown below.
Documentation may also be accessed from the command line with pydoc
(e.g. pydoc scalems.radical) or from within the interpreter with help().
Execution Modules
scalems provides the following built-in execution modules.
scalems.radical Python module
Workflow subpackage for ScaleMS execution dispatching to RADICAL Pilot.
Manage workflow context for RADICAL Pilot.
Command Line Invocation:
python -m scalems.radical \
--resource local.localhost --venv $HOME/myvenv --access local myworkflow.py
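The same invocation can be assembled from Python, for example in a wrapper script. The sketch below only builds and prints the argument list; `radical_command` is a hypothetical helper, and the option values are the ones from the command above.

```python
import os
import shlex
import sys


def radical_command(script, venv, resource="local.localhost", access="local"):
    """Build (but do not run) the argument list for the invocation above."""
    return [
        sys.executable, "-m", "scalems.radical",
        "--resource", resource,
        "--venv", os.path.expandvars(venv),  # expand $HOME as a shell would
        "--access", access,
        script,
    ]


command = radical_command("myworkflow.py", venv="$HOME/myvenv")
print(shlex.join(command))
# To launch it, pass the list to subprocess.run(command, check=True).
```

Using `sys.executable` keeps the launch in the same interpreter (and therefore the same client-side environment) as the wrapper.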
For required and optional command line arguments:
python -m scalems.radical --help
or refer to the web-based documentation for scalems.radical.
The user is largely responsible for establishing an appropriate RADICAL Cybertools (RCT) software environment on both the client side and the execution side. See the RADICAL Pilot (RP) section of Installation and configuration for complete documentation.
See also
Refer to https://github.com/SCALE-MS/scale-ms/issues/141
for the status of scalems support for automatic execution environment bootstrapping.
See also
scalems.radical
Command line interface for scalems.radical workflow execution module.
usage: python -m scalems.radical <scalems.radical args> script-to-run.py.py <script args>
- script-to-run.py
The workflow script. Must contain a function decorated with scalems.app.
- -h, --help
show this help message and exit
- --version
show program’s version number and exit
- --log-level {CRITICAL,ERROR,WARNING,INFO,DEBUG}
Optionally configure console logging to the indicated level.
- --pycharm
Attempt to connect to PyCharm remote debugging system, where appropriate.
- --access <access>
Explicitly specify the access_schema to use from the RADICAL resource.
- --enable-raptor
Enable RP Raptor, and manage an execution side dispatching task.
- --pilot-option <<key>=<value>>
Add a key-value pair to the radical.pilot.PilotDescription.
- --venv <path>
Path to the (pre-configured) Python virtual environment with which RP tasks should be executed. (Required. See also https://github.com/SCALE-MS/scale-ms/issues/90)
See also
More notes on Python virtual environments
Pilot environment
The RADICAL Pilot remote software components launch from a Python virtual environment determined by parameters in the Resource definition.
By default, RADICAL Pilot resources are configured to bootstrap the target
environment by creating a fresh virtual environment.
(virtenv_mode=create and rp_version=local in most resource definitions.)
virtenv_mode=update is usually a better choice than create, because later
sessions can reuse a previously bootstrapped pilot venv.
Static Pilot venv
To minimize the amount of bootstrapping RP performs for each Session,
you can set up a completely static set of virtual environments with customized
resource definitions in $HOME/.radical/pilot/configs/.
Configure the Resource to use an existing virtenv and the RP
installation it contains.
Set virtenv_mode=use, virtenv=/path/to/venv, rp_version=installed
in the RP resource definition.
Note
This optimization is relevant even for the local.localhost resource
and local access scheme!
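As a sketch of such a customized resource definition, the snippet below writes a JSON file containing the three settings named above. The file name `resource_local.json`, the top-level `localhost` key, and the helper `write_static_venv_config` are assumptions about the RP user-config layout, and whether a partial entry is merged with the built-in definition may depend on the RP version, so check the RP documentation for your installation before relying on it.

```python
import json
import tempfile
from pathlib import Path


def write_static_venv_config(venv, config_dir=None):
    """Write a user-level override for the ``local.localhost`` resource.

    ASSUMPTION: RP reads ``resource_local.json`` from the config directory
    and keys its entries by host label (``localhost``).
    Note that an existing file of the same name is overwritten.
    """
    if config_dir is None:
        config_dir = Path.home() / ".radical" / "pilot" / "configs"
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)

    settings = {
        "localhost": {
            "virtenv_mode": "use",      # use the existing venv as it is
            "virtenv": str(venv),       # path to the pre-built venv
            "rp_version": "installed",  # use the RP installed in that venv
        }
    }
    target = config_dir / "resource_local.json"
    target.write_text(json.dumps(settings, indent=4) + "\n")
    return target


if __name__ == "__main__":
    # Write to a scratch directory here; omit config_dir to target $HOME.
    with tempfile.TemporaryDirectory() as scratch:
        path = write_static_venv_config("/path/to/venv", config_dir=scratch)
        print(path.read_text())
```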
The user (or client) is then responsible for maintaining venv(s) with the
correct RCT stack (matching the API used by the client-side RCT stack).
Optionally, the same static venv may be used for task execution (see below),
in which case the user must also maintain a compatible scalems installation,
along with any other software dependencies of the workflow.
Task environment
shell command injection
RP TaskDescriptions allow environment preparation with lines of shell commands
using pre_exec.
(Note that, in addition to the attribute descriptions,
RP docs include further discussion at the bottom of the
TaskDescription class documentation section.)
See also
https://github.com/SCALE-MS/scale-ms/issues/203
for discussion on whether/how to expose this through scalems.radical.
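A minimal sketch of the pre_exec idea, using a plain dictionary so that it runs without RADICAL Pilot installed. The `executable`, `arguments`, and `pre_exec` names follow the RP TaskDescription attributes; the venv path and the helper `task_fields_with_venv` are placeholders, and the hand-off to RP is left as a comment because it needs a live session.

```python
import shlex


def task_fields_with_venv(venv, executable, arguments=()):
    """Return TaskDescription-style fields that activate *venv* first."""
    activate = ". " + shlex.quote(f"{venv}/bin/activate")
    return {
        "executable": executable,
        "arguments": list(arguments),
        # Each pre_exec entry is a shell command line run before the task.
        "pre_exec": [activate],
    }


fields = task_fields_with_venv(
    "/path/to/venv", "python3", ["-c", "import scalems"]
)
print(fields["pre_exec"])
# With RP available, the fields could be handed over along the lines of
# radical.pilot.TaskDescription(from_dict=fields) (assumed usage).
```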
static Task venv
Use --venv to specify the virtual environment
in which tasks should execute at the target resource.
The user is responsible for ensuring a compatible scalems installation in the
target venv, as well as for satisfying any other workflow software dependencies.
Warning
When maintaining a venv for task execution, keep the RCT stack synchronized.
The scalems package depends on the RADICAL packages, but it is important that
the RCT stack in the Worker environment is
compatible with that in the Pilot agent environment and the client environment.
If the Pilot resource uses virtenv_mode=update (see above),
the agent environment will be updated to the client-side versions automatically.
When the task uses a separate environment,
the user must separately update the environment named by --venv.
Ultimately, scalems.radical will provide more automatic assistance for this.
(See https://github.com/SCALE-MS/scale-ms/issues/141). In the meantime,
users should be aware that they need to update remote RADICAL installations
whenever they update their client-side installation.
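One way to compare stacks is to print the installed versions in each environment and diff the results. The sketch below uses only the standard library; the list of distribution names is an assumption about which packages matter for your installation, and `rct_versions` is a hypothetical helper.

```python
import sys
from importlib import metadata

# ASSUMPTION: these distributions make up the stack you want to compare.
PACKAGES = ("radical.pilot", "radical.utils", "scalems")


def rct_versions(packages=PACKAGES):
    """Map each distribution name to its installed version (None if absent)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # sys.prefix identifies which environment this report describes.
    print(f"environment: {sys.prefix}")
    for name, version in rct_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run the same script once with the client interpreter and once with the interpreter inside the venv named by --venv, then compare the two reports.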
To interactively reproduce the environment seen by your Tasks when using the static venv, be sure to activate the venv first.
If you are using a static venv for the Pilot resource,
you may pass the Pilot venv path to --venv.
You still must make sure that the venv provides scalems and the other
workflow software dependencies.
If you are using a dynamically maintained Pilot venv (create or update),
then you should use a separate venv for your tasks.
Note
The scalems.radical --venv option is intended to be optional.
See https://github.com/SCALE-MS/scale-ms/issues/90 and
https://github.com/SCALE-MS/scale-ms/issues/141
named_env
Note
scalems does not currently use prepare_env() or named_env.
See https://github.com/SCALE-MS/scale-ms/issues/90
scalems.radical is migrating towards more dynamic and automated Python
environment preparation for workflow tasks.
RADICAL Pilot now allows a Task some explicitly
Python-aware environment preparation
(though users are still free to activate Task venvs using pre_exec).
TaskDescription may use named_env
to identify a virtual environment to be activated for the Task.
The virtual environment may be an existing virtual environment,
or a new environment.
In either case, to use named_env, prepare_env()
must be called to register the named environment.
Warning
prepare_env() may enter hard-to-diagnose states with invalid virtual environments. https://github.com/radical-cybertools/radical.pilot/issues/2589 describes incompletely provisioned new virtual environments, but similar symptoms can occur when trying to reference existing virtual environments.
See https://github.com/SCALE-MS/scale-ms/issues/90 for discussion on how environments are named and provisioned, and how they are made available to tasks.
In addition to supporting named_env
and the other task environment hooks,
Master and Worker tasks have some of the RP stack injected into their environment.
Raptor tasks are executed in new processes that are launched by the Worker
interpreter process through various mechanisms, depending on task requirements.
Possible launch methods include forking from the
Worker interpreter process.
In other words, assumptions about the task Python environment are complicated,
and it is best if we try to base the task environment on the Worker environment.
See also
Provisioning Workers for (groups of) Tasks. Issue #93.
scalems.radical dispatches (most) tasks through
raptor “call” mode, so it constructs and uses a venv for the Worker
(work in progress: Issue #90),
and must not specify named_env for workload tasks.
scalems will be unable to infer all software dependencies, such as special
package builds, or software managed outside of a supported Python package
management system
(e.g. CMake-driven LAMMPS installation, Plumed-enabled GROMACS).
It is not yet clear in what way and to what extent scalems, radical.pilot,
and users will interact to prepare, verify, and specify such software
environments before or during run time.
See also
Provisioning the SCALE-MS task environment. Issue #141.
Pure Python execution
For some use cases (such as Jupyter notebooks), it may be preferable to configure the execution target and launch a workflow entirely from within Python.
Such use cases are not yet well-supported in scalems.
Refer to the test suite for examples, or follow https://github.com/SCALE-MS/scale-ms/issues/82.