Execution middleware layer

Executable graphs or graph segments produced by client software are dispatched and translated for execution on managed computing resources.

scalems.execution

Runtime management for the scalems client.

Provide the framework for the SCALE-MS execution middleware layer.

Execution dispatching implementations can be chosen from the command line when the interpreter is launched with Python’s -m command line option.

Alternatively, the script can use a Python Context Manager (a with block) to activate a WorkflowContext with an execution dispatcher implementation.

It is inadvisable to activate an execution dispatching environment with procedural calls because it is more difficult to make sure that acquired resources are properly released, in case the interpreter has to terminate prematurely.

Execution dispatching generally uses some sort of concurrency model, but there is not a consistent concurrency model for all dispatchers. ScaleMS provides abstractions to insulate scripting from particular implementations or concurrency primitives (such as the need to call asyncio.run()).

The following diagram shows the relationships between the WorkflowManager, the Executor, and the RuntimeManager for a particular execution backend. For a concrete example of execution on RADICAL Pilot, refer to the scalems.radical execution module documentation.

The interface available to @scalems.app is under development. See scalems.workflow.

The details of work dispatching are not yet strongly specified or fully encapsulated. manage_execution mediates a collaboration between a RuntimeManager and a WorkflowManager (via AbstractWorkflowUpdater).

class scalems.execution.AbstractWorkflowUpdater[source]

abstract async submit(*, item)[source]

Submit work to update a workflow item.

Parameters:: item (Task) – managed workflow item as produced by WorkflowManager.edit_item()
Returns:: watcher task for a coroutine that has already started.
Return type:: Task

The returned Task is a watcher task to confirm completion of or allow cancellation of the dispatched work. Additional hidden Tasks may be responsible for updating the provided item when results are available, so the result of this task is not necessarily directly interesting to the execution manager.

The caller check for unexpected cancellation or early failure. If the task has not failed or been canceled by the time submit() returns, then we can conclude that the task has been appropriately bound to the workfow item.

class scalems.execution.RuntimeManager[source]

Client side manager for dispatching work loads and managing data flow.

A RuntimeManager is instantiated for a scalems.workflow.WorkflowManager using the executor_factory provided to scalems.execution.dispatch.

__init__(*, editor_factory=None, datastore=None, loop, configuration)[source]

Parameters:

editor_factory (Callable[[], Callable]) –
datastore (FileStore) –
loop (AbstractEventLoop) –
configuration (_BackendT) –

runtime_configuration()[source]

Runtime startup context manager hook.

This allows arbitrary environment changes during the scope of the __aenter__() function, as well as a chance to modify the stored runtime configuration object.

Warning: this is probably not a permanent part of the interface. We should settle on whether to store a configuration/context object or rely on the current contextvars.Context.

runtime_shutdown(runtime)[source]

Shutdown hook for runtime facilities.

Called while exiting the context manager. Allows specialized handling of runtime backends in terms of the RuntimeManager instance and an abstract Runtime object, presumably acquired while entering the context manager through the runtime_startup hook.

The method is static as a reminder that all state should come through the runtime argument. The hook is used in the context manager implemented by the base class, which manages the removal of state information regardless of the actions (or errors) of subclasses using this hook.

abstract async runtime_startup()[source]

Runtime startup hook.

If the runtime manager uses a Session, this is the place to acquire it.

This coroutine itself returns a asyncio.Task. This allows the caller to yield until until the runtime manager is actually ready.

Implementations may perform additional checks before returning to ensure that the runtime queue processing task has run far enough to be well behaved if it later encounters an error or is subsequently canceled. Also, blocking interfaces used by the runtime manager could be wrapped in separate thread executors so that the asyncio event loop doesn’t have to block on this call.

Return type:: Task

abstract updater()[source]

Initialize a WorkflowUpdater for the configured runtime.

Return type:: AbstractWorkflowUpdater

get_edit_item: Callable[[], Callable]: Get the function that creates a WorkflowItem editing context.

runtime

Get/set the current runtime state information.

Attempting to overwrite an existing runtime state raises an APIError.

De-initialize the stored runtime state by calling del on the attribute.

To re-initialize, de-initialize and then re-assign.

Design Note:

This pattern allows runtime state objects to be removed from the instance data while potentially still in use (such as during clean-up), allowing clearer scoping of access to the runtime state object.

In the future, we may find it more practical to use a module-level “ContextVar” and to manage the scope in terms of PEP-567 Contexts. For the initial implementation, though, we are using module-level runtime configuration information, but storing the state information for initialized runtime facilities through this Data Descriptor on the RuntimeManager instance.

Raises:: APIError if state is not None and a runtime state already exists. –

scalems.execution.dispatch(workflow_manager, *, executor_factory, queuer=None, params=None)[source]

Enter the execution dispatching state.

Attach to a dispatching executor, then provide a scope for concurrent activity. This is also the scope during which the RADICAL Pilot Session exists.

Provide the executor with any currently-managed work in a queue. While the context manager is active, new work added to the queue will be picked up by the executor. When the context manager is exited, new work will resume queuing locally and the remote tasks will be resolved, then the dispatcher will be disconnected.

Currently, we tie the lifetime of the dispatcher to this context manager. When leaving the The with statement block, we trigger the executor to clean-up and wait for its task to complete. We may choose some other relationship in the future.

Parameters:

executor_factory – Implementation-specific callable to get a run time work manager.
queuer (Queuer) – A queue processor that will subscribe to the add_item hook to feed the executor.
params – a parameters object relevant to the execution back-end

async scalems.execution.manage_execution(executor, *, processing_state)[source]

Process workflow messages until a stop message is received.

Initial implementation processes commands serially without regard for possible concurrency.

Towards concurrency:

We can create all tasks without awaiting any of them.

Some tasks will be awaiting results from other tasks.

All tasks will be awaiting a asyncio.Lock or asyncio.Condition for each required resource, but must do so indirectly.

To avoid dead-locks, we can’t have a Lock object for each resource unless they are managed by an intermediary that can do some serialization of requests. In other words, we need a Scheduler that tracks the resource pool, packages resource locks only when they can all be acquired without race conditions or blocking, and which then notifies the Condition for each task that it is allowed to run.

It should not do so until the dependencies of the task are known to have all of the resources they need to complete (running with any dynamic dependencies also running) and, preferably, complete.

Alternatively, the Scheduler can operate in blocks, allocating all resources, offering the locks to tasks, waiting for all resources to be released, then repeating. We can allow some conditions to “wake up” the scheduler to back fill a block of resources, but we should be careful with that.

(We still need to consider dynamic tasks that generate other tasks. I think the only way to distinguish tasks which can’t be dynamic from those which might be would be with the Function definitions versus Coroutine function definition in the implementing function declaration. If we abstract Await expression with scalems.wait, we can throw an exception at execution time after checking a ContextVar. It may be better to just let implementers use Await expression for dynamically created tasks, but we need to make the same check if a function calls .result() or otherwise tries to create a dependency on an item that was not allocated resources before the function started executing. In a conservative first draft, we can simply throw an exception if a non-Coroutine function definition function attempts to call a scalems workflow command like “add_item” while in an executing context.)

Parameters:

executor (RuntimeManager) –
processing_state (Event) –

scalems.workflow

WorkflowManager and Item interfaces.

Information and resources supporting a defined workflow are managed in the scope of a WorkflowManager. All ScaleMS API calls and references occur in or between the scopes of WorkflowManager instances. When the ScaleMS API is active, there is a notion of a “current” WorkflowContext.

When the scalems module is simply imported, the default context is able to discover local workflow state from the working directory and manage a workflow representation, but it is not able to dispatch the work directly for execution.

SCALE-MS optimizes data flow and data locality in part by attributing all workflow references to well-defined scopes. Stateful API facilities, workflow state, and scoped references are managed as WorkflowManager instances.

“Workflow Manager” is an important parameter in several cases. Execution management tools like scalems.run(), scalems.wait(), and scalems.dispatch() update the workflow managed in a particular scope, possibly by interacting with other scopes. Commands for declaring work or data add items to specific instances of workflow managers. Internally, SCALEMS explicitly refers to theses scopes as manager, or context, depending on the nature of the code collaboration. manager may appear as an optional parameter for user-facing functions to allow a particular managed workflow to be specified.

This module supports scope() and get_scope() with internal module state. These tools interact with the context management of the asynchronous dispatching, but note that they are not thread-safe. scope() should not be used in a coroutine except in the root coroutine of a Task or otherwise within the scope of a contextvars.copy_context().run(). scalems will try to flag misuse by raising a ProtocolError, but please be sensible.

class scalems.workflow.WorkflowManager[source]

Composable context for SCALE-MS workflow management.

The WorkflowManager maintains the local metadata and provides the representation of workflow state. The workflow can be edited and state updated through several collaborations, but the WorkflowManager is not directly responsible for the dispatching of tasks for execution.

Notably, we rely on the Python contextmanager protocol to regulate the acquisition and release of resources, so SCALE-MS workflow contexts do not initialize Executors at creation. Instead, client code should use The with statement blocks for scoped initialization and shutdown of Executor roles.

For instance,

Implement a root context singleton and require acquisition of new Context handles through methods in this module.
Use abstract base class machinery to register Context implementations.
Require Context instances to track their parent Context, or otherwise participate in a single tree structure.
Prevent instantiation of Command references without a reference to a Context instance.

Notes

workflow manager maintains the workflow state without expensive or stateful volatile resources, and can mediate updates to the managed workflow at any time. Items enter the graph in an IDLE state. The WorkflowManager can provide Futures for the results of the managed items. For IDLE items, the WorkflowManager retains a weakref to the issued Futures, which it can use to make sure that there is only zero or one Future for a particular result.

WorkflowManager collaborates with Queuer to transition the graph to an “active” or “executing” state. This transition is mediated through the dispatcher_lock.

Queuer sequences and queues workflow items to be handled, pushing them to a dispatch_queue. No state change to the workflow item seems necessary at this time.

The dispatch_queue is read by an ExecutionManager. Items may be processed immediately or staged in a command_queue. Workflow items are then either SUBMITTED or BLOCKED (awaiting dependencies). Optionally, Items may be marked ELIGIBLE and re-queued for batch submission.

If the ExecutionManager is able to submit a task, the Task has a call-back registered for the workflow item. The WorkflowManager needs to convert any Future weakrefs to strong references when items are SUBMITTED, and the workflow Futures are subscribed to the item. Tasks are wrapped in a scalems object that the WorkflowManager is able to take ownership of. BLOCKED items are wrapped in Tasks which are subscribed to their dependencies (WorkflowItems should already be subscribed to WorkflowItem Futures for any dependencies) and stored by the ExecutionManager. When the call-backs for all of the dependencies indicate the Item should be processed into an upcoming workload, the Item becomes ELIGIBLE, and its wrapper Task (in collaboration with the ExecutionManager) puts it in the command_queue.

As an optimization, and to support co-scheduling, a WorkflowItem call-back can provide notification of state changes. For instance, a BLOCKED item may become ELIGIBLE once all of its dependencies are SUBMITTED, when the actual Executor has some degree of data flow management capabilities.

__init__(*, loop, directory=None)[source]

The event loop for the program should be launched in the root thread, preferably early in the application launch. Whether the WorkflowManager uses it directly, it is useful to require the client to provide the event loop, if for no other reason than to ensure that one exists.

Parameters:

loop (AbstractEventLoop) – event loop, such as from asyncio.new_event_loop()
directory (str | PathLike) – Filesystem path for the workflow file store.

add_item(item)[source]

Add an item to the managed workflow.

Parameters:: item – Describe the item to be managed. (Usually a task to be submitted.)
Returns:: View of the managed item.
Return type:: ItemView

close()[source]

Finalize and release resources.

While not usually necessary, an explicit call to close() can force the backing data store to be closed and allow exceptions to be caught at a clear point.

Once close() is called, additional attempts to access the managed workflow may raise ScopeError.

datastore()[source]: Get a handle to the non-volatile backing data store.

edit_item(identifier)[source]

Scoped workflow item editor.

Grant the caller full access to the managed task.

Return type:: Generator[Task, None, None]

item(identifier)[source]

Access an item in the managed workflow.

Return type:: ItemView

subscribe(event, callback)[source]

Register a callback for method call events.

Currently, only the add_item method provides a subscribable event.

Registered callbacks will be called before the subscribed-to method returns. The callback will be given a dictionary with the event name as a key and a value that is the same as the argument(s) to the method.

Unsubscribe with unsubscribe() to avoid unexpected behavior and to avoid prolonging the life of the object providing the callback.

Parameters:

event (str) –
callback (Callable[[_QItem_T], None]) –

scalems.workflow.get_scope()[source]: Get a reference to the manager of the current workflow scope.

scalems.workflow.scope(manager, close_on_exit=False)[source]

Set the current workflow management within a clear scope.

Restores the previous workflow management scope on exiting the context manager.

To allow better tracking of dispatching chains, this context manager does not allow the global workflow management scope to be “stolen”. If manager is already the current scope, a recursion depth is tracked, and the previous scope is restored only when the last “scope” context manager for manager exits. Multiple “scope” context managers are not allowed for different manager instances in the same context.

If close_on_exit=True, calls manager.close() when leaving manager’s scope for the last time, if possible.

Note

Scope indicates the “active” WorkflowManager instance. This is separate from WorkflowManager lifetime and ownership. WorkflowManagers should track their own activation status and provide logic for whether to allow reentrant dispatching.

Within the context managed by scope, get_scope() will return context.

While this context manager should be thread-safe, in general, this context manager should only be used in the root thread where the UI and event loop are running to ensure we can clean up properly. Dispatchers may provide environments in which this context manager can be used in non-root threads, but the Dispatcher needs to curate the contextvars.ContextVars and ensure that the Context is properly cleaned up.

scalems.workflow.workflow_item_director_factory(item, *, manager, label=None)[source]

scalems.workflow.workflow_item_director_factory(item_type, *, manager, label=None)

scalems.workflow.workflow_item_director_factory(item, *, manager, label=None)

Get a workflow item director for a workflow manager and input type.

When called, the director finalizes the new item and returns a view.

Parameters:

manager (WorkflowManager) –
label (str) –

Return type:

Callable[[…], ItemView]

scalems.context

Execution environment context management.

scalems.context.scoped_chdir(directory)[source]

Restore original working directory when exiting the context manager.

Caveats

Current working directory is a process-level property. To avoid unexpected behavior across threads, only one instance of this context manager may be active at a time. If necessary, we could allow for nested cwd contexts, but we cannot make this behavior thread-safe.

scalems.messages

Intra-component communication support.

Message types and protocol support for control signals, queue management, and work dispatching.

The QueueItem classes are simple key-value pairs for use in local message-passing queues.

The Command classes are richer structures that ensure (de)serializability for use between the client and (remote) runtime managers.

class scalems.messages.CommandQueueAddItem[source]

String-encoded add_item command for the Executor command queue.

The intended payload is an item to be added to the workflow graph: e.g. an operation, data reference, subgraph, or something meaningful to an AbstractWorkflowUpdater.

In practice, the value may just be a token or identifier. In the initial version of the scalems protocol, the value is passed to the item editor factory that is obtained from scalems.execution.RuntimeManager.get_edit_item().

class scalems.messages.CommandQueueControlItem[source]

String-encoded Command for the Executor command queue.

Instructions for the RuntimeManager, intercepted and processed by scalems.execution.manage_execution().

Currently, the only supported key is “command”.

Supported commands may grow to comprise a Compute Provide Interface.

class scalems.messages.QueueItem[source]

Queue items are either workflow items or control messages.

Control messages are indicated by the key 'command', and are described by CommandQueueControlItem.

Workflow items are indicated by the key 'add_item', and are described by CommandQueueAddItem.

Backends

See SCALE-MS invocation for user facing module interfaces.

Built-in Execution Modules include scalems.radical.

For command line usage, an Execution Modules should support interaction with the scalems.invocation module.

At least temporarily, we also have a non-normative internal Execution Module for executing serialized function calls as shell command lines.

scalems.call

Support command line execution of scalems packaged function calls.

Example

python -m scalems.call record.json

Note that this whole module is a workaround to avoid using RP raptor. scalems/call/__main__.py provides command line behavior for a radical.pilot.Task executable and arguments, with which we execute a serialized representation of a function call, pending restoration of full “raptor” support.

We don’t really need to reconcile the divergent notions of “Subprocess”, if we can remove the need for this entry point. But we _do_ need to formalize our serialized representation of a function call, and the task lifecycle and artifacts management.

class scalems.call.CallResult[source]

Result type (container) for a dispatched function call.

__init__(return_value=None, exception=None, stdout=None, stderr=None, directory=None, decoder='dill', encoder='dill')

Parameters:

return_value (Any | None) –
exception (str | None) –
stdout (str | None) –
stderr (str | None) –
directory (str | None) –
decoder (str) –
encoder (str) –

Return type:

None

directory: str | None = None: string-encoded URI for archive of the working directory after task execution.

exception: str | None = None: string representation of the Exception, if any.

return_value: Any | None = None

Return value, if any.

Value assumed to be round-trip serializable via encoder and decoder.

stderr: str | None = None: string-encoded URI for file holding captured stderr.

stdout: str | None = None: string-encoded URI for file holding captured stdout.

scalems.call.cli(*argv)[source]

Command line entry point.

Invoke with python -m scalems.call <args>

TODO: Configurable log level.

Parameters:: argv (str) –

scalems.call.main(call)[source]

Execute the packaged call.

Return a packaged result.

Parameters:: call (_Call) –
Return type:: CallResult

scalems.call.serialize_call(func, *, args=(), kwargs=None, requirements=None)[source]

Create a serialized representation of a function call.

This utility function is provided for stability while the serialization machinery and the CallPack structure evolve.

Parameters:

func (Callable) –
args (tuple) –
kwargs (dict) –
requirements (dict) –

Return type:

str

Additional internal details

async scalems.call.function_call_to_subprocess(func, *, label, args=(), kwargs=None, manager, requirements=None)[source]

Wrap a function call in a command line based on scalems.call.

Parameters:

func (Callable) – A callable Python object.
label (str) – Name (prefix) for identifying tasks and artifacts.
args (tuple) – Positional arguments for func.
kwargs (dict) – Key word arguments for func.
manager – workflow manager instance.
requirements (dict) – run time requirements (passed through to execution backend).

Returns:

Serializable representation of the executable task.

Return type:

_Subprocess

Warning

This is a temporary utility while we explore use cases and prepare to migrate back to radical.pilot.raptor. A user-facing tool should return a view on a workflow item, whereas this function produces even lower-level details.

Collaborations

async scalems.radical.task.subprocess_to_rp_task(call_handle, dispatcher)[source]

Dispatch a subprocess task through the scalems.radical execution backend.

Get a Future for a RPTaskResult (a collection representing a completed rp.Task).

Schedule a RP Task and wrap with asyncio. Subscribe a dependent asyncio.Task that can provide the intended result type and return it as the Future. Schedule a call-back to clean up temporary files.

Parameters:

call_handle (scalems.call._Subprocess) – result of a await on a scalems.call.function_call_to_subprocess()
dispatcher (RPDispatchingExecutor) – An active execution manager.

Return type:

RPTaskResult

async scalems.radical.task.wrapped_function_result_from_rp_task(subprocess, rp_task_result)[source]

Once subprocess_to_rp_task has produced a RPTaskResult for a scalems.call._Subprocess, we produce a scalems.call.CallResult with the help of some file transfers.

Parameters:

subprocess (_Subprocess) – the result of a scalems.call.function_call_to_subprocess
rp_task_result (RPTaskResult) – the result of a subprocess_to_rp_task for the subprocess

Returns:

localized result from scalems.call.function_call_to_subprocess()

Return type:

CallResult

class scalems.radical.task.RPTaskResult[source]

A collection of data and managed objects associated with a completed RP Task.

A rp.Task is associated with additional artifacts that are not directly tied to the rp.Task object, such as an arbitrary number of files that are not generally knowable a priori.

Design note:: This data structure is intentionally ignorant of wrapped Python functions. We should try to reconcile our multiple notions of RP task management as soon as possible. We should be careful to distinguish classes for managing traditional executable RP tasks from Python tasks managed with raptor functionality.

__init__(uid, task_dict, exit_code, final_state, directory, directory_archive)

Parameters:

uid (str) –
task_dict (dict) –
exit_code (int) –
final_state (str) –
directory (Url) –
directory_archive (Awaitable[FileReference]) –

Return type:

None

directory: Url

Resource location information for the Task working directory.

Constructed from the filesystem_endpoint. See Managing locality and localization.

directory_archive: Awaitable[FileReference]

An archive of the task working directory.

This result is separately awaitable to allow elision of unnecessary transfers. The exact implementation is not strongly specified; a Future usually implies that the result will eventually be produced, whereas a more general “awaitable” may not and may never be scheduled.

A future optimization should allow individual files to be selected and remotely extracted, but this would require more progress on the Scale-MS structured data typing model.

TODO: In the next iteration, use a non-local FileReference or (TBD) DirectoryReference.

exit_code: int: Exit code of the executable task.

final_state: str: Final state of the radical.pilot.Task.

task_dict: dict: Dictionary representation from radical.pilot.Task.as_dict().

uid: str: radical.pilot.Task.uid identifier.

Support for execution module authors: `scalems.invocation`

Python invocation of SCALE-MS workflow scripts.

Refer to SCALE-MS invocation for user-level documentation about SCALE-MS command line invocation.

The base command line parser is provided by scalems.invocation.base_parser(), extended (optionally) by the Execution Modules, and further extended by scalems.invocation.run(). Get usage for a particular backend with reference to the particular module.

Workflow Manager

Managed workflows are dispatched to custom execution back-ends through run(), which accepts a WorkflowManager creation function as its argument. Most of the customization hooks are provided through the implementing module. I.e. the manager_factory argument has its __module__ attribute queried to get the implementing module.

An execution back end <module> supports this invocation model by implementing a __main__ method that passes a WorkflowManager creation function and an executor_factory to scalems.invocation.run. For example, in the scalems.radical module, scalems/radical/__main__.py contains:

if __name__ == '__main__':
    sys.exit(scalems.invocation.run(
        manager_factory=scalems.radical.workflow_manager,
        executor_factory=scalems.radical.executor_factory))

Required Attributes

Execution back-end modules (modules providing a manager_factory) MUST provide the following module attribute(s).

<module>.parser: argparse.ArgumentParser for the execution module. For correct composition, see scalems.invocation.make_parser().

Optional Attributes

Execution back-end modules (modules providing a manager_factory) MAY provide the following module attribute(s) for hooks in run()

<module>.logger: logging.Logger instance to use for the invocation.

<module>.configuration: A callable to acquire a runtime module configuration. If present in module, run() calls module.configuration(args), where args is a argparse.Namespace object created by the module’s parser. The resulting configuration object is provided when invoking the executor_factory.

scalems.invocation.run(*, manager_factory, executor_factory, _loop=None)[source]

Execute boilerplate for scalems entry point scripts.

Provides consistent logic for command line invocation of a script with a main scalems.app function.

Supports invocation of the following form with minimal backend/__main__.py

python -m scalems.radical –venv=/path/to/venv –resource=local.localhost myscript.py arg1 –foo bar

See scalems.invocation module documentation for details about the expected manager_factory module interface.

Refer to module documentation for the execution backend for command line arguments. Unrecognized command line arguments will be passed along to the called script through modification to sys.argv.

Parameters:

executor_factory – Implementation-specific callable to get a run time work manager.
manager_factory (Callable[..., scalems.workflow.WorkflowManager]) – workflow manager creation function
_loop (AbstractEventLoop) –

utilities

Execution module authors should also be aware of the following utilities.

scalems.invocation.base_parser(add_help=False)[source]

Get the base scalems argument parser.

Provides a base argument parser for scripts or other module parsers.

By default, the returned ArgumentParser is created with add_help=False to avoid conflicts when used as a parent for a parser more local to the caller. If add_help is provided, it is passed along to the ArgumentParser created in this function.

Execution middleware layer

scalems.execution

scalems.workflow

scalems.context

scalems.messages

Backends

scalems.call

Additional internal details

Collaborations

Support for execution module authors: scalems.invocation

Workflow Manager

Required Attributes

Optional Attributes

utilities

Support for execution module authors: `scalems.invocation`