legate::VariantOptions

class VariantOptions

A helper class for specifying variant options.

Public Functions

VariantOptions &with_concurrent(bool concurrent)

Changes the value of the concurrent flag.

Parameters:: `concurrent` – A new value for the concurrent flag.
Returns:: reference to this.

VariantOptions &with_has_allocations(bool has_allocations)

Changes the value of the has_allocations flag.

Parameters:: `has_allocations` – A new value for the has_allocations flag.
Returns:: reference to this.

VariantOptions &with_elide_device_ctx_sync(bool elide_sync)

Sets whether the variant can elide device context synchronization after task completion.

See also

elide_device_ctx_sync

Parameters:: `elide_sync` – true if this variant can skip synchronizing the device context after task completion, false otherwise.
Returns:: reference to this.

VariantOptions &with_has_side_effect(bool side_effect)

Sets whether the variant has side effects.

See also

has_side_effect.

Parameters:: `side_effect` – true if the variant has side effects, false otherwise.
Returns:: reference to this.

VariantOptions &with_may_throw_exception(bool may_throw)

Sets whether the variant may throw exceptions.

See also

may_throw_exception.

Parameters:: `may_throw` – true if the variant may throw exceptions, false otherwise.
Returns:: reference to this.
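Because each setter returns a reference to the object, the calls can be chained. The sketch below illustrates that builder pattern with a hypothetical stand-in class, `VariantOptionsSketch`, which is not the real legate class and models only the boolean flags and defaults documented on this page; the helper `make_gpu_variant_options` is likewise illustrative.

```cpp
#include <cassert>

// Hypothetical stand-in for legate::VariantOptions (NOT the real class).
// It models only the documented boolean flags; "= {}" value-initializes a
// bool to false, matching the documented defaults.
struct VariantOptionsSketch {
  bool concurrent{false};
  bool has_allocations{false};
  bool elide_device_ctx_sync{};
  bool has_side_effect{};
  bool may_throw_exception{};

  // Each setter updates one flag and returns *this so calls can be chained.
  VariantOptionsSketch& with_concurrent(bool v) { concurrent = v; return *this; }
  VariantOptionsSketch& with_has_allocations(bool v) { has_allocations = v; return *this; }
  VariantOptionsSketch& with_elide_device_ctx_sync(bool v) { elide_device_ctx_sync = v; return *this; }
  VariantOptionsSketch& with_has_side_effect(bool v) { has_side_effect = v; return *this; }
  VariantOptionsSketch& with_may_throw_exception(bool v) { may_throw_exception = v; return *this; }
};

// Hypothetical example: a GPU variant that creates temporary buffers and
// keeps all of its device work on the task stream.
inline VariantOptionsSketch make_gpu_variant_options() {
  VariantOptionsSketch opts;
  opts.with_has_allocations(true).with_elide_device_ctx_sync(true);
  return opts;
}
```

Flags that are not touched keep their defaults, so a chain only needs to mention the options that differ from them.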

inline VariantOptions &with_communicators(std::initializer_list<std::string_view> comms) noexcept

Sets the communicator(s) for the variant.

This call also implies concurrent = true.

VariantOptions does not take ownership of comms in any way. If the views in comms do not refer to string literals or other objects with static storage duration, the user must ensure that the referenced strings outlive this object.

Due to limitations of constexpr in C++17, the user may register at most MAX_COMMS communicators. This restriction is expected to be lifted in the future.

See also

communicators.

Parameters:: `comms` – The communicator(s) to use.
Returns:: reference to this.
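The three rules above (non-owning views, at most MAX_COMMS entries, and the implied concurrent = true) can be modeled as follows. `CommOptionsSketch` is a hypothetical stand-in, not the real legate class; it uses `std::size_t` for MAX_COMMS where the real member is declared `static auto MAX_COMMS = 3`, and its behavior on too many names (aborting) is an assumption, since this page does not say what the real class does. The communicator names in the usage note are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <initializer_list>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the communicator part of legate::VariantOptions.
struct CommOptionsSketch {
  static constexpr std::size_t MAX_COMMS = 3;

  bool concurrent{false};
  // Fixed capacity, non-owning: only views are stored, never the characters.
  std::optional<std::array<std::string_view, MAX_COMMS>> communicators{};

  CommOptionsSketch& with_communicators(std::initializer_list<std::string_view> comms) noexcept {
    // Assumption: the sketch aborts on overflow; the documented API does not
    // specify the real behavior.
    if (comms.size() > MAX_COMMS) {
      std::abort();
    }
    std::array<std::string_view, MAX_COMMS> names{};  // unused slots stay empty
    std::size_t i = 0;
    for (std::string_view name : comms) {
      names[i++] = name;  // copies the view (pointer + length) only
    }
    communicators = names;
    concurrent = true;  // adding communicators implies a concurrent launch
    return *this;
  }
};
```

Passing string literals, as in `opts.with_communicators({"nccl", "cpu"})`, is safe because literals have static storage duration. Passing views of a local `std::string` that is destroyed before the options are used would leave the stored views dangling.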

void populate_registrar(Legion::TaskVariantRegistrar &registrar) const

Populates a Legion::TaskVariantRegistrar using the contained options.

Parameters:: `registrar` – The registrar to fill out.

Public Members

bool concurrent = {false}

Whether the variant needs a concurrent task launch. false by default.

Normally, leaf tasks (i.e. all individual task instances created by a single launch) are allowed to execute in any order so long as their preconditions are met. For example, if a task is launched that creates 100 leaf tasks, those tasks can execute at any time so long as each individual task’s inputs are satisfied. It is even possible to have other leaf tasks (from other launches) executing at the same time or in between them.

Setting concurrent to true says: if this task is parallelized, then all leaf tasks must execute concurrently. Note that concurrency is a requirement, not a grant. The entire machine must execute the tasks at exactly the same time as one giant block. No other tasks marked concurrent may execute at the same time.

Setting concurrent to false (the default) says: the task can execute as normal. The leaf tasks can execute in any order.

This feature is most often used when doing collective communications (e.g. all-reduce, all-gather) inside the tasks. In this case, the tasks need to execute in lockstep because otherwise deadlocks may occur.

Suppose there are two tasks (A and B) that do collectives. If they execute without concurrency, it is possible for half of the “task A” tasks and half of the “task B” tasks to be running at the same time. Eventually each of those tasks will reach a point where it must all-gather, and it will wait for the rest of its own launch to arrive. The program would then deadlock: the missing halves cannot start while the running tasks hold on to the processors, so both sides wait for communication that can never finish.

For this reason, adding any communicators (see communicators) automatically implies concurrent = true.
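The deadlock scenario can be checked with a toy model. The function `collectives_deadlock` is hypothetical and is not how the runtime schedules tasks; the model assumes every processor is busy with exactly one point task, and that a point blocked in a collective keeps its processor until all points of its launch have arrived.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Toy model: running[p] names the launch whose point task occupies
// processor p (every processor is assumed busy). Returns true if no
// collective in this placement can complete, i.e. the state is stuck.
bool collectives_deadlock(const std::vector<char>& running,
                          const std::map<char, std::size_t>& points_per_launch) {
  if (running.empty()) {
    return false;  // nothing is waiting, so nothing can be stuck
  }
  std::map<char, std::size_t> resident;
  for (char launch : running) {
    ++resident[launch];
  }
  for (const auto& [launch, count] : resident) {
    if (count == points_per_launch.at(launch)) {
      return false;  // this launch's collective can finish and free processors
    }
  }
  return true;  // every running point waits on a sibling that cannot start
}
```

With two processors and two 2-point launches, the placement `{'A', 'B'}` (half of each launch) is stuck, while a concurrent launch guarantees a placement such as `{'A', 'A'}`, where A's collective completes.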

bool has_allocations = {false}

If the flag is true, the variant is allowed to create buffers (temporary or output) during execution. false by default.

bool elide_device_ctx_sync = {}

Whether this variant can skip full device context synchronization after completion and instead synchronize only on the task stream.

The user should typically set elide_device_ctx_sync = true for better performance, unless their task performs GPU work outside of its assigned stream. Setting it is, in effect, a promise that the task performs no work on any stream other than the task’s stream, or, if the task does use external streams, that those streams are synchronized (possibly asynchronously) against the task stream before the task body returns.

The default is currently false for backwards compatibility, but it may default to true in the future.

When elide_device_ctx_sync = false:

The runtime will call the equivalent of cuCtxSynchronize() at the end of each GPU task.
This acts as a full device-wide barrier, ensuring any outstanding GPU work has completed.

When elide_device_ctx_sync = true:

The runtime may assume all GPU work was issued on the task’s stream.
Instead of a full synchronization, the runtime may insert stream dependencies for downstream tasks specific to each point, so any dependent work need only wait on the exact leaf task instance that produced it.
This avoids expensive context-wide synchronization, improving efficiency.

This flag has no effect on non-device variants (for example, CPU variants).
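The three cases above amount to a small decision table, sketched below. The enum `PostTaskSync` and the function `post_task_sync` are hypothetical names used only for illustration; they are not part of the legate API, and the real runtime "may" use stream dependencies rather than being required to.

```cpp
#include <cassert>

// Illustrative model of the end-of-task synchronization controlled by
// elide_device_ctx_sync (hypothetical names, not the legate API).
enum class PostTaskSync {
  kNone,              // non-device variant: the flag has no effect
  kDeviceWideSync,    // equivalent of cuCtxSynchronize() after each GPU task
  kStreamDependency,  // dependent work waits only on the producing task's stream
};

constexpr PostTaskSync post_task_sync(bool is_device_variant, bool elide_device_ctx_sync) {
  if (!is_device_variant) {
    return PostTaskSync::kNone;
  }
  return elide_device_ctx_sync ? PostTaskSync::kStreamDependency
                               : PostTaskSync::kDeviceWideSync;
}
```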

See also

with_elide_device_ctx_sync()

Note

The synchronization schemes described here have no effect on Runtime::issue_execution_fence(). Execution fences wait until all prior tasks are “complete”, and since GPU work is treated as part of a task’s execution, a task is not considered “complete” until its stream is idle. As a result, an execution fence after a GPU task always has the same behavior, regardless of the synchronization scheme used. Either the runtime waits for the device-wide sync to finish, or it waits until all leaf-task streams are idle.

bool has_side_effect = {}

Indicates whether the task has side effects outside of the runtime’s tracking that forbid the runtime from replicating it.

When a task only takes scalar stores, it gets replicated by default on all the ranks, as that’s more efficient than having only one of the ranks run it and broadcast the results.

However, sometimes a task may have “side effects” (which are outside the runtime’s tracking) that should forbid the runtime from replicating a particular variant.

For example, the task may write something to disk, or effect some other kind of permanent change to the system. In these cases the runtime must not replicate the task, as the effect must occur exactly once.
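The replication rule described above reduces to a simple predicate, sketched here as a simplified model. The function `may_replicate` is hypothetical, and the real runtime may weigh other factors; treating tasks with non-scalar stores as not replicated is an assumption of the sketch, since this page only discusses the scalar-only case.

```cpp
#include <cassert>

// Simplified model of the replication decision (hypothetical name).
// Scalar-only tasks are replicated by default because each rank can compute
// the result locally instead of waiting for a broadcast, but a side effect
// (e.g. writing a file) must happen exactly once, which forbids replication.
constexpr bool may_replicate(bool only_scalar_stores, bool has_side_effect) {
  return only_scalar_stores && !has_side_effect;
}
```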

bool may_throw_exception = {}

Whether this variant may throw an exception.

Tasks that throw exceptions must be handled specially by the runtime in order to safely and correctly propagate the thrown exceptions. For this reason, tasks must explicitly declare whether they may throw an exception.

Warning

This special handling usually comes with severe performance penalties. For example, the runtime may block the calling thread (i.e. the main thread) on the completion of the possibly throwing task, or may opt not to schedule any other tasks concurrently.

Warning

It is highly recommended that tasks do not throw exceptions, and instead indicate an error state in some other way. Exceptions should be used as an absolute last resort.

std::optional<std::array<std::string_view, MAX_COMMS>> communicators = {}

The communicator(s) to be used by the variant, or std::nullopt if no communicator is to be used.

Setting this to anything other than std::nullopt implies concurrent = true.

Public Static Attributes

static auto MAX_COMMS = 3

The maximum number of communicators allowed per variant.

This is a workaround for insufficient constexpr support in C++17 and will be removed in a future release.

class WithCommunicatorsAccessKey