legate::VariantOptions
-
class VariantOptions
A helper class for specifying variant options.
Public Functions
-
VariantOptions &with_concurrent(bool concurrent)
Changes the value of the concurrent flag.
- Parameters:
`concurrent` – A new value for the concurrent flag.
-
VariantOptions &with_has_allocations(bool has_allocations)
Changes the value of the has_allocations flag.
- Parameters:
`has_allocations` – A new value for the has_allocations flag.
-
VariantOptions &with_elide_device_ctx_sync(bool elide_sync)
Sets whether the variant can elide device context synchronization after task completion.
- Parameters:
`elide_sync` – true if this variant can skip synchronizing the device context after task completion, false otherwise.
- Returns:
A reference to this.
-
VariantOptions &with_has_side_effect(bool side_effect)
Sets whether the variant has side effects.
- Parameters:
`side_effect` – true if the task has side effects, false otherwise.
- Returns:
A reference to this.
-
VariantOptions &with_may_throw_exception(bool may_throw)
Sets whether the variant may throw exceptions.
- Parameters:
`may_throw` – true if the variant may throw exceptions, false otherwise.
- Returns:
A reference to this.
-
inline VariantOptions &with_communicators(std::initializer_list<std::string_view> comms)
Sets the communicator(s) for the variant.
This call implies concurrent = true as well.
The VariantOptions does not take ownership of comms in any way. If comms is not constructed from string literals, or from other objects with static storage duration, then the user must ensure that the string(s) outlive this object.
Due to limitations with constexpr in C++17, the user may register at most MAX_COMMS communicators. This restriction is expected to be lifted in the future.
- Parameters:
`comms` – The communicator(s) to use.
- Returns:
A reference to this.
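The chaining behavior and the non-owning communicator storage described above can be sketched with a small stand-in type. `OptionsSketch` is a hypothetical mock written for this illustration, not the Legate class. The communicator names `"nccl"` and `"cpu"` are placeholders, and throwing when the `MAX_COMMS` cap is exceeded is an assumption, since the documentation does not say how the real class reports that case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <string_view>

// Hypothetical stand-in for illustration only; NOT the Legate class.
struct OptionsSketch {
  static constexpr std::size_t MAX_COMMS = 3;  // cap mirrors the documented value

  bool concurrent = false;
  bool has_allocations = false;
  // Non-owning views: the viewed strings must outlive this object.
  std::array<std::string_view, MAX_COMMS> communicators{};
  std::size_t num_comms = 0;

  // Each setter returns *this so that calls can be chained.
  OptionsSketch& with_has_allocations(bool value) {
    has_allocations = value;
    return *this;
  }

  OptionsSketch& with_communicators(std::initializer_list<std::string_view> comms) {
    if (comms.size() > MAX_COMMS) {
      // Assumption: the real class may handle this case differently.
      throw std::out_of_range{"too many communicators"};
    }
    num_comms = 0;
    for (std::string_view name : comms) {
      communicators[num_comms++] = name;
    }
    concurrent = true;  // adding communicators implies a concurrent launch
    return *this;
  }
};

// Usage: string literals have static storage duration, so the stored views
// remain valid for the lifetime of the options object.
inline OptionsSketch make_example_options() {
  OptionsSketch opts;
  opts.with_has_allocations(true).with_communicators({"nccl", "cpu"});
  assert(opts.concurrent);  // set implicitly by with_communicators()
  return opts;
}
```

Passing a `std::string` temporary instead of a literal would leave a dangling view in this sketch, which is the lifetime hazard the ownership note warns about.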
-
void populate_registrar(Legion::TaskVariantRegistrar &registrar)
Populates a Legion::TaskVariantRegistrar using the options contained in this object.
- Parameters:
`registrar` – The registrar to fill out.
Public Members
-
bool concurrent = {false}
Whether the variant needs a concurrent task launch. false by default.
Normally, leaf tasks (i.e. all individual task instances created by a single launch) are allowed to execute in any order so long as their preconditions are met. For example, if a task is launched that creates 100 leaf tasks, those tasks can execute at any time so long as each individual task’s inputs are satisfied. It is even possible to have other leaf tasks (from other tasks) executing at the same time or between them.
Setting concurrent to true says: if this task is parallelized, then all leaf tasks must execute concurrently. Note that concurrency is a requirement, not a grant. The entire machine must execute the tasks at exactly the same time as one giant block. No other tasks marked concurrent may execute at the same time.
Setting concurrent to false (the default) says: the task can execute as normal. The leaf tasks can execute in any order.
This feature is most often used when doing collective communications (e.g. all-reduce, all-gather) inside the tasks. In this case, the tasks need to execute in lockstep because otherwise deadlocks may occur.
Suppose there are 2 tasks (A and B) that do collectives. If they execute without concurrency, it is possible for half of the “task A” leaf tasks and half of the “task B” leaf tasks to be running at the same time. Eventually each of those leaf tasks will reach a point where it must all-gather. The program would deadlock because both sides would be waiting for communication that can never finish.
For this reason, adding any communicators (see communicators) automatically implies concurrent = true.
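The deadlock scenario above can be checked with a toy model. Everything here is an assumption made for illustration: `Launch`, `collective_can_finish`, and `is_deadlocked` are hypothetical names, and the model (a fixed number of processor slots, with each leaf task blocking until all of its siblings are running) is a deliberate simplification, not a description of the runtime's scheduler.

```cpp
// Toy model only; not how the runtime schedules tasks.
struct Launch {
  int total_leaves;    // leaf tasks created by the launch
  int running_leaves;  // leaf tasks currently occupying a processor slot
};

// A collective can finish only once every leaf task of the launch is running.
constexpr bool collective_can_finish(const Launch& launch) {
  return launch.running_leaves == launch.total_leaves;
}

// Deadlock: every slot is busy, and neither launch can finish its collective,
// so no slot will ever be released.
constexpr bool is_deadlocked(const Launch& a, const Launch& b, int slots) {
  return (a.running_leaves + b.running_leaves) == slots &&
         !collective_can_finish(a) && !collective_can_finish(b);
}

// Interleaved: half of A and half of B fill a 4-slot machine and wait forever.
static_assert(is_deadlocked(Launch{4, 2}, Launch{4, 2}, 4), "interleaving deadlocks");
// Concurrent: all of A runs as one block while B waits for its turn.
static_assert(!is_deadlocked(Launch{4, 4}, Launch{4, 0}, 4), "block launch progresses");
```

In this model, requiring each launch to occupy the machine as one block is what rules out the interleaved state.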
-
bool has_allocations = {false}
If the flag is true, the variant is allowed to create buffers (temporary or output) during execution. false by default.
-
bool elide_device_ctx_sync = {}
Whether this variant can skip the full device context synchronization after completion and synchronize only on the task stream instead.
The user should typically set elide_device_ctx_sync = true for better performance, unless their task performs GPU work outside of its assigned stream. It is, in effect, a promise that the user task does not perform work on a stream other than the task’s stream or, if the task does do work on external streams, that those streams are synchronized (possibly asynchronously) against the task stream before leaving the task body.
The default is currently false for backwards compatibility, but it may default to true in the future.
When elide_device_ctx_sync = false:
- The runtime will call the equivalent of cuCtxSynchronize() at the end of each GPU task.
- This acts as a full device-wide barrier, ensuring any outstanding GPU work has completed.
When elide_device_ctx_sync = true:
- The runtime may assume all GPU work was issued on the task’s stream.
- Instead of a full synchronization, the runtime may insert stream dependencies for downstream tasks specific to each point, so any dependent work need only wait on the exact leaf task instance that produced it.
- This avoids expensive context-wide synchronization, improving efficiency.
Has no effect on non-device variants (for example CPU variants).
Note
The synchronization schemes described here have no effect on Runtime::issue_execution_fence(). Execution fences wait until all prior tasks are “complete”, and since GPU work is treated as part of a task’s execution, a task is not considered “complete” until its stream is idle. As a result, an execution fence after a GPU task always has the same behavior, regardless of the synchronization scheme used. Either the runtime waits for the device-wide sync to finish, or it waits until all leaf-task streams are idle.
-
bool has_side_effect = {}
Indicates whether a task has side effects outside of the runtime’s tracking that forbid the runtime from replicating it.
When a task only takes scalar stores, it is replicated on all ranks by default, as that is more efficient than having only one rank run it and broadcast the results.
However, a task may sometimes have “side effects” (which are outside the runtime’s tracking) that should forbid the runtime from replicating a particular variant.
For example, the task may write something to disk, or effect some other kind of permanent change to the system. In these cases the runtime must not replicate the task, as the effect must occur exactly once.
-
bool may_throw_exception = {}
Whether this variant may throw an exception.
Tasks that throw exceptions must be handled specially by the runtime in order to safely and correctly propagate the thrown exceptions. For this reason, tasks must explicitly declare whether they throw an exception.
Warning
This special handling usually comes with severe performance penalties. For example, the runtime may block the calling thread (i.e. the main thread) on the completion of the possibly throwing task, or may opt not to schedule any other tasks concurrently.
Warning
It is highly recommended that tasks do not throw exceptions, and instead indicate an error state by some other means. Exceptions should be used as an absolute last resort.
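One way to indicate an error state without exceptions is to return a status value from the computation. The sketch below is a generic C++ pattern, not a Legate API: `TaskStatus`, `normalize`, and `normalize_example` are hypothetical names, and how a real task would communicate the status back to its caller is left open here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status codes for an illustrative task body.
enum class TaskStatus { Ok, ZeroSum };

// Divides every element by the sum of all elements. Failure is reported
// through the return value, so no exception ever leaves the function.
inline TaskStatus normalize(double* values, std::size_t count) noexcept {
  double sum = 0.0;
  for (std::size_t i = 0; i < count; ++i) {
    sum += values[i];
  }
  if (sum == 0.0) {
    return TaskStatus::ZeroSum;  // error state instead of a throw
  }
  for (std::size_t i = 0; i < count; ++i) {
    values[i] /= sum;
  }
  return TaskStatus::Ok;
}

// Usage: the caller inspects the status rather than catching an exception.
inline void normalize_example() {
  double values[] = {1.0, 3.0};
  const TaskStatus status = normalize(values, 2);
  assert(status == TaskStatus::Ok);
  (void)status;  // silences the unused-variable warning when NDEBUG is set
}
```

A variant written this way would not need to declare that it may throw, which avoids the special handling described in the warnings above.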
Public Static Attributes
-
static auto MAX_COMMS = 3
The maximum number of communicators allowed per variant.
This is a workaround for insufficient constexpr support in C++17 and will be removed in a future release.
-
class WithCommunicatorsAccessKey