task
- group Task
Classes and utilities to define tasks.
Typedefs
-
using SymbolicPoint = tuple<SymbolicExpr>
A symbolic representation of points.
Symbolic points are used to capture mappings between points in different domains in a concise way. Each element of a symbolic point is a SymbolicExpr symbolically representing the coordinate of that dimension. A ManualTask can optionally pass for its logical store partition argument a symbolic point that describes a mapping from points in the launch domain to sub-stores in the partition.
Functions
-
std::ostream &operator<<(std::ostream &out, const SymbolicExpr &expr)
-
inline SymbolicExpr dimension(std::uint32_t dim)
Constructs a SymbolicExpr representing coordinates of a dimension.
- Parameters:
dim – The dimension index
- Returns:
A symbolic expression for the given dimension
-
inline SymbolicExpr constant(std::int32_t value)
Constructs a SymbolicExpr representing a constant value.
- Parameters:
value – The constant value to embed
- Returns:
A symbolic expression for the given constant
-
std::ostream &operator<<(std::ostream &os, const VariantOptions &options)
Variables
-
static const VariantOptions DEFAULT_OPTIONS = {}
The default variant options used during task creation if no user-supplied options are given.
-
class Communicator
- #include <legate/comm/communicator.h>
A thin wrapper class for communicators stored in futures. This class only provides a template method to retrieve the communicator handle and the client is expected to pass the right handle type.
The following is the list of handle types for communicators supported in Legate:
NCCL: ncclComm_t*
CPU communicator in Legate: legate::comm::coll::CollComm*
CAL: cal_comm_t
Public Functions
-
template<typename T>
T get() const
Returns the communicator stored in the wrapper.
- Template Parameters:
T – The type of communicator handle to get (see valid types above)
- Returns:
A communicator
-
class StreamView
- #include <legate/cuda/stream_pool.h>
A simple wrapper around CUDA streams to inject auxiliary features.
- Deprecated:
since 24.11: please provide your own implementation of this class
When LEGATE_SYNC_STREAM_VIEW is set to 1, every StreamView synchronizes the CUDA stream that it wraps when it is destroyed.
Public Functions
-
inline explicit StreamView(CUstream stream)
Creates a StreamView with a raw CUDA stream.
- Deprecated:
since 24.11: please provide your own implementation of this class
- Parameters:
stream – Raw CUDA stream to wrap
-
inline operator CUstream() const
Unwraps the raw CUDA stream.
- Deprecated:
since 24.11: please provide your own implementation of this class
- Returns:
Raw CUDA stream wrapped by the StreamView
-
class StreamPool
- #include <legate/cuda/stream_pool.h>
A stream pool.
- Deprecated:
since 24.11: use legate::TaskContext::get_task_stream() instead
Public Functions
-
StreamView get_stream()
Returns a StreamView in the pool.
- Deprecated:
since 24.11: use legate::TaskContext::get_task_stream() instead
- Returns:
A StreamView object. Currently, all stream views returned from this pool are backed by the same CUDA stream.
Public Static Functions
-
static StreamPool &get_stream_pool()
Returns a singleton stream pool.
- Deprecated:
since 24.11: use legate::TaskContext::get_task_stream() instead
The stream pool is alive throughout the program execution.
- Returns:
A StreamPool object
-
class SymbolicExpr
- #include <legate/operation/projection.h>
A class that symbolically represents coordinates.
A \(\mathtt{SymbolicExpr}(i, w, c)\) object denotes an expression \( w \cdot \mathit{dim}_i + c \), where \( \mathit{dim}_i \) corresponds to the coordinate of the \(i\)-th dimension. A special case is when \(i\) is \(-1\), which means the expression denotes a constant \(c\).
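The formula above can be made concrete with a small standalone sketch. The `ExprModel` struct and the `evaluate` and `project` helpers below are hypothetical names introduced for illustration only; they are not part of Legate and do not reflect how `SymbolicExpr` is implemented. The sketch only shows how an expression \( w \cdot \mathit{dim}_i + c \) would be evaluated at a concrete point, and how a tuple of such expressions maps one point to another.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for illustration only; this is NOT legate::SymbolicExpr.
// It models the expression weight * dim_i + offset described in the text.
struct ExprModel {
  std::int32_t dim;     // index i of the dimension, or -1 for a constant
  std::int32_t weight;  // w
  std::int32_t offset;  // c
};

// Evaluates one expression at a concrete point with N coordinates.
template <std::size_t N>
std::int64_t evaluate(const ExprModel& expr, const std::array<std::int64_t, N>& point)
{
  if (expr.dim == -1) {
    return expr.offset;  // constant expression: the point is ignored
  }
  assert(expr.dim >= 0 && static_cast<std::size_t>(expr.dim) < N);
  return static_cast<std::int64_t>(expr.weight) * point[static_cast<std::size_t>(expr.dim)] +
         expr.offset;
}

// Applies a tuple of M expressions (a "symbolic point") to a concrete point,
// producing a point with M coordinates.
template <std::size_t N, std::size_t M>
std::array<std::int64_t, M> project(const std::array<ExprModel, M>& symbolic_point,
                                    const std::array<std::int64_t, N>& point)
{
  std::array<std::int64_t, M> result{};
  for (std::size_t i = 0; i < M; ++i) {
    result[i] = evaluate(symbolic_point[i], point);
  }
  return result;
}
```

Under these assumptions, evaluating `ExprModel{1, 2, 1}` at the point (3, 5) gives 2 * 5 + 1 = 11, and the symbolic point `{ExprModel{1, 1, 0}, ExprModel{0, 1, 0}}` swaps the two coordinates.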
Public Functions
-
inline std::uint32_t dim() const
Returns the dimension index of this expression.
- Returns:
Dimension index
-
inline std::int32_t weight() const
Returns the weight for the coordinates.
- Returns:
Weight value
-
inline std::int32_t offset() const
Returns the offset of the expression.
- Returns:
Offset
-
inline bool is_identity(std::uint32_t dim) const
Indicates if the expression denotes an identity mapping for the given dimension.
- Parameters:
dim – The dimension for which the identity mapping is checked
- Returns:
true The expression denotes an identity mapping
- Returns:
false The expression does not denote an identity mapping
-
inline bool is_constant() const
Indicates if the expression denotes a constant.
- Returns:
true The expression denotes a constant
- Returns:
false The expression does not denote a constant
-
class AutoTask
- #include <legate/operation/task.h>
A class for auto-parallelized task descriptors.
Public Functions
-
Variable add_input(LogicalArray array)
Adds an array to the task as input.
Partitioning of the array is controlled by constraints on the partition symbol associated with the array
- Parameters:
array – An array to add to the task as input
- Returns:
The partition symbol assigned to the array
-
Variable add_output(LogicalArray array)
Adds an array to the task as output.
Partitioning of the array is controlled by constraints on the partition symbol associated with the array
- Parameters:
array – An array to add to the task as output
- Returns:
The partition symbol assigned to the array
-
Variable add_reduction(LogicalArray array, ReductionOpKind redop_kind)
Adds an array to the task for reductions.
Partitioning of the array is controlled by constraints on the partition symbol associated with the array
- Parameters:
array – An array to add to the task for reductions
redop_kind – ID of the reduction operator to use. The array’s type must support the operator.
- Returns:
The partition symbol assigned to the array
-
Variable add_reduction(LogicalArray array, std::int32_t redop_kind)
Adds an array to the task for reductions.
Partitioning of the array is controlled by constraints on the partition symbol associated with the array
- Parameters:
array – An array to add to the task for reductions
redop_kind – ID of the reduction operator to use. The array’s type must support the operator.
- Returns:
The partition symbol assigned to the array
-
Variable add_input(LogicalArray array, Variable partition_symbol)
Adds an array to the task as input.
Partitioning of the array is controlled by constraints on the partition symbol associated with the array
- Parameters:
array – An array to add to the task as input
partition_symbol – A partition symbol for the array
- Returns:
The partition symbol assigned to the array
-
Variable add_output(LogicalArray array, Variable partition_symbol)
Adds an array to the task as output.
Partitioning of the array is controlled by constraints on the partition symbol associated with the array
- Parameters:
array – An array to add to the task as output
partition_symbol – A partition symbol for the array
- Returns:
The partition symbol assigned to the array
-
Variable add_reduction(LogicalArray array, ReductionOpKind redop_kind, Variable partition_symbol)
Adds an array to the task for reductions.
Partitioning of the array is controlled by constraints on the partition symbol associated with the array
- Parameters:
array – An array to add to the task for reductions
redop_kind – ID of the reduction operator to use. The array’s type must support the operator.
partition_symbol – A partition symbol for the array
- Returns:
The partition symbol assigned to the array
-
Variable add_reduction(LogicalArray array, std::int32_t redop_kind, Variable partition_symbol)
Adds an array to the task for reductions.
Partitioning of the array is controlled by constraints on the partition symbol associated with the array
- Parameters:
array – An array to add to the task for reductions
redop_kind – ID of the reduction operator to use. The array’s type must support the operator.
partition_symbol – A partition symbol for the array
- Returns:
The partition symbol assigned to the array
-
void add_scalar_arg(const Scalar &scalar)
Adds a by-value scalar argument to the task.
- Parameters:
scalar – The Scalar to add to the task
-
template<typename T, typename = std::enable_if_t<!std::is_same_v<std::decay_t<T>, Scalar> && std::is_constructible_v<Scalar, T>>>
void add_scalar_arg(T &&value)
Adds a by-value scalar argument to the task.
-
void add_constraint(const Constraint &constraint)
Adds a partitioning constraint to the task.
- Parameters:
constraint – A partitioning constraint
-
Variable find_or_declare_partition(const LogicalArray &array)
Finds or creates a partition symbol for the given array.
- Parameters:
array – Array for which the partition symbol is queried
- Returns:
The existing symbol if there is one for the array, a fresh symbol otherwise
-
Variable declare_partition()
Declares a partition symbol.
- Returns:
A new symbol that can be used when passing an array to an operation
-
std::string_view provenance() const
Returns the provenance information of this operation.
- Returns:
Provenance
-
void set_concurrent(bool concurrent)
Sets whether the task needs a concurrent task launch.
Any task with at least one communicator will implicitly use concurrent task launch, so this method is to be used when the task needs a concurrent task launch for a reason unknown to Legate.
- Parameters:
concurrent – A boolean value indicating whether the task needs a concurrent task launch
-
void set_side_effect(bool has_side_effect)
Sets whether the task has side effects or not.
A task is assumed to be free of side effects by default if the task only has scalar arguments.
- Parameters:
has_side_effect – A boolean value indicating whether the task has side effects
-
void throws_exception(bool can_throw_exception)
Sets whether the task can throw an exception or not.
- Parameters:
can_throw_exception – A boolean value indicating whether the task can throw an exception
-
void add_communicator(std::string_view name)
Requests a communicator for this task.
- Parameters:
name – The name of the communicator to use for this task
-
class Impl
-
class ManualTask
- #include <legate/operation/task.h>
A class for manually parallelized task descriptors.
Public Functions
-
void add_input(LogicalStore store)
Adds a store to the task as input.
The store will be unpartitioned but broadcast to all the tasks
- Parameters:
store – A store to add to the task as input
-
void add_input(LogicalStorePartition store_partition, std::optional<SymbolicPoint> projection = std::nullopt)
Adds a store partition to the task as input.
- Parameters:
store_partition – A store partition to add to the task as input
projection – An optional symbolic point describing a mapping between points in the launch domain and substores in the partition
-
void add_output(LogicalStore store)
Adds a store to the task as output.
The store will be unpartitioned but broadcast to all the tasks
- Parameters:
store – A store to add to the task as output
-
void add_output(LogicalStorePartition store_partition, std::optional<SymbolicPoint> projection = std::nullopt)
Adds a store partition to the task as output.
- Parameters:
store_partition – A store partition to add to the task as output
projection – An optional symbolic point describing a mapping between points in the launch domain and substores in the partition
-
void add_reduction(LogicalStore store, ReductionOpKind redop_kind)
Adds a store to the task for reductions.
The store will be unpartitioned but broadcast to all the tasks
- Parameters:
store – A store to add to the task for reductions
redop_kind – ID of the reduction operator to use. The store’s type must support the operator.
-
void add_reduction(LogicalStore store, std::int32_t redop_kind)
Adds a store to the task for reductions.
The store will be unpartitioned but broadcast to all the tasks
- Parameters:
store – A store to add to the task for reductions
redop_kind – ID of the reduction operator to use. The store’s type must support the operator.
-
void add_reduction(LogicalStorePartition store_partition, ReductionOpKind redop_kind, std::optional<SymbolicPoint> projection = std::nullopt)
Adds a store partition to the task for reductions.
- Parameters:
store_partition – A store partition to add to the task for reductions
redop_kind – ID of the reduction operator to use. The store’s type must support the operator.
projection – An optional symbolic point describing a mapping between points in the launch domain and substores in the partition
-
void add_reduction(LogicalStorePartition store_partition, std::int32_t redop_kind, std::optional<SymbolicPoint> projection = std::nullopt)
Adds a store partition to the task for reductions.
- Parameters:
store_partition – A store partition to add to the task for reductions
redop_kind – ID of the reduction operator to use. The store’s type must support the operator.
projection – An optional symbolic point describing a mapping between points in the launch domain and substores in the partition
-
void add_scalar_arg(const Scalar &scalar)
Adds a by-value scalar argument to the task.
- Parameters:
scalar – The Scalar to add to the task
-
template<typename T, typename = std::enable_if_t<!std::is_same_v<std::decay_t<T>, Scalar> && std::is_constructible_v<Scalar, T>>>
void add_scalar_arg(T &&value)
Adds a by-value scalar argument to the task.
-
std::string_view provenance() const
Returns the provenance information of this operation.
- Returns:
Provenance
-
void set_concurrent(bool concurrent)
Sets whether the task needs a concurrent task launch.
Any task with at least one communicator will implicitly use concurrent task launch, so this method is to be used when the task needs a concurrent task launch for a reason unknown to Legate.
- Parameters:
concurrent – A boolean value indicating whether the task needs a concurrent task launch
-
void set_side_effect(bool has_side_effect)
Sets whether the task has side effects or not.
A task is assumed to be free of side effects by default if the task only has scalar arguments.
- Parameters:
has_side_effect – A boolean value indicating whether the task has side effects
-
void throws_exception(bool can_throw_exception)
Sets whether the task can throw an exception or not.
- Parameters:
can_throw_exception – A boolean value indicating whether the task can throw an exception
-
void add_communicator(std::string_view name)
Requests a communicator for this task.
- Parameters:
name – The name of the communicator to use for this task
-
class Impl
-
class TaskException : public std::exception
- #include <legate/task/exception.h>
An exception class used in cross language exception handling.
Any client that needs to catch a C++ exception during task execution and have it rethrown on the launcher side should wrap that C++ exception with a TaskException. In case the task can raise more than one type of exception, they are distinguished by integer ids; the launcher is responsible for enumerating a list of all exceptions that can be raised and the integer ids are positions in that list.
Subclassed by legate::detail::PythonTaskException
Public Functions
-
inline TaskException(std::int32_t index, std::string error_message)
Constructs a TaskException object with an exception id and an error message. The id must be a valid index for the list of exceptions declared by the launcher.
- Parameters:
index – Exception id
error_message – Error message
-
inline explicit TaskException(std::string error_message)
Constructs a TaskException object with an error message. The exception id is set to 0.
- Parameters:
error_message – Error message
-
inline std::int32_t index() const noexcept
Returns the exception id.
- Returns:
The exception id
-
inline const std::string &error_message() const noexcept
Returns the error message.
- Returns:
The error message
-
class TaskRegistrar
- #include <legate/task/registrar.h>
A helper class for task variant registration.
The legate::TaskRegistrar class is designed to simplify the boilerplate that client libraries need to register all their task variants. The following is the boilerplate that each library needs to write:
    struct MyLibrary {
      static legate::TaskRegistrar& get_registrar();
    };

    template <typename T>
    struct MyLibraryTaskBase : public legate::LegateTask<T> {
      using Registrar = MyLibrary;
      ...
    };
In the code above, MyLibrary has a static member that returns a singleton legate::TaskRegistrar object. Then, MyLibraryTaskBase points to the class so Legate can find where task variants are collected.
Once this registrar is set up in a library, each library task can simply register itself with the LegateTask::register_variants method like the following:
    // In a header
    struct MyLibraryTask : public MyLibraryTaskBase<MyLibraryTask> {
      ...
    };

    // In a C++ file
    static void __attribute__((constructor)) register_tasks()
    {
      MyLibraryTask::register_variants();
    }
Public Functions
-
class Impl
-
class RecordTaskKey
-
template<typename T>
class LegateTask
- #include <legate/task/task.h>
A base class template for Legate task implementations.
Any Legate task class must inherit legate::LegateTask directly or transitively. The type parameter T needs to be bound to a child Legate task class that inherits legate::LegateTask.
Currently, each task can have up to three variants. Each variant must be a static member function of the class under one of the following names and signatures:
void cpu_variant(legate::TaskContext): CPU implementation of the task
void gpu_variant(legate::TaskContext): GPU implementation of the task
void omp_variant(legate::TaskContext): OpenMP implementation of the task
Tasks must have at least one variant, and all task variants must be semantically equivalent (modulo some minor rounding errors due to floating point imprecision).
Each task class must also have a type alias Registrar that points to a library specific registrar class. (See legate::TaskRegistrar for details.)
Tasks may also declare the following static members, which are used to populate defaults and other information in various circumstances. These are split into 2 categories, task-wide and per-variant. Task-wide members are usually used to describe invariants across the entire task, or to provide default values across all task variants. Per-variant members on the other hand apply only to the specified variant, and override any task-wide settings where applicable.
Task-wide static members:
static const legate::TaskConfig TASK_CONFIG: This specifies the task-wide configuration, such as the task ID, the task signature, and default variant options to be used.
Per-variant static members:
static constexpr VariantOptions CPU_VARIANT_OPTIONS: Specifies the default variant options used when registering the CPU variant of the task.
static constexpr VariantOptions OMP_VARIANT_OPTIONS: Specifies the default variant options used when registering the OMP variant of the task.
static constexpr VariantOptions GPU_VARIANT_OPTIONS: Specifies the default variant options used when registering the GPU variant of the task.
If the default variant options are not present, the variant options for a given variant v are selected in the following order:
1. The variant options (if any) supplied at the call-site of register_variants().
2. The default variant options (if any) found in XXX_VARIANT_OPTIONS.
3. (If defined) The default variant options found in TASK_CONFIG (if any).
4. The variant options provided by Library::get_default_variant_options().
5. The global default variant options found in VariantOptions::DEFAULT_OPTIONS.
Note
Users are highly encouraged to use these static members to pre-declare their task and variant properties. In all cases, the same information can be supplied dynamically at either task registration, construction, or launch time, but doing so statically is preferred as the runtime is able to make more efficient decisions when scheduling or launching the tasks.
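The selection order for variant options listed in this class description amounts to a first-match fallback chain. The sketch below models that chain with `std::optional`; `OptionsModel`, `CandidateOptions`, and `select_options` are hypothetical names used only for illustration, and the real `VariantOptions` type and Legate's internal resolution logic may differ.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <string>

// Hypothetical, heavily simplified stand-in for a set of variant options.
// The single field records where the options came from.
struct OptionsModel {
  std::string source;
};

// One optional slot per level of the documented selection order.
struct CandidateOptions {
  std::optional<OptionsModel> call_site;        // passed to register_variants()
  std::optional<OptionsModel> per_variant;      // XXX_VARIANT_OPTIONS
  std::optional<OptionsModel> task_config;      // TASK_CONFIG
  std::optional<OptionsModel> library_default;  // Library::get_default_variant_options()
  OptionsModel global_default{"global"};        // always available
};

// Returns the options from the first level that supplies any, falling back to
// the global default when every optional level is empty.
OptionsModel select_options(const CandidateOptions& c)
{
  // The global default is the last resort, so it must always be populated.
  assert(!c.global_default.source.empty());
  for (const auto* level : {&c.call_site, &c.per_variant, &c.task_config, &c.library_default}) {
    if (level->has_value()) {
      return **level;
    }
  }
  return c.global_default;
}
```

In this model, supplying options at the call site always wins, and a library-level default is consulted only when neither the call site, the per-variant member, nor the task configuration provides anything.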
Subclassed by legate::detail::LegionTask< T >
Public Static Functions
-
static void register_variants(std::map<VariantCode, VariantOptions> all_options = {})
Records all variants of this task in a registrar.
Registers the variant with the task registrar (pointed to by the task’s static type alias Registrar, see TaskRegistrar for details about setting up a registrar in a library).
The registration of the task is deferred until such time as TaskRegistrar::register_all_tasks() is called.
The task must have a static TASK_CONFIG member defined. Failure to do so is diagnosed at compile-time.
- Parameters:
all_options – Options for task variants. Variants with no entries in all_options will use the default set of options as discussed in the class description.
-
static void register_variants(Library library, const std::map<VariantCode, VariantOptions> &all_options = {})
Registers all variants of this task immediately.
Registration of the task is performed immediately.
The value of T::TASK_CONFIG.task_id() is used as the task id.
- Parameters:
library – Library to which the task should be registered.
all_options – Options for task variants. Variants with no entries in all_options will use the default set of options as discussed in the class description.
-
static void register_variants(Library library, LocalTaskID task_id, const std::map<VariantCode, VariantOptions> &all_options = {})
Registers all variants of this task immediately.
Registration of the task is performed immediately.
In almost all cases, the user should prefer the TaskConfig overload to this method, as it allows specifying additional task properties.
- Parameters:
library – Library to which the task should be registered.
task_id – Task id.
all_options – Options for task variants. Variants with no entries in all_options will use the default set of options as discussed in the class description.
-
static void register_variants(Library library, const TaskConfig &task_config, const std::map<VariantCode, VariantOptions> &all_options = {})
Registers all variants of this task immediately.
Registration of the task is performed immediately.
- Parameters:
library – Library to which the task should be registered.
task_config – The task configuration.
all_options – Options for task variants. Variants with no entries in all_options will use the default set of options as discussed in the class description.
-
class TaskConfig
- #include <legate/task/task_config.h>
A class representing the configuration of a task.
This class provides methods for constructing a task configuration, setting various task options, and retrieving information about the task configuration.
Public Functions
-
TaskConfig()
Deleted default constructor.
The default constructor is deleted to prevent creating a TaskConfig object without specifying a task ID.
-
explicit TaskConfig(LocalTaskID task_id)
Construct a TaskConfig.
- Parameters:
task_id – The local ID of the task.
-
TaskConfig &with_signature(const TaskSignature &signature)
Set the task signature for this task.
- Parameters:
signature – The task signature to associate with the task.
- Returns:
A reference to this.
-
TaskConfig &with_variant_options(const VariantOptions &options)
Set the variant options for this task.
- Parameters:
options – The variant options to associate with the task.
- Returns:
A reference to this.
-
LocalTaskID task_id() const
- Returns:
The local task ID for this task.
-
std::optional<TaskSignature> task_signature() const
- Returns:
The task signature, if set, std::nullopt otherwise.
-
std::optional<std::reference_wrapper<const VariantOptions>> variant_options() const
- Returns:
The variant options, if set, std::nullopt otherwise.
-
class TaskContext
- #include <legate/task/task_context.h>
A task context that contains task arguments and communicators.
Public Functions
-
GlobalTaskID task_id() const noexcept
Returns the global ID of the task.
- Returns:
The global task id
-
VariantCode variant_kind() const noexcept
Returns the Legate variant kind of the task.
- Returns:
The variant kind
-
PhysicalArray input(std::uint32_t index) const
Returns an input array of the task.
- Parameters:
index – Index of the array
- Returns:
Array
-
std::vector<PhysicalArray> inputs() const
Returns all input arrays of the task.
- Returns:
Vector of arrays
-
PhysicalArray output(std::uint32_t index) const
Returns an output array of the task.
- Parameters:
index – Index of the array
- Returns:
Array
-
std::vector<PhysicalArray> outputs() const
Returns all output arrays of the task.
- Returns:
Vector of arrays
-
PhysicalArray reduction(std::uint32_t index) const
Returns a reduction array of the task.
- Parameters:
index – Index of the array
- Returns:
Array
-
std::vector<PhysicalArray> reductions() const
Returns all reduction arrays of the task.
- Returns:
Vector of arrays
-
Scalar scalar(std::uint32_t index) const
Returns a by-value argument of the task.
- Parameters:
index – Index of the scalar
- Returns:
Scalar
-
std::vector<Scalar> scalars() const
Returns by-value arguments of the task.
- Returns:
Vector of scalars
-
const comm::Communicator &communicator(std::uint32_t index) const
Returns a communicator of the task.
If a task launch ends up emitting only a single point task, that task will not get passed a communicator, even if one was requested at task launching time. Therefore, tasks using communicators should be prepared to handle the case where no communicator is available (that is, communicators() returns an empty vector).
- Parameters:
index – Index of the communicator
- Returns:
Communicator
-
const std::vector<comm::Communicator> &communicators() const
Returns communicators of the task.
If a task launch ends up emitting only a single point task, that task will not get passed a communicator, even if one was requested at task launching time. Therefore, most tasks using communicators should be prepared to handle the case where the returned vector is empty.
- Returns:
Vector of communicators
-
std::size_t num_inputs() const
Returns the number of the task’s inputs.
- Returns:
Number of arrays
-
std::size_t num_outputs() const
Returns the number of the task’s outputs.
- Returns:
Number of arrays
-
std::size_t num_reductions() const
Returns the number of the task’s reductions.
- Returns:
Number of arrays
-
std::size_t num_communicators() const
Returns the number of communicators.
- Returns:
Number of communicators
-
bool is_single_task() const
Indicates whether the task is a single task (that is, not parallelized).
- Returns:
true The task is a single task
- Returns:
false The task is one in a set of multiple parallel tasks
-
bool can_raise_exception() const
Indicates whether the task is allowed to raise an exception.
- Returns:
true The task can raise an exception
- Returns:
false The task must not raise an exception
-
const DomainPoint &get_task_index() const
Returns the point of the task. A 0D point will be returned for a single task.
- Returns:
The point of the task
-
const Domain &get_launch_domain() const
Returns the task group’s launch domain. A single task returns an empty domain.
- Returns:
The task group’s launch domain
-
mapping::TaskTarget target() const
Returns the kind of processor executing this task.
- Returns:
The processor kind
-
void concurrent_task_barrier()
Perform a blocking barrier across all the leaf tasks in a concurrent task launch.
When a leaf task invokes this operation, control will not return to the task until all the leaf tasks in the same launch have executed the same barrier.
This is useful e.g. to work around NCCL deadlocks, that can be triggered when another concurrent CUDA operation creates a false dependence or resource conflict with the resident NCCL kernels. By performing a barrier before and after every NCCL collective operation happening inside the leaf tasks in a concurrent task launch, we can effectively isolate the execution of the NCCL collective from all other CUDA work, thus preventing the deadlock. In more detail:
1. put a barrier before the collective operation
2. emit the collective operation
3. ensure that NCCL has actually emitted all its operations on the stream (e.g. ncclGroupEnd has been called, if grouping operations)
4. perform another barrier
    // The barrier must happen before the NCCL calls begin
    context.concurrent_task_barrier();
    auto result = ncclAllGather(p_send, p_recv, 1, ncclUint64, *comm, stream);
    EXPECT_EQ(result, ncclSuccess);
    // And insert a barrier after all NCCL calls return, to ensure that all ranks have
    // emitted the NCCL calls
    context.concurrent_task_barrier();
This operation can only be performed inside leaf tasks (not on the top-level task), and only in variants that have been declared as concurrent. All leaf tasks in a launch must take part in the barrier (it cannot be done only on a subset of them). Breaking any of the previously stated invariants is a fatal error.
-
CUstream_st *get_task_stream() const
Get the current task CUDA stream.
All asynchronous stream work performed by a GPU variant must be performed on, or synchronized with the stream returned by this method. Doing asynchronous work on other streams and failing to encode those dependencies (or otherwise synchronizing them) on this stream will result in undefined behavior.
If the current task is not a GPU task, or does not have GPU support enabled, this method returns nullptr.
- Returns:
The current task’s CUDA stream.
-
class TaskInfo
- #include <legate/task/task_info.h>
An object describing the registration info of a Legate task.
Public Functions
-
explicit TaskInfo(std::string task_name)
Construct a TaskInfo.
- Parameters:
task_name – The name of the task.
Construct a TaskInfo.
- Parameters:
impl – A pointer to the implementation class.
-
std::string_view name() const
- Returns:
The name of the task.
-
std::optional<VariantInfo> find_variant(VariantCode vid) const
Look up a variant of the task.
- Parameters:
vid – The variant to look up.
- Returns:
An optional containing the VariantInfo for the variant, or std::nullopt if the variant was not found.
-
void add_variant_(AddVariantKey, const Library &library, VariantCode vid, VariantImpl body, Processor::TaskFuncPtr entry, const TaskConfig &task_config, const VariantOptions *decl_options, const std::map<VariantCode, VariantOptions> &registration_options = {})
Register a new variant to the task description.
- Parameters:
library – The library to retrieve the default variant options from.
vid – The variant type to register.
body – The variant function pointer.
entry – The pointer to the entry point wrapping body, to be passed to Legion.
task_config – The task-wide configuration options.
decl_options – Any variant options declared in the task declaration, or nullptr if none were found.
registration_options – Variant options specified at task registration time.
-
template<typename T>
void add_variant_(AddVariantKey, const Library &library, VariantCode vid, LegionVariantImpl<T> body, Processor::TaskFuncPtr entry, const TaskConfig &task_config, const VariantOptions *decl_options, const std::map<VariantCode, VariantOptions> &registration_options = {})
Register a new variant to the task description.
- Parameters:
library – The library to retrieve the default variant options from.
vid – The variant type to register.
body – The variant function pointer.
entry – The pointer to the entry point wrapping body, to be passed to Legion.
task_config – The task-wide configuration options.
decl_options – Any variant options declared in the task declaration, or nullptr if none were found.
registration_options – Variant options specified at task registration time.
-
std::string to_string() const
- Returns:
A human-readable representation of the Task.
-
class AddVariantKey
-
class TaskSignature
- #include <legate/task/task_signature.h>
A helper class for specifying a task’s call signature.
This class is used to statically declare a task’s expected signature. For example:
legate::TaskSignature{}
  // The task expects exactly 2 inputs...
  .inputs(2)
  // But may take at least 3 and no more than 5 outputs...
  .outputs(3, 5)
  // While taking an unbounded number of scalars (but must have at least 1)
  .scalars(1, legate::TaskSignature::UNBOUNDED)
  // With the following constraints imposed on the arguments
  .constraints(
    {{// Align the first input with the first output
      legate::align(legate::proxy::inputs[0], legate::proxy::outputs[0]),
      // Broadcast ALL inputs
      legate::broadcast(legate::proxy::inputs),
      // All arguments (including axes) of constraints are supported
      legate::scale({1, 2, 3}, legate::proxy::outputs[1], legate::proxy::inputs[1])}});
A task’s signature describes how many input, output, scalar, or reduction arguments it takes, as well as any constraints that are to be applied to the task. If a task predeclares its signature in this manner, the runtime will be able to perform a number of optimizations and sanity-checks for the user, including (but not limited to):
Checking the number of arguments matches the expected signature (and raising exceptions if not).
Automatically applying constraints on task arguments.
Improved scheduling of tasks.
Note
While it is highly recommended that users statically declare their tasks’ signatures, doing so means the user is no longer allowed to deviate from the signature at runtime. For example, tasks that predeclare their constraints are not allowed to add additional constraints during task launch.
Public Functions
-
TaskSignature()
Default-construct an empty TaskSignature.
-
TaskSignature &inputs(std::uint32_t n) noexcept
Set the number of input arguments taken by the task.
If n is UNBOUNDED, it signifies that the task takes a variable (or possibly unknown) number of input arguments. Otherwise, this call signifies that a task takes exactly n input arguments, no more, no less.
- Parameters:
n – The argument specification.
- Returns:
A reference to this.
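The single-argument rule can be modeled in a few lines. The following is a minimal, self-contained sketch and not Legate's implementation: `kUnbounded` and `accepts_exact` are hypothetical names, with `kUnbounded` standing in for `TaskSignature::UNBOUNDED`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for TaskSignature::UNBOUNDED (same documented value).
constexpr std::uint32_t kUnbounded = std::numeric_limits<std::uint32_t>::max();

// Hypothetical helper: returns true if `actual` arguments satisfy the
// single-value specification `n`, as described for inputs(n).
constexpr bool accepts_exact(std::uint32_t n, std::uint32_t actual) noexcept
{
  // UNBOUNDED accepts any count; otherwise the count must match exactly.
  return n == kUnbounded || actual == n;
}
```

Under this model, `accepts_exact(2, 3)` is false, while `accepts_exact(kUnbounded, 3)` is true. The same reading applies to `outputs()`, `scalars()`, and `redops()`.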
- TaskSignature &inputs(
- std::uint32_t low_bound,
- std::uint32_t upper_bound
Set the number of input arguments taken by the task.
This call signifies that a task takes at least low_bound but no more than upper_bound number of input arguments. If upper_bound is UNBOUNDED, then the task takes at least low_bound number of arguments, but can take an unlimited number of arguments past that.
If given, upper_bound must be strictly greater than low_bound.
- Parameters:
low_bound – The lower bound on the number of input arguments.
upper_bound – The upper bound on the number of input arguments.
- Throws:
std::out_of_range – If upper_bound <= low_bound.
- Returns:
A reference to this.
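The bounded form can be sketched in the same spirit. This is a hypothetical model, not Legate's code: `ArgCountRange` and `kUnboundedArgs` are illustrative names, and only the documented precondition (`std::out_of_range` when `upper_bound <= low_bound`) is taken from the text above.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for TaskSignature::UNBOUNDED.
constexpr std::uint32_t kUnboundedArgs = std::numeric_limits<std::uint32_t>::max();

// Hypothetical model of a [low_bound, upper_bound] argument-count specification.
struct ArgCountRange {
  std::uint32_t low;
  std::uint32_t high;

  ArgCountRange(std::uint32_t low_bound, std::uint32_t upper_bound)
    : low{low_bound}, high{upper_bound}
  {
    // Mirrors the documented precondition: upper_bound must be strictly greater.
    if (upper_bound <= low_bound) {
      throw std::out_of_range{"upper_bound must be strictly greater than low_bound"};
    }
  }

  // Returns true if `actual` arguments fall within the declared range.
  bool accepts(std::uint32_t actual) const noexcept
  {
    if (actual < low) {
      return false;
    }
    // An UNBOUNDED upper bound places no limit above `low`.
    return high == kUnboundedArgs || actual <= high;
  }
};
```

For example, `ArgCountRange{3, 5}` accepts 3, 4, or 5 arguments, while `ArgCountRange{1, kUnboundedArgs}` accepts any count of at least 1.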
-
TaskSignature &outputs(std::uint32_t n) noexcept
Set the number of output arguments taken by the task.
If n is UNBOUNDED, it signifies that the task takes a variable (or possibly unknown) number of output arguments. Otherwise, this call signifies that a task takes exactly n output arguments, no more, no less.
- Parameters:
n – The argument specification.
- Returns:
A reference to this.
- TaskSignature &outputs(
- std::uint32_t low_bound,
- std::uint32_t upper_bound
Set the number of output arguments taken by the task.
This call signifies that a task takes at least low_bound but no more than upper_bound number of output arguments. If upper_bound is UNBOUNDED, then the task takes at least low_bound number of arguments, but can take an unlimited number of arguments past that.
If given, upper_bound must be strictly greater than low_bound.
- Parameters:
low_bound – The lower bound on the number of output arguments.
upper_bound – The upper bound on the number of output arguments.
- Throws:
std::out_of_range – If upper_bound <= low_bound.
- Returns:
A reference to this.
-
TaskSignature &scalars(std::uint32_t n) noexcept
Set the number of scalar arguments taken by the task.
If n is UNBOUNDED, it signifies that the task takes a variable (or possibly unknown) number of scalar arguments. Otherwise, this call signifies that a task takes exactly n scalar arguments, no more, no less.
- Parameters:
n – The argument specification.
- Returns:
A reference to this.
- TaskSignature &scalars(
- std::uint32_t low_bound,
- std::uint32_t upper_bound
Set the number of scalar arguments taken by the task.
This call signifies that a task takes at least low_bound but no more than upper_bound number of scalar arguments. If upper_bound is UNBOUNDED, then the task takes at least low_bound number of arguments, but can take an unlimited number of arguments past that.
If given, upper_bound must be strictly greater than low_bound.
- Parameters:
low_bound – The lower bound on the number of scalar arguments.
upper_bound – The upper bound on the number of scalar arguments.
- Throws:
std::out_of_range – If upper_bound <= low_bound.
- Returns:
A reference to this.
-
TaskSignature &redops(std::uint32_t n) noexcept
Set the number of redop arguments taken by the task.
If n is UNBOUNDED, it signifies that the task takes a variable (or possibly unknown) number of redop arguments. Otherwise, this call signifies that a task takes exactly n redop arguments, no more, no less.
- Parameters:
n – The argument specification.
- Returns:
A reference to this.
- TaskSignature &redops(
- std::uint32_t low_bound,
- std::uint32_t upper_bound
Set the number of redop arguments taken by the task.
This call signifies that a task takes at least low_bound but no more than upper_bound number of redop arguments. If upper_bound is UNBOUNDED, then the task takes at least low_bound number of arguments, but can take an unlimited number of arguments past that.
If given, upper_bound must be strictly greater than low_bound.
- Parameters:
low_bound – The lower bound on the number of redop arguments.
upper_bound – The upper bound on the number of redop arguments.
- Throws:
std::out_of_range – If upper_bound <= low_bound.
- Returns:
A reference to this.
- TaskSignature &constraints(
- std::optional<Span<const ProxyConstraint>> constraints
Set the constraints imposed on task arguments.
Passing std::nullopt vs passing an empty range has different meanings:
If std::nullopt is passed, this is taken to mean that an unknown number of dynamic constraints (of that type) may be imposed on the task during launch.
Passing an empty range signifies that exactly 0 constraints of the given type must be imposed on the task during launch.
If any constraints are imposed via the use of this API (including empty ranges), tasks are no longer allowed to add constraints dynamically during task construction. AutoTask::add_constraint() will raise an exception in this case.
- Parameters:
constraints – The constraints, or std::nullopt if none are imposed.
- Returns:
A reference to this.
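The difference between std::nullopt and an empty range can be illustrated with a small model. This is a sketch under stated assumptions, not Legate's implementation: `ConstraintSpec` is a hypothetical class, constraints are represented as plain strings, and the use of `std::runtime_error` is an assumption (the text above only says that an exception is raised).

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model; constraints are plain strings for illustration only.
class ConstraintSpec {
 public:
  // std::nullopt: the constraints are unknown and may be added later.
  // Any range (even an empty one): the set of constraints is fixed.
  explicit ConstraintSpec(std::optional<std::vector<std::string>> declared)
    : declared_{std::move(declared)}
  {
  }

  // Models the documented behavior of AutoTask::add_constraint().
  // The exception type is an assumption of this sketch.
  void add_dynamic(std::string constraint)
  {
    if (declared_.has_value()) {
      throw std::runtime_error{"constraints were predeclared; cannot add more"};
    }
    dynamic_.push_back(std::move(constraint));
  }

  std::size_t num_dynamic() const noexcept { return dynamic_.size(); }

 private:
  std::optional<std::vector<std::string>> declared_;
  std::vector<std::string> dynamic_;
};
```

In this model, `ConstraintSpec{std::nullopt}` accepts later calls to `add_dynamic()`, whereas `ConstraintSpec{std::vector<std::string>{}}` rejects them even though it declares zero constraints.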
Public Static Attributes
-
static auto UNBOUNDED = std::numeric_limits<std::uint32_t>::max()
A value indicating that a particular option has “unbounded” (or unknown) number of possibilities.
This is commonly used for e.g. inputs(), outputs(), scalars(), or redops() when a task takes an unknown number of arguments, or when the upper limit on the number of arguments is unknown.
-
class VariantInfo
- #include <legate/task/variant_info.h>
A class describing the various properties of a task variant.
Public Functions
-
const VariantOptions &options() const noexcept
- Returns:
The variant options set for this variant.
-
class VariantOptions
- #include <legate/task/variant_options.h>
A helper class for specifying variant options.
Public Functions
-
VariantOptions &with_concurrent(bool concurrent)
Changes the value of the concurrent flag.
- Parameters:
concurrent – A new value for the concurrent flag.
-
VariantOptions &with_has_allocations(bool has_allocations)
Changes the value of the has_allocations flag.
- Parameters:
has_allocations – A new value for the has_allocations flag.
-
VariantOptions &with_elide_device_ctx_sync(bool elide_sync)
Sets whether the variant can elide device context synchronization after task completion.
- Parameters:
elide_sync – true if this variant can skip synchronizing the device context after task completion, false otherwise.
- Returns:
A reference to this.
-
VariantOptions &with_has_side_effect(bool side_effect)
Sets whether the variant has side effects.
- Parameters:
side_effect – true if the task has side-effects, false otherwise.
- Returns:
A reference to this.
-
VariantOptions &with_may_throw_exception(bool may_throw)
Sets whether the variant may throw exceptions.
- Parameters:
may_throw – true if the variant may throw exceptions, false otherwise.
- Returns:
A reference to this.
- inline VariantOptions &with_communicators(
- std::initializer_list<std::string_view> comms
Sets the communicator(s) for the variant.
This call implies concurrent = true as well.
The VariantOptions does not take ownership of comms in any way. If comms are not constructed from a string-literal, or some other object with static storage duration, then the user must ensure that the string(s) outlive this object.
Due to limitations with constexpr in C++17, the user may register at most MAX_COMMS communicators. This restriction is expected to be lifted in the future.
- Parameters:
comms – The communicator(s) to use.
- Returns:
A reference to this.
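The ownership and capacity rules of with_communicators() can be sketched with a simplified model. `MiniVariantOptions` is a hypothetical class, not Legate's `VariantOptions`; in particular, throwing `std::out_of_range` when more than `MAX_COMMS` names are passed is an assumption of this sketch, since the text above does not say how the limit is enforced. The communicator names used in the usage note are likewise illustrative.

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <stdexcept>
#include <string_view>

// Hypothetical, simplified model of the documented behavior.
struct MiniVariantOptions {
  static constexpr std::size_t MAX_COMMS = 3;

  bool concurrent = false;
  std::optional<std::array<std::string_view, MAX_COMMS>> communicators{};

  MiniVariantOptions &with_communicators(std::initializer_list<std::string_view> comms)
  {
    // Assumption of this sketch: exceeding MAX_COMMS is reported by throwing.
    if (comms.size() > MAX_COMMS) {
      throw std::out_of_range{"too many communicators"};
    }
    std::array<std::string_view, MAX_COMMS> stored{};
    std::size_t i = 0;
    for (std::string_view name : comms) {
      stored[i++] = name;  // stores a view only; the characters are not copied
    }
    communicators = stored;
    concurrent    = true;  // adding communicators implies a concurrent launch
    return *this;
  }
};
```

Calling `MiniVariantOptions{}.with_communicators({"nccl", "cpu"})` with string literals is safe because literals have static storage duration. Passing a view into a temporary `std::string` would leave a dangling `std::string_view` once that string is destroyed, which is the lifetime hazard the text above warns about.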
- void populate_registrar(
- Legion::TaskVariantRegistrar &registrar
Populate a Legion::TaskVariantRegistrar using the options contained.
- Parameters:
registrar – The registrar to fill out.
Public Members
-
bool concurrent = {false}
Whether the variant needs a concurrent task launch. false by default.
Normally, leaf tasks (i.e. all individual task instances created by a single launch) are allowed to execute in any order so long as their preconditions are met. For example, if a task is launched that creates 100 leaf tasks, those tasks can execute at any time so long as each individual task’s inputs are satisfied. It is even possible to have other leaf tasks (from other tasks) executing at the same time or between them.
Setting concurrent to true says: if this task is parallelized, then all leaf tasks must execute concurrently. Note, concurrency is a requirement, not a grant. The entire machine must execute the tasks at exactly the same time as one giant block. No other tasks marked concurrent may execute at the same time.
Setting concurrent to false (the default) says: the task can execute as normal. The leaf tasks can execute in any order.
This feature is most often used when doing collective communications (i.e. all-reduce, all-gather) inside the tasks. In this case, the tasks need to execute in lockstep because otherwise deadlocks may occur.
Suppose there are 2 tasks (A and B) that do collectives. If they execute without concurrency, it is possible for half of the “task A” tasks and half of the “task B” tasks to be running at the same time. Eventually each of those tasks will reach a point where they must all-gather. The program would deadlock because both sides would be waiting for the communication that would never be able to finish.
For this reason, adding any communicators (see communicators) automatically implies concurrent = true.
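The deadlock scenario above can be captured in a toy model. This is a hypothetical illustration only, unrelated to how Legate or Legion actually schedule tasks: `collective_can_finish` and `deadlocks` are made-up helpers that assume a collective completes only when every leaf task of its launch is running at once.

```cpp
#include <cstdint>

// Hypothetical model: a collective inside a launch finishes only if every
// leaf task of that launch is running at the same time.
constexpr bool collective_can_finish(std::uint32_t running_leaves,
                                     std::uint32_t total_leaves) noexcept
{
  return running_leaves == total_leaves;
}

// Hypothetical model of a machine with `slots` processors shared by two
// launches, A and B. The schedule deadlocks when every slot is occupied by a
// leaf that is blocked in a collective which cannot finish.
constexpr bool deadlocks(std::uint32_t slots,
                         std::uint32_t running_a,
                         std::uint32_t total_a,
                         std::uint32_t running_b,
                         std::uint32_t total_b) noexcept
{
  return running_a + running_b == slots &&
         !collective_can_finish(running_a, total_a) &&
         !collective_can_finish(running_b, total_b);
}
```

With 4 slots and two launches of 4 leaves each, `deadlocks(4, 2, 4, 2, 4)` is true: half of A and half of B occupy the machine and neither all-gather can complete. Running all of A first, `deadlocks(4, 4, 4, 0, 4)`, is false, which is the schedule that a concurrent launch enforces.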
-
bool has_allocations = {false}
If the flag is true, the variant is allowed to create buffers (temporary or output) during execution. false by default.
-
bool elide_device_ctx_sync = {}
Whether this variant can skip device context synchronization after completion.
Normally, for device-enabled task variants, Legate will emit a device-wide barrier to ensure that all outstanding (potentially asynchronous) work performed by the variant has completed. However, if the task launches no such work, or if that work is launched using the task-specific device streams, then such a context synchronization is not necessary.
Setting this value to true ensures that no context synchronization is performed. Setting it to false guarantees that a context synchronization is done.
Has no effect on non-device variants (for example CPU variants).
-
bool has_side_effect = {}
Indicates whether a task has side effects outside of the runtime’s tracking that forbid the runtime from replicating it.
When a task only takes scalar stores, it gets replicated by default on all the ranks, as that’s more efficient than having only one of the ranks run it and broadcast the results.
However, sometimes a task may have “side effects” (which are outside the runtime’s tracking) which should otherwise forbid the runtime from replicating a particular variant.
For example, the task may write something to disk, or effect some other kind of permanent change to the system. In these cases the runtime must not replicate the task, as the effect must occur exactly once.
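The replication rule described above reduces to a simple predicate. The helper below is a hypothetical summary of the text, not a function provided by Legate; `may_replicate` and its parameter names are illustrative.

```cpp
// Hypothetical decision rule: a task is a candidate for replication across
// ranks only if it takes nothing but scalar stores and declares no side effects.
constexpr bool may_replicate(bool only_scalar_stores, bool has_side_effect) noexcept
{
  return only_scalar_stores && !has_side_effect;
}
```

For a task that writes a file to disk, setting has_side_effect to true makes `may_replicate(true, true)` false, so the write happens exactly once.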
-
bool may_throw_exception = {}
Whether this variant may throw an exception.
Tasks that throw exceptions must be handled specially by the runtime in order to safely and correctly propagate the thrown exceptions. For this reason, tasks must explicitly declare whether they throw an exception.
Warning
This special handling usually comes with severe performance penalties. For example, the runtime may block the calling thread (i.e. the main thread) on the completion of the possibly throwing task, or may opt not to schedule any other tasks concurrently.
Warning
It is highly recommended that tasks do not throw exceptions, and instead indicate an error state in some other way. Exceptions should be used as an absolute last resort.
-
std::optional<std::array<std::string_view, MAX_COMMS>> communicators = {}
The communicator(s) to be used by the variant, or std::nullopt if no communicator is to be used.
Setting this to anything other than std::nullopt implies that concurrent is true.
Public Static Attributes
-
static auto MAX_COMMS = 3
The maximum number of communicators allowed per variant.
This is a workaround for insufficient constexpr support in C++17 and will be removed in a future release.
-
class WithCommunicatorsAccessKey