Runtime#
- group runtime
Runtime and Library contexts for the management and launching of tasks.
Enums
-
enum class ExceptionMode : std::uint8_t#
Enum for exception handling modes.
Values:
-
enumerator IMMEDIATE#
Handles exceptions immediately. Any throwable task blocks until completion.
-
enumerator DEFERRED#
Defers all exceptions until the current scope exits.
-
enumerator IGNORED#
All exceptions are ignored.
Functions
-
std::int32_t start(std::int32_t argc, char *argv[])#
Starts the Legate runtime.
- Deprecated:
Use the argument-less overload, start(), instead.
- Parameters:
argc – Argument is ignored.
argv – Argument is ignored.
- Returns:
Always returns 0
-
void start()#
Starts the Legate runtime.
This makes the runtime ready to accept requests made via its APIs. It may be called any number of times, but only the first call has any effect.
- Throws:
ConfigurationError – If runtime configuration fails.
AutoConfigurationError – If the automatic configuration heuristics fail.
-
bool has_started()#
Checks if the runtime has started.
- Returns:
true if the runtime has started, false if the runtime has not started yet or after finish() is called.
-
bool has_finished()#
Checks if the runtime has finished.
- Returns:
true if finish() has been called, false otherwise.
-
std::int32_t finish()#
Waits for the runtime to finish.
Client code must call this to make sure that all Legate tasks run.
- Returns:
Non-zero value when the runtime encountered a failure, 0 otherwise
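The lifecycle rules stated for start(), has_started(), has_finished() and finish() can be summarized with a small standalone model. This is only a sketch: LifecycleModel is a hypothetical stand-in, not Legate code, and what start() does after finish() is not specified above, so the model simply ignores such a call.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's lifecycle state; not Legate code.
class LifecycleModel {
 public:
  // Mirrors start(): only the first call has any effect.
  void start() {
    if (state_ == State::NOT_STARTED) {
      state_ = State::RUNNING;
      ++effective_starts_;
    }
  }

  // Mirrors finish(): this model always succeeds and returns 0.
  int finish() {
    state_ = State::FINISHED;
    return 0;
  }

  // true only between start() and finish().
  bool has_started() const { return state_ == State::RUNNING; }

  // true once finish() has been called.
  bool has_finished() const { return state_ == State::FINISHED; }

  int effective_starts() const { return effective_starts_; }

 private:
  enum class State { NOT_STARTED, RUNNING, FINISHED };
  State state_{State::NOT_STARTED};
  int effective_starts_{0};
};

// Walks through the documented state transitions.
inline void lifecycle_example() {
  LifecycleModel rt;
  assert(!rt.has_started() && !rt.has_finished());
  rt.start();
  rt.start();  // repeated calls are harmless
  assert(rt.has_started() && rt.effective_starts() == 1);
  assert(rt.finish() == 0);
  assert(!rt.has_started() && rt.has_finished());
}
```

In real client code the same sequence would be a call to start(), the submission of work, and a final call to finish() whose return value becomes the process exit code.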
-
void destroy()#
-
template<typename T>
void register_shutdown_callback(T &&callback)#
Registers a callback that should be invoked during the runtime shutdown.
Callbacks are invoked before the core library and the runtime are destroyed, and they must not throw. Registrations are not deduplicated, so clients are responsible for registering a callback only once if it is meant to be invoked only once. Callbacks are invoked in FIFO order, so a callback registered by another callback is appended to the end of the list. Callbacks may launch tasks; the runtime ensures those tasks complete before it proceeds with its shutdown.
- Parameters:
callback – A shutdown callback
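The ordering rules above (FIFO invocation, no deduplication, nested registrations appended to the end) can be illustrated with a standalone model. ShutdownCallbacks is a hypothetical stand-in written for this sketch; it does not use or reproduce Legate's implementation.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime's callback list; not Legate code.
class ShutdownCallbacks {
 public:
  // Appends the callback to the end of the list. No deduplication is done.
  template <typename T>
  void register_shutdown_callback(T&& callback) {
    callbacks_.emplace_back(std::forward<T>(callback));
  }

  // Invokes callbacks in FIFO order. A callback registered while another one
  // is running lands at the end of the queue and is still invoked.
  void run_all() {
    while (!callbacks_.empty()) {
      auto callback = std::move(callbacks_.front());
      callbacks_.pop_front();
      callback();  // callbacks are expected not to throw
    }
  }

 private:
  std::deque<std::function<void()>> callbacks_;
};

// Returns the order in which the callbacks ran.
inline std::vector<std::string> shutdown_order_example() {
  ShutdownCallbacks callbacks;
  std::vector<std::string> log;
  callbacks.register_shutdown_callback([&] {
    log.push_back("first");
    callbacks.register_shutdown_callback([&] { log.push_back("nested"); });
  });
  callbacks.register_shutdown_callback([&] { log.push_back("second"); });
  callbacks.register_shutdown_callback([&] { log.push_back("second"); });
  callbacks.run_all();
  assert(log.size() == 4);  // the duplicate registration ran twice
  return log;
}
```

In this model the nested callback runs last, after both copies of the duplicated callback.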
-
mapping::Machine get_machine()#
Returns the machine for the current scope.
- Returns:
Machine object
-
bool is_running_in_task()#
Checks if the code is running in a task.
- Returns:
true if the code is running in a task, false otherwise.
-
class Library#
- #include <legate/runtime/library.h>
A library class that provides APIs for registering components.
Public Functions
-
std::string_view get_task_name(LocalTaskID local_task_id) const#
Returns the name of a task.
- Parameters:
local_task_id – Task id
- Returns:
Name of the task
-
template<typename REDOP>
GlobalRedopID register_reduction_operator(
- LocalRedopID redop_id,
Registers a library specific reduction operator.
The type parameter REDOP points to a class that implements a reduction operator. Each reduction operator class has the following structure:

    struct RedOp {
      using LHS = ...; // Type of the LHS values
      using RHS = ...; // Type of the RHS values

      static const RHS identity = ...; // Identity of the reduction operator

      template <bool EXCLUSIVE>
      LEGATE_HOST_DEVICE inline static void apply(LHS& lhs, RHS rhs) { ... }

      template <bool EXCLUSIVE>
      LEGATE_HOST_DEVICE inline static void fold(RHS& rhs1, RHS rhs2) { ... }
    };

Semantically, Legate performs reductions of values V0, …, Vn to element E in the following way:

    RHS T = RedOp::identity;
    RedOp::fold(T, V0)
    ...
    RedOp::fold(T, Vn)
    RedOp::apply(E, T)

I.e., Legate gathers all reduction contributions using fold and applies the accumulator to the element using apply.
Oftentimes, the LHS and RHS of a reduction operator are the same type and fold and apply perform the same computation, but that’s not mandatory. For example, one may implement a reduction operator for subtraction, where fold would sum up all RHS values whereas apply would subtract the aggregate value from the LHS.
The reduction operator ID (redop_id) can be local to the library but should be unique for each operator within the library.
Finally, the contract for apply and fold is that they must update the reference atomically when EXCLUSIVE is false.
Warning
Because the runtime can capture the reduction operator and wrap it with CUDA boilerplate only at compile time, the registration call should be made in a .cu file that would be compiled by NVCC. Otherwise, the runtime would register the reduction operator in CPU-only mode, which can degrade the performance when the program performs reductions on non-scalar stores.
- Template Parameters:
REDOP – Reduction operator to register
- Parameters:
redop_id – Library-local reduction operator ID
- Returns:
Global reduction operator ID
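The subtraction example mentioned above can be sketched as a standalone operator that follows the documented structure. Several things here are assumptions made for the sketch: SubtractRedOp and reduce_into are hypothetical names, LEGATE_HOST_DEVICE is defined as an empty stand-in so the code compiles without Legate, and only the EXCLUSIVE = true path is implemented (the atomic path required when EXCLUSIVE is false is deliberately left out).

```cpp
#include <cassert>
#include <vector>

// Empty stand-in so this sketch compiles without Legate headers.
#ifndef LEGATE_HOST_DEVICE
#define LEGATE_HOST_DEVICE
#endif

// Hypothetical subtraction operator: fold sums contributions, apply subtracts.
struct SubtractRedOp {
  using LHS = int;
  using RHS = int;

  static constexpr RHS identity = 0;  // identity of the fold (a sum)

  template <bool EXCLUSIVE>
  LEGATE_HOST_DEVICE inline static void apply(LHS& lhs, RHS rhs) {
    static_assert(EXCLUSIVE, "sketch omits the atomic (non-exclusive) path");
    lhs -= rhs;
  }

  template <bool EXCLUSIVE>
  LEGATE_HOST_DEVICE inline static void fold(RHS& rhs1, RHS rhs2) {
    static_assert(EXCLUSIVE, "sketch omits the atomic (non-exclusive) path");
    rhs1 += rhs2;
  }
};

// Replays the documented semantics: fold every contribution into an
// accumulator that starts at the identity, then apply it to the element.
template <typename RedOp>
typename RedOp::LHS reduce_into(typename RedOp::LHS element,
                                const std::vector<typename RedOp::RHS>& values) {
  typename RedOp::RHS accumulator = RedOp::identity;
  for (const auto& value : values) {
    RedOp::template fold<true>(accumulator, value);
  }
  RedOp::template apply<true>(element, accumulator);
  return element;
}

inline void subtract_example() {
  // 10 - (1 + 2 + 3)
  assert(reduce_into<SubtractRedOp>(10, {1, 2, 3}) == 4);
  // No contributions: the identity leaves the element unchanged.
  assert(reduce_into<SubtractRedOp>(10, {}) == 10);
}
```

An operator meant for registration would additionally need atomic updates for the non-exclusive case, as the contract above requires.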
- void register_task(
- LocalTaskID local_task_id,
- const TaskInfo &task_info,
Register a task with the library.
- Parameters:
local_task_id – The library-local task ID to assign for this task.
task_info – The TaskInfo object describing the task.
- Throws:
std::out_of_range – If the chosen local task ID exceeds the maximum local task ID for the library.
std::invalid_argument – If the task (or another task with the same local_task_id) has already been registered with the library.
-
struct ResourceConfig#
- #include <legate/runtime/resource.h>
POD for library configuration.
Public Members
-
std::int64_t max_tasks = {1024}#
Maximum number of tasks that the library can register.
-
std::int64_t max_dyn_tasks = {0}#
Maximum number of dynamic tasks that the library can register. This value cannot exceed max_tasks.
-
std::int64_t max_reduction_ops = {}#
Maximum number of custom reduction operators that the library can register.
-
class ConfigurationError : public std::runtime_error#
- #include <legate/runtime/runtime.h>
Exception thrown during Legate startup when configuration fails.
This exception implies that the Legate runtime failed to start. The error behind this exception is most likely not recoverable, and restarting the Legate runtime in the same process will likely fail.
The underlying issue is likely that the caller requested a resource that does not exist on the current machine, or is not supported by the current build of Legate (e.g. requested GPUs in a CPU-only build of Legate). The caller should adjust the options specified in LEGATE_CONFIG before restarting the application and calling legate::start again.
Public Functions
-
explicit ConfigurationError(std::string_view msg)#
Create a ConfigurationError with the given explanatory message.
- Parameters:
msg – The explanatory message
-
class AutoConfigurationError : public std::runtime_error#
- #include <legate/runtime/runtime.h>
Exception thrown during Legate startup when the automatic configuration heuristics fail.
This exception implies that the Legate runtime failed to start. The error behind this exception is most likely not recoverable, and restarting the Legate runtime in the same process will likely fail.
The underlying issue is that Legate was unable to synthesize a suitable configuration, either because hardware detection failed, or the detected resources were not enough to compute a sane configuration. The caller should manually specify the configuration using LEGATE_CONFIG, and/or disable automatic configuration altogether with LEGATE_AUTO_CONFIG=0, before restarting the application and calling legate::start again.
Public Functions
-
explicit AutoConfigurationError(std::string_view msg)#
Create an AutoConfigurationError with the given explanatory message.
- Parameters:
msg – The explanatory message
-
class Runtime#
- #include <legate/runtime/runtime.h>
Class that implements the Legate runtime.
The Legate runtime provides common services, including library registration, store creation, operator creation and submission, resource management and scoping, and communicator management. Legate libraries are thereby freed from the details of distributed programming and can focus on their domain logic.
Public Functions
- Library create_library(
- std::string_view library_name,
- const ResourceConfig &config = ResourceConfig{},
- std::unique_ptr<mapping::Mapper> mapper = nullptr,
- std::map<VariantCode, VariantOptions> default_options = {},
Creates a library.
A library is a collection of tasks and custom reduction operators. The maximum number of tasks and reduction operators can be optionally specified with a ResourceConfig object. Each library can optionally have a mapper that specifies mapping policies for its tasks. When no mapper is given, the default mapper is used.
- std::optional<Library> maybe_find_library(
- std::string_view library_name,
Attempts to find a library.
If no library exists for the given name, an empty std::optional is returned.
- Library find_or_create_library(
- std::string_view library_name,
- const ResourceConfig &config = ResourceConfig{},
- std::unique_ptr<mapping::Mapper> mapper = nullptr,
- const std::map<VariantCode, VariantOptions> &default_options = {},
- bool *created = nullptr,
Finds or creates a library.
The optional configuration and mapper objects are picked up only when the library is created.
- Parameters:
library_name – Library name. Must be unique to this library
config – Optional configuration object
mapper – Optional mapper object
default_options – Optional default task variant options
created – Optional pointer to a boolean flag indicating whether the library has been created because of this call
- Returns:
Context object for the library
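The rule that the configuration is picked up only when the library is created, together with the role of the created flag, can be sketched with a standalone registry. ConfigModel and LibraryRegistryModel are hypothetical stand-ins for this illustration; only the max_tasks field name is borrowed from ResourceConfig.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for ResourceConfig; not Legate code.
struct ConfigModel {
  std::int64_t max_tasks{1024};
};

// Hypothetical stand-in for the runtime's library table; not Legate code.
class LibraryRegistryModel {
 public:
  // Returns the stored configuration for `name`, creating the entry from
  // `config` only if it does not exist yet.
  const ConfigModel& find_or_create_library(const std::string& name,
                                            const ConfigModel& config = ConfigModel{},
                                            bool* created = nullptr) {
    // emplace leaves an existing entry untouched; `second` reports insertion.
    auto result = libraries_.emplace(name, config);
    if (created != nullptr) {
      *created = result.second;
    }
    return result.first->second;
  }

 private:
  std::map<std::string, ConfigModel> libraries_;
};

inline void find_or_create_example() {
  LibraryRegistryModel registry;
  bool created = false;

  ConfigModel small_config;
  small_config.max_tasks = 8;
  assert(registry.find_or_create_library("mylib", small_config, &created).max_tasks == 8);
  assert(created);

  // The second call finds the existing library; the new config is ignored.
  ConfigModel large_config;
  large_config.max_tasks = 64;
  assert(registry.find_or_create_library("mylib", large_config, &created).max_tasks == 8);
  assert(!created);
}
```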
-
AutoTask create_task(Library library, LocalTaskID task_id)#
Creates an AutoTask.
- Parameters:
library – Library to query the task
task_id – Library-local Task ID
- Returns:
Task object
- ManualTask create_task(
- Library library,
- LocalTaskID task_id,
- const tuple<std::uint64_t> &launch_shape,
Creates a ManualTask.
- Parameters:
library – Library to query the task
task_id – Library-local Task ID
launch_shape – Launch domain for the task
- Returns:
Task object
- ManualTask create_task( )#
Creates a ManualTask.
This overload should be used when the lower bounds of the task’s launch domain should be non-zero. Note that the upper bounds of the launch domain are inclusive (whereas the launch_shape in the other overload is exclusive).
- Parameters:
library – Library to query the task
task_id – Library-local Task ID
launch_domain – Launch domain for the task
- Returns:
Task object
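The difference between an exclusive launch_shape and a launch domain with inclusive upper bounds can be made concrete by counting points. The helpers below are hypothetical and use plain vectors rather than Legate's tuple or domain types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of points in a launch shape: dimension d spans [0, extent), so the
// extent itself is excluded.
inline std::uint64_t volume_of_shape(const std::vector<std::uint64_t>& launch_shape) {
  std::uint64_t volume = 1;
  for (auto extent : launch_shape) {
    volume *= extent;
  }
  return volume;
}

// Number of points in a domain whose bounds [lo, hi] are both inclusive.
inline std::uint64_t volume_of_domain(const std::vector<std::int64_t>& lo,
                                      const std::vector<std::int64_t>& hi) {
  assert(lo.size() == hi.size());
  std::uint64_t volume = 1;
  for (std::size_t d = 0; d < lo.size(); ++d) {
    volume *= (hi[d] >= lo[d]) ? static_cast<std::uint64_t>(hi[d] - lo[d] + 1) : 0;
  }
  return volume;
}

inline void launch_bounds_example() {
  // Shape (2, 3) covers points (0,0) .. (1,2): six points.
  assert(volume_of_shape({2, 3}) == 6);
  // The same six points written as an inclusive domain.
  assert(volume_of_domain({0, 0}, {1, 2}) == 6);
  // A domain with non-zero lower bounds, which a shape cannot express.
  assert(volume_of_domain({1, 1}, {2, 3}) == 6);
}
```

Passing the extents of a shape as the upper bounds of a domain would therefore launch one extra point per dimension.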
- void issue_copy(
- LogicalStore &target,
- const LogicalStore &source,
- std::optional<ReductionOpKind> redop_kind = std::nullopt,
Issues a copy between stores.
The source and target stores must have the same shape.
- Parameters:
target – Copy target
source – Copy source
redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator.
- Throws:
std::invalid_argument – If the store’s type doesn’t support the reduction operator
- void issue_copy(
- LogicalStore &target,
- const LogicalStore &source,
- std::optional<std::int32_t> redop_kind,
Issues a copy between stores.
The source and target stores must have the same shape.
- Parameters:
target – Copy target
source – Copy source
redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator.
- Throws:
std::invalid_argument – If the store’s type doesn’t support the reduction operator
- void issue_gather(
- LogicalStore &target,
- const LogicalStore &source,
- const LogicalStore &source_indirect,
- std::optional<ReductionOpKind> redop_kind = std::nullopt,
Issues a gather copy between stores.
The indirection store and the target store must have the same shape.
- Parameters:
target – Copy target
source – Copy source
source_indirect – Store for source indirection
redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator.
- Throws:
std::invalid_argument – If the store’s type doesn’t support the reduction operator
- void issue_gather(
- LogicalStore &target,
- const LogicalStore &source,
- const LogicalStore &source_indirect,
- std::optional<std::int32_t> redop_kind,
Issues a gather copy between stores.
The indirection store and the target store must have the same shape.
- Parameters:
target – Copy target
source – Copy source
source_indirect – Store for source indirection
redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator.
- Throws:
std::invalid_argument – If the store’s type doesn’t support the reduction operator
- void issue_scatter(
- LogicalStore &target,
- const LogicalStore &target_indirect,
- const LogicalStore &source,
- std::optional<ReductionOpKind> redop_kind = std::nullopt,
Issues a scatter copy between stores.
The indirection store and the source store must have the same shape.
- Parameters:
target – Copy target
target_indirect – Store for target indirection
source – Copy source
redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator.
- Throws:
std::invalid_argument – If the store’s type doesn’t support the reduction operator
- void issue_scatter(
- LogicalStore &target,
- const LogicalStore &target_indirect,
- const LogicalStore &source,
- std::optional<std::int32_t> redop_kind,
Issues a scatter copy between stores.
The indirection store and the source store must have the same shape.
- Parameters:
target – Copy target
target_indirect – Store for target indirection
source – Copy source
redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator.
- Throws:
std::invalid_argument – If the store’s type doesn’t support the reduction operator
- void issue_scatter_gather(
- LogicalStore &target,
- const LogicalStore &target_indirect,
- const LogicalStore &source,
- const LogicalStore &source_indirect,
- std::optional<ReductionOpKind> redop_kind = std::nullopt,
Issues a scatter-gather copy between stores.
The indirection stores must have the same shape.
- Parameters:
target – Copy target
target_indirect – Store for target indirection
source – Copy source
source_indirect – Store for source indirection
redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator.
- Throws:
std::invalid_argument – If the store’s type doesn’t support the reduction operator
- void issue_scatter_gather(
- LogicalStore &target,
- const LogicalStore &target_indirect,
- const LogicalStore &source,
- const LogicalStore &source_indirect,
- std::optional<std::int32_t> redop_kind,
Issues a scatter-gather copy between stores.
The indirection stores must have the same shape.
- Parameters:
target – Copy target
target_indirect – Store for target indirection
source – Copy source
source_indirect – Store for source indirection
redop_kind – ID of the reduction operator to use (optional). The store’s type must support the operator.
- Throws:
std::invalid_argument – If the store’s type doesn’t support the reduction operator
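The shape requirements of the gather, scatter and scatter-gather copies follow from which store is indexed indirectly. The 1-D sketch below assumes point-wise indirection (each entry of an indirection store names an element of the other store) and uses plain vectors with integer indices instead of logical stores. Both choices are simplifications made for illustration and are not taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather: target[i] = source[source_indirect[i]].
// The indirection has one entry per target element, hence the same shape.
inline void gather(std::vector<int>& target, const std::vector<int>& source,
                   const std::vector<std::size_t>& source_indirect) {
  assert(source_indirect.size() == target.size());
  for (std::size_t i = 0; i < target.size(); ++i) {
    target[i] = source.at(source_indirect[i]);
  }
}

// Scatter: target[target_indirect[i]] = source[i].
// The indirection has one entry per source element, hence the same shape.
inline void scatter(std::vector<int>& target,
                    const std::vector<std::size_t>& target_indirect,
                    const std::vector<int>& source) {
  assert(target_indirect.size() == source.size());
  for (std::size_t i = 0; i < source.size(); ++i) {
    target.at(target_indirect[i]) = source[i];
  }
}

// Scatter-gather: target[target_indirect[i]] = source[source_indirect[i]].
// Only the two indirections need to agree in shape.
inline void scatter_gather(std::vector<int>& target,
                           const std::vector<std::size_t>& target_indirect,
                           const std::vector<int>& source,
                           const std::vector<std::size_t>& source_indirect) {
  assert(target_indirect.size() == source_indirect.size());
  for (std::size_t i = 0; i < target_indirect.size(); ++i) {
    target.at(target_indirect[i]) = source.at(source_indirect[i]);
  }
}

inline void indirect_copy_example() {
  const std::vector<int> source{10, 20, 30, 40};

  std::vector<int> gathered(3, 0);
  gather(gathered, source, {3, 0, 0});
  assert((gathered == std::vector<int>{40, 10, 10}));

  std::vector<int> scattered(4, 0);
  scatter(scattered, {2, 0}, {7, 9});
  assert((scattered == std::vector<int>{9, 0, 7, 0}));

  std::vector<int> both(3, 0);
  scatter_gather(both, {2, 1}, source, {0, 3});
  assert((both == std::vector<int>{0, 40, 10}));
}
```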
-
void issue_fill(const LogicalArray &lhs, const LogicalStore &value)#
Fills a given array with a constant.
- Parameters:
lhs – Logical array to fill
value – Logical store that contains the constant value to fill the array with
-
void issue_fill(const LogicalArray &lhs, const Scalar &value)#
Fills a given array with a constant.
- Parameters:
lhs – Logical array to fill
value – Value to fill the array with
- LogicalStore tree_reduce(
- Library library,
- LocalTaskID task_id,
- const LogicalStore &store,
- std::int32_t radix = 4,
Performs reduction on a given store via a task.
- Parameters:
library – The library for the reducer task
task_id – reduction task ID
store – Logical store to reduce
radix – Optional radix value that determines the maximum number of input stores to the task at each reduction step
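The effect of radix can be illustrated with a standalone reduction over a vector, where each call to the reducer sees at most radix inputs. This is only a model of the radix parameter: tree_reduce_sum and TreeReduceResult are hypothetical names, the reducer is hard-coded to a sum, and nothing here reflects how the runtime actually schedules the reduction tasks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

struct TreeReduceResult {
  int value;  // final reduced value
  int steps;  // number of reduction rounds that were needed
};

// Repeatedly reduces groups of at most `radix` inputs until one value is left.
inline TreeReduceResult tree_reduce_sum(std::vector<int> inputs, std::size_t radix = 4) {
  assert(radix >= 2 && !inputs.empty());
  int steps = 0;
  while (inputs.size() > 1) {
    std::vector<int> next;
    for (std::size_t begin = 0; begin < inputs.size(); begin += radix) {
      const std::size_t end = std::min(begin + radix, inputs.size());
      // One "reducer task": sums at most `radix` inputs.
      next.push_back(std::accumulate(inputs.begin() + static_cast<std::ptrdiff_t>(begin),
                                     inputs.begin() + static_cast<std::ptrdiff_t>(end), 0));
    }
    inputs = std::move(next);
    ++steps;
  }
  return {inputs.front(), steps};
}

inline void tree_reduce_example() {
  const std::vector<int> inputs{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  // Radix 4: 10 inputs -> 3 partial sums -> 1 value.
  const TreeReduceResult wide = tree_reduce_sum(inputs, 4);
  assert(wide.value == 55 && wide.steps == 2);
  // Radix 2: 10 -> 5 -> 3 -> 2 -> 1.
  const TreeReduceResult narrow = tree_reduce_sum(inputs, 2);
  assert(narrow.value == 55 && narrow.steps == 4);
}
```

A larger radix gives a shallower tree at the cost of more inputs per reducer invocation.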
-
void submit(AutoTask &&task)#
Submits an AutoTask for execution.
Each submitted operation goes through multiple pipeline steps to eventually get scheduled for execution. It’s not guaranteed that the submitted operation starts executing immediately.
The runtime takes the ownership of the submitted task. Once submitted, the task becomes invalid and is not reusable.
- Parameters:
task – An AutoTask to execute
-
void submit(ManualTask &&task)#
Submits a ManualTask for execution.
Each submitted operation goes through multiple pipeline steps to eventually get scheduled for execution. It’s not guaranteed that the submitted operation starts executing immediately.
The runtime takes the ownership of the submitted task. Once submitted, the task becomes invalid and is not reusable.
- Parameters:
task – A ManualTask to execute
- LogicalArray create_array(
- const Type &type,
- std::uint32_t dim = 1,
- bool nullable = false,
Creates an unbound array.
- Parameters:
type – Element type
dim – Number of dimensions
nullable – Nullability of the array
- Returns:
Logical array
- LogicalArray create_array( )#
Creates a normal array.
- Parameters:
shape – Shape of the array. The call does not block on this shape
type – Element type
nullable – Nullability of the array
optimize_scalar – When true, the runtime internally uses futures optimized for storing scalars
- Returns:
Logical array
- LogicalArray create_array_like(
- const LogicalArray &to_mirror,
- std::optional<Type> type = std::nullopt,
Creates an array isomorphic to the given array.
- Parameters:
to_mirror – The array whose shape would be used to create the output array. The call does not block on the array’s shape.
type – Optional type for the resulting array. Must be compatible with the input array’s type
- Returns:
Logical array isomorphic to the input
- StringLogicalArray create_string_array(
- const LogicalArray &descriptor,
- const LogicalArray &vardata,
Creates a string array from the existing sub-arrays.
The caller is responsible for making sure that the vardata sub-array is valid for all the descriptors in the descriptor sub-array
- Parameters:
descriptor – Sub-array for descriptors
vardata – Sub-array for characters
- Throws:
std::invalid_argument – When any of the following is true:
1) descriptor or vardata is unbound or N-D where N > 1
2) descriptor does not have a 1D rect type
3) vardata is nullable
4) vardata does not have an int8 type
- Returns:
String logical array
- ListLogicalArray create_list_array(
- const LogicalArray &descriptor,
- const LogicalArray &vardata,
- std::optional<Type> type = std::nullopt,
Creates a list array from the existing sub-arrays.
The caller is responsible for making sure that the vardata sub-array is valid for all the descriptors in the descriptor sub-array
- Parameters:
descriptor – Sub-array for descriptors
vardata – Sub-array for vardata
type – Optional list type the returned array would have
- Throws:
std::invalid_argument – When any of the following is true:
1) type is not a list type
2) descriptor or vardata is unbound or N-D where N > 1
3) descriptor does not have a 1D rect type
4) vardata is nullable
5) vardata and type have different element types
- Returns:
List logical array
-
LogicalStore create_store(const Type &type, std::uint32_t dim = 1)#
Creates an unbound store.
- Parameters:
type – Element type
dim – Number of dimensions of the store
- Returns:
Logical store
- LogicalStore create_store( )#
Creates a normal store.
- Parameters:
shape – Shape of the store. The call does not block on this shape.
type – Element type
optimize_scalar – When true, the runtime internally uses futures optimized for storing scalars
- Returns:
Logical store
- LogicalStore create_store( )#
Creates a normal store out of a Scalar object.
- Parameters:
scalar – Value of the scalar to create a store with
shape – Shape of the store. The volume must be 1. The call does not block on this shape.
- Returns:
Logical store
- LogicalStore create_store(
- const Shape &shape,
- const Type &type,
- void *buffer,
- bool read_only = true,
- const mapping::DimOrdering &ordering = mapping::DimOrdering::c_order(),
Creates a store by attaching to an existing allocation.
This call does not block waiting on the input shape.
- Parameters:
shape – Shape of the store. The call does not block on this shape.
type – Element type
buffer – Pointer to the beginning of the allocation to attach to; allocation must be contiguous, and cover the entire contents of the store (at least extents.volume() * type.size() bytes)
read_only – Whether the allocation is read-only
ordering – In what order the elements are laid out in the passed buffer
- Returns:
Logical store
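The size requirement on buffer and the default C ordering can be checked with two small helpers. These are hypothetical functions over plain vectors, written on the assumption that C order means the last dimension varies fastest. They do not call into Legate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimum allocation size: volume of the extents times the element size.
inline std::size_t required_bytes(const std::vector<std::size_t>& extents,
                                  std::size_t element_size) {
  std::size_t volume = 1;
  for (auto extent : extents) {
    volume *= extent;
  }
  return volume * element_size;
}

// Linear element offset of `point` in a C-ordered (row-major) buffer.
inline std::size_t c_order_offset(const std::vector<std::size_t>& extents,
                                  const std::vector<std::size_t>& point) {
  assert(extents.size() == point.size());
  std::size_t offset = 0;
  for (std::size_t d = 0; d < extents.size(); ++d) {
    assert(point[d] < extents[d]);
    offset = offset * extents[d] + point[d];
  }
  return offset;
}

inline void attach_layout_example() {
  // A 2 x 3 store of doubles needs room for six elements.
  assert(required_bytes({2, 3}, sizeof(double)) == 6 * sizeof(double));
  // Row 1 starts after the three elements of row 0.
  assert(c_order_offset({2, 3}, {1, 0}) == 3);
  assert(c_order_offset({2, 3}, {1, 2}) == 5);
}
```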
- LogicalStore create_store(
- const Shape &shape,
- const Type &type,
- const ExternalAllocation &allocation,
- const mapping::DimOrdering &ordering = mapping::DimOrdering::c_order(),
Creates a store by attaching to an existing allocation.
- Parameters:
shape – Shape of the store. The call does not block on this shape.
type – Element type
allocation – External allocation descriptor
ordering – In what order the elements are laid out in the passed allocation
- Returns:
Logical store
- std::pair<LogicalStore, LogicalStorePartition> create_store(
- const Shape &shape,
- const tuple<std::uint64_t> &tile_shape,
- const Type &type,
- const std::vector<std::pair<ExternalAllocation, tuple<std::uint64_t>>> &allocations,
- const mapping::DimOrdering &ordering = mapping::DimOrdering::c_order(),
Creates a store by attaching to multiple existing allocations.
External allocations must be read-only.
- Parameters:
- Throws:
std::invalid_argument – If any of the external allocations are not read-only
- Returns:
A pair of a logical store and its partition
- void prefetch_bloated_instances(
- const LogicalStore &store,
- tuple<std::uint64_t> low_offsets,
- tuple<std::uint64_t> high_offsets,
- bool initialize = false,
Gives the runtime a hint that the store can benefit from bloated instances.
The runtime currently does not look ahead in the task stream to recognize that a given set of tasks can benefit from the ahead-of-time creation of “bloated” instances encompassing multiple slices of a store. This means that the runtime will construct bloated instances incrementally and completely only when it sees all the slices, resulting in intermediate instances that (temporarily) increase the memory footprint. This function can be used to give the runtime a hint ahead of time about the bloated instances, which would be reused by the downstream tasks without going through the same incremental process.
For example, let’s say we have a 1-D store A of size 10 and we want to partition A across two GPUs. By default, A would be partitioned equally and each GPU gets an instance of size 5. Suppose we now have a task that aligns two slices A[1:10] and A[:9]. The runtime would partition the slices such that the task running on the first GPU gets A[1:6] and A[:5], and the task running on the second GPU gets A[6:] and A[5:9]. Since the original instance on the first GPU does not cover the element A[5] included in the first slice A[1:6], the mapper needs to create a new instance for A[:6] that encompasses both of the slices, leading to an extra copy. In this case, if the code calls prefetch_bloated_instances(A, {0}, {1}) to pre-allocate instances that contain one extra element on the right before it uses A, the extra copy can be avoided.
A couple of notes about the API:
- Unless initialize is true, the runtime assumes that the store has been initialized. Passing an uninitialized store would lead to a runtime error.
- If the store has pre-existing instances, the runtime may combine those with the bloated instances if such combination is deemed desirable.
Note
This API is experimental
- Parameters:
store – Store to create bloated instances for
low_offsets – Offsets to bloat towards the negative direction
high_offsets – Offsets to bloat towards the positive direction
initialize – If true, the runtime will issue a fill on the store to initialize it. The default value is false.
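The two-GPU example above can be replayed with simple interval arithmetic. The sketch assumes equal tiling with ceiling division and bloated intervals clamped to the store bounds. Both are assumptions made here for illustration, and Interval, tile, bloat and covers are hypothetical helpers rather than Legate APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open interval [lo, hi) of element indices in a 1-D store.
struct Interval {
  std::int64_t lo;
  std::int64_t hi;
};

inline bool covers(const Interval& outer, const Interval& inner) {
  return outer.lo <= inner.lo && inner.hi <= outer.hi;
}

// Splits `whole` into `parts` equal tiles and returns tile number `index`.
inline Interval tile(const Interval& whole, std::int64_t parts, std::int64_t index) {
  const std::int64_t size = whole.hi - whole.lo;
  const std::int64_t tile_size = (size + parts - 1) / parts;  // ceiling division
  const std::int64_t lo = std::min(whole.lo + index * tile_size, whole.hi);
  const std::int64_t hi = std::min(lo + tile_size, whole.hi);
  return {lo, hi};
}

// Grows an instance by the given offsets, clamped to the store bounds.
inline Interval bloat(const Interval& instance, std::int64_t low_offset,
                      std::int64_t high_offset, const Interval& store) {
  return {std::max(instance.lo - low_offset, store.lo),
          std::min(instance.hi + high_offset, store.hi)};
}

inline void bloated_instance_example() {
  const Interval store{0, 10};     // A
  const Interval slice_a{1, 10};   // A[1:10]
  const Interval slice_b{0, 9};    // A[:9]

  // First GPU: default instance A[:5], slices A[1:6] and A[:5].
  const Interval gpu0 = tile(store, 2, 0);
  const Interval a0 = tile(slice_a, 2, 0);
  const Interval b0 = tile(slice_b, 2, 0);
  assert(!covers(gpu0, a0));  // A[5] is missing, forcing a new instance

  // With low offset 0 and high offset 1 the instance becomes A[:6].
  const Interval bloated0 = bloat(gpu0, 0, 1, store);
  assert(covers(bloated0, a0) && covers(bloated0, b0));

  // Second GPU: A[5:10] already covers A[6:] and A[5:9].
  const Interval gpu1 = tile(store, 2, 1);
  assert(covers(gpu1, tile(slice_a, 2, 1)) && covers(gpu1, tile(slice_b, 2, 1)));
}
```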
-
void issue_mapping_fence()#
Issues a mapping fence.
A mapping fence, when issued, blocks mapping of all downstream operations before those preceding the fence get mapped. An issue_mapping_fence call returns immediately after the request is submitted to the runtime, and the fence asynchronously goes through the runtime analysis pipeline just like any other Legate operation. The call also flushes the scheduling window for batched execution.
Mapping fences only affect how the operations are mapped and do not change their execution order, so they are semantically a no-op. Nevertheless, they are sometimes useful when the user wants to control how the resource is consumed by independent tasks. Consider a program with two independent tasks A and B, both of which discard their stores right after their execution. If the stores are too big to be allocated all at once, mapping A and B in parallel (which can happen because A and B are independent and thus nothing stops them from getting mapped concurrently) can lead to a failure. If a mapping fence exists between the two, the runtime serializes their mapping and can reclaim the memory space from stores that would be discarded after A’s execution to create allocations for B.
-
void issue_execution_fence(bool block = false)#
Issues an execution fence.
An execution fence is a join point in the task graph. All operations prior to a fence must finish before any of the subsequent operations start.
All execution fences are mapping fences by definition; i.e., an execution fence not only prevents the downstream operations from being mapped ahead of itself but also precedes their execution.
- Parameters:
block – When true, the control code blocks on the fence and all operations that have been submitted prior to this fence.
-
void raise_pending_exception()#
Raises a pending exception.
When the exception mode of a scope is “deferred” (i.e., Scope::exception_mode() == ExceptionMode::DEFERRED), the exceptions from tasks in the scope are not immediately handled, but are pushed to the pending exception queue. Accumulated pending exceptions are not flushed until raise_pending_exception is invoked. The function throws the first exception in the pending exception queue and clears the queue. If there is no pending exception to be raised, the function does nothing.
- Throws:
legate::TaskException – When there is a pending exception to raise
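The queue behavior described above (throw the first pending exception, clear the rest, do nothing when the queue is empty) can be sketched with a standalone model. PendingExceptionsModel is a hypothetical stand-in, and std::runtime_error plays the role of legate::TaskException so that the sketch compiles without Legate.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the pending exception queue of a scope whose
// exception mode is DEFERRED; not Legate code.
class PendingExceptionsModel {
 public:
  // Called when a task in the scope fails: the exception is queued, not thrown.
  void record(std::string message) { pending_.emplace_back(std::move(message)); }

  // Throws the first pending exception and clears the queue.
  // Does nothing if there is no pending exception.
  void raise_pending_exception() {
    if (pending_.empty()) {
      return;
    }
    std::runtime_error first = pending_.front();
    pending_.clear();
    throw first;
  }

  std::size_t pending_count() const { return pending_.size(); }

 private:
  std::deque<std::runtime_error> pending_;
};

inline void pending_exception_example() {
  PendingExceptionsModel scope;
  scope.raise_pending_exception();  // empty queue: no effect

  scope.record("task 1 failed");
  scope.record("task 2 failed");
  assert(scope.pending_count() == 2);

  bool caught_first = false;
  try {
    scope.raise_pending_exception();
  } catch (const std::runtime_error& error) {
    caught_first = std::string(error.what()) == "task 1 failed";
  }
  assert(caught_first);
  assert(scope.pending_count() == 0);  // the second exception was dropped
}
```

Note that in this model, as in the description above, later exceptions in the queue are discarded once the first one is raised.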
-
std::uint32_t node_count() const#
Returns the total number of nodes.
- Returns:
Total number of nodes
-
std::uint32_t node_id() const#
Returns the current rank.
- Returns:
Rank ID
-
mapping::Machine get_machine() const#
Returns the machine of the current scope.
- Returns:
Machine object
-
Processor get_executing_processor() const#
Returns the current Processor on which the caller is executing.
- Returns:
The current Processor.
-
void start_profiling_range()#
Start a Legion profiling range.
-
void stop_profiling_range(std::string_view provenance)#
Stop a Legion profiling range.
- Parameters:
provenance – User-supplied provenance string