legate::TaskContext#

class TaskContext#

A task context that contains task arguments and communicators.

Public Functions

GlobalTaskID task_id() const noexcept#

Returns the global ID of the task.

Returns:

The global task id

VariantCode variant_kind() const noexcept#

Returns the Legate variant kind of the task.

Returns:

The variant kind

PhysicalArray input(std::uint32_t index) const#

Returns an input array of the task.

Parameters:

index – Index of the array

Returns:

Array

std::vector<PhysicalArray> inputs() const#

Returns all input arrays of the task.

Returns:

Vector of arrays

PhysicalArray output(std::uint32_t index) const#

Returns an output array of the task.

Parameters:

index – Index of the array

Returns:

Array

std::vector<PhysicalArray> outputs() const#

Returns all output arrays of the task.

Returns:

Vector of arrays
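
For example, a CPU variant might fetch its arguments through these accessors as follows (a minimal sketch; the task name, the variant signature, and the array contents are assumptions, not part of this API):

  // Hypothetical CPU variant of a user-defined task
  /*static*/ void CopyTask::cpu_variant(legate::TaskContext context)
  {
    // Fetch the first input and the first output array of this task
    legate::PhysicalArray in  = context.input(0);
    legate::PhysicalArray out = context.output(0);
    // ... read from `in` and write to `out` through their stores/accessors ...
  }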

PhysicalArray reduction(std::uint32_t index) const#

Returns a reduction array of the task.

Parameters:

index – Index of the array

Returns:

Array

std::vector<PhysicalArray> reductions() const#

Returns all reduction arrays of the task.

Returns:

Vector of arrays

Scalar scalar(std::uint32_t index) const#

Returns a by-value argument of the task.

Parameters:

index – Index of the scalar

Returns:

Scalar

std::vector<Scalar> scalars() const#

Returns all by-value arguments of the task.

Returns:

Vector of scalars
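
For example (a sketch that assumes the first scalar was passed as a 32-bit integer at launch time):

  // Number of by-value arguments passed to this task
  const std::size_t n = context.num_scalars();
  // Interpret the first scalar as a 32-bit integer; this assumes the caller
  // actually passed one
  const auto iterations = context.scalar(0).value<std::int32_t>();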

const comm::Communicator &communicator(std::uint32_t index) const#

Returns a communicator of the task.

If a task launch ends up emitting only a single point task, that task will not get passed a communicator, even if one was requested at task launch time. Therefore, tasks using communicators should be prepared to handle the case where no communicator is passed to them (i.e., communicators() returns an empty vector).

Parameters:

index – Index of the communicator

Returns:

Communicator

const std::vector<comm::Communicator> &communicators() const#

Returns all communicators of the task.

If a task launch ends up emitting only a single point task, that task will not get passed a communicator, even if one was requested at task launch time. Therefore, most tasks using communicators should be prepared to handle the case where the returned vector is empty.

Returns:

Vector of communicators
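
A sketch of the defensive pattern described above (the actual collective call is elided):

  // A single point task receives no communicators, so check before using one
  if (context.communicators().empty()) {
    return;  // nothing to synchronize with
  }
  const legate::comm::Communicator& comm = context.communicator(0);
  // ... hand `comm` to the collective library backing this communicator ...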

std::size_t num_inputs() const#

Returns the number of the task's inputs.

Returns:

Number of arrays

std::size_t num_outputs() const#

Returns the number of the task's outputs.

Returns:

Number of arrays

std::size_t num_reductions() const#

Returns the number of the task's reductions.

Returns:

Number of arrays

std::size_t num_scalars() const#

Returns the number of Scalars.

Returns:

Number of Scalars

std::size_t num_communicators() const#

Returns the number of communicators.

Returns:

Number of communicators

bool is_single_task() const#

Indicates whether the task is running as a single task, i.e., whether its launch was not parallelized.

Returns:

true The task is a single task

Returns:

false The task is one in a set of multiple parallel tasks

bool can_raise_exception() const#

Indicates whether the task is allowed to raise an exception.

Returns:

true The task can raise an exception

Returns:

false The task must not raise an exception
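
For example (a sketch; it assumes legate::TaskException is the exception type used to report task-side errors to the caller):

  if (context.can_raise_exception()) {
    // The variant was registered with exception support; report the error
    throw legate::TaskException{"invalid argument encountered in task"};
  }
  // Otherwise the task must not throw; handle the error another way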

const DomainPoint &get_task_index() const#

Returns the point of the task. A 0D point will be returned for a single task.

Returns:

The point of the task

const Domain &get_launch_domain() const#

Returns the task group’s launch domain. A single task returns an empty domain.

Returns:

The task group’s launch domain
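
For example, a variant can combine these queries to decide its share of the work (a minimal sketch):

  if (context.is_single_task()) {
    // Only one point task was launched; get_task_index() returns a 0D point
  } else {
    const legate::DomainPoint& point  = context.get_task_index();
    const legate::Domain&      domain = context.get_launch_domain();
    // e.g. combine `point` with `domain` to pick this task's portion of the work
  }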

mapping::TaskTarget target() const#

Returns the kind of processor executing this task.

Returns:

The processor kind
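
For example (a sketch dispatching on the processor kind):

  switch (context.target()) {
    case legate::mapping::TaskTarget::GPU: /* GPU-specific path */ break;
    case legate::mapping::TaskTarget::OMP: /* OpenMP path */ break;
    case legate::mapping::TaskTarget::CPU: /* plain CPU path */ break;
  }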

void concurrent_task_barrier()#

Performs a blocking barrier across all the leaf tasks in a concurrent task launch.

When a leaf task invokes this operation, control will not return to the task until all the leaf tasks in the same launch have executed the same barrier.

This is useful, e.g., to work around NCCL deadlocks that can be triggered when another concurrent CUDA operation creates a false dependence or resource conflict with the resident NCCL kernels. By performing a barrier before and after every NCCL collective operation that happens inside the leaf tasks of a concurrent task launch, we can effectively isolate the execution of the NCCL collective from all other CUDA work, thus preventing the deadlock. In more detail:

  • put a barrier before the collective operation

  • emit the collective operation

  • ensure that NCCL has actually emitted all its operations on the stream (e.g. ncclGroupEnd has been called, if grouping operations)

  • perform another barrier

    // The barrier must happen before the NCCL calls begin
    context.concurrent_task_barrier();
    auto result = ncclAllGather(p_send, p_recv, 1, ncclUint64, *comm, stream);
    EXPECT_EQ(result, ncclSuccess);
    // And insert a barrier after all NCCL calls return, to ensure that all ranks have
    // emitted the NCCL calls
    context.concurrent_task_barrier();

This operation can only be performed inside leaf tasks (not on the top-level task), and only in variants that have been declared as concurrent. All leaf tasks in a launch must take part in the barrier (it cannot be done only on a subset of them). Breaking any of the previously stated invariants is a fatal error.

CUstream_st *get_task_stream() const#

Returns the CUDA stream of the current task.

All asynchronous stream work performed by a GPU variant must be performed on, or synchronized with, the stream returned by this method. Performing asynchronous work on other streams without encoding those dependencies on (or otherwise synchronizing them with) this stream results in undefined behavior.

If the current task is not a GPU task, or does not have GPU support enabled, this method returns nullptr.

Returns:

The current task's CUDA stream.
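
For example, a GPU variant might issue its kernels on this stream as follows (a sketch; my_kernel, num_blocks, and threads_per_block are hypothetical):

  // All device work must be issued on (or synchronized with) this stream
  CUstream_st* stream = context.get_task_stream();
  if (stream != nullptr) {
    my_kernel<<<num_blocks, threads_per_block, 0, stream>>>(/* ... */);
  }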