util#

group Utilities

General utilities.

Defines

LEGATE_CONCAT_(x, ...)#

Concatenate a series of tokens without macro expansion.

This macro will NOT macro-expand any tokens passed to it. If this behavior is undesirable, and the user wishes to have all tokens expanded before concatenation, use LEGATE_CONCAT() instead. For example:

#define FOO 1
#define BAR 2

LEGATE_CONCAT_(FOO, BAR) // expands to FOOBAR

See also

LEGATE_CONCAT()

Parameters:

x – The first parameter to concatenate.
... – The remaining parameters to concatenate.

LEGATE_CONCAT(x, ...)#

Concatenate a series of tokens.

This macro will first macro-expand any tokens passed to it. If this behavior is undesirable, use LEGATE_CONCAT_() instead. For example:

#define FOO 1
#define BAR 2

LEGATE_CONCAT(FOO, BAR) // expands to 12

See also

LEGATE_CONCAT_()

Parameters:

x – The first parameter to concatenate.
... – The remaining parameters to concatenate.

LEGATE_STRINGIZE_(...)#

Stringize a series of tokens.

This macro will turn its arguments into compile-time constant C strings.

This macro will NOT macro-expand any tokens passed to it. If this behavior is undesirable, and the user wishes to have all tokens expanded before stringification, use LEGATE_STRINGIZE() instead. For example:

#define FOO 1
#define BAR 2

LEGATE_STRINGIZE_(FOO, BAR) // expands to "FOO, BAR" (note the "")

See also

LEGATE_STRINGIZE()

Parameters:

... – The tokens to stringize.

LEGATE_STRINGIZE(...)#

Stringize a series of tokens.

This macro will turn its arguments into compile-time constant C strings.

This macro will first macro-expand any tokens passed to it. If this behavior is undesirable, use LEGATE_STRINGIZE_() instead. For example:

#define FOO 1
#define BAR 2

LEGATE_STRINGIZE(FOO, BAR) // expands to "1, 2" (note the "")

See also

LEGATE_STRINGIZE_()

Parameters:

... – The tokens to stringize.

LEGATE_DEFINED_ENABLED_FORM_1#

LEGATE_DEFINED_ENABLED_FORM_#

LEGATE_DEFINED_PRIVATE_3_(ignored, val, ...)#

LEGATE_DEFINED_PRIVATE_2_(args)#

LEGATE_DEFINED_PRIVATE_1_(...)#

LEGATE_DEFINED_PRIVATE(x)#

LEGATE_DEFINED(x)#

Determine if a preprocessor definition is positively defined.

LEGATE_DEFINED() returns 1 if and only if x expands to the integer literal 1, or is defined (but empty). In all other cases, LEGATE_DEFINED() returns the integer literal 0. Therefore this macro should not be used if its argument may expand to a non-empty value other than 1.

The only exception is if the argument is defined but expands to 0, in which case LEGATE_DEFINED() will also expand to 0:

#define FOO_EMPTY
#define FOO_ONE 1
#define FOO_ZERO 0
// #define FOO_UNDEFINED

static_assert(LEGATE_DEFINED(FOO_EMPTY) == 1);
static_assert(LEGATE_DEFINED(FOO_ONE) == 1);
static_assert(LEGATE_DEFINED(FOO_ZERO) == 0);
static_assert(LEGATE_DEFINED(FOO_UNDEFINED) == 0);

Conceptually, LEGATE_DEFINED() is equivalent to

#if defined(x) && (x == 1 || x == *empty*)
// "return" 1
#else
// "return" 0
#endif

As a result this macro works both in preprocessor statements:

#if LEGATE_DEFINED(FOO_BAR)
  foo_bar_is_defined();
#else
  foo_bar_is_not_defined();
#endif

And in regular C++ code:

if (LEGATE_DEFINED(FOO_BAR)) {
  foo_bar_is_defined();
} else {
  foo_bar_is_not_defined();
}

Note that in the C++ example above both arms of the if statement must compile. Because LEGATE_DEFINED() produces a compile-time constant expression, the user may use C++17’s if constexpr to discard one of the arms (note that outside of a template, the discarded arm is still checked by the compiler and must be well-formed):

if constexpr (LEGATE_DEFINED(FOO_BAR)) {
  foo_bar_is_defined();
} else {
  foo_bar_is_not_defined();
}

See also

LEGATE_CONCAT()

Parameters:

x – The legate preprocessor definition.

Returns:

1 if the argument is defined and true, 0 otherwise.
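
One common way to obtain this behavior is the "paste and pick the second argument" preprocessor trick. The sketch below illustrates that general technique under stated assumptions; it is not presented as the Legate implementation. All MY_ and DEMO_ names are hypothetical.

```cpp
// Hypothetical helper macros; not the Legate definitions.
#define MY_PASTE_RAW(a, b) a##b

// Only the suffixes "1" and "" (empty) produce a macro that expands to a
// token followed by a comma, which shifts the argument list by one.
#define MY_FORM_1 ignored,
#define MY_FORM_ ignored,

#define MY_PICK_SECOND(ignored, val, ...) val

// If x expanded to "ignored," the call becomes MY_PICK_SECOND(ignored, 1, 0, dummy)
// and yields 1. Otherwise the call is MY_PICK_SECOND(<junk> 1, 0, dummy) and yields 0.
#define MY_DEFINED_IMPL(x) MY_PICK_SECOND(x 1, 0, dummy)

// x is macro-expanded before it is pasted onto MY_FORM_.
#define MY_DEFINED(x) MY_DEFINED_IMPL(MY_PASTE_RAW(MY_FORM_, x))

#define DEMO_EMPTY
#define DEMO_ONE 1
#define DEMO_ZERO 0
// DEMO_UNDEFINED is deliberately left undefined.

constexpr int demo_empty     = MY_DEFINED(DEMO_EMPTY);
constexpr int demo_one       = MY_DEFINED(DEMO_ONE);
constexpr int demo_zero      = MY_DEFINED(DEMO_ZERO);
constexpr int demo_undefined = MY_DEFINED(DEMO_UNDEFINED);

static_assert(demo_empty == 1, "defined but empty counts as enabled");
static_assert(demo_one == 1, "defined to 1 counts as enabled");
static_assert(demo_zero == 0, "defined to 0 counts as disabled");
static_assert(demo_undefined == 0, "undefined counts as disabled");
```

Because the result is always the literal 0 or 1, the same macro is usable in both #if directives and ordinary expressions, which matches the usage described above.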

LEGATE_SCOPE_GUARD(...)#

Construct an unnamed legate::ScopeGuard from the contents of the macro arguments.

It is impossible to enable or disable the legate::ScopeGuard constructed by this macro.

This macro is useful if the user need only define some action to be executed on scope exit, but doesn’t care to name the legate::ScopeGuard and/or has no need to enable/disable it after construction.

For example:

int *mem = static_cast<int *>(std::malloc(10 * sizeof(int)));

LEGATE_SCOPE_GUARD(std::free(mem));
// use mem...
// scope exits, and mem is freed.

Multi-line statements are also supported:

int *mem = static_cast<int *>(std::malloc(10 * sizeof(int)));

LEGATE_SCOPE_GUARD(
  if (frobnicate()) {
    std::free(mem);
  }
);
// use mem...
// scope exits, and mem is freed depending on the return value of frobnicate()

If the body of the guard should only be executed on failure, use LEGATE_SCOPE_FAIL instead.

See also

ScopeGuard LEGATE_SCOPE_FAIL

Parameters:

... – The body of the constructed legate::ScopeGuard.
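
To make the mechanics concrete, the following is a minimal sketch of how a macro of this kind can be built, assuming the body is wrapped in a lambda and stored in an RAII object with a line-unique name. MyScopeGuard, my_make_scope_guard and MY_SCOPE_GUARD are hypothetical names; the real legate::ScopeGuard may be implemented differently.

```cpp
#include <utility>

// Hypothetical, simplified guard: runs fn when destroyed, unless moved from.
template <typename F>
class MyScopeGuard {
 public:
  explicit MyScopeGuard(F fn) noexcept : fn_{std::move(fn)} {}

  MyScopeGuard(MyScopeGuard&& other) noexcept
    : fn_{std::move(other.fn_)}, active_{other.active_}
  {
    other.active_ = false;  // the moved-from guard must not fire
  }

  MyScopeGuard(const MyScopeGuard&)            = delete;
  MyScopeGuard& operator=(const MyScopeGuard&) = delete;

  ~MyScopeGuard() noexcept
  {
    if (active_) {
      fn_();
    }
  }

 private:
  F fn_;
  bool active_{true};
};

template <typename F>
MyScopeGuard<F> my_make_scope_guard(F fn) noexcept
{
  return MyScopeGuard<F>{std::move(fn)};
}

#define MY_CAT_(a, b) a##b
#define MY_CAT(a, b) MY_CAT_(a, b)

// Wrap the macro arguments in a lambda and bind it to a guard whose name is
// made unique per line, so the user never has to name it.
#define MY_SCOPE_GUARD(...) \
  const auto MY_CAT(my_scope_guard_, __LINE__) = my_make_scope_guard([&]() noexcept { __VA_ARGS__; })

inline int guard_demo()
{
  int counter = 0;
  {
    MY_SCOPE_GUARD(counter += 10);
    counter += 1;  // the guard has not fired yet
  }                // the guard fires here
  return counter;
}
```

Since the guard object is unnamed from the user's point of view, there is no handle through which it could later be enabled or disabled, which is consistent with the restriction stated above.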

LEGATE_SCOPE_FAIL(...)#

Construct an unnamed legate::ScopeFail from the contents of the macro arguments.

This macro behaves identically to LEGATE_SCOPE_GUARD, except that it creates a legate::ScopeFail instead of a legate::ScopeGuard. Refer to the documentation of LEGATE_SCOPE_GUARD for further discussion.

See also

ScopeFail LEGATE_SCOPE_GUARD

Parameters:

... – The body of the constructed legate::ScopeFail.

Typedefs

using VariantImpl = void (*)(TaskContext)#: Function signature for task variants. Each task variant must be a function of this type.

template<typename T = void> using LegionVariantImpl = T (*)(const Legion::Task*, const std::vector<Legion::PhysicalRegion>&, Legion::Context, Legion::Runtime*)#: Function signature for direct-to-legion task variants. Users should usually prefer VariantImpl instead.

using ShutdownCallback = std::function<void(void)>#: Signature for a callable to be executed right before the runtime shuts down.

Enums

enum class VariantCode : Legion::VariantID#

An enum describing the kind of variant.

Note

The values don’t start at 0. This is to match Legion, where 0 is the ‘None’ variant.

Values:

enumerator CPU#: A CPU variant.

enumerator GPU#: A GPU variant.

enumerator OMP#: An OpenMP variant.

enum class LocalTaskID : std::int64_t#

Integer type representing a Library-local task ID.

All tasks are uniquely identifiable via a “task ID”. These task IDs come in two flavors: global and local. When a task is registered to a Library, the task must declare a unique “local” task ID (LocalTaskID) within that Library. This task ID must not coincide with any other task ID within that Library. After registration, the task is also assigned a “global” ID (GlobalTaskID) which is guaranteed to be unique across the entire program.

GlobalTaskIDs may therefore be used to refer to tasks registered to other Librarys or to refer to the task when interfacing with Legion.

For example, consider a task Foo:

class Foo : public legate::LegateTask<Foo> {
 public:
  // Foo declares a local task ID of 10
  static inline const auto TASK_CONFIG =  // NOLINT(cert-err58-cpp)
    legate::TaskConfig{legate::LocalTaskID{10}};

  static void cpu_variant(legate::TaskContext /* ctx */)
  {
    // some very useful work...
  }
};

And two Librarys, bar and baz:

  legate::Library bar_lib = runtime->create_library(BAR_LIBNAME);
  legate::Library baz_lib = runtime->create_library(BAZ_LIBNAME);

  // Foo registers itself with bar, claiming the bar-local task ID of 10.
  Foo::register_variants(bar_lib);
  // Retrieve the global task ID after registration.
  legate::GlobalTaskID gid_bar = bar_lib.get_task_id(Foo::TASK_CONFIG.task_id());

  // This should be false, Foo has not registered itself to baz yet.
  ASSERT_FALSE(baz_lib.valid_task_id(gid_bar));

  // However, we can query information from Legion about this task (such as its name), since
  // the global task ID has been assigned.
  const char* legion_task_name{};

  Legion::Runtime::get_runtime()->retrieve_name(static_cast<Legion::TaskID>(gid_bar),
                                                legion_task_name);
  ASSERT_STREQ(legion_task_name, "example::Foo");

  // We can get the same information using the local ID from the Library
  auto task_name = bar_lib.get_task_name(Foo::TASK_CONFIG.task_id());

  ASSERT_EQ(task_name, legion_task_name);

See also

GlobalTaskID Library Library::get_task_id()
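
The relationship between the two ID spaces can be illustrated with a small self-contained model. This is only a sketch: MyLocalTaskID, MyGlobalTaskID and MyLibrary are hypothetical, and the mapping "global = base + local" is an assumption made for illustration, since the text above does not say how global IDs are assigned.

```cpp
#include <cstdint>

// Hypothetical strong ID types; the underlying types are illustrative.
enum class MyLocalTaskID : std::int64_t {};
enum class MyGlobalTaskID : std::uint32_t {};

// Hypothetical library owning the global ID range [base, base + max_tasks).
// Assumption: global ID = base + local ID.
class MyLibrary {
 public:
  MyLibrary(std::uint32_t base, std::uint32_t max_tasks) : base_{base}, max_tasks_{max_tasks} {}

  // Translate a library-local ID into a program-wide ID.
  MyGlobalTaskID get_task_id(MyLocalTaskID local) const
  {
    return static_cast<MyGlobalTaskID>(base_ + static_cast<std::uint32_t>(local));
  }

  // Translate a program-wide ID back into this library's local ID.
  MyLocalTaskID get_local_task_id(MyGlobalTaskID global) const
  {
    return static_cast<MyLocalTaskID>(static_cast<std::uint32_t>(global) - base_);
  }

  // A global ID is valid for this library only if it lies in the library's range.
  bool valid_task_id(MyGlobalTaskID global) const
  {
    const auto value = static_cast<std::uint32_t>(global);
    return value >= base_ && value < base_ + max_tasks_;
  }

 private:
  std::uint32_t base_;
  std::uint32_t max_tasks_;
};
```

Using distinct enum class types for the two ID kinds means a local ID cannot be passed where a global one is expected without an explicit cast, which is the main benefit of the strong typing.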

enum class GlobalTaskID : Legion::TaskID#

Integer type representing a global task ID.

GlobalTaskIDs may be used to refer to tasks registered to other Librarys or to refer to the task when interfacing with Legion. See LocalTaskID for further discussion on task IDs and task registration.

See also

LocalTaskID Library Library::get_local_task_id()

enum class LocalRedopID : std::int64_t#

Integer type representing a Library-local reduction operator ID.

All reduction operators are uniquely identifiable via a “reduction ID”, which serves as a proxy task ID for the reduction meta-task. When a reduction operator is registered with a Library, the reduction must declare a unique “local” ID (LocalRedopID) within that Library. The Library then assigns a globally unique ID to the reduction operator, which may be used to refer to the operator across the entire program.

See also

GlobalRedopID Library Library::get_reduction_op_id()

enum class GlobalRedopID : Legion::ReductionOpID#

Integer type representing a global reduction operator ID.

GlobalRedopIDs may be used to refer to reduction operators registered to other Librarys, or to refer to the reduction operator when interfacing with Legion. See LocalRedopID for further discussion on reduction operator IDs.

See also

LocalRedopID

Functions

Time measure_microseconds()#

Returns a timestamp at the resolution of microseconds.

The returned timestamp indicates the time at which all preceding Legate operations finish. This timestamp generation is a non-blocking operation, and the blocking happens when the value wrapped within the returned Time object is retrieved.

Returns:: A Time object

Time measure_nanoseconds()#

Returns a timestamp at the resolution of nanoseconds.

Returns:: A Time object

std::size_t linearize( const DomainPoint &lo, const DomainPoint &hi, const DomainPoint &point )#

Given an N-Dimensional shape and a point inside that shape, compute the “linearized” index of the point within the shape.

This routine is often used to determine the “local”, 0-based position of a point within a task, which will be in the range [0, shape.volume()). This may be used, e.g., to copy a sub-store into a temporary 1D buffer, in which case linearize() would map each point in the shape to an index within the buffer:

auto shape = store.shape<DIM>();
auto *buf  = new int[shape.volume()];

for (auto it = legate::PointInRectIterator<DIM>{shape}; it.valid(); ++it) {
  auto local_idx = legate::linearize(shape.lo, shape.hi, *it);
  // local_idx contains the 0-based index of *it, regardless of how the task was
  // parallelized
  buf[local_idx] = accessor[*it];
}

For example, given a 3x3 shape with (inclusive) bounds lo of (0, 0) and hi of (2, 2), then for each point the linearized indices would be as follows:

Point  -> idx
(0, 0) -> 0
(0, 1) -> 1
(0, 2) -> 2
(1, 0) -> 3
(1, 1) -> 4
(1, 2) -> 5
(2, 0) -> 6
(2, 1) -> 7
(2, 2) -> 8

Similarly, with a lo of (2, 2) and hi of (4, 4):

Point  -> idx
(2, 2) -> 0
(2, 3) -> 1
(2, 4) -> 2
(3, 2) -> 3
(3, 3) -> 4
(3, 4) -> 5
(4, 2) -> 6
(4, 3) -> 7
(4, 4) -> 8

See also

delinearize

Parameters:

lo – The lowest point in the shape.
hi – The highest point in the shape.
point – The point whose position in the shape you wish to linearize.

Returns:

The linear index of the point.
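
The tables above are consistent with a row-major ordering over inclusive bounds, where the last dimension varies fastest. The following is a minimal sketch of that computation using std::array in place of DomainPoint; my_linearize is a hypothetical stand-in and is not the Legate implementation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical row-major linearization over the inclusive bounds [lo, hi].
template <std::size_t N>
std::size_t my_linearize(const std::array<long, N>& lo,
                         const std::array<long, N>& hi,
                         const std::array<long, N>& point)
{
  std::size_t idx = 0;

  for (std::size_t d = 0; d < N; ++d) {
    // Bounds are inclusive, so the extent is hi - lo + 1.
    const auto extent = static_cast<std::size_t>(hi[d] - lo[d] + 1);

    // Shift the point to be 0-based, then accumulate (Horner's scheme).
    idx = idx * extent + static_cast<std::size_t>(point[d] - lo[d]);
  }
  return idx;
}
```

With lo of (2, 2) and hi of (4, 4), the point (3, 4) maps to 1 * 3 + 2 = 5, in agreement with the second table.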

DomainPoint delinearize( const DomainPoint &lo, const DomainPoint &hi, std::size_t idx )#

Given an N-Dimensional shape and an index corresponding to a point inside that shape, compute the point corresponding to the index.

This routine is often used to convert a “local” 1D index, in the range [0, shape.volume()), to a point within the “local” shape. For example, this is often used to convert a thread ID in a CUDA kernel or OpenMP loop to the corresponding point within the shape:

// e.g. in an OpenMP loop
auto shape = store.shape<DIM>();

#pragma omp parallel for
for (std::size_t i = 0; i < shape.volume(); ++i) {
  auto local_pt = legate::delinearize(shape.lo, shape.hi, i);
  // local_pt now contains the local point corresponding to index i
}

For example, given a 3x3 shape with (inclusive) bounds lo of (0, 0) and hi of (2, 2), then for each idx, the delinearized points would be as follows:

idx -> Point
0   -> (0, 0)
1   -> (0, 1)
2   -> (0, 2)
3   -> (1, 0)
4   -> (1, 1)
5   -> (1, 2)
6   -> (2, 0)
7   -> (2, 1)
8   -> (2, 2)

See also

linearize

Parameters:

lo – The lowest point in the shape.
hi – The highest point in the shape.
idx – The linearized index of the point.

Returns:

The point inside the shape.
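
The inverse mapping can be sketched in the same way, peeling off one dimension at a time with a modulus and a division. As before, my_delinearize is a hypothetical stand-in that uses std::array instead of DomainPoint and assumes row-major ordering over inclusive bounds.

```cpp
#include <array>
#include <cstddef>

// Hypothetical inverse of a row-major linearization over the inclusive bounds [lo, hi].
template <std::size_t N>
std::array<long, N> my_delinearize(const std::array<long, N>& lo,
                                   const std::array<long, N>& hi,
                                   std::size_t idx)
{
  std::array<long, N> point{};

  // Walk from the fastest-varying (last) dimension to the slowest (first).
  for (std::size_t d = N; d-- > 0;) {
    const auto extent = static_cast<std::size_t>(hi[d] - lo[d] + 1);

    point[d] = lo[d] + static_cast<long>(idx % extent);
    idx /= extent;
  }
  return point;
}
```

For lo of (0, 0) and hi of (2, 2), index 5 yields 5 % 3 = 2 for the last dimension and 1 for the first, i.e. the point (1, 2).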

template<typename Functor, typename ...Fnargs> decltype(auto) double_dispatch( int dim, Type::Code code, Functor f, Fnargs&&... args )#

Converts the runtime dimension and type code into compile time constants and invokes the functor with them.

The functor’s operator() should take a dimension and a type code as template parameters.

Parameters:

dim – Dimension
code – Type code
f – Functor to dispatch
args – Extra arguments to the functor

Returns:

The functor’s return value

template<typename Functor, typename ...Fnargs> decltype(auto) double_dispatch( int dim1, int dim2, Functor f, Fnargs&&... args )#

Converts the runtime dimensions into compile time constants and invokes the functor with them.

The functor’s operator() should take exactly two integers as template parameters.

Parameters:

dim1 – First dimension
dim2 – Second dimension
f – Functor to dispatch
args – Extra arguments to the functor

Returns:

The functor’s return value

template<typename Functor, typename ...Fnargs> decltype(auto) dim_dispatch( int dim, Functor f, Fnargs&&... args )#

Converts the runtime dimension into a compile time constant and invokes the functor with it.

The functor’s operator() should take an integer as its sole template parameter.

Parameters:

dim – Dimension
f – Functor to dispatch
args – Extra arguments to the functor

Returns:

The functor’s return value
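
The general technique behind such a dispatch is a switch that maps each runtime value onto a call with the matching template argument. The sketch below assumes only dimensions 1 through 3 are supported; my_dim_dispatch and VolumeOfCube are hypothetical names used for illustration.

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical dispatcher: turns a runtime dimension into a template argument.
template <typename Functor, typename... Args>
decltype(auto) my_dim_dispatch(int dim, Functor f, Args&&... args)
{
  switch (dim) {
    case 1: return f.template operator()<1>(std::forward<Args>(args)...);
    case 2: return f.template operator()<2>(std::forward<Args>(args)...);
    case 3: return f.template operator()<3>(std::forward<Args>(args)...);
    default: throw std::invalid_argument{"unsupported dimension"};
  }
}

// Example functor: operator() takes the dimension as its sole template parameter.
struct VolumeOfCube {
  template <int DIM>
  long operator()(long side) const
  {
    long volume = 1;

    // DIM is a compile-time constant here, so this loop can be fully unrolled.
    for (int d = 0; d < DIM; ++d) {
      volume *= side;
    }
    return volume;
  }
};
```

A call such as my_dim_dispatch(3, VolumeOfCube{}, 2L) then invokes the DIM = 3 instantiation. Note that every instantiation must return the same type for the deduced return type to be consistent.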

template<typename Functor, typename ...Fnargs> decltype(auto) type_dispatch( Type::Code code, Functor &&f, Fnargs&&... args )#

Converts the runtime type code into a compile time constant and invokes the functor with it.

The functor’s operator() should take a type code as its sole template parameter.

Parameters:

code – Type code
f – Functor to dispatch
args – Extra arguments to the functor

Returns:

The functor’s return value
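
The same pattern applies to type codes. The following sketch uses a hypothetical two-value enum, MyCode, and a hypothetical trait, MyTypeOf, that maps each code to a C++ type; none of these names come from Legate.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>

// Hypothetical type codes and code-to-type mapping.
enum class MyCode { INT32, FLOAT64 };

template <MyCode CODE>
struct MyTypeOf;

template <>
struct MyTypeOf<MyCode::INT32> {
  using type = std::int32_t;
};

template <>
struct MyTypeOf<MyCode::FLOAT64> {
  using type = double;
};

// Hypothetical dispatcher: turns a runtime type code into a template argument.
template <typename Functor, typename... Args>
decltype(auto) my_type_dispatch(MyCode code, Functor&& f, Args&&... args)
{
  switch (code) {
    case MyCode::INT32:
      return f.template operator()<MyCode::INT32>(std::forward<Args>(args)...);
    case MyCode::FLOAT64:
      return f.template operator()<MyCode::FLOAT64>(std::forward<Args>(args)...);
  }
  throw std::invalid_argument{"unknown type code"};
}

// Example functor: computes the byte size of `count` elements of the coded type.
struct BytesNeeded {
  template <MyCode CODE>
  std::size_t operator()(std::size_t count) const
  {
    return count * sizeof(typename MyTypeOf<CODE>::type);
  }
};
```

Inside the functor the element type is known at compile time, so the body can declare typed buffers or accessors without any further runtime branching.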

template<typename Element, typename Extent, typename Layout, typename Accessor> detail::FlatMDSpanView<::cuda::std::mdspan<Element, Extent, Layout, Accessor>> flatten( ::cuda::std::mdspan<Element, Extent, Layout, Accessor> span ) noexcept#

Create a flattened view of an mdspan that allows efficient random elementwise access.

The returned view object supports all the usual iterator semantics.

Unfortunately, flattening mdspan into a linear iterator ends up with inefficient code-gen as compilers are unable to untangle the internal state required to make this work. This is not really an “implementation quality” issue so much as a fundamental constraint. In order to implement iterators, you need to solve the problem of mapping a linear index to an N-dimensional point in space. This delinearization is done via the following:

std::array<std::size_t, DIM> point;

for (auto dim = DIM; dim-- > 0;) {
  point[dim] = index % span.extent(dim);
  index /= span.extent(dim);
}

The problem is the modulus and division operations. Modern compilers are seemingly unable to hoist those computations out of the loop and vectorize the code. So a “normal” nested loop over the extents:

for (std::size_t i = 0; i < span.extent(0); ++i) {
  for (std::size_t j = 0; j < span.extent(1); ++j) {
    span(i, j) = ...
  }
}

Will be fully vectorized by optimizers, but the following (which is more or less what this iterator expands to):

for (std::size_t i = 0; i < PROD(span.extents()...); ++i) {
  std::array<std::size_t, DIM> point = delinearize(i);

  span(point) = ...
}

Defeats all known modern optimizing compilers. Therefore, unless this iterator is truly required, the user is strongly encouraged to iterate over their mdspan normally.

Parameters:: span – The mdspan to flatten.
Returns:: The flat view.

template<typename IndexType, std::size_t... Extents, typename F> void for_each_in_extent( const ::cuda::std::extents<IndexType, Extents...> &extents, F &&fn )#

Execute a function fn for each i, j, k, ...-th point in an extent extents.

Invoking this method is roughly equivalent to

for (std::size_t i = 0; i < extents.extent(0); ++i) {
  for (std::size_t j = 0; j < extents.extent(1); ++j) {
    // ...
    fn(i, j, ...);
  }
}

Where the number of nested loops generated is equal to the rank of the extent.

The utility of this function is multi-fold:

1. It allows efficient iteration over an mdspan of variable dimension.
2. It separates the iteration from the container. For example, if the user wanted to iterate over the intersection of multiple mdspans, they could compute the intersection of their extents, and use this function to generate the loops.

Parameters:

extents – The extents to iterate over.
fn – The function to execute.
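
One way to generate such a loop nest for an arbitrary rank is compile-time recursion, with one loop per recursion level. The sketch below assumes C++17 and uses std::array in place of cuda::std::extents; my_for_each_in_extent and my_for_each_impl are hypothetical names, not the Legate implementation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: emits the loop for dimension DIM, then recurses.
template <std::size_t DIM, std::size_t N, typename F, typename... Idx>
void my_for_each_impl(const std::array<std::size_t, N>& extents, F& fn, Idx... idx)
{
  if constexpr (DIM == N) {
    // All dimensions are fixed: call the user function with the full index.
    fn(idx...);
  } else {
    for (std::size_t i = 0; i < extents[DIM]; ++i) {
      my_for_each_impl<DIM + 1>(extents, fn, idx..., i);
    }
  }
}

// Hypothetical entry point: calls fn(i, j, ...) for every index in extents.
template <std::size_t N, typename F>
void my_for_each_in_extent(const std::array<std::size_t, N>& extents, F&& fn)
{
  my_for_each_impl<0>(extents, fn);
}

// Visits (0,0) (0,1) (0,2) (1,0) (1,1) (1,2) and sums i * 10 + j over them.
inline std::size_t extent_demo()
{
  std::size_t sum = 0;

  my_for_each_in_extent(std::array<std::size_t, 2>{2, 3},
                        [&](std::size_t i, std::size_t j) { sum += i * 10 + j; });
  return sum;
}
```

Because each recursion level is an ordinary for loop, the instantiated code has the same shape as hand-written nested loops, with no per-element modulus or division.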

template<std::int32_t DIM, typename F> void for_each_in_extent( const Point<DIM> &point, F &&fn )#

Execute a function fn for each i, j, k, ...-th index in point point.

This routine treats point as an “extent”, where each index of point gives the 0-based extent for that dimension. So given a 2D point <1, 1>, then this routine would generate the following calls:

fn(0, 0)
fn(0, 1)
fn(1, 0)
fn(1, 1)

Parameters:

point – The Point to iterate over.
fn – The function to execute.

template<std::int32_t DIM, typename F> void for_each_in_extent( const Rect<DIM> &rect, F &&fn )#

Execute a function fn for each i, j, k, ...-th index in rect rect.

This routine is similar to the Point overload, except that the extents are given by the difference between rect[i].lo and rect[i].hi. The indices are then converted to 0-based indices before being passed to fn. So given a 2D rect: [<1, 1>, <2, 2>], then this routine would generate the following calls:

fn(0, 0)
fn(0, 1)
fn(0, 2)
fn(1, 0)
fn(1, 1)
fn(1, 2)
fn(2, 0)
fn(2, 1)
fn(2, 2)

Parameters:

rect – The Rect to iterate over.
fn – The function to execute.

template<typename F> ScopeGuard<F> make_scope_guard( F &&fn ) noexcept#

Create a ScopeGuard from a given functor.

See also

ScopeGuard

Parameters:: fn – The functor to create the ScopeGuard with.
Template Parameters:: F – The type of fn, usually inferred from the argument itself.
Returns:: The constructed ScopeGuard

template<typename F> ScopeFail<F> make_scope_fail(F &&fn) noexcept#

Create a ScopeFail from a given functor.

See also

ScopeFail

Parameters:: fn – The functor to create the ScopeFail with.
Template Parameters:: F – The type of fn, usually inferred from the argument itself.
Returns:: The constructed ScopeFail

class Time

#include <legate/timing/timing.h>

Deferred timestamp class.

Public Functions

std::int64_t value() const

Returns the timestamp value in this Time object.

Blocks on all Legate operations preceding the call that generated this Time object.

Returns:: A timestamp value

class Impl

template<typename T> class ProcLocalStorage

#include <legate/utilities/proc_local_storage.h>

A helper data structure to store processor-local objects.

Oftentimes, users need to create objects, usually some library handles, each of which is associated with only one processor (GPU, most likely). For those cases, users can create a ProcLocalStorage<T> that holds a unique singleton object of type T for each processor thread. The object can be retrieved simply by the get() method and internally the calls are distinguished by IDs of the processors invoking them.

Two parallel tasks running on the same processor will get the same object if they query the same ProcLocalStorage. Atomicity of access to the storage is guaranteed by the programming model running parallel tasks atomically on each processor; in other words, no synchronization is needed to call the get() method on a ProcLocalStorage even when it’s shared by multiple tasks.

Despite the name, the values that are stored in this storage don’t have static storage duration, but they are alive only as long as the owning ProcLocalStorage object is.

This example uses a ProcLocalStorage<int> to count the number of task invocations on each processor:

static void cpu_variant(legate::TaskContext context)
{
  static legate::ProcLocalStorage<int> counter{};

  if (!counter.has_value()) {
    // If this is the first visit, initialize the counter
    counter.emplace(1);
  } else {
    // Otherwise, increment the counter by 1
    ++counter.get();
  }
}

Template Parameters:: T – Type of values stored in this ProcLocalStorage.
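
The idea of one slot per processor can be modeled without a runtime as follows. This is only a sketch under loud assumptions: MyProcLocalStorage is hypothetical, it takes the processor index as an explicit argument (whereas the real class determines the executing processor itself), and the fixed MAX_PROCS capacity is illustrative. It requires C++17 for std::optional.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical model: one optional slot per processor index.
template <typename T, std::size_t MAX_PROCS = 8>
class MyProcLocalStorage {
 public:
  bool has_value(std::size_t proc) const noexcept
  {
    return proc < MAX_PROCS && slots_[proc].has_value();
  }

  // Constructs (or overwrites) the value for the given processor.
  template <typename... Args>
  T& emplace(std::size_t proc, Args&&... args)
  {
    return slots_.at(proc).emplace(std::forward<Args>(args)...);
  }

  // Throws std::logic_error if no value exists for the given processor.
  T& get(std::size_t proc)
  {
    if (!has_value(proc)) {
      throw std::logic_error{"no value for this processor"};
    }
    return *slots_[proc];
  }

 private:
  std::array<std::optional<T>, MAX_PROCS> slots_{};
};

// Mirrors the counting example above: returns the invocation count so far.
inline int count_invocation(MyProcLocalStorage<int>& counter, std::size_t proc)
{
  if (!counter.has_value(proc)) {
    return counter.emplace(proc, 1);
  }
  return ++counter.get(proc);
}
```

Each processor index sees only its own slot, so two invocations on the same index share a counter while a different index starts from scratch.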

Public Types

using value_type = T: The type of stored objects.

Public Functions

bool has_value() const noexcept

Checks if the value has been created for the executing processor.

Returns:: true if the value exists, false otherwise.

template<typename ...Args> value_type &emplace(Args&&... args)

Constructs a new value for the executing processor.

The existing value will be overwritten by the new value.

Parameters:: args – Arguments to the constructor of type T.
Returns:: A reference to the newly constructed element.

value_type &get()

Returns the value for the executing processor.

Throws:: std::logic_error – If no value exists for this processor (i.e., if has_value() returns false), or if the method is invoked outside a task
Returns:: The value for the executing processor.

const value_type &get() const

Returns the value for the executing processor.

Throws:: std::logic_error – If no value exists for this processor (i.e., if has_value() returns false), or if the method is invoked outside a task
Returns:: The value for the executing processor

template<typename F> class ScopeGuard

#include <legate/utilities/scope_guard.h>

A simple wrapper around a callable that automatically executes the callable on exiting the scope.

Template Parameters:: F – The type of the callable to execute.

Public Types

using value_type = F: The type of callable stored within the ScopeGuard.

Public Functions

explicit ScopeGuard(value_type &&fn, bool enabled = true) noexcept

Construct a ScopeGuard.

On destruction, a ScopeGuard will execute fn if and only if it is in the enabled state. fn will be invoked with no arguments, and any return value discarded. fn must be no-throw move-constructible, and must not throw any exceptions when invoked.

See also

ScopeGuard::enable() ScopeGuard::disable() ScopeGuard::enabled() ScopeFail

Parameters:

fn – The function to execute.
enabled – Whether the ScopeGuard should start in the “enabled” state.

ScopeGuard(ScopeGuard &&other) noexcept

Move-construct a ScopeGuard.

other will be left in the “disabled” state, and will not execute its held functor upon destruction. Furthermore, the held functor is moved into the receiving ScopeGuard, so other's functor may be in an indeterminate state. It is therefore not advised to re-enable other.

Parameters:: other – The ScopeGuard to move from.

ScopeGuard &operator=(ScopeGuard &&other) noexcept

Move-assign a ScopeGuard.

This routine has no effect if other and this are the same.

Parameters:: other – The ScopeGuard to move from.
Returns:: A reference to this.

~ScopeGuard() noexcept

Destroy a ScopeGuard.

If the ScopeGuard is currently in the enabled state, executes the held functor, otherwise does nothing.

bool enabled() const

Query a ScopeGuard’s state.

See also

ScopeGuard::enable() ScopeGuard::disable()

Returns:: true if the ScopeGuard is enabled, false otherwise.

void disable()

Disable a ScopeGuard.

This routine prevents a ScopeGuard from executing its held functor on destruction. On return, ScopeGuard::enabled() will return false.

Calling this routine on an already disabled ScopeGuard has no effect.

See also

ScopeGuard::enable()

void enable()

Enable a ScopeGuard.

This routine makes a ScopeGuard execute its held functor on destruction. On return, ScopeGuard::enabled() will return true.

Calling this routine on an already enabled ScopeGuard has no effect.

See also

ScopeGuard::disable()

template<typename F> class ScopeFail

#include <legate/utilities/scope_guard.h>

Similar to ScopeGuard, except that the callable is only executed if the scope is exited due to an exception.

Template Parameters:: F – The type of the callable to execute.

Public Types

using value_type = F: The type of callable stored within the ScopeFail.

Public Functions

explicit ScopeFail(value_type &&fn) noexcept

Construct a ScopeFail.

On destruction, a ScopeFail will execute fn if and only if the scope is being exited due to an uncaught exception. Therefore, unlike ScopeGuard, it is not possible to “disable” a ScopeFail.

fn will be invoked with no arguments, and any return value discarded. fn must be no-throw move-constructible, and must not throw any exceptions when invoked.

See also

ScopeGuard

Parameters:: fn – The function to execute.

~ScopeFail() noexcept

Destroy a ScopeFail.

If the ScopeFail is being destroyed as a result of exception-related stack unwinding, the held functor is executed; otherwise the destructor has no effect.
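
A common way to detect "destroyed during unwinding" is to compare std::uncaught_exceptions() (C++17) at construction and at destruction. The sketch below shows that technique with a hypothetical MyScopeFail class; whether legate::ScopeFail uses the same mechanism is an assumption.

```cpp
#include <exception>
#include <stdexcept>
#include <utility>

// Hypothetical guard that fires only when the scope exits via an exception.
template <typename F>
class MyScopeFail {
 public:
  explicit MyScopeFail(F fn) noexcept
    : fn_{std::move(fn)}, exceptions_on_entry_{std::uncaught_exceptions()}
  {
  }

  MyScopeFail(const MyScopeFail&)            = delete;
  MyScopeFail& operator=(const MyScopeFail&) = delete;

  ~MyScopeFail() noexcept
  {
    // More in-flight exceptions than at construction means that this
    // destructor is running as part of stack unwinding.
    if (std::uncaught_exceptions() > exceptions_on_entry_) {
      fn_();
    }
  }

 private:
  F fn_;
  int exceptions_on_entry_;
};

// Returns whether the failure handler ran for a throwing or non-throwing scope.
inline bool rollback_ran(bool should_throw)
{
  bool rolled_back = false;

  try {
    auto on_fail = [&]() noexcept { rolled_back = true; };
    MyScopeFail<decltype(on_fail)> guard{on_fail};

    if (should_throw) {
      throw std::runtime_error{"boom"};
    }
  } catch (const std::runtime_error&) {
    // Swallow the exception; only the side effect of the guard matters here.
  }
  return rolled_back;
}
```

Comparing counts, rather than testing for a non-zero count, keeps the guard correct when it is itself constructed inside a destructor that runs during unwinding.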