IO#

group io

Utilities for serializing and deserializing Legate stores to disk.

HDF5#

group io-hdf5

I/O operations backed by HDF5.

Functions

LogicalArray from_file(
const std::filesystem::path &file_path,
std::string_view dataset_name,
)#

Load an HDF5 dataset into a LogicalArray.

Parameters:
  • file_path – The path to the file to load.

  • dataset_name – The name of the HDF5 dataset to load from the file.

Throws:
  • std::system_error – If file_path does not exist.

  • UnsupportedHDF5DataTypeError – If the data type cannot be converted to a Type.

  • InvalidDataSetError – If the dataset is invalid, or is not found.

Returns:

LogicalArray The loaded array.
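A minimal load might look like the following sketch. The legate::io::hdf5 namespace, the legate:: qualification of LogicalArray, and the dataset name "temperature" are assumptions for illustration; only the header legate/io/hdf5/interface.h and the function signature are documented here, and the Legate runtime is assumed to have been started already.

```cpp
// Sketch only: namespace placement and runtime setup are assumptions.
#include <legate/io/hdf5/interface.h>

#include <filesystem>

legate::LogicalArray load_temperature(const std::filesystem::path& file)
{
  // Reads the dataset named "temperature" (a hypothetical name) from `file`.
  // Throws if the file is missing, the dataset is invalid, or the HDF5
  // datatype has no Legate Type equivalent.
  return legate::io::hdf5::from_file(file, "temperature");
}
```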

class UnsupportedHDF5DataTypeError : public std::invalid_argument#
#include <legate/io/hdf5/interface.h>

An exception thrown when an HDF5 datatype could not be converted to a Type.

class InvalidDataSetError : public std::invalid_argument#
#include <legate/io/hdf5/interface.h>

An exception thrown when an invalid dataset is encountered in an HDF5 file.

Public Functions

InvalidDataSetError(
const std::string &what,
std::filesystem::path path,
std::string dataset_name,
)#

Construct an InvalidDataSetError.

Parameters:
  • what – The exception string to forward to the constructor of std::invalid_argument.

  • path – The path to the HDF5 file containing the dataset.

  • dataset_name – The name of the offending dataset.

const std::filesystem::path &path() const noexcept#

Get the path to the file containing the dataset.

Returns:

The path to the file containing the dataset.

std::string_view dataset_name() const noexcept#

Get the name of the dataset.

Returns:

The name of the dataset.
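The accessors above make error reporting straightforward. The sketch below assumes the exception types live in the legate::io::hdf5 namespace alongside from_file(); that placement is an assumption, not documented here.

```cpp
// Sketch only: the legate::io::hdf5 namespace is an assumption.
#include <legate/io/hdf5/interface.h>

#include <filesystem>
#include <iostream>
#include <string_view>

void try_load(const std::filesystem::path& file, std::string_view dataset)
{
  try {
    auto array = legate::io::hdf5::from_file(file, dataset);
    static_cast<void>(array);
  } catch (const legate::io::hdf5::InvalidDataSetError& e) {
    // path() and dataset_name() recover which file and dataset were at fault.
    std::cerr << "Invalid dataset " << e.dataset_name() << " in "
              << e.path() << ": " << e.what() << '\n';
  }
}
```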

KVikIO#

group io-kvikio

I/O operations backed by KVikIO.

Functions

LogicalArray from_file(
const std::filesystem::path &file_path,
const Type &type,
)#

Read a LogicalArray from a file.

The array stored in file_path must have been written by a call to to_file(const std::filesystem::path&, const LogicalArray&).

This routine expects the file to contain nothing but the raw data, laid out linearly starting at offset 0. The file must contain no metadata or padding; its entire contents will be interpreted as data to be read into the store.

Warning

This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.

Parameters:
  • file_path – The path to the file.

  • type – The datatype of the array.

Throws:

std::system_error – If file_path does not exist.

Returns:

LogicalArray The loaded array.

void to_file(
const std::filesystem::path &file_path,
const LogicalArray &array,
)#

Write a LogicalArray to a file.

The array must be linear, i.e. it must be one-dimensional.

Warning

This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.

Parameters:
  • file_path – The path to the file.

  • array – The array to serialize.

Throws:

std::invalid_argument – If the dimension of array is not 1.
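A round trip through these two functions could be sketched as follows. The header path, the legate::io::kvikio namespace, and the array.type() accessor are assumptions; only the function signatures are documented above.

```cpp
// Sketch only: header path, namespace, and accessor names are assumptions.
#include <legate/io/kvikio/interface.h>

void round_trip(const legate::LogicalArray& array)
{
  // `array` must be one-dimensional for this overload of to_file().
  legate::io::kvikio::to_file("/tmp/array.bin", array);

  // Reading back requires re-supplying the element type, since the raw
  // file carries no metadata of any kind.
  auto loaded = legate::io::kvikio::from_file("/tmp/array.bin", array.type());
  static_cast<void>(loaded);
}
```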

LogicalArray from_file(
const std::filesystem::path &file_path,
const Shape &shape,
const Type &type,
const std::vector<std::uint64_t> &tile_shape,
std::optional<std::vector<std::uint64_t>> tile_start = {},
)#

Load a LogicalArray from a file in tiles.

The file must have been written by a call to to_file(). If tile_start is not given, it is initialized with zeros.

tile_start and tile_shape must have the same size.

The array must have the same number of dimensions as the tiles; in effect, array.dim() must equal tile_shape.size().

The array shape must be divisible by the tile shape.

Given some array stored on disk as:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

tile_shape sets the leaf-task launch group size. For example, tile_shape = [3] would result in each leaf-task getting assigned a contiguous triplet of the array:

  task_0     task_1     task_2
____|____  _____|___  ____|____
[1, 2, 3], [4, 5, 6], [7, 8, 9]

tile_start is a local offset into each tile from which to begin reading. In the above example, tile_start = [1] would mean that the array is read as:

// First, split into tile_shape shapes.
[1, 2, 3], [4, 5, 6], [7, 8, 9]
// Then apply the offset (1) to each subgroup
   [2, 3],    [5, 6],    [8, 9]

Such that the resulting array would contain:

[2, 3, 5, 6, 8, 9]

Warning

This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.

Parameters:
  • file_path – The path to the dataset.

  • shape – The shape of the resulting array.

  • type – The datatype of the array.

  • tile_shape – The shape of each tile.

  • tile_start – The offsets into each tile from which to read.

Throws:
  • std::system_error – If file_path does not exist.

  • std::invalid_argument – If tile_shape and tile_start are not the same size.

  • std::invalid_argument – If the array dimension does not match the tile shape.

  • std::invalid_argument – If the array shape is not divisible by the tile shape.

Returns:

LogicalArray The loaded array.
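The tile_start subsetting rule above can be reproduced on a plain std::vector, independent of Legate. This is a self-contained illustration of the 1-D semantics, not the library's implementation:

```cpp
#include <cstddef>
#include <vector>

// Split `data` into tiles of `tile_shape` elements, then keep only the
// suffix of each tile starting at local offset `tile_start` (1-D case).
// Assumes data.size() is divisible by tile_shape, mirroring the
// divisibility requirement documented above.
std::vector<int> read_tiled(const std::vector<int>& data,
                            std::size_t tile_shape,
                            std::size_t tile_start)
{
  std::vector<int> out;
  for (std::size_t base = 0; base < data.size(); base += tile_shape) {
    for (std::size_t i = tile_start; i < tile_shape; ++i) {
      out.push_back(data[base + i]);
    }
  }
  return out;
}
```

With data = {1, ..., 9}, tile_shape = 3, and tile_start = 1 this yields {2, 3, 5, 6, 8, 9}, matching the worked example.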

void to_file(
const std::filesystem::path &file_path,
const LogicalArray &array,
const std::vector<std::uint64_t> &tile_shape,
std::optional<std::vector<std::uint64_t>> tile_start = {},
)#

Write a LogicalArray to file in tiles.

If tile_start is not given, it is initialized with zeros.

tile_start and tile_shape must have the same size.

The array must have the same number of dimensions as the tiles; in effect, array.dim() must equal tile_shape.size().

The array shape must be divisible by the tile shape.

See from_file() for further discussion on the arguments.

Warning

This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.

Parameters:
  • file_path – The base path of the dataset to write.

  • array – The array to serialize.

  • tile_shape – The shape of the tiles.

  • tile_start – The offsets into each tile from which to write.

Throws:
  • std::invalid_argument – If tile_shape and tile_start are not the same size.

  • std::invalid_argument – If the array dimension does not match the tile shape.

  • std::invalid_argument – If the array shape is not divisible by the tile shape.
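A tiled round trip could be sketched as below. The header path, the legate::io::kvikio namespace, and the array.shape()/array.type() accessors are assumptions for illustration; only the function signatures are documented above.

```cpp
// Sketch only: header path, namespace, and accessor names are assumptions.
#include <legate/io/kvikio/interface.h>

#include <cstdint>
#include <vector>

void tiled_round_trip(const legate::LogicalArray& array)
{
  // Hypothetical 2-D tiling; the array shape must be divisible by this.
  const std::vector<std::uint64_t> tile_shape = {256, 256};

  // tile_start defaults to zeros in both calls.
  legate::io::kvikio::to_file("/tmp/tiled.bin", array, tile_shape);

  auto loaded = legate::io::kvikio::from_file(
    "/tmp/tiled.bin", array.shape(), array.type(), tile_shape);
  static_cast<void>(loaded);
}
```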

LogicalArray from_file_by_offsets(
const std::filesystem::path &file_path,
const Shape &shape,
const Type &type,
const std::vector<std::uint64_t> &offsets,
const std::vector<std::uint64_t> &tile_shape,
)#

Load a LogicalArray from a file in tiles.

The array must have the same number of dimensions as the tiles; in effect, array.dim() must equal tile_shape.size().

This routine should be used when each leaf task must read from an offset that may differ non-uniformly from the others. If the offsets are uniform (i.e. can be deduced from the leaf-task index and the tile shape), then from_file() should be preferred.

For example, given some array (of int32’s) stored on disk as:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

tile_shape sets the leaf-task launch group size. For example, tile_shape = {3} would result in each leaf-task getting assigned a contiguous triplet of the array:

  task_0     task_1     task_2
____|____  _____|___  ____|____
[1, 2, 3], [4, 5, 6], [7, 8, 9]

It also sets the number of elements to read. Each leaf-task will read tile_shape.volume() * type.size() bytes from the file.

offsets encodes the per-leaf-task global offset in bytes into the array for each tile. Crucially, these offsets need not (and by definition shall not) be the same for each leaf task. For example, assuming sizeof(std::int32_t) = 4:

std::vector<std::uint64_t> offsets = {
  // task_0 reads from byte index 0 of the file (i.e. starting from element index 0)
  0,
  // task_1 reads from byte index 4 * 3 = 12 of the file (i.e. starting from element index 3)
  3 * sizeof(std::int32_t),
  // task_2 reads from byte index 4 * 7 = 28 of the file (i.e. starting from element index 7)
  7 * sizeof(std::int32_t),
};

Note how the final offset is arbitrary. If the offsets were uniform, task_2 would start from element index 6 (byte index 24). The resulting array would then contain:

[1, 2, 3, 4, 5, 6, 8, 9]

If the data is multi-dimensional, the task IDs for the purposes of indexing into offsets are linearized in C order. For example, if we have 2x2 tiles (tile_shape = {2, 2}), the task IDs would be linearized as follows:

(0, 0) -> 0
(0, 1) -> 1
(1, 0) -> 2
(1, 1) -> 3
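This C-order linearization is ordinary row-major index flattening, with the last coordinate varying fastest. A self-contained helper makes the rule concrete; it is an illustration of the indexing convention, not library code:

```cpp
#include <cstddef>
#include <vector>

// Flatten a multi-dimensional tile index into a linear task ID in C
// (row-major) order. `extents` gives the number of tiles per dimension.
std::size_t linearize(const std::vector<std::size_t>& index,
                      const std::vector<std::size_t>& extents)
{
  std::size_t linear = 0;
  for (std::size_t d = 0; d < extents.size(); ++d) {
    linear = linear * extents[d] + index[d];
  }
  return linear;
}
```

With extents {2, 2} this reproduces the mapping above: (0, 0) -> 0, (0, 1) -> 1, (1, 0) -> 2, (1, 1) -> 3.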

Warning

This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.

Parameters:
  • file_path – The path to the file to read.

  • shape – The shape of the resulting array.

  • type – The datatype of the array.

  • offsets – The per-leaf-task global offsets (in bytes) into the file from which to read.

  • tile_shape – The shape of each tile.

Throws:
  • std::system_error – If file_path does not exist.

  • std::invalid_argument – If offsets.size() does not equal the number of partitioned array tiles.

Returns:

LogicalArray The loaded array.
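The per-task read pattern described above can be simulated on a plain in-memory buffer. This sketch only mirrors the documented behavior; the offsets chosen here ({0, 12, 20} bytes) differ from the worked example's final offset so that every read stays within a nine-element buffer:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulate per-task reads: each entry of `offsets` is a global byte offset
// into `file`; each task copies `tile_volume` elements from its offset.
// Overlapping offsets are permitted, just as the offsets need not be uniform.
std::vector<std::int32_t> read_by_offsets(
  const std::vector<std::int32_t>& file,
  const std::vector<std::uint64_t>& offsets,
  std::size_t tile_volume)
{
  std::vector<std::int32_t> out;
  for (const auto byte_offset : offsets) {
    const std::size_t start = byte_offset / sizeof(std::int32_t);
    for (std::size_t i = 0; i < tile_volume; ++i) {
      out.push_back(file[start + i]);
    }
  }
  return out;
}
```

With file = {1, ..., 9}, tile_volume = 3, and offsets {0, 3 * 4, 5 * 4}, the tasks read elements starting at indices 0, 3, and 5, producing {1, 2, 3, 4, 5, 6, 6, 7, 8}; note the overlap caused by the non-uniform final offset.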