IO
- group io
Utilities for serializing and deserializing Legate stores to disk.
HDF5
- group io-hdf5
I/O operations backed by HDF5.
Functions
- LogicalArray from_file(const std::filesystem::path &file_path, std::string_view dataset_name)
Load an HDF5 dataset into a LogicalArray.
- Parameters:
file_path – The path to the file to load.
dataset_name – The name of the HDF5 dataset to load from the file.
- Throws:
std::system_error – If file_path does not exist.
UnsupportedHDF5DataTypeError – If the data type cannot be converted to a Type.
InvalidDataSetError – If the dataset is invalid, or is not found.
- Returns:
LogicalArray The loaded array.
- class UnsupportedHDF5DataTypeError : public std::invalid_argument
- #include <legate/io/hdf5/interface.h>
An exception thrown when an HDF5 datatype could not be converted to a Type.
- class InvalidDataSetError : public std::invalid_argument
- #include <legate/io/hdf5/interface.h>
An exception thrown when an invalid dataset is encountered in an HDF5 file.
Public Functions
- InvalidDataSetError(const std::string &what, std::filesystem::path path, std::string dataset_name)
Construct an InvalidDataSetError.
- Parameters:
what – The exception string to forward to the constructor of std::invalid_argument.
path – The path to the HDF5 file containing the dataset.
dataset_name – The name of the offending dataset.
- const std::filesystem::path &path() const noexcept
Get the path to the file containing the dataset.
- Returns:
The path to the file containing the dataset.
- std::string_view dataset_name() const noexcept
Get the name of the dataset.
- Returns:
The name of the dataset.
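Both documented exception types derive from std::invalid_argument, so when catching them the more specific handler has to come first. The sketch below is a standalone illustration: because it must compile without Legate, DemoInvalidDataSetError is a hypothetical stand-in that only mirrors the documented constructor and the path() and dataset_name() accessors, and describe_failure is a hypothetical helper. In real code you would catch the InvalidDataSetError and UnsupportedHDF5DataTypeError declared in <legate/io/hdf5/interface.h> instead.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <system_error>
#include <utility>

// Hypothetical stand-in that mirrors the documented InvalidDataSetError
// interface so this sketch compiles without Legate. It is not the real class.
class DemoInvalidDataSetError : public std::invalid_argument {
 public:
  DemoInvalidDataSetError(const std::string& what, std::filesystem::path path,
                          std::string dataset_name)
    : std::invalid_argument{what},
      path_{std::move(path)},
      dataset_name_{std::move(dataset_name)}
  {
  }

  const std::filesystem::path& path() const noexcept { return path_; }
  std::string_view dataset_name() const noexcept { return dataset_name_; }

 private:
  std::filesystem::path path_;
  std::string dataset_name_;
};

// Run a loader and describe how it failed. The derived-class handler must
// precede the std::invalid_argument handler, otherwise the base handler wins.
std::string describe_failure(const std::function<void()>& load)
{
  try {
    load();
  } catch (const DemoInvalidDataSetError& e) {
    return "bad dataset " + std::string{e.dataset_name()} + " in " + e.path().string();
  } catch (const std::invalid_argument& e) {
    return std::string{"invalid argument: "} + e.what();
  } catch (const std::system_error& e) {
    // e.g. a missing file; std::system_error is not a std::invalid_argument
    return std::string{"system error: "} + e.what();
  }
  return "ok";
}
```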
KvikIO
- group io-kvikio
I/O operations backed by KvikIO.
Functions
- LogicalArray from_file(const std::filesystem::path &file_path, const Type &type)
Read a LogicalArray from a file.
The array stored in file_path must have been written by a call to to_file(const std::filesystem::path&, const LogicalArray&). This routine expects the file to contain nothing but the raw data, laid out linearly starting at offset 0. The file must contain no metadata, padding, or other data; its entire contents are interpreted as data to be read into the store.
Warning
This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.
- Parameters:
file_path – The path to the file.
type – The datatype of the array.
- Throws:
std::system_error – If file_path does not exist.
- Returns:
LogicalArray The loaded array.
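Since the file holds no metadata and this overload takes no shape, the number of elements presumably has to be recovered from the file size divided by the element size. The following standalone model (not Legate's implementation; element_count and read_raw are hypothetical helpers working on plain std::vector) illustrates that layout. One detail lines up with the Throws entry: std::filesystem::file_size reports a missing file by throwing std::filesystem::filesystem_error, which derives from std::system_error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <vector>

// With no header, the element count can only come from the file size.
// std::filesystem::file_size throws std::filesystem::filesystem_error
// (a std::system_error) if the file does not exist.
std::size_t element_count(const std::filesystem::path& file, std::size_t elem_size)
{
  const std::uintmax_t bytes = std::filesystem::file_size(file);
  if (bytes % elem_size != 0) {
    throw std::invalid_argument{"file size is not a multiple of the element size"};
  }
  return static_cast<std::size_t>(bytes / elem_size);
}

// Read every element of a headerless file, starting at byte offset 0.
template <typename T>
std::vector<T> read_raw(const std::filesystem::path& file)
{
  std::vector<T> data(element_count(file, sizeof(T)));
  std::ifstream in{file, std::ios::binary};
  in.read(reinterpret_cast<char*>(data.data()),
          static_cast<std::streamsize>(data.size() * sizeof(T)));
  if (!in) {
    throw std::runtime_error{"short read from " + file.string()};
  }
  return data;
}
```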
- void to_file(const std::filesystem::path &file_path, const LogicalArray &array)
Write a LogicalArray to a file.
The array must be linear, i.e. one-dimensional (array.dim() == 1).
Warning
This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.
- Parameters:
file_path – The path to the file.
array – The array to serialize.
- Throws:
std::invalid_argument – If the dimension of array is not 1.
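The headerless layout that to_file() produces can be modeled by dumping the elements back to back, after rejecting anything that is not one-dimensional. This is a hypothetical sketch on plain std::vector data (write_raw_1d and its extents parameter are illustrative, not part of Legate); it mirrors the documented std::invalid_argument for a non-1-D array.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <vector>

// Hypothetical model: reject anything that is not one-dimensional, then write
// the elements back to back starting at byte 0, with no header or padding.
template <typename T>
void write_raw_1d(const std::filesystem::path& file,
                  const std::vector<std::uint64_t>& extents,
                  const std::vector<T>& data)
{
  if (extents.size() != 1) {
    throw std::invalid_argument{"array must be one-dimensional"};
  }
  // Consistency check for this model only: the extent must match the data.
  if (extents[0] != data.size()) {
    throw std::invalid_argument{"extent does not match the number of elements"};
  }
  std::ofstream out{file, std::ios::binary | std::ios::trunc};
  out.write(reinterpret_cast<const char*>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(T)));
  if (!out) {
    throw std::runtime_error{"failed to write " + file.string()};
  }
}
```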
- LogicalArray from_file(const std::filesystem::path &file_path, const Shape &shape, const Type &type, const std::vector<std::uint64_t> &tile_shape, std::optional<std::vector<std::uint64_t>> tile_start = {})
Load a LogicalArray from a file in tiles.
The file must have been written by a call to to_file(). If tile_start is not given, it is initialized with zeros. tile_start and tile_shape must have the same size. The resulting array must have the same number of dimensions as the tiles; in effect, the number of dimensions of shape must equal tile_shape.size(). The array shape must be divisible by the tile shape, i.e. each extent of shape must be a multiple of the corresponding extent of tile_shape.
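The argument rules above can be summarized as a single validation step. A minimal sketch, assuming shapes are passed as plain std::vector<std::uint64_t> rather than Legate's Shape; check_tiling is a hypothetical helper, and treating a zero tile extent as an error is an extra assumption added only to avoid division by zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical model of the documented checks for tiled I/O. Returns the
// effective tile_start, which defaults to all zeros when none is given.
std::vector<std::uint64_t> check_tiling(const std::vector<std::uint64_t>& shape,
                                        const std::vector<std::uint64_t>& tile_shape,
                                        std::optional<std::vector<std::uint64_t>> tile_start = {})
{
  if (!tile_start) {
    tile_start = std::vector<std::uint64_t>(tile_shape.size(), 0);
  }
  if (tile_start->size() != tile_shape.size()) {
    throw std::invalid_argument{"tile_shape and tile_start must have the same size"};
  }
  if (shape.size() != tile_shape.size()) {
    throw std::invalid_argument{"array dimension does not match the tile shape"};
  }
  for (std::size_t d = 0; d < shape.size(); ++d) {
    // Each extent must be a whole multiple of the tile extent.
    if (tile_shape[d] == 0 || shape[d] % tile_shape[d] != 0) {
      throw std::invalid_argument{"array shape is not divisible by the tile shape"};
    }
  }
  return *tile_start;
}
```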
Given some array stored on disk as:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
tile_shape sets the leaf-task launch group size. For example, tile_shape = [3] would result in each leaf task being assigned a contiguous triplet of the array:
  task_0     task_1     task_2
____|____  ____|____  ____|____
[1, 2, 3], [4, 5, 6], [7, 8, 9]
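The triplet assignment above amounts to a simple index calculation: in one dimension, leaf task i owns elements [i * tile, (i + 1) * tile). The hypothetical helper tile_elements below models that split on an in-memory std::vector; it performs no I/O and is not Legate's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of 1-D tiling: leaf task `task` owns the contiguous element range
// [task * tile, (task + 1) * tile) of the on-disk array.
template <typename T>
std::vector<T> tile_elements(const std::vector<T>& on_disk, std::size_t tile, std::size_t task)
{
  const std::size_t begin = task * tile;
  assert(begin + tile <= on_disk.size());
  const auto first = on_disk.begin() + static_cast<std::ptrdiff_t>(begin);
  return std::vector<T>(first, first + static_cast<std::ptrdiff_t>(tile));
}
```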
tile_start is a local offset into each tile from which to begin reading. Given tile_start = [1], the above example would be read as:
// First, split into tiles of shape tile_shape.
[1, 2, 3], [4, 5, 6], [7, 8, 9]
// Then apply the offset (1) to each tile.
[2, 3], [5, 6], [8, 9]
The resulting array would therefore contain:
[2, 3, 5, 6, 8, 9]
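The two-step description above (split into tiles, then skip the first tile_start elements of each) can be reproduced on an in-memory vector. This is a one-dimensional model of the documented example only; read_with_tile_start is a hypothetical helper, and the sketch makes no claim about how the library actually issues its reads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One-dimensional model of the documented tile_start example: split the data
// into tiles of `tile` elements, then drop the first `start` elements of each.
template <typename T>
std::vector<T> read_with_tile_start(const std::vector<T>& on_disk, std::size_t tile,
                                    std::size_t start)
{
  assert(tile > 0 && start <= tile && on_disk.size() % tile == 0);
  std::vector<T> out;
  for (std::size_t t = 0; t < on_disk.size(); t += tile) {
    // Keep elements [t + start, t + tile) of the tile beginning at t.
    out.insert(out.end(),
               on_disk.begin() + static_cast<std::ptrdiff_t>(t + start),
               on_disk.begin() + static_cast<std::ptrdiff_t>(t + tile));
  }
  return out;
}
```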
Warning
This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.
- Parameters:
file_path – The path to the dataset.
shape – The shape of the resulting array.
type – The datatype of the array.
tile_shape – The shape of each tile.
tile_start – The offsets into each tile from which to read.
- Throws:
std::system_error – If file_path does not exist.
std::invalid_argument – If tile_shape and tile_start are not the same size.
std::invalid_argument – If the array dimension does not match the tile shape.
std::invalid_argument – If the array shape is not divisible by the tile shape.
- Returns:
LogicalArray The loaded array.
- void to_file(const std::filesystem::path &file_path, const LogicalArray &array, const std::vector<std::uint64_t> &tile_shape, std::optional<std::vector<std::uint64_t>> tile_start = {})
Write a LogicalArray to a file in tiles.
If tile_start is not given, it is initialized with zeros. tile_start and tile_shape must have the same size. array must have the same number of dimensions as the tiles; in effect, array.dim() must equal tile_shape.size(). The array shape must be divisible by the tile shape.
See from_file() for further discussion of the arguments.
Warning
This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.
- Parameters:
file_path – The base path of the dataset to write.
array – The array to serialize.
tile_shape – The shape of the tiles.
tile_start – The offsets into each tile from which to write.
- Throws:
std::invalid_argument – If tile_shape and tile_start are not the same size.
std::invalid_argument – If the array dimension does not match the tile shape.
std::invalid_argument – If the array shape is not divisible by the tile shape.
- LogicalArray from_file_by_offsets(const std::filesystem::path &file_path, const Shape &shape, const Type &type, const std::vector<std::uint64_t> &offsets, const std::vector<std::uint64_t> &tile_shape)
Load a LogicalArray from a file in tiles.
The resulting array must have the same number of dimensions as the tiles; in effect, the number of dimensions of shape must equal tile_shape.size().
This routine should be used when each leaf task may need to read from an offset that does not follow a uniform pattern across tasks. If the offsets are uniform (i.e. can be deduced from the leaf-task index and the tile shape), then from_file() should be preferred.
For example, given some array of int32 values stored on disk as:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
tile_shape sets the leaf-task launch group size. For example, tile_shape = {3} would result in each leaf task being assigned a contiguous triplet of the array:
  task_0     task_1     task_2
____|____  ____|____  ____|____
[1, 2, 3], [4, 5, 6], [7, 8, 9]
tile_shape also sets the number of elements to read: each leaf task reads (product of the extents in tile_shape) * type.size() bytes from the file.
offsets encodes, for each tile, the global offset in bytes into the file at which that leaf task starts reading. Crucially, these offsets need not be uniform; if they were, from_file() would be the better fit. For example, assuming sizeof(std::int32_t) == 4:
std::vector<std::uint64_t> offsets = {
    // task_0 reads from byte 0 of the file (element index 0, value 1)
    0,
    // task_1 reads from byte 4 * 3 = 12 of the file (element index 3, value 4)
    3 * sizeof(std::int32_t),
    // task_2 reads from byte 4 * 7 = 28 of the file (element index 7, value 8)
    7 * sizeof(std::int32_t),
};
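Note how the final offset is arbitrary: if the offsets were uniform, task_2 would start from element index 6 (value 7). The first eight elements of the resulting array would then be:
[1, 2, 3, 4, 5, 6, 8, 9]
Because each task reads three elements, task_2 also reads element index 9, so the file must hold at least ten int32 values for this example to stay within bounds.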
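The per-task byte ranges follow directly from offsets and the bytes-per-task formula. The sketch below is a hypothetical model (task_byte_ranges and ByteRange are illustrative names, not Legate API) that computes the half-open range [begin, end) for the example offsets, which makes it easy to compare each range against the size of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ByteRange {
  std::uint64_t begin;  // first byte read by the task
  std::uint64_t end;    // one past the last byte read
};

// Each leaf task reads tile_volume * elem_size bytes starting at its offset.
std::vector<ByteRange> task_byte_ranges(const std::vector<std::uint64_t>& offsets,
                                        std::uint64_t tile_volume, std::uint64_t elem_size)
{
  std::vector<ByteRange> ranges;
  ranges.reserve(offsets.size());
  for (const std::uint64_t off : offsets) {
    ranges.push_back({off, off + tile_volume * elem_size});
  }
  return ranges;
}
```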
If the data is multi-dimensional, the task IDs used to index into offsets are linearized in C (row-major) order. For example, if the array is split into a 2x2 grid of tiles (e.g. a 4x4 array with tile_shape = {2, 2}), the task IDs would be linearized as follows:
(0, 0) -> 0
(0, 1) -> 1
(1, 0) -> 2
(1, 1) -> 3
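Row-major linearization of a tile coordinate can be written as a running product over the dimensions, with the last dimension varying fastest. In this sketch, linearize is a hypothetical helper and grid holds the number of tiles along each dimension (assumed here to be shape[d] / tile_shape[d]); it reproduces the 2x2 table above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major (C order) linearization of a tile coordinate within a grid of
// tiles: the last dimension varies fastest.
std::uint64_t linearize(const std::vector<std::uint64_t>& coord,
                        const std::vector<std::uint64_t>& grid)
{
  assert(coord.size() == grid.size());
  std::uint64_t id = 0;
  for (std::size_t d = 0; d < grid.size(); ++d) {
    assert(coord[d] < grid[d]);
    id = id * grid[d] + coord[d];
  }
  return id;
}
```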
Warning
This API is experimental. A future release may change or remove this API without warning, deprecation period, or notice. The user is nevertheless encouraged to use this API, and submit any feedback to legate@nvidia.com.
- Parameters:
file_path – The path to the file to read.
shape – The shape of the resulting array.
type – The datatype of the array.
offsets – The per-leaf-task global offsets (in bytes) into the file from which to read.
tile_shape – The shape of each tile.
- Throws:
std::system_error – If file_path does not exist.
std::invalid_argument – If offsets.size() does not equal the number of partitioned array tiles.
- Returns:
LogicalArray The loaded array.
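The offsets.size() check amounts to counting tiles. A minimal sketch, assuming the tile count is the product over dimensions of shape[d] / tile_shape[d] rounded up; the rounding is an assumption, since the text above does not say how a partial tile would be handled. tile_count and check_offsets are hypothetical helpers, not Legate API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Number of tiles (one per leaf task) covering `shape`. Rounding up so that a
// partial trailing tile still counts is an assumption of this sketch.
std::uint64_t tile_count(const std::vector<std::uint64_t>& shape,
                         const std::vector<std::uint64_t>& tile_shape)
{
  if (shape.size() != tile_shape.size()) {
    throw std::invalid_argument{"array dimension does not match the tile shape"};
  }
  std::uint64_t count = 1;
  for (std::size_t d = 0; d < shape.size(); ++d) {
    if (tile_shape[d] == 0) {
      throw std::invalid_argument{"tile extents must be non-zero"};
    }
    count *= (shape[d] + tile_shape[d] - 1) / tile_shape[d];  // ceiling division
  }
  return count;
}

// Mirror of the documented check: exactly one offset per tile.
void check_offsets(const std::vector<std::uint64_t>& shape,
                   const std::vector<std::uint64_t>& tile_shape,
                   const std::vector<std::uint64_t>& offsets)
{
  if (offsets.size() != tile_count(shape, tile_shape)) {
    throw std::invalid_argument{"offsets.size() must equal the number of tiles"};
  }
}
```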