Settings#
cuPyNumeric has a number of runtime settings that can be configured through environment variables.
doctor#
- Type:
bool (“0” or “1”)
- Env var:
CUPYNUMERIC_DOCTOR
- Default:
False
Attempt to warn about certain usage patterns that are inefficient with cuPyNumeric.
doctor_format#
- Type:
DoctorFormat (“plain”, “csv”, or “json”)
- Env var:
CUPYNUMERIC_DOCTOR_FORMAT
- Default:
‘plain’
Output format for cuPyNumeric Doctor: plain, csv, or json.
doctor_filename#
- Type:
str
- Env var:
CUPYNUMERIC_DOCTOR_FILENAME
- Default:
None
Name of a file to write cuPyNumeric Doctor output to. If unset, output goes to stdout.
doctor_traceback#
- Type:
bool (“0” or “1”)
- Env var:
CUPYNUMERIC_DOCTOR_TRACEBACK
- Default:
False
Whether cuPyNumeric Doctor output should include full tracebacks.
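To show how the four Doctor settings fit together, here is a minimal sketch that builds an environment for a child process. The helper name `doctor_env` and its parameters are hypothetical; only the variable names and accepted values come from this page.

```python
import os

# Hypothetical helper: returns a copy of an environment mapping with the
# cuPyNumeric Doctor variables documented on this page filled in.
def doctor_env(fmt="plain", filename=None, traceback=False, base=None):
    if fmt not in ("plain", "csv", "json"):
        raise ValueError(f"unsupported doctor format: {fmt!r}")
    env = dict(os.environ if base is None else base)
    env["CUPYNUMERIC_DOCTOR"] = "1"  # bool settings take "0" or "1"
    env["CUPYNUMERIC_DOCTOR_FORMAT"] = fmt
    env["CUPYNUMERIC_DOCTOR_TRACEBACK"] = "1" if traceback else "0"
    if filename is not None:  # left unset, output goes to stdout
        env["CUPYNUMERIC_DOCTOR_FILENAME"] = filename
    return env

env = doctor_env(fmt="json", filename="doctor.json", base={})
print(env)
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` to launch a program that uses cuPyNumeric with Doctor enabled.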
preload_cudalibs#
- Type:
bool (“0” or “1”)
- Env var:
CUPYNUMERIC_PRELOAD_CUDALIBS
- Default:
False
Preload and initialize handles of all CUDA libraries (cuBLAS, cuSOLVER, etc.) used in cuPyNumeric.
warn#
- Type:
bool (“0” or “1”)
- Env var:
CUPYNUMERIC_WARN
- Default:
False
Turn on warnings.
numpy_compat#
- Type:
bool (“0” or “1”)
- Env var:
CUPYNUMERIC_NUMPY_COMPATIBILITY
- Default:
False
When enabled, cuPyNumeric issues additional tasks to match NumPy’s results and behavior. This is currently used by the following APIs: nanmin, nanmax, nanargmin, and nanargmax.
fallback_stacktrace#
- Type:
bool (“0” or “1”)
- Env var:
CUPYNUMERIC_FALLBACK_STACKTRACE
- Default:
False
Whether to dump a full stack trace whenever cuPyNumeric emits a warning about falling back to NumPy routines.
This is a read-only environment variable setting used by the runtime.
fast_math#
- Type:
bool (“0” or “1”)
- Env var:
CUPYNUMERIC_FAST_MATH
- Default:
False
Enable certain optimized execution modes for floating-point math operations that may violate strict IEEE specifications. Currently, this flag enables the acceleration of single-precision cuBLAS routines using TF32 tensor cores.
This is a read-only environment variable setting used by the runtime.
min_gpu_chunk#
- Type:
int
- Env var:
CUPYNUMERIC_MIN_GPU_CHUNK
- Default:
65536 (test-mode default: 2)
Minimum chunk size for GPU operations.
This is a read-only environment variable setting used by the runtime.
min_cpu_chunk#
- Type:
int
- Env var:
CUPYNUMERIC_MIN_CPU_CHUNK
- Default:
1024 (test-mode default: 2)
Minimum chunk size for CPU operations.
This is a read-only environment variable setting used by the runtime.
min_omp_chunk#
- Type:
int
- Env var:
CUPYNUMERIC_MIN_OMP_CHUNK
- Default:
8192 (test-mode default: 2)
Minimum chunk size for OpenMP operations.
This is a read-only environment variable setting used by the runtime.
matmul_cache_size#
- Type:
int
- Env var:
CUPYNUMERIC_MATMUL_CACHE_SIZE
- Default:
134217728 (test-mode default: 4096)
Force cuPyNumeric to keep temporary task slices during matmul computations smaller than this threshold. Whenever the temporary space needed during a computation would exceed this value, the task is batched over ‘k’ to meet the requirement.
This is a read-only environment variable setting used by the runtime.
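The arithmetic behind batching over ‘k’ can be sketched as follows. This is an illustrative model only, not cuPyNumeric's actual sizing logic: it assumes the temporary space grows linearly with the extent of the contracted dimension, and both the function `k_batches` and the parameter `cost_per_k` are hypothetical.

```python
import math

MATMUL_CACHE_SIZE = 134217728  # default from this page (2**27)

# Illustrative model: each unit of k is assumed to need `cost_per_k` units
# of temporary space, so a batch may span at most threshold // cost_per_k
# steps of k without exceeding the threshold.
def k_batches(k, cost_per_k, threshold=MATMUL_CACHE_SIZE):
    max_k_per_batch = max(1, threshold // cost_per_k)
    num_batches = math.ceil(k / max_k_per_batch)
    return num_batches, max_k_per_batch

print(k_batches(k=4096, cost_per_k=65536))                  # default threshold
print(k_batches(k=4096, cost_per_k=65536, threshold=4096))  # test-mode default
```

Under this model, the much smaller test-mode default forces many small batches, which is one plausible reason a test configuration would lower the threshold: the batched code path gets exercised even on small inputs.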
test#
- Type:
bool (“0” or “1”)
- Env var:
LEGATE_TEST
- Default:
False
Enable test mode. This sets alternative defaults for various other settings.
This is a read-only environment variable setting used by the runtime.
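The settings on this page that list a test-mode default can be summarized in a small lookup. The sketch below is hypothetical: the resolver `effective_value` is not part of cuPyNumeric, and it assumes that an explicitly set variable takes precedence over either default.

```python
# Defaults copied from this page: (normal default, test-mode default).
DEFAULTS = {
    "CUPYNUMERIC_MIN_GPU_CHUNK": (65536, 2),
    "CUPYNUMERIC_MIN_CPU_CHUNK": (1024, 2),
    "CUPYNUMERIC_MIN_OMP_CHUNK": (8192, 2),
    "CUPYNUMERIC_MATMUL_CACHE_SIZE": (134217728, 4096),
}

# Hypothetical resolver. Assumption: a value set explicitly in the
# environment wins; otherwise LEGATE_TEST=1 selects the test-mode default.
def effective_value(name, env):
    if name in env:
        return int(env[name])
    default, test_default = DEFAULTS[name]
    return test_default if env.get("LEGATE_TEST") == "1" else default

print(effective_value("CUPYNUMERIC_MIN_GPU_CHUNK", {}))
print(effective_value("CUPYNUMERIC_MIN_GPU_CHUNK", {"LEGATE_TEST": "1"}))
```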
take_default#
- Type:
str
- Env var:
CUPYNUMERIC_TAKE_DEFAULT
- Default:
‘auto’
Default algorithm for deferred array.take():
- ‘auto’: let cuPyNumeric decide which algorithm to use
- ‘index’: use advanced indexing
- ‘task’: use a task that broadcasts the indices
disable_bounds_checking#
- Type:
DisableBoundsChecking (“none”, “all”, or comma-separated selectors: indexing, take, take_along_axis, put)
- Env var:
CUPYNUMERIC_DISABLE_BOUNDS_CHECKING
- Default:
‘none’
Disables explicit bounds checking for advanced-indexing-related operations.
- ‘none’: keep all targeted explicit bounds checks enabled
- ‘all’: disable all targeted explicit bounds checks
- comma-separated selectors such as ‘indexing,take,put’: disable checks only for the named operations
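The accepted values can be illustrated with a small parser. This is a sketch, not cuPyNumeric's own parsing code: the function `disabled_checks` is hypothetical, and its tolerance for surrounding whitespace and its rejection of unknown selectors are assumptions.

```python
# Selector names listed on this page.
SELECTORS = frozenset({"indexing", "take", "take_along_axis", "put"})

# Hypothetical parser: returns the set of operations whose explicit bounds
# checks would be disabled for a given setting value.
def disabled_checks(value="none"):
    value = value.strip()
    if value == "none":
        return frozenset()
    if value == "all":
        return SELECTORS
    names = {part.strip() for part in value.split(",")}
    unknown = names - SELECTORS
    if unknown:
        raise ValueError(f"unknown selector(s): {sorted(unknown)}")
    return frozenset(names)

print(sorted(disabled_checks("indexing,take,put")))
```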
use_nccl_gather#
- Type:
bool (“0” or “1”)
- Env var:
CUPYNUMERIC_USE_NCCL_GATHER
- Default:
False
Enable distributed gather via the NCCL all-to-all implementation when multiple GPUs are available.