Settings#

cuPyNumeric has a number of runtime settings that can be configured through environment variables.

doctor#

Type:

bool (“0” or “1”)

Env var:

CUPYNUMERIC_DOCTOR

Default:

False

Attempt to warn about certain usage patterns that are inefficient with cuPyNumeric.

doctor_format#

Type:

DoctorFormat (“plain”, “csv”, or “json”)

Env var:

CUPYNUMERIC_DOCTOR_FORMAT

Default:

‘plain’

Format for cuPyNumeric Doctor output: plain, csv, or json.

doctor_filename#

Type:

str

Env var:

CUPYNUMERIC_DOCTOR_FILENAME

Default:

None

Name of a file to write cuPyNumeric Doctor output to. If unset, output goes to stdout.

doctor_traceback#

Type:

bool (“0” or “1”)

Env var:

CUPYNUMERIC_DOCTOR_TRACEBACK

Default:

False

Whether cuPyNumeric Doctor output should include full tracebacks.
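As a rough illustration, the Doctor settings above could be collected into one environment mapping before launching a program. This is a minimal sketch: the helper name `doctor_env`, the output file name `doctor_report.json`, and the script name `my_script.py` are hypothetical, and passing the mapping to a child process is only one way to set these variables.

```python
import os


def doctor_env(fmt="json", filename=None, traceback=False, base=None):
    """Return a copy of the environment with the Doctor variables set.

    Hypothetical helper for illustration; it is not part of cuPyNumeric.
    """
    if fmt not in ("plain", "csv", "json"):
        raise ValueError(f"unsupported doctor format: {fmt!r}")
    env = dict(os.environ if base is None else base)
    env["CUPYNUMERIC_DOCTOR"] = "1"
    env["CUPYNUMERIC_DOCTOR_FORMAT"] = fmt
    if filename is not None:
        # Leaving the variable unset keeps the documented default (stdout).
        env["CUPYNUMERIC_DOCTOR_FILENAME"] = filename
    env["CUPYNUMERIC_DOCTOR_TRACEBACK"] = "1" if traceback else "0"
    return env


env = doctor_env(fmt="json", filename="doctor_report.json", traceback=True)
# The mapping could then be handed to a launcher, for example:
#   subprocess.run([sys.executable, "my_script.py"], env=env)
```

Building the mapping from a copy of `os.environ` keeps the rest of the environment (such as `PATH`) intact for the child process.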

preload_cudalibs#

Type:

bool (“0” or “1”)

Env var:

CUPYNUMERIC_PRELOAD_CUDALIBS

Default:

False

Preload and initialize handles of all CUDA libraries (cuBLAS, cuSOLVER, etc.) used in cuPyNumeric.

warn#

Type:

bool (“0” or “1”)

Env var:

CUPYNUMERIC_WARN

Default:

False

Turn on warnings.

numpy_compat#

Type:

bool (“0” or “1”)

Env var:

CUPYNUMERIC_NUMPY_COMPATIBILITY

Default:

False

cuPyNumeric will issue additional tasks to match NumPy’s results and behavior. This is currently used in the following APIs: nanmin, nanmax, nanargmin, and nanargmax.

fallback_stacktrace#

Type:

bool (“0” or “1”)

Env var:

CUPYNUMERIC_FALLBACK_STACKTRACE

Default:

False

Whether to dump a full stack trace whenever cuPyNumeric emits a warning about falling back to NumPy routines.

This is a read-only environment variable setting used by the runtime.

fast_math#

Type:

bool (“0” or “1”)

Env var:

CUPYNUMERIC_FAST_MATH

Default:

False

Enable certain optimized execution modes for floating-point math operations that may violate strict IEEE specifications. Currently this flag enables the acceleration of single-precision cuBLAS routines using TF32 tensor cores.

This is a read-only environment variable setting used by the runtime.
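To give a feel for the precision trade-off, the sketch below emulates TF32's shorter mantissa (10 explicit bits instead of float32's 23) on the CPU. It is an illustration only: the helper `truncate_to_tf32` is hypothetical, it assumes simple truncation of the low bits, and it does not reproduce how tensor cores round inputs or accumulate results.

```python
import struct


def truncate_to_tf32(x):
    """Drop the low 13 mantissa bits of a float32 value, leaving 10.

    Assumption: plain truncation; real hardware may round differently.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    (y,) = struct.unpack("<f", struct.pack("<I", bits))
    return y


x = 1.0 + 2.0 ** -12          # exactly representable in float32
print(truncate_to_tf32(x))    # the 2**-12 term does not survive
print(truncate_to_tf32(1.5))  # needs one mantissa bit, so it is unchanged
```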

min_gpu_chunk#

Type:

int

Env var:

CUPYNUMERIC_MIN_GPU_CHUNK

Default:

65536 (test-mode default: 2)

Minimum chunk size for GPU operations.

This is a read-only environment variable setting used by the runtime.

min_cpu_chunk#

Type:

int

Env var:

CUPYNUMERIC_MIN_CPU_CHUNK

Default:

1024 (test-mode default: 2)

Minimum chunk size for CPU operations.

This is a read-only environment variable setting used by the runtime.

min_omp_chunk#

Type:

int

Env var:

CUPYNUMERIC_MIN_OMP_CHUNK

Default:

8192 (test-mode default: 2)

Minimum chunk size for OpenMP operations.

This is a read-only environment variable setting used by the runtime.

matmul_cache_size#

Type:

int

Env var:

CUPYNUMERIC_MATMUL_CACHE_SIZE

Default:

134217728 (test-mode default: 4096)

Force cuPyNumeric to keep temporary task slices during matmul computations smaller than this threshold. Whenever the temporary space needed during computation would exceed this value, the task is batched over ‘k’ to meet the requirement.

This is a read-only environment variable setting used by the runtime.
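The idea of batching over ‘k’ can be sketched in plain Python: the product is accumulated from partial products, each covering a slice of the shared dimension. The cost model in `pick_k_batch` (counting elements of the two input slices) and both function names are assumptions made for this illustration; cuPyNumeric's actual sizing logic is internal and may differ.

```python
def matmul_batched_over_k(a, b, k_batch):
    """Compute a @ b for nested lists, consuming k_batch columns of `a`
    (and the matching rows of `b`) per pass."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for k0 in range(0, k, k_batch):
        k1 = min(k0 + k_batch, k)
        for i in range(m):
            for j in range(n):
                # Accumulate the partial product for this slice of k.
                c[i][j] += sum(a[i][p] * b[p][j] for p in range(k0, k1))
    return c


def pick_k_batch(m, n, k, limit):
    """Largest k_batch such that the slices of a (m x k_batch) and
    b (k_batch x n) hold at most `limit` elements in total.

    Assumed cost model, for illustration only.
    """
    return max(1, min(k, limit // (m + n)))


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 1], [2, 3]]
k_batch = pick_k_batch(m=2, n=2, k=4, limit=8)
result = matmul_batched_over_k(a, b, k_batch)
print(k_batch, result)
```

Because matrix multiplication is a sum over the shared dimension, the batched result equals the unbatched one; only the peak size of the temporaries changes.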

test#

Type:

bool (“0” or “1”)

Env var:

LEGATE_TEST

Default:

False

Enable test mode. This sets alternative defaults for various other settings.

This is a read-only environment variable setting used by the runtime.
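The effect of test mode on the integer settings documented above can be sketched as a lookup with two defaults per setting. The numbers are the documented defaults; the resolution order in `effective_value` (explicit variable first, then the test-mode or normal default) and the function itself are assumptions for illustration, not the runtime's actual code.

```python
import os

# (normal default, test-mode default), copied from the documented entries.
DEFAULTS = {
    "CUPYNUMERIC_MIN_GPU_CHUNK": (65536, 2),
    "CUPYNUMERIC_MIN_CPU_CHUNK": (1024, 2),
    "CUPYNUMERIC_MIN_OMP_CHUNK": (8192, 2),
    "CUPYNUMERIC_MATMUL_CACHE_SIZE": (134217728, 4096),
}


def effective_value(name, environ=None):
    """Resolve a setting from an environment mapping.

    Hypothetical resolution order, assumed for this sketch.
    """
    environ = os.environ if environ is None else environ
    if name in environ:
        return int(environ[name])
    normal, test = DEFAULTS[name]
    return test if environ.get("LEGATE_TEST", "0") == "1" else normal


print(effective_value("CUPYNUMERIC_MIN_GPU_CHUNK", {}))
print(effective_value("CUPYNUMERIC_MIN_GPU_CHUNK", {"LEGATE_TEST": "1"}))
```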

take_default#

Type:

str

Env var:

CUPYNUMERIC_TAKE_DEFAULT

Default:

‘auto’

Default algorithm for deferred array.take():

  • ‘auto’: let cuPyNumeric decide which algorithm to use

  • ‘index’: use advanced indexing

  • ‘task’: use a task that broadcasts the indices

disable_bounds_checking#

Type:

DisableBoundsChecking (“none”, “all”, or comma-separated selectors: indexing, take, take_along_axis, put)

Env var:

CUPYNUMERIC_DISABLE_BOUNDS_CHECKING

Default:

‘none’

Disables explicit bounds checking for advanced-indexing-related operations.

  • ‘none’: keep all targeted explicit bounds checks enabled

  • ‘all’: disable all targeted explicit bounds checks

  • comma-separated selectors such as:

    ‘indexing,take,put’

    to disable checks only for the named operations
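The value grammar described above (“none”, “all”, or a comma-separated list of selectors) can be mirrored with a small parser. This is a hypothetical sketch, not cuPyNumeric's implementation: the function name, the tolerance for whitespace around items, and the error raised for unknown selectors are all assumptions.

```python
SELECTORS = {"indexing", "take", "take_along_axis", "put"}


def parse_disable_bounds_checking(value):
    """Return the set of operations whose bounds checks would be disabled.

    Hypothetical parser that follows the documented value grammar.
    """
    value = value.strip()
    if value == "none":
        return set()
    if value == "all":
        return set(SELECTORS)
    chosen = {item.strip() for item in value.split(",")}
    unknown = chosen - SELECTORS
    if unknown:
        raise ValueError(f"unknown selectors: {sorted(unknown)}")
    return chosen


print(sorted(parse_disable_bounds_checking("indexing,take,put")))
```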

use_nccl_gather#

Type:

bool (“0” or “1”)

Env var:

CUPYNUMERIC_USE_NCCL_GATHER

Default:

False

Enable distributed gather via the NCCL all-to-all implementation when multiple GPUs are available.