CuPy – NumPy-like API accelerated with CUDA¶
This is the CuPy documentation.
Overview¶
CuPy is an implementation of a NumPy-compatible multi-dimensional array on CUDA.
CuPy consists of cupy.ndarray, the core multi-dimensional array class, and many functions on it. It supports a subset of the numpy.ndarray interface.
The following is a brief overview of the supported subset of the NumPy interface:
- Basic indexing (indexing by ints, slices, newaxes, and Ellipsis)
- Most of advanced indexing (except for some indexing patterns with boolean masks)
- Data types (dtypes): bool_, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64
- Most of the array creation routines (empty, ones_like, diag, etc.)
- Most of the array manipulation routines (reshape, rollaxis, concatenate, etc.)
- All operators with broadcasting
- All universal functions for elementwise operations (except those for complex numbers)
- Linear algebra functions, including products (dot, matmul, etc.) and decompositions (cholesky, svd, etc.), accelerated by cuBLAS
- Reduction along axes (sum, max, argmax, etc.)
CuPy also includes the following features for performance:
- User-defined elementwise CUDA kernels
- User-defined reduction CUDA kernels
- Fusing CUDA kernels to optimize user-defined calculations
- Customizable memory allocator and memory pool
- cuDNN utilities
CuPy uses on-the-fly kernel synthesis: when a kernel call is required, it compiles kernel code optimized for the shapes and dtypes of the given arguments, sends it to the GPU device, and executes the kernel. The compiled code is cached to the $(HOME)/.cupy/kernel_cache directory (this cache path can be overwritten by setting the CUPY_CACHE_DIR environment variable). This may make the first kernel call slower, but the slowdown is resolved from the second execution onwards. CuPy also caches the kernel code sent to the GPU device within the process, which reduces the kernel transfer time on further calls.
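For example, the cache directory can be redirected by setting the environment variable before any kernel is compiled in the process; a minimal sketch (the path below is a hypothetical example):

```python
import os

# Redirect the kernel cache to a custom location; this must be set
# before the first kernel in the process is compiled.
os.environ['CUPY_CACHE_DIR'] = '/tmp/my_cupy_kernel_cache'  # hypothetical path
```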
Tutorial¶
Basics of CuPy¶
In this section, you will learn about the following things:
- Basics of cupy.ndarray
- The concept of the current device
- Host-device and device-device array transfer
Basics of cupy.ndarray¶
CuPy is a GPU array backend that implements a subset of the NumPy interface. In the following code, cp is used as an abbreviation of cupy, just as np is customarily used for numpy:
>>> import numpy as np
>>> import cupy as cp
The cupy.ndarray class is at its core, and it is a compatible GPU alternative to numpy.ndarray.
>>> x_gpu = cp.array([1, 2, 3])
x_gpu in the above example is an instance of cupy.ndarray. You can see that its creation is identical to NumPy's, except that numpy is replaced with cupy.
The main difference of cupy.ndarray from numpy.ndarray is that its content is allocated on the device memory. Its data is allocated on the current device, which will be explained later.
Most array manipulations are also done in a way similar to NumPy. Take the Euclidean norm (a.k.a. L2 norm) for example. NumPy has numpy.linalg.norm to calculate it on the CPU.
>>> x_cpu = np.array([1, 2, 3])
>>> l2_cpu = np.linalg.norm(x_cpu)
We can calculate it on GPU with CuPy in a similar way:
>>> x_gpu = cp.array([1, 2, 3])
>>> l2_gpu = cp.linalg.norm(x_gpu)
CuPy implements many functions on cupy.ndarray objects. See the reference for the supported subset of the NumPy API. Understanding NumPy helps in utilizing most features of CuPy, so we recommend reading the NumPy documentation.
Current Device¶
CuPy has a concept of the current device, which is the default device on which allocation, manipulation, calculation, etc. of arrays take place. Suppose the ID of the current device is 0. Then the following code allocates array contents on GPU 0.
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
The current device can be changed by cupy.cuda.Device.use() as follows:
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
>>> cp.cuda.Device(1).use()
>>> x_on_gpu1 = cp.array([1, 2, 3, 4, 5])
If you switch the current GPU temporarily, the with statement comes in handy.
>>> with cp.cuda.Device(1):
...    x_on_gpu1 = cp.array([1, 2, 3, 4, 5])
>>> x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
Most operations of CuPy are done on the current device. Be careful that processing an array on a non-current device will cause an error:
>>> with cp.cuda.Device(0):
...    x_on_gpu0 = cp.array([1, 2, 3, 4, 5])
>>> with cp.cuda.Device(1):
...    x_on_gpu0 * 2  # raises error
Traceback (most recent call last):
...
ValueError: Array device must be same as the current device: array device = 0 while current = 1
The cupy.ndarray.device attribute indicates the device on which the array is allocated.
>>> with cp.cuda.Device(1):
...    x = cp.array([1, 2, 3, 4, 5])
>>> x.device
<CUDA Device 1>
Note
If the environment has only one device, such explicit device switching is not needed.
Data Transfer¶
Move arrays to a device¶
cupy.asarray() can be used to move a numpy.ndarray, a list, or any object that can be passed to numpy.array() to the current device:
>>> x_cpu = np.array([1, 2, 3])
>>> x_gpu = cp.asarray(x_cpu) # move the data to the current device.
cupy.asarray() can also accept cupy.ndarray, which means we can transfer arrays between devices with this function.
>>> with cp.cuda.Device(0):
...    x_gpu_0 = cp.array([1, 2, 3])  # create an array on GPU 0
>>> with cp.cuda.Device(1):
...    x_gpu_1 = cp.asarray(x_gpu_0)  # move the array to GPU 1
Note
cupy.asarray() does not copy the input array when it can avoid it: if you pass an array that is already on the current device, it returns the input object itself. If you do want a copy in this situation, use cupy.array() with copy=True. In fact, cupy.asarray() is equivalent to cupy.array(arr, dtype, copy=False).
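The no-copy behavior mirrors NumPy's own asarray/array semantics, which can be checked on the CPU side; a small sketch:

```python
import numpy as np

a = np.array([1, 2, 3])

# asarray avoids a copy when the input is already a suitable ndarray...
assert np.asarray(a) is a

# ...while array(..., copy=True) always allocates a new buffer.
assert np.array(a, copy=True) is not a
```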
Move array from a device to the host¶
Moving a device array to the host can be done by cupy.asnumpy() as follows:
>>> x_gpu = cp.array([1, 2, 3]) # create an array in the current device
>>> x_cpu = cp.asnumpy(x_gpu) # move the array to the host.
We can also use cupy.ndarray.get():
>>> x_cpu = x_gpu.get()
Note
If you work with Chainer, you can also use to_cpu() and to_gpu() to move arrays back and forth between a device and the host, or between different devices. Note that to_gpu() has a device option to specify the device to which arrays are transferred.
How to write CPU/GPU agnostic code¶
The compatibility of CuPy with NumPy enables us to write CPU/GPU generic code. This is made easy by the cupy.get_array_module() function, which returns the numpy or cupy module based on its arguments. A CPU/GPU generic function is defined using it as follows:
>>> # Stable implementation of log(1 + exp(x))
>>> def softplus(x):
...     xp = cp.get_array_module(x)
...     return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))
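The same idea can be run end to end without a GPU. The sketch below substitutes a hypothetical fallback for cupy.get_array_module() (dispatching to numpy when CuPy is absent), so the generic function is testable on the CPU:

```python
import numpy as np

def get_array_module(*args):
    # Stand-in for cupy.get_array_module() so this sketch also runs
    # where CuPy (or a GPU) is unavailable: return cupy if any
    # argument is a cupy.ndarray, otherwise numpy.
    try:
        import cupy
        if any(isinstance(a, cupy.ndarray) for a in args):
            return cupy
    except ImportError:
        pass
    return np

def softplus(x):
    # Numerically stable log(1 + exp(x)):
    # max(0, x) + log1p(exp(-|x|)) avoids overflow for large |x|.
    xp = get_array_module(x)
    return xp.maximum(0, x) + xp.log1p(xp.exp(-abs(x)))

x = np.array([-1000.0, 0.0, 1000.0])
print(softplus(x))  # no overflow even at +/-1000
```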
User-Defined Kernels¶
CuPy provides easy ways to define two types of CUDA kernels: elementwise kernels and reduction kernels. We first describe how to define and call elementwise kernels, and then describe how to define and call reduction kernels.
Basics of elementwise kernels¶
An elementwise kernel can be defined by the ElementwiseKernel class. An instance of this class defines a CUDA kernel, which can be invoked by the __call__ method of the instance.
A definition of an elementwise kernel consists of four parts: an input argument list, an output argument list, a loop body code, and the kernel name. For example, a kernel that computes a squared difference \(f(x, y) = (x - y)^2\) is defined as follows:
>>> squared_diff = cp.ElementwiseKernel(
...    'float32 x, float32 y',
...    'float32 z',
...    'z = (x - y) * (x - y)',
...    'squared_diff')
The argument lists consist of comma-separated argument definitions. Each argument definition consists of a type specifier and an argument name. Names of NumPy data types can be used as type specifiers.
Note
n, i, and names starting with an underscore _ are reserved for internal use.
The above kernel can be called on either scalars or arrays with broadcasting:
>>> x = cp.arange(10, dtype=np.float32).reshape(2, 5)
>>> y = cp.arange(5, dtype=np.float32)
>>> squared_diff(x, y)
array([[ 0., 0., 0., 0., 0.],
[ 25., 25., 25., 25., 25.]], dtype=float32)
>>> squared_diff(x, 5)
array([[ 25., 16., 9., 4., 1.],
[ 0., 1., 4., 9., 16.]], dtype=float32)
Output arguments can be explicitly specified (next to the input arguments):
>>> z = cp.empty((2, 5), dtype=np.float32)
>>> squared_diff(x, y, z)
array([[ 0., 0., 0., 0., 0.],
[ 25., 25., 25., 25., 25.]], dtype=float32)
Typegeneric kernels¶
If a type specifier is one character, it is treated as a type placeholder. It can be used to define type-generic kernels. For example, the above squared_diff kernel can be made type-generic as follows:
>>> squared_diff_generic = cp.ElementwiseKernel(
...    'T x, T y',
...    'T z',
...    'z = (x - y) * (x - y)',
...    'squared_diff_generic')
Type placeholders of the same character in the kernel definition indicate the same type. The actual type of these placeholders is determined by the actual argument types. The ElementwiseKernel class first checks the output arguments and then the input arguments to determine the actual type. If no output arguments are given on the kernel invocation, then only the input arguments are used to determine the type.
The type placeholder can be used in the loop body code:
>>> squared_diff_generic = cp.ElementwiseKernel(
...    'T x, T y',
...    'T z',
...    '''
...    T diff = x - y;
...    z = diff * diff;
...    ''',
...    'squared_diff_generic')
More than one type placeholder can be used in a kernel definition. For example, the above kernel can be further made generic over multiple arguments:
>>> squared_diff_super_generic = cp.ElementwiseKernel(
...    'X x, Y y',
...    'Z z',
...    'z = (x - y) * (x - y)',
...    'squared_diff_super_generic')
Note that this kernel requires the output argument to be explicitly specified, because the type Z cannot be automatically determined from the input arguments.
Raw argument specifiers¶
The ElementwiseKernel class does the indexing with broadcasting automatically, which is useful for defining most elementwise computations. On the other hand, we sometimes want to write a kernel with manual indexing for some arguments. We can tell the ElementwiseKernel class to use manual indexing by adding the raw keyword preceding the type specifier.
We can use the special variable i and the method _ind.size() for manual indexing. i indicates the index within the loop, and _ind.size() indicates the total number of elements to which the elementwise operation is applied. Note that it represents the size after broadcasting.
For example, a kernel that adds two vectors with reversing one of them can be written as follows:
>>> add_reverse = cp.ElementwiseKernel(
...    'T x, raw T y', 'T z',
...    'z = x + y[_ind.size() - i - 1]',
...    'add_reverse')
(Note that this is an artificial example; you could write such an operation just as z = x + y[::-1] without defining a new kernel.)
A raw argument can be used like an array. The indexing operator y[_ind.size() - i - 1] involves an indexing computation on y, so y can be arbitrarily shaped and strided. Note that raw arguments are not involved in broadcasting. If you want to mark all arguments as raw, you must specify the size argument on invocation, which defines the value of _ind.size().
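For reference, the plain NumPy equivalent of add_reverse is just reversed-array addition, which is what the raw-indexed kernel computes per element:

```python
import numpy as np

x = np.arange(5, dtype=np.float32)
y = np.arange(5, dtype=np.float32)

# y[::-1] reads y back to front, matching y[_ind.size() - i - 1]
# in the kernel body.
z = x + y[::-1]
print(z)  # [4. 4. 4. 4. 4.]
```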
Reduction kernels¶
Reduction kernels can be defined by the ReductionKernel class. We can use it by defining four parts of the kernel code:
- Identity value: the initial value of the reduction.
- Mapping expression: used for the pre-processing of each element to be reduced.
- Reduction expression: an operator to reduce the multiple mapped values. The special variables a and b are used for its operands.
- Post-mapping expression: used to transform the resulting reduced values. The special variable a is used as its input. The output should be written to the output parameter.
The ReductionKernel class automatically inserts other code fragments that are required for an efficient and flexible reduction implementation.
For example, L2 norm along specified axes can be written as follows:
>>> l2norm_kernel = cp.ReductionKernel(
... 'T x', # input params
... 'T y', # output params
... 'x * x', # map
... 'a + b', # reduce
... 'y = sqrt(a)', # post-reduction map
... '0', # identity value
... 'l2norm' # kernel name
... )
>>> x = cp.arange(10, dtype='f').reshape(2, 5)
>>> l2norm_kernel(x, axis=1)
array([ 5.47722578, 15.96871948], dtype=float32)
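For reference, the same reduction written in plain NumPy makes the four parts visible: map with x * x, reduce with a + b along the axis, then post-map with sqrt:

```python
import numpy as np

x = np.arange(10, dtype='f').reshape(2, 5)

# map: x * x; reduce: + along axis 1; post-map: sqrt
l2 = np.sqrt((x * x).sum(axis=1))
print(l2)
```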
Note
The raw specifier is restricted to usages where the axes to be reduced are at the head of the shape. That means, if you want to use the raw specifier for at least one argument, the axis argument must be 0 or a contiguous increasing sequence of integers starting from 0, like (0, 1), (0, 1, 2), etc.
Reference Manual¶
This is the official reference of CuPy, a multi-dimensional array library on CUDA with a subset of the NumPy interface.
Reference¶
Multi-Dimensional Array (ndarray)¶
cupy.ndarray is the CuPy counterpart of NumPy's numpy.ndarray. It provides an intuitive interface for a fixed-size multi-dimensional array which resides in a CUDA device.
For the basic concept of ndarrays, please refer to the NumPy documentation.
cupy.ndarray 
Multidimensional array on a CUDA device. 
Code compatibility features¶
cupy.ndarray is designed to be interchangeable with numpy.ndarray in terms of code compatibility as much as possible. But occasionally, you will need to know whether the arrays you’re handling are cupy.ndarray or numpy.ndarray. One example is when invoking module-level functions such as cupy.sum() or numpy.sum(). In such situations, cupy.get_array_module() can be used.
cupy.get_array_module 
Returns the array module for arguments. 
Conversion to/from NumPy arrays¶
cupy.ndarray and numpy.ndarray are not implicitly convertible to each other. That means, NumPy functions cannot take cupy.ndarrays as inputs, and vice versa.
- To convert numpy.ndarray to cupy.ndarray, use cupy.array() or cupy.asarray().
- To convert cupy.ndarray to numpy.ndarray, use cupy.asnumpy() or cupy.ndarray.get().
Note that converting between cupy.ndarray and numpy.ndarray incurs data transfer between the host (CPU) and the GPU device, which is costly in terms of performance.
cupy.array 
Creates an array on the current device. 
cupy.asarray 
Converts an object to array. 
cupy.asnumpy 
Returns an array on the host memory from an arbitrary source array. 
Universal Functions (ufunc)¶
CuPy provides universal functions (a.k.a. ufuncs) to support various elementwise operations. CuPy’s ufuncs support the following features of NumPy’s ufuncs:
 Broadcasting
 Output type determination
 Casting rules
CuPy’s ufuncs currently do not provide methods such as reduce, accumulate, reduceat, outer, and at.
Ufunc class¶
cupy.ufunc 
Universal function. 
Available ufuncs¶
Math operations¶
cupy.add 
Adds two arrays elementwise. 
cupy.subtract 
Subtracts arguments elementwise. 
cupy.multiply 
Multiplies two arrays elementwise. 
cupy.divide 
Elementwise true division (i.e. 
cupy.logaddexp 
Computes log(exp(x1) + exp(x2)) elementwise. 
cupy.logaddexp2 
Computes log2(exp2(x1) + exp2(x2)) elementwise. 
cupy.true_divide 
Elementwise true division (i.e. 
cupy.floor_divide 
Elementwise floor division (i.e. 
cupy.negative 
Takes numerical negative elementwise. 
cupy.power 
Computes x1 ** x2 elementwise. 
cupy.remainder 
Computes the remainder of Python division elementwise. 
cupy.mod 
Computes the remainder of Python division elementwise. 
cupy.fmod 
Computes the remainder of C division elementwise. 
cupy.absolute 
Elementwise absolute value function. 
cupy.rint 
Rounds each element of an array to the nearest integer. 
cupy.sign 
Elementwise sign function. 
cupy.exp 
Elementwise exponential function. 
cupy.exp2 
Elementwise exponentiation with base 2. 
cupy.log 
Elementwise natural logarithm function. 
cupy.log2 
Elementwise binary logarithm function. 
cupy.log10 
Elementwise common logarithm function. 
cupy.expm1 
Computes exp(x)  1 elementwise. 
cupy.log1p 
Computes log(1 + x) elementwise. 
cupy.sqrt 

cupy.square 
Elementwise square function. 
cupy.reciprocal 
Computes 1 / x elementwise. 
Trigonometric functions¶
cupy.sin 
Elementwise sine function. 
cupy.cos 
Elementwise cosine function. 
cupy.tan 
Elementwise tangent function. 
cupy.arcsin 
Elementwise inversesine function (a.k.a. 
cupy.arccos 
Elementwise inversecosine function (a.k.a. 
cupy.arctan 
Elementwise inversetangent function (a.k.a. 
cupy.arctan2 
Elementwise inversetangent of the ratio of two arrays. 
cupy.hypot 
Computes the hypotenuse of orthogonal vectors of given lengths. 
cupy.sinh 
Elementwise hyperbolic sine function. 
cupy.cosh 
Elementwise hyperbolic cosine function. 
cupy.tanh 
Elementwise hyperbolic tangent function. 
cupy.arcsinh 
Elementwise inverse of hyperbolic sine function. 
cupy.arccosh 
Elementwise inverse of hyperbolic cosine function. 
cupy.arctanh 
Elementwise inverse of hyperbolic tangent function. 
cupy.deg2rad 
Converts angles from degrees to radians elementwise. 
cupy.rad2deg 
Converts angles from radians to degrees elementwise. 
Bittwiddling functions¶
cupy.bitwise_and 
Computes the bitwise AND of two arrays elementwise. 
cupy.bitwise_or 
Computes the bitwise OR of two arrays elementwise. 
cupy.bitwise_xor 
Computes the bitwise XOR of two arrays elementwise. 
cupy.invert 
Computes the bitwise NOT of an array elementwise. 
cupy.left_shift 
Shifts the bits of each integer element to the left. 
cupy.right_shift 
Shifts the bits of each integer element to the right. 
Comparison functions¶
cupy.greater 
Tests elementwise if x1 > x2 . 
cupy.greater_equal 
Tests elementwise if x1 >= x2 . 
cupy.less 
Tests elementwise if x1 < x2 . 
cupy.less_equal 
Tests elementwise if x1 <= x2 . 
cupy.not_equal 
Tests elementwise if x1 != x2 . 
cupy.equal 
Tests elementwise if x1 == x2 . 
cupy.logical_and 
Computes the logical AND of two arrays. 
cupy.logical_or 
Computes the logical OR of two arrays. 
cupy.logical_xor 
Computes the logical XOR of two arrays. 
cupy.logical_not 
Computes the logical NOT of an array. 
cupy.maximum 
Takes the maximum of two arrays elementwise. 
cupy.minimum 
Takes the minimum of two arrays elementwise. 
cupy.fmax 
Takes the maximum of two arrays elementwise. 
cupy.fmin 
Takes the minimum of two arrays elementwise. 
Floating point values¶
cupy.isfinite 
Tests finiteness elementwise. 
cupy.isinf 
Tests if each element is the positive or negative infinity. 
cupy.isnan 
Tests if each element is a NaN. 
cupy.signbit 
Tests elementwise if the sign bit is set (i.e. 
cupy.copysign 
Returns the first argument with the sign bit of the second elementwise. 
cupy.nextafter 
Computes the nearest neighbor float values towards the second argument. 
cupy.modf 
Extracts the fractional and integral parts of an array elementwise. 
cupy.ldexp 
Computes x1 * 2 ** x2 elementwise. 
cupy.frexp 
Decomposes each element to mantissa and two’s exponent. 
cupy.fmod 
Computes the remainder of C division elementwise. 
cupy.floor 
Rounds each element of an array to its floor integer. 
cupy.ceil 
Rounds each element of an array to its ceiling integer. 
cupy.trunc 
Rounds each element of an array towards zero. 
ufunc.at¶
Currently, CuPy does not support at for ufuncs in general. However, cupy.scatter_add() can substitute for add.at, as both behave identically.
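For reference, NumPy's add.at (the operation cupy.scatter_add() stands in for) accumulates over duplicate indices, unlike plain fancy-index assignment, which buffers and collapses duplicate writes:

```python
import numpy as np

a = np.zeros(4)
np.add.at(a, [0, 0, 2], 1)   # duplicate index 0 accumulates twice
print(a)                     # [2. 0. 1. 0.]

b = np.zeros(4)
b[[0, 0, 2]] += 1            # buffered: duplicate writes collapse
print(b)                     # [1. 0. 1. 0.]
```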
Routines¶
The following pages describe NumPycompatible routines. These functions cover a subset of NumPy routines.
Array Creation Routines¶
Basic creation routines¶
cupy.empty 
Returns an array without initializing the elements. 
cupy.empty_like 
Returns a new array with same shape and dtype of a given array. 
cupy.eye 
Returns a 2D array with ones on the diagonals and zeros elsewhere. 
cupy.identity 
Returns a 2D identity array. 
cupy.ones 
Returns a new array of given shape and dtype, filled with ones. 
cupy.ones_like 
Returns an array of ones with same shape and dtype as a given array. 
cupy.zeros 
Returns a new array of given shape and dtype, filled with zeros. 
cupy.zeros_like 
Returns an array of zeros with same shape and dtype as a given array. 
cupy.full 
Returns a new array of given shape and dtype, filled with a given value. 
cupy.full_like 
Returns a full array with same shape and dtype as a given array. 
Creation from other data¶
cupy.array 
Creates an array on the current device. 
cupy.asarray 
Converts an object to array. 
cupy.asanyarray 
Converts an object to array. 
cupy.ascontiguousarray 
Returns a Ccontiguous array. 
cupy.copy 
Creates a copy of a given array on the current device. 
Numerical ranges¶
cupy.arange 
Returns an array with evenly spaced values within a given interval. 
cupy.linspace 
Returns an array with evenlyspaced values within a given interval. 
cupy.logspace 
Returns an array with evenlyspaced values on a logscale. 
cupy.meshgrid 
Return coordinate matrices from coordinate vectors. 
Matrix creation¶
cupy.diag 
Returns a diagonal or a diagonal array. 
cupy.diagflat 
Creates a diagonal array from the flattened input. 
Array Manipulation Routines¶
Basic manipulations¶
cupy.copyto 
Copies values from one array to another with broadcasting. 
Shape manipulation¶
cupy.reshape 
Returns an array with new shape and same elements. 
cupy.ravel 
Returns a flattened array. 
Transposition¶
cupy.rollaxis 
Moves the specified axis backwards to the given place. 
cupy.swapaxes 
Swaps the two axes. 
cupy.transpose 
Permutes the dimensions of an array. 
Edit dimensionalities¶
cupy.atleast_1d 
Converts arrays to arrays with dimensions >= 1. 
cupy.atleast_2d 
Converts arrays to arrays with dimensions >= 2. 
cupy.atleast_3d 
Converts arrays to arrays with dimensions >= 3. 
cupy.broadcast 
Object that performs broadcasting. 
cupy.broadcast_arrays 
Broadcasts given arrays. 
cupy.broadcast_to 
Broadcast an array to a given shape. 
cupy.expand_dims 
Expands given arrays. 
cupy.squeeze 
Removes sizeone axes from the shape of an array. 
Changing kind of array¶
cupy.asarray 
Converts an object to array. 
cupy.asanyarray 
Converts an object to array. 
cupy.asfortranarray 
Return an array laid out in Fortran order in memory. 
cupy.ascontiguousarray 
Returns a Ccontiguous array. 
Joining arrays along axis¶
cupy.concatenate 
Joins arrays along an axis. 
cupy.stack 
Stacks arrays along a new axis. 
cupy.column_stack 
Stacks 1D and 2D arrays as columns into a 2D array. 
cupy.dstack 
Stacks arrays along the third axis. 
cupy.hstack 
Stacks arrays horizontally. 
cupy.vstack 
Stacks arrays vertically. 
Splitting arrays along axis¶
cupy.split 
Splits an array into multiple sub arrays along a given axis. 
cupy.array_split 
Splits an array into multiple sub arrays along a given axis. 
cupy.dsplit 
Splits an array into multiple sub arrays along the third axis. 
cupy.hsplit 
Splits an array into multiple sub arrays horizontally. 
cupy.vsplit 
Splits an array into multiple sub arrays along the first axis. 
Repeating part of arrays along axis¶
cupy.tile 
Construct an array by repeating A the number of times given by reps. 
cupy.repeat 
Repeat arrays along an axis. 
Rearranging elements¶
cupy.flip 
Reverse the order of elements in an array along the given axis. 
cupy.fliplr 
Flip array in the left/right direction. 
cupy.flipud 
Flip array in the up/down direction. 
cupy.reshape 
Returns an array with new shape and same elements. 
cupy.roll 
Roll array elements along a given axis. 
cupy.rot90 
Rotate an array by 90 degrees in the plane specified by axes. 
Binary Operations¶
Elementwise bit operations¶
cupy.bitwise_and 
Computes the bitwise AND of two arrays elementwise. 
cupy.bitwise_or 
Computes the bitwise OR of two arrays elementwise. 
cupy.bitwise_xor 
Computes the bitwise XOR of two arrays elementwise. 
cupy.invert 
Computes the bitwise NOT of an array elementwise. 
cupy.left_shift 
Shifts the bits of each integer element to the left. 
cupy.right_shift 
Shifts the bits of each integer element to the right. 
Bit packing¶
cupy.packbits 
Packs the elements of a binaryvalued array into bits in a uint8 array. 
cupy.unpackbits 
Unpacks elements of a uint8 array into a binaryvalued output array. 
Output formatting¶
cupy.binary_repr 
Return the binary representation of the input number as a string. 
Indexing Routines¶
cupy.c_ 
Translates slice objects to concatenation along the second axis. 
cupy.r_ 
Translates slice objects to concatenation along the first axis. 
cupy.nonzero 
Return the indices of the elements that are nonzero. 
cupy.where 
Return elements, either from x or y, depending on condition. 
cupy.ix_ 
Construct an open mesh from multiple sequences. 
cupy.take 
Takes elements of an array at specified indices along an axis. 
cupy.choose 

cupy.diag 
Returns a diagonal or a diagonal array. 
cupy.diagonal 
Returns specified diagonals. 
cupy.fill_diagonal 
Fill the main diagonal of the given array of any dimensionality. 
Input and Output¶
NPZ files¶
cupy.load 
Loads arrays or pickled objects from .npy , .npz or pickled file. 
cupy.save 
Saves an array to a binary file in .npy format. 
cupy.savez 
Saves one or more arrays into a file in uncompressed .npz format. 
cupy.savez_compressed 
Saves one or more arrays into a file in compressed .npz format. 
String formatting¶
cupy.array_repr 
Returns the string representation of an array. 
cupy.array_str 
Returns the string representation of the content of an array. 
Basen representations¶
cupy.binary_repr 
Return the binary representation of the input number as a string. 
cupy.base_repr 
Return a string representation of a number in the given base system. 
Linear Algebra¶
Matrix and vector products¶
cupy.dot 
Returns a dot product of two arrays. 
cupy.vdot 
Returns the dot product of two vectors. 
cupy.inner 
Returns the inner product of two arrays. 
cupy.outer 
Returns the outer product of two vectors. 
cupy.matmul 
Returns the matrix product of two arrays and is the implementation of the @ operator introduced in Python 3.5 following PEP465. 
cupy.tensordot 
Returns the tensor dot product of two arrays along specified axes. 
cupy.einsum 
Evaluates the Einstein summation convention on the operands. 
cupy.kron 
Returns the kronecker product of two arrays. 
Decompositions¶
cupy.linalg.cholesky 
Cholesky decomposition. 
cupy.linalg.qr 
QR decomposition. 
cupy.linalg.svd 
Singular Value Decomposition. 
Matrix eigenvalues¶

cupy.linalg.eigh(a, UPLO='L')[source]¶
Eigenvalues and eigenvectors of a symmetric matrix.
This method calculates eigenvalues and eigenvectors of a given symmetric matrix.
Note
Currently only 2-D matrices are supported.
Note
CUDA >= 8.0 is required.
Parameters:
- a (cupy.ndarray) – A symmetric 2-D square matrix.
- UPLO (str) – Select from 'L' or 'U'. It specifies which part of a is used. 'L' uses the lower triangular part of a, and 'U' uses the upper triangular part of a.
Returns: A tuple (w, v). w contains eigenvalues and v contains eigenvectors. v[:, i] is an eigenvector corresponding to the eigenvalue w[i].
Return type: tuple of ndarray
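The call signature matches numpy.linalg.eigh, so its semantics can be illustrated on the CPU; a small sketch:

```python
import numpy as np

# A symmetric 2-D matrix; eigh uses only the triangle selected by UPLO.
a = np.array([[2.0, 1.0],
              [1.0, 2.0]])

w, v = np.linalg.eigh(a, UPLO='L')
print(w)  # eigenvalues in ascending order (approx. [1., 3.])
# v[:, i] is the eigenvector for w[i], so a @ v[:, i] == w[i] * v[:, i]
print(a @ v[:, 0], w[0] * v[:, 0])
```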

cupy.linalg.eigvalsh(a, UPLO='L')[source]¶
Calculates eigenvalues of a symmetric matrix.
This method calculates eigenvalues of a given symmetric matrix. Note that cupy.linalg.eigh() calculates both eigenvalues and eigenvectors.
Note
Currently only 2-D matrices are supported.
Note
CUDA >= 8.0 is required.
Parameters:
- a (cupy.ndarray) – A symmetric 2-D square matrix.
- UPLO (str) – Select from 'L' or 'U'. It specifies which part of a is used. 'L' uses the lower triangular part of a, and 'U' uses the upper triangular part of a.
Returns: Eigenvalues as a vector.
Return type: ndarray
Norms etc.¶
cupy.linalg.det 
Returns the determinant of an array. 
cupy.linalg.norm 
Returns one of matrix norms specified by ord parameter. 
cupy.linalg.matrix_rank 
Returns the matrix rank of an array using the SVD method. 
cupy.linalg.slogdet 
Returns the sign and logarithm of the determinant of an array. 
cupy.trace 
Returns the sum along the diagonals of an array. 
Solving linear equations¶
cupy.linalg.solve 
Solves a linear matrix equation. 
cupy.linalg.tensorsolve 
Solves tensor equations denoted by ax = b . 
cupy.linalg.inv 
Computes the inverse of a matrix. 
cupy.linalg.pinv 
Compute the MoorePenrose pseudoinverse of a matrix. 
Logic Functions¶
Truth value testing¶
cupy.all 
Tests whether all array elements along a given axis evaluate to True. 
cupy.any 
Tests whether any array elements along a given axis evaluate to True. 
Infinities and NaNs¶
cupy.isfinite 
Tests finiteness elementwise. 
cupy.isinf 
Tests if each element is the positive or negative infinity. 
cupy.isnan 
Tests if each element is a NaN. 
Array type testing¶
cupy.isscalar 
Returns True if the type of num is a scalar type. 
Logic operations¶
cupy.logical_and 
Computes the logical AND of two arrays. 
cupy.logical_or 
Computes the logical OR of two arrays. 
cupy.logical_not 
Computes the logical NOT of an array. 
cupy.logical_xor 
Computes the logical XOR of two arrays. 
Comparison operations¶
cupy.greater 
Tests elementwise if x1 > x2 . 
cupy.greater_equal 
Tests elementwise if x1 >= x2 . 
cupy.less 
Tests elementwise if x1 < x2 . 
cupy.less_equal 
Tests elementwise if x1 <= x2 . 
cupy.equal 
Tests elementwise if x1 == x2 . 
cupy.not_equal 
Tests elementwise if x1 != x2 . 
Mathematical Functions¶
Trigonometric functions¶
cupy.sin 
Elementwise sine function. 
cupy.cos 
Elementwise cosine function. 
cupy.tan 
Elementwise tangent function. 
cupy.arcsin 
Elementwise inversesine function (a.k.a. 
cupy.arccos 
Elementwise inversecosine function (a.k.a. 
cupy.arctan 
Elementwise inversetangent function (a.k.a. 
cupy.hypot 
Computes the hypotenuse of orthogonal vectors of given lengths. 
cupy.arctan2 
Elementwise inversetangent of the ratio of two arrays. 
cupy.deg2rad 
Converts angles from degrees to radians elementwise. 
cupy.rad2deg 
Converts angles from radians to degrees elementwise. 
cupy.degrees 
Converts angles from radians to degrees elementwise. 
cupy.radians 
Converts angles from degrees to radians elementwise. 
Hyperbolic functions¶
cupy.sinh 
Elementwise hyperbolic sine function. 
cupy.cosh 
Elementwise hyperbolic cosine function. 
cupy.tanh 
Elementwise hyperbolic tangent function. 
cupy.arcsinh 
Elementwise inverse of hyperbolic sine function. 
cupy.arccosh 
Elementwise inverse of hyperbolic cosine function. 
cupy.arctanh 
Elementwise inverse of hyperbolic tangent function. 
Rounding¶
cupy.rint 
Rounds each element of an array to the nearest integer. 
cupy.floor 
Rounds each element of an array to its floor integer. 
cupy.ceil 
Rounds each element of an array to its ceiling integer. 
cupy.trunc 
Rounds each element of an array towards zero. 
cupy.fix 
If given value x is positive, it return floor(x). 
Sums and products¶
cupy.sum 
Returns the sum of an array along given axes. 
cupy.prod 
Returns the product of an array along given axes. 
cupy.cumsum 
Returns the cumulative sum of an array along a given axis. 
cupy.cumprod 
Returns the cumulative product of an array along a given axis. 
Exponential and logarithm functions¶
cupy.exp 
Elementwise exponential function. 
cupy.expm1 
Computes exp(x)  1 elementwise. 
cupy.exp2 
Elementwise exponentiation with base 2. 
cupy.log 
Elementwise natural logarithm function. 
cupy.log10 
Elementwise common logarithm function. 
cupy.log2 
Elementwise binary logarithm function. 
cupy.log1p 
Computes log(1 + x) elementwise. 
cupy.logaddexp 
Computes log(exp(x1) + exp(x2)) elementwise. 
cupy.logaddexp2 
Computes log2(exp2(x1) + exp2(x2)) elementwise. 
Floating point manipulations¶
cupy.signbit 
Tests elementwise if the sign bit is set (i.e. 
cupy.copysign 
Returns the first argument with the sign bit of the second elementwise. 
cupy.ldexp 
Computes x1 * 2 ** x2 elementwise. 
cupy.frexp 
Decomposes each element to mantissa and two’s exponent. 
cupy.nextafter 
Computes the nearest neighbor float values towards the second argument. 
Arithmetic operations¶
cupy.negative 
Takes numerical negative elementwise. 
cupy.add 
Adds two arrays elementwise. 
cupy.subtract 
Subtracts arguments elementwise. 
cupy.multiply 
Multiplies two arrays elementwise. 
cupy.divide 
Elementwise true division (i.e. the quotient is returned as a float). 
cupy.true_divide 
Elementwise true division (i.e. the quotient is returned as a float). 
cupy.floor_divide 
Elementwise floor division (i.e. the quotient is rounded towards negative infinity). 
cupy.power 
Computes x1 ** x2 elementwise. 
cupy.fmod 
Computes the remainder of C division elementwise. 
cupy.mod 
Computes the remainder of Python division elementwise. 
cupy.remainder 
Computes the remainder of Python division elementwise. 
cupy.modf 
Extracts the fractional and integral parts of an array elementwise. 
cupy.reciprocal 
Computes 1 / x elementwise. 
Miscellaneous¶
cupy.clip 
Clips the values of an array to a given interval. 
cupy.sqrt 
Elementwise square root function. 
cupy.square 
Elementwise square function. 
cupy.absolute 
Elementwise absolute value function. 
cupy.sign 
Elementwise sign function. 
cupy.maximum 
Takes the maximum of two arrays elementwise. 
cupy.minimum 
Takes the minimum of two arrays elementwise. 
cupy.fmax 
Takes the maximum of two arrays elementwise. 
cupy.fmin 
Takes the minimum of two arrays elementwise. 
Random Sampling (cupy.random
)¶
CuPy’s random number generation routines are based on cuRAND.
They cover a small fraction of numpy.random
.
The big difference between cupy.random
and numpy.random
is that cupy.random
supports the dtype
option for most functions.
This option makes it possible to generate float32 values directly, without any space overhead.
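As a sketch of this difference (the cupy.random call assumes CuPy with a CUDA device is available; the snippet falls back to NumPy otherwise):

```python
import numpy as np

try:
    import cupy as cp
    # CuPy can generate float32 samples directly on the GPU.
    x = cp.random.rand(3, dtype=np.float32)
except Exception:
    # NumPy has no dtype option: generate float64, then downcast on the CPU.
    x = np.random.rand(3).astype(np.float32)
```

In the NumPy case a temporary float64 buffer is allocated and then copied; the CuPy dtype option avoids that extra allocation entirely.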
Sample random data¶
cupy.random.choice 
Returns an array of random values from a given 1D array. 
cupy.random.rand 
Returns an array of uniform random values over the interval [0, 1) . 
cupy.random.randn 
Returns an array of standard normal random values. 
cupy.random.randint 
Returns a scalar or an array of integer values over [low, high) . 
cupy.random.random_integers 
Return a scalar or an array of integer values over [low, high] 
cupy.random.random_sample 
Returns an array of random values over the interval [0, 1) . 
cupy.random.random 
Returns an array of random values over the interval [0, 1) . 
cupy.random.ranf 
Returns an array of random values over the interval [0, 1) . 
cupy.random.sample 
Returns an array of random values over the interval [0, 1) . 
cupy.random.bytes 
Returns random bytes. 
Distributions¶
cupy.random.gumbel 
Returns an array of samples drawn from a Gumbel distribution. 
cupy.random.lognormal 
Returns an array of samples drawn from a log normal distribution. 
cupy.random.normal 
Returns an array of normally distributed samples. 
cupy.random.standard_normal 
Returns an array of samples drawn from the standard normal distribution. 
cupy.random.uniform 
Returns an array of uniformlydistributed samples over an interval. 
Random number generator¶
cupy.random.seed 
Resets the state of the random number generator with a seed. 
cupy.random.get_random_state 
Gets the state of the random number generator for the current device. 
cupy.random.RandomState 
Portable container of a pseudorandom number generator. 
Permutations¶
cupy.random.shuffle 
Shuffles an array. 
Sorting, Searching, and Counting¶
cupy.sort 
Returns a sorted copy of an array with a stable sorting algorithm. 
cupy.lexsort 
Perform an indirect sort using an array of keys. 
cupy.argsort 
Returns the indices that would sort an array with a stable sorting. 
cupy.argmax 
Returns the indices of the maximum along an axis. 
cupy.argmin 
Returns the indices of the minimum along an axis. 
cupy.partition 
Returns a partially sorted copy of an array. 
cupy.count_nonzero 
Counts the number of nonzero values in the array. 
cupy.nonzero 
Return the indices of the elements that are nonzero. 
cupy.flatnonzero 
Return indices that are nonzero in the flattened version of a. 
cupy.where 
Return elements, either from x or y, depending on condition. 
Statistics¶
Order statistics¶
cupy.amin 
Returns the minimum of an array or the minimum along an axis. 
cupy.amax 
Returns the maximum of an array or the maximum along an axis. 
cupy.nanmin 
Returns the minimum of an array along an axis ignoring NaN. 
cupy.nanmax 
Returns the maximum of an array along an axis ignoring NaN. 
Means and variances¶
cupy.mean 
Returns the arithmetic mean along an axis. 
cupy.var 
Returns the variance along an axis. 
cupy.std 
Returns the standard deviation along an axis. 
Histograms¶
cupy.bincount 
Count number of occurrences of each value in array of nonnegative ints. 
External Functions¶
cupy.scatter_add 
Adds given values to specified elements of an array. 
Sparse matrix¶
CuPy supports sparse matrices using cuSPARSE. These matrices have the same interface as SciPy’s sparse matrices.
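A minimal sketch of building a sparse matrix (the matrix contents here are illustrative; the construction falls back to SciPy, which exposes the same interface, when CuPy or a GPU is unavailable):

```python
import numpy as np

def make_coo():
    """Build a 3x3 COO sparse matrix with three nonzeros on the diagonal."""
    data = [1.0, 2.0, 3.0]
    row, col = [0, 1, 2], [0, 1, 2]
    try:
        import cupy
        import cupy.sparse
        # Device-side construction from coordinate data.
        coo = cupy.sparse.coo_matrix(
            (cupy.array(data), (cupy.array(row), cupy.array(col))),
            shape=(3, 3))
    except Exception:
        # CuPy not installed or no GPU: SciPy has the same constructor.
        import scipy.sparse
        coo = scipy.sparse.coo_matrix((data, (row, col)), shape=(3, 3))
    return coo

m = make_coo()
```

Because the interfaces match, code written against scipy.sparse usually ports to cupy.sparse by changing only the import and the array module.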
Sparse matrix classes¶
cupy.sparse.coo_matrix 
COOrdinate format sparse matrix. 
cupy.sparse.csr_matrix 
Compressed Sparse Row matrix. 
cupy.sparse.csc_matrix 
Compressed Sparse Column matrix. 
cupy.sparse.dia_matrix 
Sparse matrix with DIAgonal storage. 
cupy.sparse.spmatrix 
Base class of all sparse matrices. 
Functions¶
Building sparse matrices¶
cupy.sparse.eye 
Creates a sparse matrix with ones on diagonal. 
cupy.sparse.identity 
Creates an identity matrix in sparse format. 
Identifying sparse matrices¶
cupy.sparse.issparse 
Checks if a given matrix is a sparse matrix. 
cupy.sparse.isspmatrix 
Checks if a given matrix is a sparse matrix. 
cupy.sparse.isspmatrix_csc 
Checks if a given matrix is of CSC format. 
cupy.sparse.isspmatrix_csr 
Checks if a given matrix is of CSR format. 
cupy.sparse.isspmatrix_coo 
Checks if a given matrix is of COO format. 
cupy.sparse.isspmatrix_dia 
Checks if a given matrix is of DIA format. 
NumPyCuPy Generic Code Support¶
cupy.get_array_module 
Returns the array module for arguments. 
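For example, cupy.get_array_module makes it possible to write one function that accepts both numpy.ndarray and cupy.ndarray (a sketch; the fallback branch is only for environments where CuPy is not importable):

```python
import numpy as np

def softplus(x):
    """CPU/GPU-generic softplus: log(1 + exp(x))."""
    try:
        import cupy
        xp = cupy.get_array_module(x)  # numpy or cupy, depending on x
    except Exception:
        xp = np  # CuPy unavailable: every array is a NumPy array
    return xp.log1p(xp.exp(x))

y = softplus(np.array([0.0]))  # same call works for a cupy.ndarray input
```

The function dispatches on the type of its argument, so the caller decides where the computation runs simply by choosing which kind of array to pass in.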
LowLevel CUDA Support¶
Device management¶
cupy.cuda.Device 
Object that represents a CUDA device. 
Memory management¶
cupy.cuda.Memory 
Memory allocation on a CUDA device. 
cupy.cuda.MemoryPointer 
Pointer to a location in device memory. 
cupy.cuda.alloc 
Calls the current allocator. 
cupy.cuda.set_allocator 
Sets the current allocator. 
cupy.cuda.MemoryPool 
Memory pool for all devices on the machine. 
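A minimal sketch of installing a memory pool as the current allocator (assumes CuPy is installed; the helper returns False otherwise):

```python
def enable_memory_pool():
    """Route CuPy device allocations through a memory pool.

    Returns True when the pool was installed, False when CuPy is unavailable.
    """
    try:
        import cupy
    except Exception:
        return False  # CuPy not installed; nothing to configure
    pool = cupy.cuda.MemoryPool()
    # All subsequent device allocations go through the pool, which reuses
    # freed blocks instead of calling cudaMalloc/cudaFree every time.
    cupy.cuda.set_allocator(pool.malloc)
    return True

installed = enable_memory_pool()
```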
Memory hook¶
cupy.cuda.MemoryHook 
Base class of hooks for Memory allocations. 
cupy.cuda.memory_hooks.DebugPrintHook 
Memory hook that prints debug information. 
Streams and events¶
cupy.cuda.Stream 
CUDA stream. 
cupy.cuda.Event 
CUDA event, a synchronization point of CUDA streams. 
cupy.cuda.get_elapsed_time 
Gets the elapsed time between two events. 
Profiler¶
cupy.cuda.profile 
Enable CUDA profiling during with statement. 
cupy.cuda.profiler.initialize 
Initialize the CUDA profiler. 
cupy.cuda.profiler.start 
Enable profiling. 
cupy.cuda.profiler.stop 
Disable profiling. 
cupy.cuda.nvtx.Mark 
Marks an instantaneous event (marker) in the application. 
cupy.cuda.nvtx.MarkC 
Marks an instantaneous event (marker) in the application. 
cupy.cuda.nvtx.RangePush 
Starts a nested range. 
cupy.cuda.nvtx.RangePushC 
Starts a nested range. 
cupy.cuda.nvtx.RangePop 
Ends a nested range. 
Kernel binary memoization¶
cupy.memoize 
Makes a function memoizing the result for each argument and device. 
cupy.clear_memo 
Clears the memoized results for all functions decorated by memoize. 
Custom kernels¶
cupy.ElementwiseKernel 
Userdefined elementwise kernel. 
cupy.ReductionKernel 
Userdefined reduction kernel. 
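A hedged sketch of ElementwiseKernel (compiling and running the kernel requires a CUDA device; the NumPy branch is only a reference implementation of the same computation):

```python
import numpy as np

def squared_diff(x, y):
    """Elementwise (x - y)**2, via a user-defined CUDA kernel when possible."""
    try:
        import cupy
        kernel = cupy.ElementwiseKernel(
            'float32 x, float32 y',   # input arguments
            'float32 z',              # output argument
            'z = (x - y) * (x - y)',  # per-element operation in CUDA C
            'squared_diff')
        return kernel(cupy.asarray(x, dtype=np.float32),
                      cupy.asarray(y, dtype=np.float32))
    except Exception:
        # CuPy not installed or no GPU: equivalent NumPy computation.
        return (np.asarray(x, np.float32) - np.asarray(y, np.float32)) ** 2

out = squared_diff([3.0], [1.0])
```

The kernel is compiled on the fly at the first call and cached, so repeated calls with the same shapes and dtypes pay the compilation cost only once.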
Testing Modules¶
CuPy offers testing utilities to support unit testing.
They are under the namespace cupy.testing
.
Standard Assertions¶
The assertions have the same names as NumPy’s.
The difference from NumPy is that they can accept both numpy.ndarray
and cupy.ndarray
.
cupy.testing.assert_allclose 
Raises an AssertionError if objects are not equal up to desired tolerance. 
cupy.testing.assert_array_almost_equal 
Raises an AssertionError if objects are not equal up to desired precision. 
cupy.testing.assert_array_almost_equal_nulp 
Compare two arrays relatively to their spacing. 
cupy.testing.assert_array_max_ulp 
Check that all items of arrays differ in at most N Units in the Last Place. 
cupy.testing.assert_array_equal 
Raises an AssertionError if two array_like objects are not equal. 
cupy.testing.assert_array_list_equal 
Compares lists of arrays pairwise with assert_array_equal . 
cupy.testing.assert_array_less 
Raises an AssertionError if array_like objects are not ordered by less than. 
NumPyCuPy Consistency Check¶
The following decorators are for testing consistency between CuPy’s functions and the corresponding NumPy functions.
cupy.testing.numpy_cupy_allclose 
Decorator that checks NumPy results and CuPy ones are close. 
cupy.testing.numpy_cupy_array_almost_equal 
Decorator that checks NumPy results and CuPy ones are almost equal. 
cupy.testing.numpy_cupy_array_almost_equal_nulp 
Decorator that checks results of NumPy and CuPy are equal w.r.t. spacing. 
cupy.testing.numpy_cupy_array_max_ulp 
Decorator that checks results of NumPy and CuPy are equal w.r.t. ULP. 
cupy.testing.numpy_cupy_array_equal 
Decorator that checks NumPy results and CuPy ones are equal. 
cupy.testing.numpy_cupy_array_list_equal 
Decorator that checks that the resulting lists of NumPy and CuPy are equal. 
cupy.testing.numpy_cupy_array_less 
Decorator that checks the CuPy result is less than NumPy result. 
cupy.testing.numpy_cupy_raises 
Decorator that checks that NumPy and CuPy raise the same errors. 
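A sketch of typical usage (defining the test case does not require a GPU, but running it does; the except branch only handles environments without CuPy):

```python
import unittest

try:
    from cupy import testing

    class TestSquare(unittest.TestCase):
        # The decorated test runs twice: once with xp=numpy and once with
        # xp=cupy, and asserts that the two returned arrays are close.
        @testing.numpy_cupy_allclose()
        def test_square(self, xp):
            a = xp.arange(5)
            return a * a

    defined = True
except Exception:
    defined = False  # CuPy is not installed in this environment
```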
Parameterized dtype Test¶
The following decorators offer the standard way to write parameterized tests with respect to a single dtype or a combination of dtypes.
cupy.testing.for_dtypes 
Decorator for parameterized dtype test. 
cupy.testing.for_all_dtypes 
Decorator that checks the fixture with all dtypes. 
cupy.testing.for_float_dtypes 
Decorator that checks the fixture with all float dtypes. 
cupy.testing.for_signed_dtypes 
Decorator that checks the fixture with signed dtypes. 
cupy.testing.for_unsigned_dtypes 
Decorator that checks the fixture with all unsigned dtypes. 
cupy.testing.for_int_dtypes 
Decorator that checks the fixture with integer and optionally bool dtypes. 
cupy.testing.for_dtypes_combination 
Decorator that checks the fixture with a product set of dtypes. 
cupy.testing.for_all_dtypes_combination 
Decorator that checks the fixture with a product set of all dtypes. 
cupy.testing.for_signed_dtypes_combination 
Decorator for parameterized test w.r.t. the product set of signed dtypes. 
cupy.testing.for_unsigned_dtypes_combination 
Decorator for parameterized test w.r.t. the product set of unsigned dtypes. 
cupy.testing.for_int_dtypes_combination 
Decorator for parameterized test w.r.t. the product set of integer dtypes. 
Parameterized order Test¶
The following decorators offer the standard way to parameterize tests with orders.
cupy.testing.for_orders 
Decorator to parameterize tests with order. 
cupy.testing.for_CF_orders 
Decorator that checks the fixture with orders ‘C’ and ‘F’. 
Profiling¶
time range¶
cupy.prof.TimeRangeDecorator 
Decorator to mark function calls with range in NVIDIA profiler 
cupy.prof.time_range 
A context manager to describe the enclosed block as a nested range 
Environment variables¶
Here are the environment variables CuPy uses.
CUPY_CACHE_DIR 
Path to the directory to store kernel cache.
$(HOME)/.cupy/kernel_cache is used by default.
See Overview for details. 
CUPY_CACHE_SAVE_CUDA_SOURCE 
If set to 1, the CUDA source file will be saved along with the compiled binary in the cache directory for debugging purposes. It is disabled by default. Note: the source file will not be saved if the compiled binary is already stored in the cache. 
CUPY_DUMP_CUDA_SOURCE_ON_ERROR 
If set to 1, when CUDA kernel compilation fails, CuPy dumps CUDA kernel code to standard error. It is disabled by default. 
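For example, these variables can be set per shell session (the paths below are illustrative):

```shell
# Store compiled kernels under a custom directory.
export CUPY_CACHE_DIR=/tmp/cupy_kernel_cache

# Keep the generated CUDA source next to the cached binaries.
export CUPY_CACHE_SAVE_CUDA_SOURCE=1

# Dump the kernel source to stderr when compilation fails.
export CUPY_DUMP_CUDA_SOURCE_ON_ERROR=1
```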
For install¶
These environment variables are only used during installation.
CUDA_PATH 
Path to the directory containing CUDA.
The parent of the directory containing nvcc is used as the default.
When nvcc is not found, /usr/local/cuda is used.
See Install CuPy with CUDA for details. 
NVCC 
Define the compiler to use when compiling CUDA files. 
Difference between CuPy and NumPy¶
The interface of CuPy is designed to follow that of NumPy. However, there are some differences.
Cast behavior from float to integer¶
Some casting behaviors from float to integer are not defined in the C++ specification. Casting a negative float to an unsigned integer, or infinity to an integer, are such examples. The behavior of NumPy depends on your CPU architecture. The following is the result on an Intel CPU.
>>> np.array([-1], dtype='f').astype('I')
array([4294967295], dtype=uint32)
>>> cupy.array([-1], dtype='f').astype('I')
array([0], dtype=uint32)
>>> np.array([float('inf')], dtype='f').astype('i')
array([-2147483648], dtype=int32)
>>> cupy.array([float('inf')], dtype='f').astype('i')
array([2147483647], dtype=int32)
Random methods support dtype argument¶
NumPy’s random value generator does not support a dtype option and always returns a float64
value.
We support the option in CuPy because cuRAND, which CuPy uses, supports both float32 and float64.
>>> np.random.randn(dtype='f')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: randn() got an unexpected keyword argument 'dtype'
>>> cupy.random.randn(dtype='f')
array(0.10689262300729752, dtype=float32)
Outofbounds indices¶
By default, CuPy handles out-of-bounds indices differently from NumPy when using integer array indexing. NumPy raises an error, but CuPy wraps them around.
>>> x = np.array([0, 1, 2])
>>> x[[1, 3]] = 10
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 1 with size 3
>>> x = cupy.array([0, 1, 2])
>>> x[[1, 3]] = 10
>>> x
array([10, 10, 2])
Duplicate values in indices¶
CuPy’s __setitem__
behaves differently from NumPy when integer arrays
reference the same location multiple times.
In that case, the value that is actually stored is undefined.
Here is an example of CuPy.
>>> a = cupy.zeros((2,))
>>> i = cupy.arange(10000) % 2
>>> v = cupy.arange(10000).astype(np.float64)
>>> a[i] = v
>>> a
array([ 9150., 9151.])
NumPy stores the value corresponding to the last element among elements referencing duplicate locations.
>>> a_cpu = np.zeros((2,))
>>> i_cpu = np.arange(10000) % 2
>>> v_cpu = np.arange(10000).astype(np.float64)
>>> a_cpu[i_cpu] = v_cpu
>>> a_cpu
array([ 9998., 9999.])
Reduction methods return zerodimensional array¶
NumPy’s reduction functions (e.g. numpy.sum()
) return scalar values (e.g. numpy.float32
).
However CuPy counterparts return zerodimensional cupy.ndarray
s.
That is because CuPy scalar values (e.g. cupy.float32
) are aliases of NumPy scalar values and are allocated in CPU memory.
If these types were returned, it would be required to synchronize between GPU and CPU.
If you want to use scalar values, cast the returned arrays explicitly.
>>> type(np.sum(np.arange(3)))
<type 'numpy.int64'>
>>> type(cupy.sum(cupy.arange(3)))
<type 'cupy.core.core.ndarray'>
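The explicit cast mentioned above can be as simple as calling Python's int() or float() on the returned zero-dimensional array (a sketch; the NumPy branch is only for environments without CuPy or a GPU):

```python
import numpy as np

def sum_as_int():
    """Reduce, then cast the result explicitly to a host scalar."""
    try:
        import cupy
        s = cupy.sum(cupy.arange(3))  # 0-dim cupy.ndarray on the GPU
    except Exception:
        s = np.sum(np.arange(3))      # NumPy returns a scalar directly
    return int(s)                     # explicit cast; synchronizes if on GPU

value = sum_as_int()
```

Because the cast forces a GPU-to-CPU copy, defer it until you actually need a host-side scalar, e.g. for logging or control flow.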
API Compatibility Policy¶
This document expresses the design policy on compatibilities of CuPy APIs. The development team should follow this policy when deciding to add, extend, or change APIs and their behaviors.
This document is written for both users and developers. Users can decide the level of dependency on CuPy’s implementation in their code based on this document. Developers should read through this document before creating pull requests that contain changes on the interface. Note that this document may contain ambiguities on the level of supported compatibilities.
Versioning and Backward Compatibilities¶
The updates of CuPy are classified into three levels: major, minor, and revision. These types have distinct levels of backward compatibilities.
 Major update contains disruptive changes that break the backward compatibility.
 Minor update contains addition and extension to the APIs keeping the supported backward compatibility.
 Revision update contains improvements on the API implementations without changing any API specifications.
Note that we do not support full backward compatibility, which is almost infeasible for Pythonbased APIs, since there is no way to completely hide the implementation details.
Processes to Break Backward Compatibilities¶
Deprecation, Dropping, and Its Preparation¶
Any API may be deprecated in some minor update. In such a case, a deprecation note is added to the API documentation, and the API implementation is changed to fire a deprecation warning (if possible). There should be another way to implement the same functionality previously written with the deprecated API.
Any API may be marked as to be dropped in the future. In such a case, the dropping is stated in the documentation with the major version number on which the API is planned to be dropped, and the API implementation is changed to fire a future warning (if possible).
The actual dropping should be done through the following steps:
 Make the API deprecated. At this point, users should not need the deprecated API in their new application codes.
 After that, mark the API as to be dropped in the future. It must be done in the minor update different from that of the deprecation.
 At the major version announced in the above update, drop the API.
Consequently, it takes at least two minor versions to drop any APIs after the first deprecation.
API Changes and Its Preparation¶
Any API may be marked as to be changed in the future for changes without backward compatibility. In such a case, the change is stated in the documentation with the version number on which the API is planned to be changed, and the API implementation is changed to fire a future warning on certain usages.
The actual change should be done in the following steps:
 Announce that the API will be changed in the future. At this point, the actual version of change need not be accurate.
 After the announcement, mark the API as to be changed in the future with version number of planned changes. At this point, users should not use the marked API in their new application codes.
 At the major update announced in the above update, change the API.
Supported Backward Compatibility¶
This section defines backward compatibilities that minor updates must maintain.
Documented Interface¶
CuPy has official API documentation. Many applications can be written based on the documented features. We support backward compatibility of documented features. In other words, code based only on documented features runs correctly with minor/revision-updated versions.
Developers are encouraged to use clearly distinguishable names for implementation details. For example, attributes outside of the documented API should have one or more underscores prefixed to their names.
Undocumented behaviors¶
Behaviors of CuPy implementation not stated in the documentation are undefined. Undocumented behaviors are not guaranteed to be stable between different minor/revision versions.
A minor update may contain changes to undocumented behaviors. For example, suppose an API X is added in a minor update. In the previous version, attempts to use X caused an AttributeError. This behavior is not stated in the documentation, so it is undefined. Thus, adding the API X in a minor version is permissible.
A revision update may also contain changes to undefined behaviors. A typical example is a bug fix. Another example is an improvement of the implementation, which may change internal object structures not shown in the documentation. As a consequence, even revision updates do not support compatibility of pickling, unless the full layout of pickled objects is clearly documented.
Documentation Error¶
Compatibility is basically determined based on the documentation, though the documentation sometimes contains errors. Always assuming that the documentation overrides the implementation may make the APIs confusing. We therefore may fix documentation errors in any update, even if doing so breaks compatibility with respect to the documentation.
Note
Developers MUST NOT fix the documentation and implementation of the same functionality at the same time in revision updates as a “bug fix”. Such a change completely breaks backward compatibility. If you want to fix bugs on both sides, first fix the documentation to match the implementation, and then start the API changing procedure described above.
Object Attributes and Properties¶
Object attributes and properties are sometimes replaced by each other in minor updates. This does not break user code, except for code that depends on how the attributes and properties are implemented.
Functions and Methods¶
Methods may be replaced by callable attributes, keeping the compatibility of parameters and return values, in minor updates. This does not break user code, except for code that depends on how the methods and callable attributes are implemented.
Exceptions and Warnings¶
The specifications of raising exceptions are considered as a part of standard backward compatibilities. No exception is raised in the future versions with correct usages that the documentation allows, unless the API changing process is completed.
On the other hand, warnings may be added at any minor updates for any APIs. It means minor updates do not keep backward compatibility of warnings.
Installation Compatibility¶
The installation process is another compatibility concern. We support environment compatibility in the following ways.
 Any changes of dependent libraries that force modifications on the existing environments must be done in major updates.
Such changes include the following cases:
 dropping supported versions of dependent libraries (e.g. dropping cuDNN v2)
 adding new mandatory dependencies (e.g. adding h5py to setup_requires)
 Supporting optional packages/libraries may be done in minor updates (e.g. supporting h5py in optional features).
Note
The installation compatibility does not guarantee that all the features of CuPy run correctly on supported environments. CuPy may contain bugs that only occur in certain environments. Such bugs should be fixed in some updates.
Contribution Guide¶
This is a guide for all contributions to CuPy. The development of CuPy takes place on the official repository on GitHub. Anyone who wants to register an issue or send a pull request should read through this document.
Classification of Contributions¶
There are several ways to contribute to the CuPy community:
 Registering an issue
 Sending a pull request (PR)
 Sending a question to CuPy User Group
 Writing a post about CuPy
This document mainly focuses on the first two, though other contributions are also appreciated.
Release and Milestone¶
We are using GitHub Flow as our basic working process. In particular, we are using the master branch for our development, and releases are made as tags.
Releases are classified into three groups: major, minor, and revision. This classification is based on the following criteria:
 Major update contains disruptive changes that break the backward compatibility.
 Minor update contains additions and extensions to the APIs keeping the supported backward compatibility.
 Revision update contains improvements on the API implementations without changing any API specification.
The release classification is reflected in the version number x.y.z, where x, y, and z correspond to major, minor, and revision updates, respectively.
We set a milestone for each upcoming release. The milestone is named ‘vX.Y.Z’, where the version number initially represents a revision release. If at least one feature PR is merged in the period, we rename the milestone to represent a minor release (see the next section for the PR types).
See also API Compatibility Policy.
Issues and PRs¶
Issues and PRs are classified into the following categories:
 Bug: bug reports (issues) and bug fixes (PRs)
 Enhancement: implementation improvements without breaking the interface
 Feature: feature requests (issues) and their implementations (PRs)
 NoCompat: disrupts backward compatibility
 Test: test fixes and updates
 Document: document fixes and improvements
 Example: fixes and improvements on the examples
 Install: fixes installation script
 ContributionWelcome: issues that we request for contribution (only issues are categorized to this)
 Other: other issues and PRs
Issues and PRs are labeled with these categories. This classification is often reflected in the corresponding release category: Feature issues/PRs are included in minor/major releases and No-Compat issues/PRs are included in major releases, while other issues/PRs can be included in any release, including revision ones.
On registering an issue, write a precise explanation of what you want CuPy to be. Bug reports must include necessary and sufficient conditions to reproduce the bug. Feature requests must include what you want to do (and why you want to do it, if needed). You may include your thoughts on how to realize it in the feature request, though the what part is the most important for discussion.
Warning
If you have a question on usages of CuPy, it is highly recommended to send a post to CuPy User Group instead of the issue tracker. The issue tracker is not a place to share knowledge on practices. We may redirect question issues to CuPy User Group.
If you can write code to fix an issue, send a PR to the master branch. Before writing your code for PRs, read through the Coding Guidelines. The description of any PR must contain a precise explanation of what you want to do and how; it is the first documentation of your code for developers and a very important part of your PR.
Once you send a PR, it is automatically tested on Travis CI for Linux and Mac OS X, and on AppVeyor for Windows. Your PR needs to pass at least the test for Linux on Travis CI. After the automatic test passes, some of the core developers will start reviewing your code. Note that this automatic PR test only includes CPU tests.
Note
We are also running continuous integration with GPU tests for the master branch. Since this service is running on our internal server, we do not use it for automatic PR tests to keep the server secure.
Even if your code is not complete, you can send a pull request as a work-in-progress PR by adding the [WIP]
prefix to the PR title.
If you write a precise explanation about the PR, core developers and other contributors can join the discussion about how to proceed with the PR.
Coding Guidelines¶
We use PEP8 and a part of OpenStack Style Guidelines related to general coding style as our basic style guidelines.
To check your code, use the autopep8
and flake8
commands installed by the hacking
package:
$ pip install autopep8 hacking
$ autopep8 --global-config .pep8 path/to/your/code.py
$ flake8 path/to/your/code.py
To check Cython code, use .flake8.cython
configuration file:
$ flake8 --config=.flake8.cython path/to/your/cython/code.pyx
The autopep8
command can automatically correct Python code to conform to the PEP 8 style guide:
$ autopep8 --in-place --global-config .pep8 path/to/your/code.py
The flake8
command lets you know which parts of your code do not obey our style guidelines.
Before sending a pull request, be sure to check that your code passes the flake8
check.
Note that the flake8
command is not perfect.
It does not check some of the style guidelines.
Here is an (incomplete) list of the rules that flake8
cannot check.
 Relative imports are prohibited. [H304]
 Importing nonmodule symbols is prohibited.
 Import statements must be organized into three parts: standard libraries, thirdparty libraries, and internal imports. [H306]
In addition, we restrict the usage of shortcut symbols in our code base.
They are symbols imported by packages and subpackages of cupy
.
For example, cupy.cuda.Device
is a shortcut of cupy.cuda.device.Device
.
It is not allowed to use such shortcuts in the cupy library implementation.
Note that you can still use them in tests
and examples
directories.
Once you send a pull request, your coding style is automatically checked by TravisCI. The reviewing process starts after the check passes.
CuPy is designed based on NumPy’s API. CuPy’s source code and documents contain the original NumPy ones. Please note the following when writing documentation.
 In order to identify overlapping parts, it is preferable to add remarks that the document is copied or altered from the original one. It is also preferable to briefly explain the specification of the function in a short paragraph and refer to the corresponding function in NumPy so that users can read the detailed documentation. However, it is possible to include a complete copy of the documentation with such a remark if it cannot be summarized in such a way.
 If a function in CuPy only implements a limited subset of the features of the original one, explicitly describe in the document only what is implemented.
Testing Guidelines¶
Testing is one of the most important parts of your code. You must test your code with unit tests following our testing guidelines. Note that we are using the nose package and the mock package for testing, so install nose and mock before writing your code:
$ pip install nose mock
In order to run unit tests at the repository root, you first have to build Cython files in place by running the following command:
$ python setup.py develop
Note
When you modify *.pxd
files, before running python setup.py develop
, you must clean *.cpp
and *.so
files once with the following command, because Cython does not automatically rebuild those files nicely:
$ git clean -fdx
Note
It’s not officially supported, but you can use ccache to reduce compilation time. On Ubuntu 16.04, you can set up as follows:
$ sudo apt-get install ccache
$ export PATH=/usr/lib/ccache:$PATH
See ccache for details.
If you want to use ccache for nvcc, please install ccache v3.3 or later.
You also need to set the environment variable NVCC='ccache nvcc'
.
Once the Cython modules are built, you can run unit tests simply by running nosetests
command at the repository root:
$ nosetests
It requires CUDA by default.
In order to run unit tests that do not require CUDA, pass the --attr='!gpu'
option to the nosetests
command:
$ nosetests path/to/your/test.py --attr='!gpu'
Some GPU tests involve multiple GPUs.
If you want to run GPU tests with an insufficient number of GPUs, specify the number of available GPUs by --eval-attr='gpu<N'
where N
is a concrete integer.
For example, if you have only one GPU, launch nosetests
by the following command to skip multiGPU tests:
$ nosetests path/to/gpu/test.py --eval-attr='gpu<2'
Tests are put into the tests/cupy_tests
and tests/install_tests
directories.
These have the same structure as that of cupy
and install
directories, respectively.
In order to enable the test runner to find test scripts correctly, we use a special naming convention for the test subdirectories and the test scripts.
 The name of each subdirectory of
tests
must end with the_tests
suffix.  The name of each test script must start with the
test_
prefix.
Following this naming convention, you can run all the tests by just typing nosetests
at the repository root:
$ nosetests
Or you can also specify a root directory to search test scripts from:
$ nosetests tests/cupy_tests # to just run tests of CuPy
$ nosetests tests/install_tests # to just run tests of installation modules
If you modify the code related to existing unit tests, you must run appropriate commands.
Note
CuPy tests include type-exhaustive test functions which take a long time to execute. If you are running tests on a multicore machine, you can parallelize the tests with the following options:
$ nosetests --processes=12 --process-timeout=1000 tests/cupy_tests
The magic numbers can be modified for your usage.
Note that some tests require many CUDA compilations, which take a rather long time.
Without the --process-timeout
option, the timeout is set shorter, causing timeout failures for many test cases.
There are many examples of unit tests under the tests
directory.
They simply use the unittest
package of the standard library.
Even if your patch includes GPUrelated code, your tests should not fail without GPU capability.
Test functions that require CUDA must be tagged with the cupy.testing.attr.gpu
decorator:
import unittest
from cupy.testing import attr
class TestMyFunc(unittest.TestCase):
...
@attr.gpu
def test_my_gpu_func(self):
...
The functions tagged by the gpu decorator are skipped if --attr='!gpu' is given.
We also have the cupy.testing.attr.cudnn decorator to let nosetests know that the test depends on cuDNN.
The test functions decorated by gpu must not depend on multiple GPUs.
In order to write tests for multiple GPUs, use the cupy.testing.attr.multi_gpu() decorator instead:
import unittest
from cupy.testing import attr

class TestMyFunc(unittest.TestCase):
    ...

    @attr.multi_gpu(2)  # specify the number of required GPUs here
    def test_my_two_gpu_func(self):
        ...
Once you send a pull request, your code is automatically tested by Travis CI with the --attr='!gpu,!slow' option. Since Travis CI does not support CUDA, we cannot check your CUDA-related code automatically. The reviewing process starts after the test passes. Note that reviewers will test your code without the option to check CUDA-related code.
Note
Some of numerically unstable tests might cause errors irrelevant to your changes. In such a case, we ignore the failures and go on to the review process, so do not worry about it.
We leverage doctest as well. You can run doctests by typing make doctest in the docs directory:
$ cd docs
$ make doctest
Installation Guide¶
Recommended Environments¶
We recommend these Linux distributions.
The following versions of Python can be used: 2.7.6+, 3.4.3+, 3.5.1+, and 3.6.0+.
Note
We are testing CuPy automatically with Jenkins, where all the above recommended environments are tested. We cannot guarantee that CuPy works on other environments, including Windows and macOS (especially with CUDA support), even if CuPy appears to run correctly.
CuPy is supported on Python 2.7.6+, 3.4.3+, 3.5.1+, and 3.6.0+. CuPy requires a C++ compiler such as g++, which you need to install before installing CuPy. This is the typical installation method for each platform:
# Ubuntu 14.04
$ apt-get install g++
# CentOS 7
$ yum install gcc-c++
If you use an old setuptools, upgrade it:
$ pip install -U setuptools
Dependencies¶
Before installing CuPy, we recommend upgrading setuptools if you are using an old one:
$ pip install -U setuptools
The following Python packages are required to install CuPy. The latest version of each package will automatically be installed if missing.
CUDA support
 CUDA 7.0, 7.5, 8.0
cuDNN support
 cuDNN v4, v5, v5.1, v6
NCCL support
 NCCL v1.3+
Install CuPy¶
Install CuPy via pip¶
We recommend installing CuPy via pip:
$ pip install cupy
Note
All optional CUDA-related libraries, cuDNN and NCCL, need to be installed before installing CuPy. After you update these libraries, please reinstall CuPy, because it needs to be compiled and linked against the newer versions.
Install CuPy from source¶
The tarball of the source tree is available via pip download cupy or from the release notes page.
You can use setup.py to install CuPy from the tarball:
$ tar zxf cupy-x.x.x.tar.gz
$ cd cupy-x.x.x
$ python setup.py install
You can also install the development version of CuPy from a cloned Git repository:
$ git clone https://github.com/cupy/cupy.git
$ cd cupy
$ python setup.py install
When an error occurs...¶
Use the -vvvv option with the pip command.
It shows all logs of the installation and may help you:
$ pip install cupy -vvvv
Install CuPy with CUDA¶
You need to install the CUDA Toolkit before installing CuPy.
If you have CUDA in a default directory or have set CUDA_PATH correctly, the CuPy installer finds CUDA automatically:
$ pip install cupy
Note
The CuPy installer looks up the CUDA_PATH environment variable first.
If it is empty, the installer looks for the nvcc command on the PATH environment variable and uses its parent directory as the root directory of the CUDA installation.
If the nvcc command is not found either, the installer tries to use the default directory for Ubuntu, /usr/local/cuda.
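The lookup order described in the note can be sketched as follows. This is an illustrative outline, not the installer's actual code, and find_cuda_root is a hypothetical name:

```python
import os
import shutil

# Illustrative sketch (not CuPy's actual installer code) of the CUDA
# lookup order: CUDA_PATH first, then nvcc on PATH, then the default
# Ubuntu location.
def find_cuda_root():
    # 1. The CUDA_PATH environment variable takes precedence.
    cuda_path = os.environ.get('CUDA_PATH')
    if cuda_path:
        return cuda_path
    # 2. Otherwise, look for nvcc on PATH; nvcc typically lives in
    #    <cuda_root>/bin, so strip two path components.
    nvcc = shutil.which('nvcc')
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the default directory for Ubuntu.
    return '/usr/local/cuda'
```
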
If you installed CUDA into a non-default directory, you need to specify the directory with the CUDA_PATH environment variable:
$ CUDA_PATH=/opt/nvidia/cuda pip install cupy
If you want to use a custom nvcc compiler (for example, to use ccache), please set the NVCC environment variable before installing CuPy:
export NVCC='ccache nvcc'
Warning
If you want to use sudo to install CuPy, note that the sudo command resets all environment variables.
Please specify the CUDA_PATH environment variable inside the sudo invocation like this:
$ sudo CUDA_PATH=/opt/nvidia/cuda pip install cupy
Install CuPy with cuDNN and NCCL¶
cuDNN is a library for deep neural networks provided by NVIDIA. NCCL is a library for collective multi-GPU communication. CuPy can use cuDNN and NCCL. If you want to enable these libraries, install them before installing CuPy. We recommend installing the developer library (deb package) of cuDNN and NCCL.
If you want to install the tar.gz version of cuDNN, we recommend installing it into the CUDA directory.
For example, on Ubuntu Linux, copy the .h files to the include directory and the .so files to the lib64 directory:
$ cp /path/to/cudnn.h $CUDA_PATH/include
$ cp /path/to/libcudnn.so* $CUDA_PATH/lib64
The destination directories depend on your environment.
If you want to use cuDNN or NCCL installed in another directory, please set the CFLAGS, LDFLAGS, and LD_LIBRARY_PATH environment variables before installing CuPy:
export CFLAGS=-I/path/to/cudnn/include
export LDFLAGS=-L/path/to/cudnn/lib
export LD_LIBRARY_PATH=/path/to/cudnn/lib:$LD_LIBRARY_PATH
Install CuPy for developers¶
CuPy uses Cython (>=0.24).
Developers need to use Cython to regenerate the C++ sources from the pyx files.
We recommend using pip with the -e option for editable mode:
$ pip install -U cython
$ cd /path/to/cupy/source
$ pip install e .
Users need not install Cython, as the distribution package of CuPy contains only the generated sources.
Uninstall CuPy¶
Use pip to uninstall CuPy:
$ pip uninstall cupy
Note
When you upgrade CuPy, pip sometimes installs the new version without removing the old one in site-packages.
In this case, pip uninstall only removes the latest one.
To ensure that CuPy is completely removed, run the above command repeatedly until pip returns an error.
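The repeat-until-error loop can be sketched in Python as follows. This is a hypothetical helper, not part of CuPy or pip; the run parameter is injected only so the loop can be exercised without touching your environment:

```python
import subprocess

# Hypothetical sketch of the loop described above: keep running
# "pip uninstall" until pip returns a nonzero exit code, meaning no
# installed copy of the package remains.  The `run` parameter defaults
# to invoking pip via subprocess but can be replaced for testing.
def uninstall_completely(package, run=subprocess.run):
    attempts = 0
    while True:
        attempts += 1
        result = run(['pip', 'uninstall', '-y', package])
        if result.returncode != 0:  # pip errored: nothing left to remove
            return attempts
```
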
Reinstall CuPy¶
If you want to reinstall CuPy, please uninstall CuPy first and then install it again.
We recommend using the --no-cache-dir option, as pip sometimes uses the cache:
$ pip uninstall cupy
$ pip install cupy --no-cache-dir
If you installed CuPy without CUDA and later want to use CUDA, please reinstall CuPy. You also need to reinstall CuPy when you want to upgrade CUDA.
Run CuPy with Docker¶
We provide an official Docker image. Use the nvidia-docker command to run the CuPy image with GPU support. You can log in to the environment with bash and run the Python interpreter:
$ nvidia-docker run -it cupy/cupy /bin/bash
Or run the interpreter directly:
$ nvidia-docker run -it cupy/cupy /usr/bin/python
FAQ¶
Warning message “cuDNN is not enabled” appears¶
You failed to build CuPy with cuDNN.
If you don’t need cuDNN, ignore this message.
Otherwise, retry installing CuPy with cuDNN.
The -vvvv option may help you.
See Install CuPy with cuDNN and NCCL.
License¶
Copyright (c) 2015 Preferred Infrastructure, Inc.
Copyright (c) 2015 Preferred Networks, Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
CuPy¶
CuPy is designed based on NumPy's API. CuPy's source code and documents contain the original NumPy ones.
Copyright (c) 2005-2016, NumPy Developers.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
 Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
 Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
 Neither the name of the NumPy Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.