Neural Network Libraries¶
Neural Network Libraries is a deep learning framework intended for research, development, and production. We aim to have it run everywhere: desktop PCs, HPC clusters, embedded devices, and production servers.
This document describes how to use the Python API and C++ API, the contribution guide for developers, and the license terms of this software. The Python API is better suited to fast prototyping and experimentation with deep learning systems, while the C++ API is for deploying inference or training algorithms to embedded systems and servers (the C++ documentation is not available yet; we will make it available soon). The framework is designed with modularity and extensibility in mind. Community contributors can add a new neural network operator or optimizer module, or a specialized implementation of neural network modules for a specific target device, as an extension.
Python Package¶
The Python API, built on top of our C++11 core, maximizes flexibility in the design of neural networks and encourages fast prototyping and experimentation. NNabla works on both Python>=2.7 and Python>=3.5.
Python Package Installation¶
There are three ways to install the NNabla Python package.
Install From Binary¶
Installation on Linux¶
Prerequisites¶
These instructions describe how to install NNabla using pip on Ubuntu 16.04 (64-bit).
 Required software.
 Python 2.7 or Python>=3.4: PIP
 Recommended software (for NVIDIA GPU users).
 CUDA Toolkit 9.2 / cuDNN 7.1
Note: Although these instructions cover only Ubuntu 16.04, you can install NNabla using pip on many Linux distributions with few extra dependencies.
Installation¶
Install NNabla package via pip:
sudo pip install -U nnabla
Then, check if it works by running:
python -c "import nnabla"
2018-06-26 15:20:16,759 [nnabla][INFO]: Initializing CPU extension...
If you are a GPU user, follow the instructions below. Before installing the NNabla CUDA extension, make sure that CUDA and cuDNN are installed on your machine or environment (see the CUDA installation commands in the FAQ below).
Then,
sudo pip install -U nnabla_ext_cuda
and check if all works.
python -c "import nnabla_ext.cuda, nnabla_ext.cudnn"
2018-06-26 15:20:36,085 [nnabla][INFO]: Initializing CPU extension...
2018-06-26 15:20:36,257 [nnabla][INFO]: Initializing CUDA extension...
2018-06-26 15:20:36,257 [nnabla][INFO]: Initializing cuDNN extension...
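The import checks above can also be scripted. The following is a minimal sketch, not part of NNabla, that reports which of the packages are visible to the current interpreter without fully importing them. The helper name is_installed is hypothetical; it relies only on the standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. nnabla_ext) is itself missing.
        return False


for name in ("nnabla", "nnabla_ext.cuda", "nnabla_ext.cudnn"):
    status = "found" if is_installed(name) else "missing"
    print("{:<18} {}".format(name, status))
```

A "found" result only means the package is installed; a real import can still fail if CUDA or cuDNN libraries are missing, so the python -c check remains the definitive test.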
Note that the nnabla_ext_cuda package is built against a fixed combination, CUDA 9.2 and cuDNN 7.1. For other CUDA versions, you can install one of the following CUDA extension packages instead.
 nnabla-ext-cuda80 (CUDA 8.0 x cuDNN 7.1)
 nnabla-ext-cuda90 (CUDA 9.0 x cuDNN 7.1)
 nnabla-ext-cuda91 (CUDA 9.1 x cuDNN 7.1)
 nnabla-ext-cuda92 (CUDA 9.2 x cuDNN 7.1)
Run an Example¶
Get the examples (and unzip them) or clone the NNabla Examples repository, and go to the MNIST folder.
cd nnabla-examples/mnist-collection/
Run MNIST classification.
python classification.py
Run MNIST classification with CUDA/cuDNN.
python classification.py -c cudnn
FAQ¶
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.2.88-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_9.2.88-1_amd64.deb
sudo apt-get update
sudo apt install -y cuda
Use libgcc 5 and numpy 1.13.0 or later. Note that numba depends on an older numpy, so please uninstall numba first (the following is for Python 2).
conda create -n py2 python=2.7 anaconda # if necessary
source activate py2
conda install libgcc
conda install -c anaconda numpy=1.13.0
Then, you can follow the usual installation workflow.
If you get the following error,
ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory
please download cuDNN 7.1 for CUDA 9.2 and put it in /usr/local/cuda/lib/ or /usr/local/cuda/lib64/ as in the usual workflow, or set LD_LIBRARY_PATH as follows:
tar zxvf cudnn-9.2-linux-x64-v7.1.tgz
export LD_LIBRARY_PATH=$(pwd)/cuda/lib64:$LD_LIBRARY_PATH
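To confirm that the export took effect, you can scan the directories listed in LD_LIBRARY_PATH for the library file. The sketch below uses only the standard library; the helper name find_shared_library is hypothetical. It checks LD_LIBRARY_PATH only, whereas the dynamic loader also searches system locations such as /usr/local/cuda/lib64 when they are configured, so a None result here does not by itself mean the import will fail.

```python
import os


def find_shared_library(filename, search_path=None):
    """Return the first path in `search_path` that contains `filename`, or None.

    `search_path` defaults to the LD_LIBRARY_PATH environment variable.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_shared_library("libcudnn.so.7"))
```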
If you do not have root privileges, please use virtualenv or Anaconda. After downloading cuDNN v7, do the following.
tar zxvf cudnn-9.2-linux-x64-v7.1.tgz
export LD_LIBRARY_PATH=$(pwd)/cuda/lib64:$LD_LIBRARY_PATH
We have tested other Linux distributions and versions (Ubuntu 14.04, CentOS 6.9 and 7.3, Fedora 23, 25, and 26, and RHEL 7.3) in various environments (bare-metal servers, AWS instances, and Docker machines). You can therefore install NNabla in almost the same way as described here. Detailed installation instructions for each are coming soon.
Installation on Windows¶
Prerequisites¶
We tested on Windows 8.1 64-bit and Windows 10 64-bit.
The following software is required for installation:
 Required software.
 Python 2.7 or Python>=3.5: PIP
 Microsoft Visual C++ 2015 Redistributable
 Recommended.
 CUDA Toolkit 9.2 / cuDNN 7.1 (for NVIDIA GPU users)
Setup environment¶
These instructions use Miniconda.
Get and install the Windows binary from here.
Then install the required packages from the command prompt.
> conda install scipy scikit-image ipython
If your network uses a proxy and setup fails, configure the proxy server with environment variables and try installing again.
> SET HTTP_PROXY=http://(enter the address of the http proxy server here)
> SET HTTPS_PROXY=https://(enter the address of the https proxy server here)
If you are using an NVIDIA GPU, installing the following software will drastically improve execution speed.
To install cuDNN, copy the bin, include and lib folders to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2
Install¶
Install CPU package.
> pip install nnabla
If you are using a NVIDIA GPU, you can also install the CUDA/cuDNN package.
> pip install nnabla_ext_cuda
Verify the installation.
> ipython
In [1]: import nnabla
2017-06-06 21:36:07,101 [nnabla][Level 99]: Initializing CPU extension...
In [2]: exit
>
Verify the installation (CUDA/cuDNN).
> ipython
In [1]: import nnabla_ext.cudnn
2017-06-16 18:42:18,881 [nnabla][Level 99]: Initializing CPU extension...
2017-06-16 18:42:19,923 [nnabla][Level 99]: Initializing CUDA extension...
2017-06-16 18:42:20,243 [nnabla][Level 99]: Initializing cuDNN extension...
In [2]: exit
>
Note that the nnabla_ext_cuda package is built against a fixed combination, CUDA 9.2 and cuDNN 7.1. For other CUDA versions, you can install one of the following CUDA extension packages instead.
 nnabla-ext-cuda80 (CUDA 8.0 x cuDNN 7.1)
 nnabla-ext-cuda90 (CUDA 9.0 x cuDNN 7.1)
 nnabla-ext-cuda91 (CUDA 9.1 x cuDNN 7.1)
 nnabla-ext-cuda92 (CUDA 9.2 x cuDNN 7.1)
Run an Example¶
Get the examples (and unzip them) or clone the NNabla Examples repository, and go to the MNIST folder.
> cd nnabla-examples\mnist-collection
Run MNIST classification.
nnabla-examples\mnist-collection > python classification.py
Run MNIST classification with CUDA/cuDNN.
nnabla-examples\mnist-collection > python classification.py -c cudnn
Installation on macOS¶
NOTE: Our testing coverage in terms of environments and machines on macOS is very limited. Please submit an issue if you have any trouble.
Prerequisites¶
We test the installation on macOS Sierra.
The following software is required for installation:
 Python 2.7 or Python>=3.4 (we recommend setting up Python using Anaconda or Miniconda).
 pip (bundled in Conda Python)
 wheel (bundled in Conda Python)
 setuptools (bundled in Conda Python; you may need to upgrade setuptools with pip install -U --no-deps setuptools).
Install¶
pip install nnabla
NOTE: A binary package for the CUDA extension is not provided so far.
The following command performs a basic check that the installation succeeded:
python -c "import nnabla"
Run an Example¶
`Get <https://github.com/sony/nnabla-examples/archive/master.zip>`_ the examples (and unzip them) or clone the NNabla Examples repository, and go to the MNIST folder.
cd nnabla-examples/mnist-collection/
Then, run an MNIST classification:
python classification.py
Install From Source¶
The documentation for building from source has been moved to the GitHub repository.
Python API Tutorial¶
The following tutorial documents are automatically generated from the Jupyter notebook files listed in NNabla Tutorial. If you want to run them step by step, follow the link and see the instructions found there.
NNabla by Examples¶
This tutorial demonstrates how to write a script to train a neural network, using a simple handwritten digit classification task.
Note: This tutorial notebook requires scikit-learn and matplotlib installed in your Python environment.
First let us prepare some dependencies.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
from nnabla.monitor import tile_images
import numpy as np
import matplotlib.pyplot as plt
import tiny_digits
%matplotlib inline
np.random.seed(0)
imshow_opt = dict(cmap='gray', interpolation='nearest')
2017-06-26 23:09:49,971 [nnabla][INFO]: Initializing CPU extension...
The tiny_digits module is located under this folder. It provides some utilities for loading a handwritten-digit classification dataset (MNIST) available in scikit-learn.
Logistic Regression¶
We will first start by defining a computation graph for logistic regression. (For details on logistic regression, see Appendix A.)
The training will be done by gradient descent, where gradients are calculated using the error backpropagation algorithm (backprop).
Preparing a Toy Dataset¶
This section just prepares a dataset to be used for demonstration of NNabla usage.
digits = tiny_digits.load_digits(n_class=10)
tiny_digits.plot_stats(digits)
Num images: 1797
Image shape: (8, 8)
Labels: [0 1 2 3 4 5 6 7 8 9]
The next block creates a dataset loader, a generator that provides images and labels as minibatches. Note that this dataset is for demonstration purposes only and is not part of NNabla.
data = tiny_digits.data_iterator_tiny_digits(digits, batch_size=64, shuffle=True)
2017-06-26 23:09:50,545 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:09:50,546 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:09:50,546 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:09:50,547 [nnabla][INFO]: On-memory
2017-06-26 23:09:50,547 [nnabla][INFO]: Using DataIterator
A minibatch looks as follows. img and label are numpy.ndarray objects.
img, label = data.next()
plt.imshow(tile_images(img), **imshow_opt)
print("labels:", label.reshape(8, 8))
print("Label shape:", label.shape)
labels: [[ 2. 8. 2. 6. 6. 7. 1. 9.]
[ 8. 5. 2. 8. 6. 6. 6. 6.]
[ 1. 0. 5. 8. 8. 7. 8. 4.]
[ 7. 5. 4. 9. 2. 9. 4. 7.]
[ 6. 8. 9. 4. 3. 1. 0. 1.]
[ 8. 6. 7. 7. 1. 0. 7. 6.]
[ 2. 1. 9. 6. 7. 9. 0. 0.]
[ 5. 1. 6. 3. 0. 2. 3. 4.]]
Label shape: (64, 1)
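To make the idea of a minibatch loader concrete, here is a minimal plain-Python sketch of a shuffling generator. It is not the implementation behind data_iterator_tiny_digits; the function name minibatches and the choice to drop an incomplete final batch are assumptions made for illustration.

```python
import random


def minibatches(images, labels, batch_size, shuffle=True, seed=0):
    """Yield (image_batch, label_batch) pairs forever, reshuffling each epoch."""
    rng = random.Random(seed)
    indices = list(range(len(images)))
    while True:
        if shuffle:
            rng.shuffle(indices)
        # Assumption: an incomplete final batch is dropped for simplicity.
        for start in range(0, len(indices) - batch_size + 1, batch_size):
            batch = indices[start:start + batch_size]
            yield [images[i] for i in batch], [labels[i] for i in batch]


# Toy data: 10 "images" of 4 values each; the label is the image index modulo 3.
images = [[float(i)] * 4 for i in range(10)]
labels = [i % 3 for i in range(10)]

loader = minibatches(images, labels, batch_size=4)
img_batch, label_batch = next(loader)
print(len(img_batch), len(label_batch))
```

Shuffling the index list rather than the data keeps each image paired with its label.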
Preparing the Computation Graph¶
NNabla provides two different ways to run backprop-based gradient descent optimization: one with a static graph, and the other with a dynamic graph. We show the static version first.
# Forward pass
x = nn.Variable(img.shape)  # Define an image variable
with nn.parameter_scope("affine1"):
    y = PF.affine(x, 10)  # Output has 10 classes
This code block shows one of the most important features of graph building in NNabla, the parameter scope. The first line defines an input variable x. The second line creates a parameter scope. The third line then applies PF.affine (an affine transform) to x and creates a variable y holding the result. Here, the PF (parametric_function) module provides functions that contain learnable parameters, such as affine transforms (which contain weights), convolution (which contains kernels) and batch normalization (which contains transformation factors and coefficients). We call these functions parametric functions. The parameters are created and randomly initialized at function call, and registered under the name “affine1” using the parameter_scope context.
# Building a loss graph
t = nn.Variable(label.shape)  # Define a target variable
loss = F.mean(F.softmax_cross_entropy(y, t))  # Softmax cross entropy fits multi-class classification problems
The remaining lines shown above define a target variable and attach loss functions at the end of the graph. Note that building a static graph doesn’t execute any computation, but the shapes of output variables are inferred. Therefore, we can inspect the shape of each variable at this point:
print("Printing shapes of variables")
print(x.shape)
print(y.shape)
print(t.shape)
print(loss.shape)  # empty tuple means scalar
Printing shapes of variables
(64, 1, 8, 8)
(64, 10)
(64, 1)
()
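The inferred shapes follow from simple arithmetic. The sketch below reproduces them in plain Python, assuming the affine transform flattens every dimension from base_axis onward into one feature axis (base_axis=1 is assumed as the default). The helper affine_shapes is hypothetical and only mirrors the shape bookkeeping, not any computation.

```python
from functools import reduce
import operator


def affine_shapes(input_shape, n_outputs, base_axis=1):
    """Return the weight, bias and output shapes of an affine layer."""
    batch_dims = tuple(input_shape[:base_axis])
    # All remaining dimensions are flattened into a single feature axis.
    n_inputs = reduce(operator.mul, input_shape[base_axis:], 1)
    return {
        "W": (n_inputs, n_outputs),
        "b": (n_outputs,),
        "y": batch_dims + (n_outputs,),
    }


shapes = affine_shapes((64, 1, 8, 8), 10)
print(shapes["W"], shapes["b"], shapes["y"])
```

For the (64, 1, 8, 8) input this gives a (64, 10) weight matrix, because each 1 x 8 x 8 image flattens to 64 features, and a (64, 10) output, matching the shape printed for y.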
Executing a static graph¶
You can execute the computation of the graph by calling the forward() method on a sink variable. Inputs can be set via the .d accessor, which borrows a reference to the CPU array as a numpy.ndarray.
# Set data
x.d = img
t.d = label
# Execute a forward pass
loss.forward()
# Showing results
print("Prediction score of 0th image:", y.d[0])
print("Loss:", loss.d)
Prediction score of 0th image: [ 9.75851917 6.49118519 16.47323608 1.36296904 0.78583491
4.08872032 7.84134388 2.42956853 3.31485462 3.61868763]
Loss: 10.6016616821
The output doesn’t make sense since the network is just randomly initialized.
Backward propagation through the graph¶
The parameters registered by the parameter_scope management function can be queried with get_parameters() as a dict.
print(nn.get_parameters())
OrderedDict([('affine1/affine/W', <Variable((64, 10), need_grad=True) at 0x7fa0ba361d50>), ('affine1/affine/b', <Variable((10,), need_grad=True) at 0x7fa0ba361ce8>)])
Before executing backpropagation, we should initialize the gradient buffers of all parameters to zero.
for param in nn.get_parameters().values():
    param.grad.zero()
Then, you can execute backprop by calling the backward() method on the sink variable.
# Compute backward
loss.backward()
# Showing gradients.
for name, param in nn.get_parameters().items():
    print(name, param.shape, param.g.flat[:20])  # Showing first 20.
affine1/affine/W (64, 10) [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 4.98418584e02 8.72317329e03
4.06671129e02 4.68742661e02 2.52632981e09 7.86017510e04
9.06870365e02 1.56249944e02 1.56217301e02 3.12499963e02]
affine1/affine/b (10,) [ 0.42710391 0.01852455 0.07369987 0.04687012 0.07798236 0.03664626
0.01651323 0.1249291 0.11862005 0.09374455]
The gradient is stored in the grad field of Variable. The .g accessor can be used to access grad data in numpy.ndarray format.
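To see what backprop computes at the output of this network, the following plain-Python sketch evaluates the gradient of the softmax cross-entropy loss with respect to the scores for a single sample, which is softmax(scores) minus the one-hot target, and checks it against central finite differences. It does not use NNabla; all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(scores)[target])


def cross_entropy_grad(scores, target):
    """Analytic gradient w.r.t. the scores: softmax(scores) - one_hot(target)."""
    probs = softmax(scores)
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]


def shifted(values, index, delta):
    """Return a copy of `values` with `delta` added at `index`."""
    copy = list(values)
    copy[index] += delta
    return copy


scores, target = [2.0, -1.0, 0.5], 0
analytic = cross_entropy_grad(scores, target)

h = 1e-6
numeric = [
    (cross_entropy(shifted(scores, k, h), target)
     - cross_entropy(shifted(scores, k, -h), target)) / (2 * h)
    for k in range(len(scores))
]
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print("max |analytic - numeric| =", max_diff)
```

The gradients of W and b printed above are obtained by propagating this output gradient one step further through the affine transform.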
Optimizing parameters (=Training)¶
To optimize parameters, we provide the solver module (aliased as S here). The solver module contains a number of optimizer implementations such as SGD, SGD with momentum, Adam, etc. The block below creates an SGD solver and registers the logistic regression parameters with it.
# Create a solver (gradient-based optimizer)
learning_rate = 1e-3
solver = S.Sgd(learning_rate)
solver.set_parameters(nn.get_parameters()) # Set parameter variables to be updated.
In the next block, we demonstrate a single step of the optimization loop. The solver.zero_grad() line is equivalent to calling .grad.zero() for all parameters, as shown above. After the backward computation, we apply weight decay and then gradient descent as implemented in the Sgd solver class:
\[\mathbf \Theta \leftarrow \mathbf \Theta - \eta \nabla_{\mathbf \Theta} L\left(\mathbf \Theta, \mathbf X_{\mathrm{minibatch}}\right)\]
where \(\eta\) denotes the learning rate.
# One step of training
x.d, t.d = data.next()
loss.forward()
solver.zero_grad() # Initialize gradients of all parameters to zero.
loss.backward()
solver.weight_decay(1e-5)  # Applying weight decay as a regularization
solver.update()
print(loss.d)
12.9438686371
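The arithmetic of one such step can be sketched in plain Python. This is an illustration under the assumption that weight decay adds decay_rate * w to each gradient before the plain SGD update; the function sgd_step is hypothetical and is not the Sgd solver class.

```python
def sgd_step(params, grads, learning_rate, decay_rate):
    """Return updated parameters after weight decay followed by an SGD update."""
    updated = []
    for w, g in zip(params, grads):
        g = g + decay_rate * w                  # weight decay: L2 penalty gradient
        updated.append(w - learning_rate * g)   # gradient descent step
    return updated


params = [1.0, -2.0]
grads = [0.5, 0.0]
new_params = sgd_step(params, grads, learning_rate=0.1, decay_rate=0.01)
print(new_params)
```

Note that the second parameter moves toward zero even though its loss gradient is zero; that shrinkage is the regularizing effect of weight decay.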
The next block iterates the optimization steps and shows that the loss decreases.
for i in range(1000):
    x.d, t.d = data.next()
    loss.forward()
    solver.zero_grad()  # Initialize gradients of all parameters to zero.
    loss.backward()
    solver.weight_decay(1e-5)  # Applying weight decay as a regularization
    solver.update()
    if i % 100 == 0:  # Print every 100 iterations
        print(i, loss.d)
0 12.6905069351
100 3.17041015625
200 1.60036706924
300 0.673069953918
400 0.951370298862
500 0.724424362183
600 0.361597299576
700 0.588107347488
800 0.28792989254
900 0.415006935596
Show prediction¶
The following code displays training results.
x.d, t.d = data.next() # Here we predict images from training set although it's useless.
y.forward() # You can execute a sub graph.
plt.imshow(tile_images(x.d), **imshow_opt)
print("prediction:")
print(y.d.argmax(axis=1).reshape(8, 8))  # Taking a class index based on prediction score.
prediction:
[[5 0 1 9 0 1 3 3]
[2 4 1 7 4 5 6 5]
[7 7 9 7 9 0 7 3]
[5 3 7 6 6 8 0 9]
[0 1 3 5 5 5 4 9]
[1 0 0 8 5 1 8 8]
[7 5 0 7 6 9 0 0]
[0 6 2 6 4 4 2 6]]
Dynamic graph construction support¶
This is another way of running a computation graph in NNabla. This example doesn’t show how useful a dynamic graph is, but it gives a bit of the flavor.
The next block just defines the graph-building steps as functions for later use.
def logreg_forward(x):
    with nn.parameter_scope("affine1"):
        y = PF.affine(x, 10)
    return y
def logreg_loss(y, t):
    loss = F.mean(F.softmax_cross_entropy(y, t))  # Softmax cross entropy fits multi-class classification problems
    return loss
To run a computation graph dynamically during creation, use the nnabla.auto_forward() context as shown in the block below. With this, computation is executed immediately when functions are called. (You can also use nnabla.set_auto_forward(auto) to set the auto-forward state globally.)
x = nn.Variable(img.shape)
t = nn.Variable(label.shape)
x.d, t.d = data.next()
with nn.auto_forward():  # The graph is executed as it is built
    y = logreg_forward(x)
    loss = logreg_loss(y, t)
print("Loss:", loss.d)
plt.imshow(tile_images(x.d), **imshow_opt)
print("prediction:")
print(y.d.argmax(axis=1).reshape(8, 8))
Loss: 0.43071603775
prediction:
[[9 3 5 0 1 9 9 2]
[5 6 6 2 7 5 1 1]
[3 7 7 6 0 8 3 8]
[0 6 4 6 0 6 9 9]
[6 1 2 5 8 3 2 4]
[1 4 4 0 5 7 1 7]
[7 8 9 5 8 3 7 8]
[5 7 5 3 3 0 0 7]]
Backward computation can be done on a dynamically constructed graph.
solver.zero_grad()
loss.backward()
MultiLayer Perceptron (MLP)¶
This section shows an example of MLP graph building and training.
Before starting, we clear all parameters registered in the logistic regression example.
nn.clear_parameters()  # Clear all parameters
Here is a function that builds an MLP with an arbitrary depth and width for 10-class classification.
def mlp(x, hidden=[16, 32, 16]):
    hs = []
    with nn.parameter_scope("mlp"):  # Parameter scope can be nested
        h = x
        for hid, hsize in enumerate(hidden):
            with nn.parameter_scope("affine{}".format(hid + 1)):
                h = F.tanh(PF.affine(h, hsize))
                hs.append(h)
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y, hs
# Construct an MLP graph
y, hs = mlp(x)
print("Printing shapes")
print("x:", x.shape)
for i, h in enumerate(hs):
    print("h{}:".format(i + 1), h.shape)
print("y:", y.shape)
Printing shapes
x: (64, 1, 8, 8)
h1: (64, 16)
h2: (64, 32)
h3: (64, 16)
y: (64, 10)
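Nested parameter scopes produce hierarchical parameter names. The sketch below lists the names and shapes one would expect for this MLP, assuming the same "scope/affine/W" naming pattern seen in the get_parameters() output of the logistic regression example. It is a bookkeeping illustration in plain Python; the helper mlp_parameter_shapes is hypothetical, and the exact names should be confirmed with nn.get_parameters().

```python
def mlp_parameter_shapes(n_inputs, hidden=(16, 32, 16), n_classes=10):
    """Return a dict mapping assumed parameter names to their shapes."""
    shapes = {}
    fan_in = n_inputs
    for index, width in enumerate(hidden):
        scope = "mlp/affine{}/affine".format(index + 1)
        shapes[scope + "/W"] = (fan_in, width)
        shapes[scope + "/b"] = (width,)
        fan_in = width  # the next layer consumes this layer's output
    shapes["mlp/classifier/affine/W"] = (fan_in, n_classes)
    shapes["mlp/classifier/affine/b"] = (n_classes,)
    return shapes


param_shapes = mlp_parameter_shapes(64)  # 1 x 8 x 8 images flatten to 64 inputs
for name, shape in param_shapes.items():
    print(name, shape)
```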
# Training
loss = logreg_loss(y, t)  # Reuse logreg loss function.
# Copied from the above logreg example.
def training(steps, learning_rate):
    solver = S.Sgd(learning_rate)
    solver.set_parameters(nn.get_parameters())  # Set parameter variables to be updated.
    for i in range(steps):
        x.d, t.d = data.next()
        loss.forward()
        solver.zero_grad()  # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5)  # Applying weight decay as a regularization
        solver.update()
        if i % 100 == 0:  # Print every 100 iterations
            print(i, loss.d)
# Training
training(1000, 1e-2)
0 2.42193937302
100 1.83251476288
200 1.49943637848
300 1.30751883984
400 1.00974023342
500 0.904026031494
600 0.873289525509
700 0.725554704666
800 0.614291608334
900 0.555113613605
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
def scale01(h):
    return (h - h.min()) / (h.max() - h.min())
def imshow(img, title):
    global gid
    plt.subplot(num_plot, 1, gid)
    gid += 1
    plt.title(title)
    plt.imshow(img, **imshow_opt)
    plt.axis('off')
plt.figure(figsize=(2, 5))
imshow(x.d[0, 0], 'x')
for hid, h in enumerate(hs):
    imshow(scale01(h.d[0]).reshape(-1, 8), 'h{}'.format(hid + 1))
imshow(scale01(y.d[0]).reshape(2, 5), 'y')
Convolutional Neural Network with CUDA acceleration¶
Here we demonstrate a CNN with CUDA GPU acceleration.
nn.clear_parameters()
def cnn(x):
    with nn.parameter_scope("cnn"):  # Parameter scope can be nested
        with nn.parameter_scope("conv1"):
            c1 = F.tanh(PF.batch_normalization(
                PF.convolution(x, 4, (3, 3), pad=(1, 1), stride=(2, 2))))
        with nn.parameter_scope("conv2"):
            c2 = F.tanh(PF.batch_normalization(
                PF.convolution(c1, 8, (3, 3), pad=(1, 1))))
            c2 = F.average_pooling(c2, (2, 2))
        with nn.parameter_scope("fc3"):
            fc3 = F.tanh(PF.affine(c2, 32))
        with nn.parameter_scope("classifier"):
            y = PF.affine(fc3, 10)
    return y, [c1, c2, fc3]
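The feature-map sizes in this network can be worked out by hand with the usual convolution size formula, floor((size + 2 * pad - kernel) / stride) + 1. The plain-Python sketch below applies it to the 8 x 8 input with a batch size of 64; it assumes the pooling stride equals the pooling kernel, and conv_output_size is a hypothetical helper.

```python
def conv_output_size(size, kernel, pad=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


side = 8
side = conv_output_size(side, kernel=3, pad=1, stride=2)   # conv1
conv1_shape = (64, 4, side, side)
side = conv_output_size(side, kernel=3, pad=1, stride=1)   # conv2
conv2_shape = (64, 8, side, side)
side = side // 2   # 2x2 average pooling, stride assumed equal to the kernel
pooled_shape = (64, 8, side, side)

print(conv1_shape, conv2_shape, pooled_shape)
# → (64, 4, 4, 4) (64, 8, 4, 4) (64, 8, 2, 2)
```

The pooled map holds 8 x 2 x 2 = 32 values per image, which is what the fc3 affine layer receives as input.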
To enable the CUDA extension in NNabla, you have to install the nnabla-ext-cuda package first. See the install guide. After installing the CUDA extension, you can easily switch to running on CUDA by specifying a context before building a graph. We strongly recommend using a cuDNN context, which is fast. Although the context class can be instantiated by nn.Context(), specifying a context descriptor might be a bit complicated for users. Therefore, we recommend creating a context with the helper function get_extension_context() found in the nnabla.ext_utils module. NNabla officially supports cpu and cudnn as a context specifier passed as the first argument (extension name). NOTE: By setting the cudnn context as the global default context, functions and solvers are instantiated in cuDNN (preferred) mode. You can also specify a context using with nn.context_scope(). See the API reference for details.
# Run on CUDA
from nnabla.ext_utils import get_extension_context
cuda_device_id = 0
ctx = get_extension_context('cudnn', device_id=cuda_device_id)
print("Context:", ctx)
nn.set_default_context(ctx) # Set CUDA as a default context.
y, hs = cnn(x)
loss = logreg_loss(y, t)
2017-06-26 23:09:54,555 [nnabla][INFO]: Initializing CUDA extension...
2017-06-26 23:09:54,731 [nnabla][INFO]: Initializing cuDNN extension...
Context: Context(backend='cpucuda', array_class='CudaCachedArray', device_id='0', compute_backend='defaultcudnn')
training(1000, 1e-1)
0 2.34862923622
100 1.00527024269
200 0.416576713324
300 0.240603536367
400 0.254562884569
500 0.206138283014
600 0.220851421356
700 0.161689639091
800 0.230873346329
900 0.121101222932
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
plt.figure(figsize=(2, 8))
imshow(x.d[0, 0], 'x')
imshow(tile_images(hs[0].d[0][:, None]), 'conv1')
imshow(tile_images(hs[1].d[0][:, None]), 'conv2')
imshow(hs[2].d[0].reshape(-1, 8), 'fc3')
imshow(scale01(y.d[0]).reshape(2, 5), 'y')
nn.save_parameters writes the parameters registered in the parameter_scope system in HDF5 format. We use it in a later example.
path_cnn_params = "tmp.params.cnn.h5"
nn.save_parameters(path_cnn_params)
2017-06-26 23:09:56,132 [nnabla][INFO]: Parameter save (hdf5): tmp.params.cnn.h5
Recurrent Neural Network (Elman RNN)¶
This is an example of recurrent neural network training.
nn.clear_parameters()
def rnn(xs, h0, hidden=32):
    hs = []
    with nn.parameter_scope("rnn"):
        h = h0
        # Time step loop
        for x in xs:
            # Note: Parameter scopes are reused over time
            # which means parameters are shared over time.
            with nn.parameter_scope("x2h"):
                x2h = PF.affine(x, hidden, with_bias=False)
            with nn.parameter_scope("h2h"):
                h2h = PF.affine(h, hidden)
            h = F.tanh(x2h + h2h)
            hs.append(h)
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y, hs
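The recurrence built by the loop above is h_t = tanh(x_t W_xh + h_{t-1} W_hh + b), with the same weights reused at every time step. The following plain-Python sketch runs that recurrence on tiny hand-picked weights; the function elman_step and all numbers are hypothetical and chosen only for illustration.

```python
import math


def elman_step(x, h, w_xh, w_hh, b):
    """One Elman RNN step: tanh(x @ w_xh + h @ w_hh + b) for plain lists."""
    new_h = []
    for j in range(len(b)):
        total = b[j]
        total += sum(x[i] * w_xh[i][j] for i in range(len(x)))
        total += sum(h[k] * w_hh[k][j] for k in range(len(h)))
        new_h.append(math.tanh(total))
    return new_h


# The same weights are shared across all time steps.
w_xh = [[0.5, -0.5], [0.25, 0.25]]
w_hh = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

h = [0.0, 0.0]  # initial hidden state
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h = elman_step(x, h, w_xh, w_hh, b)
print(h)
```

In the NNabla version, sharing happens because the "x2h" and "h2h" scopes are entered again at each step, so PF.affine finds the parameters already registered under those names.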
The following task is not meaningful in itself; it is for demonstration purposes only. We split an image into a 2 by 2 grid and feed the four pieces sequentially into the RNN.
def split_grid4(x):
    x0 = x[..., :4, :4]
    x1 = x[..., :4, 4:]
    x2 = x[..., 4:, :4]
    x3 = x[..., 4:, 4:]
    return x0, x1, x2, x3
hidden = 32
seq_img = split_grid4(img)
seq_x = [nn.Variable(subimg.shape) for subimg in seq_img]
h0 = nn.Variable((img.shape[0], hidden))  # Initial hidden state.
y, hs = rnn(seq_x, h0, hidden)
loss = logreg_loss(y, t)
# Copied from the above logreg example.
def training_rnn(steps, learning_rate):
    solver = S.Sgd(learning_rate)
    solver.set_parameters(nn.get_parameters())  # Set parameter variables to be updated.
    for i in range(steps):
        minibatch = data.next()
        img, t.d = minibatch
        seq_img = split_grid4(img)
        h0.d = 0  # Initialize as 0
        for x, subimg in zip(seq_x, seq_img):
            x.d = subimg
        loss.forward()
        solver.zero_grad()  # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5)  # Applying weight decay as a regularization
        solver.update()
        if i % 100 == 0:  # Print every 100 iterations
            print(i, loss.d)
training_rnn(1000, 1e-1)
0 2.62527275085
100 0.780260562897
200 0.486522495747
300 0.289345681667
400 0.249717146158
500 0.538961410522
600 0.276877015829
700 0.159639537334
800 0.249660402536
900 0.0925596579909
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
plt.figure(figsize=(2, 8))
imshow(x.d[0, 0], 'x')
for hid, h in enumerate(hs):
    imshow(scale01(h.d[0]).reshape(-1, 8), 'h{}'.format(hid + 1))
imshow(scale01(y.d[0]).reshape(2, 5), 'y')
Siamese Network¶
This example shows how to embed an image from a categorical dataset into 2D space using deep learning. It also demonstrates how to reuse a pretrained network.
First, we load parameters learned in the CNN example.
nn.clear_parameters()
# Loading CNN pretrained parameters.
_ = nn.load_parameters(path_cnn_params)
2017-06-26 23:09:57,838 [nnabla][INFO]: Parameter load (<built-in function format>): tmp.params.cnn.h5
We define the embedding function. Note that the network structure and parameter hierarchy are identical to the previous CNN example. That enables you to reuse the saved parameters and fine-tune from them.
def cnn_embed(x, test=False):
    # Note: Identical configuration with the CNN example above.
    # Parameters pretrained in the above CNN example are used.
    with nn.parameter_scope("cnn"):
        with nn.parameter_scope("conv1"):
            c1 = F.tanh(PF.batch_normalization(PF.convolution(x, 4, (3, 3), pad=(1, 1), stride=(2, 2)), batch_stat=not test))
        with nn.parameter_scope("conv2"):
            c2 = F.tanh(PF.batch_normalization(PF.convolution(c1, 8, (3, 3), pad=(1, 1)), batch_stat=not test))
            c2 = F.average_pooling(c2, (2, 2))
        with nn.parameter_scope("fc3"):
            fc3 = PF.affine(c2, 32)
    # Additional affine for map into 2D.
    with nn.parameter_scope("embed2d"):
        embed = PF.affine(c2, 2)
    return embed, [c1, c2, fc3]
def siamese_loss(e0, e1, t, margin=1.0, eps=1e-4):
    dist = F.sum(F.squared_error(e0, e1), axis=1)  # Squared distance
    # Contrastive loss
    sim_cost = t * dist
    dissim_cost = (1 - t) * \
        (F.maximum_scalar(margin - (dist + eps) ** (0.5), 0) ** 2)
    return F.mean(sim_cost + dissim_cost)
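To see how the contrastive loss behaves, the sketch below evaluates the same formula for a single pair of embeddings in plain Python. The function contrastive_loss is a hypothetical scalar counterpart of siamese_loss, written without NNabla.

```python
import math


def contrastive_loss(e0, e1, same, margin=1.0, eps=1e-4):
    """Contrastive loss for one pair; `same` is 1 for the same class, else 0."""
    dist = sum((a - b) ** 2 for a, b in zip(e0, e1))  # squared distance
    sim_cost = same * dist
    dissim_cost = (1 - same) * max(margin - math.sqrt(dist + eps), 0.0) ** 2
    return sim_cost + dissim_cost


same_and_close = contrastive_loss([0.0, 0.0], [0.0, 0.0], same=1)
different_and_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], same=0)
different_and_close = contrastive_loss([0.0, 0.0], [0.3, 0.4], same=0)
print(same_and_close, different_and_far, different_and_close)
```

Pairs from the same class are penalized by their squared distance, while pairs from different classes are penalized only when they are closer than the margin. The eps term keeps the square root differentiable when the distance is zero.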
We build a two-stream CNN and compare the two streams with the contrastive loss function defined above. Note that both CNNs have the same parameter hierarchy, which means their parameters are shared.
x0 = nn.Variable(img.shape)
x1 = nn.Variable(img.shape)
t = nn.Variable((img.shape[0],)) # Same class or not
e0, hs0 = cnn_embed(x0)
e1, hs1 = cnn_embed(x1) # NOTE: parameters are shared
loss = siamese_loss(e0, e1, t)
def training_siamese(steps):
    for i in range(steps):
        minibatchs = []
        for _ in range(2):
            minibatch = data.next()
            minibatchs.append((minibatch[0].copy(), minibatch[1].copy()))
        x0.d, label0 = minibatchs[0]
        x1.d, label1 = minibatchs[1]
        t.d = (label0 == label1).astype(int).flat
        loss.forward()
        solver.zero_grad()  # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5)  # Applying weight decay as a regularization
        solver.update()
        if i % 100 == 0:  # Print every 100 iterations
            print(i, loss.d)
learning_rate = 1e-2
solver = S.Sgd(learning_rate)
with nn.parameter_scope("embed2d"):
    # Only 2d embedding affine will be updated.
    solver.set_parameters(nn.get_parameters())
training_siamese(2000)
# Decay learning rate
solver.set_learning_rate(solver.learning_rate() * 0.1)
training_siamese(2000)
0 0.150528043509
100 0.186870157719
200 0.149316266179
300 0.207163512707
400 0.171384960413
500 0.190256178379
600 0.138507723808
700 0.0918073058128
800 0.159692272544
900 0.0833697617054
1000 0.0839115008712
1100 0.104669973254
1200 0.0776312947273
1300 0.114788673818
1400 0.120309025049
1500 0.107732802629
1600 0.070114441216
1700 0.101728007197
1800 0.114350572228
1900 0.118794307113
0 0.0669310241938
100 0.0553173273802
200 0.0829797014594
300 0.0951051414013
400 0.128303915262
500 0.102963000536
600 0.0910559669137
700 0.0898950695992
800 0.119949311018
900 0.0603067912161
1000 0.105748720467
1100 0.108760476112
1200 0.0820947736502
1300 0.0971114039421
1400 0.0836166366935
1500 0.0899554267526
1600 0.109069615602
1700 0.0921652168036
1800 0.0759357959032
1900 0.100669950247
We visualize the embedded training images as follows. You can see that images from the same class are embedded near each other.
all_image = digits.images[:512, None]
all_label = digits.target[:512]
x_all = nn.Variable(all_image.shape)
x_all.d = all_image
with nn.auto_forward():
    embed, _ = cnn_embed(x_all, test=True)
plt.figure(figsize=(16, 9))
for i in range(10):
    c = plt.cm.Set1(i / 10.)  # Maybe it doesn't work in an older version of Matplotlib where color map lies in [0, 256)
    plt.plot(embed.d[all_label == i, 0].flatten(), embed.d[
        all_label == i, 1].flatten(), '.', c=c)
plt.legend(list(map(str, range(10))))
plt.grid()
Appendix¶
A. Logistic Regression¶
Here we demonstrate how to train the simplest neural network, logistic regression (single-layer perceptron). Logistic regression is a linear classifier \(f : {\cal R}^{D\times 1} \rightarrow {\cal R}^{K\times 1}\)
\[f(\mathbf x, \mathbf \Theta) = \mathbf W \mathbf x + \mathbf b\]
where \(\mathbf x \in {\cal R}^{D \times 1}\) is an input image flattened to a vector, \(t \in \{0, 1, \cdots, K - 1\}\) is a target label, \(\mathbf W \in {\cal R}^{K \times D}\) is a weight matrix, \(\mathbf b \in {\cal R}^{K \times 1}\) is a bias vector and \(\mathbf \Theta \equiv \left\{\mathbf W, \mathbf b\right\}\). The loss function is defined as
\[L(\mathbf \Theta, \mathbf X) = \frac{1}{N} \sum_{\mathbf x, t \subset \mathbf X} -\log \left(\left[\sigma\left(f(\mathbf x, \mathbf \Theta)\right)\right]_{t}\right)\]
where \(\mathbf X \equiv \left\{\mathbf x_1, t_1, \cdots, \mathbf x_N, t_N\right\}\) denotes the dataset the network is trained on, \(\sigma(\mathbf z)\) is the softmax operation defined as \(\frac{\exp(\mathbf z)}{\sum_{z \subset \mathbf z} \exp(z)}\), and \(\left[\mathbf z\right]_i\) denotes the i-th element of \(\mathbf z\).
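As a sanity check on the definitions in this appendix, the plain-Python sketch below evaluates the classifier scores and the mean softmax cross-entropy loss on a toy dataset. With all-zero parameters every class gets probability 1/K, so the loss equals log K. The names class_scores and dataset_loss and the toy data are hypothetical.

```python
import math


def class_scores(W, b, x):
    """Linear classifier scores: W x + b, with W given as K rows of length D."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def dataset_loss(W, b, dataset):
    """Mean over (x, t) pairs of -log softmax(W x + b)[t]."""
    total = 0.0
    for x, t in dataset:
        z = class_scores(W, b, x)
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_sum_exp - z[t]   # equals -log(softmax(z)[t])
    return total / len(dataset)


K, D = 3, 2
W = [[0.0] * D for _ in range(K)]
b = [0.0] * K
dataset = [([1.0, 2.0], 0), ([0.5, -1.0], 2)]

loss_value = dataset_loss(W, b, dataset)
print(loss_value, math.log(K))
```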
NNabla Python API Demonstration Tutorial¶
Let us import nnabla first, and some additional useful tools.
# python2/3 compatibility
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
import nnabla as nn # Abbreviate as nn for convenience.
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
2017-09-27 14:00:30,785 [nnabla][INFO]: Initializing CPU extension...
NdArray¶
NdArray is a data container for a multi-dimensional array. NdArray is device (e.g. CPU, CUDA) and type (e.g. uint8, float32) agnostic: both type and device are implicitly cast or transferred when it is used. Below, you create an NdArray with a shape of (2, 3, 4).
a = nn.NdArray((2, 3, 4))
You can see the values held inside a as follows. The values are not initialized, and are created as float32 by default.
print(a.data)
[[[ 9.42546995e+24 4.56809286e41 8.47690058e38 0.00000000e+00]
[ 7.38056336e+34 7.50334969e+28 1.17078231e32 7.58387310e+31]
[ 7.87001454e12 9.84394250e12 6.85712044e+22 1.81785692e+31]]
[[ 1.84681296e+25 1.84933247e+20 4.85656319e+33 2.06176836e19]
[ 6.80020530e+22 1.69307638e+22 2.11235872e19 1.94316151e19]
[ 1.81805047e+31 3.01289097e+29 2.07004908e19 1.84648795e+25]]]
The accessor .data returns a reference to the values of NdArray as a numpy.ndarray. You can modify these by using the NumPy API as follows.
print('[Substituting random values]')
a.data = np.random.randn(*a.shape)
print(a.data)
print('[Slicing]')
a.data[0, :, ::2] = 0
print(a.data)
[Substituting random values]
[[[ 0.36133638 0.22121875 1.5912329 0.33490974]
[ 1.35962474 0.2165522 0.54483992 0.61813235]
[0.13718799 0.44104072 0.51307833 0.73900551]]
[[0.59464753 2.17738533 0.28626776 0.45654735]
[ 0.73566747 0.87292582 0.41605178 0.04792296]
[0.63856047 0.31966645 0.63974309 0.61385244]]]
[Slicing]
[[[ 0. 0.22121875 0. 0.33490974]
[ 0. 0.2165522 0. 0.61813235]
[ 0. 0.44104072 0. 0.73900551]]
[[0.59464753 2.17738533 0.28626776 0.45654735]
[ 0.73566747 0.87292582 0.41605178 0.04792296]
[0.63856047 0.31966645 0.63974309 0.61385244]]]
Note that the above operations are all done on the host device (CPU). NdArray provides more efficient functions, .zero and .fill, in case you want to fill all values with a constant. They are lazily evaluated when the data is requested (when neural network computation requests the data, or when a numpy array is requested by Python). The filling operation is executed within a specific device (e.g. CUDA GPU), and is more efficient if you specify the device setting, which we explain later.
a.fill(1) # Filling all values with one.
print(a.data)
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]]
You can create an NdArray instance directly from a Numpy array object.
b = nn.NdArray.from_numpy_array(np.ones(a.shape))
print(b.data)
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]]
NdArray is used in Variable class, as well as NNabla’s imperative computation of neural networks. We describe them in the later sections.
Variable¶
The Variable class is used when you construct a neural network. The neural network can be described as a graph in which an edge represents a function (a.k.a. operator or layer), which defines the operation of a minimum unit of computation, and a node represents a variable, which holds the input/output values of a function (the Function class is explained later). The graph is called a “Computation Graph”.
In NNabla, a Variable, a node of a computation graph, holds two
NdArrays: one for storing the input or output values of a function
during forward propagation (executing the computation graph in forward
order), and another for storing the backward error signal (gradient)
during backward propagation (executing the computation graph in backward
order to propagate error signals down to the parameters (weights) of
neural networks). The first one is called data and the second is called
grad in NNabla.
The following line creates a Variable instance with a shape of (2, 3,
4). It has data and grad as NdArrays. The flag need_grad is used to
omit unnecessary gradient computation during backprop if set to False.
x = nn.Variable([2, 3, 4], need_grad=True)
print('x.data:', x.data)
print('x.grad:', x.grad)
x.data: <NdArray((2, 3, 4)) at 0x7f575caf4ea0>
x.grad: <NdArray((2, 3, 4)) at 0x7f575caf4ea0>
You can get the shape by:
x.shape
(2, 3, 4)
Since both data and grad are NdArrays, you can get a reference to their
values as a numpy.ndarray with the .data accessor. The values can also
be referred to by the .d and .g properties for data and grad,
respectively.
print('x.data')
print(x.d)
x.d = 1.2345 # To avoid NaN
assert np.all(x.d == x.data.data), 'd: {} != {}'.format(x.d, x.data.data)
print('x.grad')
print(x.g)
x.g = 1.2345 # To avoid NaN
assert np.all(x.g == x.grad.data), 'g: {} != {}'.format(x.g, x.grad.data)
# Zeroing grad values
x.grad.zero()
print('x.grad (after `.zero()`)')
print(x.g)
x.data
[[[ 9.42553452e+24 4.56809286e41 8.32543479e38 0.00000000e+00]
[ nan nan 0.00000000e+00 0.00000000e+00]
[ 3.70977305e+25 4.56809286e41 3.78350585e44 0.00000000e+00]]
[[ 5.68736600e38 0.00000000e+00 1.86176378e13 4.56809286e41]
[ 4.74367616e+25 4.56809286e41 5.43829710e+19 4.56809286e41]
[ 0.00000000e+00 0.00000000e+00 2.93623372e38 0.00000000e+00]]]
x.grad
[[[ 9.42576510e+24 4.56809286e41 9.42576510e+24 4.56809286e41]
[ 9.27127763e38 0.00000000e+00 9.27127763e38 0.00000000e+00]
[ 1.69275966e+22 4.80112800e+30 1.21230330e+25 7.22962302e+31]]
[[ 1.10471027e32 4.63080422e+27 2.44632805e+20 2.87606258e+20]
[ 4.46263300e+30 4.62311881e+30 7.65000750e+28 3.01339003e+29]
[ 2.08627352e10 1.03961868e+21 7.99576678e+20 1.74441223e+22]]]
x.grad (after `.zero()`)
[[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]]
Like NdArray, a Variable can also be created from NumPy array(s).
x2 = nn.Variable.from_numpy_array(np.ones((3,)), need_grad=True)
print(x2)
print(x2.d)
x3 = nn.Variable.from_numpy_array(np.ones((3,)), np.zeros((3,)), need_grad=True)
print(x3)
print(x3.d)
print(x3.g)
<Variable((3,), need_grad=True) at 0x7f572a5242c8>
[ 1. 1. 1.]
<Variable((3,), need_grad=True) at 0x7f572a5244a8>
[ 1. 1. 1.]
[ 0. 0. 0.]
Besides storing the values of a computation graph, a Variable has
another important role: pointing to its parent edge (function) so that
the computation graph can be traced. Here x doesn’t have any
connection. Therefore, the .parent property returns None.
print(x.parent)
None
Function¶
A function defines an operation block of a computation graph, as we
described above. The module nnabla.functions offers various
functions (e.g. Convolution, Affine and ReLU). You can see the list of
available functions in the API reference guide.
import nnabla.functions as F
As an example, here you will define a computation graph that computes the elementwise Sigmoid function outputs for the input variable and sums up all values into a scalar. (This is simple enough to explain how it behaves, but it is a meaningless example in the context of neural network training. We will show you a neural network example later.)
sigmoid_output = F.sigmoid(x)
sum_output = F.reduce_sum(sigmoid_output)
The function API in nnabla.functions takes one (or several)
Variable(s) and arguments (if any), and returns one (or several) output
Variable(s). The .parent property of an output points to the function
instance that created it. Note that no computation occurs at this time
since we just define the graph. (This is the default behavior of the
NNabla computation graph API. You can also fire actual computation
during graph definition, which we call “Dynamic mode” (explained later).)
print("sigmoid_output.parent.name:", sigmoid_output.parent.name)
print("x:", x)
print("sigmoid_output.parent.inputs refers to x:", sigmoid_output.parent.inputs)
sigmoid_output.parent.name: Sigmoid
x: <Variable((2, 3, 4), need_grad=True) at 0x7f572a51a778>
sigmoid_output.parent.inputs refers to x: [<Variable((2, 3, 4), need_grad=True) at 0x7f572a273a48>]
print("sum_output.parent.name:", sum_output.parent.name)
print("sigmoid_output:", sigmoid_output)
print("sum_output.parent.inputs refers to sigmoid_output:", sum_output.parent.inputs)
sum_output.parent.name: ReduceSum
sigmoid_output: <Variable((2, 3, 4), need_grad=True) at 0x7f572a524638>
sum_output.parent.inputs refers to sigmoid_output: [<Variable((2, 3, 4), need_grad=True) at 0x7f572a273a48>]
Calling .forward() at a leaf Variable executes the forward pass
computation in the computation graph.
sum_output.forward()
print("CG output:", sum_output.d)
print("Reference:", np.sum(1.0 / (1.0 + np.exp(-x.d))))
CG output: 18.59052085876465
Reference: 18.5905
The .backward() method performs backward propagation through the graph.
Here we initialize the grad values to zero before backprop since the
NNabla backprop algorithm always accumulates the gradient in the root
variables.
x.grad.zero()
sum_output.backward()
print("d sum_o / d sigmoid_o:")
print(sigmoid_output.g)
print("d sum_o / d x:")
print(x.g)
d sum_o / d sigmoid_o:
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]]
d sum_o / d x:
[[[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]]
[[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]]]
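The forward and backward values printed above can be checked by hand. The following is a minimal sketch in plain Python (no NNabla required), assuming every element of x holds 1.2345 as set earlier. The helper `sigmoid` and the finite-difference check are illustrative and not part of the NNabla API.

```python
import math

def sigmoid(v):
    """Logistic function 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

# Assumption: x has shape (2, 3, 4) and every element equals 1.2345.
num_elements = 2 * 3 * 4
value = 1.2345

# Forward pass: sum of sigmoid over all elements.
forward_sum = num_elements * sigmoid(value)

# Backward pass: d(sum)/d(sigmoid_i) = 1, and d(sigmoid)/dx = s * (1 - s).
s = sigmoid(value)
grad_x = 1.0 * s * (1.0 - s)

# Central finite difference as an independent check of the analytic gradient.
eps = 1e-6
numeric_grad = (sigmoid(value + eps) - sigmoid(value - eps)) / (2 * eps)

print("forward sum:", round(forward_sum, 4))
print("analytic grad:", round(grad_x, 6))
print("numeric grad:", round(numeric_grad, 6))
```

The gradient of 1 at sigmoid_output and the constant value at x follow directly from the chain rule, because all inputs are identical in this example.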
NNabla is developed with a main focus on neural network training and
inference. Neural networks have parameters to be learned, associated
with computation blocks such as Convolution and Affine (a.k.a. fully
connected, dense, etc.). In NNabla, the learnable parameters are also
represented as Variable objects. Just like input variables, those
parameter variables are used by passing them into Functions. For
example, the Affine function takes input, weights and biases as inputs.
x = nn.Variable([5, 2]) # Input
w = nn.Variable([2, 3], need_grad=True) # Weights
b = nn.Variable([3], need_grad=True) # Biases
affine_out = F.affine(x, w, b) # Create a graph including only affine
The above example takes an input with B=5 (batchsize) and D=2 (dimensions) and maps it to D’=3 outputs, i.e. (B, D’) output.
You may also notice that here you set need_grad=True only for the
parameter variables (w and b). The x is a non-parameter variable and the
root of the computation graph. Therefore, it doesn’t require gradient
computation. In this configuration, the gradient computation for x is
not executed in the first affine, which omits unnecessary
backpropagation computation.
The next block sets data and initializes grad, then applies forward and backward computation.
# Set random input and parameters
x.d = np.random.randn(*x.shape)
w.d = np.random.randn(*w.shape)
b.d = np.random.randn(*b.shape)
# Initialize grad
x.grad.zero() # Just for showing gradients are not computed when need_grad=False (default).
w.grad.zero()
b.grad.zero()
# Forward and backward
affine_out.forward()
affine_out.backward()
# Note: Calling backward at a non-scalar Variable propagates 1 as the error signal from all elements of the output.
You can see that affine_out holds an output of Affine.
print('F.affine')
print(affine_out.d)
print('Reference')
print(np.dot(x.d, w.d) + b.d)
F.affine
[[0.17701732 2.86095762 0.82298267]
[0.75544345 1.16702223 2.44841242]
[0.36278027 3.4771595 0.75681627]
[ 0.32743117 0.24258983 1.30944324]
[0.87201929 1.94556415 3.23357344]]
Reference
[[0.1770173 2.86095762 0.82298267]
[0.75544345 1.16702223 2.44841242]
[0.3627803 3.4771595 0.75681627]
[ 0.32743117 0.24258983 1.309443 ]
[0.87201929 1.94556415 3.23357344]]
The resulting gradients of weights and biases are as follows.
print("dw")
print(w.g)
print("db")
print(b.g)
dw
[[ 3.10820675 3.10820675 3.10820675]
[ 0.37446201 0.37446201 0.37446201]]
db
[ 5. 5. 5.]
The gradient of x is not changed because need_grad is set to False.
print(x.g)
[[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
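The pattern in dw and db above (identical columns in dw, and db equal to the batch size) follows from propagating 1 from every output element. The sketch below reproduces it in plain Python with made-up input values; it is an illustration of the affine gradient formulas, not NNabla code.

```python
# Toy input with B=5 samples and D=2 features (values are made up).
x = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0], [-2.0, 1.0], [1.5, 0.5]]
n_out = 3

# Calling backward on a non-scalar output propagates 1 from every element,
# so the incoming gradient has shape (B, n_out) and is filled with ones.
grad_out = [[1.0] * n_out for _ in x]

# For y = x @ W + b:  dW[i][j] = sum_b x[b][i] * grad_out[b][j]
dw = [[sum(x[b][i] * grad_out[b][j] for b in range(len(x)))
       for j in range(n_out)]
      for i in range(len(x[0]))]

# db[j] = sum_b grad_out[b][j], i.e. the batch size when grad_out is all ones.
db = [sum(grad_out[b][j] for b in range(len(x))) for j in range(n_out)]

print("dw:", dw)
print("db:", db)
```

Each row of dw is the sum of one input feature over the batch, repeated for every output unit, which is why the three columns are identical.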
Parametric Function¶
Considering parameters as inputs of Function enhances the expressiveness
and flexibility of computation graphs. However, defining all parameters
for each learnable function is tedious when building a neural
network. In NNabla, trainable models are usually created by composing
functions that have optimizable parameters. These functions are called
“Parametric Functions”. The Parametric Function API provides various
parametric functions and an interface for composing trainable models.
To use parametric functions, import:
import nnabla.parametric_functions as PF
A function with optimizable parameters can be created as below.
with nn.parameter_scope("affine1"):
    c1 = PF.affine(x, 3)
The first line creates a parameter scope. The second line then
applies PF.affine (an affine transform) to x, and creates a
variable c1 holding that result. The parameters are created and
initialized randomly at function call, and registered under the name
“affine1” using the parameter_scope context. The function
nnabla.get_parameters() returns the registered parameters.
nn.get_parameters()
OrderedDict([('affine1/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
('affine1/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822f138>)])
The name= argument of any PF function creates the equivalent
parameter space to the above definition of the PF.affine transformation,
as shown below. It can shorten your Python code. The
nnabla.parameter_scope context is more useful when you group multiple
parametric functions, such as Convolution-BatchNormalization found in a
typical unit of CNNs.
c1 = PF.affine(x, 3, name='affine1')
nn.get_parameters()
OrderedDict([('affine1/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
('affine1/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822f138>)])
It is worth noting that the shapes of both outputs and parameter variables (as you can see above) are automatically determined by only providing the output size of the affine transformation (in the example above the output size is 3). This makes it easy to create a graph.
c1.shape
(5, 3)
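The shape inference described above can be sketched as a small helper. This is an illustrative function written for this explanation, not part of NNabla. It assumes that the dimensions before `base_axis` are batch axes and that the remaining dimensions are flattened into one feature axis, which matches the shapes printed in this tutorial.

```python
from functools import reduce
from operator import mul

def affine_shapes(input_shape, n_outputs, base_axis=1):
    """Return (W shape, b shape, output shape) for an affine layer.

    Assumption: dimensions before `base_axis` are treated as batch axes and
    the remaining ones are flattened into a single feature axis.
    """
    batch_dims = tuple(input_shape[:base_axis])
    n_inputs = reduce(mul, input_shape[base_axis:], 1)
    return (n_inputs, n_outputs), (n_outputs,), batch_dims + (n_outputs,)

# Same configuration as the example above: input (5, 2), output size 3.
w_shape, b_shape, out_shape = affine_shapes((5, 2), 3)
print(w_shape, b_shape, out_shape)
```

Only the output size has to be given by the caller; everything else is derived from the input shape.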
Parameter scopes can be nested as follows (although this is a meaningless example).
with nn.parameter_scope('foo'):
    h = PF.affine(x, 3)
    with nn.parameter_scope('bar'):
        h = PF.affine(h, 4)
This creates the following.
nn.get_parameters()
OrderedDict([('affine1/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
('affine1/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822f138>),
('foo/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822fa98>),
('foo/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822fae8>),
('foo/bar/affine/W',
<Variable((3, 4), need_grad=True) at 0x7f572822f728>),
('foo/bar/affine/b',
<Variable((4,), need_grad=True) at 0x7f572822fdb8>)])
Also, get_parameters() can be used in parameter_scope. For
example:
with nn.parameter_scope("foo"):
    print(nn.get_parameters())
OrderedDict([('affine/W', <Variable((2, 3), need_grad=True) at 0x7f572822fa98>), ('affine/b', <Variable((3,), need_grad=True) at 0x7f572822fae8>), ('bar/affine/W', <Variable((3, 4), need_grad=True) at 0x7f572822f728>), ('bar/affine/b', <Variable((4,), need_grad=True) at 0x7f572822fdb8>)])
nnabla.clear_parameters() can be used to delete registered
parameters under the scope.
with nn.parameter_scope("foo"):
    nn.clear_parameters()
print(nn.get_parameters())
OrderedDict([('affine1/affine/W', <Variable((2, 3), need_grad=True) at 0x7f572822f0e8>), ('affine1/affine/b', <Variable((3,), need_grad=True) at 0x7f572822f138>)])
MLP Example For Explanation¶
The following block creates a computation graph to predict a one-dimensional output from two-dimensional inputs with a 2-layer fully connected neural network (multilayer perceptron).
nn.clear_parameters()
batchsize = 16
x = nn.Variable([batchsize, 2])
with nn.parameter_scope("fc1"):
    h = F.tanh(PF.affine(x, 512))
with nn.parameter_scope("fc2"):
    y = PF.affine(h, 1)
print("Shapes:", h.shape, y.shape)
Shapes: (16, 512) (16, 1)
This will create the following parameter variables.
nn.get_parameters()
OrderedDict([('fc1/affine/W',
<Variable((2, 512), need_grad=True) at 0x7f572822fef8>),
('fc1/affine/b',
<Variable((512,), need_grad=True) at 0x7f572822f9a8>),
('fc2/affine/W',
<Variable((512, 1), need_grad=True) at 0x7f572822f778>),
('fc2/affine/b',
<Variable((1,), need_grad=True) at 0x7f572822ff98>)])
As described above, you can execute the forward pass by calling the forward method at the terminal variable.
x.d = np.random.randn(*x.shape) # Set random input
y.forward()
print(y.d)
[[0.05708594]
[ 0.01661986]
[0.34168088]
[ 0.05822293]
[0.16566885]
[0.04867431]
[ 0.2633169 ]
[ 0.10496549]
[0.01291842]
[0.09726256]
[0.05720493]
[0.09691752]
[0.07822668]
[0.17180404]
[ 0.11970415]
[0.08222144]]
Training a neural network needs a loss value to be minimized by gradient descent with backprop. In NNabla, a loss function is also just a function, and is packaged in the functions module.
# Variable for label
label = nn.Variable([batchsize, 1])
# Set loss
loss = F.reduce_mean(F.squared_error(y, label))
# Execute forward pass.
label.d = np.random.randn(*label.shape) # Randomly generate labels
loss.forward()
print(loss.d)
1.9382084608078003
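The loss above is the mean of the elementwise squared error. As a small worked example in plain Python, with made-up predictions and labels (the variable names below are local to this sketch), the loss and its gradient with respect to each prediction can be computed as follows.

```python
# Made-up predictions and labels for a batch of 4 samples.
y = [0.5, -1.0, 2.0, 0.0]
label = [1.0, -1.0, 0.0, 0.5]

# Elementwise squared error followed by the mean over all elements.
squared_error = [(yi - li) ** 2 for yi, li in zip(y, label)]
loss = sum(squared_error) / len(squared_error)

# Gradient of the mean squared error with respect to each prediction.
grad_y = [2.0 * (yi - li) / len(y) for yi, li in zip(y, label)]

print("loss:", loss)
print("d loss / d y:", grad_y)
```

The gradient is what backprop starts from when loss.backward() is called on the scalar loss.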
As you’ve seen above, NNabla backward accumulates the gradients at
the root variables. You have to initialize the grad of the parameter
variables before backprop (we will show you the easiest way with the
Solver API).
# Collect all parameter variables and init grad.
for name, param in nn.get_parameters().items():
    param.grad.zero()
# Gradients are accumulated to grad of params.
loss.backward()
Imperative Mode¶
After performing backprop, gradients are held in parameter variable grads. The next block will update the parameters with vanilla gradient descent.
for name, param in nn.get_parameters().items():
    param.data -= param.grad * 0.001  # 0.001 as learning rate
The above computation is an example of NNabla’s “Imperative Mode” for
executing neural networks. Normally, NNabla functions (instances of
nnabla.functions) take Variables as their input. When at least one
NdArray is provided as an input for NNabla functions (instead of
Variables), the function computation will be fired immediately, and
returns an NdArray as the output, instead of returning a Variable. In
the above example, the NNabla functions F.mul_scalar and F.sub2 are
called by the overridden operators * and -=, respectively.
In other words, NNabla’s “Imperative mode” doesn’t create a computation graph, and can be used like NumPy. If device acceleration such as CUDA is enabled, it can be used like NumPy empowered with device acceleration. Parametric functions can also be used with NdArray input(s). The following block demonstrates a simple imperative execution example.
# A simple example of imperative mode.
xi = nn.NdArray.from_numpy_array(np.arange(4).reshape(2, 2))
yi = F.relu(xi - 1)
print(xi.data)
print(yi.data)
[[0 1]
[2 3]]
[[ 0. 0.]
[ 1. 2.]]
Note that in-place substitution from the rhs to the lhs cannot be done
by the = operator. For example, when x is an NdArray, writing
x = x + 1 will not increment all values of x. Instead, the expression
on the rhs will create a new NdArray object that is different from the
one originally bound by x, and the new NdArray object is bound to the
Python variable x on the lhs.
For in-place editing of NdArrays, the in-place assignment operators
+=, -=, *=, and /= can be used. The copy_from method can also be used
to copy values of an existing NdArray to another. For example,
incrementing x, an NdArray, by 1 can be done by x.copy_from(x+1). The
copy is performed with device acceleration if a device context is
specified by using nnabla.set_default_context or nnabla.context_scope.
# The following doesn't perform substitution but assigns a new NdArray object to `xi`.
# xi = xi + 1
# The following copies the result of `xi + 1` to `xi`.
xi.copy_from(xi + 1)
assert np.all(xi.data == (np.arange(4).reshape(2, 2) + 1))
# Inplace operations like `+=`, `*=` can also be used (more efficient).
xi += 1
assert np.all(xi.data == (np.arange(4).reshape(2, 2) + 2))
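The difference between rebinding a name and editing an object in place is a general Python behavior. The sketch below shows it with a hypothetical `ToyArray` class written for this explanation. It is not NdArray; it only imitates the three operations discussed here.

```python
class ToyArray:
    """Minimal stand-in for an array type (hypothetical, not NdArray)."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, scalar):
        # Returns a NEW object; the original is left untouched.
        return ToyArray(v + scalar for v in self.values)

    def __iadd__(self, scalar):
        # Mutates this object and returns it, so the name keeps its binding.
        self.values = [v + scalar for v in self.values]
        return self

    def copy_from(self, other):
        # Overwrites the stored values with those of another array.
        self.values = list(other.values)


a = ToyArray([0, 1, 2, 3])
alias = a            # a second name for the same object

a = a + 1            # rebinding: `a` now names a new object
print(alias.values, a is alias)

b = ToyArray([0, 1, 2, 3])
alias_b = b
b += 1               # in-place: the shared object is updated
b.copy_from(b + 1)   # in-place copy of a freshly computed result
print(alias_b.values, b is alias_b)
```

Any other reference to the original object, such as a parameter registered in a graph, only sees the change in the in-place case.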
Solver¶
NNabla provides stochastic gradient descent algorithms to optimize
parameters, listed in the nnabla.solvers module. The parameter
updates demonstrated above can be replaced with this Solver API, which
is easier and usually faster.
from nnabla import solvers as S
solver = S.Sgd(lr=0.00001)
solver.set_parameters(nn.get_parameters())
# Set random data
x.d = np.random.randn(*x.shape)
label.d = np.random.randn(*label.shape)
# Forward
loss.forward()
Just call the following solver method to fill the grad region with zeros, then run backprop.
solver.zero_grad()
loss.backward()
The following block updates parameters with the vanilla SGD rule (equivalent to the imperative example above).
solver.update()
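The call pattern zero_grad, backward, update can be illustrated with a toy solver in plain Python. `ToyParam`, `ToySgd` and `backward` below are hypothetical names invented for this sketch and are not NNabla classes. The sketch shows why gradients must be zeroed before backprop (they accumulate) and what a vanilla SGD update does.

```python
class ToyParam:
    """Scalar parameter with a value and an accumulated gradient (toy)."""

    def __init__(self, value):
        self.data = value
        self.grad = 0.0


class ToySgd:
    """Toy solver mimicking the zero_grad / update call pattern."""

    def __init__(self, lr):
        self.lr = lr
        self.params = []

    def set_parameters(self, params):
        self.params = list(params)

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

    def update(self):
        # Vanilla gradient descent: w <- w - lr * grad
        for p in self.params:
            p.data -= self.lr * p.grad


def backward(param):
    """Accumulate d/dw of the loss w**2 into param.grad (adds, not overwrites)."""
    param.grad += 2.0 * param.data


w = ToyParam(1.0)
solver = ToySgd(lr=0.25)
solver.set_parameters([w])

backward(w)
backward(w)          # without zero_grad the gradient is counted twice
stale_grad = w.grad  # 4.0 instead of 2.0

solver.zero_grad()
backward(w)
solver.update()      # 1.0 - 0.25 * 2.0
print(stale_grad, w.grad, w.data)
```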
Toy Problem To Demonstrate Training¶
The following function defines a regression problem which computes the norm of a vector.
def vector2length(x):
    # x : [B, 2] where B is number of samples.
    return np.sqrt(np.sum(x ** 2, axis=1, keepdims=True))
We visualize this mapping with a contour plot using matplotlib as follows.
# Data for plotting contour on a grid data.
xs = np.linspace(-1, 1, 100)
ys = np.linspace(-1, 1, 100)
grid = np.meshgrid(xs, ys)
X = grid[0].flatten()
Y = grid[1].flatten()
def plot_true():
    """Plotting contour of true mapping from a grid data created above."""
    plt.contourf(xs, ys, vector2length(np.hstack([X[:, None], Y[:, None]])).reshape(100, 100))
    plt.axis('equal')
    plt.colorbar()
plot_true()
We define a deep prediction neural network.
def length_mlp(x):
    h = x
    for i, hnum in enumerate([4, 8, 4, 2]):
        h = F.tanh(PF.affine(h, hnum, name="fc{}".format(i)))
    y = PF.affine(h, 1, name='fc')
    return y
nn.clear_parameters()
batchsize = 100
x = nn.Variable([batchsize, 2])
y = length_mlp(x)
label = nn.Variable([batchsize, 1])
loss = F.reduce_mean(F.squared_error(y, label))
We created a 5-layer deep MLP using a for loop. Note that only 3 lines of code can potentially create arbitrarily deep neural networks. The next block adds helper functions to visualize the learned function.
def predict(inp):
    ret = []
    for i in range(0, inp.shape[0], x.shape[0]):
        xx = inp[i:i + x.shape[0]]
        # Imperative execution
        xi = nn.NdArray.from_numpy_array(xx)
        yi = length_mlp(xi)
        ret.append(yi.data.copy())
    return np.vstack(ret)
def plot_prediction():
    plt.contourf(xs, ys, predict(np.hstack([X[:, None], Y[:, None]])).reshape(100, 100))
    plt.colorbar()
    plt.axis('equal')
Next we instantiate a solver object as follows. We use the Adam optimizer, which is one of the most popular SGD algorithms used in the literature.
from nnabla import solvers as S
solver = S.Adam(alpha=0.01)
solver.set_parameters(nn.get_parameters())
The following function generates an unlimited supply of data from the true system.
def random_data_provider(n):
    x = np.random.uniform(-1, 1, size=(n, 2))
    y = vector2length(x)
    return x, y
In the next block, we run 2000 training steps (SGD updates).
num_iter = 2000
for i in range(num_iter):
    # Sample data and set them to input variables of training.
    xx, ll = random_data_provider(batchsize)
    x.d = xx
    label.d = ll
    # Forward propagation given inputs.
    loss.forward(clear_no_need_grad=True)
    # Parameter gradients initialization and gradients computation by backprop.
    solver.zero_grad()
    loss.backward(clear_buffer=True)
    # Apply weight decay and update by Adam rule.
    solver.weight_decay(1e-6)
    solver.update()
    # Just print progress.
    if i % 100 == 0 or i == num_iter - 1:
        print("Loss@{:4d}: {}".format(i, loss.d))
Loss@ 0: 0.6976373195648193
Loss@ 100: 0.08075223118066788
Loss@ 200: 0.005213144235312939
Loss@ 300: 0.001955194864422083
Loss@ 400: 0.0011660841992124915
Loss@ 500: 0.0006421314901672304
Loss@ 600: 0.0009330055327154696
Loss@ 700: 0.0008817618945613503
Loss@ 800: 0.0006205961108207703
Loss@ 900: 0.0009072928223758936
Loss@1000: 0.0008160348515957594
Loss@1100: 0.0011569359339773655
Loss@1200: 0.000837412488181144
Loss@1300: 0.0011542742140591145
Loss@1400: 0.0005833200993947685
Loss@1500: 0.0009848927147686481
Loss@1600: 0.0005141657311469316
Loss@1700: 0.0009339841199107468
Loss@1800: 0.000950580753851682
Loss@1900: 0.0005430278833955526
Loss@1999: 0.0007046313839964569
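For readers who want to see what one weight-decay-plus-Adam step computes, the following is a minimal sketch of the textbook Adam rule on a scalar toy problem. It assumes that weight decay adds `decay * w` to the gradient before the update and uses the common default hyperparameters; NNabla's solver may differ in details, so treat this as an illustration rather than a description of its internals. The function `adam_minimize` is a name invented for this sketch.

```python
import math

def adam_minimize(grad_fn, w, steps, alpha=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8, decay=1e-6):
    """Run `steps` Adam updates on a scalar parameter and return the result."""
    m = 0.0  # running mean of gradients
    v = 0.0  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w) + decay * w           # weight decay added to the gradient
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)       # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy objective (w - 3)^2 with gradient 2 * (w - 3).
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0, steps=2000)
print(round(w_final, 2))
```

The step size is normalized by the running gradient magnitude, which is why a single learning rate works across parameters of very different scale.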
Memory usage optimization: You may notice that, in the above
updates, .forward() is called with the clear_no_need_grad= option, and
.backward() is called with the clear_buffer= option. Training a neural
network in more realistic scenarios usually consumes a huge amount of
memory due to the nature of the backpropagation algorithm, in which all
of the forward variable buffer data should be kept in order to compute
the gradient of a function. In a naive implementation, we keep all the
variable data and grad alive until the NdArray objects are no longer
referenced (i.e. the graph is deleted). The clear_* options in
.forward() and .backward() reduce memory consumption by clearing
(erasing) the memory of data and grad when it is not referenced by any
subsequent computation. (More precisely speaking, it doesn’t actually
free memory. We use our memory pool engine by default to avoid memory
alloc/free overhead.) The unreferenced buffers can be reused in
subsequent computation. See the document of Variable for more details.
Note that the following loss.forward(clear_buffer=True) clears data of
any intermediate variables. If you are interested in intermediate
variables for some purposes (e.g. debug, log), you can use the
.persistent flag to prevent clearing the buffer of a specific Variable,
like below.
loss.forward(clear_buffer=True)
print("The prediction `y` is cleared because it's an intermediate variable.")
print(y.d.flatten()[:4]) # to save space show only 4 values
y.persistent = True
loss.forward(clear_buffer=True)
print("The prediction `y` is kept by the persistent flag.")
print(y.d.flatten()[:4]) # to save space show only 4 value
The prediction `y` is cleared because it's an intermediate variable.
[ 2.27279830e04 6.02164946e05 5.33679675e04 2.35557582e05]
The prediction `y` is kept by the persistent flag.
[ 1.0851264 0.87657517 0.79603785 0.40098712]
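The effect of buffer clearing and the persistent flag can be modeled with a toy chain of computations. The sketch below is a hypothetical simplification (`ToyNode` and `forward_chain` are invented names): it sets a cleared result to None, whereas NNabla returns the memory to its pool, so reading a cleared buffer there yields stale values, as in the output above.

```python
class ToyNode:
    """Holds the output of one step in a chain of computations (toy model)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.data = None
        self.persistent = False


def forward_chain(nodes, value, clear_buffer=False):
    """Run the chain; optionally drop intermediate results once consumed."""
    previous = None
    for node in nodes:
        node.data = value = node.func(value)
        # The previous node's output has just been consumed by this node.
        if clear_buffer and previous is not None and not previous.persistent:
            previous.data = None
        previous = node
    return value


hidden = ToyNode("hidden", lambda v: v * 2)
pred = ToyNode("y", lambda v: v + 1)
loss = ToyNode("loss", lambda v: v ** 2)
chain = [hidden, pred, loss]

forward_chain(chain, 3, clear_buffer=True)
print(pred.data, loss.data)      # intermediate result dropped

pred.persistent = True
forward_chain(chain, 3, clear_buffer=True)
print(pred.data, loss.data)      # intermediate result kept
```

Only the node marked persistent keeps its value; other intermediate results are still released, so the memory saving is largely preserved.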
We can confirm the prediction performs fairly well by looking at the following visualization of the ground truth and prediction function.
plt.subplot(121)
plt.title("Ground truth")
plot_true()
plt.subplot(122)
plt.title("Prediction")
plot_prediction()
You can save learned parameters with nnabla.save_parameters and load
them with nnabla.load_parameters.
path_param = "paramvector2length.h5"
nn.save_parameters(path_param)
# Remove all once
nn.clear_parameters()
nn.get_parameters()
2017-09-27 14:00:40,544 [nnabla][INFO]: Parameter save (.h5): paramvector2length.h5
OrderedDict()
# Load again
nn.load_parameters(path_param)
print('\n'.join(map(str, nn.get_parameters().items())))
2017-09-27 14:00:40,564 [nnabla][INFO]: Parameter load (<builtin function format>): paramvector2length.h5
('fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f576328df48>)
('fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57245f2868>)
('fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f576328def8>)
('fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5727ee5c78>)
('fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297318>)
('fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5727d29908>)
('fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f57632973b8>)
('fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f57632974a8>)
('fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f57632974f8>)
('fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297598>)
Both save and load functions can also be used in a parameter scope.
with nn.parameter_scope('foo'):
    nn.load_parameters(path_param)
print('\n'.join(map(str, nn.get_parameters().items())))
2017-09-27 14:00:40,714 [nnabla][INFO]: Parameter load (<builtin function format>): paramvector2length.h5
('fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f576328df48>)
('fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57245f2868>)
('fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f576328def8>)
('fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5727ee5c78>)
('fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297318>)
('fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5727d29908>)
('fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f57632973b8>)
('fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f57632974a8>)
('fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f57632974f8>)
('fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297598>)
('foo/fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f5763297958>)
('foo/fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57632978b8>)
('foo/fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f572a51ac78>)
('foo/fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5763297c78>)
('foo/fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297a98>)
('foo/fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5763297d68>)
('foo/fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f5763297e08>)
('foo/fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f5763297ea8>)
('foo/fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f5763297f48>)
('foo/fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297cc8>)
!rm {path_param} # Clean ups
Static vs Dynamic Neural Networks in NNabla¶
NNabla allows you to define static and dynamic neural networks. Static neural networks have a fixed layer architecture, i.e., a static computation graph. In contrast, dynamic neural networks use a dynamic computation graph, e.g., randomly dropping layers for each minibatch.
This tutorial compares both computation graphs.
%matplotlib inline
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np
np.random.seed(0)
GPU = 0 # ID of GPU that we will use
2017-06-26 23:10:05,832 [nnabla][INFO]: Initializing CPU extension...
Dataset loading¶
We will first set up the digits dataset from scikit-learn:
from tiny_digits import *
digits = load_digits()
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
2017-06-26 23:10:06,042 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:06,043 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:06,044 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:06,044 [nnabla][INFO]: On-memory
2017-06-26 23:10:06,045 [nnabla][INFO]: Using DataIterator
Each sample in this dataset is a grayscale image of size 8x8 and belongs
to one of the ten classes 0, 1, …, 9.
img, label = data.next()
print(img.shape, label.shape)
(16, 1, 8, 8) (16, 1)
Network definition¶
As an example, we define a (unnecessarily) deep CNN:
def cnn(x):
    """Unnecessarily Deep CNN.
    Args:
        x : Variable, shape (B, 1, 8, 8)
    Returns:
        y : Variable, shape (B, 10)
    """
    with nn.parameter_scope("cnn"):  # Parameter scope can be nested
        with nn.parameter_scope("conv1"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(x, 64, (3, 3), pad=(1, 1))))
        for i in range(10):  # unnecessarily deep
            with nn.parameter_scope("conv{}".format(i + 2)):
                h = F.tanh(PF.batch_normalization(
                    PF.convolution(h, 128, (3, 3), pad=(1, 1))))
        with nn.parameter_scope("conv_last"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(h, 512, (3, 3), pad=(1, 1))))
            h = F.average_pooling(h, (2, 2))
        with nn.parameter_scope("fc"):
            h = F.tanh(PF.affine(h, 1024))
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y
Static computation graph¶
First, we will look at the case of a static computation graph where the neural network does not change during training.
from nnabla.ext_utils import get_extension_context
# setup cuda extension
ctx_cuda = get_extension_context('cudnn', device_id=GPU) # replace 'cudnn' by 'cpu' if you want to run the example on the CPU
nn.set_default_context(ctx_cuda)
# create variables for network input and label
x = nn.Variable(img.shape)
t = nn.Variable(label.shape)
# create network
static_y = cnn(x)
static_y.persistent = True
# define loss function for training
static_l = F.mean(F.softmax_cross_entropy(static_y, t))
2017-06-26 23:10:06,350 [nnabla][INFO]: Initializing CUDA extension...
2017-06-26 23:10:06,571 [nnabla][INFO]: Initializing cuDNN extension...
Setup solver for training
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())
Create data iterator
loss = []
def epoch_end_callback(epoch):
    global loss
    # Print the mean loss of the finished epoch on the same output line.
    print("[", epoch, np.mean(loss), itr, "]", end=' ')
    loss = []
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
2017-06-26 23:10:07,221 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:07,224 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:07,226 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:07,228 [nnabla][INFO]: On-memory
2017-06-26 23:10:07,230 [nnabla][INFO]: Using DataIterator
Perform training iterations and output training loss:
%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        static_l.forward(clear_no_need_grad=True)
        solver.zero_grad()
        static_l.backward(clear_buffer=True)
        solver.update()
        loss.append(static_l.d.copy())
        itr += 1
print('')
[ 0 0.909297 112 ] [ 1 0.183863 111 ] [ 2 0.0723054 111 ] [ 3 0.0653021 112 ] [ 4 0.0628503 111 ] [ 5 0.0731626 111 ] [ 6 0.0319093 112 ] [ 7 0.0610926 111 ] [ 8 0.0817437 111 ] [ 9 0.0717577 112 ] [ 10 0.0241882 111 ] [ 11 0.0119452 111 ] [ 12 0.00664761 112 ] [ 13 0.00377711 111 ] [ 14 0.000605656 111 ] [ 15 0.000236613 111 ] [ 16 0.000174549 112 ] [ 17 0.000142428 111 ] [ 18 0.000126015 111 ] [ 19 0.000111144 112 ] [ 20 0.000100751 111 ] [ 21 9.03808e-05 111 ] [ 22 8.35904e-05 112 ] [ 23 7.73492e-05 111 ] [ 24 6.91389e-05 111 ] [ 25 6.74929e-05 112 ] [ 26 6.08386e-05 111 ] [ 27 5.62182e-05 111 ] [ 28 5.33428e-05 112 ] [ 29 4.94594e-05 111 ]
CPU times: user 14.3 s, sys: 6.78 s, total: 21.1 s
Wall time: 21.1 s
Dynamic computation graph¶
Now, we will use a dynamic computation graph, where the neural network
is set up each time we want to do a forward/backward pass through it.
This allows us to, e.g., randomly drop out layers or to have network
architectures that depend on the input data. In this example, for
simplicity, we use the same neural network structure and only create it
dynamically. For example, adding a check such as
if np.random.rand() > dropout_probability:
into cnn() allows us to drop out layers.
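The idea of deciding the network structure at construction time can be sketched without NNabla. The toy example below is only an illustration under stated assumptions: `build_and_run`, `layers`, and `rng` are hypothetical names, and plain Python callables stand in for the layers that cnn() would create.

```python
import random


def build_and_run(x, layers, dropout_probability, rng):
    """Apply each layer only if a random draw exceeds dropout_probability.

    This mirrors the `if np.random.rand() > dropout_probability:` check,
    but uses plain callables instead of NNabla layers (illustrative only).
    """
    applied = []
    for name, layer in layers:
        if rng.random() > dropout_probability:
            x = layer(x)
            applied.append(name)
    return x, applied


# Hypothetical stand-ins for network layers.
layers = [("double", lambda v: 2 * v), ("increment", lambda v: v + 1)]
rng = random.Random(0)

# A new "graph" is assembled on every call, so its structure may differ.
for step in range(3):
    y, applied = build_and_run(1.0, layers, dropout_probability=0.5, rng=rng)
    print(step, applied, y)
```

Because the structure is decided inside the loop, a static graph built once before training could not express this behaviour.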
First, we set up the solver and the data iterator for the training:
nn.clear_parameters()
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())
loss = []
def epoch_end_callback(epoch):
    global loss
    print "[", epoch, np.mean(loss), itr, "]",
    loss = []
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
2017-06-26 23:10:28,449 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:28,450 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:28,450 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:28,451 [nnabla][INFO]: On-memory
2017-06-26 23:10:28,451 [nnabla][INFO]: Using DataIterator
%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        with nn.auto_forward():
            dynamic_y = cnn(x)
            dynamic_l = F.mean(F.softmax_cross_entropy(dynamic_y, t))
        solver.set_parameters(nn.get_parameters(), reset=False, retain_state=True) # this can be done dynamically
        solver.zero_grad()
        dynamic_l.backward(clear_buffer=True)
        solver.update()
        loss.append(dynamic_l.d.copy())
        itr += 1
print ''
[ 0 1.04669 112 ] [ 1 0.151949 111 ] [ 2 0.093581 111 ] [ 3 0.129242 112 ] [ 4 0.0452591 111 ] [ 5 0.0343987 111 ] [ 6 0.0315372 112 ] [ 7 0.0336886 111 ] [ 8 0.0194571 111 ] [ 9 0.00923094 112 ] [ 10 0.00536065 111 ] [ 11 0.000669383 111 ] [ 12 0.000294232 112 ] [ 13 0.000245866 111 ] [ 14 0.000201116 111 ] [ 15 0.000164177 111 ] [ 16 0.00014832 112 ] [ 17 0.000131479 111 ] [ 18 0.000115171 111 ] [ 19 0.000101432 112 ] [ 20 9.06228e-05 111 ] [ 21 8.7103e-05 111 ] [ 22 7.79601e-05 112 ] [ 23 7.59678e-05 111 ] [ 24 6.64341e-05 111 ] [ 25 6.22717e-05 112 ] [ 26 5.8643e-05 111 ] [ 27 5.35373e-05 111 ] [ 28 4.96717e-05 112 ] [ 29 4.65124e-05 111 ]
CPU times: user 23.4 s, sys: 5.35 s, total: 28.7 s
Wall time: 28.7 s
Comparing the two processing times, we can observe that both schemes (“static” and “dynamic”) take a comparable execution time: in this run the dynamic graph needed 28.7 s versus 21.1 s for the static graph, so creating the computation graph dynamically costs only a moderate overhead.
Mixed Precision Training¶
Introduction¶
Traditionally, FP32 has been used for the weights and activations when
training a neural network; however, the computation cost of training has
increased rapidly over the years with the success of deep learning and
the growing size of neural networks. This means that we need to spend
much more time training a huge neural network, while we would like to
run many trials before a product launch. To address this problem,
companies (e.g., NVIDIA) introduced accelerators to speed up
computation. For example, NVIDIA Volta has Tensor Cores to speed up
computation.
However, Tensor Cores use FP16 weights, activations, and gradients, and
the range of FP16 is very limited compared to that of FP32, meaning
that gradient values sometimes (or often) overflow and/or underflow,
which degrades the performance of a neural network or makes it collapse
during training.
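The limited range of FP16 can be observed with the Python standard library alone. The sketch below is an illustration, not NNabla code: `to_fp16` is a hypothetical helper that rounds a float to IEEE 754 half precision via the `struct` module, and the sample values are chosen only to show underflow and overflow.

```python
import struct


def to_fp16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


small_grad = 2e-8  # fine in FP32, but below the smallest FP16 subnormal

print(to_fp16(small_grad))            # underflows to zero
print(to_fp16(small_grad * 8) > 0.0)  # scaling by 8 keeps it non-zero

try:
    to_fp16(70000.0)  # larger than the FP16 maximum of 65504
except OverflowError as exc:
    print("overflow:", exc)
```

Multiplying small gradients by a constant before they are stored in FP16 is the idea behind the loss scaling described in the following steps.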
Mixed precision training is one of the algorithms that circumvent this
problem while maintaining the same results that we could obtain with
FP32 networks. It is well described in The Training with Mixed
Precision User Guide and Mixed Precision Training.
This tutorial explains how to do mixed precision training in NNabla step by step.
Step-by-Step Instruction¶
Basically, mixed precision training is composed of three parts.
 Use the accelerator for computation (here we assume Tensor Cores)
 Use loss scaling to prevent underflow
 Use dynamic loss scaling to prevent overflow/underflow
In NNabla, these parts correspond to the following steps.
1. Use Tensor Cores¶
ctx = get_extension_context("cudnn", type_config="half")
2. Use loss scaling to prevent underflow¶
loss_scale = 8
loss.backward(loss_scale)
solver.scale_grad(1. / loss_scale) # do some gradient clipping, etc. after this
solver.update()
3. Use dynamic loss scaling to prevent overflow/underflow¶
loss_scale = 8
scaling_factor = 2
counter = 0
interval = 2000
...
loss.backward(loss_scale, ...)
...
if solver.check_inf_or_nan_grad():
    loss_scale /= scaling_factor
    counter = 0
else:
    solver.scale_grad(1. / loss_scale) # do some gradient clipping, etc. after this
    solver.update()
    if counter > interval:
        loss_scale *= scaling_factor
        counter = 0
    counter += 1
Note that the procedures of the 2nd step (Use loss scaling to prevent underflow) and the 3rd step (Use dynamic loss scaling to prevent overflow/underflow) are currently experimental, and we are now trying to speed up mixed precision training, so the API might change in the future, especially for the 3rd step.
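To see how the loss scale evolves under the rule of the 3rd step, the bookkeeping can be isolated from the solver. The following is a minimal sketch under assumptions: `LossScaleController` and its `step` method are hypothetical names, the overflow flags are hand-written stand-ins for solver.check_inf_or_nan_grad(), and a short interval of 3 is used so that the growth is visible.

```python
class LossScaleController(object):
    """Tracks the loss scale only; gradients and the solver are left out."""

    def __init__(self, loss_scale=8.0, scaling_factor=2.0, interval=2000):
        self.loss_scale = loss_scale
        self.scaling_factor = scaling_factor
        self.interval = interval
        self.counter = 0

    def step(self, grads_have_inf_or_nan):
        """Return True if the update should be applied, False if skipped."""
        if grads_have_inf_or_nan:
            # Overflow: shrink the scale and restart the counter.
            self.loss_scale /= self.scaling_factor
            self.counter = 0
            return False
        # Clean step: grow the scale after more than `interval` clean steps.
        if self.counter > self.interval:
            self.loss_scale *= self.scaling_factor
            self.counter = 0
        self.counter += 1
        return True


controller = LossScaleController(loss_scale=8.0, scaling_factor=2.0, interval=3)
history = []
for overflow in [False, False, True, False, False, False, False, False]:
    controller.step(overflow)
    history.append(controller.loss_scale)
print(history)
```

The scale is halved at the simulated overflow and doubled again only after enough consecutive clean iterations, which is why a large interval such as 2000 makes the scale grow slowly.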
All-in-one Instruction¶
In the previous step-by-step example, the 3rd step is lengthy inside a training loop, so we can write a wrapper class like the following.
class DynamicLossScalingUpdater(object):
    '''Dynamic Loss Scaling Updater for the mixed precision training.

    Args:
        solver (:obj:`nnabla.solvers.Solver`): Solver object. E.g., Momentum or Adam.
        loss (:obj:`nnabla.Variable`): Loss variable from which the forward and the backward is called.
        data_feeder (callable :obj:`object`, function, or lambda): Data feeder
        scale (:obj:`float`): Loss scale constant. This is dynamically changing during training.
        scaling_factor (:obj:`float`): Scaling factor for the dynamic loss scaling.
        N (:obj:`int`): Interval, the number of iterations in training for increasing `loss scale` by `scaling_factor`.
        clear_buffer (:obj:`bool`): Clears the no longer referenced variables during backpropagation to save memory.
        accum_grad (:obj:`int`): Number of accumulation of gradients. Update method of the `solver` is called after the `accum_grad` number of the forward and backward is called.
        weight_decay (:obj:`float`): Decay constant. Default is `None`, not applying the weight decay.
        comm (:obj:`nnabla.communicators.Communicator`): Communicator used for distributed training. Default is :obj:`None`.
        grads (:obj:`list` of :obj:`nnabla._nd_array.NdArray`): The list of gradients to be exchanged in distributed training. Default is the empty :obj:`list`.

    Attributes:
        solver (:obj:`nnabla.solvers.Solver`): Solver object. E.g., Momentum or Adam.
        loss (:obj:`nnabla.Variable`): Loss variable from which the forward and the backward is called.
        data_feeder (callable :obj:`object`, function, lambda): Data feeder
        scale (:obj:`float`): Loss scale constant. This is dynamically changing during training.
        scaling_factor (:obj:`float`): Scaling factor for the dynamic loss scaling.
        N (:obj:`int`): Interval, the number of iterations in training for increasing `loss scale` by `scaling_factor`.
        clear_buffer (:obj:`bool`): Clears the no longer referenced variables during backpropagation to save memory.
        accum_grad (:obj:`int`): Number of accumulation of gradients. Update method of the `solver` is called after the `accum_grad` number of the forward and backward is called.
        weight_decay (:obj:`float`): Decay constant. Default is `None`, not applying the weight decay.
        comm (:obj:`nnabla.communicators.Communicator`): Communicator used for distributed training.
        grads (:obj:`list` of :obj:`nnabla._nd_array.NdArray`): The list of gradients to be exchanged in distributed training.

    Example:

        .. code-block:: python

            solver = <Solver>
            loss = <Loss Variable of Network>
            data_feeder = <DataFeeder>
            updater = DynamicLossScalingUpdater(solver, loss, data_feeder)
            # Training iteration
            for itr in range(max_iter):
                # Call solver.zero_grad, data_feeder, loss.forward, loss.backward
                # and solver.update with the dynamic loss scaling.
                updater.update()

    Reference:
        https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#scalefactor
    '''

    def __init__(self, solver, loss, data_feeder=lambda x: x,
                 scale=8.0, scaling_factor=2.0, N=2000, clear_buffer=True,
                 accum_grad=1, weight_decay=None,
                 comm=None,
                 grads=[]):
        self.solver = solver
        self.loss = loss
        self.data_feeder = data_feeder
        self.scale = scale
        self.scaling_factor = scaling_factor
        self.N = N
        self.clear_buffer = clear_buffer
        self.accum_grad = accum_grad
        self.weight_decay = weight_decay
        self.comm = comm
        self.grads = grads
        self._counter = 0
        self._recursive_count = 0
        self._max_recursive_count = 100

    def update(self):
        """Monolithic update method.

        This method calls the following methods with the dynamic loss scaling.

        1. solver.zero_grad
        2. feed data
        3. loss.forward
        4. loss.backward
        5. comm.all_reduce (if it is specified)
        6. solver.update
        """
        # Initialize gradients.
        self.solver.zero_grad()
        # Forward and backward
        for _ in range(self.accum_grad):
            # feed data
            self.data_feeder()
            # forward
            self.loss.forward(clear_no_need_grad=self.clear_buffer)
            # backward with scale
            self.loss.backward(self.scale, clear_buffer=self.clear_buffer)
        # AllReduce
        if self.comm and len(self.grads) != 0:
            self.comm.all_reduce(self.grads, division=False, inplace=False)
        # Check Inf/NaN in grads
        if self.solver.check_inf_or_nan_grad():
            self.scale /= self.scaling_factor
            self._counter = 0
            # Recursively call update function until no inf nor nan.
            self._recursive_count += 1
            if self._recursive_count > self._max_recursive_count:
                self._recursive_count = 0
                return  # skip
            return self.update()
        self._recursive_count = 0
        # Rescale grads
        self.solver.scale_grad(1. / self.scale)
        # Do some gradient clipping, etc.
        if self.weight_decay is not None:
            self.solver.weight_decay(self.weight_decay)
        # Update
        self.solver.update()
        if self._counter > self.N:
            self.scale *= self.scaling_factor
            self._counter = 0
        self._counter += 1
Then, call the update method in a training loop:
from nnabla.experimental.mixed_precision_training import DynamicLossScalingUpdater
solver = <Solver>
loss = <Loss Variable of Network>
data_feeder = <DataFeeder>
updater = DynamicLossScalingUpdater(solver, loss, data_feeder)
# Training iteration
for itr in range(max_iter):
    # Call solver.zero_grad, data_feeder, loss.forward, loss.backward
    # and solver.update with the dynamic loss scaling.
    updater.update()
Notice¶
In mixed precision training, the following are premises:
 The solver contains FP16 weights and the FP32 copy of the weights. Solvers in NNabla hold FP32 weights and weight gradients, and cast them to FP16 weights in the forward pass and to FP16 weight gradients in the backward pass if one sets type_config="half".
 Reductions should be left in FP32, for example, the statistics (mean and variance) computed by batch normalization, Mean, Sum, SoftMax, SoftMaxCrossEntropy, etc. (see The Training with Mixed Precision User Guide). In NNabla, these functions automatically fall back to FP32.
Data Parallel Distributed Training¶
DataParallelCommunicator enables you to train your neural network using multiple devices. It is normally used for exchanging gradients in data parallel distributed training. Basically, there are two types of distributed training in the neural network literature: Data Parallel and Model Parallel. Here we only focus on the former, Data Parallel Training. Data Parallel Distributed Training is based on the very simple equation used for the optimization of a neural network called (Mini-Batch) Stochastic Gradient Descent.
In the optimization process, the objective one tries to minimize is
\[f(\mathbf{w}; X) = \frac{1}{B \times N} \sum_{i=1}^{B \times N} \ell(\mathbf{w}, \mathbf{x}_i),\]
where \(f\) is a neural network, \(B \times N\) is the batch size, \(\ell\) is a loss function for each data point \(\mathbf{x} \in X\), and \(\mathbf{w}\) is the trainable parameter of the neural network.
When taking the derivative of this objective, one gets
\[\nabla_{\mathbf{w}} f(\mathbf{w}; X) = \frac{1}{B \times N} \sum_{i=1}^{B \times N} \nabla_{\mathbf{w}} \ell(\mathbf{w}, \mathbf{x}_i).\]
Since the derivative is linear, one can rewrite it as a sum of \(N\) terms, each of which is the sum of derivatives over \(B\) data points:
\[\nabla_{\mathbf{w}} f(\mathbf{w}; X) = \frac{1}{N} \left( \frac{1}{B} \sum_{i=1}^{B} \nabla_{\mathbf{w}} \ell(\mathbf{w}, \mathbf{x}_i) + \frac{1}{B} \sum_{i=B+1}^{2B} \nabla_{\mathbf{w}} \ell(\mathbf{w}, \mathbf{x}_i) + \ldots + \frac{1}{B} \sum_{i=B(N-1)+1}^{B \times N} \nabla_{\mathbf{w}} \ell(\mathbf{w}, \mathbf{x}_i) \right).\]
In data parallel distributed training, the following steps are performed according to the above equation:
 each term, the summation of derivatives (gradients) divided by the batch size \(B\), is computed on a separate device (typically a GPU),
 take the sum over devices,
 divide the result by the number of devices, \(N\).
That is the underlying foundation of Data Parallel Distributed Training.
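The three steps above can be checked numerically with a toy loss. The sketch below makes its own assumptions: the loss \(\ell(w, x) = \frac{1}{2}(w - x)^2\), the helper `grad_loss`, and the sample data are illustrative choices and not part of NNabla.

```python
import math


def grad_loss(w, x):
    """Gradient of the toy loss 0.5 * (w - x)**2 with respect to w."""
    return w - x


N, B = 2, 4  # number of devices, per-device batch size
w = 0.5
data = [float(i) for i in range(N * B)]

# Gradient over the full batch of B * N points.
full_grad = sum(grad_loss(w, x) for x in data) / (B * N)

# Step 1: each device averages the gradients of its own B points.
per_device = [
    sum(grad_loss(w, x) for x in data[d * B:(d + 1) * B]) / B
    for d in range(N)
]

# Steps 2 and 3: sum over devices, then divide by the number of devices.
parallel_grad = sum(per_device) / N

print(full_grad, parallel_grad)
```

Both values agree, which is why exchanging only the per-device gradients is enough to reproduce the full-batch update.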
This tutorial shows the usage of Multi Process Data Parallel Communicator for data parallel distributed training with a very simple example.
NOTE¶
This tutorial depends on IPython Cluster; thus, when you want to run the following excerpts of the scripts on Jupyter Notebook, follow this to enable mpiexec/mpirun mode, then launch a corresponding IPython Cluster on the IPython Clusters tab.
Launch client¶
This code is only needed for this tutorial via Jupyter Notebook.
import ipyparallel as ipp
rc = ipp.Client(profile='mpi')
Prepare the dependencies¶
%%px
import os
import time
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context
import nnabla.functions as F
from nnabla.initializer import (
    calc_uniform_lim_glorot,
    UniformInitializer)
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np
Define the communicator for gradients exchange.¶
%%px
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessDataParalellCommunicator(ctx)
comm.init()
n_devices = comm.size
mpi_rank = comm.rank
device_id = mpi_rank
ctx = get_extension_context(extension_module, device_id=device_id)
Check that different ranks are assigned to different devices:
%%px
print("n_devices={}".format(n_devices))
print("mpi_rank={}".format(mpi_rank))
[stdout:0]
n_devices=2
mpi_rank=1
[stdout:1]
n_devices=2
mpi_rank=0
Create data points and a very simple neural network¶
%%px
# Data points setting
n_class = 2
b, c, h, w = 4, 1, 32, 32
# Data points
x_data = np.random.rand(b, c, h, w)
y_data = np.random.choice(n_class, b).reshape((b, 1))
x = nn.Variable(x_data.shape)
y = nn.Variable(y_data.shape)
x.d = x_data
y.d = y_data
# Network setting
C = 1
kernel = (3, 3)
pad = (1, 1)
stride = (1, 1)
%%px
rng = np.random.RandomState(0)
w_init = UniformInitializer(
    calc_uniform_lim_glorot(C, C/2, kernel=(1, 1)),
    rng=rng)
%%px
# Network
with nn.context_scope(ctx):
    h = PF.convolution(x, C, kernel, pad, stride, w_init=w_init)
    pred = PF.affine(h, n_class, w_init=w_init)
    loss = F.mean(F.softmax_cross_entropy(pred, y))
An important point here is that w_init is passed to the parametric
functions so that the network on each GPU starts from the same values of
trainable parameters in the optimization process.
Create a solver.¶
%%px
# Solver and add parameters
solver = S.Adam()
solver.set_parameters(nn.get_parameters())
Training¶
Recall the basic usage of the nnabla API for training a neural network:
 loss.forward()
 solver.zero_grad()
 loss.backward()
 solver.update()
When using C.MultiProcessDataParalellCommunicator, these steps are
performed on different GPUs, and the only difference from these
steps is comm.all_reduce(). Thus, with
C.MultiProcessDataParalellCommunicator, the training steps are as
follows:
 loss.forward()
 solver.zero_grad()
 loss.backward()
 comm.all_reduce([x.grad for x in nn.get_parameters().values()])
 solver.update()
First, forward, zero_grad, and backward,
%%px
# Training steps
loss.forward()
solver.zero_grad()
loss.backward()
Check gradients of weights once,
%%px
for n, v in nn.get_parameters().items():
    print(n, v.g)
[stdout:0]
('conv/W', array([[[[ 5.0180483, 0.457942 , 2.8701296],
[ 2.0715926, 3.0698593, 1.6650047],
[2.5591214, 6.4248834, 9.881935 ]]]], dtype=float32))
('conv/b', array([8.658947], dtype=float32))
('affine/W', array([[0.93160367, 0.9316036 ],
[1.376812 , 1.376812 ],
[1.8957546 , 1.8957543 ],
...,
[0.33000934, 0.33000934],
[0.7211893 , 0.72118926],
[0.25237036, 0.25237036]], dtype=float32))
('affine/b', array([0.48865744, 0.48865741], dtype=float32))
[stdout:1]
('conv/W', array([[[[ 1.2505884 , 0.87151337, 8.685524 ],
[ 10.738419 , 14.676786 , 7.483423 ],
[ 5.612471 , 12.880402 , 19.141157 ]]]], dtype=float32))
('conv/b', array([13.196114], dtype=float32))
('affine/W', array([[1.6865108 , 1.6865108 ],
[0.938529 , 0.938529 ],
[1.028422 , 1.028422 ],
...,
[0.98217344, 0.98217344],
[0.97528917, 0.97528917],
[0.413546 , 0.413546 ]], dtype=float32))
('affine/b', array([0.7447065, 0.7447065], dtype=float32))
You can see different values on each device. Then call all_reduce,
%%px
comm.all_reduce([x.grad for x in nn.get_parameters().values()], division=True)
Commonly, all_reduce only means the sum; however, comm.all_reduce
covers both cases: summation, and summation followed by division.
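The difference between the two modes can be illustrated with a pure-Python simulation. This is a sketch only: `simulated_all_reduce` is a hypothetical function that mimics what each device would hold after the exchange, and it does not call or reproduce the implementation of comm.all_reduce.

```python
def simulated_all_reduce(grads_per_device, division=False):
    """Return the gradient list that every device would hold afterwards."""
    n_devices = len(grads_per_device)
    # Element-wise sum across devices.
    reduced = [sum(values) for values in zip(*grads_per_device)]
    if division:
        # Optionally divide the sum by the number of devices.
        reduced = [value / n_devices for value in reduced]
    # Every device receives an identical copy of the result.
    return [list(reduced) for _ in range(n_devices)]


grads = [[1.0, 2.0], [3.0, 6.0]]  # two devices, two gradient entries each
print(simulated_all_reduce(grads))                 # summation
print(simulated_all_reduce(grads, division=True))  # summation, then division
```

With division enabled, each device ends up with the average gradient, which corresponds to the last step of the data parallel procedure.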
Again, check gradients of weights,
%%px
for n, v in nn.get_parameters().items():
    print(n, v.g)
[stdout:0]
('conv/W', array([[[[ 1.8837299 , 0.20678568, 5.777827 ],
[ 6.4050055 , 8.8733225 , 2.9092093 ],
[ 1.5266749 , 3.2277591 , 14.511546 ]]]], dtype=float32))
('conv/b', array([21.85506], dtype=float32))
('affine/W', array([[2.6181145, 2.6181145],
[2.315341 , 2.315341 ],
[2.9241767, 2.9241762],
...,
[1.3121828, 1.3121828],
[1.6964785, 1.6964784],
[0.6659163, 0.6659163]], dtype=float32))
('affine/b', array([1.233364 , 1.2333639], dtype=float32))
[stdout:1]
('conv/W', array([[[[ 1.8837299 , 0.20678568, 5.777827 ],
[ 6.4050055 , 8.8733225 , 2.9092093 ],
[ 1.5266749 , 3.2277591 , 14.511546 ]]]], dtype=float32))
('conv/b', array([21.85506], dtype=float32))
('affine/W', array([[2.6181145, 2.6181145],
[2.315341 , 2.315341 ],
[2.9241767, 2.9241762],
...,
[1.3121828, 1.3121828],
[1.6964785, 1.6964784],
[0.6659163, 0.6659163]], dtype=float32))
('affine/b', array([1.233364 , 1.2333639], dtype=float32))
You can see the same values on all devices because of all_reduce.
Update weights,
%%px
solver.update()
This concludes the usage of C.MultiProcessDataParalellCommunicator
for Data Parallel Distributed Training.
Now that you have an understanding of how to use
C.MultiProcessDataParalellCommunicator, go to the cifar10 example,
 multi_device_multi_process_classification.sh
 multi_device_multi_process_classification.py
for more details.
Python Command Line Interface¶
NNabla has a command-line interface utility which can train, run forward (inference), convert parameters and datasets, measure performance, convert file formats, and so on.
usage: nnabla_cli [-h]
                  {train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,upload,create_tar,function_info,dump,nnb_template,convert,version}
                  ...
Command line interface for NNabla(Version 1.0.0rc2, Build 180626044347)
positional arguments:
  {train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,upload,create_tar,function_info,dump,nnb_template,convert,version}
    train               Training with NNP.
    infer               Do inference with NNP and binary data file input.
    forward             Do evaluation with NNP and test dataset.
    encode_param        Encode plain text to parameter format.
    decode_param        Decode parameter to plain text.
    profile             Profiling performance with NNP.
    conv_dataset        Convert CSV dataset to cache.
    compare_with_cpu    Compare performance between two nntxt.
    create_image_classification_dataset
                        Create dataset from image files.
    upload              Upload dataset to Neural Network Console.
    create_tar          Create tar file for Neural Network Console.
    function_info       Output function info.
    dump                Dump network with supported format.
    nnb_template        Generate NNB config file template.
    convert             File format converter.
    version             Print version and build number.
optional arguments:
  -h, --help            show this help message and exit
Work with NNP¶
Training¶
usage: nnabla_cli train [-h] -c CONFIG [-p PARAM] -o OUTDIR
optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -p PARAM, --param PARAM
                        path to parameter file
  -o OUTDIR, --outdir OUTDIR
                        output directory
Profile¶
usage: nnabla_cli profile [-h] -c CONFIG -o OUTDIR
optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -o OUTDIR, --outdir OUTDIR
                        output directory
Forward¶
usage: nnabla_cli forward [-h] -c CONFIG [-p PARAM] [-d DATASET] -o OUTDIR [-b BATCH_SIZE]
optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -p PARAM, --param PARAM
                        path to parameter file
  -d DATASET, --dataset DATASET
                        path to CSV dataset
  -o OUTDIR, --outdir OUTDIR
                        output directory
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Batch size to use batch size in nnp file set -1.
Inference¶
usage: nnabla_cli infer [-h] -c CONFIG [-o OUTPUT] [-p PARAM] [-b BATCH_SIZE] inputs [inputs ...]
positional arguments:
  inputs
optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -o OUTPUT, --output OUTPUT
                        output file prefix
  -p PARAM, --param PARAM
                        path to parameter file
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Batch size to use batch size in nnp file set -1.
Compare with CPU¶
usage: nnabla_cli compare_with_cpu [-h] -c CONFIG -c2 CONFIG2 -o OUTDIR
optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -c2 CONFIG2, --config2 CONFIG2
                        path to cpu nntxt
  -o OUTDIR, --outdir OUTDIR
                        output directory
Dataset manipulation¶
Encode parameter¶
usage: nnabla_cli encode_param [-h] -i INDIR [-p PARAM]
optional arguments:
  -h, --help            show this help message and exit
  -i INDIR, --indir INDIR
                        input directory
  -p PARAM, --param PARAM
                        path to parameter file
Decode parameter¶
usage: nnabla_cli decode_param [-h] [-p PARAM] -o OUTDIR
optional arguments:
  -h, --help            show this help message and exit
  -p PARAM, --param PARAM
                        path to parameter file
  -o OUTDIR, --outdir OUTDIR
                        output directory
Convert dataset¶
usage: nnabla_cli conv_dataset [-h] [-F] [-S] [-N] source destination
positional arguments:
  source
  destination
optional arguments:
  -h, --help            show this help message and exit
  -F, --force           force overwrite destination
  -S, --shuffle         shuffle data
  -N, --normalize       normalize data range
Create image classification dataset¶
usage: nnabla_cli create_image_classification_dataset [-h] -i SOURCEDIR -o OUTDIR -c CHANNEL -w WIDTH -g HEIGHT -m MODE -s SHUFFLE -f1 FILE1 [-r1 RATIO1] [-f2 FILE2]
                                                      [-r2 RATIO2]
optional arguments:
  -h, --help            show this help message and exit
  -i SOURCEDIR, --sourcedir SOURCEDIR
                        source directory with directories for each class
  -o OUTDIR, --outdir OUTDIR
                        output directory
  -c CHANNEL, --channel CHANNEL
                        number of output color channels
  -w WIDTH, --width WIDTH
                        width of output image
  -g HEIGHT, --height HEIGHT
                        height of output image
  -m MODE, --mode MODE  shaping mode (trimming or padding)
  -s SHUFFLE, --shuffle SHUFFLE
                        shuffle mode (true or false)
  -f1 FILE1, --file1 FILE1
                        output file name 1
  -r1 RATIO1, --ratio1 RATIO1
                        output file ratio(%) 1
  -f2 FILE2, --file2 FILE2
                        output file name 2
  -r2 RATIO2, --ratio2 RATIO2
                        output file ratio(%) 2
Upload dataset to Neural Network Console¶
usage: nnabla_cli upload [-h] [-e ENDPOINT] token filename
positional arguments:
  token                 token for upload
  filename              filename to upload
optional arguments:
  -h, --help            show this help message and exit
  -e ENDPOINT, --endpoint ENDPOINT
                        set endpoint uri
Create dataset archive for Neural Network Console¶
usage: nnabla_cli create_tar [-h] source destination
positional arguments:
  source                CSV dataset
  destination           TAR filename
optional arguments:
  -h, --help            show this help message and exit
File format converter¶
For detailed information, please see File format converter.
Dump content of supported format¶
usage: nnabla_cli dump [-h] [-I IMPORT_FORMAT] [--nnp-no-expand-network]
                       FILE [FILE ...]
positional arguments:
  FILE                  File or directory name(s) to convert.
optional arguments:
  -h, --help            show this help message and exit
  -I IMPORT_FORMAT, --import-format IMPORT_FORMAT
                        [import] import format. (one of [NNP,ONNX])
  --nnp-no-expand-network
                        [import][NNP] expand network with repeat or recurrent.
Generate NNB config file template¶
usage: nnabla_cli nnb_template [-h] [-I IMPORT_FORMAT]
                               [--nnp-no-expand-network] [-b BATCH_SIZE]
                               [-T DEFAULT_VARIABLE_TYPE]
                               FILE [FILE ...]
positional arguments:
  FILE                  File or directory name(s) to convert.
optional arguments:
  -h, --help            show this help message and exit
  -I IMPORT_FORMAT, --import-format IMPORT_FORMAT
                        [import] import format. (one of [NNP,ONNX])
  --nnp-no-expand-network
                        [import][NNP] expand network with repeat or recurrent.
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        [export] overwrite batch size.
  -T DEFAULT_VARIABLE_TYPE, --default-variable-type DEFAULT_VARIABLE_TYPE
                        Default type of variable
File format converter¶
usage: nnabla_cli convert [-h] [-I IMPORT_FORMAT] [--nnp-no-expand-network]
                          [-O EXPORT_FORMAT] [-f] [-b BATCH_SIZE]
                          [--nnp-parameter-h5] [--nnp-parameter-nntxt]
                          [--nnp-exclude-parameter] [-T DEFAULT_VARIABLE_TYPE]
                          [-s SETTINGS]
                          FILE [FILE ...]
positional arguments:
  FILE                  File or directory name(s) to convert.
optional arguments:
  -h, --help            show this help message and exit
  -I IMPORT_FORMAT, --import-format IMPORT_FORMAT
                        [import] import format. (one of [NNP,ONNX])
  --nnp-no-expand-network
                        [import][NNP] expand network with repeat or recurrent.
  -O EXPORT_FORMAT, --export-format EXPORT_FORMAT
                        [export] export format. (one of [NNP,NNB,CSRC,ONNX])
  -f, --force           [export] overwrite output file.
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        [export] overwrite batch size.
  --nnp-parameter-h5    [export][NNP] store parameter with h5 format
  --nnp-parameter-nntxt
                        [export][NNP] store parameter into nntxt
  --nnp-exclude-parameter
                        [export][NNP] output without parameter
  -T DEFAULT_VARIABLE_TYPE, --default-variable-type DEFAULT_VARIABLE_TYPE
                        Default type of variable
  -s SETTINGS, --settings SETTINGS
                        Settings in YAML format file.
Development¶
Generate function information¶
usage: nnabla_cli function_info [-h] [dest]
positional arguments:
  dest                  destination filename
optional arguments:
  -h, --help            show this help message and exit
Display version¶
usage: nnabla_cli version [-h]
optional arguments:
  -h, --help            show this help message and exit
Python API Examples¶
There are a bunch of examples provided in NNabla repository. Please follow [this link](https://github.com/sony/nnabla/tree/master/examples) to see examples.
Python API Reference¶
Common¶
Logger¶
Wrapper module for logging.
You can use the logger as follows:
from nnabla.logger import logger
logger.debug('Log message(DEBUG)')
logger.info('Log message(INFO)')
logger.error('Log message(ERROR)')
logger.critical('Log message(CRITICAL)')
With the default settings, it should yield the following output:
$ python scripts/logger_test.py
[nnabla][ERROR]: logger_test.py : <module> : 5 : Log message(ERROR)
[nnabla][CRITICAL]: logger_test.py : <module> : 6 : Log message(CRITICAL)
$ cat /tmp/nbla.log
2017-01-19 14:41:35,132 [nnabla][DEBUG]: scripts/logger_test.py : <module> : 3 : Log message(DEBUG)
2017-01-19 14:41:35,132 [nnabla][INFO]: scripts/logger_test.py : <module> : 4 : Log message(INFO)
2017-01-19 14:41:35,132 [nnabla][ERROR]: scripts/logger_test.py : <module> : 5 : Log message(ERROR)
2017-01-19 14:41:35,132 [nnabla][CRITICAL]: scripts/logger_test.py : <module> : 6 : Log message(CRITICAL)
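The split between a terse console and a verbose log file can be reproduced with the standard `logging` module. The sketch below is not NNabla's implementation; the logger name `logger_sketch`, the `[sketch]` prefix, and the use of in-memory streams in place of the terminal and /tmp/nbla.log are assumptions made to keep the example self-contained.

```python
import io
import logging


def make_logger(console_stream, file_stream):
    """Build a logger with a quiet console handler and a verbose file handler."""
    logger = logging.getLogger("logger_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers = []

    console = logging.StreamHandler(console_stream)
    console.setLevel(logging.ERROR)  # console: ERROR and above only
    console.setFormatter(
        logging.Formatter("[sketch][%(levelname)s]: %(message)s"))

    log_file = logging.StreamHandler(file_stream)
    log_file.setLevel(logging.DEBUG)  # "file": every level
    log_file.setFormatter(
        logging.Formatter("%(asctime)s [sketch][%(levelname)s]: %(message)s"))

    logger.addHandler(console)
    logger.addHandler(log_file)
    return logger


console_out, file_out = io.StringIO(), io.StringIO()
log = make_logger(console_out, file_out)
log.debug("Log message(DEBUG)")
log.info("Log message(INFO)")
log.error("Log message(ERROR)")
log.critical("Log message(CRITICAL)")

print(console_out.getvalue())  # only the ERROR and CRITICAL lines
print(file_out.getvalue())     # all four lines, with timestamps
```

Giving each handler its own level is what lets the same call produce two lines on the console and four in the log.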

nnabla.logger.logger¶
Auto-forward mode¶
NNabla provides the dynamic computation graph feature, which enables automatic forward propagation during graph construction. This can be enabled using the set_auto_forward() function. Backpropagation shall be manually executed on the dynamically constructed graph.

nnabla.auto_forward(*args, **kwds)¶
Context for dynamic graph execution mode.
Parameters: auto (bool) – Whether forward computation is executed during a computation graph construction.
Returns: bool

nnabla.set_auto_forward(auto)¶
Set the default mode for automatic forward propagation.
When it is set to True, forward propagation is invoked immediately when the computation graph is updated.
Parameters: auto (bool) – Whether forward computation is executed when the computation graph is updated.
Returns: bool
Context¶

class nnabla.Context(backend=None, array_class='', device_id='0')¶
Context is used to specify the computation engine (cpu, cuda, cudnn etc.) which the function operator modules and optimizer modules shall be run on. The context can be set for each function, as well as set globally with the functions listed in the Context Specifier API.
Parameters:
Context Specifier API¶

nnabla.context_scope(*args, **kwds)¶
Context as Python context.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
from nnabla.ext_utils import get_extension_context
x = nn.Variable([2, 3, 4])
ctx = get_extension_context('cuda', device_id='0')
with nn.context_scope(ctx):
    # Inside with scope, the specified context is used.
    with nn.parameter_scope('w1'):
        l1 = F.relu(PF.affine(x, 64))
    with nn.parameter_scope('w2'):
        l2 = F.relu(PF.affine(x, 64))

nnabla.set_default_context(ctx)¶
Set the default context.
Note
It cannot be called inside any context_scope.
Parameters: ctx (Context) – A Context.

nnabla.get_current_context()¶
Get the current context.
It can be set using nnabla.context_scope() or nnabla.set_default_context().
Returns: a current context.
Return type: Context
NdArray¶

class nnabla._nd_array.NdArray(shape=<???>)¶
nnabla._nd_array.NdArray is a device-agnostic data container for multi-dimensional arrays (tensors). nnabla._nd_array.NdArray can also implicitly handle data transfers across different devices (e.g. CPU to CUDA GPU, CUDA GPU to CPU). See Python API Tutorial for more details.
NdArray overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, NdArray or Variable. An arithmetic operation containing NdArray returns an NdArray which stores the output of the computation immediately invoked. Also, in-place arithmetic operations (+=, -=, *=, /=, **=) are implemented. Note that = doesn’t perform in-place substitution but just replaces the object reference. Instead, you can use copy_from() for in-place substitution.
Parameters: shape (tuple or int) – Shape of tuple.
cast(self, dtype, ctx=None)¶
In-place cast of data type of the NdArray. It returns the reference values as a numpy.ndarray only if optional parameter ctx is not given, None otherwise.
Parameters:
 dtype (numpy.dtype) – Numpy Data type.
 ctx (nnabla.Context, optional) – Context descriptor.
Returns: numpy.array if ctx is None, otherwise nothing.

copy_from(self, NdArray arr)¶
Copy values from another NdArray object.
It returns the caller object itself. nnabla.functions.identity() is called internally to copy values.
Parameters: arr (NdArray) – Values will be copied to the caller object. The shape of arr must be the same as the caller object.
Returns: nnabla.NdArray

data¶
Returns the values held by this array as a numpy.ndarray. Note that only the references are returned, and the values are not copied. Therefore, modifying the returned array will affect the data contained inside the NNabla array. This method can also be called as a setter. Note that this may implicitly invoke a data transfer from device arrays to the CPU.
Parameters: value (numpy.ndarray) –
Returns: numpy.ndarray

dtype¶
Get dtype.
Returns: numpy.dtype

fill(self, value)¶
Fill all of the elements with the provided scalar value.
Note: This method is lazily evaluated. It is evaluated during the forward or backward propagation.
Parameters: value (int, float) – The value filled with.

static from_numpy_array(nparr)¶
Create a NdArray object from Numpy array data.
The data is initialized with the given Numpy array.
Parameters: nparr (ndarray) – Numpy multi-dimensional array.
Returns: ~nnabla._nd_array.NdArray

ndim¶
Number of dimensions.
Returns: int

shape¶
Shape of the Nd array.
Returns: tuple of int

size¶
Total size of the Nd array.
Returns: int

size_from_axis(self, axis=-1)¶
Gets the size followed by the provided axis.
Example
a = nnabla.NdArray([10,9])
a.size_from_axis() # ==> 90
a.size_from_axis(0) # ==> 90
a.size_from_axis(1) # ==> 9
a.size_from_axis(2) # ==> 1
Parameters: axis (int, optional) – -1 as default
Returns: int
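The values in the example above are consistent with taking the product of the dimensions from the given axis onward. The following pure-Python sketch states that reading explicitly; it is an assumption inferred from the example (in particular, that a non-positive axis means the whole array), and the free function `size_from_axis` operating on a shape tuple is a hypothetical stand-in for the method.

```python
def size_from_axis(shape, axis=-1):
    """Product of the dimensions from `axis` onward (whole array if axis <= 0)."""
    start = max(axis, 0)
    size = 1
    for dim in shape[start:]:
        size *= dim
    return size


shape = (10, 9)
print(size_from_axis(shape))     # 90
print(size_from_axis(shape, 0))  # 90
print(size_from_axis(shape, 1))  # 9
print(size_from_axis(shape, 2))  # 1
```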

strides¶
Strides.
Returns: tuple of int

zero(self)¶
Fill all of the elements with 0.
Note: This method is lazily evaluated. It is evaluated during the forward or backward propagation.

Variable¶

class nnabla.Variable¶
Bases: object
nnabla.Variable is used to construct computation graphs (neural networks) together with functions in List of Functions and List of Parametric Functions. It also provides a method to execute forward and backward propagation of the network. The nnabla.Variable class holds:
 Reference to the parent function in a computation graph. This provides traceability of all connections in the computation graph.
 Both data and error signal (gradient) containers as nnabla._nd_array.NdArray s.
 Some additional information of the computation graph.
Variable overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, NdArray or Variable. If NdArray is given as either the left or the right operand, the arithmetic operation returns an NdArray which stores the output of the computation immediately invoked. Otherwise, it returns a Variable that holds the graph connection. The computation is invoked immediately when nnabla.auto_forward or nnabla.set_auto_forward(True) is used.
Parameters:
 shape (Iterable of int) – Shape of variable.
 need_grad (bool) – Flag for backprop or not.

backward
(self, grad=1, bool clear_buffer=False, communicator_callbacks=None)¶ Performs a backward propagation starting from this variable until the root variable(s) is/are reached in the function graph. The propagation will stop at a variable with need_grad=False.
Parameters:  grad (scalar, numpy.ndarray, or nnabla._nd_array.NdArray) – The gradient signal value(s) of this variable. The default value 1 is used in usual neural network training. This option is useful if you have a gradient computation module outside NNabla, and want to use it as a gradient signal of the neural network built in NNabla. Note that this does not modify the grad values of this variable.
 clear_buffer (bool) – Clears the no longer referenced variables during backpropagation to save memory.
 communicator_callbacks (nnabla.CommunicatorBackwardCallback or list of nnabla.CommunicatorBackwardCallback) – The callback functions invoked when 1) backward computation of each function is finished and 2) all backward computation is finished.

d
¶ Returns the values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the data of the NNabla array. This method can be called as a setter to set the value held by this variable.
Parameters: value (numpy.ndarray) (optional) –
Returns: numpy.ndarray

data
¶ Returns the data held by this variable, as a NdArray. This can also be used as a setter.
Parameters: ndarray (NdArray) – NdArray object. Size must be the same as this Variable.
Returns: NdArray

forward
(self, bool clear_buffer=False, bool clear_no_need_grad=False)¶ Performs a forward propagation from the root node to this variable. The forward propagation is performed on a subset of variables determined by the dependency of this variable. The subset is recursively constructed by tracking variables that the variables in the subset depend on, starting from this variable, until it reaches the root variable(s) in the function graph.
Parameters:  clear_buffer (bool) – Clear the no longer referenced variables during forward propagation to save memory. This is usually set as True in an inference or a validation phase. Default is False.
 clear_no_need_grad (bool) – Clear the unreferenced variables with need_grad=False during forward propagation. True is usually used when calling this during training. This is ignored when clear_buffer=True.

static
from_numpy_array
(data, grad=None, need_grad=None)¶ Create a Variable object from NumPy array(s).
The data is initialized with the given NumPy array, as well as grad if given.
The shape is also determined by the given array.
Parameters:
Returns: ~nnabla.Variable

g
¶ Returns the gradient values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the data of the NNabla array. This method can be called as a setter to set the gradient held by this variable.
Parameters: value (numpy.ndarray) –
Returns: numpy.ndarray

grad
¶ Returns the gradient held by this variable, as a NdArray. This can also be used as a setter.
Parameters: ndarray (NdArray) – NdArray object. Size must be the same as this Variable.
Returns: NdArray

info
¶ info – object
Information of the variable.

ndim
¶ Gets the number of dimensions of this variable.
Returns: int

need_grad
¶ Gets or sets a boolean indicating whether backpropagation is performed at this variable.
Parameters: b (bool) – Whether backpropagation is performed at this variable.
Returns: Whether this variable requires gradient or not.
Return type: bool

parent
¶ Returns the parent function of this variable. This method can also be called as a setter.
Parameters: func (nnabla.function.Function) –
Returns: nnabla.function.Function

persistent
¶ Returns the persistent flag of this variable. If True, the variable is not cleared even if clear options in nnabla._variable.Variable.forward() and nnabla._variable.Variable.backward() are enabled. This is useful when you debug the variable values, or log them. This method can also be called as a setter.
Parameters: b (bool) –
Returns: bool

reset_shape
(self, shape, force=False)¶ Resizes the shape of the variable to a specified shape.
Parameters:  shape (Iterable of int) – Target shape.
 force (bool) – Flag to force reshape.
Note
This method destructively changes the shape of the target variable. For safety, reshape() should be used instead.
Returns: None

reshape
(self, shape, unlink=False)¶ Returns a new variable, where this variable is reshaped to a specified shape.
Parameters:  shape (Iterable of int) – Target shape.
 unlink (bool) – If True, unlink the graph connection. Otherwise, keep the graph connection, i.e. the gradient will be backpropagated to the original variable.
Returns:

rewire_on
(self, var)¶ Rewire a successor graph of this variable on top of var.
Parameters: var (nnabla.Variable) – The array elements and the parent function of var are copied to self as references. Note that the parent function of var is removed.
Example
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

# A. Create a graph A.
xa = nn.Variable((2, 8), need_grad=True)
ya = F.tanh(PF.affine(xa, 10, name='a'))
# B. Create a graph B.
xb = nn.Variable((2, 16), need_grad=True)
yb = F.tanh(PF.affine(
    F.tanh(PF.affine(xb, 8, name='b1')),
    8, name='b2'))
# C. Rewire the graph A on top of B such that
# `xb->B->(yb->)xa->A->ya`. Note `yb` is gone.
xa.rewire_on(yb)
# D. Execute the rewired graph.
xb.d = 1
ya.forward()
ya.backward()

size_from_axis
(self, axis=-1)¶ Gets the total size of the dimensions from the provided axis onward.
Example
a = nnabla.Variable([10,9])
a.size_from_axis() # ==> 90
a.size_from_axis(0) # ==> 90
a.size_from_axis(1) # ==> 9
a.size_from_axis(2) # ==> 1
Parameters: axis (int, optional) – -1 as default
Returns: int

unlinked
(self, need_grad=None)¶ Gets an unlinked (forgetting parent) variable that shares a Variable buffer instance.
Parameters: need_grad (bool, optional) – By default, the unlinked variable will have the same need_grad flag as this variable instance. If a boolean value is specified, it is set as the need_grad flag of the unlinked variable.
Returns: nnabla._variable.Variable
Example
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF
x = nn.Variable.from_numpy_array(np.array([[1, 2], [3, 4]]))
y = PF.affine(x, 4, name="y")
z = y.unlinked()
print(y.parent)
# Affine
print(z.parent) # z is unlinked from the parent x but shares the buffers of y.
# None

visit
(self, f)¶ Visit functions recursively in forward order.
Parameters: f (function) – Function object which takes nnabla._function.Function object as an argument.
Returns: None

visit_check
(self, f)¶ Visit functions recursively in forward order.
Note
If any evaluation of the function object returns True, the visit propagation will stop immediately, and will return True.
Parameters: f (function) – Function object which takes nnabla._function.Function object as an argument.
Returns: bool. Returns True if any of the function object calls returns True.
Functions¶
All NNabla functions are derived from the nnabla.function.Function
class.
Function¶

class
nnabla.function.
Function
¶ Function interface class.
Instances of nnabla.function.Function are not directly created by users. They are indirectly created by the functions available in nnabla.functions. These functions return nnabla.Variable(s) holding the created function instance as the parent property.
backward
(self, inputs, outputs, accum=None)¶

forward
(self, inputs, outputs)¶

grad_depends_output_data
(self, int i, int o)¶

info
¶ info – object

inplace_data
(self, int i)¶

inplace_data_with
(self, int i)¶

inplace_grad
(self, int i)¶

inplace_grad_with
(self, int i)¶

min_outputs
(self)¶

setup
(self, inputs, outputs)¶

List of Functions¶
The nnabla.functions module provides various types of functions listed below.
These functions take input nnabla.Variable(s) as their leading argument(s), followed by options specific to each function.
 Note:
 The functions can also take NdArray(s) as input(s), in which case they return NdArray(s) as output(s) holding output values of the operation. We call this “Imperative Mode” (NdArray + Functions).
Neural Network Layers¶

nnabla.functions.
affine
(x, weight, bias=None, base_axis=1, n_outputs=1, outputs=None)[source]¶ Affine layer, also called the fully connected layer. It calculates:
\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]
where \({\mathbf x}\) is the input and \({\mathbf y}\) is the output.
Parameters:  x (Variable) – Input ND array with shape (\(M_0 \times ... \times M_{B-1} \times D_B \times ... \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 weight (Variable) – Weight matrix with shape (\((D_B \times ... \times D_N) \times L\)) [parameter]
 bias (Variable) – Bias vector (\(L\)) [optional][parameter]
 base_axis (int) – Base axis of Affine operation. Dimensions up to base_axis are treated as sample dimensions. [default=1]
Returns: \((B + 1)\)-D array. (\(M_0 \times ... \times M_{B-1} \times L\))
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
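To make the role of base_axis concrete, here is a minimal pure-Python sketch of the shape bookkeeping and of the matrix product on an already flattened input. It is an illustration only and does not use NNabla; the helper names `affine_output_shape` and `affine_2d` are hypothetical.

```python
from math import prod

def affine_output_shape(x_shape, weight_shape, base_axis=1):
    """Output shape when dimensions from base_axis onward are flattened into features."""
    n_features = prod(x_shape[base_axis:])  # D_B * ... * D_N
    if weight_shape[0] != n_features:
        raise ValueError("weight rows must equal the flattened feature size")
    return tuple(x_shape[:base_axis]) + (weight_shape[1],)

def affine_2d(x, weight, bias=None):
    """Affine map for a (samples x features) input and a (features x L) weight."""
    n_out = len(weight[0])
    out = []
    for row in x:
        y = [sum(row[d] * weight[d][l] for d in range(len(row))) for l in range(n_out)]
        if bias is not None:
            y = [v + b for v, b in zip(y, bias)]
        out.append(y)
    return out

shape = affine_output_shape((8, 3, 4, 4), (48, 10), base_axis=1)
y = affine_2d([[1.0, 2.0]], [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]], bias=[0.5, 0.5, 0.5])
```

With base_axis=1, the input of shape (8, 3, 4, 4) is treated as 8 samples of 48 features each.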

nnabla.functions.
convolution
(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, n_outputs=1, outputs=None)[source]¶ ND Convolution with bias.
See references for dilated convolution (a.k.a. atrous convolution).
References
 Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs.
 Yu et al., Multi-Scale Context Aggregation by Dilated Convolutions.
Parameters:  x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
 weight (Variable) – \((2 + N)\)-D array (\(C' \times C \times K_1 \times ... \times K_N\)). [parameter]
 bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
 base_axis (int) – base axis \(B\). [default=1]
 pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
 stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
 group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
Returns: \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
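The output sizes \(L'_i\) are not spelled out above. The sketch below uses the standard convolution arithmetic for one spatial axis; that NNabla follows exactly this rule is an assumption, and the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(length, kernel, pad=0, stride=1, dilation=1):
    """Standard convolution arithmetic for a single spatial axis (assumed, not taken from NNabla)."""
    effective_kernel = dilation * (kernel - 1) + 1  # extent covered by a dilated kernel
    return (length + 2 * pad - effective_kernel) // stride + 1

same = conv_output_size(32, 3, pad=1)
valid = conv_output_size(32, 3)
strided = conv_output_size(32, 3, pad=1, stride=2)
dilated = conv_output_size(32, 3, dilation=2)
```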

nnabla.functions.
depthwise_convolution
(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, multiplier=1, n_outputs=1, outputs=None)[source]¶ ND Depthwise Convolution with bias.
References
Parameters:  x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
 weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]
 bias (Variable) – Bias vector (\(C\)). [optional][parameter]
 base_axis (int) – base axis \(B\). [default=1]
 pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
 stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
 multiplier (int) – Number of output feature maps per input feature map. [default=1]
Returns: \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L'_1 \times ... \times L'_N\)).
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
deconvolution
(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, n_outputs=1, outputs=None)[source]¶ ND deconvolution, also known as transposed convolution, with bias. It performs the backward operation of convolution (derivative of the output w.r.t. the input) plus a channel-wise learned bias.
The weights are specified in the same manner as convolution(), as if it was an ordinary convolution function. The forward operation of deconvolution() will then be operationally equivalent to the backward pass of convolution(). Therefore, the number of input channels (can be seen as output channels of forward convolution) is specified in the first dimension, and the number of the output channels divided by the number of groups is specified in the second dimension.
Parameters:  x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
 weight (Variable) – \((2 + N)\)-D array (\(C' \times C \times K_1 \times ... \times K_N\)). [parameter]
 bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
 base_axis (int) – base axis \(B\). [default=1]
 pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
 stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
 group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
Returns: \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
depthwise_deconvolution
(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, divisor=1, n_outputs=1, outputs=None)[source]¶ Depthwise deconvolution computes the transposed depthwise convolution with bias for one-dimensional and two-dimensional input data.
Parameters:  x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
 weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]
 bias (Variable) – Bias vector (\(C\)). [optional][parameter]
 base_axis (int) – base axis \(B\). [default=1]
 pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
 stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
 divisor (int) – Number of input feature maps per output feature map. [default=1]
Returns: \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L'_1 \times ... \times L'_N\)).
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
max_pooling
(x, kernel, stride=None, ignore_border=True, pad=None, n_outputs=1, outputs=None)[source]¶ Max pooling. It pools the maximum values inside the scanning kernel:
\[y_{i_1, i_2} = \max_{k_1, k_2 \in K} (x_{i_1 + k_1, i_2 + k_2})\]
where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
Parameters:  x (Variable) – Input variable.
 kernel (tuple of int) – Kernel sizes for each spatial axis.
 stride (tuple of int) – Subsampling factors for each spatial axis. [default=kernel]
 ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default=True]
 pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0,) * len(kernel)]
Returns: Maximum values variable
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
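As an illustration of the formula above, the following pure-Python sketch pools a small 2-D list. It covers only the case without padding where partial windows at the border are skipped, and the helper name `max_pooling_2d` is hypothetical rather than part of NNabla.

```python
def max_pooling_2d(x, kernel, stride=None):
    """Max pooling over a 2-D list; only windows that fit entirely inside x are used."""
    k1, k2 = kernel
    s1, s2 = stride if stride is not None else kernel  # stride defaults to kernel
    rows, cols = len(x), len(x[0])
    out = []
    for i in range(0, rows - k1 + 1, s1):
        out_row = []
        for j in range(0, cols - k2 + 1, s2):
            window = [x[i + a][j + b] for a in range(k1) for b in range(k2)]
            out_row.append(max(window))
        out.append(out_row)
    return out

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pooling_2d(image, kernel=(2, 2))
```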

nnabla.functions.
average_pooling
(x, kernel, stride=None, ignore_border=True, pad=None, including_pad=True, n_outputs=1, outputs=None)[source]¶ Average pooling. It pools the averaged values inside the scanning kernel:
\[y_{i_1, i_2} = \frac{1}{K_1 K_2} \sum_{k_1} \sum_{k_2} x_{i_1 + k_1, i_2 + k_2}\]
where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
Parameters:  x (Variable) – Input variable.
 kernel (tuple of int) – Kernel sizes for each spatial axis.
 stride (tuple of int) – Subsampling factors for each spatial axis. [default=kernel]
 ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default=True]
 pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0,) * len(kernel)]
 including_pad (bool) – If true, border padding values are considered for the output. [default=True]
Returns: Average values variable
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
global_average_pooling
(x, n_outputs=1, outputs=None)[source]¶ Warning
This function has only experimental support, so please do not actively use it.
Global average pooling. It pools an averaged value from the whole image.
Parameters: x (Variable) – Input variable.
Returns: Average values variable
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
sum_pooling
(x, kernel, stride=None, ignore_border=True, pad=None, n_outputs=1, outputs=None)[source]¶ Sum pooling. It pools the summed values inside the scanning kernel:
\[y_{i_1, i_2} = \sum_{k_1} \sum_{k_2} x_{i_1 + k_1, i_2 + k_2}\]
where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
Parameters:  x (Variable) – Input variable.
 kernel (tuple of int) – Kernel sizes for each spatial axis.
 stride (tuple of int) – Subsampling factors for each spatial axis. [default=kernel]
 ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default=True]
 pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0,) * len(kernel)]
Returns: Summed values variable
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
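Sum pooling and average pooling differ only by the factor \(1 / (K_1 K_2)\) when no padding is involved. A minimal pure-Python sketch of that relationship follows; the helper names are hypothetical, and padding, ignore_border=False and including_pad are deliberately left out.

```python
def sum_pooling_2d(x, kernel):
    """Sum pooling over a 2-D list with stride equal to the kernel and no padding."""
    k1, k2 = kernel
    rows, cols = len(x), len(x[0])
    return [
        [sum(x[i + a][j + b] for a in range(k1) for b in range(k2))
         for j in range(0, cols - k2 + 1, k2)]
        for i in range(0, rows - k1 + 1, k1)
    ]

def average_pooling_2d(x, kernel):
    """Average pooling expressed as sum pooling divided by the kernel area."""
    area = kernel[0] * kernel[1]
    return [[v / area for v in row] for row in sum_pooling_2d(x, kernel)]

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
summed = sum_pooling_2d(image, (2, 2))
averaged = average_pooling_2d(image, (2, 2))
```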

nnabla.functions.
unpooling
(x, kernel, n_outputs=1, outputs=None)[source]¶ Inverse operation of pooling. It spreads the input values:
\[y_{k_1 i_1 + j_1, k_2 i_2 + j_2} = x_{i_1, i_2}\]
where \(x_{i_1, i_2}\) is the input and \(y_{k_1 i_1 + j_1, k_2 i_2 + j_2}\) is the output.
Parameters:
Returns: Spread values variable
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
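The index rule above copies each input value into a \(k_1 \times k_2\) block of the output. A pure-Python sketch for the 2-D case, with the hypothetical helper name `unpooling_2d`:

```python
def unpooling_2d(x, kernel):
    """Spread every input value over a k1 x k2 block of the output."""
    k1, k2 = kernel
    rows, cols = len(x), len(x[0])
    y = [[None] * (cols * k2) for _ in range(rows * k1)]
    for i1 in range(rows):
        for i2 in range(cols):
            for j1 in range(k1):
                for j2 in range(k2):
                    y[k1 * i1 + j1][k2 * i2 + j2] = x[i1][i2]
    return y

spread = unpooling_2d([[1, 2], [3, 4]], (2, 2))
```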

nnabla.functions.
embed
(x0, w, n_outputs=1, outputs=None)[source]¶ Embed slices of a matrix/tensor with indexing array/tensor.
Parameters:
Returns: Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
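The output shape above suggests that every integer in the index array is replaced by the corresponding slice of the weight tensor. A pure-Python sketch of that lookup on nested lists follows; the helper name `embed_lookup` and the nested-list representation are assumptions for illustration.

```python
def embed_lookup(indices, w):
    """Replace each integer index with the matching row of w, keeping the index nesting."""
    if isinstance(indices, int):
        return list(w[indices])
    return [embed_lookup(i, w) for i in indices]

w = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]  # 3 embeddings of width 2
out = embed_lookup([[0, 2], [1, 1]], w)   # index array of shape (2, 2)
```

Here the result has shape (2, 2, 2): the index shape followed by the embedding width.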
Neural Network Activation¶

nnabla.functions.
sigmoid
(x, n_outputs=1, outputs=None)[source]¶ Elementwise sigmoid function.
\[f(x) = \frac{1}{1 + \exp(-x)},\]
Parameters: x (Variable) – Input
Returns: Output
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
swish
(x, n_outputs=1, outputs=None)[source]¶ Elementwise swish function, by Ramachandran et al. (2017).
\[y_i = \frac{x_i}{1 + \exp(-x_i)},\]
References
Parameters: x (Variable) – Input
Returns: Output
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
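Swish is the input multiplied by its own sigmoid. A scalar sketch using only the standard library (these are plain helpers for illustration, not the NNabla functions):

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def swish(x):
    """x / (1 + exp(-x)), i.e. x times sigmoid(x)."""
    return x * sigmoid(x)

values = [swish(v) for v in (-1.0, 0.0, 1.0)]
```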

nnabla.functions.
tanh
(x, n_outputs=1, outputs=None)[source]¶ Elementwise hyperbolic tangent (tanh) function.
\[y_i = \tanh (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
relu
(x, inplace=False, n_outputs=1, outputs=None)[source]¶ Elementwise Rectified Linear Unit (ReLU) function.
\[y_i = \max (0, x_i)\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
softmax
(x, axis=None, n_outputs=1, outputs=None)[source]¶ Softmax normalization. Calculates
\[y_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]
along the dimension specified by axis, where \(x_i\) is the input and \(y_i\) is the output.
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
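A pure-Python sketch of the softmax formula along a single axis is shown below. Subtracting the maximum before exponentiating is a common numerical-stability choice added here for illustration; the text above does not state whether NNabla does the same.

```python
import math

def softmax(xs):
    """Softmax of a 1-D list of numbers."""
    m = max(xs)  # shifting by the max leaves the result unchanged and avoids overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
large = softmax([1000.0, 1000.0])
```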

nnabla.functions.
elu
(x, alpha=1.0, n_outputs=1, outputs=None)[source]¶ Elementwise Exponential Linear Unit (ELU) function.
\[\begin{split}y_i= \left\{ \begin{array}{ll} x_i & (x > 0)\\ \alpha (\exp(x_i) - 1) & (x \leq 0) \end{array} \right..\end{split}\]
References
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
selu
(x, scale=1.05070098735548, alpha=1.673263242354377, n_outputs=1, outputs=None)[source]¶ Elementwise Scaled Exponential Linear Unit (SELU) function by Klambauer et al. (2017).
\[\begin{split}y_i= \lambda \left\{ \begin{array}{ll} x_i & (x > 0)\\ \alpha (\exp(x_i) - 1) & (x \leq 0) \end{array} \right..\end{split}\]
The coefficients \(\lambda\) and \(\alpha\) default to the following values \(\lambda_{01}\) and \(\alpha_{01}\), respectively, provided by Klambauer et al. (2017):
\[\begin{split}\begin{array}{lll} \lambda_{01} &=& \left( 1 - \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right) \sqrt{e} \right) \sqrt{2 \pi} \\ && \left( 2 \operatorname{erfc} \left( \sqrt{2} \right) e^2 + \pi \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right)^2 e \right. \\ && \left. - 2(2 + \pi) \operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \sqrt{e} + \pi + 2 \right)^{-1/2} \\ &\approx& 1.0507 \\ \alpha_{01} &=& - \frac {\sqrt {\frac {2}{\pi}}} {\operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \exp \left(\frac {1} {2} \right) - 1} \\ &\approx& 1.67326 \end{array}\end{split}\]
References
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
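The closed-form constants can be checked numerically with the standard library. The sketch below evaluates \(\lambda_{01}\) and \(\alpha_{01}\) from the expressions of Klambauer et al. (2017) and applies them in a scalar SELU; the variable and function names are chosen for this illustration only.

```python
import math

erfc_a = math.erfc(1.0 / math.sqrt(2.0))
sqrt_e = math.sqrt(math.e)

# alpha_01 = -sqrt(2/pi) / (erfc(1/sqrt(2)) * exp(1/2) - 1)
alpha_01 = -math.sqrt(2.0 / math.pi) / (erfc_a * math.exp(0.5) - 1.0)

# lambda_01 = (1 - erfc(1/sqrt(2)) sqrt(e)) sqrt(2 pi) * (...)^(-1/2)
lambda_01 = (
    (1.0 - erfc_a * sqrt_e) * math.sqrt(2.0 * math.pi)
    * (
        2.0 * math.erfc(math.sqrt(2.0)) * math.e ** 2
        + math.pi * erfc_a ** 2 * math.e
        - 2.0 * (2.0 + math.pi) * erfc_a * sqrt_e
        + math.pi + 2.0
    ) ** -0.5
)

def selu(x, scale=lambda_01, alpha=alpha_01):
    """Scalar SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    return scale * (x if x > 0 else alpha * (math.exp(x) - 1.0))
```

The computed values agree with the defaults in the function signature, scale=1.05070098735548 and alpha=1.673263242354377, up to floating-point rounding.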

nnabla.functions.
crelu
(x, axis=1, n_outputs=1, outputs=None)[source]¶ Elementwise Concatenated Rectified Linear Unit (CReLU) function. This function calculates the ReLU of \(x\) and \(-x\), then concatenates the results together at a specified axis, and returns the resulting array.
References
Parameters:
Returns: ND array where axis dimension is doubled by concatenating.
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
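For a 1-D input the concatenation axis is the only axis, so CReLU can be sketched in a few lines of pure Python. The helper names are hypothetical and this is an illustration of the definition, not the library code.

```python
def relu(v):
    """max(0, v) for a scalar."""
    return max(0.0, v)

def crelu_1d(xs):
    """ReLU of x followed by ReLU of -x, concatenated along the single axis."""
    return [relu(x) for x in xs] + [relu(-x) for x in xs]

doubled = crelu_1d([1.0, -2.0])
```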

nnabla.functions.
celu
(x, alpha=1.0, axis=1, n_outputs=1, outputs=None)[source]¶ Elementwise Concatenated Exponential Linear Unit (CELU) function. Concatenates ELU outputs of positive and negative inputs together at specified axis.
Parameters:
Returns: ND array where axis dimension is doubled by concatenating.
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
prelu
(x0, x1, base_axis=1, n_outputs=1, outputs=None)[source]¶ Elementwise Parametrized Rectified Linear Unit function. Calculates:
\[y_i = \max(0, x_i) + w_i \min(0, x_i)\]
where negative slope \(w\) is learned and can vary across channels (an axis specified with base_axis).
Parameters:
Returns: ND array.
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
leaky_relu
(x, alpha=0.1, n_outputs=1, outputs=None)[source]¶ Elementwise Leaky Rectified Linear Unit (ReLU) function.
It is defined as:
\[y_i = \alpha * \min(0, x_i) + \max (0, x_i)\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
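Leaky ReLU and PReLU share the same form and differ only in whether the negative slope is a fixed constant or a learned weight. A scalar pure-Python sketch, in which a single shared slope `w` is used for PReLU for simplicity (an assumption; the text above allows one slope per channel):

```python
def leaky_relu(x, alpha=0.1):
    """alpha * min(0, x) + max(0, x) with a fixed negative slope."""
    return alpha * min(0.0, x) + max(0.0, x)

def prelu(xs, w):
    """max(0, x) + w * min(0, x) with one learned slope shared by all elements."""
    return [max(0.0, x) + w * min(0.0, x) for x in xs]

leaky = [leaky_relu(2.0), leaky_relu(-2.0)]
parametric = prelu([-4.0, 3.0], 0.25)
```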
Normalization¶

nnabla.functions.
batch_normalization
(x, beta, gamma, mean, variance, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, n_outputs=None)[source]¶ Batch normalization.
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2 \\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \\ y_i &=& \hat{x}_i \gamma + \beta. \end{eqnarray}\end{split}\]
At testing time, the mean and variance values used are those that were computed during training by moving average.
References
Parameters:  x (Variable) – ND array of input.
 beta (Variable) – ND array of beta which is learned.
 gamma (Variable) – ND array of gamma which is learned.
 mean (Variable) – ND array of running mean (modified during forward execution).
 variance (Variable) – ND array of running variance (modified during forward execution).
 axes (repeated int64) – Axes along which mean and variance are taken.
 decay_rate (float) – Decay rate of running mean and variance.
 eps (float) – Tiny value to avoid zero division by std.
 batch_stat (bool) – Use minibatch statistics rather than running ones.
 output_stat (bool) – If true, the batch statistics of mean and variance will be returned as Variables. They are also differentiable.
Returns: Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the minibatch.
See also
nnabla.function_bases.batch_normalization.
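The training-time equations can be followed step by step on a single feature. The pure-Python sketch below normalizes one list of values with minibatch statistics only; running mean and variance updates are left out, and the helper name `batch_norm_1d` is hypothetical.

```python
import math

def batch_norm_1d(xs, beta=0.0, gamma=1.0, eps=1e-5):
    """Normalize a minibatch of scalars with its own mean and (biased) variance."""
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]

ys = batch_norm_1d([1.0, 2.0, 3.0, 4.0])
shifted = batch_norm_1d([1.0, 2.0, 3.0, 4.0], beta=5.0, gamma=2.0)
```

With gamma=1 and beta=0 the output has approximately zero mean and unit variance; gamma and beta then rescale and shift it.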

nnabla.functions.
mean_subtraction
(x, mean, t, base_axis=1, update_running_mean=True)[source]¶ It subtracts the mean of the elements of the input array, so that the result has zero mean. Preprocessing arrays with this function can improve accuracy in various tasks such as image classification.
At training time, this function is defined as
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ y_i &=& x_i - \mu \end{eqnarray}\end{split}\]
At testing time, the mean values used are those that were computed during training by moving average.
Note
The backward performs an approximated differentiation that takes into account only the latest minibatch.
Parameters:  x (Variable) – ND array of input.
 mean (Variable) – ND array of running mean (modified during forward execution).
 t (Variable) – Scalar holding the number of iterations of the running mean (modified during forward execution).
 base_axis (int) – Base axis of Mean Subtraction operation. Dimensions up to base_axis are treated as sample dimensions. [default=1]
 update_running_mean (bool) – Update running mean during forward execution. [default=True]
Returns: ND array.
Return type: Variable
See also
nnabla.function_bases.mean_subtraction.

nnabla.functions.
clip_by_value
(x, min, max)[source]¶ Clip inputs by values.
\[\begin{split}y = \begin{cases} max & (x > max) \\ x & (otherwise) \\ min & (x < min) \end{cases}.\end{split}\]
Parameters:
Returns: ND array.
Return type: Variable
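The piecewise definition above, written out for a scalar in pure Python. The function and parameter names are chosen for this sketch so that the built-in `min` and `max` are not shadowed.

```python
def clip_by_value(x, min_value, max_value):
    """Return max_value above the range, min_value below it, and x otherwise."""
    if x > max_value:
        return max_value
    if x < min_value:
        return min_value
    return x

clipped = [clip_by_value(v, -1.0, 1.0) for v in (-3.0, 0.25, 7.0)]
```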

nnabla.functions.clip_grad_by_value(x, min, max, n_outputs=1, outputs=None)
In the forward pass, the function behaves as the identity.
In the backward pass,
\[\begin{split}g_x = \begin{cases} max & (g_y > max) \\ g_y & (otherwise) \\ min & (g_y < min) \end{cases}.\end{split}\]
A typical use case is to prevent gradient explosion through a whole computational graph. For example, if you want to clip gradient values for each feature map,
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
x = nn.Variable([16, 3, 32, 32])
min = F.broadcast(nn.Variable.from_numpy_array(np.asarray([-1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
max = F.broadcast(nn.Variable.from_numpy_array(np.asarray([1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
c = F.clip_grad_by_value(x, min=min, max=max)
h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
Parameters:  x (Variable) – ND array of input.
 min (Variable) – ND array of minimum values by which the gradients of y are clipped. Note that the shape of min must be the same as x’s, and the backward pass to min is not performed.
 max (Variable) – ND array of maximum values by which the gradients of y are clipped. Note that the shape of max must be the same as x’s, and the backward pass to max is not performed.
Returns: ND array.
Return type: Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
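To make the split between the two passes concrete, here is a minimal plain-Python sketch of the behaviour described for clip_grad_by_value. The function names are hypothetical and the code only mirrors the formulas; it is not how nnabla implements the function.

```python
def clip_grad_forward_sketch(x):
    """Forward pass: identity."""
    return list(x)


def clip_grad_backward_sketch(g_y, g_min, g_max):
    """Backward pass: clip each incoming gradient to [g_min, g_max]."""
    return [min(max(g, lo), hi) for g, lo, hi in zip(g_y, g_min, g_max)]


out = clip_grad_forward_sketch([10.0, -20.0, 0.3])
g_x = clip_grad_backward_sketch([-3.0, 0.2, 5.0], [-1.0] * 3, [1.0] * 3)
```

The forward values pass through unchanged, however large, while only the gradients flowing back are limited.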

nnabla.functions.clip_by_norm(x, clip_norm, axis=None)
ClipByNorm
\[y = N \times \frac{x}{\|x\|_2}.\]
where \(x\) is the input, \(y\) is the output, and \(N\) is clip_norm, the value to which the norm of \(x\) is rescaled. This is the case when axes is not set. When axes is set, the norm is computed over axes.
Parameters:
Returns: ND array.
Return type: Variable
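A short worked sketch of the formula above, for the case where no axes are given and the norm is taken over the whole array. The helper name `clip_by_norm_sketch` is hypothetical, and the zero-norm guard is an addition of this sketch that the text does not mention.

```python
import math


def clip_by_norm_sketch(x, clip_norm):
    """Rescale the flat list x so that its L2 norm equals clip_norm."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return list(x)  # guard added for this sketch: avoid division by zero
    return [clip_norm * v / norm for v in x]


y = clip_by_norm_sketch([3.0, 4.0], 1.0)
y_norm = math.sqrt(sum(v * v for v in y))
```

Here the input has norm 5, so every element is scaled by 1/5 and the output has norm `clip_norm`.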

nnabla.functions.clip_grad_by_norm(x, clip_norm=None, axes=None, n_outputs=1, outputs=None)
In the forward pass, the function behaves like the identity.
In the backward pass,
\[g_x = N \times \frac{g_y}{\|g_y\|_2}.\]
where \(g_x\) is the gradient w.r.t. the input, \(g_y\) is the gradient w.r.t. the output, and \(N\) is clip_norm, the value to which the norm of \(g_y\) is rescaled. This is the case when axes is not set. When axes is set, the norm is computed over axes.
A typical use case is to prevent gradient explosion through a whole computational graph. For example, if you want to normalize gradient values over the feature axis,
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
x = nn.Variable([16, 3, 32, 32])
c = F.clip_grad_by_norm(x, axes=(1, ))
h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
Parameters:  x (Variable) – ND array of input.
 clip_norm (float) – The norm of the gradient is clipped to clip_norm in the backward pass. [default=``1.0``]
 axes (repeated int64) – Axes to be reduced. If an empty list is given, all dimensions are reduced to a scalar. This is used in the forward pass. [default=``range(x.ndim)``]
Returns: ND array.
Return type: Variable
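The effect of setting axes can be sketched for a 2-D gradient with `axes=(1,)`, where a separate norm is computed for each row. This is an interpretation of "the norm is computed over axes" written in plain Python; the helper name is hypothetical and the handling of an all-zero row is an assumption of the sketch.

```python
import math


def clip_grad_by_norm_backward_sketch(g_y, clip_norm=1.0):
    """Rescale each row of the 2-D gradient g_y to L2 norm clip_norm."""
    g_x = []
    for row in g_y:
        norm = math.sqrt(sum(g * g for g in row))
        # Assumption of this sketch: an all-zero row stays zero.
        scale = clip_norm / norm if norm > 0.0 else 0.0
        g_x.append([scale * g for g in row])
    return g_x


g_x_rows = clip_grad_by_norm_backward_sketch([[3.0, 4.0], [0.0, 2.0]])
```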
Reduction

nnabla.functions.sum(x, axis=None, keepdims=False)
Reduction along axes with sum operation.
Parameters:
Returns: ND array.
Return type: Variable

nnabla.functions.mean(x, axis=None, keepdims=False)
Reduction along axes with mean operation.
Parameters:
Returns: ND array.
Return type: Variable

nnabla.functions.max(x, axis=None, keepdims=False)
Reduction along axes with max operation.
Parameters:
Returns: ND array.
Return type: Variable

nnabla.functions.min(x, axis=None, keepdims=False)
Reduction along axes with min operation.
Parameters:
Returns: ND array.
Return type: Variable

nnabla.functions.prod(x, axis=None, keepdims=False)
Reduction along axes with product operation.
Parameters:
Returns: ND array.
Return type: Variable
Note
Backward computation is not accurate when the input contains zero values.
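The reduction functions above share the `axis` and `keepdims` arguments. The plain-Python sketch below shows what these arguments usually mean, assuming NumPy-like semantics (`axis=None` reduces everything, `keepdims=True` keeps reduced axes with length 1). The helper `sum_sketch` is hypothetical and handles only 2-D nested lists.

```python
def sum_sketch(x, axis=None, keepdims=False):
    """Sum a 2-D nested list over axis None, 0 or 1."""
    rows, cols = len(x), len(x[0])
    if axis is None:
        total = sum(sum(row) for row in x)
        return [[total]] if keepdims else total
    if axis == 0:
        out = [sum(x[r][c] for r in range(rows)) for c in range(cols)]
        return [out] if keepdims else out
    if axis == 1:
        out = [sum(row) for row in x]
        return [[v] for v in out] if keepdims else out
    raise ValueError("axis must be None, 0 or 1 in this sketch")


data = [[1, 2, 3], [4, 5, 6]]
total = sum_sketch(data)
per_column = sum_sketch(data, axis=0)
per_row_kept = sum_sketch(data, axis=1, keepdims=True)
```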

nnabla.functions.reduce_sum(x, n_outputs=1, outputs=None)
Reduction along an axis with sum operation.
Note
This is deprecated. Use sum instead.
Parameters: x (Variable) – ND array.
Returns: ND array
Return type: Variable

nnabla.functions.reduce_mean(x, n_outputs=1, outputs=None)
Reduction by mean along an axis.
Note
This is deprecated. Use mean instead.
Parameters: x (Variable) – ND array
Returns: ND array
Return type: Variable
Arithmetic

nnabla.functions.add2(x0, x1, inplace=False, n_outputs=1, outputs=None)
Elementwise addition.
\[y_i = x^{(0)}_i + x^{(1)}_i\]
Parameters:
Returns: ND array
Return type: Variable

nnabla.functions.sub2(x0, x1, n_outputs=1, outputs=None)
Elementwise subtraction.
\[y_i = x^{(0)}_i - x^{(1)}_i\]
Parameters:
Returns: ND array
Return type: Variable

nnabla.functions.mul2(x0, x1, n_outputs=1, outputs=None)
Elementwise multiplication.
\[y_i = x^{(0)}_i x^{(1)}_i\]
Parameters:
Returns: ND array
Return type: Variable

nnabla.functions.div2(x0, x1, n_outputs=1, outputs=None)
Elementwise division.
\[y_i = \frac{x^{(0)}_i} {x^{(1)}_i}\]
Parameters:
Returns: ND array
Return type: Variable

nnabla.functions.pow2(x0, x1, n_outputs=1, outputs=None)
Elementwise power function.
\[y_i = {(x^{(0)}_i)} ^ {x^{(1)}_i}\]
Parameters:
Returns: ND array
Return type: Variable

nnabla.functions.add_scalar(x, val=1, n_outputs=1, outputs=None)
Elementwise scalar addition.
\[y_i = x_i + v\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.mul_scalar(x, val=1, n_outputs=1, outputs=None)
Elementwise scalar multiplication.
\[y_i = v x_i\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.pow_scalar(x, val=1, n_outputs=1, outputs=None)
Elementwise scalar power function.
\[y_i = (x_i) ^ v\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.r_sub_scalar(x, val=1, n_outputs=1, outputs=None)
Elementwise scalar subtraction.
\[y_i = v - x_i\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.r_div_scalar(x, val=1, n_outputs=1, outputs=None)
Elementwise scalar division.
\[y_i = \frac{v}{x_i}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.r_pow_scalar(x, val=1, n_outputs=1, outputs=None)
Elementwise scalar power function.
\[y_i = v ^ {x_i}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
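The `r_` variants place the scalar on the left-hand side of the operator, which matters for the non-commutative operations. The plain-Python sketch below evaluates the formulas on a small list; the `_sketch` helper names are hypothetical and the code is not nnabla's implementation.

```python
def pow_scalar_sketch(x, val=1):
    return [xi ** val for xi in x]   # y_i = x_i ^ v


def r_sub_scalar_sketch(x, val=1):
    return [val - xi for xi in x]    # y_i = v - x_i


def r_div_scalar_sketch(x, val=1):
    return [val / xi for xi in x]    # y_i = v / x_i


def r_pow_scalar_sketch(x, val=1):
    return [val ** xi for xi in x]   # y_i = v ^ x_i


xs = [1.0, 2.0, 4.0]
powered = pow_scalar_sketch(xs, 2.0)
r_sub = r_sub_scalar_sketch(xs, 2.0)
r_div = r_div_scalar_sketch(xs, 2.0)
r_pow = r_pow_scalar_sketch(xs, 2.0)
```

Comparing `powered` with `r_pow` shows the difference between raising the input to a scalar power and raising the scalar to the power of the input.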
Logical

nnabla.functions.equal(x0, x1, n_outputs=1, outputs=None)
Elementwise ‘equal’.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i = x^{(1)}_i) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: No Description
Return type: Variable

nnabla.functions.equal_scalar(x0, val=1, n_outputs=1, outputs=None)
Elementwise ‘equal’ with a scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i = v) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.greater(x0, x1, n_outputs=1, outputs=None)
Elementwise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i > x^{(1)}_i) \\ 0 & (x^{(0)}_i \leq x^{(1)}_i) \end{cases}.\end{split}\]
Parameters:
Returns: No Description
Return type: Variable

nnabla.functions.greater_equal(x0, x1, n_outputs=1, outputs=None)
Elementwise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \geq x^{(1)}_i) \\ 0 & (x^{(0)}_i < x^{(1)}_i) \end{cases}.\end{split}\]
Parameters:
Returns: No Description
Return type: Variable

nnabla.functions.greater_equal_scalar(x0, val=1, n_outputs=1, outputs=None)
Elementwise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \geq v) \\ 0 & (x^{(0)}_i < v) \end{cases}.\end{split}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.greater_scalar(x0, val=1, n_outputs=1, outputs=None)
Elementwise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i > v) \\ 0 & (x^{(0)}_i \leq v) \end{cases}.\end{split}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.less(x0, x1, n_outputs=1, outputs=None)
Elementwise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i < x^{(1)}_i) \\ 0 & (x^{(0)}_i \geq x^{(1)}_i) \end{cases}.\end{split}\]
Parameters:
Returns: No Description
Return type: Variable

nnabla.functions.less_equal(x0, x1, n_outputs=1, outputs=None)
Elementwise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \leq x^{(1)}_i) \\ 0 & (x^{(0)}_i > x^{(1)}_i) \end{cases}.\end{split}\]
Parameters:
Returns: No Description
Return type: Variable

nnabla.functions.less_equal_scalar(x0, val=1, n_outputs=1, outputs=None)
Elementwise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \leq v) \\ 0 & (x^{(0)}_i > v) \end{cases}.\end{split}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.less_scalar(x0, val=1, n_outputs=1, outputs=None)
Elementwise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i < v) \\ 0 & (x^{(0)}_i \geq v) \end{cases}.\end{split}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
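All comparison functions above produce an array of 0s and 1s rather than a Boolean. A minimal plain-Python sketch of two of them follows; the helper names are hypothetical and the sketch only restates the case definitions.

```python
def greater_scalar_sketch(x, val=1):
    """1 where x_i > v, else 0."""
    return [1 if xi > val else 0 for xi in x]


def less_equal_scalar_sketch(x, val=1):
    """1 where x_i <= v, else 0."""
    return [1 if xi <= val else 0 for xi in x]


values = [0, 1, 2]
gt = greater_scalar_sketch(values, 1)
le = less_equal_scalar_sketch(values, 1)
```

Because the two conditions are complementary, `gt` and `le` sum to 1 at every position.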

nnabla.functions.logical_and(x0, x1, n_outputs=1, outputs=None)
Elementwise logical AND.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: No Description
Return type: Variable

nnabla.functions.logical_and_scalar(x0, val, n_outputs=1, outputs=None)
Elementwise logical AND with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i \neq 0 \;\&\; v \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.logical_not(x0, n_outputs=1, outputs=None)
Elementwise logical NOT operation.
\[\begin{split}f(x_i) = \begin{cases} 1 & (x_i = 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters: x0 (Variable) – Input variable
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.logical_or(x0, x1, n_outputs=1, outputs=None)
Elementwise logical OR.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i = 0) \\ 1 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: No Description
Return type: Variable

nnabla.functions.logical_or_scalar(x0, val, n_outputs=1, outputs=None)
Elementwise logical OR with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = 0 \;\&\; v = 0) \\ 1 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.logical_xor(x0, x1, n_outputs=1, outputs=None)
Elementwise logical XOR.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i \neq 0) \\ 1 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i = 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: No Description
Return type: Variable

nnabla.functions.logical_xor_scalar(x0, val, n_outputs=1, outputs=None)
Elementwise logical XOR with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i = 0 \;\&\; v \neq 0) \\ 1 & (x_i \neq 0 \;\&\; v = 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
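The logical functions treat any non-zero value as true and zero as false, and return 0 or 1. The plain-Python sketch below illustrates this convention, with XOR taken in its usual sense (exactly one operand is non-zero). The helper names are hypothetical.

```python
def logical_and_sketch(x0, x1):
    return [1 if a != 0 and b != 0 else 0 for a, b in zip(x0, x1)]


def logical_or_sketch(x0, x1):
    return [1 if a != 0 or b != 0 else 0 for a, b in zip(x0, x1)]


def logical_xor_sketch(x0, x1):
    # True when exactly one of the two operands is non-zero.
    return [1 if (a != 0) != (b != 0) else 0 for a, b in zip(x0, x1)]


lhs = [0, 0, 2, -1]
rhs = [0, 3, 0, 5]
and_out = logical_and_sketch(lhs, rhs)
or_out = logical_or_sketch(lhs, rhs)
xor_out = logical_xor_sketch(lhs, rhs)
```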

nnabla.functions.not_equal(x0, x1, n_outputs=1, outputs=None)
Elementwise ‘not equal’.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = x^{(1)}_i) \\ 1 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: No Description
Return type: Variable

nnabla.functions.not_equal_scalar(x0, val=1, n_outputs=1, outputs=None)
Elementwise ‘not equal’ with a scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = v) \\ 1 & otherwise \end{cases}.\end{split}\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.sign(x, alpha=0.0, n_outputs=1, outputs=None)
Elementwise sign function.
In the forward pass, it is defined as
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & (x < 0) \\ \alpha & (x = 0) \end{cases}.\end{split}\]
In the backward pass, it is defined as
\[\frac{\partial f(x)}{\partial x} = 1,\]
or in other words, it behaves as the identity function for the gradient in the backward pass.
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
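The two passes of the sign function can be sketched in plain Python as follows. The function names are hypothetical, and the code is an illustration of the definitions rather than nnabla's implementation.

```python
def sign_forward_sketch(x, alpha=0.0):
    """+1 for positive, -1 for negative, alpha for zero."""
    return [1.0 if xi > 0 else (-1.0 if xi < 0 else alpha) for xi in x]


def sign_backward_sketch(g_y):
    """Gradient passes through unchanged (derivative defined as 1)."""
    return list(g_y)


signs = sign_forward_sketch([-2.0, 0.0, 3.0], alpha=0.5)
grads = sign_backward_sketch([0.1, 0.2, 0.3])
```

The `alpha` argument only affects elements that are exactly zero.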

nnabla.functions.minimum2(x0, x1, n_outputs=1, outputs=None)
Elementwise minimum.
\[y_i = \min(x^{(0)}_i, x^{(1)}_i)\]
Parameters:
Returns: ND array of min value
Return type: Variable

nnabla.functions.maximum2(x0, x1, n_outputs=1, outputs=None)
Elementwise maximum.
\[y_i = \max(x^{(0)}_i, x^{(1)}_i)\]
Parameters:
Returns: ND array of max value
Return type: Variable

nnabla.functions.minimum_scalar(x, val=1.0, n_outputs=1, outputs=None)
Elementwise scalar minimum.
\[y_i = \min(x_i, v)\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.maximum_scalar(x, val=1.0, n_outputs=1, outputs=None)
Elementwise scalar maximum.
\[y_i = \max (x_i, v)\]
Parameters:
Returns: ND array with the same shape as x
Return type: Variable
Math

nnabla.functions.constant(val=0, shape=[], n_outputs=1, outputs=None)
Generate a constant-valued array.
Parameters:
Returns: ND array where all values are the specified constant.
Return type: Variable

nnabla.functions.abs(x, n_outputs=1, outputs=None)
Elementwise absolute value function.
\[y_i = |x_i|\]
Parameters: x (Variable) – Input variable
Returns: Elementwise absolute variable
Return type: Variable

nnabla.functions.exp(x, n_outputs=1, outputs=None)
Elementwise natural exponential function.
\[y_i = \exp(x_i).\]
Parameters: x (Variable) – Input variable
Returns: Elementwise exp variable
Return type: Variable

nnabla.functions.log(x, n_outputs=1, outputs=None)
Elementwise natural logarithm function.
\[y_i = \ln(x_i).\]
Parameters: x (Variable) – Input variable
Returns: Elementwise log variable
Return type: Variable

nnabla.functions.round(x, n_outputs=1, outputs=None)
Elementwise round function.
In the forward pass, this function rounds each element to the nearest integer value.
\[y_i = round(x_i).\]
In the backward pass, the simple Straight-Through Estimator (STE) is applied,
\[\frac{\partial y_i}{\partial x_i} = 1.\]
Parameters: x (Variable) – Input variable
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.ceil(x, n_outputs=1, outputs=None)
Elementwise ceil function.
In the forward pass, this function returns the smallest integer that is not less than the input.
\[y_i = ceil(x_i).\]
In the backward pass, the simple Straight-Through Estimator (STE) is applied,
\[\frac{\partial y_i}{\partial x_i} = 1.\]
Parameters: x (Variable) – Input variable
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.floor(x, n_outputs=1, outputs=None)
Elementwise floor function.
In the forward pass, this function returns the largest integer that is not greater than the input.
\[y_i = floor(x_i).\]
In the backward pass, the simple Straight-Through Estimator (STE) is applied,
\[\frac{\partial y_i}{\partial x_i} = 1.\]
Parameters: x (Variable) – Input variable
Returns: ND array with the same shape as x
Return type: Variable
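The rounding functions above are piecewise constant, so their true derivative is zero almost everywhere; the Straight-Through Estimator instead passes the incoming gradient through unchanged. A minimal plain-Python sketch for floor and ceil follows, with hypothetical helper names.

```python
import math


def floor_forward_sketch(x):
    return [float(math.floor(v)) for v in x]


def ceil_forward_sketch(x):
    return [float(math.ceil(v)) for v in x]


def ste_backward_sketch(g_y):
    """Straight-through: dy/dx is treated as 1, so g_x equals g_y."""
    return list(g_y)


inputs = [-1.5, 0.2, 2.7]
floored = floor_forward_sketch(inputs)
ceiled = ceil_forward_sketch(inputs)
g_in = ste_backward_sketch([0.5, -0.25, 1.0])
```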

nnabla.functions.identity(x, n_outputs=1, outputs=None)
Identity function.
\[y = x\]
Parameters: x (Variable) – ND array.
Returns: ND array
Return type: Variable

nnabla.functions.matrix_diag(x, n_outputs=1, outputs=None)
Returns an array where the last two dimensions consist of the diagonal matrix.
Parameters: x (Variable) – ND array with shape (\(M_0 \times \ldots \times M_N\)).
Returns: ND array with shape (\(M_0 \times \ldots \times M_N \times M_N\)).
Return type: Variable

nnabla.functions.matrix_diag_part(x, n_outputs=1, outputs=None)
Returns an array in which the values of the last dimension consist of the diagonal elements of the last two dimensions of an input array.
Parameters: x (Variable) – ND array with shape (\(M_0 \times \ldots \times M_N \times M_N\)).
Returns: ND array with shape (\(M_0 \times \ldots \times M_N\)).
Return type: Variable
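The shape relationship between matrix_diag and matrix_diag_part can be seen in a small plain-Python sketch for an input of shape [B, N]. The helper names are hypothetical and the code works on nested lists only.

```python
def matrix_diag_sketch(x):
    """[B, N] -> [B, N, N]: place each vector on the diagonal of a matrix."""
    return [
        [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]
        for v in x
    ]


def matrix_diag_part_sketch(m):
    """[B, N, N] -> [B, N]: extract the diagonal of each matrix."""
    return [[mat[i][i] for i in range(len(mat))] for mat in m]


vectors = [[1.0, 2.0], [3.0, 4.0]]
diag = matrix_diag_sketch(vectors)
recovered = matrix_diag_part_sketch(diag)
```

Applying the second function to the output of the first returns the original array.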

nnabla.functions.batch_matmul(a, b, transpose_a=False, transpose_b=False, n_outputs=1, outputs=None)
Batch matrix multiplication.
Two batches of matrices are multiplied sample by sample. A batch of matrices has shape […, P, Q], where the last two dimensions are the matrix dimensions and the leading dimensions up to the third-to-last are treated as batch samples.
Parameters:  a (Variable) – ND array with >= 2 dimensions. The last two dimensions will be treated as a matrix.
 b (Variable) – ND array with >= 2 dimensions. The last two dimensions will be treated as a matrix. The product of the sizes of the 0th dimension through the third-to-last dimension must be the same as that of the input a.
 transpose_a (bool) – Transpose the last two axes of a in matrix multiplication. [default=``False``]
 transpose_b (bool) – Transpose the last two axes of b in matrix multiplication. [default=``False``]
Returns: Output of sample-wise matrix multiplication in a batch. When a has shape [N, P, Q], b has shape [N, Q, R], and both transpose options are False, the output has shape [N, P, R].
Return type: Variable
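The shape rule for batch_matmul ([N, P, Q] times [N, Q, R] gives [N, P, R]) can be checked with a plain-Python sketch on nested lists. The helper name is hypothetical, and the sketch covers only the case where both transpose options are False.

```python
def batch_matmul_sketch(a, b):
    """a: [N, P, Q], b: [N, Q, R] nested lists -> [N, P, R]."""
    out = []
    for mat_a, mat_b in zip(a, b):
        rows, inner, cols = len(mat_a), len(mat_b), len(mat_b[0])
        out.append([
            [sum(mat_a[i][k] * mat_b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)
        ])
    return out


batch_a = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]   # shape [2, 2, 2]
batch_b = [[[5], [6]], [[7], [8]]]               # shape [2, 2, 1]
product = batch_matmul_sketch(batch_a, batch_b)  # shape [2, 2, 1]
```

Each sample in the batch is multiplied independently; the second sample uses an identity matrix, so its product equals the corresponding matrix in `batch_b`.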

nnabla.functions.
sin
(x, n_outputs=1, outputs=None)[source]¶ Elementwise sine (sin) function.
\[y_i = \sin (x_i)\]Parameters: x (Variable) – ND array Returns: ND array with the same shape as x Return type: Variable Note
All nnabla functions in
nnabla.functions
are decorated with thennabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.
cos
(x, n_outputs=1, outputs=None)[source]¶ Elementwise cosine (cos) function.
\[y_i = \cos (x_i)\]Parameters: x (Variable) – ND array Returns: ND array with the same shape as x Return type: Variable Note
All nnabla functions in
nnabla.functions
are decorated with thennabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.tan(x, n_outputs=1, outputs=None)
Elementwise tangent (tan) function.
\[y_i = \tan (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.sinh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic sine (sinh) function.
\[y_i = \sinh (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.cosh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic cosine (cosh) function.
\[y_i = \cosh (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.tanh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic tangent (tanh) function.
\[y_i = \tanh (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.asin(x, n_outputs=1, outputs=None)
Elementwise arcsine (asin) function.
\[y_i = \arcsin (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.acos(x, n_outputs=1, outputs=None)
Elementwise arccosine (acos) function.
\[y_i = \arccos (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.atan(x, n_outputs=1, outputs=None)
Elementwise arctangent (atan) function.
\[y_i = \arctan (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.asinh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic arcsine (asinh) function.
\[y_i = \text{arcsinh} (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.acosh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic arccosine (acosh) function.
\[y_i = \text{arccosh} (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable

nnabla.functions.atanh(x, n_outputs=1, outputs=None)
Elementwise hyperbolic arctangent (atanh) function.
\[y_i = \text{arctanh} (x_i)\]
Parameters: x (Variable) – ND array
Returns: ND array with the same shape as x
Return type: Variable
Array Manipulation¶

nnabla.functions.concatenate(*x, **kw)
Concatenate a variable number of input arrays along the specified axis.
Returns: Concatenated variable
Return type: Variable

nnabla.functions.split(x, axis=0)
Split arrays at the specified axis.
It returns as many Variables as the size of the given axis (i.e. x.shape[axis]).
Returns: A tuple of Variables
See also
nnabla.function_bases.split().

nnabla.functions.stack(*x, **kw)
Joins two or more arrays on a new axis.
Note
Unlike nnabla.functions.concatenate(), which joins arrays on an existing axis, Stack joins arrays on a new axis.
Parameters:
 *x (Variable) – ND arrays. The sizes of all the arrays to be stacked must be the same. [variadic][parameter]
 axis (int) – The axis on which to stack arrays. Axis indices take on values 0, 1, 2, and so on from the left. For example, to stack four (3,28,28) inputs on the second axis, specify 1. In this case, the output size will be (3,4,28,28). [default=``0``]
Returns: Output
Return type: Variable
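The stacking example above (four (3,28,28) inputs stacked on axis 1) can be checked with a small shape calculation. The helper `stack_shape` is a hypothetical name used for illustration; it only mirrors the shape rule and does not call nnabla.

```python
def stack_shape(shapes, axis=0):
    """Shape obtained by stacking equally shaped arrays on a new axis."""
    first = tuple(shapes[0])
    if any(tuple(s) != first for s in shapes):
        raise ValueError("all stacked arrays must have the same shape")
    # The new axis has one entry per stacked array.
    return first[:axis] + (len(shapes),) + first[axis:]


print(stack_shape([(3, 28, 28)] * 4, axis=1))  # (3, 4, 28, 28)
```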

nnabla.functions.slice(x, start=None, stop=None, step=None, n_outputs=1, outputs=None)
Slice arrays along specified axis.
Parameters:
 x (Variable) – ND array
 start (repeated int64) – Start indices for each axis [default=``(0,) * len(x.shape)``]
 stop (repeated int64) – Stop indices for each axis [default=``tuple(x.shape)``]
 step (repeated int64) – Step indices for each axis [default=``(1,) * len(x.shape)``]
Returns: Sliced ND array
Return type: Variable
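As a rough illustration of how start, stop, and step determine the result, the sketch below computes the sliced shape with Python's own `range` semantics. The helper `sliced_shape` is hypothetical, and it assumes non-negative, in-range indices and positive steps; whether nnabla accepts other values is not covered here.

```python
def sliced_shape(shape, start=None, stop=None, step=None):
    """Shape after slicing every axis with (start, stop, step)."""
    # Defaults follow the documented ones: full range, unit step.
    start = (0,) * len(shape) if start is None else start
    stop = tuple(shape) if stop is None else stop
    step = (1,) * len(shape) if step is None else step
    return tuple(len(range(b, e, s)) for b, e, s in zip(start, stop, step))


print(sliced_shape((4, 10), start=(1, 2), stop=(3, 10), step=(1, 3)))  # (2, 3)
```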

nnabla.functions.pad(x, pad_width=None, mode='constant', constant_value=None, n_outputs=1, outputs=None)
Pads given ND array with specified sizes of dimensions. Padding begins at the last dimension of input x and continues for the specified padding dimension.
Parameters:
 x (Variable) – ND array
 pad_width (repeated int64) – n-elem tuple, where n/2 <= input dimensions and n is even. len(pad_width)/2 represents the padding dimension (e.g. 1D, 2D, 3D etc.). (Currently padding up to 3D is supported) [default=``(0,) * len(x.shape)``]
 mode (string) – Padding mode is one of the following. (Currently only constant mode is supported) [default=``'constant'``]
  constant : Elements in pad region are filled with constant_value.
  replicate : Padded elements are filled with the values in nearest edges.
  reflect : Padded with the reflection of the vector mirrored on the first and last values of the vector along each axis.
 constant_value (float) – Constant values filled in padded regions if mode is constant. [default=``0``]
Returns: Padded ND array (e.g. (B, C, H, W) shape) where dimension depends on pad_width. ndim() of output ND array will be same as ndim() of input ND array.
 for 1D padding : ND input array with padding of the form (padLeft, padRight). The output ND array dimension (B, C, H, padLeft + W + padRight).
 for 2D padding : ND input array with padding of the form (padTop, padBottom, padLeft, padRight). The output ND array dimension (B, C, padTop + H + padBottom, padLeft + W + padRight).
 for 3D padding : ND input array with padding of the form (padFront, padBack, padTop, padBottom, padLeft, padRight). The output ND array dimension (B, padFront + C + padBack, padTop + H + padBottom, padLeft + W + padRight).
Return type: Variable
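The output dimensions listed above for 1D, 2D, and 3D padding follow one rule: each (before, after) pair in pad_width is applied to one of the trailing axes, with the last pair belonging to the last axis. A minimal sketch of that rule, using a hypothetical helper `padded_shape` that does not call nnabla:

```python
def padded_shape(shape, pad_width):
    """Output shape when (before, after) pairs pad the trailing axes."""
    if len(pad_width) % 2 != 0:
        raise ValueError("pad_width must have an even number of elements")
    n_pairs = len(pad_width) // 2
    if n_pairs > len(shape):
        raise ValueError("more padding pairs than input dimensions")
    out = list(shape)
    for k in range(n_pairs):
        before, after = pad_width[2 * k], pad_width[2 * k + 1]
        # The first pair maps to the outermost padded axis, the last pair to the last axis.
        axis = len(shape) - n_pairs + k
        out[axis] += before + after
    return tuple(out)


# 2D padding: (padTop, padBottom, padLeft, padRight) on a (B, C, H, W) input.
print(padded_shape((2, 3, 4, 5), (1, 1, 2, 2)))  # (2, 3, 6, 9)
```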

nnabla.functions.transpose(x, axes, n_outputs=1, outputs=None)
Transposes tensor dimensions.
Parameters:
 x (Variable) – ND array
 axes (repeated int64) – Source axis indices for each axis.
Returns: Transposed ND array.
Return type: Variable
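One reading of "source axis indices for each axis" is that output axis i is taken from input axis axes[i], which is the same convention NumPy uses. Under that assumption, the resulting shape can be sketched as follows (`transposed_shape` is a hypothetical helper):

```python
def transposed_shape(shape, axes):
    """Shape after transposing, assuming output axis i comes from input axis axes[i]."""
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of the input axes")
    return tuple(shape[a] for a in axes)


print(transposed_shape((2, 3, 4), (2, 0, 1)))  # (4, 2, 3)
```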

nnabla.functions.broadcast(x, shape, n_outputs=1, outputs=None)
Broadcasts an ND array to the specified shape.
Returns: Broadcasted ND array
Return type: Variable

nnabla.functions.broadcast_to(x, y, axis=None, n_outputs=1, outputs=None)
Warning
This function is experimentally supported, so please do not actively use it.
Broadcasts an ND array to the specified buffer.
Returns: Broadcasted ND array
Return type: Variable

nnabla.functions.flip(x, axes=None, n_outputs=1, outputs=None)
Reverses the order of elements of the specified dimension of an array.
Parameters:
 x (Variable) – ND array
 axes (repeated int64) – The index of the dimension to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to flip 100 RGB images of 32 (W) by 24 (H) pixels (100,3,24,32) vertically and horizontally, specify (2,3). [default=``[len(x.shape) - 1]``]
Returns: ND array
Return type: Variable
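To make the effect of axes concrete, here is a small pure-Python sketch that reverses nested lists along the chosen axes. The function `flip_nested` is a hypothetical stand-in for illustration and is not how nnabla implements the operation.

```python
def flip_nested(data, axes, _depth=0):
    """Reverse a nested list along every axis index listed in axes."""
    if not isinstance(data, list):
        return data  # reached a scalar element
    items = [flip_nested(item, axes, _depth + 1) for item in data]
    return items[::-1] if _depth in axes else items


image = [[1, 2, 3],
         [4, 5, 6]]
print(flip_nested(image, axes=(1,)))    # [[3, 2, 1], [6, 5, 4]]
print(flip_nested(image, axes=(0, 1)))  # [[6, 5, 4], [3, 2, 1]]
```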

nnabla.functions.shift(x, shifts=None, border_mode='nearest', n_outputs=1, outputs=None)
Shifts the array elements by the specified amount.
Parameters:
 x (Variable) – ND array.
 shifts (repeated int64) – The amount to shift elements. For example, to shift image data to the right by 2 pixels and up 3 pixels, specify (-3,2). [default=``(0,) * len(x.shape)``]
 border_mode (string) – Specify how to process the ends of arrays whose values will be undetermined as a result of shifting. nearest: The data at the ends of the original array is copied and used. reflect: Original data reflected at the ends of the original array is used. [default=``'nearest'``]
Returns: ND array.
Return type: Variable
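The 'nearest' border mode can be illustrated on a single axis. The sketch below is an assumption-laden illustration rather than the library's code: `shift_1d` is a hypothetical helper, a positive shift is taken to move elements toward higher indices, and only the 'nearest' mode is covered.

```python
def shift_1d(values, shift):
    """Shift a list by `shift` positions, filling vacated ends with the nearest edge value."""
    n = len(values)
    out = []
    for i in range(n):
        src = i - shift                # index the output element is read from
        src = min(max(src, 0), n - 1)  # 'nearest': clamp to the array ends
        out.append(values[src])
    return out


print(shift_1d([1, 2, 3, 4, 5], 2))   # [1, 1, 1, 2, 3]
print(shift_1d([1, 2, 3, 4, 5], -1))  # [2, 3, 4, 5, 5]
```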

nnabla.functions.reshape(x, shape, inplace=True, n_outputs=1, outputs=None)
Reshapes the input variable in-place. It does not create a copy of the variable. The output variable (y) has a new shape but points to the same data as the input variable (x). This means that if the data in the output variable (y) is modified, the data in the input variable (x) also gets modified since the reshape was done in-place.
Note
This function has the same behavior as the nnabla.Variable.reshape() method.
Returns: Reshaped ND array
Return type: Variable

nnabla.functions.one_hot(x, shape, n_outputs=1, outputs=None)
OneHot creates a one-hot vector based on input indices.
Returns: ND array
Return type: Variable
Stochasticity¶

nnabla.functions.rand(low=0, high=1, shape=[], seed=-1, n_outputs=1, outputs=None)
Samples numbers from a uniform distribution \(x \sim U(low, high)\) given lowest value \(low\), upper bound \(high\), and shape of the returned Variable.
Returns: Variable with the shape specified in the argument.
Return type: Variable

nnabla.functions.randint(low=0, high=1, shape=[], seed=-1, n_outputs=1, outputs=None)
Samples integer numbers from a uniform distribution \(x \sim U(low, high)\) given lowest value \(low\), upper bound \(high\), and shape of the returned Variable.
Returns: Variable with the shape specified in the argument. The dtype is int32.
Return type: Variable

nnabla.functions.randn(mu=0, sigma=1, shape=[], seed=-1, n_outputs=1, outputs=None)
Samples numbers from a normal distribution \(x \sim N(\mu, \sigma)\) given mean \(\mu\), standard deviation \(\sigma\), and shape of the returned Variable.
Returns: Variable with the shape specified in the argument.
Return type: Variable

nnabla.functions.dropout(x, p=0.5, seed=-1, n_outputs=1, outputs=None)
Dropout. Samples a number \(u\) from a uniform distribution in \([0, 1]\), and ignores the input if \(u \leq p\).
\[\begin{split}y = \left\{ \begin{array}{ll} \frac{x}{1 - p} & (u > p) \\ 0 & ({\rm otherwise}) \end{array} \right.\end{split}\]
Note
Usually dropout is only applied during training as below (except Bayesian dropout).
h = PF.affine(x, num_hidden)
if train:
    h = F.dropout(h, 0.5)
Returns: ND array with the same shape as x
Return type: Variable
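The dropout rule can be written out element by element. The sketch below is a plain-Python reference of the formula, assuming kept values are rescaled by 1/(1 - p); the name `dropout_reference` and the use of `random.Random` are choices made for this illustration, not part of nnabla.

```python
import random


def dropout_reference(xs, p=0.5, rng=None):
    """Apply the dropout formula to a list of floats."""
    rng = rng or random.Random()
    out = []
    for x in xs:
        u = rng.random()  # uniform sample in [0, 1)
        # Keep and rescale when u > p, otherwise drop the element.
        out.append(x / (1.0 - p) if u > p else 0.0)
    return out


print(dropout_reference([1.0, 1.0, 1.0, 1.0], p=0.5, rng=random.Random(0)))
```

With p=0.5 every output is either 0.0 or 2.0, so the expected value of each element stays at its input value.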

nnabla.functions.top_k_data(x, k, abs=False, reduce=True, base_axis=1, n_outputs=1, outputs=None)
Select the k largest values from each sample in x to propagate unmodified and set all other values to 0. If abs is True, the k largest values are selected by magnitude. If reduce is True (the default), all feature dimensions are reduced to a single dimension of size k that propagates only the k largest values. Otherwise, if reduce is False, input and output dimensions are identical. Dimensions before base_axis are treated as number of sample dimensions and k values get selected from all elements of a sample (dimensions from base_axis) regardless of shape.
>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((4, 5, 6))
>>> F.top_k_data(x, 3, reduce=False).shape
(4, 5, 6)
>>> F.top_k_data(x, 3, reduce=True).shape
(4, 3)
>>> F.top_k_data(x, 3, reduce=True, base_axis=2).shape
(4, 5, 3)
Parameters:
 x (Variable) – ND array
 k (int) – Number of largest data values to propagate.
 abs (bool) – Determine largest data values by magnitude. [default=``False``]
 reduce (bool) – Reduce feature size to one dimension of size k. [default=``True``]
 base_axis (int) – First dimension of the sample shape. [default=``1``]
Returns: ND array.
Return type: Variable

nnabla.functions.top_k_grad(x, k, abs=False, base_axis=1, n_outputs=1, outputs=None)
Select the k largest gradients for each sample in x to backpropagate unmodified and set all other gradients to 0. If abs is True, the k largest gradients are selected by magnitude. Dimensions before base_axis are treated as number of sample dimensions and k gradients get selected from all gradients of a sample (dimensions from base_axis) regardless of shape.
Returns: ND array with same shape and data as x.
Return type: Variable

nnabla.functions.random_crop(x, shape=None, base_axis=1, seed=-1, n_outputs=1, outputs=None)
RandomCrop randomly extracts a portion of an array.
Parameters:
 x (Variable) – ND array
 shape (tuple of int) – The data size to extract. For example, to randomly extract a (3,48,48) portion from a (3,64,64) image, specify (3,48,48). [default=``x.shape``]
 base_axis (int) – No Description [default=``1``]
 seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=``-1``]
Returns: ND array
Return type: Variable

nnabla.functions.random_flip(x, axes=None, base_axis=1, seed=-1, n_outputs=1, outputs=None)
Reverses the order of elements of the specified dimension of an array at 50% probability.
Parameters:
 x (Variable) – ND array
 axes (repeated int64) – The index of the axis to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to flip 100 RGB images of 32 (W) by 24 (H) pixels (100,3,24,32) vertically and horizontally at random, specify (2,3). [default=``[len(x.shape) - 1]``]
 base_axis (int) – No Description [default=``1``]
 seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=``-1``]
Returns: ND array
Return type: Variable

nnabla.functions.random_shift(x, shifts=None, border_mode='nearest', base_axis=1, seed=-1, n_outputs=1, outputs=None)
Randomly shifts the array elements within the specified range.
Parameters:
 x (Variable) – ND array.
 shifts (repeated int64) – Max absolute amount to shift elements. For example, to shift image data horizontally by \(\pm 2\) pixels and vertically by \(\pm 3\) pixels, specify (3,2). [default=``(0,) * len(x.shape)``]
 border_mode (string) – Specify how to process the ends of arrays whose values will be undetermined as a result of shifting. nearest: The data at the ends of the original array is copied and used. reflect: Original data reflected at the ends of the original array is used. [default=``'nearest'``]
 base_axis (int) – No Description [default=``1``]
 seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=``-1``]
Returns: ND array.
Return type: Variable

nnabla.functions.image_augmentation(x, shape=None, pad=(0, 0), min_scale=1.0, max_scale=1.0, angle=0.0, aspect_ratio=1.0, distortion=0.0, flip_lr=False, flip_ud=False, brightness=0.0, brightness_each=False, contrast=1.0, contrast_center=0.0, contrast_each=False, noise=0.0, seed=-1, n_outputs=1, outputs=None)
ImageAugmentation randomly alters the input image.
Parameters:
 x (Variable) – ND array.
 shape (tuple of int) – The output image data size. [default=``x.shape``]
 pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=``(0, 0)``]
 min_scale (float) – The minimum scale ratio when randomly scaling the image. For example, to scale down to 0.8 times the size of the original image, specify “0.8”. To not apply random scaling, set both min_scale and max_scale to “1.0”. [default=``1.0``]
 max_scale (float) – The maximum scale ratio when randomly scaling the image. For example, to scale up to 2 times the size of the original image, specify “2.0”. [default=``1.0``]
 angle (float) – The rotation angle range in radians when randomly rotating the image. The image is randomly rotated in the -Angle to +Angle range. For example, to rotate in a ±15 degree range, specify “0.26” (15 degrees/360 degrees * 2PI). To not apply random rotation, specify “0.0”. [default=``0.0``]
 aspect_ratio (float) – The aspect ratio range when randomly deforming the image. For example, to deform aspect ratio of image from 1:1.3 to 1.3:1, specify “1.3”. To not apply random deforming, specify “1.0”. [default=``1.0``]
 distortion (float) – The distortion range when randomly distorting the image. To not apply distortion, specify “0.0”. [default=``0.0``]
 flip_lr (bool) – Whether to randomly flip the image horizontally at 50% probability. [default=``False``]
 flip_ud (bool) – Whether to randomly flip the image vertically at 50% probability. [default=``False``]
 brightness (float) – The absolute range of values to randomly add to the brightness. A random value in the -Brightness to +Brightness range is added to the brightness. For example, to vary the brightness in the -0.05 to +0.05 range, specify “0.05”. To not apply random addition to brightness, specify “0.0”. [default=``0.0``]
 brightness_each (bool) – Whether to apply the random addition to brightness (as specified by brightness) to each color channel. True: brightness is added based on a different random number for each channel. False: brightness is added based on a random number common to all channels. [default=``False``]
 contrast (float) – The range in which to randomly vary the image contrast. The contrast is varied in the 1/Contrast times to Contrast times range. The output brightness is equal to (input - contrast_center) * contrast + contrast_center. For example, to vary the contrast in the 0.91 times to 1.1 times range, specify “1.1”. To not apply random contrast variation, specify “1.0”. [default=``1.0``]
 contrast_center (float) – Intensity center used for applying contrast. [default=``0.0``]
 contrast_each (bool) – Whether to apply the random contrast variation (as specified by contrast) to each color channel. True: contrast is varied based on a different random number for each channel. False: contrast is varied based on a random number common to all channels. [default=``False``]
 noise (float) – Sigma of normal random number to be added. [default=``0.0``]
 seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=``-1``]
Returns: ND array.
Return type: Variable
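The numeric examples quoted for angle and contrast can be reproduced with a few lines of arithmetic. This is only a worked check of the values in the parameter descriptions; the variable names are illustrative.

```python
import math

# angle: a 15 degree range expressed in radians (15 / 360 * 2 * pi).
angle = math.radians(15)
print(round(angle, 2))  # 0.26

# contrast: specifying 1.1 varies the contrast between 1/1.1 and 1.1 times.
contrast = 1.1
lower, upper = 1.0 / contrast, contrast
print(round(lower, 2), upper)  # 0.91 1.1
```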
Loss Functions¶

nnabla.functions.sigmoid_cross_entropy(x, target, n_outputs=1, outputs=None)
Elementwise cross entropy between x and the target variables, passed to a sigmoid function.
\[y_i = - \left(x^{(1)}_i \ln \left(\sigma \left(x^{(0)}_i \right)\right) + \left(1 - x^{(1)}_i\right) \ln \left(1 - \sigma \left(x^{(0)}_i \right)\right)\right)\]
where \(\sigma(s)=\frac{1}{1+\exp(-s)}\).
Note
SigmoidCrossEntropy is equivalent to Sigmoid+BinaryCrossEntropy, but computing them at once has the effect of reducing computational error.
Returns: ND array of elementwise losses.
Return type: Variable
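Why fusing the sigmoid with the cross entropy reduces numerical error can be seen in a scalar sketch. The stable form below is a standard algebraic rearrangement of the negative log-likelihood, shown as an illustration; it is not claimed to be the expression nnabla evaluates internally, and both function names are hypothetical.

```python
import math


def naive_sigmoid_cross_entropy(x, t):
    """Sigmoid followed by binary cross entropy, computed in two steps."""
    s = 1.0 / (1.0 + math.exp(-x))
    return -(t * math.log(s) + (1.0 - t) * math.log(1.0 - s))


def sigmoid_cross_entropy(x, t):
    """Equivalent fused form: max(x, 0) - x*t + log(1 + exp(-|x|))."""
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))


print(abs(naive_sigmoid_cross_entropy(0.3, 1.0) - sigmoid_cross_entropy(0.3, 1.0)) < 1e-12)  # True
# For a large logit the two-step version would evaluate log(0); the fused form stays finite.
print(sigmoid_cross_entropy(1000.0, 0.0))  # 1000.0
```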

nnabla.functions.binary_cross_entropy(x, target, n_outputs=1, outputs=None)
Elementwise cross entropy between x and the target variables.
\[y_i = - \left(x^{(1)}_i * \ln \left(x^{(0)}_i\right) + \left(1 - x^{(1)}_i\right) * \ln \left(1 - x^{(0)}_i\right)\right).\]
Returns: ND array of elementwise losses.
Return type: Variable

nnabla.functions.softmax_cross_entropy(x, target, axis=None, n_outputs=1, outputs=None)
Elementwise cross entropy between the variables and the variables of a label given by a category index with Softmax normalization.
\[y_{j} = -\ln \left(\frac{\exp(x_{j,t_j})}{\sum_{i'} \exp(x_{j,i'})}\right)\]
along dimension specified by axis (\(i\) is the axis on which normalization is performed).
Note
SoftmaxCrossEntropy is equivalent to Softmax+CategoricalCrossEntropy, but computing them at once has the effect of reducing computational error.
Returns: ND array of elementwise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)
Return type: Variable
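For a single sample, the softmax cross entropy is the negative log of the normalized probability of the target class. The sketch below uses the usual log-sum-exp rearrangement to keep the computation stable; it illustrates the formula and is not a statement about nnabla's internal implementation. The function name mirrors the API only for readability.

```python
import math


def softmax_cross_entropy(logits, t):
    """-ln(softmax(logits)[t]) for one sample, via log-sum-exp."""
    m = max(logits)  # subtracting the maximum avoids overflow in exp
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[t]


print(abs(softmax_cross_entropy([0.0, 0.0], 0) - math.log(2)) < 1e-12)  # True
print(softmax_cross_entropy([1000.0, 0.0], 0))  # 0.0
```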

nnabla.functions.categorical_cross_entropy(x, target, axis=None, n_outputs=1, outputs=None)
Elementwise cross entropy between x and the target t where targets are given by a category index.
\[y_{j} = -\ln \left( x_{j, t_j} \right)\]
along dimension specified by axis (\(i\) is the axis on which normalization is performed).
Returns: ND array of elementwise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)
Return type: Variable

nnabla.functions.squared_error(x0, x1, n_outputs=1, outputs=None)
Elementwise squared error
\[y_i = \left(x^{(0)}_i - x^{(1)}_i\right)^2.\]
Returns: ND array.
Return type: Variable

nnabla.functions.absolute_error(x0, x1, n_outputs=1, outputs=None)
Elementwise absolute error
\[y_i = | x^{(0)}_i - x^{(1)}_i |.\]
Returns: ND array.
Return type: Variable

nnabla.functions.huber_loss(x0, x1, delta=1.0, n_outputs=1, outputs=None)
Elementwise Huber loss
\[\begin{split}y_i= \left\{ \begin{array}{ll} d^2 & (|d| < \delta)\\ \delta (2 |d| - \delta) & ({\rm otherwise}) \end{array} \right.\end{split}\]
where \(d = x^{(0)}_i - x^{(1)}_i\)
Returns: ND array of elementwise losses.
Return type: Variable
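A scalar version of the Huber loss makes the two branches easy to check by hand. The sketch assumes the difference d is compared by magnitude, so the loss is quadratic for |d| < delta and linear beyond it; it is a reference for the formula, not the library code.

```python
def huber_loss(x0, x1, delta=1.0):
    """Quadratic for small differences, linear for large ones."""
    d = abs(x0 - x1)
    if d < delta:
        return d * d
    return delta * (2.0 * d - delta)


print(huber_loss(0.5, 0.0))  # 0.25
print(huber_loss(3.0, 0.0))  # 5.0
```

The two branches meet at |d| = delta, where both give delta squared, so the loss is continuous.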

nnabla.functions.epsilon_insensitive_loss(x0, x1, epsilon, n_outputs=1, outputs=None)
Elementwise Epsilon Insensitive Loss
\[\begin{split}y_i= \left\{ \begin{array}{ll} | x^{(0)}_i - x^{(1)}_i | - \epsilon & if \ \ | x^{(0)}_i - x^{(1)}_i | > \epsilon \\ 0 & otherwise \end{array} \right.\end{split}\]
Returns: ND array of elementwise losses.
Return type: Variable
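The epsilon-insensitive loss ignores differences up to epsilon and grows linearly beyond that. A scalar sketch of the formula, assuming the difference is taken by magnitude (the function name mirrors the API only for readability):

```python
def epsilon_insensitive_loss(x0, x1, epsilon):
    """Zero inside the epsilon band, |x0 - x1| - epsilon outside it."""
    d = abs(x0 - x1)
    return d - epsilon if d > epsilon else 0.0


print(epsilon_insensitive_loss(3.0, 1.0, 0.5))  # 1.5
print(epsilon_insensitive_loss(1.0, 1.2, 0.5))  # 0.0
```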

nnabla.functions.kl_multinomial(p, q, base_axis=1, n_outputs=1, outputs=None)
The Kullback-Leibler Divergence for multinomial distributions.
\[D = \sum_i p_i \log \left( \frac{p_i}{q_i} \right)\]
Returns: Kullback-Leibler divergence \(KL(p \parallel q)\).
Return type: Variable
Quantized Neural Network Layers¶

nnabla.functions.
binary_sigmoid
(x, n_outputs=1, outputs=None)[source]¶ Elementwise binary sigmoid function. In the forward pass, it computes
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ 0 & ({\rm otherwise})\end{cases},\end{split}\]but in the backward pass, a straight-through approximation of the gradient is used, i.e.,
\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ \frac{1}{2} & ({\rm otherwise}) \end{cases}.\end{split}\]References
Parameters: x (Variable) – Input. Returns: Output. Return type: Variable

nnabla.functions.
binary_tanh
(x, n_outputs=1, outputs=None)[source]¶ Elementwise binary tanh function. In the forward pass, it computes
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & ({\rm otherwise}) \end{cases},\end{split}\]but in the backward pass, a straight-through approximation of the gradient is used, i.e.,
\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ 1 & ({\rm otherwise}) \end{cases}.\end{split}\]References
Parameters: x (Variable) – Input. Returns: Output. Return type: Variable
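The forward and backward rules of the two binary activations can be written as scalar functions. The sketch below assumes that binary tanh maps non-positive inputs to -1 and that both straight-through gradients are zeroed where \(|x| \geq 1\); the function names are hypothetical and operate on single floats rather than Variables.

```python
def binary_sigmoid_forward(x):
    return 1.0 if x > 0 else 0.0


def binary_sigmoid_grad(x):
    # Straight-through estimate: 1/2 inside (-1, 1), zero outside.
    return 0.0 if abs(x) >= 1 else 0.5


def binary_tanh_forward(x):
    return 1.0 if x > 0 else -1.0


def binary_tanh_grad(x):
    # Straight-through estimate: 1 inside (-1, 1), zero outside.
    return 0.0 if abs(x) >= 1 else 1.0


for v in (-2.0, -0.3, 0.3, 2.0):
    print(v, binary_sigmoid_forward(v), binary_sigmoid_grad(v),
          binary_tanh_forward(v), binary_tanh_grad(v))
```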

nnabla.functions.
binary_connect_affine
(x, weight, binary_weight, bias=None, base_axis=1, n_outputs=1, outputs=None)[source]¶ This function provides a BinaryConnect affine layer. It computes in the forward pass
\[y_j = \sum_{i} sign(w_{j,i}) x_i,\]i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations do not require any multiplications anymore as they turn into additions/subtractions.
This function should be used together with batch_normalization().
Note
1) If you would like to share the binary weights between other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
References
Parameters: Returns: Output.
Return type:
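To make the "additions/subtractions only" remark concrete, here is a small sketch of the binarized inner product on plain lists. It omits the bias and base_axis handling, and it assumes that a weight of exactly zero is binarized to +1, which may differ from the library; the function name is hypothetical.

```python
def binary_connect_affine_reference(x, weight):
    """x: list of inputs x_i; weight: list of rows w_j of float weights."""
    y = []
    for row in weight:
        acc = 0.0
        for w, xi in zip(row, x):
            # sign(w) * x_i needs no multiplication: add or subtract x_i.
            # Assumption: w == 0 is treated as +1.
            if w >= 0:
                acc += xi
            else:
                acc -= xi
        y.append(acc)
    return y


x = [1.0, 2.0, 3.0]
w = [[0.3, -0.7, 0.1], [-2.0, -0.5, 4.0]]
print(binary_connect_affine_reference(x, w))  # [2.0, 0.0]
```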

nnabla.functions.
binary_connect_convolution
(x, weight, binary_weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, n_outputs=1, outputs=None)[source]¶ This function provides a BinaryConnect convolution layer. It computes in the forward pass
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j},\]i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations do not require any multiplications anymore as they turn into additions/subtractions.
This function should be used together with batch_normalization().
Reference
Note
1) If you would like to share the binary weights between other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
Parameters:  x (Variable) – Input.
 weight (Variable) – Weight. [parameter]
 binary_weight (Variable) – Binarized weight. [parameter]
 bias (Variable) – Bias. [optional][parameter]
 base_axis (int) – Dimensions up to base_axis is treated as sample dimension. [default=``1``]
 pad (tuple of int) – Padding sizes for dimensions. [default=``(0,) * (len(x.shape) - (base_axis+1))``]
 stride (tuple of int) – Stride sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=``1``]
Returns: Output
Return type:

nnabla.functions.
binary_weight_affine
(x, weight, binary_weight, alpha, bias=None, base_axis=1, n_outputs=1, outputs=None)[source]¶ This function provides a Binary Weight Network affine layer. It computes in the forward pass
\[y_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}} \sum_{i} sign(w_{j,i}) x_i\]i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations turn into additions/subtractions which are followed by multiplication with the scaling factor \(\alpha_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}}\).
Reference
Note
1) If you would like to share the binary weights with other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
Parameters: Returns: Output.
Return type:

nnabla.functions.
binary_weight_convolution
(x, weight, binary_weight, alpha, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, n_outputs=1, outputs=None)[source]¶ This function provides a Binary Weight Network convolution layer. It computes in the forward pass
\[y_{n, a, b} = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}} \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j},\]i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations turn into additions/subtractions which are followed by multiplication with the scaling factor \(\alpha_n = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}}\).
Reference
Note
1) If you would like to share the binary weights between other standard layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
Parameters:  x (Variable) – Input.
 weight (Variable) – Weight. [parameter]
 binary_weight (Variable) – Binarized weight. [parameter]
 alpha (Variable) – Alpha. [parameter]
 bias (Variable) – Bias. [optional][parameter]
 base_axis (int) – Dimensions up to base_axis is treated as sample dimension. [default=``1``]
 pad (tuple of int) – Padding sizes for dimensions. [default=``(0,) * (len(x.shape) - (base_axis+1))``]
 stride (tuple of int) – Stride sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 dilation (tuple of int) – Dilation sizes for dimensions. [default=``(1,) * (len(x.shape) - (base_axis+1))``]
 group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=``1``]
Returns: Output
Return type:

nnabla.functions.
fixed_point_quantize
(x, sign=True, n=8, delta=0.0625, quantize=True, ste_fine_grained=True, outputs=None)[source]¶ Fixed Point Quantize
Parameters:  x (Variable) – An input variable.
 sign (bool) – Indicate the signed number or the unsigned number. Default is true.
 n (int) – Bit width used. Note that sign consumes one bit. \(n-1\) is used for number representation in signed case.
 delta (float) – Step size.
 quantize (bool) – If true, quantize input, otherwise not.
 ste_fine_grained (bool) – If true, STE is not 1.
Returns: ND array.
Return type:
See also
nnabla.function_bases.fixed_point_quantize.
In the forward pass,
\[\begin{split}\begin{equation} q_i= \left\{ \begin{array}{ll} max & if \ \ \ x_i > max \\ sign(x_i) \times floor(|x_i| \delta^{-1} + 2^{-1}) \times \delta & if \ \ min \le x_i \le max \\ min & if \ \ x_i < min \\ \end{array} \right., \end{equation}\end{split}\]
where \(\delta\) is the step size, \((min, max) := (-(2^{n-1} - 1)\delta, (2^{n-1} - 1)\delta)\) if \(sign\) is true, \((min, max) := (0, (2^n - 1) \delta)\) otherwise, and \(n\) is the total bitwidth used.
In the backward pass when using ste_fine_grained as false,
\[\begin{equation} \frac{\partial q_i}{\partial x_i} = 1. \end{equation}\]
In the backward pass when using ste_fine_grained as true,
\[\begin{split}\begin{equation} \frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > max \\ 1 & if \ \ min \le x_i \le max \\ 0 & if \ \ x_i < min \\ \end{array} \right.. \end{equation}\end{split}\]
Note
Quantized values are stored as floating point number, since this function is for simulation purposes.
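The forward rule can be traced on a single scalar with the sketch below. It assumes the signed range is symmetric, \(\pm(2^{n-1} - 1)\delta\), and that rounding is floor of \(|x|/\delta + 1/2\); the function name is hypothetical, and the quantize and ste_fine_grained switches are not modelled.

```python
import math


def fixed_point_quantize_reference(x, sign=True, n=8, delta=0.0625):
    """Forward pass of fixed-point quantization for one float."""
    if sign:
        max_v = (2 ** (n - 1) - 1) * delta  # one bit is spent on the sign
        min_v = -max_v
    else:
        max_v = (2 ** n - 1) * delta
        min_v = 0.0
    if x > max_v:
        return max_v
    if x < min_v:
        return min_v
    s = 1.0 if x >= 0 else -1.0
    # Round |x| to the nearest multiple of delta, then restore the sign.
    return s * math.floor(abs(x) / delta + 0.5) * delta


print(fixed_point_quantize_reference(0.1))    # 0.125
print(fixed_point_quantize_reference(100.0))  # 7.9375
```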

nnabla.functions.
pow2_quantize
(x, sign=True, with_zero=True, n=8, m=1, quantize=True, ste_fine_grained=True, outputs=None)[source]¶ Pow2 Quantize
Parameters:  x (Variable) – An input variable.
 sign (bool) – Indicate the signed number or the unsigned number. Default is true.
 with_zero (bool) – Indicate using zero as a quantized value. Default is true. Note that zero consumes one bit.
 n (int) – Bit width used. Note that sign consumes one bit. \(n-1\) is used for number representation in signed case. Default is 8.
 m (int) – \(2^m\) is the upper bound of the dynamic range and \(-2^m\) is the lower bound, \(m \in \mathcal{Z}\). Default is 1.
 quantize (bool) – If true, quantize input, otherwise not.
 ste_fine_grained (bool) – If true, STE is not 1.
Returns: ND array.
Return type:
See also
nnabla.function_bases.pow2_quantize.
In the forward pass of signed case,
\[\begin{split}q_i= \left\{ \begin{array}{ll} max_{+} & if \ \ \overline{q_i} > max_{+} \\ \overline{q_i} & if \ \ min_{+} \le \overline{q_i} \le max_{+} \\ min_{+} & if \ \ 0 \le \overline{q_i} < min_{+} \\ min_{-} & if \ \ min_{-} < \overline{q_i} < 0 \\ \overline{q_i} & if \ \ max_{-} \le \overline{q_i} \le min_{-}\\ max_{-} & if \ \ \overline{q_i} < max_{-} \\ \end{array} \right.,\end{split}\]
where
\[\begin{split}&& max_{+} = 2^{m}, min_{+} = 2^{m - (2^{n-1} - 1)},\\ && max_{-} = -2^{m}, min_{-} = -2^{m - (2^{n-1} - 1)},\\ && \overline{q_i} = sign(x_i) \times 2^{round(\log_2 |x_i|)}.\end{split}\]
This quantization uses the geometric mean between two power-of-two numbers as quantization threshold.
In the forward pass of unsigned case,
\[\begin{split}q_i= \left\{ \begin{array}{ll} max & if \ \ \overline{q_i} > max \\ \overline{q_i} & if \ \ min \le \overline{q_i} \le max \\ min & if \ \ 0 < \overline{q_i} < min \\ \end{array} \right.,\end{split}\]
where
\[\begin{split}&& max = 2^{m}, min = 2^{m - (2^{n} - 1)},\\ && \overline{q_i} = 2^{int(\log_2 |x_i|)}.\end{split}\]
When using with_zero as true, a pruning threshold is used to round an input to 0 or \(min\). The pruning threshold is defined in this function as the following,
\[pruning\ threshold = min \times 2^{-\frac{1}{2}}.\]
If the absolute value of the input is less than this value, the input is rounded to 0, otherwise to \(min\).
In the backward pass when using ste_fine_grained as false,
\[\frac{\partial q_i}{\partial x_i} = 1.\]
In the backward pass when using ste_fine_grained as true,
\[\begin{split}\frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \overline{q_i} > max_{+} \\ 1 & if \ \ otherwise \\ 0 & if \ \ \overline{q_i} < max_{-} \\ \end{array} \right..\end{split}\]
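The signed forward rule amounts to rounding the magnitude to the nearest power of two in the log domain and clamping it to the representable range. The sketch below covers only the signed case without zero (with_zero false) for a single non-zero float, and it assumes the exponent range \(m - (2^{n-1} - 1)\) to \(m\); the function name is hypothetical, and Python's round() resolves exact ties to the even integer, which may differ from the library.

```python
import math


def pow2_quantize_signed_reference(x, n=8, m=1):
    """Signed power-of-two quantization of one non-zero float (no zero level)."""
    max_pos = 2.0 ** m
    min_pos = 2.0 ** (m - (2 ** (n - 1) - 1))
    s = 1.0 if x > 0 else -1.0
    # Rounding log2|x| places the threshold at the geometric mean of
    # neighbouring powers of two.
    q = 2.0 ** round(math.log2(abs(x)))
    # Clamp the magnitude to [min_pos, max_pos], then restore the sign.
    return s * min(max(q, min_pos), max_pos)


print(pow2_quantize_signed_reference(0.3, n=4, m=1))   # 0.25
print(pow2_quantize_signed_reference(3.0, n=4, m=1))   # 2.0
```

With n=4 and m=1 the smallest magnitude is \(2^{-6}\), so very small inputs are clamped up to 0.015625 rather than rounded to zero.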
Unsupported, Special Use¶

nnabla.functions.
vat_noise
(x, w, base_axis=1, eps=1.0, n_outputs=1, outputs=None)[source]¶ Noise for virtual adversarial training.
This layer is a special layer for GUI network designing, specialized for getting the noise of virtual adversarial training.
In the backward process, the weight parameter will be replaced with the gradient.
Forward
\[y_i = \frac{\epsilon x_i}{\sqrt{\sum_k x_k^2 + c}}\]Backward
\[\delta x_i = 0\]\[w_i = \epsilon \delta y_i\]Note
This layer is a special layer for GUI network designing.
References
Parameters:  x (Variable) – ND array of noise input. Initially this is standard Gaussian noise; in subsequent steps it is the fed-back gradient variable.
 w (Variable) – ND array for keeping gradient values.
 base_axis (int) – Dimensions up to base_axis is treated as sample dimension. [default=``1``]
 eps (float) – Noise norm (l2) factor. [default=``1.0``]
Returns: ND array
Return type:

nnabla.functions.
unlink
(x, n_outputs=1, outputs=None)[source]¶ This function behaves as an identity function on the forward pass, and deletes the gradient for the backward pass.
This layer is a special layer for GUI network designing, used for getting zero backward operation by adding this layer.
Forward
\[y_i = x_i\]Backward
\[\delta x_i = 0\]Note
This layer is a special layer for GUI network designing.
Parameters: x (Variable) – ND array. Returns: ND array. Return type: Variable

nnabla.functions.
sink
(*x, **kw)[source]¶ Creates a dummy variable used to call forward or backward function of multiple variables at one place.
This takes any number of input variables with any shape, and creates a single 0-shape output. The forward pass does nothing. The backward pass sets ones to the input grads if one_input_grad is set to true.
Note
sink can only be called at the very end of the graph, and grad of input variables are cleared when y.backward(clear_buffer=True) is called.
Parameters: Returns: Dummy variable.
Return type:
Image Object Detection¶

nnabla.functions.
nms_detection2d
(x, thresh=None, nms=None, nms_per_class=None, n_outputs=1, outputs=None)[source]¶ Non-Maximum Suppression (NMS) to 2D Object detector output. The input is a 3-dimensional tensor with shape of (B, N, 5 + C) where B denotes batch size, N denotes the number of detection box candidates, and C denotes the number of classes of object detection. 5 + C consists of the box coordinates x, y, w, h in normalized coordinates (size of each x and y are 1.0), objectness (learned to predict IoU value to ground truth box), and the class probabilities of C classes.
It outputs a tensor with the same dimensions as the input, where all values are copied from the input to the output, except the class probabilities are multiplied by objectness, and possibly suppressed to 0 by NMS. During NMS, every pair of bounding boxes is compared. For each pair, the bounding box with a lower detection score (described below) is suppressed if the overlap ratio (the IoU) is greater than the value of nms.
There are two suppression modes for NMS.
1. Suppress by class probability (nms_per_class is True): For each bounding box, the detection score is calculated by objectness * probability[class_id] for each class. The suppression is done for each class independently.
2. Suppress by objectness (nms_per_class is False): The suppression is done for each bounding box using objectness as a detection score. All class probabilities become 0 for every suppressed box.
References
Parameters: Returns: A 3-dim array with the same dimensions as the input.
Return type:
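The objectness mode (nms_per_class false) can be sketched for a single image in pure Python. Several details here are assumptions rather than documented behaviour: (x, y) is taken as the box centre, ties in objectness suppress the later box, and the thresh argument is not modelled. The helper names are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes (x, y, w, h); (x, y) is assumed to be the centre."""
    ax0, ax1 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay0, ay1 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx0, bx1 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by0, by1 = b[1] - b[3] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms_by_objectness(boxes, nms=0.5):
    """boxes: list of [x, y, w, h, objectness, p_1, ..., p_C] for one image."""
    # Class probabilities are multiplied by objectness; the rest is copied.
    out = [list(b[:5]) + [b[4] * p for p in b[5:]] for b in boxes]
    suppressed = [False] * len(boxes)
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) > nms:
                # The box with the lower objectness loses the comparison.
                loser = i if boxes[i][4] < boxes[j][4] else j
                suppressed[loser] = True
    for k, is_suppressed in enumerate(suppressed):
        if is_suppressed:
            out[k][5:] = [0.0] * (len(out[k]) - 5)
    return out


candidates = [
    [0.50, 0.5, 0.4, 0.4, 0.9, 0.8, 0.2],
    [0.52, 0.5, 0.4, 0.4, 0.6, 0.5, 0.5],  # overlaps the first box heavily
    [0.10, 0.1, 0.1, 0.1, 0.7, 1.0, 0.0],  # isolated box
]
for row in nms_by_objectness(candidates):
    print(row)
```

In this example the second candidate has its class probabilities zeroed, while the other two keep their objectness-weighted probabilities.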
Validation¶

nnabla.functions.
top_n_error
(x, target, axis=None, n=1, n_outputs=1, outputs=None)[source]¶ Top N error along the dimension specified by the axis, the element of outputs is
\[\begin{split}y_i = \left \{ \begin{array}{l} 1 \ (x_i \ is \ not \ within \ Nth \ place) \\ 0 \ (x_i \ is \ within \ Nth \ place) \end{array} \right.\end{split}\]Parameters:  x (Variable) – Probabilities ND array. \(D_1 \times ... \times D_i \times ... \times D_N\)
 target (Variable) – ND array of labels. \(D_1 \times ... \times 1 \times ... \times D_N\)
 axis (int) – Axis on which the top N error is calculated. [default=``len(x.shape) - 1``]
 n (int) – top N [default=``1``]
Returns: Elementwise error ND array. (\(D_1 \times ... \times 1 \times ... \times D_N\))
Return type:
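For a single sample, the top-N error is 0 when the score of the true class ranks among the N highest and 1 otherwise. The sketch below assumes that ties are resolved in favour of the true class, which is an illustrative choice; the function name is hypothetical.

```python
def top_n_error_reference(probs, label, n=1):
    """probs: class scores for one sample; label: index of the true class."""
    # Count the classes scored strictly higher than the true class.
    rank = sum(1 for p in probs if p > probs[label])
    return 0 if rank < n else 1


scores = [0.1, 0.6, 0.3]
print(top_n_error_reference(scores, label=2, n=1))  # 1
print(top_n_error_reference(scores, label=2, n=2))  # 0
```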
Spectral Operation¶

nnabla.functions.
fft
(x, signal_ndim, normalized=False, n_outputs=1, outputs=None)[source]¶ Complex-to-complex Discrete Fourier Transform,
\[X_{k_1, \ldots, k_d} = \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(-2 \pi j \left( \sum_{i=1}^{d} \frac{k_i n_i}{N_i} \right) \right),\]where
\[k_i = 0, \ldots, N_i - 1.\]This function now supports 1-D, 2-D, and 3-D DFT with or without the leading batch dimension(s).
The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two where x[…, 0] is the real part and x[…, 1] the imaginary part.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
from nnabla.ext_utils import get_extension_context

ctx = get_extension_context("cudnn")
nn.set_default_context(ctx)

# Example for a batched 2D-FFT and 2D-IFFT (batch-size: 2, data-size: 4x3)
x_data = np.random.rand(2, 4, 3) + 1j * np.random.rand(2, 4, 3)
x = nn.Variable.from_numpy_array(np.stack([np.real(x_data), np.imag(x_data)], axis=3))
y = F.fft(x, signal_ndim=2, normalized=True)
z = F.ifft(y, signal_ndim=2, normalized=True)
z.forward()

np.allclose(z.d[..., 0] + 1j*z.d[...,1], x_data)
Parameters: Returns: FFT transformed signal.
Return type:
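The real/imaginary layout of the last dimension can be illustrated with a naive 1-D DFT written with the standard library only. This is a sketch of the unnormalized transform with the conventional negative exponent, not the library's algorithm; `dft_1d_reference` is a hypothetical name.

```python
import cmath


def dft_1d_reference(x):
    """x: list of [real, imag] pairs; returns the DFT in the same layout."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0j
        for t, (re, im) in enumerate(x):
            acc += complex(re, im) * cmath.exp(-2j * cmath.pi * k * t / n)
        out.append([acc.real, acc.imag])
    return out


# A unit impulse transforms to a constant spectrum of ones.
impulse = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(dft_1d_reference(impulse))
```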

nnabla.functions.
ifft
(x, signal_ndim, normalized=False, n_outputs=1, outputs=None)[source]¶ Complex-to-complex inverse Discrete Fourier Transform,
\[X_{k_1, \ldots, k_d} = \frac{1}{\prod_{i=1}^{d} N_i} \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(2 \pi j \left( \sum_{i=1}^{d} \frac{k_i n_i}{N_i} \right) \right),\]where
\[k_i = 0, \ldots, N_i - 1.\]This function now supports 1-D, 2-D, and 3-D DFT with or without the leading batch dimension(s).
The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two where x[…, 0] is the real part and x[…, 1] the imaginary part.
Parameters: Returns: IFFT transformed signal.
Return type:
Parametric Functions¶
In NNabla, trainable models are created by composing functions that have optimizable parameters.
These functions are called parametric functions.
Parametric functions are provided by nnabla.parametric_functions.
 See also:
 Python API Tutorial.
Parameter Management API¶
The parameters registered by List of Parametric Functions can be managed using APIs listed in this section.

nnabla.parameter.
parameter_scope
(name)[source]¶ Grouping parameters registered by parametric functions listed in nnabla.parametric_functions.
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.functions as F

# Example input (not part of the original snippet): a batch of 8 RGB images of size 32x32.
x = nn.Variable((8, 3, 32, 32))

with nn.parameter_scope('conv1'):
    conv_out1 = PF.convolution(x, 32, (5, 5))
    bn_out1 = PF.batch_normalization(conv_out1)
    act_out1 = F.relu(bn_out1)

with nn.parameter_scope('conv2'):
    conv_out2 = PF.convolution(act_out1, 64, (3, 3))
    bn_out2 = PF.batch_normalization(conv_out2)
    act_out2 = F.relu(bn_out2)
Nesting with blocks allows you to nest parameter scopes. This can also be done by using “/” inside the parameter names.
Example:
with nn.parameter_scope('network1'):
    with nn.parameter_scope('conv1'):
        conv_out1 = PF.convolution(x, 32, (5, 5))
        bn_out1 = PF.batch_normalization(conv_out1)
        act_out1 = F.relu(bn_out1)

    with nn.parameter_scope('conv2'):
        conv_out2 = PF.convolution(act_out1, 64, (3, 3))
        bn_out2 = PF.batch_normalization(conv_out2)
        act_out2 = F.relu(bn_out2)
is equivalent to
with nn.parameter_scope('network1/conv1'):
    conv_out1 = PF.convolution(x, 32, (5, 5))
    bn_out1 = PF.batch_normalization(conv_out1)
    act_out1 = F.relu(bn_out1)

with nn.parameter_scope('network1/conv2'):
    conv_out2 = PF.convolution(act_out1, 64, (3, 3))
    bn_out2 = PF.batch_normalization(conv_out2)
    act_out2 = F.relu(bn_out2)

nnabla.parameter.
get_parameters
(params=None, path='', grad_only=True)[source]¶ Get parameter Variables under the current parameter scope.
Parameters: Returns: Return type:

nnabla.parameter.
save_parameters
(path)[source]¶ Save all parameters into a file with the specified format.
Currently hdf5 and protobuf formats are supported.
Parameters: path – path or file object

nnabla.parameter.
load_parameters
(path)[source]¶ Load parameters from a file with the specified format.
Parameters: path – path or file object

nnabla.parameter.
get_parameter_or_create
(name, shape=None, initializer=None, need_grad=True)[source]¶ Returns an existing parameter variable with the provided name. If a variable with the provided name does not exist, a new variable with the provided name is returned.
Parameters:  name (str) – The name under the current scope. If it already exists, the name is queried from the parameter manager.
 shape (tuple of int) – Shape of created parameter. The shape of the specified parameter must match with this shape. The default is None which is only valid if initializer is given as a numpy.ndarray.
 initializer (nnabla.initializer.BaseInitializer or numpy.ndarray) – An initialization function to be applied to the parameter. numpy.ndarray can also be given to initialize parameters from numpy array data.
 need_grad (bool) – The value for need_grad. The default is True.
List of Parametric Functions¶
Parametric functions are provided by nnabla.parametric_functions, as listed below.
Like functions listed in List of Functions, they take Variable(s) as first argument(s) followed by options specific to a parametric function. In addition, they register parameter Variable(s) into the parameter scope.
All parametric functions listed below are decorated with the following decorator.

nnabla.parametric_functions.
parametric_function_api
(scope_name=None)[source]¶ Decorator for parametric functions.
The decorated function is always called under a parameter scope scope_name. Also, the decorator adds an additional argument name (str, default is None) at the end. If name is specified, the scope scope_name comes under a scope name. This feature could reduce vertical space usage of the source code. Any parametric function should be decorated by this.
Parameters: scope_name (str, optional) – The original function will be called under a parameter scope named by scope_name.
Returns: A decorated parametric function.
Return type: function
See Parameter Management API to know how to query and manipulate registered variables.
Here is the list of parametric functions.

nnabla.parametric_functions.
affine
(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶ The affine layer, also known as the fully connected layer. Computes
\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf A}, {\mathbf b}\) are constants.
Parameters:  inp (Variable) – Input ND array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: \((B + 1)\)D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = affine(<args>)
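The shape bookkeeping and the per-sample computation of the affine layer can be sketched without the library. The helpers below are hypothetical: the first derives the output shape for an integer n_outmaps by keeping the dimensions before base_axis, and the second applies \({\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}\) to one flattened sample.

```python
def affine_output_shape(inp_shape, n_outmaps, base_axis=1):
    """Output shape for an integer n_outmaps: sample dimensions plus L."""
    return tuple(inp_shape[:base_axis]) + (n_outmaps,)


def affine_single(x, A, b):
    """y = A x + b for one flattened sample x (lists of floats)."""
    return [sum(a * xi for a, xi in zip(row, x)) + bj for row, bj in zip(A, b)]


print(affine_output_shape((8, 3, 32, 32), 10))  # (8, 10)
print(affine_single([1.0, 2.0], [[1.0, 0.0], [0.5, 0.5]], [0.0, 1.0]))  # [1.0, 2.5]
```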

nnabla.parametric_functions.
convolution
(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶ ND Convolution with a bias term.
For Dilated Convolution (a.k.a. Atrous Convolution), refer to:
 Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. https://arxiv.org/abs/1606.00915
 Yu et al., Multi-Scale Context Aggregation by Dilated Convolutions. https://arxiv.org/abs/1511.07122
Parameters:  inp (Variable) – ND array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: ND array.
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = convolution(<args>)
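How kernel, pad, stride and dilation interact is easiest to see through the spatial output size. The helper below uses the standard convolution arithmetic, which is assumed (not stated in this section) to match the layer's behaviour; `conv_output_size` is a hypothetical name and handles one spatial dimension at a time.

```python
def conv_output_size(in_size, kernel, pad=0, stride=1, dilation=1):
    """Output length along one spatial dimension (standard convolution arithmetic)."""
    effective_kernel = dilation * (kernel - 1) + 1  # extent covered by a dilated kernel
    return (in_size + 2 * pad - effective_kernel) // stride + 1


print(conv_output_size(32, 5))                   # 28
print(conv_output_size(32, 3, pad=1))            # 32
print(conv_output_size(32, 3, pad=1, stride=2))  # 16
print(conv_output_size(32, 3, dilation=2))       # 28
```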

nnabla.parametric_functions.
depthwise_convolution
(inp, kernel, pad=None, stride=None, dilation=None, multiplier=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶ ND Depthwise Convolution with a bias term.
Reference:
 Chollet, Francois. “Xception: Deep Learning with Depthwise Separable Convolutions.” https://arxiv.org/abs/1610.02357
Parameters:  inp (Variable) – ND array.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 multiplier (int) – Number of output feature maps per input feature map.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: ND array.
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = depthwise_convolution(<args>)

nnabla.parametric_functions.deconvolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
Deconvolution layer.
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of deconvolution kernels (which is equal to the number of output channels). For example, to apply deconvolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply deconvolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: N-D array.
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = deconvolution(<args>)

nnabla.parametric_functions.depthwise_deconvolution(inp, kernel, pad=None, stride=None, dilation=None, divisor=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
Depthwise deconvolution computes the transposed depthwise convolution for one-dimensional and two-dimensional input data.
Parameters:  inp (Variable) – N-D array.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 divisor (int) – Number of input feature maps per output feature map.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: N-D array.
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = depthwise_deconvolution(<args>)

nnabla.parametric_functions.batch_normalization(inp, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, name=None)
Batch normalization layer.
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]
where \(x_i, y_i\) are the inputs and outputs, respectively. In testing, the mean and variance computed by moving average during training are used.
Parameters:  inp (Variable) – N-D array of input.
 axes (tuple of int) – Axes along which the mean and variance are taken.
 decay_rate (float) – Decay rate of running mean and variance.
 eps (float) – Tiny value to avoid zero division by std.
 batch_stat (bool) – Use mini-batch statistics rather than running ones.
 output_stat (bool) – Output batch mean and variance.
 fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
Returns: N-D array.
Return type:
References
 Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = batch_normalization(<args>)
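As a rough illustration of the training-time formula, the following pure-Python sketch normalizes a mini-batch of scalars. The helper names are hypothetical, and the exponential moving-average rule in update_running is an assumption about how decay_rate is applied, not a statement about the library's internals.

```python
import math


def batch_norm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a list of scalars with mini-batch statistics."""
    m = len(x)
    mu = sum(x) / m
    var = sum((v - mu) ** 2 for v in x) / m
    x_hat = [(v - mu) / math.sqrt(var + eps) for v in x]
    y = [gamma * v + beta for v in x_hat]
    return y, mu, var


def update_running(running, batch_value, decay_rate=0.9):
    """Assumed moving-average update for the running mean or variance."""
    return decay_rate * running + (1.0 - decay_rate) * batch_value


y, mu, var = batch_norm_train([1.0, 2.0, 3.0, 4.0])
running_mean = update_running(0.0, mu)
print(mu, var, y, running_mean)
```

The normalized outputs have zero mean and (up to eps) unit variance; gamma and beta then rescale and shift them.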

nnabla.parametric_functions.mean_subtraction(inp, base_axis=1, update_running_mean=True, fix_parameters=False, name=None)
Mean subtraction layer.
It subtracts the mean of the elements of the input array, and normalizes it to \(0\). Preprocessing arrays with this function has the effect of improving accuracy in various tasks such as image classification.
At training time, this function is defined as
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i \\ y_i &=& x_i - \mu \end{array}\end{split}\]
At testing time, the mean values used are those that were computed during training by moving average.
Note
The backward performs an approximated differentiation that takes into account only the latest mini-batch.
Parameters:  inp (Variable) – N-D array of input.
 base_axis (int) – Base axis of Mean Subtraction operation. Dimensions up to base_axis are treated as the sample dimensions.
 update_running_mean (bool) – When set to True, the running mean will be updated.
 fix_parameters (bool) – Dummy parameter. This argument does not affect anything.
Returns: N-D array.
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = mean_subtraction(<args>)
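The difference between training and testing behaviour can be sketched as follows, for a batch in which every sample is a single scalar. The class name and the cumulative-average update rule are assumptions chosen for illustration; the text above only states that a moving average is used.

```python
class MeanSubtraction:
    """Toy mean subtraction over a batch of scalars."""

    def __init__(self):
        self.running_mean = 0.0
        self.t = 0  # number of mini-batches folded into running_mean

    def __call__(self, x, train=True, update_running_mean=True):
        if train:
            mu = sum(x) / len(x)
            if update_running_mean:
                # Assumed rule: cumulative average over mini-batch means.
                self.t += 1
                self.running_mean += (mu - self.running_mean) / self.t
        else:
            # At testing time the stored mean is used instead.
            mu = self.running_mean
        return [v - mu for v in x]


layer = MeanSubtraction()
y1 = layer([1.0, 2.0, 3.0])             # batch mean 2.0
y2 = layer([3.0, 5.0])                  # batch mean 4.0
y_test = layer([3.0, 4.0], train=False)
print(y1, y2, layer.running_mean, y_test)
```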

nnabla.parametric_functions.embed(inp, n_inputs, n_features, fix_parameters=False, name=None)
Embed.
Embed slices a matrix/tensor with an indexing array/tensor.
Parameters:
Returns: Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = embed(<args>)
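The slicing behaviour and the resulting output shape can be sketched in pure Python. This assumes that the input holds integer indices in the range [0, n_inputs) and that the weight is an n_inputs by n_features matrix; the helper below is hypothetical and only mimics the lookup.

```python
def embed_lookup(indices, weights):
    """Replace every integer index by the corresponding row of weights."""
    if isinstance(indices, int):
        return list(weights[indices])
    return [embed_lookup(i, weights) for i in indices]


# Assumed weight matrix with n_inputs=3 rows and n_features=2 columns.
weights = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
out = embed_lookup([[0, 2], [1, 1]], weights)  # indices of shape (2, 2)
print(out)  # nested list of shape (2, 2, 2)
```

An index array of shape (2, 2) combined with 2 features per row gives an output of shape (2, 2, 2), matching the pattern \((I_0, ..., I_N, W_1, ..., W_M)\).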

nnabla.parametric_functions.prelu(inp, base_axis=1, shared=True, fix_parameters=False, name=None)
Parametrized Rectified Linear Unit function defined as
\[y_i = \max(0, x_i) + w_i \min(0, x_i)\]
where the negative slope \(w\) is learned and can vary across channels (an axis specified with base_axis).
Parameters:
Returns: N-D array.
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = prelu(<args>)
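The formula can be checked with a small pure-Python sketch. The interpretation of shared (one slope for all channels versus one slope per channel) is inferred from the signature and the text above, and the helper name is hypothetical.

```python
def prelu_scalar(x, w):
    """y = max(0, x) + w * min(0, x) for a single element."""
    return max(0.0, x) + w * min(0.0, x)


x = [[1.0, -2.0], [-4.0, 3.0]]  # 2 channels with 2 elements each

# One slope shared by all channels.
y_shared = [[prelu_scalar(v, 0.25) for v in channel] for channel in x]

# One slope per channel.
slopes = [0.5, 0.25]
y_per_channel = [[prelu_scalar(v, w) for v in channel]
                 for channel, w in zip(x, slopes)]
print(y_shared, y_per_channel)
```

Positive inputs pass through unchanged; negative inputs are scaled by the slope.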

nnabla.parametric_functions.svd_affine(inp, n_outmaps, r, base_axis=1, uv_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)
SVD affine is a low rank approximation of the affine layer. It can be seen as two consecutive affine layers with a bottleneck. It computes:
\[{\mathbf y} = {\mathbf U} {\mathbf V} {\mathbf x} + {\mathbf b}.\]
where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf U}, {\mathbf V}, {\mathbf b}\) are constants.
The weights \({\mathbf U}\) and \({\mathbf V}\) are approximated with the singular value decomposition (SVD) of the original weight matrix \({\mathbf W}\) and by selecting the \({R}\) dominant singular values and the corresponding singular vectors. Therefore the low rank \({R}\) is the size of the bottleneck.
If uv_init is a numpy array, \({\mathbf U}\) and \({\mathbf V}\) are computed such that uv_init is approximated by \({\mathbf{UV}}\). If uv_init is None or an initializer, the product of \({\mathbf U}\) and \({\mathbf V}\) approximates the random initialization.
If \({\mathbf U}\) and \({\mathbf V}\) exist in the context, they take precedence over uv_init.
Suppose the weight of the affine is of \({I \times O}\) and the compression rate you want to specify is \({CR}\), then you set \({R}\) as
\[R = \left\lfloor \frac{(1 - CR)OI}{O + I} \right\rfloor.\]
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple) – Number of output neurons per data.
 r (int) – Rank of the factorized layer (size of the bottleneck).
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 uv_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = svd_affine(<args>)
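The rank formula can be worked through with concrete numbers. The sketch below computes R = floor((1 - CR) * O * I / (O + I)) for an example layer and compares parameter counts; the helper name and the example sizes are illustrative assumptions.

```python
import math


def svd_affine_rank(n_in, n_out, compression_rate):
    """R = floor((1 - CR) * O * I / (O + I))."""
    return math.floor(
        (1 - compression_rate) * n_out * n_in / (n_out + n_in))


I, O, CR = 256, 128, 0.5
R = svd_affine_rank(I, O, CR)

full_params = I * O              # weights of the plain affine layer
factorized_params = R * (I + O)  # weights of the two factors U and V
print(R, full_params, factorized_params)
```

Because of the floor, the factorized layer never holds more than (1 - CR) times the original number of weights.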

nnabla.parametric_functions.svd_convolution(inp, outmaps, kernel, r, pad=None, stride=None, dilation=None, uv_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
SVD convolution is a low rank approximation of the convolution layer. It can be seen as a depthwise convolution followed by a 1x1 convolution.
The flattened kernels for the i-th input map are expressed by their low rank approximation. The kernels for the i-th input \({\mathbf W_i}\) are approximated with the singular value decomposition (SVD) and by selecting the \({R}\) dominant singular values and the corresponding singular vectors.
\[{\mathbf W_{:,i,:}} \approx {\mathbf U_i} {\mathbf V_i}.\]
\({\mathbf U}\) contains the weights of the depthwise convolution with multiplier \({R}\) and \({\mathbf V}\) contains the weights of the 1x1 convolution.
If uv_init is a numpy array, \({\mathbf U}\) and \({\mathbf V}\) are computed such that uv_init is approximated by \({\mathbf{UV}}\). If uv_init is None or an initializer, the product of \({\mathbf U}\) and \({\mathbf V}\) approximates the random initialization.
If \({\mathbf U}\) and \({\mathbf V}\) exist in the context, they take precedence over uv_init.
Suppose the kernel tensor of the convolution is of \({O \times I \times K \times K}\) and the compression rate you want to specify is \({CR}\), then you set \({R}\) as
\[R = \left\lfloor \frac{(1 - CR)OIK^2}{I(O + K^2)} \right\rfloor.\]
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
 r (int) – Rank of the factorized layer.
 pad (tuple) – Padding sizes (int) for dimensions.
 stride (tuple) – Stride sizes (int) for dimensions.
 dilation (tuple) – Dilation sizes (int) for dimensions.
 uv_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = svd_convolution(<args>)
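As a worked example of the rank formula for the convolutional case, the sketch below evaluates R for a 3x3 kernel and counts the weights of the two stages (a depthwise convolution with multiplier R, then a 1x1 convolution). The helper name and the example sizes are illustrative assumptions.

```python
import math


def svd_convolution_rank(n_out, n_in, k, compression_rate):
    """R = floor((1 - CR) * O * I * K^2 / (I * (O + K^2)))."""
    numerator = (1 - compression_rate) * n_out * n_in * k ** 2
    return math.floor(numerator / (n_in * (n_out + k ** 2)))


O, I, K, CR = 64, 32, 3, 0.5
R = svd_convolution_rank(O, I, K, CR)

full_params = O * I * K ** 2
depthwise_params = I * R * K ** 2   # depthwise stage with multiplier R
pointwise_params = I * R * O        # 1x1 convolution from I * R maps to O
factorized_params = depthwise_params + pointwise_params
print(R, full_params, factorized_params)
```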

nnabla.parametric_functions.cpd3_convolution(inp, outmaps, kernel, r, pad=None, stride=None, dilation=None, oik_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, max_iter=500, stopping_criterion=1e-05, name=None)
CP convolution is a low rank approximation of a convolution layer. A 3-D tensor containing the parameter is built by collapsing the N-D kernels into 1-D, then the tensor is decomposed into three matrices. The decomposed layer can be seen as linear combinations of the input feature maps to \({R}\) feature maps followed by a depthwise convolution and followed by linear combinations of the feature maps to compute the output feature maps.
The CP decomposition approximates the kernel tensor by \({R}\) rank-1 tensors of the form:
\[\sum_{r=1}^{R} \lambda_r {\mathbf{o}^{(r)} \otimes \mathbf{i}^{(r)} \otimes \mathbf{k}^{(r)}},\]
where \({\lambda}_r\) is the normalization coefficient and \({\otimes}\) is the outer product.
If oik_init is a numpy array, O, I and K are computed so that oik_init is approximated by their CP decomposition. If oik_init is None or an initializer, the decomposition approximates the randomly initialized array.
If O, I and K exist in context, they are used to initialize the layer and oik_init is not used.
Suppose the kernel tensor of the convolution is of \({O \times I \times K \times K}\) and the compression rate you want to specify is \({CR}\), then you set \({R}\) as
\[R = \left\lfloor \frac{(1 - CR)OIK^2}{O + I + K^2} \right\rfloor.\]
References
 Lebedev, Vadim, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky, “Speeding-up convolutional neural networks using fine-tuned CP-decomposition.”, arXiv preprint arXiv:1412.6553 (2014).
 Marcella Astrid, Seung-Ik Lee, “CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression”, BigComp 2017.
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 r (int) – Rank of the factorized layer.
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 oik_init (numpy array or nnabla.initializer.BaseInitializer) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 max_iter (int) – Max iteration of the ALS.
 stopping_criterion (float) – Threshold for stopping the ALS. If the value is negative, the convergence check is ignored; in other words, it may reduce the computation time.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = cpd3_convolution(<args>)
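The sum of rank-1 tensors can be made concrete with a small pure-Python sketch that rebuilds a 3-D tensor from its factors and evaluates the rank formula. The helper names and the example values are hypothetical; the sketch does not perform the ALS fitting itself.

```python
import math


def cp_reconstruct(lam, o_vecs, i_vecs, k_vecs):
    """Return T[a][b][c] = sum_r lam[r] * o[r][a] * i[r][b] * k[r][c]."""
    rank = len(lam)
    n_o, n_i, n_k = len(o_vecs[0]), len(i_vecs[0]), len(k_vecs[0])
    return [[[sum(lam[r] * o_vecs[r][a] * i_vecs[r][b] * k_vecs[r][c]
                  for r in range(rank))
              for c in range(n_k)]
             for b in range(n_i)]
            for a in range(n_o)]


def cpd3_rank(n_out, n_in, k, compression_rate):
    """R = floor((1 - CR) * O * I * K^2 / (O + I + K^2))."""
    numerator = (1 - compression_rate) * n_out * n_in * k ** 2
    return math.floor(numerator / (n_out + n_in + k ** 2))


# Two rank-1 terms for a tensor with O=2, I=2 and 3 collapsed kernel taps.
lam = [1.0, 2.0]
o_vecs = [[1, 0], [0, 1]]
i_vecs = [[1, 1], [1, -1]]
k_vecs = [[1, 2, 3], [1, 0, 0]]
T = cp_reconstruct(lam, o_vecs, i_vecs, k_vecs)

R = cpd3_rank(64, 32, 3, 0.5)
print(T, R)
```

Each term contributes one vector per mode, so R terms need R * (O + I + K^2) weights, which is the denominator of the rank formula.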

nnabla.parametric_functions.binary_connect_affine(inp, n_outmaps, base_axis=1, w_init=None, wb_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)
Binary Connect Affine, multiplier-less inner-product.
Binary Connect Affine is an affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} sign(w_{ji}) x_i.\]
Therefore \(sign(w_{ji})\) is either \(1\) or \(-1\) and the inner product simplifies to addition.
This function should be used together with Batch Normalization.
References
M. Courbariaux, Y. Bengio, and J.-P. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations.” Advances in Neural Information Processing Systems. 2015.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for binary weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
Returns:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = binary_connect_affine(<args>)
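The claim that the inner product reduces to additions can be illustrated with a small pure-Python sketch. The helper names are hypothetical, and mapping a weight of exactly zero to +1 is an assumption made here so that every weight is binarized.

```python
def sign(v):
    # Assumption: zero is mapped to +1 so that the result is always +1 or -1.
    return 1.0 if v >= 0 else -1.0


def binary_connect_affine_sketch(x, w, b=None):
    """x: list of I inputs; w: O rows of I float weights; b: O biases."""
    y = []
    for j, row in enumerate(w):
        acc = 0.0
        for w_ji, x_i in zip(row, x):
            # Multiplying by +1 or -1 is just an addition or a subtraction.
            acc = acc + x_i if sign(w_ji) > 0 else acc - x_i
        y.append(acc + (b[j] if b is not None else 0.0))
    return y


x = [1.0, 2.0, 3.0]
w = [[0.3, -0.7, 0.1], [-0.2, -0.9, 0.5]]  # float weights kept for training
y = binary_connect_affine_sketch(x, w)
print(y)
```

Only the signs of the float weights enter the forward computation, which is why the float weights, not the binarized ones, are the values to share between layers.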

nnabla.parametric_functions.binary_connect_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, wb_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
Binary Connect Convolution, multiplier-less inner-product.
Binary Connect Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]
Therefore \(sign(w_{n, m, i, j})\) is either \(1\) or \(-1\) and the inner product simplifies to addition.
This function should be used together with Batch Normalization.
References
M. Courbariaux, Y. Bengio, and J.-P. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations.” Advances in Neural Information Processing Systems. 2015.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for binary weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = binary_connect_convolution(<args>)

nnabla.parametric_functions.binary_weight_affine(inp, n_outmaps, base_axis=1, w_init=None, wb_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)
Binary Weight Affine, multiplier-less inner-product with a scale factor.
Binary Weight Affine is the affine function, but the inner product in this function is the following,
\[y_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}} \sum_{i} sign(w_{ji}) x_i\]
Therefore \(sign(w_{ji})\) is either \(1\) or \(-1\) and the inner product simplifies to addition followed by scaling factor \(\alpha = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}}\). The number of \(\alpha\) is the number of outmaps of the affine function.
References
Rastegari, Mohammad, et al. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.” arXiv preprint arXiv:1603.05279 (2016).
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it was a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight.
 wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the binary weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias.
 fix_parameters (bool) – When set to True, the weight and bias will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = binary_weight_affine(<args>)

nnabla.parametric_functions.binary_weight_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, wb_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
Binary Weight Convolution, multiplier-less inner-product with a scale factor.
Binary Weight Convolution is the convolution function, but the inner product in this function is the following,
\[y_{n, a, b} = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}} \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]
Therefore \(sign(w_{n, m, i, j})\) is either \(1\) or \(-1\) and the inner product simplifies to addition followed by scaling factor \(\alpha = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}}\). The number of \(n\) is the number of outmaps of the convolution function.
References
Rastegari, Mohammad, et al. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.” arXiv preprint arXiv:1603.05279 (2016).
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for binary weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = binary_weight_convolution(<args>)

nnabla.parametric_functions.inq_affine(inp, n_outmaps, base_axis=1, num_bits=4, inq_iterations=(), selection_algorithm='random', seed=-1, w_init=None, i_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)
Incremental Network Quantization Affine Layer
During training, the weights are sequentially quantized to power-of-two values, which allows the training of a multiplier-less network.
Using inq_iterations, one can specify after how many forward passes half of the learnable weights are fixed and quantized to powers-of-two. After reaching the last value in inq_iterations, all weights are fixed.
For more details, please refer to the reference.
Reference: Zhou A, Yao A, Guo Y, Xu L, Chen Y. Incremental network quantization: Towards lossless CNNs with low-precision weights. <https://arxiv.org/abs/1702.03044>
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it was a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 num_bits (int) – Number of bits per weight. Value has to be larger than 1 as one bit is already used to code the value “0”.
 inq_iterations (tuple of int) – Tuple of iteration numbers at which we fix half of the weights.
 selection_algorithm (str) – Chooses algorithm that is used to decide which weights are fixed. (“largest_abs” … fix weights with largest absolute value, “random” … fix weights randomly)
 seed (int) – Random seed for INQ algorithm.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight.
 i_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the indicators (0 … learnable, 1 … fixed).
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias.
 fix_parameters (bool) – When set to True, the weight and bias will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = inq_affine(<args>)
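The effect of inq_iterations can be sketched as a schedule for the fraction of weights that are fixed. This is a reading of the description above rather than the library's code: the helper name is hypothetical, and both the boundary convention (a step takes effect once the iteration count reaches the listed value) and the behaviour for an empty tuple are assumptions.

```python
def fraction_fixed(iteration, inq_iterations):
    """Fraction of weights fixed and quantized after `iteration` passes."""
    if not inq_iterations:
        return 0.0  # assumption: an empty schedule fixes nothing
    steps_done = sum(1 for it in inq_iterations if iteration >= it)
    if steps_done == len(inq_iterations):
        return 1.0  # after the last value, all weights are fixed
    # Every earlier step fixes half of the weights that are still learnable.
    return 1.0 - 0.5 ** steps_done


schedule = (100, 200, 300)
fractions = [fraction_fixed(it, schedule) for it in (50, 100, 250, 300)]
print(fractions)
```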

nnabla.parametric_functions.inq_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, num_bits=4, inq_iterations=(), selection_algorithm='random', seed=-1, w_init=None, i_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
Incremental Network Quantization Convolution Layer
During training, the weights are sequentially quantized to power-of-two values, which allows the training of a multiplier-less network.
Using inq_iterations, one can specify after how many forward passes half of the learnable weights are fixed and quantized to powers-of-two. After reaching the last value in inq_iterations, all weights are fixed.
For more details, please refer to the reference.
Reference: Zhou A, Yao A, Guo Y, Xu L, Chen Y. Incremental network quantization: Towards lossless CNNs with low-precision weights. <https://arxiv.org/abs/1702.03044>
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it was a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 num_bits (int) – Number of bits per weight. Value has to be larger than 1 as one bit is already used to code the value “0”.
 inq_iterations (tuple of int) – Tuple of iteration numbers at which we fix half of the weights.
 selection_algorithm (str) – Chooses algorithm that is used to decide which weights are fixed. (“largest_abs” … fix weights with largest absolute value, “random” … fix weights randomly)
 seed (int) – Random seed for INQ algorithm.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight.
 i_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the indicators (0 … learnable, 1 … fixed).
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias.
 fix_parameters (bool) – When set to True, the weight and bias will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns:
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = inq_convolution(<args>)

nnabla.parametric_functions.fixed_point_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, n_w=8, delta_w=0.0625, ste_fine_grained_w=True, quantize_b=True, sign_b=True, n_b=8, delta_b=0.0625, ste_fine_grained_b=True, name=None)
Fixed-Point Quantized Affine.
Fixed-Point Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} Q(w_{ji}) x_i,\]
where \(Q(w_{ji})\) is the fixed-point quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) CPU and GPU implementations now use float value for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 quantize_w (bool) – Quantize weights if True.
 sign_w (bool) – Use signed quantization if True.
 n_w (int) – Bit width used for weight.
 delta_w (float) – Step size for weight.
 ste_fine_grained_w (bool) – STE is fine-grained if True.
 quantize_b (bool) – Quantize bias if True.
 sign_b (bool) – Use signed quantization if True.
 n_b (int) – Bit width used for bias.
 delta_b (float) – Step size for bias.
 ste_fine_grained_b (bool) – STE is fine-grained if True.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type: Variable
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = fixed_point_quantized_affine(<args>)
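As a rough illustration of the relation \(y_j = \sum_{i} Q(w_{ji}) x_i\), the following pure-Python sketch implements one plausible fixed-point quantizer and applies it inside an affine map. The rounding and clipping rules are assumptions made for this sketch (round to the nearest multiple of the step size, then clip to the range representable with n bits); the helper names are hypothetical, and the bias term is omitted to match the formula above.

```python
import math


def fixed_point_quantize(x, sign=True, n=8, delta=0.0625):
    """Round x to the nearest multiple of delta and clip to the n-bit range.

    Assumed rule for this sketch, not taken from the NNabla source.
    """
    if sign:
        q_max = (2 ** (n - 1) - 1) * delta
        q_min = -q_max
    else:
        q_max = (2 ** n - 1) * delta
        q_min = 0.0
    # Round half away from zero on the magnitude, then restore the sign.
    q = math.copysign(math.floor(abs(x) / delta + 0.5) * delta, x)
    return min(max(q, q_min), q_max)


def quantized_affine(x, w):
    """y_j = sum_i Q(w_ji) * x_i, with w given as rows w[j][i]."""
    return [
        sum(fixed_point_quantize(w_ji) * x_i for w_ji, x_i in zip(row, x))
        for row in w
    ]


x = [1.0, 2.0]
w = [[0.1, -0.3],    # quantized to 0.125 and -0.3125
     [100.0, 0.0]]   # 100.0 saturates at 127 * 0.0625 = 7.9375
y = quantized_affine(x, w)
```

With the default n_w=8 and delta_w=0.0625, signed weights in this sketch are limited to multiples of 0.0625 between -7.9375 and 7.9375.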

nnabla.parametric_functions.fixed_point_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, n_w=8, delta_w=0.0625, ste_fine_grained_w=True, quantize_b=True, sign_b=True, n_b=8, delta_b=0.0625, ste_fine_grained_b=True, name=None)
Fixed-Point Quantized Convolution.
Fixed-Point Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]
where \(Q(w_{n, m, i, j})\) is the fixed-point quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) CPU and GPU implementations now use float value for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 quantize_w (bool) – Quantize weights if True.
 sign_w (bool) – Use signed quantization if True.
 n_w (int) – Bit width used for weight.
 delta_w (float) – Step size for weight.
 ste_fine_grained_w (bool) – STE is fine-grained if True.
 quantize_b (bool) – Quantize bias if True.
 sign_b (bool) – Use signed quantization if True.
 n_b (int) – Bit width used for bias.
 delta_b (float) – Step size for bias.
 ste_fine_grained_b (bool) – STE is fine-grained if True.
Returns: N-D array.
Return type: Variable
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = fixed_point_quantized_convolution(<args>)
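The triple sum \(y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j}\) can be written out directly for a small two-dimensional case. The sketch below assumes no padding, stride 1, no dilation and a single group, and it uses a simplified stand-in for \(Q\) that rounds to multiples of a step size without clipping. All helper names are hypothetical.

```python
import math


def quantize(w, delta=0.0625):
    """Stand-in for Q: round to the nearest multiple of delta (no clipping)."""
    return math.copysign(math.floor(abs(w) / delta + 0.5) * delta, w)


def quantized_conv2d(x, w, q=quantize):
    """x[m][row][col], w[n][m][i][j] -> y[n][a][b]; no padding, stride 1."""
    n_out, n_in = len(w), len(x)
    k_h, k_w = len(w[0][0]), len(w[0][0][0])
    out_h = len(x[0]) - k_h + 1
    out_w = len(x[0][0]) - k_w + 1
    y = [[[0.0] * out_w for _ in range(out_h)] for _ in range(n_out)]
    for n in range(n_out):
        for a in range(out_h):
            for b in range(out_w):
                # Sum over input maps m and kernel offsets (i, j).
                y[n][a][b] = sum(
                    q(w[n][m][i][j]) * x[m][a + i][b + j]
                    for m in range(n_in)
                    for i in range(k_h)
                    for j in range(k_w)
                )
    return y


x = [[[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0],
      [7.0, 8.0, 9.0]]]        # one input map of size 3 x 3
w = [[[[0.1, 0.0],
       [0.0, -0.3]]]]          # one output map, 2 x 2 kernel
y = quantized_conv2d(x, w)     # uses Q(0.1) = 0.125 and Q(-0.3) = -0.3125
```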

nnabla.parametric_functions.pow2_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, with_zero_w=False, n_w=8, m_w=2, ste_fine_grained_w=True, quantize_b=True, sign_b=True, with_zero_b=False, n_b=8, m_b=2, ste_fine_grained_b=True, name=None)
Pow2 Quantized Affine.
Pow2 Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} Q(w_{ji}) x_i,\]
where \(Q(w_{ji})\) is the power-of-2 quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) Quantized values are stored as floating point numbers for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 quantize_w (bool) – Quantize weights if True.
 sign_w (bool) – Use signed quantization if True.
 with_zero_w (bool) – Indicate using zero as a quantized value. Default is false.
 n_w (int) – Bit width used for weight.
 m_w (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for weights. Default is 2.
 ste_fine_grained_w (bool) – STE is fine-grained if True.
 quantize_b (bool) – Quantize bias if True.
 sign_b (bool) – Use signed quantization if True.
 with_zero_b (bool) – Indicate using zero as a quantized value. Default is false.
 n_b (int) – Bit width used for bias.
 m_b (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for bias. Default is 2.
 ste_fine_grained_b (bool) – STE is fine-grained if True.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type: Variable
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = pow2_quantized_affine(<args>)
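To make the power-of-2 quantization function \(Q\) more concrete, here is a simplified pure-Python sketch. It assumes that one bit is reserved for the sign (if sign is True) and one for zero (if with_zero is True), that the remaining bits select the exponent, and that the exponent is rounded in the log domain and clipped so that the magnitude never exceeds \(2^m\). These rules and the helper name are assumptions for illustration; the library may, for example, prune small values to zero with a different threshold.

```python
import math


def pow2_quantize(x, sign=True, with_zero=False, n=8, m=2):
    """Map x to a (signed) power of two with exponent clipped to [e_min, m]."""
    bits = n - (1 if sign else 0) - (1 if with_zero else 0)
    e_max = m
    e_min = m - (2 ** bits - 1)
    if x == 0.0 or (not sign and x < 0.0):
        # No exponent represents these inputs: use zero if it is available,
        # otherwise fall back to the smallest representable magnitude.
        return 0.0 if with_zero else 2.0 ** e_min
    # Nearest integer exponent in the log domain, then clip.
    e = math.floor(math.log2(abs(x)) + 0.5)
    e = min(max(e, e_min), e_max)
    q = 2.0 ** e
    return -q if x < 0.0 else q


q_small = pow2_quantize(0.3)                # nearest power of two is 2**-2
q_neg = pow2_quantize(-0.3)                 # sign is kept
q_large = pow2_quantize(10.0)               # clipped to the upper bound 2**2
q_coarse = pow2_quantize(0.01, n=3, m=0)    # few bits: clipped to 2**-3
```

With the default m_w=2, the largest magnitude a quantized weight can take in this sketch is 4.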

nnabla.parametric_functions.pow2_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, with_zero_w=False, sign_w=True, n_w=8, m_w=2, ste_fine_grained_w=True, quantize_b=True, with_zero_b=False, sign_b=True, n_b=8, m_b=2, ste_fine_grained_b=True, name=None)
Pow2 Quantized Convolution.
Pow2 Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]
where \(Q(w_{n, m, i, j})\) is the power-of-2 quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) Quantized values are stored as floating point numbers for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 quantize_w (bool) – Quantize weights if True.
 with_zero_w (bool) – Indicate using zero as a quantized value. Default is false.
 sign_w (bool) – Use signed quantization if True.
 n_w (int) – Bit width used for weight.
 m_w (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for weights. Default is 2.
 ste_fine_grained_w (bool) – STE is fine-grained if True.
 quantize_b (bool) – Quantize bias if True.
 with_zero_b (bool) – Indicate using zero as a quantized value. Default is false.
 sign_b (bool) – Use signed quantization if True.
 n_b (int) – Bit width used for bias.
 m_b (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for bias. Default is 2.
 ste_fine_grained_b (bool) – STE is fine-grained if True.
Returns: N-D array.
Return type: Variable
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = pow2_quantized_convolution(<args>)
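The note about float weights and quantized weights falling out of sync applies to all of the quantized layers above. The toy class below is a hypothetical stand-in, not NNabla code, that mimics the behaviour: the quantized copy is refreshed only inside forward(), so reading it right after a parameter update returns a stale value.

```python
class ToyQuantizedLayer:
    """Hypothetical stand-in keeping a float weight and its quantized copy."""

    def __init__(self, weight, delta=0.0625):
        self.weight = weight            # float weight, changed by updates
        self.delta = delta
        self.quantized_weight = None    # refreshed only by forward()

    def forward(self, x):
        # Quantization happens here, so this is where the two copies sync.
        self.quantized_weight = round(self.weight / self.delta) * self.delta
        return self.quantized_weight * x

    def update(self, grad, lr=0.1):
        # A solver step changes the float weight only.
        self.weight -= lr * grad


layer = ToyQuantizedLayer(weight=0.30)
layer.forward(1.0)                  # quantized_weight becomes 0.3125
layer.update(grad=1.0)              # float weight moves to about 0.2
stale = layer.quantized_weight      # still the value from before the update
layer.forward(1.0)                  # re-syncs the quantized copy
fresh = layer.quantized_weight
```

The same picture explains why shared layers should share the float weight: the quantized copy is derived from it on every forward pass.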

nnabla.parametric_functions.lstm(x, h, c, state_size, w_init=None, b_init=None, fix_parameters=False, name=None)
Long Short-Term Memory.
Long Short-Term Memory, or LSTM, is a building block for recurrent neural network (RNN) layers. An LSTM unit consists of a cell and input, output, and forget gates, whose functions are defined as follows:
\[\begin{split}f_t&=\sigma(W_fx_t+U_fh_{t-1}+b_f) \\ i_t&=\sigma(W_ix_t+U_ih_{t-1}+b_i) \\ o_t&=\sigma(W_ox_t+U_oh_{t-1}+b_o) \\ c_t&=f_t\odot c_{t-1}+i_t\odot\tanh(W_cx_t+U_ch_{t-1}+b_c) \\ h_t&=o_t\odot\tanh(c_t).\end{split}\]
References
S. Hochreiter and J. Schmidhuber. “Long Short-Term Memory.” Neural Computation, 1997.
Parameters:  x (Variable) – Input N-D array with shape (batch_size, input_size).
 h (Variable) – Input N-D array with shape (batch_size, state_size).
 c (Variable) – Input N-D array with shape (batch_size, state_size).
 state_size (int) – Internal state size is set to state_size.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
Returns:
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = lstm(<args>)
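The gate equations of the LSTM can be followed step by step in a small pure-Python sketch for a single sample. It keeps a separate \(W\), \(U\) and \(b\) per gate, exactly as the equations are written; how the library lays out its parameters internally is not assumed here, and the names lstm_step, matvec and the dict layout are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(mat, vec):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vec)) for row in mat]


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U and b are dicts keyed by gate name f, i, o, c."""

    def pre_activation(gate):
        wx = matvec(W[gate], x)
        uh = matvec(U[gate], h_prev)
        return [p + q + r for p, q, r in zip(wx, uh, b[gate])]

    f = [sigmoid(z) for z in pre_activation("f")]     # forget gate
    i = [sigmoid(z) for z in pre_activation("i")]     # input gate
    o = [sigmoid(z) for z in pre_activation("o")]     # output gate
    g = [math.tanh(z) for z in pre_activation("c")]   # candidate cell value
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# With all-zero parameters every gate equals sigmoid(0) = 0.5 and the
# candidate equals tanh(0) = 0, so the new cell state is half the old one.
state_size, input_size = 2, 3
W = {gate: [[0.0] * input_size for _ in range(state_size)] for gate in "fioc"}
U = {gate: [[0.0] * state_size for _ in range(state_size)] for gate in "fioc"}
b = {gate: [0.0] * state_size for gate in "fioc"}
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], W, U, b)
```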

class nnabla.parametric_functions.LSTMCell(batch_size, state_size, h=None, c=None)
__call__(x, w_init, b_init, fix_parameters)
Updates h and c by calling lstm function.
Parameters:  x (Variable) – Input N-D array with shape (batch_size, input_size).
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
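The idea of a cell object that carries h and c between calls can be sketched without the library. The class below is a hypothetical stand-in, not NNabla's LSTMCell: it assumes missing states start as zeros and that each call returns the new h, and it takes the step function as an argument so that the example stays self-contained. The toy step function is for illustration only and is not an LSTM.

```python
class ToyLSTMCell:
    """Hypothetical stateful wrapper around a step function."""

    def __init__(self, batch_size, state_size, step_fn, h=None, c=None):
        def zeros():
            return [[0.0] * state_size for _ in range(batch_size)]

        # Assumption: states that are not given start as zeros.
        self.h = zeros() if h is None else h
        self.c = zeros() if c is None else c
        self.step_fn = step_fn

    def __call__(self, x):
        # Replace the stored states with the ones produced by the step.
        self.h, self.c = self.step_fn(x, self.h, self.c)
        return self.h


def toy_step(x, h, c):
    """Toy update for illustration: c accumulates x, h is half of c."""
    new_c = [[c_k + x_k for c_k, x_k in zip(c_row, x_row)]
             for c_row, x_row in zip(c, x)]
    new_h = [[0.5 * v for v in row] for row in new_c]
    return new_h, new_c


cell = ToyLSTMCell(batch_size=1, state_size=2, step_fn=toy_step)
for _ in range(2):
    out = cell([[1.0, 2.0]])   # the state persists across the two calls
```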

Parameter Initializer¶
Some of the parametric functions optionally take one of the parameter initializers listed below.

class nnabla.initializer.BaseInitializer
Base class of the parameter initializer.
__call__(shape)
Generates an array with an initializer.
Parameters: shape (tuple of int) – Shape of the numpy.ndarray to be created.
Returns: Array.
Return type: numpy.ndarray
Note
Subclasses of BaseInitializer must override this method.


class nnabla.initializer.ConstantInitializer(value=0)
Bases: nnabla.initializer.BaseInitializer
Generates a constant valued array.
Parameters: value (float) – A constant value.
Example
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.ConstantInitializer(0.1)
b = I.ConstantInitializer()  # generates a constant valued array with the default value 0
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

class nnabla.initializer.NormalInitializer(sigma=1.0, rng=None)
Bases: nnabla.initializer.BaseInitializer
Generates a random array from a specified normal distribution.
\[\mathbf x \sim {\cal N} (\mathbf 0, \sigma^2 \mathbf I)\]
Parameters:  sigma (float) – \(\sigma\).
 rng (numpy.random.RandomState) – Random number generator.
Example
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.NormalInitializer(5e-5)
b = I.NormalInitializer(0.0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

class nnabla.initializer.UniformInitializer(lim=(-1, 1), rng=None)
Bases: nnabla.initializer.BaseInitializer
Generates a random array from a specified uniform distribution.
\[\mathbf x \sim {\cal U} (a, b)\]
Parameters:  lim (tuple of float) – A tuple of two floats, \((a, b)\).
 rng (numpy.random.RandomState) – Random number generator.
Example
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.UniformInitializer()  # uniform distribution within the default range of (-1, 1)
b = I.UniformInitializer((-0.5, 0.5))
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

nnabla.initializer.calc_normal_std_he_forward(inmaps, outmaps, kernel=(1, 1))
Calculates the standard deviation proposed by He et al.
\[\sigma = \sqrt{\frac{2}{NK}}\]
Parameters:  inmaps (int) – Map size of an input Variable, \(N\).
 outmaps (int) – Map size of an output Variable, \(M\).
 kernel (tuple of int) – Convolution kernel spatial shape. \(K\) in the definition above is the product of its dimensions.
Example
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
s = I.calc_normal_std_he_forward(x.shape[1],64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
References

nnabla.initializer.calc_normal_std_he_backward(inmaps, outmaps, kernel=(1, 1))
Calculates the standard deviation of He et al. (backward case).
\[\sigma = \sqrt{\frac{2}{MK}}\]
Parameters:  inmaps (int) – Map size of an input Variable, \(N\).
 outmaps (int) – Map size of an output Variable, \(M\).
 kernel (tuple of int) – Convolution kernel spatial shape. \(K\) in the definition above is the product of its dimensions.
Example
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
s = I.calc_normal_std_he_backward(x.shape[1],64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
References

nnabla.initializer.calc_normal_std_glorot(inmaps, outmaps, kernel=(1, 1))
Calculates the standard deviation proposed by Glorot et al.
\[\sigma = \sqrt{\frac{2}{NK + M}}\]
Parameters:  inmaps (int) – Map size of an input Variable, \(N\).
 outmaps (int) – Map size of an output Variable, \(M\).
 kernel (tuple of int) – Convolution kernel spatial shape. \(K\) in the definition above is the product of its dimensions.
Example
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
s = I.calc_normal_std_glorot(x.shape[1],64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
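The three standard-deviation formulas in this section differ only in the denominator, which a short pure-Python sketch makes easy to compare. It assumes that \(N\) is inmaps, \(M\) is outmaps and \(K\) is the product of the kernel dimensions; the function names below are hypothetical re-implementations for illustration, not the library functions.

```python
import math


def kernel_size(kernel):
    """K: product of the kernel's spatial dimensions."""
    k = 1
    for dim in kernel:
        k *= dim
    return k


def he_forward_std(inmaps, outmaps, kernel=(1, 1)):
    # sigma = sqrt(2 / (N K)); outmaps is unused but kept for a uniform signature.
    return math.sqrt(2.0 / (inmaps * kernel_size(kernel)))


def he_backward_std(inmaps, outmaps, kernel=(1, 1)):
    # sigma = sqrt(2 / (M K)); inmaps is unused but kept for a uniform signature.
    return math.sqrt(2.0 / (outmaps * kernel_size(kernel)))


def glorot_std(inmaps, outmaps, kernel=(1, 1)):
    # sigma = sqrt(2 / (N K + M))
    return math.sqrt(2.0 / (inmaps * kernel_size(kernel) + outmaps))


# One input map, 64 output maps, 3 x 3 kernel.
s_forward = he_forward_std(1, 64, kernel=(3, 3))     # sqrt(2 / 9)
s_backward = he_backward_std(1, 64, kernel=(3, 3))   # sqrt(2 / 576)
s_glorot = glorot_std(1, 64, kernel=(3, 3))          # sqrt(2 / 73)
```

Because the kernel argument defaults to (1, 1), passing the actual kernel shape matters whenever the value is meant for a convolution with a larger kernel.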
References